This document provides an overview of Semgrep’s proprietary cross-file (interfile) and cross-function (intrafile) analysis features through specific examples, such as their use in type inference, class inheritance, constant propagation, and taint analysis. Differences between Semgrep and Semgrep Community Edition (CE) can be observed by viewing the examples in separate Playground tabs. See the following section for more information. Note that Semgrep cross-file analysis implies cross-function analysis as well.
Tips and tricks for an interactive experience
The following resources can help you test the code in the sections below. As you work through the examples in this document, try the following:
- Ensure that the Pro toggle is enabled on the Playground page.
  - Rules that make use of interfile analysis require the interfile: true key included under the options key.
  - On the Playground, true positives that are detected by Semgrep’s cross-file analysis are marked with a purple star.
- Clone the Semgrep cross-file analysis testing repository:
- Follow the instructions in the subsequent sections of this document using this testing repository. To run Semgrep in the cloned testing repository with cross-file (interfile) analysis, enter:
Taint tracking
Semgrep CE allows you to search for the flow of any potentially exploitable input into an important sink using taint mode. For more information, see the taint mode documentation. The examples below compare Semgrep and Semgrep CE while searching for dangerous calls that use data obtained from a get_user_input call. The rule does this by specifying the source of taint as get_user_input(...) and the sink as dangerous(...);.
Java
Here, Semgrep CE matches dangerous("Select * from " + user_input), because user_input is obtained by calling get_user_input. However, Semgrep CE does not match the similar call using still_user_input, because its analysis does not cross function boundaries to know that still_user_input is a wrapper function for get_user_input.
Semgrep matches both calls, because it tracks calls to get_user_input over multiple jumps in multiple files.
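As a rough illustration of the code shape discussed above, the following Java sketch places the source, the wrapper, and the sink in one file. It is not the code from the Playground or the testing repository: the class name TaintSketch, the executed list, and the returned string are assumptions made for illustration, and the snake_case method names simply mirror the identifiers used in the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-file sketch. In the cross-file scenario these helpers
// could each live in a different file.
class TaintSketch {
    // Records every query that reaches the sink so the flow can be inspected.
    static final List<String> executed = new ArrayList<>();

    // Taint source named in the text. The returned value is a stand-in.
    static String get_user_input() {
        return "Employees; DROP TABLE Employees";
    }

    // Wrapper around the source: whatever it returns is still tainted.
    static String still_user_input() {
        return get_user_input();
    }

    // Taint sink named in the text.
    static void dangerous(String query) {
        executed.add(query);
    }

    static void run() {
        String user_input = get_user_input();
        // Direct flow from source to sink inside a single function.
        dangerous("Select * from " + user_input);

        // Flow through the wrapper: finding it means following the call
        // into still_user_input, which is cross-function analysis.
        dangerous("Select * from " + still_user_input());
    }

    public static void main(String[] args) {
        run();
        executed.forEach(System.out::println);
    }
}
```

Both calls pass the same tainted string to the sink at run time; the difference lies only in how much of the program an analysis has to follow to see it.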
JavaScript and TypeScript
Here, Semgrep CE matches dangerous("Select * from " + user_input), because user_input is obtained by calling get_user_input. However, Semgrep CE does not match the similar call using still_user_input, because its analysis does not cross function boundaries to know that still_user_input is a wrapper function for get_user_input.
Semgrep matches both calls, because it tracks calls to get_user_input over multiple jumps in multiple files.
You can run JavaScript examples in your cloned Semgrep testing repository by going to docs/taint_tracking/javascript and running the following command:
ES6 and CommonJS
The JavaScript and TypeScript ecosystems contain various ways of importing and exporting code. Semgrep can track dataflow through ES6 imports and exports and some CommonJS export paths. See Known limitations of cross file analysis.
ES6
Semgrep can track data through the definition of exports for ES6:
CommonJS
Semgrep can track data through the definition of exports for CommonJS when the function is defined inline:
You can run these examples in your cloned Semgrep testing repository by going to docs/taint_tracking/imports and running the following command:
Type inference and class inheritance
Class inheritance
This section compares the possible findings of a scan across multiple files using Semgrep CE and Semgrep. The file app.java includes two check functions that throw exceptions. This example looks for methods that throw a particular exception, ExampleException.
Semgrep CE matches the method that throws ExampleException but not the one that throws BadRequest. Check the other files in the docs/class_inheritance directory. In the context of all files, this match does not capture the whole picture, because BadRequest extends ExampleException:
File example_exception.java:
File bad_request.java:
When searching for places where ExampleException is thrown, it is also useful to find BadRequest, because BadRequest is a child of ExampleException. Unlike Semgrep CE, Semgrep can find BadRequest. Since Semgrep uses information from all the files in the directory it scans, it detects that BadRequest extends ExampleException and finds both thrown exceptions.
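The relationship described above can be sketched in a single Java file. This is an assumption-laden illustration, not the contents of app.java, example_exception.java, or bad_request.java: the class InheritanceSketch, the method names, and the messages are hypothetical, and only the names ExampleException and BadRequest come from the text.

```java
// Hypothetical stand-ins for the classes named in the text. In the testing
// repository they are described as living in separate files.
class ExampleException extends Exception {
    ExampleException(String message) {
        super(message);
    }
}

// BadRequest is a child of ExampleException.
class BadRequest extends ExampleException {
    BadRequest(String message) {
        super(message);
    }
}

class InheritanceSketch {
    // Throws the parent type directly.
    static void checkParent(int value) throws ExampleException {
        if (value < 0) {
            throw new ExampleException("negative value");
        }
    }

    // Throws the child type. Read in isolation, nothing in this method says
    // that BadRequest is also an ExampleException.
    static void checkChild(String body) throws BadRequest {
        if (body.isEmpty()) {
            throw new BadRequest("empty body");
        }
    }

    // Returns true when checkChild raised ExampleException or a subclass.
    static boolean raisesExampleException(String body) {
        try {
            checkChild(body);
            return false;
        } catch (ExampleException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(raisesExampleException(""));
    }
}
```

The catch clause shows why the hierarchy matters: at run time a BadRequest is handled as an ExampleException, so a search for thrown ExampleException values that ignores the subclass misses checkChild.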
If you are following along with the cloned Semgrep testing repository, in the docs/class_inheritance directory, try the following commands to test the difference:
- Run Semgrep CE:
- Run Semgrep:
Using class inheritance with typed metavariables
Semgrep uses cross-file class inheritance information when matching typed metavariables. Continuing the example from the previous section, see the following example file, which has defined some exceptions and includes their logging:
The rule looks for instances of ExampleException being logged. Unlike Semgrep, Semgrep CE is not able to find instances of BadRequest being logged. Allowing typed metavariables to access information from the entire program enables users to query any variable for its type and use that information in conjunction with the rest of the code, resulting in more accurate findings.
NOTE: For a more realistic example where typed metavariables are used, see the following rule written by the Semgrep community to find code vulnerable to the log4j vulnerability.
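To make the logging scenario concrete, here is a minimal Java sketch of the kind of code a type-constrained pattern would be run against. It is an illustration under assumptions, not the example file from the testing repository: the log helper, the logged list, the class TypedLoggingSketch, and the messages are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the exception classes named in the text.
class ExampleException extends Exception {
    ExampleException(String message) {
        super(message);
    }
}

class BadRequest extends ExampleException {
    BadRequest(String message) {
        super(message);
    }
}

class TypedLoggingSketch {
    // Collects log lines so the behavior can be checked without a logger.
    static final List<String> logged = new ArrayList<>();

    // Hypothetical logging helper standing in for a real logger call.
    static void log(Exception e) {
        logged.add(e.getClass().getSimpleName() + ": " + e.getMessage());
    }

    static void run() {
        // Declared type is ExampleException, so the type is visible at the
        // declaration itself.
        ExampleException parent = new ExampleException("generic failure");
        log(parent);

        // Declared type is BadRequest. Relating it to ExampleException
        // requires the class hierarchy, which may be declared elsewhere.
        BadRequest child = new BadRequest("malformed request");
        log(child);
    }

    public static void main(String[] args) {
        run();
        logged.forEach(System.out::println);
    }
}
```

Both log calls receive a value that is an ExampleException at run time, but only the first one says so in its declared type.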
You can try this example in your cloned Semgrep testing repository by going to docs/class_inheritance_with_typed_metavariables and entering the following command:
Constant propagation
Finding dangerous calls
Constant propagation provides a syntax for eliminating false positives in Semgrep rules. Even if a variable is set to a constant before being used in a function call several lines below, Semgrep knows that it must have that value and matches the function call. For example, this rule looks for non-constant values passed to the dangerous function:
Java
user_input or EMPLOYEE_TABLE_NAME.
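The distinction between a constant and a non-constant argument can be sketched in Java as follows. This is a hypothetical illustration rather than the Playground example: the class ConstantCallsSketch, the executed list, and the query strings are assumptions, while dangerous, user_input, and EMPLOYEE_TABLE_NAME are the names used in the text.

```java
import java.util.ArrayList;
import java.util.List;

class ConstantCallsSketch {
    // A constant defined in the same file as the calls that use it.
    static final String EMPLOYEE_TABLE_NAME = "Employees";

    // Records every query passed to dangerous.
    static final List<String> executed = new ArrayList<>();

    static void dangerous(String query) {
        executed.add(query);
    }

    static void run(String user_input) {
        // Non-constant argument: the value is only known at run time, which
        // is the kind of call a rule for non-constant values looks for.
        dangerous("Select * from " + user_input);

        // Constant argument: both operands are compile-time constants, so
        // the whole string is fixed before the program runs.
        dangerous("Select * from " + EMPLOYEE_TABLE_NAME);
    }

    public static void main(String[] args) {
        run(args.length > 0 ? args[0] : "Orders");
        executed.forEach(System.out::println);
    }
}
```

Telling the two calls apart requires knowing the value behind EMPLOYEE_TABLE_NAME, which is what constant propagation supplies.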
Now consider a slightly more complicated example to illustrate what Semgrep can do. Suppose EMPLOYEE_TABLE_NAME is imported from a global constants file with the following content:
Global constants file:
To test this in the cloned Semgrep testing repository, go to docs/constant_propagation_dangerous_calls and run the following command:
JavaScript and TypeScript
user_input or EMPLOYEE_TABLE_NAME.
Now consider a slightly more complicated example to illustrate what Semgrep can do. Suppose EMPLOYEE_TABLE_NAME is imported from a global constants file with the following content:
Global constants file:
To test this in the cloned Semgrep testing repository, go to docs/constant_propagation_dangerous_calls and run the following command:
Propagating values
In the previous example, it only mattered whether the string was constant or not, so the example used "...", but constant propagation also propagates the constant value. To illustrate this, the rule from the previous section is changed to search for calls to dangerous("Employees");.
Java
Semgrep matches these calls to dangerous, since they all select from the Employees table, though each one obtains the table name differently.
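A minimal Java sketch of calls that all pass the value "Employees" while obtaining it in different ways might look like this. It is not the example from the testing repository: the classes TableNames and PropagatingValuesSketch, the field EMPLOYEES, and the three specific ways of obtaining the name are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical constants holder, standing in for a separate constants file.
class TableNames {
    static final String EMPLOYEES = "Employees";
}

class PropagatingValuesSketch {
    // Records every table name passed to dangerous.
    static final List<String> executed = new ArrayList<>();

    static void dangerous(String table) {
        executed.add(table);
    }

    static void run() {
        // 1. The literal itself.
        dangerous("Employees");

        // 2. A local variable assigned the constant a few lines earlier.
        String table = "Employees";
        dangerous(table);

        // 3. A constant defined in another class.
        dangerous(TableNames.EMPLOYEES);
    }

    public static void main(String[] args) {
        run();
        executed.forEach(System.out::println);
    }
}
```

Only the first call contains the literal in its source text; the other two become equivalent to it once the constant value is propagated to the call site.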
To test this in the cloned Semgrep testing repository, go to docs/constant_propagation_propagating_values and run the following command:
JavaScript and TypeScript
Semgrep matches these calls to dangerous, since they all select from the Employees table, though each one obtains the table name differently.
To test this in the cloned Semgrep testing repository, go to docs/constant_propagation_propagating_values and run the following command: