Additionally, it provides several simple rule examples to illustrate the concepts and how you can make use of these Semgrep features when writing your own rules.
Language features that prevent injection through Boolean and integer types
Strong typing in Java, combined with its compile-time and runtime checks, reduces the likelihood that an integer or Boolean input can be exploited to perform injection-style attacks. Semgrep Pro reduces false positives (FPs) by taking these type guarantees into account. Semgrep Community Edition (CE) matches based on patterns alone, which can result in FPs; only Semgrep Pro can detect Boolean and integer values and mark them as untainted, or safe, which eliminates these FPs.
Example: int-bool-untainted
The following demo rule and code sample detect tainted data in sink().
- This example has two true positives: line 22 and line 28.
- Semgrep Pro is able to detect that lines 24 and 30 are false positives. Semgrep CE can’t catch that distinction.
- Line 24 is a false positive because the data in the sink is an element of an integer list.
- Line 30 is a false positive because the data in the sink is an element in a set of Boolean values.
- The Semgrep rule uses the fields taint_assume_safe_booleans and taint_assume_safe_numbers to tell the engine that these types are safe and not tainted.
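The situation described above can be sketched in Java. This is a minimal illustration, not the Playground sample: the class IntBoolTaintSketch and the methods source(), sink(), and parseOrZero() are hypothetical stand-ins introduced here. It shows why a value that has passed through an int or boolean type can no longer carry an injection payload.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Playground sample. All names are illustrative.
class IntBoolTaintSketch {
    // Records what reaches the sink so the effect is observable.
    static final List<String> SINK_LOG = new ArrayList<>();

    // Stand-in for a taint source, for example an HTTP parameter.
    static String source() {
        return "42; DROP TABLE users";
    }

    // Stand-in for a sensitive sink such as a query runner.
    static void sink(Object value) {
        SINK_LOG.add(String.valueOf(value));
    }

    // Parses user input as an int; non-numeric input falls back to 0.
    static int parseOrZero(String raw) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    public static void main(String[] args) {
        String raw = source();
        sink(raw);                                   // the raw string reaches the sink

        List<Integer> ids = List.of(parseOrZero(raw));
        sink(ids.get(0));                            // element of an integer list: digits only

        boolean flag = Boolean.parseBoolean(raw);
        sink(flag);                                  // Boolean value: only true or false

        System.out.println(SINK_LOG);
    }
}
```

Only the first call passes attacker-controlled text to the sink. The other two calls can only ever pass a number or a Boolean, which is the property the two fields above ask the engine to assume.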
Semgrep understands the Java standard library and APIs
Java provides a wide array of standard classes and methods across its libraries, offering ready-to-use methods for common tasks. Many of these methods take string inputs and return integer or Boolean values. Because such return values cannot carry an injection payload, they are not considered tainted. Semgrep is able to make that distinction, preventing this type of false positive.
Example: sqli-demo-bool_doesnt_taint
This demo rule detects SQL injection through a UserInputGenerator class. The class’s unsanitized user input is passed to SQLQueryRunner.
- This example has two true positives: line 11 and line 20.
- Semgrep Pro is able to detect that lines 14 and 17 are false positives. Semgrep CE can’t catch that distinction.
  - Lines 14 and 17 are false positives because input.endsWith("something") and input.indexOf('u') return a Boolean and an integer, respectively. Semgrep Pro is able to understand the endsWith and indexOf Java methods.
- The Semgrep rule uses the fields taint_assume_safe_booleans and taint_assume_safe_numbers to tell the engine that these types are safe and not tainted.
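The return-type argument can be seen in a small Java sketch. This is not the Playground sample: the class StdlibReturnTypeSketch and the method buildQuery() are hypothetical, and the input value is an assumed attacker-controlled string. Only String.endsWith and String.indexOf are real standard library methods.

```java
// Hypothetical sketch, not the Playground sample. All names are illustrative.
class StdlibReturnTypeSketch {
    // Builds a query by concatenation, the way a vulnerable call site would.
    static String buildQuery(Object fragment) {
        return "SELECT * FROM t WHERE c = '" + fragment + "'";
    }

    public static void main(String[] args) {
        String input = "x' OR '1'='1";   // assumed attacker-controlled value

        // Tainted: the string itself is concatenated into the query.
        System.out.println(buildQuery(input));

        // Not tainted: endsWith returns a boolean, so only true or false can appear.
        System.out.println(buildQuery(input.endsWith("something")));

        // Not tainted: indexOf returns an int, so only a number can appear.
        System.out.println(buildQuery(input.indexOf('u')));
    }
}
```

Whatever the attacker supplies, the second and third queries can only contain a Boolean or an integer, so flagging them would be a false positive.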
Semgrep targets code in a parent class and its subclasses
Semgrep supports class inheritance in Java. You can use Semgrep to search across all subclasses. This specificity means that rules can better target your codebase, increasing true positive rates. This is achieved through the metavariable-type field, which can accept the name of any user-defined class.
The metavariable-type field is available in Semgrep CE. However, classes in Java are frequently defined across files (interfile), which is beyond the scope of Semgrep CE’s analysis. Use Semgrep Pro to perform cross-file analysis to ensure that Semgrep can detect all class and subclass definitions.
Example: detect-pattern-in-subclass
This demo rule detects patterns in instances of the user-defined parent class Foo and its subclasses.
- This example has two true positives: line 10 and line 24.
- The patterns array initially defines a pattern: $CLASS.x.
  - Line 17, baz.x, fulfills this pattern.
  - However, the metavariable-type specifies a type of Foo.
  - This specification narrows the match to line 10 because Bar is a subclass of Foo, and line 25, which is an instance of the Foo object itself.
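The class relationships behind this example can be sketched as follows. This is an illustration rather than the Playground sample: the bodies of Foo, Bar, and Baz and the helper isFooTyped() are assumptions. The instanceof check is only a runtime analogy for the type constraint that metavariable-type expresses; Semgrep itself resolves types statically.

```java
// Hypothetical sketch, not the Playground sample. Class bodies are assumed.
class Foo {
    int x = 1;
}

// Subclass of Foo: inherits the field x.
class Bar extends Foo {
}

// Unrelated class that happens to have a field named x.
class Baz {
    int x = 2;
}

class SubclassMatchSketch {
    // Runtime analogy for the type constraint: true for Foo and its subclasses only.
    static boolean isFooTyped(Object receiver) {
        return receiver instanceof Foo;
    }

    public static void main(String[] args) {
        Foo foo = new Foo();
        Bar bar = new Bar();
        Baz baz = new Baz();

        // All three expressions have the shape $CLASS.x.
        System.out.println("foo.x, Foo-typed: " + isFooTyped(foo) + ", value " + foo.x);
        System.out.println("bar.x, Foo-typed: " + isFooTyped(bar) + ", value " + bar.x);
        System.out.println("baz.x, Foo-typed: " + isFooTyped(baz) + ", value " + baz.x);
    }
}
```

All three field accesses fit the syntactic pattern, but only the receivers typed as Foo or a subclass of Foo pass the type constraint, which is why baz.x drops out of the results.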
Semgrep supports field and index sensitivity
Field sensitivity means that Semgrep can track taint for each field of an object independently. Given an object C with properties C.x and C.y, if C.x is tainted, then Semgrep does not automatically mark C.y as tainted.
Similarly, index sensitivity means that Semgrep can track taint for each element of an array independently.
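As a rough illustration of index sensitivity, consider the Java sketch below. Every name in it (IndexSensitivitySketch, source(), sink(), buildParts()) is hypothetical, and the comments describe what an index-sensitive analysis would be expected to conclude rather than verified Semgrep output.

```java
// Hypothetical sketch of index sensitivity. All names are illustrative.
class IndexSensitivitySketch {
    // Stand-in for a taint source such as a request parameter.
    static String source() {
        return "x' OR '1'='1";
    }

    // Stand-in for a sink that builds SQL by concatenation.
    static String sink(String value) {
        return "SELECT * FROM t WHERE c = '" + value + "'";
    }

    // Only element 0 receives data from the source.
    static String[] buildParts() {
        String[] parts = new String[2];
        parts[0] = source();     // tainted element
        parts[1] = "constant";   // untainted element
        return parts;
    }

    public static void main(String[] args) {
        String[] parts = buildParts();
        System.out.println(sink(parts[0]));  // tainted data reaches the sink
        System.out.println(sink(parts[1]));  // only a constant reaches the sink
    }
}
```

An analysis that treats the whole array as tainted would report both sink calls. Tracking each index separately leaves only the call that uses parts[0].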
Example: unsafe-sql-concatenation-in-method-taint-field-sensitivity
This demo rule detects that C.x is tainted by way of the injection variable. It is able to differentiate C.y as untainted.
- This example has one true positive on line 24 and one true negative on line 27.
- Line 17 of the rule tells Semgrep which pattern to match.
  - This pattern matches private void LoggerTruePositives(String injection), specifically the injection variable in the sample code.
- The value of the injection variable is passed to C.x. Thus, C.x is tainted, but C.y is not.
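The field-level distinction can be sketched in Java as follows. This is not the Playground sample: the class FieldSensitivitySketch, the nested class C, and the methods populate() and buildQuery() are hypothetical, with injection standing in for the method parameter named above.

```java
// Hypothetical sketch, not the Playground sample. All names are illustrative.
class FieldSensitivitySketch {
    // Simple holder with two independent fields.
    static class C {
        String x;
        String y;
    }

    // Stand-in for a sink that builds SQL by concatenation.
    static String buildQuery(String value) {
        return "SELECT * FROM logs WHERE msg = '" + value + "'";
    }

    // Copies the untrusted parameter into x only; y gets a constant.
    static C populate(String injection) {
        C c = new C();
        c.x = injection;   // tainted field
        c.y = "static";    // untainted field
        return c;
    }

    public static void main(String[] args) {
        C c = populate("x' OR '1'='1");
        System.out.println(buildQuery(c.x));  // tainted field reaches the sink
        System.out.println(buildQuery(c.y));  // constant field reaches the sink
    }
}
```

A field-insensitive analysis would treat the whole object as tainted once c.x is assigned and would flag both queries. Tracking each field separately keeps the finding on c.x and leaves c.y alone.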