Semgrep supports taint analysis, also known as taint tracking, through taint rules. Taint rules are specified by the inclusion of mode: taint in your rule.
Taint analysis is a dataflow analysis that tracks the flow of untrusted, or tainted, data throughout the body of a function or method. Tainted data originates from tainted sources. If tainted data is not sanitized, that is, transformed or checked so that it becomes safe, taint analysis reports a finding whenever that data reaches a vulnerable function, called a sink. Tainted data flows from sources to sinks through propagators, such as assignments and function calls.
Create a rule
To create a taint tracking rule, include mode: taint in the rule’s YAML definition file. This enables the following operators:
| Operator | Required? |
|---|---|
| pattern-sources | Yes |
| pattern-propagators | No |
| pattern-sanitizers | No |
| pattern-sinks | Yes |
Each of these operators behaves like a pattern-either operator: it takes a list of patterns that specify what is considered a source, a propagator, a sanitizer, or a sink.
You can use any pattern operator and you have the same expressive power as you would with a mode: search rule.
Sample rule and pattern matching
The sample rule for this section specifies the source pattern get_user_input(...), which is the source of tainted data. You can think of what’s happening as Semgrep running the pattern get_user_input(...) on your code, identifying all instances where get_user_input is called, and labeling them as tainted.
The rule specifies the sanitizer sanitize_input(...), so any expression that matches that pattern is considered sanitized. In particular, the expression sanitize_input(data) is labeled as sanitized. Even if data is tainted, as it occurs inside a piece of sanitized code, it does not produce any findings.
Finally, the rule specifies that anything matching either html_output(...) or eval(...) should be regarded as a sink. There are two calls to html_output(data) that are both labeled as sinks. The first one in route1 is not reported because data is sanitized before reaching the sink, whereas the second one in route2 is reported because the data that reaches the sink is still tainted.
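A minimal sketch of a rule and target matching this walkthrough is shown here. The rule id, message, and severity are placeholders chosen for illustration, and the Python target in the comments is an assumed reconstruction based on the route1 and route2 description, not necessarily the exact sample from the original page.

```yaml
rules:
  - id: taint-example            # hypothetical rule id
    languages:
      - python
    message: Tainted data reaches a dangerous sink   # placeholder message
    severity: WARNING
    mode: taint
    pattern-sources:
      - pattern: get_user_input(...)
    pattern-sanitizers:
      - pattern: sanitize_input(...)
    pattern-sinks:
      - pattern: html_output(...)
      - pattern: eval(...)

# Assumed Python target, for illustration only:
#
#   def route1():
#       data = get_user_input()
#       data = sanitize_input(data)
#       return html_output(data)   # not reported: data was sanitized
#
#   def route2():
#       data = get_user_input()
#       return html_output(data)   # reported: data is still tainted
```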
Find more examples of taint rules in the Semgrep Registry, including express-sandbox-code-injection.
Sources
You can specify a taint source using a pattern. Like a search-mode rule, you can start this pattern with one of the following keys: pattern, patterns, pattern-either, pattern-regex.
A source also accepts the following options:
| Option | Type | Default | Description |
|---|---|---|---|
| exact | {false, true} | false | See Exact sources. |
| by-side-effect | {false, true, only} | false | See Taint sources by side-effect. |
| control (Pro) 🧪 | {false, true} | false | See Track control sources. |
Exact sources
Given a source specification that consists of the pattern source(...) and a piece of code such as source(sink(x)), the call sink(x) is reported as a tainted sink.
The reason is that the pattern source(...) matches all of source(sink(x)), and that makes Semgrep consider every subexpression in that piece of code as being a source. In particular, x is a source, and it is being passed into sink.
If the source is instead specified with exact: true, only the matched expression as a whole is a source, so sink(x) inside source(sink(x)) isn’t reported as a tainted sink, unless x is tainted in another way.
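The two source specifications discussed in this section could look roughly like the following fragments. They are shown as two separate YAML documents so that the key is not duplicated; each is only the pattern-sources part of a taint rule, not a complete rule.

```yaml
# Default (non-exact) source: every subexpression of a match is a source,
# so x in source(sink(x)) is tainted and sink(x) is reported.
pattern-sources:
  - pattern: source(...)
---
# Exact source: only the matched expression itself is a source,
# so sink(x) inside source(sink(x)) is not reported.
pattern-sources:
  - pattern: source(...)
    exact: true
```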
If one of your rules relies on non-exact matching of sources, make this fact explicit with exact: false, even if it is the current default, so that your rule doesn’t break if the default changes.
Sanitizers
You can specify a taint sanitizer using a pattern. Like a search-mode rule, you can start the pattern with any of the following keys: pattern, patterns, pattern-either, pattern-regex.
A sanitizer also accepts the following options:
| Option | Type | Default | Description |
|---|---|---|---|
| exact | {false, true} | false | See Exact sanitizers. |
| by-side-effect | {false, true, only} | false | See Taint sanitizers by side-effect. |
Exact sanitizers
Given a sanitizer specification that consists of the pattern sanitize(...) and a piece of code such as sanitize(sink("taint")), Semgrep doesn’t report the call sink("taint").
The reason is that the pattern sanitize(...) matches all of sanitize(sink("taint")), and that makes Semgrep consider every subexpression in that piece of code as sanitized. In particular, "taint" is considered sanitized.
If the sanitizer is instead specified with exact: true, only the matched expression as a whole is sanitized, so sink("taint") inside sanitize(sink("taint")) is reported as a tainted sink.
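As a rough illustration, the two sanitizer specifications discussed in this section might be written as the following fragments. They are shown as two separate YAML documents, each being only the pattern-sanitizers part of a taint rule. The example assumes a rule in which the string "taint" is a source and sink(...) is a sink.

```yaml
# Default (non-exact) sanitizer: every subexpression of a match is sanitized,
# so "taint" in sanitize(sink("taint")) is sanitized and nothing is reported.
pattern-sanitizers:
  - pattern: sanitize(...)
---
# Exact sanitizer: only the matched expression itself is sanitized,
# so sink("taint") inside sanitize(sink("taint")) is reported.
pattern-sanitizers:
  - pattern: sanitize(...)
    exact: true
```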
NOTE: If any of your rules rely on non-exact matches, make this explicit by setting exact: false in your rule definition, even if this is the default setting. This ensures that your rule doesn’t break if the default changes.
Sinks
You can specify a taint sink using a pattern. Like a search-mode rule, you can start this pattern with one of the following keys: pattern, patterns, pattern-either, pattern-regex.
A sink also accepts the following options:
| Option | Type | Default | Description |
|---|---|---|---|
| exact | {false, true} | true | See Non-exact sinks. |
| at-exit (Pro) 🧪 | {false, true} | false | See Restrict taint to at-exit sinks. |
Non-exact sinks
Given a sink specification that consists of the pattern sink(...) and a piece of code such as sink("foo" if tainted else "bar"), Semgrep doesn’t report the code as a tainted sink.
The reason is that sinks are exact by default, so Semgrep treats the argument passed to sink as the sink itself. In this case, the argument is "foo" if tainted else "bar", which evaluates to either "foo" or "bar". Since neither value is tainted, Semgrep does not flag the call.
If the sink is instead specified with exact: false, every subexpression of the match is considered a sink, so tainted inside sink("foo" if tainted else "bar") is now reported as a tainted sink.
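The exact and non-exact sink specifications discussed in this section could be sketched as the following fragments. They are shown as two separate YAML documents, each being only the pattern-sinks part of a taint rule. The example assumes a rule in which the variable tainted matches a source.

```yaml
# Default (exact) sink: the argument of sink(...) as a whole is the sink.
# "foo" if tainted else "bar" is never tainted itself, so nothing is reported.
pattern-sinks:
  - pattern: sink(...)
---
# Non-exact sink: subexpressions of the match are also sinks,
# so tainted inside sink("foo" if tainted else "bar") is reported.
pattern-sinks:
  - pattern: sink(...)
    exact: false
```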
Findings
Taint findings are accompanied by a taint trace that explains how the taint flows from source to sink.
Deduplication of findings
Semgrep tracks all possible ways that taint can reach a sink, but it reports only one taint trace, not all the possible options. For example, suppose that x and y are both assigned user_input and that both reach the same call to sink.
Even though sink can be tainted via x or via y, the trace shows only one of these possibilities. If you replace x = user_input with x = "safe", then Semgrep reports the taint trace via y.
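The deduplication behavior can be pictured with a small rule and target like the ones sketched here. The rule id and message are placeholders, and the Python target in the comments is an assumption; in particular, the exact shape of the sink call in the original example is not known.

```yaml
rules:
  - id: dedup-example            # hypothetical rule id
    languages:
      - python
    message: user_input reaches sink   # placeholder message
    severity: WARNING
    mode: taint
    pattern-sources:
      - pattern: user_input
    pattern-sinks:
      - pattern: sink(...)

# Assumed Python target, for illustration only:
#
#   def f():
#       x = user_input
#       y = user_input
#       sink(x, y)   # one finding with a single taint trace,
#                    # even though taint arrives via both x and y
```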
Propagators 🧪
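NOTE: Custom taint propagators are a Semgrep Pro feature.
You can specify a taint propagator using a pattern. Like a search-mode rule, you can start this pattern with one of the following keys: pattern, patterns, pattern-either, pattern-regex.
A propagator must also specify the origin (from) and the destination (to) of the taint to be propagated.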
| Field | Type | Description |
|---|---|---|
| from | metavariable | Origin of propagation |
| to | metavariable | Destination of propagation |

A propagator also accepts the following options:
| Option | Type | Default | Description |
|---|---|---|---|
| by-side-effect | {false, true} | true | See Propagate without side-effect. |
For example, a propagator can state that if taint goes into the second argument of strcpy, its first argument gets the same taint.
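A propagator for the strcpy case might be written roughly as follows. The metavariable names $DST and $SRC are chosen here for illustration, and the fragment is only the pattern-propagators part of a taint rule.

```yaml
# Fragment of a taint rule, not a complete rule.
pattern-propagators:
  - pattern: strcpy($DST, $SRC)
    from: $SRC   # origin: the second argument of strcpy
    to: $DST     # destination: the first argument receives the same taint
```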
INFO: Taint propagators only work intraprocedurally, that is, within a function or method. You cannot use taint propagators to propagate taint across different functions/methods. For that, use interprocedural analysis.
Interprocedural analysis 🧪
INFO: Interprocedural taint analysis is a Semgrep Pro feature.
In the example for this section, user_input is passed to foo as input and, from there, flows to the sink(z) statement through a call chain involving three functions. Semgrep can track this flow and report the sink as tainted. Semgrep also provides an interprocedural taint trace that explains how exactly user_input reaches the sink(z) statement. To see this trace in the Semgrep Playground, find the Matches panel and click dataflow.
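A flow of this kind can be sketched with the rule and target here. The rule id and message are placeholders, and the Python target in the comments, including the function names bar and baz, is an assumed reconstruction of a three-function call chain rather than the original example.

```yaml
rules:
  - id: interprocedural-example  # hypothetical rule id
    languages:
      - python
    message: user_input reaches sink across function calls   # placeholder
    severity: WARNING
    mode: taint
    pattern-sources:
      - pattern: user_input
    pattern-sinks:
      - pattern: sink(...)

# Assumed Python target, for illustration only:
#
#   def baz(z):
#       sink(z)        # reported only with interprocedural analysis
#
#   def bar(y):
#       baz(y)
#
#   def foo(x):
#       bar(x)
#
#   foo(user_input)
```

Without interprocedural analysis, the taint does not cross the calls to foo, bar, and baz, so a finding at sink(z) would not be expected.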
When you invoke Semgrep with the CLI option --pro-intrafile, Semgrep performs interprocedural (across functions), intra-file (within one file) analysis. In other words, Semgrep tracks taint across functions, but it does not cross file boundaries. This is supported for essentially every language, and performance is very close to that of intraprocedural taint analysis.
When you invoke Semgrep with the CLI option --pro, Semgrep performs interprocedural (across functions) as well as inter-file (across files) analysis. Inter-file analysis is only supported for a subset of languages. For a rule to run interfile, it must also set interfile: true.
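As a sketch, a taint rule that opts in to inter-file analysis could look like this. Placing interfile: true under the rule's options key is an assumption based on current Semgrep rule conventions; the rule id, message, and patterns are placeholders.

```yaml
rules:
  - id: interfile-example        # hypothetical rule id
    languages:
      - python
    message: user_input reaches sink across files   # placeholder message
    severity: WARNING
    mode: taint
    options:
      interfile: true            # assumed location of the interfile setting
    pattern-sources:
      - pattern: user_input
    pattern-sinks:
      - pattern: sink(...)
```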