This document describes the YAML rule syntax of Semgrep.
Schema
Required
All required fields must be present at the top level of a rule, immediately under the rules key.
| Field | Type | Description |
|---|---|---|
| id | string | Unique, descriptive identifier, for example: no-unused-variable |
| message | string | Message that includes why Semgrep matched this pattern and how to remediate it. See also Rule messages. |
| severity | string | Severity can be LOW, MEDIUM, HIGH, or CRITICAL. It indicates the criticality of issues detected by a rule. Note: Semgrep Supply Chain uses CVE assignments for severity, while the rule author sets severity for Code and Secrets. The older levels ERROR, WARNING, and INFO match HIGH, MEDIUM, and LOW. Severity values remain backwards compatible. |
| languages | array | See language extensions and tags. |
| pattern* | string | Find code matching this expression |
| patterns* | array | Logical AND of multiple patterns |
| pattern-either* | array | Logical OR of multiple patterns |
| pattern-regex* | string | Find code matching this PCRE2-compatible pattern in multiline mode |
INFO: Only one of the following keys is required: pattern, patterns, pattern-either, pattern-regex.
Language extensions and languages key values
The following table includes languages supported by Semgrep, accepted file extensions for test files that accompany the rules, and valid values that Semgrep rules require in the languages key.
| Language | Extensions | languages key values |
|---|---|---|
| Apex (only in Semgrep Pro Engine) | .cls | apex |
| Bash | .bash, .sh | bash, sh |
| C | .c, .h | c |
| Cairo | .cairo | cairo |
| Circom | .circom | circom |
| Clojure | .clj, .cljs, .cljc, .edn | clojure |
| C++ | .cc, .cpp, .cxx, .c++, .pcc, .tpp, .C, .h, .hh, .hpp, .hxx, .inl, .ipp | cpp, c++ |
| C# | .cs | csharp, c# |
| Dart | .dart | dart |
| Dockerfile | .dockerfile, .Dockerfile, dockerfile, Dockerfile | dockerfile, docker |
| Elixir (only in Semgrep Pro Engine) | .ex, .exs | ex, elixir |
| Generic | .generic | generic |
| Go | .go | go, golang |
| Gosu (only in Semgrep Pro Engine) | .gs | gosu |
| Hack | .hack, .hck, .hh | hack |
| HTML | .htm, .html | html |
| Java | .java | java |
| JavaScript | .js, .jsx, .cjs, .mjs | js, javascript |
| JSON | .json, .ipynb | json |
| Jsonnet | .jsonnet, .libsonnet | jsonnet |
| JSX | .js, .jsx | js, javascript |
| Julia | .jl | julia |
| Kotlin | .kt, .kts, .ktm | kt, kotlin |
| Lisp | .lisp, .cl, .el | lisp |
| Lua | .lua | lua |
| Move on SUI | .move | move_on_sui |
| Move on Aptos | .move | move_on_aptos |
| OCaml | .ml, .mli | ocaml |
| PHP | .php, .tpl, .phtml | php |
| Prometheus Query Language | .promql | promql |
| Protocol Buffers | .proto | proto, protobuf, proto3 |
| Python | .py, .pyi | python, python2, python3, py |
| QL | .ql, .qll | ql |
| R | .r, .R | r |
| Ruby | .rb | ruby |
| Rust | .rs | rust |
| Scala | .scala | scala |
| Scheme | .scm, .ss | scheme |
| Solidity | .sol | solidity, sol |
| Swift | .swift | swift |
| Terraform | .tf, .hcl, .tfvars | tf, hcl, terraform |
| TypeScript | .ts, .tsx | ts, typescript |
| Vue | .vue | vue |
| XML | .xml, .plist | xml |
| YAML | .yml, .yaml | yaml |
INFO: To see the maturity level of each supported language, see the following references:
Optional
| Field | Type | Description |
|---|---|---|
| options | object | Options object to turn on or turn off matching features |
| fix | object | Simple search-and-replace capability |
| metadata | object | Arbitrary user-provided data; attach data to rules without affecting Semgrep behavior |
| min-version | string | Minimum Semgrep version compatible with the rule |
| max-version | string | Maximum Semgrep version compatible with the rule |
| paths | object | Paths to include or exclude when running the rule |
The following optional field must reside under a patterns or pattern-either field.
| Field | Type | Description |
|---|---|---|
| pattern-inside | string | Keep findings that lie inside this pattern |
The following optional fields must reside under a patterns field.
| Field | Type | Description |
|---|---|---|
| metavariable-regex | map | Search metavariables for Python re compatible expressions; regex matching is left anchored |
| metavariable-pattern | map | Match metavariables with a pattern formula |
| metavariable-comparison | map | Compare metavariables against basic Python expressions |
| metavariable-name | map | Match metavariables against constraints on what they name |
| pattern-not | string | Logical NOT - remove findings matching this expression |
| pattern-not-inside | string | Keep findings that do not lie inside this pattern |
| pattern-not-regex | string | Filter results using a PCRE2-compatible pattern in multiline mode |
Operators
pattern
The pattern operator looks for code matching its expression. This can be basic expressions like $X == $X or unwanted function calls like hashlib.md5(...).
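As a minimal sketch, a rule built around the hashlib.md5(...) expression mentioned above might look like the following. The rule id and message are illustrative assumptions, not names the source prescribes.

```yaml
rules:
  - id: md5-usage
    languages:
      - python
    severity: MEDIUM
    message: Found hashlib.md5 usage; prefer a stronger hash function.
    # The ellipsis matches any arguments passed to hashlib.md5.
    pattern: hashlib.md5(...)
```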
patterns
The patterns operator performs a logical AND operation on one or more child patterns. This is useful for chaining multiple patterns together where all patterns must be true.
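The following sketch chains two child patterns under patterns, so that a finding is reported only when both hold. The function db_query and its verify argument are hypothetical names chosen for illustration.

```yaml
rules:
  - id: unverified-db-query
    languages:
      - python
    severity: HIGH
    message: Found unverified db query
    patterns:
      # Both conditions must hold for a finding to be reported.
      - pattern: db_query(...)
      - pattern-not: db_query(..., verify=True, ...)
```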
patterns operator evaluation strategy
The order in which the child patterns are declared in a patterns operator does not affect the final result. A patterns operator is always evaluated in the same way:
1. Semgrep evaluates all positive patterns, including pattern-insides, patterns, pattern-regexes, and pattern-eithers. Each range matched by one of these patterns is intersected with the ranges matched by the other operators. The result is a set of positive ranges. The positive ranges carry metavariable bindings. For example, in one range, $X can be bound to the function call foo(), and in another range $X can be bound to the expression a + b.
2. Semgrep evaluates all negative patterns, including pattern-not-insides, pattern-nots, and pattern-not-regexes. This provides a set of negative ranges, which are used to filter the positive ranges. The result is a subset of the positive ranges computed in the previous step.
3. Semgrep evaluates all conditionals, including metavariable-regexes, metavariable-patterns, and metavariable-comparisons. These conditional operators can only examine the metavariables that were bound in the positive ranges in step 1 and that passed the negative-pattern filter in step 2. Note that metavariables bound by negative patterns are not available here.
4. Semgrep applies all focus-metavariables by computing the intersection of each positive range with the range of the metavariable on which you want to focus. Again, the only metavariables available to focus on are those bound by positive patterns.
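To make the four evaluation stages concrete, here is a hedged sketch of a single rule that contains one operator from each stage. The rule id and message are assumptions; the comments map each child to the stage in which it would be evaluated, regardless of where it is declared.

```yaml
rules:
  - id: weak-hash-evaluation-order
    languages:
      - python
    severity: MEDIUM
    message: Weak hash function $FUNC used
    patterns:
      # Stage 1 (positive): produces the ranges and binds $FUNC.
      - pattern: hashlib.$FUNC(...)
      # Stage 2 (negative): filters the positive ranges.
      - pattern-not: hashlib.$FUNC(..., usedforsecurity=False, ...)
      # Stage 3 (conditional): inspects $FUNC as bound by the positive pattern.
      - metavariable-regex:
          metavariable: $FUNC
          regex: '^(md5|sha1)$'
      # Stage 4 (focus): narrows each remaining range to the code bound to $FUNC.
      - focus-metavariable: $FUNC
```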
pattern-either
The pattern-either operator performs a logical OR operation on one or more child patterns. This is useful for chaining multiple patterns together where any may be true.
For example, a pattern-either rule can look for usage of the Python standard library functions hashlib.md5 or hashlib.sha1. Depending on their usage, these hashing functions are considered insecure.
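A minimal sketch of such a rule follows. The rule id and message are illustrative assumptions.

```yaml
rules:
  - id: insecure-crypto-usage
    languages:
      - python
    severity: HIGH
    message: Found insecure hash function usage
    # A finding is reported when either child pattern matches.
    pattern-either:
      - pattern: hashlib.sha1(...)
      - pattern: hashlib.md5(...)
```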
pattern-regex
The pattern-regex operator searches files for substrings matching the given Perl-Compatible Regular Expressions 2 (PCRE2) pattern. PCRE2 is a full-featured regular expression (regex) library that is widely compatible with Perl, as well as with the respective regex libraries of Python, JavaScript, Go, Ruby, and Java. This is useful for migrating existing regular expression code search capability to Semgrep. Patterns are compiled in multiline mode. That is, ^ and $ match at the beginning and end of lines, respectively, in addition to the beginning and end of input.
Example: pattern-regex combined with other pattern operators
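The sketch below combines pattern-regex with pattern-inside, so the regex is only applied within code that the structural pattern matched. The rule id, the message, and the choice of boto3.client as the surrounding call are illustrative assumptions.

```yaml
rules:
  - id: boto-client-ip
    languages:
      - python
    severity: HIGH
    message: boto client using IP address
    patterns:
      - pattern-inside: boto3.client(host="...")
      # Single quotes keep the backslashes literal in YAML.
      - pattern-regex: '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
```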
Example: pattern-regex used as a standalone, top-level operator
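As a standalone operator, pattern-regex sits at the top level of the rule in place of pattern or patterns. This is a minimal sketch with an assumed rule id and message.

```yaml
rules:
  - id: legacy-eval-search
    languages:
      - javascript
    severity: HIGH
    message: Insecure code execution
    # Matches the literal text "eval(" anywhere in the file.
    pattern-regex: 'eval\('
```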
INFO: Single (') and double (") quotes behave differently in YAML syntax. Single quotes are typically preferred when using backslashes (\) with pattern-regex.
pattern-not-regex
The pattern-not-regex operator filters results using a PCRE2 regular expression in multiline mode. This is most useful when combined with regular-expression-only rules, providing an easy way to filter findings without having to use negative lookaheads. pattern-not-regex works with regular pattern clauses, too.
The syntax for this operator is the same as pattern-regex.
This operator filters findings that have any overlap with the supplied regular expression. For example, if you use pattern-regex to detect Foo==1.1.1 and it also detects Foo-Bar==3.0.8 and Bar-Foo==3.0.8, you can use pattern-not-regex to filter the unwanted findings.
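The Foo package scenario above can be sketched as follows. The rule id and message are assumptions, and generic is used as the languages value only because the rule is purely regex-based.

```yaml
rules:
  - id: detect-only-foo-package
    languages:
      - generic
    severity: LOW
    message: Found Foo package
    patterns:
      - pattern-regex: 'Foo'
      # Drop findings that overlap Foo-Bar style names.
      - pattern-not-regex: 'Foo-'
      # Drop findings that overlap Bar-Foo style names.
      - pattern-not-regex: '-Foo'
```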
focus-metavariable
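The focus-metavariable operator focuses on, or zooms in on, the code region matched by a single metavariable or a list of metavariables. For example, to find all function arguments annotated with the type bad, you may write a pattern such as def $FUNC(..., $ARG : bad, ...): with ... as the function body. This works, but it matches the entire function definition. To specify that you are only interested in the code matched by a particular metavariable, in this example $ARG, use focus-metavariable.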
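A hedged sketch of a complete rule that reports only the annotated parameter rather than the whole function definition is shown below. The rule id and message are assumptions.

```yaml
rules:
  - id: find-bad-args
    languages:
      - python
    severity: MEDIUM
    message: "The argument $ARG is annotated with the type bad"
    patterns:
      # Matches the whole function definition and binds $ARG.
      - pattern: |
          def $FUNC(..., $ARG : bad, ...):
            ...
      # Narrows the reported range to the code bound to $ARG.
      - focus-metavariable: $ARG
```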
focus-metavariable: $ARG is not the same as pattern: $ARG! Using pattern: $ARG finds all the uses of the parameter x, which is not the desired behavior! (Note that pattern: $ARG does not match the formal parameter declaration, because in this context $ARG only matches expressions.)
focus-metavariable: $X is not a pattern in itself. It does not perform any matching; it only focuses the matching on the code already bound to $X by other patterns. On the other hand, pattern: $X matches $X against your code (and in this context, $X only matches expressions)!
Including multiple focus metavariables using set intersection semantics
Include more focus-metavariable keys with different metavariables under the pattern to match results only for the overlapping region of all the focused code:
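The following sketch lists two focus-metavariable keys in one patterns block. The rule id, the message, and the function name foo are illustrative assumptions; treat the exact matches it produces as something to verify against your own test code.

```yaml
rules:
  - id: intersect-focus-metavariable
    languages:
      - python
    severity: LOW
    message: Only the region common to all focused metavariables is reported
    patterns:
      - pattern-inside: foo($X, ...)
      - focus-metavariable: $X
      - pattern: $Y + ...
      - focus-metavariable: $Y
      - pattern: "1"
```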
INFO: To make a list of multiple focus metavariables using set union semantics that matches the metavariables regardless of their position in code, see the Including multiple focus metavariables using set union semantics documentation.
metavariable-regex
The metavariable-regex operator searches metavariables for a PCRE2 regular expression. This is useful for filtering results based on a metavariable’s value. It requires the metavariable and regex keys and can be combined with other pattern operators.
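Regex matching is left anchored. To allow prefixes, use .* at the beginning of the regex. To match the end of a string, use $. For instance, an expression such as (insecure) matches a metavariable value of insecure1, but the same expression anchored on the right, (insecure$), does not.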
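As a sketch of the required metavariable and regex keys, the rule below filters method names bound to $METHOD. The rule id, the message, and the module name are hypothetical.

```yaml
rules:
  - id: insecure-methods
    languages:
      - python
    severity: HIGH
    message: module using insecure method call
    patterns:
      - pattern: module.$METHOD(...)
      - metavariable-regex:
          metavariable: $METHOD
          # Left anchored: matches insecure, insecure1, insecure_v2, and so on.
          # Writing (insecure$) instead would match only the exact name insecure.
          regex: (insecure)
```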
INFO: Include quotes in your regular expression when using metavariable-regex to search string literals. For more details, see the include-quotes code snippet.
metavariable-pattern
The metavariable-pattern operator matches metavariables with a pattern formula. This is useful for filtering results based on a metavariable’s value. It requires the metavariable key, and precisely one key of pattern, patterns, pattern-either, or pattern-regex. This operator can be nested as well as combined with other operators.
For example, the metavariable-pattern can be used to filter out matches that do not match specific criteria:
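A minimal sketch of this kind of filtering follows: it reports calls whose argument is anything other than a string literal. The rule id, the message, and the function name redirect are assumptions made for illustration.

```yaml
rules:
  - id: redirect-with-non-literal-url
    languages:
      - python
    severity: MEDIUM
    message: redirect() called with a URL that is not a string literal
    patterns:
      - pattern: redirect($URL)
      - metavariable-pattern:
          metavariable: $URL
          patterns:
            # '"..."' is the Semgrep pattern for any string literal.
            - pattern-not: '"..."'
```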
INFO: In this case, it is possible to start a patterns AND operation with a pattern-not, because there is an implicit pattern: ... that matches the content of the metavariable.
metavariable-pattern is also helpful in combination with pattern-either:
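Here is a hedged sketch in which pattern-either lists the acceptable values of a metavariable. The rule id and message are assumptions; hashlib.new is the Python standard library constructor that takes an algorithm name.

```yaml
rules:
  - id: weak-hash-by-name
    languages:
      - python
    severity: MEDIUM
    message: hashlib.new() called with a weak algorithm name
    patterns:
      - pattern: hashlib.new($ALGO, ...)
      - metavariable-pattern:
          metavariable: $ALGO
          # Keep the finding if $ALGO matches any one alternative.
          pattern-either:
            - pattern: '"md5"'
            - pattern: '"sha1"'
```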
INFO: The metavariable should be bound to an expression, a statement, or a list of statements, for this test to be meaningful. A metavariable bound to a list of function arguments, a type, or a pattern always evaluates to false.
metavariable-pattern with nested language
If the metavariable’s content is a string, then it is possible to use metavariable-pattern to match this string as code by specifying the target language via the language key. See the following examples of metavariable-pattern:
Example: Match JavaScript code inside HTML
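The sketch below matches the text between script tags in generic mode, then re-parses that text as JavaScript through the language key. The rule id and message are assumptions.

```yaml
rules:
  - id: console-log-in-html
    languages:
      - generic
    severity: LOW
    message: console.log call found in inline JavaScript
    patterns:
      - pattern: |
          <script ...>$...JS</script>
      - metavariable-pattern:
          metavariable: $...JS
          # Interpret the captured text as JavaScript code.
          language: javascript
          patterns:
            - pattern: console.log(...)
```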
Example: Filter regex matches
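In this sketch, the capture groups of a pattern-regex are exposed as $1 and $2, and metavariable-pattern keeps only matches whose first group contains a given word. The rule id, the message, and the gem line format are illustrative assumptions.

```yaml
rules:
  - id: google-gem-dependency
    languages:
      - generic
    severity: LOW
    message: "Google dependency: $1 $2"
    patterns:
      - pattern-regex: 'gem "(.*)", "(.*)"'
      - metavariable-pattern:
          metavariable: $1
          language: generic
          patterns:
            - pattern: google
```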
metavariable-comparison
The metavariable-comparison operator compares metavariables against a basic Python comparison expression. This is useful for filtering results based on a metavariable’s numeric value.
The metavariable-comparison operator is a mapping that requires the metavariable and comparison keys. It can be combined with other pattern operators.
For example, a rule that combines pattern: set_port($ARG) with comparison: $ARG < 1024 matches code such as set_port(80) or set_port(443), but not set_port(8080).
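The set_port behavior described above could be written as the following sketch. The rule id and message are assumptions.

```yaml
rules:
  - id: superuser-port
    languages:
      - python
    severity: HIGH
    message: module setting superuser port
    patterns:
      - pattern: set_port($ARG)
      - metavariable-comparison:
          metavariable: $ARG
          # Ports below 1024 typically require elevated privileges.
          comparison: $ARG < 1024
```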
Comparison expressions support simple arithmetic as well as composition with Boolean operators to allow for more complex matching. This is particularly useful for checking that metavariables are divisible by particular values, such as enforcing that a specific value is even or odd.
For example, the comparison $ARG < 1024 and $ARG % 2 == 0 still matches code such as set_port(80), but it no longer matches set_port(443) or set_port(8080).
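A sketch of a rule that composes arithmetic with a Boolean operator follows. The rule id and message are assumptions.

```yaml
rules:
  - id: superuser-even-port
    languages:
      - python
    severity: MEDIUM
    message: module setting an even-numbered superuser port
    patterns:
      - pattern: set_port($ARG)
      - metavariable-comparison:
          metavariable: $ARG
          # 80 passes both tests; 443 is odd; 8080 is not below 1024.
          comparison: $ARG < 1024 and $ARG % 2 == 0
```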
The comparison key accepts a Python expression using:
- Boolean, string, integer, and float literals.
- Boolean operators not, or, and and.
- Arithmetic operators +, -, *, /, and %.
- Comparison operators ==, !=, <, <=, >, and >=.
- Function int() to convert strings into integers.
- Function str() to convert numbers into strings.
- Function today() that gets today’s date as a float representing epoch time.
- Function strptime() that converts strings in the format "yyyy-mm-dd" to a float representing the date in epoch time.
- Lists, together with the in and not in infix operators.
- Strings, together with the in and not in infix operators, for substring containment.
- Function re.match() to match a regular expression (without the optional flags argument).
- Function lower() converts strings to lower case.
- Function upper() converts strings to upper case.
You can use Semgrep metavariables such as $MVAR, which Semgrep evaluates as follows:
- If $MVAR binds to a literal, then that literal is the value assigned to $MVAR.
- If $MVAR binds to a code variable that is a constant, and constant propagation is enabled (as it is by default), then that constant is the value assigned to $MVAR.
- Otherwise, the code bound to $MVAR is kept unevaluated, and its string representation can be obtained using the str() function, as in str($MVAR). For example, if $MVAR binds to the code variable x, str($MVAR) evaluates to the string literal "x".
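As a hedged sketch of the str() case, the rule below compares the textual form of an unevaluated argument against two names. The rule id, the message, and the variable names user_input and request_body are hypothetical.

```yaml
rules:
  - id: eval-of-known-untrusted-variable
    languages:
      - python
    severity: HIGH
    message: eval() called on a variable that conventionally holds untrusted data
    patterns:
      - pattern: eval($ARG)
      - metavariable-comparison:
          metavariable: $ARG
          # $ARG is a non-constant variable, so compare its string representation.
          comparison: str($ARG) == "user_input" or str($ARG) == "request_body"
```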
Legacy metavariable-comparison keys
INFO: You can avoid using the legacy keys described below (base: int and strip: bool) by using the int() function, as in int($ARG) > 0o600 or int($ARG) > 2147483647.
The metavariable-comparison operator also takes optional base: int and strip: bool keys. These keys set the integer base the metavariable value should be interpreted as and remove quotes from the metavariable value, respectively.
For example, with base: 8 and the comparison $ARG > 0o600, Semgrep interprets metavariable values found in code as octal. As a result, Semgrep detects 0700, but it does not detect 0400.
With strip: true, Semgrep removes quotes (', ", and `) from both ends of the metavariable content. For example, with the comparison $ARG > 2147483647, Semgrep detects "2147483648", but it does not detect "2147483646". This is useful when you expect strings to contain integer or float data.
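The two legacy keys can be sketched as follows. The rule ids, messages, and the function name set_permissions are assumptions made for illustration.

```yaml
rules:
  - id: excessive-permissions
    languages:
      - python
    severity: HIGH
    message: module setting excessive permissions
    patterns:
      - pattern: set_permissions($ARG)
      - metavariable-comparison:
          metavariable: $ARG
          comparison: $ARG > 0o600
          # Interpret the matched value as an octal number.
          base: 8
  - id: int-overflow
    languages:
      - python
    severity: HIGH
    message: Potential integer overflow
    patterns:
      - pattern: int($ARG)
      - metavariable-comparison:
          metavariable: $ARG
          comparison: $ARG > 2147483647
          # Remove surrounding quotes before evaluating the comparison.
          strip: true
```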
metavariable-name
The metavariable-name operator adds a constraint to the types of identifiers a metavariable can match. Currently, the only constraint supported is on the module or namespace from which an identifier originates. This is useful for filtering results in languages that don’t have a native syntax for fully qualified names, or languages where module names may contain characters that are not legal in identifiers, such as JavaScript or TypeScript.
To match an identifier that may originate from any of several modules, use the modules key, which takes a list of module names.
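The sketch below assumes that metavariable-name takes a metavariable key like the other metavariable operators, alongside the modules list described above. The rule id, the message, the method name insecure, and both module names are hypothetical.

```yaml
rules:
  - id: insecure-method-from-known-modules
    languages:
      - javascript
    severity: HIGH
    message: Uses insecure method from a flagged module
    patterns:
      - pattern: $MODULE.insecure(...)
      - metavariable-name:
          metavariable: $MODULE
          # Hypothetical module names; quoted because of the @ character.
          modules:
            - "@foo-bar"
            - "@foo-baz"
```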
pattern-not
The pattern-not operator is the opposite of the pattern operator. It finds code that does not match its expression. This is useful for eliminating common false positives.
pattern-not accepts a patterns or pattern-either property and negates everything inside the property.
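As a sketch of the nested form, the rule below negates a whole pattern-either block. The rule id, the message, and the names db_query and ensure_verified are hypothetical.

```yaml
rules:
  - id: unverified-db-query-nested
    languages:
      - python
    severity: HIGH
    message: Found unverified db query
    patterns:
      - pattern: db_query(...)
      # Everything inside the pattern-either is negated as a unit.
      - pattern-not:
          pattern-either:
            - pattern: db_query(..., verify=True, ...)
            - pattern-inside: |
                with ensure_verified(db_query):
                  db_query(...)
```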
pattern-inside
The pattern-inside operator keeps matched findings that reside within its expression. This is useful for finding code within other pieces of code, such as functions or if blocks.
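A minimal sketch that keeps return statements only when they sit inside an __init__ method of a class follows. The rule id and message are assumptions.

```yaml
rules:
  - id: return-in-init
    languages:
      - python
    severity: HIGH
    message: return should never appear inside a class __init__ function
    patterns:
      - pattern: return ...
      - pattern-inside: |
          class $CLASS:
            ...
      - pattern-inside: |
          def __init__(...):
              ...
```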
pattern-not-inside
The pattern-not-inside operator keeps matched findings that do not reside within its expression. It is the opposite of pattern-inside. This is useful for finding code that’s missing a corresponding cleanup action like disconnect, close, or shutdown. It’s also helpful in finding problematic code that isn’t inside code that mitigates the issue.
For example, a rule that detects files that are opened but never closed looks for the open(...) pattern and not a following close() pattern.
The $F metavariable ensures that the same variable name is used in the open and close calls. The ellipsis operator allows any arguments to be passed to open and any sequence of code statements to be executed between the open and close calls. The rule ignores how open is called or what happens up to a close call; it only needs to make sure close is called.
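The open and close behavior described above could be sketched like this. The rule id and message are assumptions.

```yaml
rules:
  - id: open-never-closed
    languages:
      - python
    severity: MEDIUM
    message: file object opened without corresponding close
    patterns:
      - pattern: $F = open(...)
      # Discard the finding when the same $F is later closed.
      - pattern-not-inside: |
          $F = open(...)
          ...
          $F.close()
```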
Metavariable matches
Metavariable matching operates differently for logical AND (patterns) and logical OR (pattern-either) parent operators. Behavior is consistent across all child operators: pattern, pattern-not, pattern-regex, pattern-inside, pattern-not-inside.
Metavariables in logical ANDs
Metavariable values must be identical across sub-patterns when performing logical AND operations with the patterns operator.
Example:
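In the sketch below, $X must bind to the same code in both children, so open(path) inside def foo(path) matches, while open(something_else) does not. The rule id and message are assumptions.

```yaml
rules:
  - id: function-args-to-open
    languages:
      - python
    severity: HIGH
    message: Function argument passed to open() builtin
    patterns:
      - pattern-inside: |
          def $F($X):
              ...
      # $X here must equal the parameter bound above.
      - pattern: open($X)
```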
Metavariables in logical ORs
Metavariable matching does not affect the matching of logical OR operations with the pattern-either operator.
Example:
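In this sketch, each alternative binds $X independently, so insecure_func1(something) and insecure_func2(something_else) both match. The rule id, the message, and the function names are hypothetical.

```yaml
rules:
  - id: insecure-function-call
    languages:
      - python
    severity: HIGH
    message: Insecure function use
    pattern-either:
      # $X in one alternative places no constraint on $X in the other.
      - pattern: insecure_func1($X)
      - pattern: insecure_func2($X)
```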
Metavariables in complex logic
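A hedged sketch of nested logic follows: the pattern-either sits under patterns, so the $X in each alternative must agree with the $X bound by the enclosing function definition. The rule id, the message, and the function names bar and baz are hypothetical.

```yaml
rules:
  - id: function-arg-passed-to-bar-or-baz
    languages:
      - python
    severity: MEDIUM
    message: Function argument passed to bar() or baz()
    patterns:
      - pattern-inside: |
          def $F($X):
            ...
      - pattern-either:
          # Both alternatives are constrained by the $X bound above.
          - pattern: bar($X)
          - pattern: baz($X)
```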
Metavariable matching still affects subsequent logical ORs if the parent is a logical AND.
options
Enable, disable, or modify the following matching features:
| Option | Default | Description |
|---|---|---|
| ac_matching | true | Matching modulo associativity and commutativity, treat Boolean AND/OR as associative, and bitwise AND/OR/XOR as both associative and commutative. |
| attr_expr | true | Expression patterns (for example: f($X)) match attributes (for example: @f(a)). |
| commutative_boolop | false | Treat Boolean AND/OR as commutative even if not semantically accurate. |
| constant_propagation | true | Constant propagation, including intraprocedural flow-sensitive constant propagation. |
| decorators_order_matters | false | Match non-keyword attributes (for example: decorators in Python) in order, instead of the order-agnostic default. Keyword attributes (for example: static, inline, etc) are not affected. |
| generic_comment_style | none | In generic mode, assume that comments follow the specified syntax. They are then ignored for matching purposes. Allowed values for comment styles are c, cpp, and shell. |
| generic_ellipsis_max_span | 10 | In generic mode, this is the maximum number of newlines that an ellipsis operator ... can match, or equivalently, the maximum number of lines covered by the match minus one. The default value is 10 (newlines) for performance reasons. Increase it with caution. Note that the same effect as 20 can be achieved without changing this setting and by writing ... ... in the pattern instead of .... Setting it to 0 is useful with line-oriented languages (for example, INI or key-value pairs in general) to prevent a match from extending to the next line of code. Available since Semgrep 0.96. For more information about generic mode, see Generic pattern matching documentation. |
| implicit_return | true | Return statement patterns (for example return $E) match expressions that may be evaluated last in a function as if there was a return keyword in front of those expressions. Only applies to certain expression-based languages, such as Ruby and Julia. |
| interfile | false | Set this value to true for Semgrep to run this rule with cross-function and cross-file analysis. It is required for rules that use cross-function, cross-file analysis. |
| symmetric_eq | false | Treat equal operations as symmetric (for example: a == b is equal to b == a). |
| taint_assume_safe_functions | false | Experimental option that may be subject to future changes. Used in taint analysis. Assume that function calls do not propagate taint from their arguments to their output. Otherwise, Semgrep always assumes that functions may propagate taint. Can replace not-conflicting sanitizers added in v0.69.0 in the future. |
| taint_assume_safe_indexes | false | Used in taint analysis. Assume that an array-access expression is safe even if the index expression is tainted. Otherwise, Semgrep assumes that, for example, a[i] is tainted if i is tainted, even if a is not. Enabling this option is recommended for high-signal rules, whereas disabling it is preferred for audit rules. Currently, it is disabled by default to maintain backward compatibility, but this may change in the near future after further evaluation. |
| vardef_assign | true | Assignment patterns (for example $X = $E) match variable declarations (for example var x = 1;). |
| xml_attrs_implicit_ellipsis | true | Any XML/JSX/HTML element patterns have implicit ellipsis for attributes (for example: <div /> matches <div foo="1">). |
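As a sketch of how the options key is attached to a rule, the example below turns on symmetric_eq so that the pattern also covers the operands in the reverse order. The rule id and message are assumptions.

```yaml
rules:
  - id: none-equality-comparison
    languages:
      - python
    severity: LOW
    message: "Compare against None with 'is' rather than '=='"
    options:
      # With this enabled, the pattern also matches None == x.
      symmetric_eq: true
    pattern: $X == None
```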
fix
The fix top-level key allows simple pattern fixes by suggesting an alternative for each match. Run semgrep with --autofix to apply the changes to the files.
Example:
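A minimal sketch of a rule with a fix key follows; metavariables bound by the pattern can be reused in the replacement text. The rule id and message are assumptions.

```yaml
rules:
  - id: use-dict-get
    languages:
      - python
    severity: MEDIUM
    message: "Use the .get() method to avoid a KeyError"
    patterns:
      - pattern: $DICT[$KEY]
    # Suggested replacement for each match.
    fix: $DICT.get($KEY)
```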
For more information about fix and --autofix, see the Rule-defined fix documentation.
metadata
Provide additional information for a rule with the metadata: key, such as a related CWE, likelihood, or OWASP.
Example:
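The sketch below attaches a few metadata fields to a rule. Because metadata is arbitrary user-provided data, the field names and values here are illustrative choices rather than a required schema, and the rule id and message are assumptions.

```yaml
rules:
  - id: md5-usage-with-metadata
    languages:
      - python
    severity: MEDIUM
    message: Found hashlib.md5 usage
    pattern: hashlib.md5(...)
    metadata:
      cwe: "CWE-328: Use of Weak Hash"
      owasp: "A02:2021 - Cryptographic Failures"
      likelihood: LOW
      category: security
```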
The metadata is also displayed in the output of Semgrep when you run it with --json.
Rules with category: security have additional metadata requirements. See Including fields required by security category for more information.
min-version and max-version
Each rule supports optional fields min-version and max-version specifying
minimum and maximum Semgrep versions. If the Semgrep
version being used doesn’t satisfy these constraints,
the rule is skipped without causing a fatal error.
Example rule:
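A hedged sketch of a rule guarded by min-version follows. The version number, rule id, message, and pattern are hypothetical.

```yaml
rules:
  - id: bad-goflags
    # Hypothetical threshold: older Semgrep versions skip this rule.
    min-version: 1.50.0
    languages:
      - dockerfile
    severity: MEDIUM
    message: These GOFLAGS should not be used
    pattern: |
      ENV ... GOFLAGS='-tags=dynamic -buildvcs=false' ...
```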
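Another use case is when a newer version of a rule relies on a feature that older Semgrep versions lack. In this case, use min-version and max-version to ensure that either the older or the newer rule is used, but not both. The rules would look like this: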
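As a sketch, the two rules below cover complementary version ranges so that exactly one of them runs. The version numbers, rule ids, messages, and patterns are hypothetical placeholders.

```yaml
rules:
  - id: something-wrong-v1
    # Used by Semgrep versions up to the hypothetical cutoff.
    max-version: 1.72.999
    languages:
      - python
    severity: MEDIUM
    message: Something is wrong (older rule)
    pattern: something_wrong(...)
  - id: something-wrong-v2
    # Used by Semgrep versions from the hypothetical cutoff onward.
    min-version: 1.73.0
    languages:
      - python
    severity: MEDIUM
    message: Something is wrong (newer rule)
    pattern: something_wrong(...)
```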
The min-version/max-version feature has been available since Semgrep 1.38.0. It is intended primarily for publishing rules that rely on
newly released features without causing errors in older Semgrep
installations.
category
Provide a category for users of the rule. For example: best-practice, correctness, maintainability. For more information, see Semgrep Registry rule requirements.
paths
Exclude a rule in paths
To ignore a specific rule on specific files, set the paths: key with
one or more filters. The patterns apply to the full file paths
relative to the project root.
Example:
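A minimal sketch of an exclusion list follows. The rule id, message, and pattern are assumptions; the filters are quoted because an unquoted leading * has a special meaning in YAML.

```yaml
rules:
  - id: eqeq-is-bad
    languages:
      - python
    severity: MEDIUM
    message: Useless comparison of a value with itself
    pattern: $X == $X
    paths:
      exclude:
        - "*.jinja2"
        - "*_test.go"
        - "project/tests"
        - "project/static/*.js"
```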
When invoked with semgrep -f rule.yaml project/, the preceding rule runs on files inside project/, but no results are returned for:
- any file with a .jinja2 file extension
- any file whose name ends in _test.go, such as project/backend/server_test.go
- any file inside project/tests or its subdirectories
- any file matching the project/static/*.js glob pattern
NOTE: The glob syntax is from Python’s wcmatch and is used to match against the given file and all its parent directories.
Limit a rule to paths
Conversely, to run a rule only on specific files, set a paths: key with one or more of these filters:
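A minimal sketch of an inclusion list follows. The rule id, message, and pattern are assumptions.

```yaml
rules:
  - id: eqeq-is-bad-included-paths
    languages:
      - python
    severity: MEDIUM
    message: Useless comparison of a value with itself
    pattern: $X == $X
    paths:
      include:
        - "*_test.go"
        - "project/server"
        - "project/schemata"
        - "project/static/*.js"
        - "tests/**/*.js"
```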
When invoked with semgrep -f rule.yaml project/, this rule runs on files inside project/, but results are returned only for:
- files whose name ends in _test.go, such as project/backend/server_test.go
- files inside project/server, project/schemata, or their subdirectories
- files matching the project/static/*.js glob pattern
- all files with the .js extension, arbitrary depth inside the tests folder
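NOTE: When mixing inclusion and exclusion filters, the exclusion ones take precedence.
For example, a rule that includes project/schemata and excludes *_internal.py returns results from project/schemata/scan.py but not from project/schemata/scan_internal.py.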
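The precedence of exclusion over inclusion can be sketched as follows. The rule id, message, and pattern are assumptions.

```yaml
rules:
  - id: eqeq-is-bad-mixed-paths
    languages:
      - python
    severity: MEDIUM
    message: Useless comparison of a value with itself
    pattern: $X == $X
    paths:
      include:
        - "project/schemata"
      # Exclusion wins: scan_internal.py is skipped even though it is included above.
      exclude:
        - "*_internal.py"
```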
Additional examples
This section contains more complex rules that perform advanced code searching.
Complete useless comparison
A complete rule for detecting useless comparisons makes use of many operators. It uses pattern-either, patterns, pattern, and pattern-inside to carefully consider different cases, and employs pattern-not-inside and pattern-not to exclude specific unnecessary comparisons.
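One way such a rule could be assembled is sketched below. The rule id, the message, and the specific exclusions are assumptions rather than a definitive list.

```yaml
rules:
  - id: eqeq-is-bad
    languages:
      - python
    severity: MEDIUM
    message: "Useless comparison operation: $X == $X or $X != $X"
    patterns:
      # Exclude contexts where self-comparison is intentional.
      - pattern-not-inside: |
          def __eq__(...):
              ...
      - pattern-not-inside: assert(...)
      - pattern-not-inside: assertTrue(...)
      - pattern-not-inside: assertFalse(...)
      - pattern-either:
          - pattern: $X == $X
          - pattern: $X != $X
          - patterns:
              - pattern-inside: |
                  def __init__(...):
                      ...
              - pattern: self.$X == self.$X
      # Exclude a trivially true literal comparison.
      - pattern-not: 1 == 1
```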