class documentation
Parses GBD query strings and compiles them to SQL WHERE fragments.
The GBD query language is a small filter DSL:
- Boolean logic: and, or, not
- Comparisons: =, !=, <, >, <=, >=
- Pattern matching: like / unlike with optional leading/trailing % wildcard
- Feature references: feature, context:feature, or database:feature
- Right-hand side: unquoted or single/double-quoted strings, integers/floats, or parenthesised arithmetic terms (+, -, *, /)
1:1 vs 1:n feature translation (see get_sql):
- 1:1 features (FeatureInfo.default is not None) are stored as columns of the central features table; comparisons are emitted inline.
- 1:n features (FeatureInfo.default is None) are stored in a separate {name}(hash, value) table; equality / inequality / like comparisons are wrapped in a subquery so they act as set-membership tests (IN / NOT IN). Numeric inequality and arithmetic-term comparisons on 1:n features fall through to an inline comparison, meaning "at least one value satisfies the condition". See Issues.md #2 and #3.
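The cardinality rule above can be illustrated with a minimal Python sketch. It is not this class's implementation: the helper name compile_comparison, its parameters, and the plain string formatting are assumptions made for illustration, and only the shape of the emitted SQL follows the description above.

```python
def compile_comparison(db, feature, op, value, one_to_n):
    """Hypothetical helper: build the SQL fragment for `feature op value`.

    `db` is a database name; `one_to_n` says whether the feature is 1:n.
    """
    if op in ("=", "!=", "like"):
        # Unescaped interpolation, mirroring the documented (unsafe) behaviour.
        literal = "'{}'".format(value)
        if not one_to_n:
            # 1:1 feature: a column of the central features table, compared inline.
            return "{}.features.{} {} {}".format(db, feature, op, literal)
        # 1:n feature: wrap in a subquery so the test is about set membership.
        table = "{}.{}".format(db, feature)
        membership = "NOT IN" if op == "!=" else "IN"
        inner_op = "like" if op == "like" else "="
        return "{t}.hash {m} (SELECT {t}.hash FROM {t} WHERE {t}.value {o} {v})".format(
            t=table, m=membership, o=inner_op, v=literal)
    if op in ("<", ">", "<=", ">="):
        if not one_to_n:
            # 1:1 numeric comparison, emitted inline (any cast is left out of this sketch).
            return "{}.features.{} {} {}".format(db, feature, op, value)
        # 1:n numeric inequality: inline comparison, "at least one value" semantics.
        return "CAST({}.{}.value AS FLOAT) {} {}".format(db, feature, op, value)
    raise ValueError("unsupported operator: {}".format(op))


print(compile_comparison("db", "family", "=", "v", one_to_n=True))
print(compile_comparison("db", "vars", ">", 5, one_to_n=True))
```

The sketch covers only the operators spelled out above; unlike and arithmetic terms are left out because their exact translation is not described here.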
String values are interpolated directly into the SQL string without escaping, which is a SQL injection risk; see Issues.md #1.
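To make the risk concrete, the following sketch uses Python's standard sqlite3 module on a toy table. The table layout and the malicious value are invented for illustration, and the bound-parameter query shows one common mitigation, not something this class is documented to do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (hash TEXT, filename TEXT)")
conn.executemany(
    "INSERT INTO features VALUES (?, ?)",
    [("h1", "a.cnf"), ("h2", "b.cnf")],
)

# Hypothetical attacker-controlled right-hand side of a query.
malicious = "x' OR '1'='1"

# Direct interpolation: the embedded quote ends the string literal early,
# so the rest of the value is parsed as SQL and the filter matches every row.
unsafe_sql = "SELECT hash FROM features WHERE filename = '{}'".format(malicious)
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Bound parameter: the value is passed as data and never parsed as SQL.
safe_rows = conn.execute(
    "SELECT hash FROM features WHERE filename = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

With interpolation the filter is bypassed and both rows come back; with the bound parameter no filename equals the literal value, so nothing matches.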
| Method | __init__ | Parse query into an internal AST. |
| Method | get | Return the set of feature names referenced anywhere in the query. |
| Method | get_sql | Recursively compile the parsed AST into a SQL WHERE fragment. |
| Constant | GRAMMAR | Undocumented |
| Class Variable | model | Undocumented |
| Instance Variable | ast | Undocumented |
Parse query into an internal AST.
| Parameters | |
| query:str|None | GBD query string, e.g. "filename like foo% and vars > 100". Pass None or an empty string for an unconditional (match-all) query. |
| verbose:bool | If True, print the raw query and its JSON AST to stdout. |
| Raises | |
| ParserException | If query is syntactically invalid. |
Return the set of feature names referenced anywhere in the query.
Walks the AST recursively and collects every col leaf. The returned names
are used by GBDQuery to verify that all referenced
features exist before executing a query.
| Parameters | |
| ast:dict|None | Sub-tree to walk; defaults to the root AST. |
| Returns | |
| set[str] | Feature names, possibly qualified as "context:feature" or "database:feature" if the query uses such prefixes. |
| Raises | |
| ParserException | On unexpected AST structure. |
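The recursive walk described above can be sketched as follows. The AST layout is an assumption: this page does not document it, so the example invents a nested dict/list structure in which each feature reference is a string stored under the key "col". The function name collect_features is likewise hypothetical.

```python
def collect_features(node):
    """Return every string stored under a "col" key in a nested AST (assumed shape)."""
    found = set()
    if isinstance(node, dict):
        for key, child in node.items():
            if key == "col" and isinstance(child, str):
                # A col leaf: record the (possibly prefixed) feature name.
                found.add(child)
            else:
                found |= collect_features(child)
    elif isinstance(node, (list, tuple)):
        for child in node:
            found |= collect_features(child)
    # Any other value (str, int, None) is a non-feature leaf and adds nothing.
    return found


# Hypothetical AST for: filename like foo% and cnf:vars > 100
example_ast = {
    "and": [
        {"like": {"col": "filename", "val": "foo%"}},
        {">": {"col": "cnf:vars", "val": 100}},
    ]
}
features = collect_features(example_ast)
print(sorted(features))
```

A caller could then compare the returned names against the features a database actually provides before compiling any SQL.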
Recursively compile the parsed AST into a SQL WHERE fragment.
Column addresses are fully qualified as database.table.column via
Database.faddr. The translation is cardinality-aware:
- 1:1, string: col = 'v' -> db.features.col = 'v'
- 1:n, string: col = 'v' -> db.col.hash IN (SELECT db.col.hash FROM db.col WHERE db.col.value = 'v')
- 1:n, string: col != 'v' -> db.col.hash NOT IN (SELECT … WHERE db.col.value = 'v')
- 1:n, like: col like foo% -> db.col.hash IN (SELECT … WHERE db.col.value like 'foo%')
- 1:n, numeric: col > 5 -> CAST(db.col.value AS FLOAT) > 5 (any-row semantics; see Issues.md #2)
- 1:n, term: col != (expr) -> db.col.hash NOT IN (SELECT … WHERE CAST(db.col.value AS FLOAT) = expr)
- 1:n, term: col op (expr), other ops -> CAST(db.col.value AS FLOAT) op expr (any-row semantics; see Issues.md #3)
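The difference between the subquery form and an inline comparison can be checked on a toy SQLite database. The sketch below is an illustration under stated assumptions: the family table, its rows, the join against features, and the unqualified table names are all invented here, and the surrounding SELECT is not taken from this class.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (hash TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE family (hash TEXT, value TEXT)")  # hypothetical 1:n feature
conn.executemany("INSERT INTO features VALUES (?)", [("h1",), ("h2",)])
conn.executemany(
    "INSERT INTO family VALUES (?, ?)",
    [("h1", "a"), ("h1", "b"), ("h2", "b")],
)

base = (
    "SELECT DISTINCT features.hash FROM features "
    "JOIN family ON family.hash = features.hash WHERE {} "
    "ORDER BY features.hash"
)

# Inline inequality: a hash qualifies as soon as ONE of its values differs from 'a'.
inline = conn.execute(base.format("family.value != 'a'")).fetchall()

# Subquery form: a hash qualifies only if NONE of its values equals 'a'.
wrapped = conn.execute(
    base.format(
        "family.hash NOT IN (SELECT family.hash FROM family WHERE family.value = 'a')"
    )
).fetchall()

print(inline, wrapped)
```

Here h1 carries both 'a' and 'b', so the inline form still returns it while the NOT IN form excludes it. The same any-row effect applies to the numeric and term comparisons that are emitted inline.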
| Parameters | |
| db:Database | Used to resolve feature addresses and determine cardinality. |
| ast:dict|None | Sub-tree to compile; defaults to the root AST. |
| Returns | |
| str | SQL expression fragment suitable for embedding in a WHERE clause. |
| Raises | |
| ParserException | If the AST is malformed or a feature cannot be resolved. |