Represents the structure and SQLite connection of one GBD database (or CSV file).
Responsible for:
- Detecting whether a path is an SQLite database or a CSV file.
- Creating or introspecting the features table and any 1:n feature tables.
- Executing DDL and DML within the scope of a single database file.
SQLite database layout:
features(hash UNIQUE NOT NULL, feat1 TEXT DEFAULT 'v', feat2 TEXT DEFAULT 'w', ...)
<feat_1n>(hash TEXT NOT NULL, value TEXT NOT NULL, UNIQUE(hash, value))
For each 1:n feature, the features table has a column of the same name whose value mirrors the hash, so the separate table can be joined without an explicit FK constraint. An INSERT trigger on the 1:n table keeps this mirror column in sync. Every 1:n table also contains a sentinel row (hash='None', value='None'); see Issues.md #7.
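A minimal sketch of this layout with Python's built-in `sqlite3` module. The feature names `family` (1:1) and `tags` (1:n) are hypothetical, and the 'None' default of the mirror column is an assumption, not something this page states. With that default, a hash that has no tags joins to the sentinel row instead of dropping out of an inner join.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- central table: one row per benchmark hash
    -- 'family' is a hypothetical 1:1 feature, 'tags' the mirror column of a 1:n feature
    CREATE TABLE features (
        hash UNIQUE NOT NULL,
        family TEXT NOT NULL DEFAULT 'unknown',
        tags TEXT NOT NULL DEFAULT 'None'  -- assumed default
    );
    -- separate table for the hypothetical 1:n feature 'tags'
    CREATE TABLE tags (hash TEXT NOT NULL, value TEXT NOT NULL, UNIQUE(hash, value));
    INSERT INTO tags (hash, value) VALUES ('None', 'None');  -- sentinel row

    -- h1 has two tags; its mirror column holds its own hash
    INSERT INTO features (hash, family, tags) VALUES ('h1', 'crypto', 'h1');
    INSERT INTO tags (hash, value) VALUES ('h1', 'sat'), ('h1', 'industrial');
    -- h2 has no tags; its mirror column keeps the default
    INSERT INTO features (hash, family) VALUES ('h2', 'random');
""")

# Join through the mirror column rather than through features.hash
rows = con.execute("""
    SELECT f.hash, f.family, t.value
    FROM features AS f JOIN tags AS t ON t.hash = f.tags
    ORDER BY f.hash, t.value
""").fetchall()
print(rows)
```

In this sketch, deleting the sentinel row would make the inner join drop h2 entirely.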
Context detection
The context is inferred from the database name prefix, e.g. cnf_sc2021 -> context cnf. There is no context metadata stored inside the file itself.
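The prefix rule amounts to splitting the name at its first underscore. In this sketch the function name `context_from_name`, the set of known contexts (only the two examples named on this page) and the fallback to a default context are all assumptions.

```python
KNOWN_CONTEXTS = {"cnf", "kis"}  # hypothetical subset of GBD contexts
DEFAULT_CONTEXT = "cnf"          # assumed fallback


def context_from_name(dbname: str) -> str:
    """Return the context encoded as the prefix before the first underscore."""
    prefix = dbname.split("_", 1)[0]
    # Names without a recognised prefix fall back to the default context
    return prefix if prefix in KNOWN_CONTEXTS else DEFAULT_CONTEXT


print(context_from_name("cnf_sc2021"))  # cnf
```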
| Class Method | context | Undocumented |
| Class Method | context | Undocumented |
| Class Method | context | Infer the GBD context from a database name. |
| Class Method | create | Auto-detect the file type at path and return the appropriate Schema. |
| Class Method | dbname | Derive a valid SQLite ATTACH identifier from a file path. |
| Class Method | features | Import a CSV file into an in-memory SQLite table and build feature metadata. |
| Class Method | features | Introspect an SQLite database and build feature metadata. |
| Class Method | from | Load a CSV file into a shared in-memory SQLite database and return a Schema. |
| Class Method | from | Load a Schema from an existing SQLite .db file. |
| Class Method | is | Return True if path points to an SQLite database file. |
| Class Method | valid | Undocumented |
| Method | __init__ | No summary |
| Method | absorb | Undocumented |
| Method | create | Create a new feature column or table in this schema. |
| Method | create | Ensure the central features(hash) table exists. |
| Method | execute | Undocumented |
| Method | get | Undocumented |
| Method | get | Undocumented |
| Method | get | Undocumented |
| Method | has | Undocumented |
| Method | is | Undocumented |
| Method | set | Persist value for feature on each hash in hashes. |
| Instance Variable | context | Undocumented |
| Instance Variable | csv | Undocumented |
| Instance Variable | dbcon | Undocumented |
| Instance Variable | dbname | Undocumented |
| Instance Variable | features | Undocumented |
| Instance Variable | path | Undocumented |
Auto-detect the file type at path and return the appropriate Schema.
| Parameters | |
| path: str | Path to a .db file or CSV file. |
| Returns | |
| Schema | Loaded schema instance. |
| Raises | |
| SchemaException | If the file cannot be opened or parsed. |
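A minimal sketch of the dispatch, assuming detection by the 16-byte header string that starts every SQLite 3 database file. The two loader callables stand in for the `from…` class methods, whose full names are not shown on this page; `create_schema` and the stand-in `SchemaException` are hypothetical. Unlike the database check documented below, this sketch does not prompt before creating a missing file; it simply reports the error.

```python
import csv
import sqlite3
from typing import Callable, TypeVar

T = TypeVar("T")

# Every SQLite 3 database file starts with these 16 bytes
SQLITE_HEADER = b"SQLite format 3\x00"


class SchemaException(Exception):
    """Stand-in for GBD's SchemaException."""


def create_schema(path: str,
                  load_db: Callable[[str], T],
                  load_csv: Callable[[str], T]) -> T:
    """Pick a loader from the file header and wrap low-level errors."""
    try:
        with open(path, "rb") as f:
            header = f.read(16)
        # An empty file counts as a new database; anything else is read as CSV
        loader = load_db if header in (b"", SQLITE_HEADER) else load_csv
        return loader(path)
    except (OSError, sqlite3.Error, csv.Error, ValueError) as exc:
        raise SchemaException(f"cannot open {path}: {exc}") from exc
```

Wrapping every low-level error in one exception type lets callers handle "cannot load" uniformly, whichever loader failed.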
Derive a valid SQLite ATTACH identifier from a file path.
Strips directory and extension, sanitises non-alphanumeric characters to underscores, and prepends the default context if the name starts with a digit.
| Parameters | |
| path: str | File-system path. |
| Returns | |
| str | Alphanumeric database name, e.g. "cnf_sc2021". |
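A sketch of the three steps, assuming `os.path.splitext` semantics (only the last extension is removed) and an underscore between the prepended context and the name; `dbname_from_path` and `DEFAULT_CONTEXT = "cnf"` are hypothetical.

```python
import os
import re

DEFAULT_CONTEXT = "cnf"  # assumed default context


def dbname_from_path(path: str) -> str:
    # Drop the directory and the last extension: /data/cnf_sc2021.db -> cnf_sc2021
    stem = os.path.splitext(os.path.basename(path))[0]
    # Replace every character that is not a letter or digit with an underscore
    name = re.sub(r"[^A-Za-z0-9]", "_", stem)
    # An unquoted SQL identifier must not start with a digit
    if name[:1].isdigit():
        name = f"{DEFAULT_CONTEXT}_{name}"
    return name


print(dbname_from_path("/data/2023-meta.csv"))  # cnf_2023_meta
```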
Import a CSV file into an in-memory SQLite table and build feature metadata.
The CSV must contain a hash column; all columns become 1:1 features stored in a single features table. Column names are sanitised to valid identifiers.
| Parameters | |
| dbname: str | Logical database name. |
| path: str | Path to the CSV file. |
| con | In-memory sqlite3 connection. |
| Returns | |
| dict[str, FeatureInfo] | Feature registry for this schema. |
| Raises | |
| SchemaException | If the CSV lacks a hash column. |
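A minimal sketch of the CSV import using the standard `csv` module. It returns a plain list of column names instead of a `FeatureInfo` registry, and the sanitising rule (every character outside `[A-Za-z0-9_]` becomes an underscore) is an assumption; `import_csv`, `sanitise` and the stand-in `SchemaException` are hypothetical.

```python
import csv
import re
import sqlite3


class SchemaException(Exception):
    """Stand-in for GBD's SchemaException."""


def sanitise(name: str) -> str:
    # Map any character outside [A-Za-z0-9_] to an underscore
    return re.sub(r"\W", "_", name.strip(), flags=re.ASCII)


def import_csv(path: str, con: sqlite3.Connection) -> list[str]:
    """Load the CSV at path into a table named features; return its column names."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = [sanitise(h) for h in next(reader, [])]
        if "hash" not in header:
            raise SchemaException(f"{path}: CSV has no hash column")
        # Every column, including hash, becomes a TEXT column of one table
        cols = ", ".join(f'"{c}" TEXT' for c in header)
        con.execute(f"CREATE TABLE features ({cols})")
        marks = ", ".join("?" for _ in header)
        # Skip blank lines, which csv.reader returns as empty lists
        con.executemany(f"INSERT INTO features VALUES ({marks})",
                        (row for row in reader if row))
    con.commit()
    return header
```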
def features_from_database(cls, dbname, path, con) -> dict[str, FeatureInfo]:
Introspect an SQLite database and build feature metadata.
Iterates over all tables whose names do not start with an underscore. Two kinds of columns are skipped: FK mirror columns (a column of features whose name matches a table name) and the hash column of every table other than features. All remaining columns become FeatureInfo entries.
| Parameters | |
| dbname: str | Logical database name. |
| path: str | File path (informational only). |
| con | Open sqlite3 connection. |
| Returns | |
| dict[str, FeatureInfo] | Feature registry for this schema. |
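The skip rules can be sketched with `sqlite_master` and `PRAGMA table_info`. Returning `(table, column)` pairs instead of `FeatureInfo` objects, naming a 1:n feature after its table, and also skipping SQLite's internal `sqlite_*` tables are assumptions of this sketch; `introspect` is a hypothetical name.

```python
import sqlite3


def introspect(con: sqlite3.Connection) -> dict[str, tuple[str, str]]:
    """Map feature name -> (table, column), following the skip rules above."""
    tables = [
        name for (name,) in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
        # Skip underscore-prefixed tables and SQLite's own internal tables
        if not name.startswith(("_", "sqlite_"))
    ]
    registry = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
        for row in con.execute(f'PRAGMA table_info("{table}")'):
            column = row[1]
            if table == "features" and column in tables:
                continue  # FK mirror column of a 1:n feature
            if table != "features" and column == "hash":
                continue  # join key of a 1:n table
            # Hypothetical naming: a 1:n feature is named after its table
            name = column if table == "features" else table
            registry[name] = (table, column)
    return registry
```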
Return True if path points to an SQLite database file.
An empty file is accepted as a new database. If the path does not exist, the user is prompted to confirm creation.
| Raises | |
| SchemaException | If the path does not exist and creation is declined. |
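A sketch of the check, assuming detection via the 16-byte header string that starts every SQLite 3 database file. The interactive prompt is replaced by a `confirm` callback so the function can run unattended; `is_database` and the stand-in `SchemaException` are hypothetical.

```python
import os
from typing import Callable

# Every SQLite 3 database file starts with these 16 bytes
SQLITE_HEADER = b"SQLite format 3\x00"


class SchemaException(Exception):
    """Stand-in for GBD's SchemaException."""


def is_database(path: str, confirm: Callable[[str], bool]) -> bool:
    if not os.path.exists(path):
        # The documented method prompts the user; here a callback decides
        if confirm(path):
            return True  # sqlite3.connect(path) would create the file later
        raise SchemaException(f"{path} does not exist")
    if os.path.getsize(path) == 0:
        return True  # SQLite treats a zero-length file as an empty database
    with open(path, "rb") as f:
        return f.read(16) == SQLITE_HEADER
```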
| Parameters | |
| dbcon | Open sqlite3 connection for this schema. |
| dbname: str | Logical database name (alphanumeric, derived from the file name). |
| path: str | File-system path to the .db or CSV file. |
| features: dict[str, FeatureInfo] | Feature registry for this schema. |
| context: str | GBD context (e.g. "cnf", "kis"). |
| csv: bool | True when loaded from a CSV into in-memory SQLite; affects is_in_memory. |
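The summary table states that CSV files are loaded into a shared in-memory SQLite database. SQLite supports this through URI filenames with `mode=memory&cache=shared`: every connection in the process that opens the same URI sees the same data while at least one of them stays open. A minimal sketch; the URI name `cnf_meta` is hypothetical, and GBD's actual connection setup is not shown on this page.

```python
import sqlite3

# Named, shared in-memory database; 'cnf_meta' is a hypothetical dbname
URI = "file:cnf_meta?mode=memory&cache=shared"

writer = sqlite3.connect(URI, uri=True)
writer.execute("CREATE TABLE features (hash TEXT, family TEXT)")
writer.execute("INSERT INTO features VALUES ('h1', 'crypto')")
writer.commit()  # make the row visible to other connections

# A second connection to the same URI sees the data written above
reader = sqlite3.connect(URI, uri=True)
rows = reader.execute("SELECT hash, family FROM features").fetchall()
print(rows)  # [('h1', 'crypto')]
```

The database disappears once the last connection to it is closed, so the first connection must outlive the others.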
Create a new feature column or table in this schema.
- 1:1 feature (default_value is not None):
  - Adds {name} TEXT NOT NULL DEFAULT {default_value} to the features table.
- 1:n feature (default_value is None):
  - Creates a separate table {name}(hash, value) with UNIQUE(hash, value), inserts the sentinel row ('None', 'None') (see Issues.md #7), and installs a trigger to keep features.{name} (the FK mirror column) in sync.
| Parameters | |
| name: str | Feature name; validated against reserved words and SQLite keywords unless permissive is True. |
| default_value (default None) | None for a 1:n feature; any string for a 1:1 feature. |
| permissive: bool | Skip name validation and silently do nothing if the feature already exists (used internally by initialisers). |
| Returns | |
| list[FeatureInfo] | |
| Raises | |
| SchemaException | If the name is invalid or already exists (unless permissive). |
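A condensed sketch of both branches. The reserved-word set, the identifier regex (a simplified stand-in for the checks against SQLite keywords), the 'None' default of the mirror column, the trigger name and the `(table, column)` return value are all assumptions; `create_feature` here is a hypothetical free function, not the method itself.

```python
import re
import sqlite3
from typing import Optional


class SchemaException(Exception):
    """Stand-in for GBD's SchemaException."""


RESERVED = {"hash", "value", "features"}  # hypothetical reserved words


def create_feature(con: sqlite3.Connection, name: str,
                   default_value: Optional[str] = None,
                   permissive: bool = False) -> list[tuple[str, str]]:
    columns = {row[1] for row in con.execute("PRAGMA table_info(features)")}
    tables = {t for (t,) in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if name in columns or name in tables:
        if permissive:
            return []  # already present: silently ignore
        raise SchemaException(f"feature {name} already exists")
    if not permissive and (name in RESERVED or
                           not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name)):
        raise SchemaException(f"invalid feature name {name!r}")
    if default_value is not None:
        # 1:1 feature: one new column on the central table
        literal = default_value.replace("'", "''")
        con.execute(f"ALTER TABLE features ADD COLUMN {name} "
                    f"TEXT NOT NULL DEFAULT '{literal}'")
        con.commit()
        return [("features", name)]
    # 1:n feature: separate table, sentinel row, mirror column, sync trigger
    con.executescript(f"""
        CREATE TABLE {name} (hash TEXT NOT NULL, value TEXT NOT NULL, UNIQUE(hash, value));
        INSERT INTO {name} (hash, value) VALUES ('None', 'None');
        ALTER TABLE features ADD COLUMN {name} TEXT NOT NULL DEFAULT 'None';
        CREATE TRIGGER {name}_sync AFTER INSERT ON {name} BEGIN
            INSERT OR IGNORE INTO features (hash) VALUES (NEW.hash);
            UPDATE features SET {name} = NEW.hash WHERE hash = NEW.hash;
        END;
    """)
    return [(name, "value")]
```

The sketch inlines the 1:1 default as a quoted SQL literal, doubling any single quotes, and inserts the sentinel before creating the trigger so that no 'None' hash reaches the features table.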
Ensure the central features(hash) table exists.
If absent (new database), creates it, back-fills hashes from all existing 1:n tables, and installs INSERT triggers on those tables to keep features populated automatically.
| Returns | |
| list[FeatureInfo] | A list containing the hash FeatureInfo if the table was newly created; an empty list if it already existed. |
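A sketch of the back-fill, assuming every existing non-underscore table is a 1:n feature table with a hash column. It returns a boolean instead of a list of `FeatureInfo`, does not special-case the sentinel row, and the function and trigger names are hypothetical.

```python
import sqlite3


def ensure_features_table(con: sqlite3.Connection) -> bool:
    """Create features(hash) if missing; return True when it was created."""
    tables = [name for (name,) in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    if "features" in tables:
        return False
    # Assumption: every other non-underscore table is a 1:n feature table
    feature_tables = [t for t in tables if not t.startswith(("_", "sqlite_"))]
    con.execute("CREATE TABLE features (hash UNIQUE NOT NULL)")
    for t in feature_tables:
        # Back-fill every hash already present in the 1:n table
        con.execute(f'INSERT OR IGNORE INTO features (hash) '
                    f'SELECT DISTINCT hash FROM "{t}"')
        # Keep features populated when new rows arrive in the 1:n table
        con.execute(f'''
            CREATE TRIGGER "{t}_to_features" AFTER INSERT ON "{t}" BEGIN
                INSERT OR IGNORE INTO features (hash) VALUES (NEW.hash);
            END''')
    con.commit()
    return True
```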
Persist value for feature on each hash in hashes.
- 1:n feature: inserts (hash, value) rows; duplicate pairs are silently ignored (INSERT OR IGNORE); also updates features.{name} = hash to keep the FK mirror column current.
- 1:1 feature: upserts into the features column; on hash conflict, updates the column to value.
| Parameters | |
| feature: str | Feature name. |
| value | Value to store (coerced to TEXT by SQLite). |
| hashes: list[str] | Benchmark hashes to update. |
| Raises | |
| SchemaException | If the feature does not exist or hashes is empty. |
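A sketch of both write paths. It infers that a feature is 1:n from the existence of a table with the feature's name and uses SQLite's UPSERT syntax (available since SQLite 3.24) for the 1:1 case; `set_values` and the stand-in `SchemaException` are hypothetical.

```python
import sqlite3
from typing import Iterable


class SchemaException(Exception):
    """Stand-in for GBD's SchemaException."""


def set_values(con: sqlite3.Connection, feature: str, value,
               hashes: Iterable[str]) -> None:
    hashes = list(hashes)
    if not hashes:
        raise SchemaException("no hashes given")
    columns = {row[1] for row in con.execute("PRAGMA table_info(features)")}
    if feature not in columns:
        raise SchemaException(f"unknown feature {feature}")
    # A table named after the feature marks it as 1:n
    one_to_n = con.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (feature,)).fetchone() is not None
    if one_to_n:
        # Duplicate (hash, value) pairs hit UNIQUE(hash, value) and are skipped
        con.executemany(
            f'INSERT OR IGNORE INTO "{feature}" (hash, value) VALUES (?, ?)',
            [(h, value) for h in hashes])
        # Point the mirror column at the hash itself
        con.executemany(
            f'UPDATE features SET "{feature}" = hash WHERE hash = ?',
            [(h,) for h in hashes])
    else:
        # Insert new hashes; overwrite the column for existing ones
        con.executemany(
            f'INSERT INTO features (hash, "{feature}") VALUES (?, ?) '
            f'ON CONFLICT (hash) DO UPDATE SET "{feature}" = excluded."{feature}"',
            [(h, value) for h in hashes])
    con.commit()
```

The mirror-column UPDATE only touches hashes that already have a features row; with the INSERT trigger described above, that row exists by the time the UPDATE runs.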