State processor API¶
State processors are plugins that extract extra column data from object
state during writes.
They allow downstream packages (for example, plone-pgcatalog) to write
supplementary columns alongside the object state in a single atomic
INSERT ... ON CONFLICT statement.
Protocol¶
State processors use duck typing. A processor must implement the following methods:
Required methods¶
get_extra_columns() -> list[ExtraColumn]
Return the list of extra columns this processor writes. Called once during registration and during each tpc_vote().
process(zoid: int, class_mod: str, class_name: str, state: str) -> dict | None
Extract extra column data from an object’s state. state is a JSON string (the decoded object state, pre-sanitization). The method may modify state in place (for example, pop annotation keys to prevent them from being persisted). Returns a dict of {column_name: value} for extra columns, or None when no extra data applies to this object. Called during store() after pickle-to-JSON decoding for every object in the transaction.
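As a rough illustration of the two required methods, the sketch below defines a hypothetical PathProcessor that fills a path column. The class name, the path key, and the local ExtraColumn stand-in (present only so the snippet runs without zodb_pgjsonb installed) are assumptions made for this example. The sketch accepts state either as a JSON string or as an already-decoded mapping.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtraColumn:
    """Local stand-in; real code would use `from zodb_pgjsonb import ExtraColumn`."""
    name: str
    value_expr: str
    update_expr: Optional[str] = None


class PathProcessor:
    """Hypothetical processor that writes a `path` column."""

    def get_extra_columns(self):
        # One plain column whose value is bound from the dict returned by process().
        return [ExtraColumn(name="path", value_expr="%(path)s")]

    def process(self, zoid, class_mod, class_name, state):
        # Decode only if the state arrives as a JSON string.
        data = json.loads(state) if isinstance(state, str) else state
        if not isinstance(data, dict):
            return None
        path = data.get("path")
        if path is None:
            return None  # no extra data for this object
        return {"path": path}


processor = PathProcessor()
print(processor.process(1, "myapp.content", "Document", '{"path": "/site/doc"}'))
# {'path': '/site/doc'}
```

Returning None for objects without a path follows the rule above that None means no extra data applies to the object.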
Optional methods¶
get_schema_sql() -> str | None
Return DDL statements to apply (for example, ALTER TABLE ... ADD COLUMN, CREATE INDEX). Called once during register_state_processor(). The DDL is executed via a separate autocommit connection. If blocked by startup read transactions (lock conflict), the DDL is deferred to the first tpc_begin(). Returns None if no DDL is needed.
finalize(cursor) -> None
Called at the end of tpc_vote(), after all objects have been written but before the transaction commits. The cursor belongs to the same PostgreSQL transaction as the object writes, so any additional SQL executed here is atomic with the object state. This hook is useful for operations that depend on the full set of written objects (for example, JSONB merges or aggregation queries).
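The two optional hooks might look like the following sketch. The class name, the index name, and the path_stats summary table are hypothetical. RecordingCursor is a test double that only records SQL, so the snippet runs without a database. A real processor would also implement the required methods.

```python
class PathProcessorHooks:
    """Hypothetical processor showing only the optional hooks."""

    def get_schema_sql(self):
        # DDL for the column this processor writes; return None when nothing is needed.
        return (
            "ALTER TABLE object_state ADD COLUMN IF NOT EXISTS path TEXT;\n"
            "CREATE INDEX IF NOT EXISTS object_state_path_idx "
            "ON object_state (path);"
        )

    def finalize(self, cursor):
        # Runs on the cursor of the write transaction, so this statement
        # commits or rolls back together with the object writes.
        # `path_stats` is a hypothetical summary table.
        cursor.execute(
            "UPDATE path_stats SET total = "
            "(SELECT count(*) FROM object_state WHERE path IS NOT NULL)"
        )


class RecordingCursor:
    """Test double that records SQL instead of talking to PostgreSQL."""

    def __init__(self):
        self.statements = []

    def execute(self, sql, params=None):
        self.statements.append(sql)


cursor = RecordingCursor()
PathProcessorHooks().finalize(cursor)
print(len(cursor.statements))
# 1
```

Using IF NOT EXISTS keeps the DDL safe to run again, which matters if registration happens on every startup.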
ExtraColumn¶
from zodb_pgjsonb import ExtraColumn
A dataclass that declares an extra column for the object_state table.
Fields¶
| Field | Type | Default | Description |
|---|---|---|---|
| name | str | – | PostgreSQL column name. Must be a valid SQL identifier (letters, digits, underscores; must start with a letter or underscore). Validated on construction. |
| value_expr | str | – | SQL value expression for the INSERT statement. |
| update_expr | str, optional | | SQL expression for the ON CONFLICT update. |
Construction¶
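# Plain column: the bound value is stored as-is.
ExtraColumn(
    name="path",
    value_expr="%(path)s",
)

# Computed column: a SQL function is applied on insert and on conflict update.
ExtraColumn(
    name="searchable_text",
    value_expr="to_tsvector('simple'::regconfig, %(searchable_text)s)",
    update_expr="to_tsvector('simple'::regconfig, EXCLUDED.searchable_text)",
)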
The name field is validated against the pattern ^[a-zA-Z_][a-zA-Z0-9_]*$.
A ValueError is raised if the name does not match.
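The check can be pictured with the sketch below. ExtraColumnSketch and is_accepted are hypothetical names; only the pattern and the ValueError come from the text above, and the real dataclass may be implemented differently.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern quoted in the text above.
_IDENTIFIER = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")


@dataclass
class ExtraColumnSketch:
    """Illustrative stand-in, not the library's ExtraColumn."""
    name: str
    value_expr: str
    update_expr: Optional[str] = None

    def __post_init__(self):
        # Reject anything that is not a plain SQL identifier.
        if not _IDENTIFIER.match(self.name):
            raise ValueError(f"invalid column name: {self.name!r}")


def is_accepted(name):
    """Return True if a column with this name can be constructed."""
    try:
        ExtraColumnSketch(name=name, value_expr=f"%({name})s")
    except ValueError:
        return False
    return True


print(is_accepted("searchable_text"))     # True
print(is_accepted("1st_column"))          # False
print(is_accepted("path; DROP TABLE x"))  # False
```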
Security¶
value_expr and update_expr are interpolated directly into SQL
INSERT statements without escaping.
Only register processors from trusted, audited code.
A compromised processor can read, modify, or delete any data in the
database.
Column names are validated against a strict identifier pattern;
expressions are not validated because they may legitimately contain
SQL function calls.
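To make the risk concrete, the sketch below assembles an upsert statement by pasting the expressions into the SQL text. It is an illustration only, not the storage's code. The build_upsert helper, the base columns zoid and state, the ON CONFLICT (zoid) target, the fallback to EXCLUDED.<name> when update_expr is missing, and the use of plain dicts in place of ExtraColumn objects are all assumptions.

```python
def build_upsert(columns):
    """Assemble an INSERT ... ON CONFLICT statement (illustrative only).

    `columns` is a list of dicts with `name`, `value_expr` and an optional
    `update_expr`, standing in for ExtraColumn objects.
    """
    names = ["zoid", "state"] + [c["name"] for c in columns]
    values = ["%(zoid)s", "%(state)s"] + [c["value_expr"] for c in columns]
    updates = ["state = EXCLUDED.state"] + [
        # Assumed fallback: reuse the inserted value when no update_expr is given.
        f'{c["name"]} = ' + (c.get("update_expr") or f'EXCLUDED.{c["name"]}')
        for c in columns
    ]
    return (
        f"INSERT INTO object_state ({', '.join(names)}) "
        f"VALUES ({', '.join(values)}) "
        f"ON CONFLICT (zoid) DO UPDATE SET {', '.join(updates)}"
    )


# A well-behaved column produces an ordinary parameterized statement.
print(build_upsert([{"name": "path", "value_expr": "%(path)s"}]))

# A hostile expression lands in the SQL text verbatim.
print(build_upsert([{"name": "path", "value_expr": "(SELECT current_user)"}]))
```

In the second call the subquery becomes part of the statement itself, which is why the expressions must come from trusted code while the values stay in bound parameters.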
Lifecycle¶
The following sequence describes when each method is called during a ZODB write transaction:
1. Registration (register_state_processor()): get_extra_columns() and get_schema_sql() are called. DDL is applied (or deferred).
2. Store (store() / restore()): process(zoid, class_mod, class_name, state) is called for each object after pickle-to-JSON decoding. The returned dict is attached to the object entry.
3. Vote (tpc_vote()): get_extra_columns() is called to collect column definitions. All objects (with extra column data) are written in a batched executemany() call. After all writes, finalize(cursor) is called on each processor that implements it.
4. Finish (tpc_finish()): The PostgreSQL transaction is committed. No processor methods are called.
5. Abort (tpc_abort()): The PostgreSQL transaction is rolled back. No processor methods are called.
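The call order above can be traced with a toy driver. ToyStorage mimics only the sequence of processor calls described here and performs no database work. It and ClassLabelProcessor are hypothetical and do not reflect the real storage's internals.

```python
class ToyStorage:
    """Toy stand-in that reproduces the call order only."""

    def __init__(self):
        self.processors = []
        self.pending = []  # (zoid, extra) pairs collected by store()
        self.log = []

    def register_state_processor(self, processor):
        processor.get_extra_columns()
        self.log.append("get_extra_columns")
        if hasattr(processor, "get_schema_sql"):
            processor.get_schema_sql()
            self.log.append("get_schema_sql")
        self.processors.append(processor)

    def store(self, zoid, class_mod, class_name, state):
        extra = {}
        for p in self.processors:
            extra.update(p.process(zoid, class_mod, class_name, state) or {})
            self.log.append("process")
        self.pending.append((zoid, extra))

    def tpc_vote(self, cursor=None):
        for p in self.processors:
            p.get_extra_columns()
            self.log.append("get_extra_columns")
        # The real storage would write self.pending in a batch at this point.
        self.log.append("write")
        for p in self.processors:
            if hasattr(p, "finalize"):
                p.finalize(cursor)
                self.log.append("finalize")


class ClassLabelProcessor:
    """Hypothetical processor without get_schema_sql()."""

    def get_extra_columns(self):
        return ["class_label"]  # real processors return ExtraColumn objects

    def process(self, zoid, class_mod, class_name, state):
        return {"class_label": f"{class_mod}.{class_name}"}

    def finalize(self, cursor):
        pass


storage = ToyStorage()
storage.register_state_processor(ClassLabelProcessor())
storage.store(1, "myapp.content", "Document", "{}")
storage.store(2, "myapp.content", "Folder", "{}")
storage.tpc_vote()
print(storage.log)
# ['get_extra_columns', 'process', 'process', 'get_extra_columns', 'write', 'finalize']
```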
DDL application¶
DDL from get_schema_sql() is applied using a dedicated autocommit
connection (not the pool connection).
This avoids conflicts with REPEATABLE READ snapshots held by pool
connections.
At Zope startup, IDatabaseOpenedWithRoot subscribers fire while a
ZODB Connection still holds an ACCESS SHARE lock via its
REPEATABLE READ snapshot.
ALTER TABLE requires ACCESS EXCLUSIVE, which would deadlock.
The storage handles this by setting lock_timeout = '2s' on the DDL
connection.
If the lock cannot be acquired, the DDL is stored in a pending queue
and applied at the next tpc_begin() (after the read transaction
has been committed and the ACCESS SHARE lock released).
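The apply-or-defer behaviour can be sketched without a database as follows. DDLApplier, LockTimeout, and the run_ddl callable are hypothetical. In a real setup run_ddl would open the autocommit connection, issue SET lock_timeout = '2s', and execute the DDL, and the driver's own lock-timeout error would be caught (with psycopg 3, likely psycopg.errors.LockNotAvailable). Keeping still-blocked statements queued for a later attempt is an assumption of this sketch.

```python
class LockTimeout(Exception):
    """Stand-in for the database driver's lock-timeout error."""


class DDLApplier:
    """Apply DDL immediately, or queue it when the lock is unavailable."""

    def __init__(self, run_ddl):
        self._run_ddl = run_ddl  # executes DDL on a dedicated autocommit connection
        self.pending = []

    def apply_or_defer(self, ddl):
        try:
            self._run_ddl(ddl)
        except LockTimeout:
            self.pending.append(ddl)  # retry later
            return False
        return True

    def flush_pending(self):
        """Retry queued DDL; meant to run at the start of tpc_begin()."""
        still_blocked = []
        for ddl in self.pending:
            try:
                self._run_ddl(ddl)
            except LockTimeout:
                still_blocked.append(ddl)
        self.pending = still_blocked


# Simulate a startup read transaction that blocks the first attempt.
blocked = {"value": True}
applied = []


def run_ddl(ddl):
    if blocked["value"]:
        raise LockTimeout(ddl)
    applied.append(ddl)


applier = DDLApplier(run_ddl)
ddl = "ALTER TABLE object_state ADD COLUMN IF NOT EXISTS path TEXT"
print(applier.apply_or_defer(ddl))  # False
blocked["value"] = False            # the read transaction has ended
applier.flush_pending()
print(applied == [ddl])             # True
```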