Python API
All public functions are exported from the top-level zodb_json_codec
package. The implementation is a compiled Rust extension (via PyO3);
there are no pure-Python fallbacks.
import zodb_json_codec
ZODB Record Functions
These functions work with ZODB’s two-pickle record format: a class pickle
followed by a state pickle, concatenated as a single bytes object.
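The two-pickle layout can be illustrated with the standard library alone. The sketch below is only an illustration: it uses a plain `(module, class_name)` tuple as a stand-in for the class pickle, which is an assumption and may not match the exact structure ZODB writes.

```python
import io
import pickle

# Stand-in for the class pickle: a (module, class_name) tuple.
# This layout is an assumption made for illustration only.
class_pickle = pickle.dumps(("persistent.mapping", "PersistentMapping"), protocol=3)
state_pickle = pickle.dumps({"data": {"title": "Hello", "count": 42}}, protocol=3)

# A record is the two pickles back to back in a single bytes object.
record = class_pickle + state_pickle

# Reading it back: each pickle.load() call consumes exactly one pickle.
stream = io.BytesIO(record)
class_info = pickle.load(stream)
state = pickle.load(stream)
```

After the second `pickle.load()` the stream is exhausted, which is what makes the concatenated form unambiguous to split.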
decode_zodb_record
decode_zodb_record(data: bytes) -> dict
Decode a ZODB two-pickle record into a Python dict with marker keys.
The GIL is released during the pure-Rust pickle parsing phase, allowing other Python threads to run concurrently.
- Parameters
data: Raw bytes of a ZODB record (two concatenated pickles).
- Returns
A dict with two keys:
"@cls": A list of two strings, [module, class_name].
"@s": The object state. Typically a dict, but can be any JSON-representable value (including None for empty BTrees). BTree state is automatically flattened using @kv/@ks markers.
- Raises
ValueError: If the pickle data is malformed, uses unsupported opcodes, or exceeds safety limits.
Example:
record = decode_zodb_record(raw_bytes)
# {'@cls': ['persistent.mapping', 'PersistentMapping'],
# '@s': {'data': {'title': 'Hello', 'count': 42}}}
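A decoded record can be taken apart with ordinary dict access. The helper below, `split_record`, is a hypothetical name introduced here for illustration; it operates on a literal dict of the documented shape, so it runs without the extension installed.

```python
def split_record(record):
    """Return (module, class_name, state) from a decoded record dict."""
    module, class_name = record["@cls"]
    # "@s" is usually a dict but may be any JSON-representable value.
    return module, class_name, record["@s"]


# Literal copy of the documented output shape, not a real decode result.
record = {
    "@cls": ["persistent.mapping", "PersistentMapping"],
    "@s": {"data": {"title": "Hello", "count": 42}},
}
module, class_name, state = split_record(record)
dotted_name = f"{module}.{class_name}"
```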
encode_zodb_record
encode_zodb_record(record: dict) -> bytes
Encode a Python dict back into a ZODB two-pickle record.
Uses the direct PyObject-to-pickle encoder, bypassing the intermediate PickleValue AST for maximum speed. The output uses pickle protocol 3, as required by zodbpickle.
- Parameters
record: A dict with "@cls" (a list of [module, name]) and "@s" (state value) keys. The state may contain any JSON marker dicts (@t, @b, @dt, @ref, @kv, etc.).
- Returns
Raw bytes of a ZODB record (two concatenated pickles in protocol 3).
- Raises
ValueError: If @cls is missing or is not a two-element list of strings, or if the state contains values that cannot be encoded.
Example:
raw_bytes = encode_zodb_record({
    '@cls': ['persistent.mapping', 'PersistentMapping'],
    '@s': {'data': {'title': 'Hello', 'count': 42}},
})
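When records come from untrusted or hand-written JSON, it can help to check the `@cls` shape before encoding. The function below, `check_record_shape`, is a hypothetical pre-check that mirrors the documented `ValueError` condition for `@cls`; it is not the codec's own validation and may be stricter or looser than it.

```python
def check_record_shape(record):
    """Raise ValueError unless record["@cls"] is a list of two strings."""
    if not isinstance(record, dict):
        raise ValueError("record must be a dict")
    cls = record.get("@cls")
    if (
        not isinstance(cls, list)
        or len(cls) != 2
        or not all(isinstance(part, str) for part in cls)
    ):
        raise ValueError("'@cls' must be a list of two strings: [module, name]")
    return record


ok = check_record_shape({
    "@cls": ["persistent.mapping", "PersistentMapping"],
    "@s": {"data": {"title": "Hello", "count": 42}},
})
```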
decode_zodb_record_for_pg
decode_zodb_record_for_pg(data: bytes) -> tuple
Single-pass decode optimized for PostgreSQL JSONB storage. Combines pickle decoding, persistent reference extraction, and null-byte sanitization in one operation.
The GIL is released during the pure-Rust pickle parsing and reference extraction phases.
- Parameters
data: Raw bytes of a ZODB record (two concatenated pickles).
- Returns
A 4-tuple:
class_mod (str): The module name from the class pickle (e.g., "persistent.mapping").
class_name (str): The class name from the class pickle (e.g., "PersistentMapping").
state (dict): The decoded object state as a Python dict with marker keys. Strings containing null bytes (\x00) are replaced with {"@ns": base64} markers, because PostgreSQL JSONB cannot store \u0000.
refs (list[int]): All persistent reference OIDs found in the state, as integers. Used for the refs column in SQL-based garbage collection (pack).
- Raises
ValueError: If the pickle data is malformed.
Example:
mod, name, state, refs = decode_zodb_record_for_pg(raw_bytes)
# mod = 'persistent.mapping'
# name = 'PersistentMapping'
# state = {'data': {'title': 'Hello'}}
# refs = [3, 7, 42]
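The null-byte sanitization described above can be pictured as a recursive walk over the state. The following is a pure-Python sketch of the idea, not the Rust implementation; in particular, it assumes the `@ns` payload is the base64 of the string's UTF-8 bytes, which the text above does not spell out.

```python
import base64


def sanitize_nulls(value):
    """Replace strings containing NUL with {"@ns": <base64>} markers."""
    if isinstance(value, str):
        if "\x00" in value:
            # Assumption: the marker carries base64 of the UTF-8 bytes.
            encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
            return {"@ns": encoded}
        return value
    if isinstance(value, dict):
        return {key: sanitize_nulls(item) for key, item in value.items()}
    if isinstance(value, list):
        return [sanitize_nulls(item) for item in value]
    return value


clean = sanitize_nulls({"title": "Hel\x00lo", "tags": ["ok", "a\x00b"], "count": 42})
```

Strings without null bytes pass through unchanged, so the marker only appears where PostgreSQL would otherwise reject the value.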
decode_zodb_record_for_pg_json
decode_zodb_record_for_pg_json(data: bytes) -> tuple
Direct JSON string path for PostgreSQL. The entire pipeline – pickle parsing, JSON conversion, null-byte sanitization, and reference extraction – runs in Rust with the GIL released. No intermediate Python dicts are created.
This is the fastest path for storing ZODB records in PostgreSQL JSONB
columns: pass the returned JSON string directly to a SQL INSERT
parameter.
- Parameters
data: Raw bytes of a ZODB record (two concatenated pickles).
- Returns
A 4-tuple:
class_mod (str): The module name from the class pickle.
class_name (str): The class name from the class pickle.
state_json (str): The object state serialized as a JSON string, ready for PostgreSQL JSONB insertion. Null bytes are sanitized.
refs (list[int]): All persistent reference OIDs found in the state, as integers.
- Raises
ValueError: If the pickle data is malformed.
Example:
mod, name, json_str, refs = decode_zodb_record_for_pg_json(raw_bytes)
# json_str is a ready-to-use JSON string:
# '{"data": {"title": "Hello"}}'
cursor.execute(
    "INSERT INTO object_state (class_mod, class_name, state, refs) "
    "VALUES (%s, %s, %s::jsonb, %s)",
    (mod, name, json_str, refs),
)
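Because this path runs with the GIL released, decoding many records on a thread pool is one way to use several cores. The sketch below is an assumption-laden illustration: `decode_many` is a hypothetical helper that takes the decoder as a parameter, and `fake_decode` is a stand-in so the example runs without the extension. In real use one would pass `zodb_json_codec.decode_zodb_record_for_pg_json` instead; any actual speedup depends on the workload.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_many(decode, records, max_workers=4):
    """Apply `decode` to every record on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(decode, records))


def fake_decode(data):
    # Stand-in returning the documented 4-tuple shape:
    # (class_mod, class_name, state_json, refs).
    return ("persistent.mapping", "PersistentMapping", "{}", [len(data)])


results = decode_many(fake_decode, [b"a", b"bb", b"ccc"])
```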
Standalone Pickle Functions
These functions work with individual pickle byte streams (not ZODB two-pickle records). They are useful for general pickle-to-JSON conversion outside of ZODB.
pickle_to_dict
pickle_to_dict(data: bytes) -> dict
Decode a single pickle byte stream into a Python dict (or other Python object) using the direct PickleValue-to-PyObject conversion path.
The GIL is released during the pure-Rust pickle parsing phase.
- Parameters
data: Raw pickle bytes (protocol 2-3, partial protocol 4).
- Returns
The decoded Python object. Simple pickles return native Python types; objects with class information return marker dicts (@cls, @s, @reduce, etc.).
- Raises
ValueError: If the pickle data is malformed.
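Since the supported input is limited to protocols 2-3 with partial protocol 4, it can be useful to inspect a pickle's protocol before handing it over. Pickles written with protocol 2 or higher start with the PROTO opcode (`0x80`) followed by one version byte; the hypothetical helper `pickle_protocol` below reads that header using only the standard library.

```python
import pickle


def pickle_protocol(data):
    """Return the protocol number from a leading PROTO opcode, or None."""
    if len(data) >= 2 and data[0] == 0x80:
        return data[1]
    # Protocol 0 and 1 pickles carry no PROTO header.
    return None


proto3 = pickle_protocol(pickle.dumps({"title": "Hello"}, protocol=3))
proto0 = pickle_protocol(pickle.dumps({"title": "Hello"}, protocol=0))
```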
dict_to_pickle
dict_to_pickle(data: dict) -> bytes
Encode a Python dict into pickle bytes using the direct
PyObject-to-pickle encoder. This is the inverse of pickle_to_dict.
- Parameters
data: A Python dict, potentially containing JSON marker keys (@t, @b, @dt, @ref, @cls + @s, etc.).
- Returns
Pickle bytes in protocol 3 format.
- Raises
ValueError: If the dict contains values that cannot be encoded, or if recursion depth exceeds 1,000 levels.
pickle_to_json
pickle_to_json(data: bytes) -> str
Convert a single pickle byte stream to a pretty-printed JSON string. The entire operation runs in Rust with the GIL released.
This goes through the serde_json intermediate representation, producing human-readable output with indentation.
- Parameters
data: Raw pickle bytes (protocol 2-3, partial protocol 4).
- Returns
A pretty-printed JSON string.
- Raises
ValueError: If the pickle data is malformed or cannot be represented in JSON.
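If compact output is preferred over the pretty-printed form, the returned string can be re-serialized with the standard `json` module. In this sketch `pretty` is a hand-built stand-in for a `pickle_to_json` result, so the exact whitespace of real output is an assumption.

```python
import json

# Stand-in for a pickle_to_json() result; real output formatting may differ.
pretty = json.dumps({"data": {"title": "Hello", "count": 42}}, indent=2)

# Parse and re-emit without insignificant whitespace.
compact = json.dumps(json.loads(pretty), separators=(",", ":"))
```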
json_to_pickle
json_to_pickle(data: str) -> bytes
Convert a JSON string back to pickle bytes. This is the inverse of
pickle_to_json.
All JSON markers (@t, @b, @dt, @ref, @cls + @s, etc.) are
recognized and converted back to the appropriate pickle opcodes.
- Parameters
data: A JSON string, potentially containing marker objects.
- Returns
Pickle bytes in protocol 3 format.
- Raises
ValueError: If the JSON is malformed or contains invalid marker structures.
Error Handling
All functions raise ValueError on failure. Common error conditions:
Unexpected end of pickle stream – truncated input data.
Unknown pickle opcode – opcode not supported by the decoder.
Pickle stack underflow – malformed pickle with missing stack values.
Invalid pickle data – structural errors (wrong types, missing fields).
JSON error – serialization or deserialization failures in the JSON path.
Invalid UTF-8 – non-UTF-8 bytes in a pickle string.
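Since every failure surfaces as `ValueError`, a single `except` clause is enough when processing records in bulk. The wrapper below, `try_decode`, is a hypothetical helper that accepts any of the decode functions as a parameter; `failing_decode` is a stand-in that simulates a truncated input so the sketch runs on its own.

```python
def try_decode(decode, data):
    """Return (result, None) on success or (None, message) on ValueError."""
    try:
        return decode(data), None
    except ValueError as exc:
        return None, str(exc)


def failing_decode(data):
    # Stand-in that mimics one of the error conditions listed above.
    raise ValueError("Unexpected end of pickle stream")


result, error = try_decode(failing_decode, b"\x80\x03")
```

Returning the message instead of re-raising lets a batch job log the bad record and continue with the next one.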
Safety Limits
The codec enforces several limits to prevent resource exhaustion from malicious or malformed pickle data:
Memo size: Maximum 100,000 entries.
Recursion depth: Maximum 1,000 levels (encoder and PyObject converter).
Binary data size: BINUNICODE8/BINBYTES8 capped at 256 MB before allocation.
Integer size: LONG opcode text limited to 10,000 characters.
BTree validation: Odd-length item lists in BTree buckets are rejected.
Length validation: Non-negative lengths enforced for LONG4 and BINSTRING opcodes.
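To avoid hitting the recursion limit at encode time, the nesting depth of a state can be measured beforehand. The function `nesting_depth` below is a hypothetical, iterative sketch; how the codec itself counts levels is not specified here, so treat the comparison against 1,000 as approximate.

```python
MAX_DEPTH = 1000  # recursion limit quoted in the list above


def nesting_depth(value):
    """Return the container nesting depth of dicts, lists, and tuples."""
    deepest = 0
    stack = [(value, 1)]
    while stack:
        current, depth = stack.pop()
        if isinstance(current, dict):
            children = current.values()
        elif isinstance(current, (list, tuple)):
            children = current
        else:
            continue  # scalars add no nesting
        deepest = max(deepest, depth)
        stack.extend((child, depth + 1) for child in children)
    return deepest


state = {"data": {"items": [1, 2, {"title": "Hello"}]}}
within_limit = nesting_depth(state) <= MAX_DEPTH
```

The walk uses an explicit stack rather than recursion, so it can inspect structures that are themselves deeper than Python's own recursion limit.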