Security¶
The codec processes pickle data from ZODB, which is a trusted internal format
generated by the application’s own persistence layer. Unlike Python’s
pickle.loads(), the codec does not execute arbitrary code – it parses
pickle opcodes into a data-only AST (PickleValue) and never calls REDUCE
targets.
That said, defense-in-depth is good practice. Malformed data can arrive through database corruption, storage bugs, or migration errors. The following limits ensure the codec fails gracefully rather than consuming unbounded resources.
These measures were introduced in v1.2.2 following a security review.
CODEC-C1: Non-negative length validation¶
Opcodes: LONG4, BINSTRING
Problem: These opcodes carry a 4-byte signed length prefix. A negative length value would be interpreted as a very large unsigned value, causing the decoder to attempt reading billions of bytes from the input.
Mitigation: The decoder validates that the length is non-negative before using it as a read size. Negative lengths produce an immediate decode error.
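This check can be sketched as follows. It is a minimal illustration, not the codec's actual API: `DecodeError` and `read_len4` are hypothetical names, and the payload handling is simplified to the length prefix alone.

```rust
// Hypothetical sketch: reading the 4-byte signed length prefix used by
// LONG4 / BINSTRING and rejecting negative values before the length is
// ever used as a read size.

#[derive(Debug, PartialEq)]
enum DecodeError {
    NegativeLength(i32),
    Truncated,
}

/// Parse a little-endian 4-byte signed length and validate it is non-negative.
fn read_len4(input: &[u8]) -> Result<usize, DecodeError> {
    let bytes: [u8; 4] = input
        .get(..4)
        .and_then(|s| s.try_into().ok())
        .ok_or(DecodeError::Truncated)?;
    let len = i32::from_le_bytes(bytes);
    if len < 0 {
        // A negative i32 reinterpreted as usize would become a huge read size
        // (e.g. -1 becomes 4 GiB on 32-bit, ~18 EiB on 64-bit).
        return Err(DecodeError::NegativeLength(len));
    }
    Ok(len as usize)
}

fn main() {
    assert_eq!(read_len4(&[5, 0, 0, 0]), Ok(5));
    // 0xFFFFFFFF is -1 as i32; it must be rejected, not read as a length.
    assert_eq!(
        read_len4(&[0xFF, 0xFF, 0xFF, 0xFF]),
        Err(DecodeError::NegativeLength(-1))
    );
}
```

The key point is that the sign check happens on the `i32` value, before any cast to `usize`.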
CODEC-C2: Memo size cap¶
Limit: 100,000 entries
Problem: The pickle LONG_BINPUT opcode stores a value in the memo at an
arbitrary integer index. A malicious pickle could issue LONG_BINPUT with
index 2,000,000,000, causing the memo Vec to allocate gigabytes of memory
(Rust vectors are contiguous, so the allocation must cover the full index
range).
Mitigation: The decoder rejects any memo index that would bring the total memo size above 100,000 entries. Normal ZODB records use at most a few hundred memo entries, so this limit has no effect on legitimate data.
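The cap can be sketched like this, assuming a `Vec`-backed memo indexed by `LONG_BINPUT`. The `Memo` type and its payload are illustrative simplifications, not the codec's real data structures.

```rust
// Sketch of the memo-size cap. Because a Vec is contiguous, storing at
// slot 2_000_000_000 would force a multi-gigabyte allocation covering the
// whole index range, so oversized indices are rejected up front.

const MEMO_LIMIT: usize = 100_000;

struct Memo {
    slots: Vec<Option<u64>>, // payload type simplified for the sketch
}

impl Memo {
    fn new() -> Self {
        Memo { slots: Vec::new() }
    }

    /// Store `value` at `index`, refusing growth beyond MEMO_LIMIT.
    fn put(&mut self, index: usize, value: u64) -> Result<(), String> {
        if index >= MEMO_LIMIT {
            return Err(format!("memo index {index} exceeds limit {MEMO_LIMIT}"));
        }
        if index >= self.slots.len() {
            self.slots.resize(index + 1, None);
        }
        self.slots[index] = Some(value);
        Ok(())
    }
}

fn main() {
    let mut memo = Memo::new();
    assert!(memo.put(42, 7).is_ok());
    // The hostile case from the text: index 2,000,000,000 is refused
    // before any allocation happens.
    assert!(memo.put(2_000_000_000, 7).is_err());
}
```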
CODEC-H1: Recursion depth limit¶
Limit: 1,000 levels
Applies to: Encoder (encode.rs) and PyObject converter (pyconv.rs)
Problem: Deeply nested Python objects (dicts containing dicts containing dicts…) cause recursive function calls in the encoder. Without a limit, pathological nesting could overflow the thread stack (commonly 8 MB for the main thread on Linux; Rust spawned threads default to just 2 MB).
Mitigation: Both the encoder and the PyObject converter track recursion depth and return an error if it exceeds 1,000. Normal ZODB objects rarely exceed 10 levels of nesting. The limit is generous enough to handle any legitimate data while preventing stack overflow.
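The pattern is to thread a depth counter through the recursion and fail before the stack does. This sketch uses illustrative names (`Value`, `encode`); the real encoder operates on richer types.

```rust
// Sketch of recursion-depth tracking during encoding: the depth counter is
// incremented on each nested call and checked against the documented limit.

const MAX_DEPTH: usize = 1_000;

enum Value {
    Int(i64),
    List(Vec<Value>),
}

fn encode(value: &Value, depth: usize, out: &mut Vec<u8>) -> Result<(), String> {
    if depth > MAX_DEPTH {
        // Return an error instead of overflowing the thread stack.
        return Err(format!("nesting exceeds {MAX_DEPTH} levels"));
    }
    match value {
        Value::Int(n) => out.extend_from_slice(&n.to_le_bytes()),
        Value::List(items) => {
            for item in items {
                encode(item, depth + 1, out)?;
            }
        }
    }
    Ok(())
}

fn main() {
    // Build a value nested 1,500 levels deep; encoding must error, not crash.
    let mut v = Value::Int(0);
    for _ in 0..1_500 {
        v = Value::List(vec![v]);
    }
    let mut out = Vec::new();
    assert!(encode(&v, 0, &mut out).is_err());

    let shallow = Value::List(vec![Value::Int(1), Value::Int(2)]);
    assert!(encode(&shallow, 0, &mut out).is_ok());
}
```

Each recursive call only adds a few hundred bytes of stack frame, so 1,000 levels stays far below even a 2 MB stack.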
CODEC-H2: Pre-scan dict keys¶
Problem: When encoding a Python dict to pickle, the encoder must check
whether any keys are JSON marker keys (@cls, @dt, @ref, etc.). The
original code checked keys one at a time, and for dicts with a mix of marker
and non-marker keys, this could lead to quadratic re-processing: the fast
path would start, discover a marker partway through, restart on the marker
path, and reprocess already-visited keys.
Mitigation: The encoder pre-scans all dict keys in a single pass before
choosing a code path. This is O(n) regardless of key distribution, and the
per-key check is a cheap first-character comparison, so for dicts with no
@-prefixed keys (>99% of ZODB state dicts) the scan adds negligible overhead.
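The pre-scan amounts to a single short-circuiting pass. This is a minimal sketch: the marker set is taken from the examples in the text (`@cls`, `@dt`, `@ref`), and `has_marker_key` is a hypothetical name.

```rust
// Sketch of the single-pass marker pre-scan. For each key, a cheap
// first-character check (`starts_with('@')`) rules out ordinary attribute
// names before the exact marker comparison runs.

fn has_marker_key(keys: &[&str]) -> bool {
    keys.iter()
        .any(|k| k.starts_with('@') && matches!(*k, "@cls" | "@dt" | "@ref"))
}

fn main() {
    // Typical ZODB state dict: no markers, one pass, no re-processing.
    assert!(!has_marker_key(&["name", "count", "items"]));
    // Marker present: the scan short-circuits as soon as it is found.
    assert!(has_marker_key(&["name", "@ref", "count"]));
}
```

Because the scan result is known before encoding starts, the encoder commits to one code path and never restarts partway through, which is what eliminated the quadratic re-processing.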
CODEC-M1: LONG opcode text limit¶
Limit: 10,000 characters
Problem: The LONG opcode (not LONG1/LONG4) represents an integer as
a text string like 12345L. Without a limit, a malformed pickle could contain
a LONG opcode with millions of digits, causing the big-integer parser to
consume excessive CPU and memory.
Mitigation: The decoder rejects LONG text representations exceeding 10,000 characters. A 10,000-digit integer is approximately 33,000 bits – far beyond any integer that appears in ZODB data.
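The guard runs before the big-integer parser ever sees the text. A minimal sketch, with `check_long_text` as an illustrative name:

```rust
// Sketch of the LONG text-length cap: a textual payload like "12345L" is
// length-checked and has its trailing 'L' stripped before being handed to
// a big-integer parser.

const LONG_TEXT_LIMIT: usize = 10_000;

fn check_long_text(text: &str) -> Result<&str, String> {
    if text.len() > LONG_TEXT_LIMIT {
        return Err(format!(
            "LONG text of {} chars exceeds limit {LONG_TEXT_LIMIT}",
            text.len()
        ));
    }
    // zodbpickle writes a trailing 'L' marker; it is not part of the digits.
    Ok(text.strip_suffix('L').unwrap_or(text))
}

fn main() {
    assert_eq!(check_long_text("12345L"), Ok("12345"));
    // A million-digit payload is rejected in O(1), before any parsing.
    let huge = "9".repeat(1_000_000);
    assert!(check_long_text(&huge).is_err());
}
```

The length check is O(1), so a malformed record fails fast instead of tying up the big-integer parser.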
CODEC-M2: BTree bucket validation¶
Problem: BTree bucket data is stored as a flat list of alternating keys
and values: [k1, v1, k2, v2, ...]. An odd-length list would leave a key
without a value, which could cause a panic in the chunked iterator or produce
silently corrupted output.
Mitigation: The format_flat_data() function in btrees.rs rejects
odd-length item lists with an explicit error before processing.
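The check is a simple parity test on the item count before pairing. This sketch is a simplified stand-in for `format_flat_data()`: string items instead of real pickle values, and the actual formatting omitted.

```rust
// Sketch of the odd-length guard for BTree bucket data stored as a flat
// [k1, v1, k2, v2, ...] list. The parity check up front means the pairing
// step below can never encounter a dangling key.

fn format_flat_data(items: &[&str]) -> Result<Vec<(String, String)>, String> {
    if items.len() % 2 != 0 {
        // Reject explicitly rather than panicking in the chunked iterator
        // or silently dropping the trailing key.
        return Err(format!("odd item count {} in bucket data", items.len()));
    }
    Ok(items
        .chunks_exact(2)
        .map(|pair| (pair[0].to_string(), pair[1].to_string()))
        .collect())
}

fn main() {
    let pairs = format_flat_data(&["k1", "v1", "k2", "v2"]).unwrap();
    assert_eq!(pairs.len(), 2);
    // A key without a value is a hard error, not corrupted output.
    assert!(format_flat_data(&["k1", "v1", "k2"]).is_err());
}
```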
CODEC-M3: Large string/bytes allocation cap¶
Limit: 256 MB
Opcodes: BINUNICODE8, BINBYTES8
Problem: These protocol 4/5 opcodes carry an 8-byte length prefix, allowing lengths up to 2^64 bytes. A malformed length could cause the decoder to attempt allocating terabytes of memory.
Mitigation: The decoder caps the allocation at 256 MB. Any single string or bytes value larger than 256 MB in a ZODB record would be exceptional (blobs are stored separately, not inline in pickle state). This limit prevents unbounded allocation while being generous enough for any legitimate data.
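As with CODEC-C1, the essential property is that the declared length is validated before any buffer is allocated. A sketch with illustrative names:

```rust
// Sketch of the 256 MB allocation cap for BINUNICODE8 / BINBYTES8, whose
// 8-byte unsigned length prefix can declare up to 2^64 bytes.

const MAX_BYTES: u64 = 256 * 1024 * 1024; // 256 MB

/// Parse the little-endian 8-byte length prefix and cap it before any
/// buffer is allocated for the payload.
fn read_len8(prefix: [u8; 8]) -> Result<usize, String> {
    let len = u64::from_le_bytes(prefix);
    if len > MAX_BYTES {
        return Err(format!("declared length {len} exceeds {MAX_BYTES}-byte cap"));
    }
    Ok(len as usize)
}

fn main() {
    assert_eq!(read_len8(1024u64.to_le_bytes()), Ok(1024));
    // A corrupted prefix claiming ~1 TB is rejected before allocation.
    assert!(read_len8((1u64 << 40).to_le_bytes()).is_err());
}
```

Checking the length in `u64` also sidesteps truncation issues from casting to `usize` on 32-bit targets.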
What the codec does NOT do¶
For context, here is what the codec intentionally does not guard against:
Arbitrary code execution: The codec never executes pickle REDUCE targets. It records them as PickleValue::Instance data structures. This is fundamentally safer than Python's pickle.loads(), which calls arbitrary callables.
Untrusted input: The codec is designed for ZODB data produced by the application itself. It is not hardened for processing pickles from untrusted sources. The limits above are defense-in-depth against corruption, not a sandbox for hostile input.
Protocol 4/5 full support: ZODB uses zodbpickle, which supports up to protocol 3. Protocol 4/5 opcodes are partially handled (enough for interoperability) but are not the primary target.