Security considerations

zodb-pgjsonb’s most significant security property is structural: it stores object state as JSON instead of pickle, eliminating the entire class of remote code execution attacks inherent to pickle deserialization. Beyond this foundational choice, the implementation includes several hardening measures for SQL injection prevention, credential protection, file permissions, and connection safety.

This page documents each security measure, the threat it addresses, and the reasoning behind the design.

Pickle elimination

The threat

Python’s pickle protocol can serialize arbitrary objects, including instructions to import modules and call functions. A crafted pickle can execute arbitrary code when deserialized. This is not a bug – it is a fundamental property of the protocol. The Python documentation explicitly warns: “The pickle module is not secure. Only unpickle data you trust.”

In traditional ZODB deployments, every object load deserializes pickle data from the storage backend. If an attacker can write a crafted pickle to the database (through SQL injection, compromised backups, or a vulnerability in an application layer), every ZODB client that loads that object will execute the attacker’s code.

The mitigation

zodb-pgjsonb stores object state as PostgreSQL JSONB. JSON can contain strings, numbers, booleans, null, arrays, and objects – nothing else. There is no mechanism to embed executable code in JSON. A compromised JSON value can contain unexpected data, but it cannot trigger code execution when loaded.
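The categorical nature of this claim can be demonstrated directly: json.loads produces only plain containers and scalars, even for adversarial input that mimics pickle's exploitation vectors (the payload below is illustrative):

```python
import json

# A hostile-looking payload parsed as JSON yields inert data. There is
# no JSON construct that names a class or function to call, so the
# "dangerous" strings below are just strings.
payload = '{"__reduce__": ["os.system", ["rm -rf /"]]}'
obj = json.loads(payload)

assert type(obj) is dict                     # a plain dict...
assert obj["__reduce__"][0] == "os.system"   # ...holding a plain string
```

Nothing was imported and nothing was executed: the worst a crafted JSON value can do is carry unexpected data.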

The Rust codec (zodb-json-codec) converts pickle to JSON on write and JSON to pickle on read. The pickle bytes that ZODB sees are always freshly generated by the codec from validated JSON data, not raw bytes read from the database. This means even if an attacker modifies the JSONB data in PostgreSQL directly, the resulting pickle is constrained to the types the codec knows how to produce.

Comparison with pickle-based storage

| Property | Pickle storage (FileStorage, RelStorage) | JSONB storage (zodb-pgjsonb) |
| --- | --- | --- |
| Stored format | Arbitrary pickle bytecode | JSON values only |
| Code execution on load | Possible (pickle can call any function) | Not possible (JSON has no executable content) |
| Crafted data risk | Remote code execution | Data corruption (no code execution) |
| Defense depth | Requires restricted unpickler, class whitelist | Format itself is safe |

The security difference is not one of degree – it is categorical. Pickle-based storage requires active defense (restricted unpicklers, class whitelists) to prevent code execution. JSONB storage is passively safe because the format has no execution capability.

Restricted unpickler for legacy extension data

The threat

ZODB transaction extension data (transaction.extension) is a dict that may contain metadata like user names and descriptions. Historically, this data was stored as pickle bytes. When zodb-pgjsonb reads extension data from the transaction_log table, it must deserialize these bytes.

If the extension data contains a crafted pickle (from a legacy database or a compromised write), unrestricted deserialization would execute arbitrary code.

The mitigation

zodb-pgjsonb uses _RestrictedUnpickler for legacy extension data. This unpickler overrides find_class() to allow only safe built-in types:

  • dict, list, tuple, set, frozenset

  • str, bytes, int, float, bool, complex

Any attempt to deserialize an object of a different class raises pickle.UnpicklingError. This blocks the standard pickle exploitation techniques (which rely on importing modules like os, subprocess, or builtins).

New extension data is always serialized as JSON bytes, not pickle. The restricted unpickler is a backward-compatibility measure for databases that contain legacy pickle-formatted extension data.
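The mechanism can be sketched as follows. Names and details are illustrative; the real _RestrictedUnpickler may differ, but the allow-list approach via find_class() is the one described above:

```python
import builtins
import io
import pickle

# Only plain built-in types may be reconstructed (illustrative sketch).
_SAFE_BUILTINS = {
    "dict", "list", "tuple", "set", "frozenset",
    "str", "bytes", "int", "float", "bool", "complex",
}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Allow only the safe built-ins; reject everything else,
        # including os, subprocess, and arbitrary builtins like eval.
        if module == "builtins" and name in _SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            f"global '{module}.{name}' is forbidden"
        )

def restricted_loads(data: bytes):
    """Deserialize legacy extension bytes with the allow-list in force."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Because find_class() is the only path by which a pickle stream can reference a module-level object, overriding it closes off the import-and-call exploitation route entirely.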

SQL injection prevention

ExtraColumn name validation

State processors declare extra columns via ExtraColumn(name, value_expr, update_expr). The value_expr and update_expr are SQL expressions interpolated directly into INSERT statements – this is by design, since they may legitimately contain SQL function calls.

To prevent injection through the column name, ExtraColumn.__post_init__() validates the name attribute against a strict SQL identifier pattern: ^[a-zA-Z_][a-zA-Z0-9_]*$. Any name containing spaces, quotes, semicolons, or other special characters is rejected with a ValueError.
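A minimal sketch of this validation (field names follow the API described above; the real class may carry additional fields and checks):

```python
import re
from dataclasses import dataclass

# The identifier pattern from the documentation above.
_IDENTIFIER_RE = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")

@dataclass(frozen=True)
class ExtraColumn:
    # Illustrative shape: name is validated; the expressions are
    # trusted SQL by design and deliberately not validated.
    name: str
    value_expr: str
    update_expr: str

    def __post_init__(self):
        if not _IDENTIFIER_RE.match(self.name):
            raise ValueError(
                f"invalid SQL identifier for column name: {self.name!r}"
            )
```

A name like "word_count" passes; anything carrying quotes, spaces, or semicolons is rejected before it can reach an INSERT statement.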

The value_expr and update_expr are intentionally not validated because they must support arbitrary SQL expressions. The security documentation on ExtraColumn makes the trust model explicit: only register processors from trusted, audited code. A compromised processor has full SQL access to the database.

Parameterized queries

All user-facing queries (load, store, poll, history, undo) use psycopg’s parameterized query interface with %s or %(name)s placeholders. Values are never string-formatted into SQL. psycopg handles escaping at the protocol level using PostgreSQL prepared statement parameters, which is immune to SQL injection regardless of value content.
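The principle can be demonstrated without a PostgreSQL server using the standard library's sqlite3 module, which binds parameters the same way (sqlite3 uses ? placeholders where psycopg uses %s; the table and payload here are illustrative):

```python
import sqlite3

# The driver binds the value at the protocol level, so hostile input
# is stored as data and never interpreted as SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v TEXT)")

hostile = "x'; DROP TABLE t; --"
conn.execute("INSERT INTO t (v) VALUES (?)", (hostile,))

row = conn.execute("SELECT v FROM t").fetchone()
assert row[0] == hostile  # stored verbatim; the table still exists
```

Had the hostile string been interpolated into the SQL text instead, the embedded quote and semicolon would have altered the statement.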

DSN credential masking

The threat

PostgreSQL connection strings (DSNs) contain passwords. If a DSN appears in a log file, stack trace, or error message, the password is exposed.

The mitigation

_mask_dsn() replaces the password portion of a DSN with *** before any debug logging. The regex handles both password=secret and password='quoted secret' formats. This function is called wherever the DSN is logged – during pool creation, connection errors, and diagnostic output.
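A sketch of the masking logic, covering the two formats mentioned above (the regex and function name here are illustrative; the real _mask_dsn may differ in detail):

```python
import re

# Matches password=secret and password='quoted secret' (sketch).
_PASSWORD_RE = re.compile(r"password\s*=\s*(?:'[^']*'|\S+)")

def mask_dsn(dsn: str) -> str:
    """Replace the password portion of a DSN with *** before logging."""
    return _PASSWORD_RE.sub("password=***", dsn)
```

The masked string is what reaches log output; the original DSN is used only for the actual connection.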

Blob file permissions

The threat

Blob files cached locally may contain sensitive data (uploaded documents, images with metadata, private files). Permissive file permissions could allow other users on the same system to read blob contents.

The mitigation

Blob files are written with permissions 0o600 (owner read/write only). This restricts access to the process owner, preventing other system users from reading blob data even if they have access to the blob cache directory.
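One way to guarantee this is to pass the mode at file creation rather than chmod-ing afterwards, which avoids a window where the file briefly exists with default permissions. A sketch of that pattern (the function name is illustrative, not the library's API):

```python
import os

def write_blob(path: str, data: bytes) -> None:
    # Create the file with 0o600 atomically: owner read/write only,
    # and O_EXCL ensures we never clobber an existing file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
```

Because the permission bits are applied by the kernel at open() time, no other user can observe the file with looser permissions at any point.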

Connection pool timeout

The threat

Without a timeout, a connection pool request blocks indefinitely when all connections are in use. An attacker or a bug that leaks connections could cause all subsequent requests to hang, effectively denying service.

The mitigation

The pool_timeout parameter (default 30 seconds) limits how long getconn() blocks before raising PoolTimeout. This ensures that connection exhaustion surfaces as an explicit error rather than an indefinite hang. The timeout is configurable via ZConfig (pool-timeout) to accommodate deployments with different concurrency requirements.
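The failure mode can be illustrated with a toy pool; zodb-pgjsonb's real pool is built on psycopg and differs in detail, but the timeout principle is the same:

```python
import queue

class PoolTimeout(Exception):
    """Raised when no connection becomes available in time."""

class TinyPool:
    # Toy pool: a bounded queue of connections with a bounded wait.
    def __init__(self, conns, pool_timeout=30.0):
        self._q = queue.Queue()
        for c in conns:
            self._q.put(c)
        self._timeout = pool_timeout

    def getconn(self):
        try:
            # Block at most pool_timeout seconds, then fail loudly
            # instead of hanging forever on an exhausted pool.
            return self._q.get(timeout=self._timeout)
        except queue.Empty:
            raise PoolTimeout(
                f"no connection available within {self._timeout}s"
            ) from None

    def putconn(self, conn):
        self._q.put(conn)
```

An exhausted pool now surfaces as a PoolTimeout that callers can log and retry, rather than an unbounded wait that silently denies service.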

PG null-byte sanitization

The threat

PostgreSQL JSONB cannot store the null byte (\u0000) in string values. An INSERT containing a null byte in a JSONB string silently truncates the string or raises an error, depending on the PostgreSQL version. This could corrupt object data.

The mitigation

_sanitize_for_pg() recursively scans JSONB values before write. Strings containing \u0000 are replaced with an @ns marker: {"@ns": "<base64-encoded original>"}. On read, _unsanitize_from_pg() reverses the transformation.

The common case (no null bytes) incurs zero allocations – the function returns the original object unchanged. The marker format is unambiguous because @ns is a reserved key prefix in the zodb-json-codec type mapping.
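A sketch of the round trip, including the identity-preserving fast path for clean values (function names follow the text above; the real implementation may differ in detail):

```python
import base64

def sanitize_for_pg(value):
    """Replace strings containing \\u0000 with an {"@ns": base64} marker."""
    if isinstance(value, str):
        if "\u0000" in value:
            encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
            return {"@ns": encoded}
        return value
    if isinstance(value, list):
        new = [sanitize_for_pg(v) for v in value]
        # Fast path: if nothing changed, return the original list.
        return new if any(a is not b for a, b in zip(new, value)) else value
    if isinstance(value, dict):
        new = {k: sanitize_for_pg(v) for k, v in value.items()}
        return new if any(new[k] is not value[k] for k in value) else value
    return value

def unsanitize_from_pg(value):
    """Reverse the @ns marker transformation on read."""
    if isinstance(value, dict):
        # @ns is a reserved key prefix, so this marker is unambiguous.
        if set(value) == {"@ns"}:
            return base64.b64decode(value["@ns"]).decode("utf-8")
        return {k: unsanitize_from_pg(v) for k, v in value.items()}
    if isinstance(value, list):
        return [unsanitize_from_pg(v) for v in value]
    return value
```

Values without null bytes pass through untouched (the original object is returned), so only the rare offending string pays the encoding cost.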

This is a data integrity measure rather than a security defense, but it prevents a class of silent data corruption that could otherwise affect applications storing binary-adjacent string data in ZODB objects.

Connection safety

Autocommit mode

Pool connections operate in autocommit mode by default. This prevents idle transactions from holding locks or accumulating WAL overhead. Read snapshots use explicit BEGIN ISOLATION LEVEL REPEATABLE READ with well-defined lifecycle management – the snapshot is started in poll_invalidations() and ended before tpc_begin() or in release().

Closed connection handling

The connection pool detects and discards connections that have been closed by PostgreSQL (due to server restart, idle timeout, or network failure). This prevents stale connections from being reused, which would cause cryptic errors on subsequent queries.