Security hardening¶
plone.pgcatalog constructs SQL queries from user-supplied search parameters. This makes SQL injection prevention, denial-of-service limits, and access control critical concerns. This page documents the security measures in place and the reasoning behind each.
SQL injection prevention¶
Parameterized queries¶
All user-supplied values go through psycopg’s parameterized query interface.
The
query builder in query.py uses %(name)s placeholders exclusively – values are
never string-formatted into SQL. psycopg handles escaping and type conversion at
the protocol level (PostgreSQL prepared statement parameters), which is immune to
SQL injection regardless of the value’s content.
# Correct: parameterized (used throughout plone.pgcatalog)
cur.execute(
"SELECT zoid FROM object_state WHERE idx @> %(val)s::jsonb",
{"val": Json({"portal_type": user_input})}
)
# Never done: string formatting
cur.execute(f"... WHERE idx->>'portal_type' = '{user_input}'") # NEVER
Safe DDL composition¶
DDL statements (CREATE INDEX, ALTER TABLE, CREATE FUNCTION) cannot use parameter
placeholders because PostgreSQL does not support parameterized DDL. plone.pgcatalog
uses psycopg’s sql module for safe DDL composition:
psycopg.sql.Identifier()for column names, index names, and table names.psycopg.sql.Literal()for string constants (for example, tokenizer names in BM25 DDL).psycopg.sql.SQL()for static SQL fragments.
These composable types handle quoting correctly for all PostgreSQL identifier rules, including names that contain special characters or are reserved words.
Identifier validation¶
Index names from ZCatalog and language codes from configuration flow into SQL
expressions (for example, idx->>'portal_type').
Even though these are not user-supplied
in the HTTP request sense, they originate from site configuration and could
theoretically be manipulated by a site administrator.
validate_identifier(name) in columns.py rejects any name that does not match
the pattern ^[a-zA-Z_][a-zA-Z0-9_]*$.
This is applied to:
Index names during
IndexRegistry.register()andsync_from_catalog().Column names in DDL (BM25 column names, text index names).
Language codes in BM25 backend DDL (validated against the
LANG_TOKENIZER_MAPallowlist, then identifier-validated for the derived column/index/tokenizer names).
Any index name that fails validation is silently skipped during registry population, with a warning logged.
DoS prevention¶
Search queries can be influenced by web users via URL parameters (b_start,
b_size, SearchableText, path).
Without limits, a malicious request could
trigger resource-exhausting queries.
Limit |
Value |
Protects against |
|---|---|---|
|
10,000 |
Unbounded result sets. Even with |
|
1,000,000 |
Deep pagination. PostgreSQL must skip OFFSET rows before returning results; extremely large offsets cause full table scans. |
|
1,000 chars |
Full-text query explosion. |
|
1,000 chars |
RRULE parsing abuse. DateRecurringIndex queries use PL/pgSQL rrule expansion; extremely complex recurrence rules could consume significant CPU. |
Max path list |
100 items |
Large IN clauses. Path queries accept lists of paths; without a limit, a request could pass thousands of paths, producing an expensive |
These limits are enforced in query.py before SQL construction.
They are deliberately
conservative – legitimate Plone usage rarely approaches any of them.
Note that plone.pgcatalog does not implement application-level rate limiting on
search endpoints.
For production deployments, a reverse proxy (nginx, HAProxy) should
rate-limit requests to @@search and @@search-results endpoints.
Access control¶
Management methods¶
Maintenance operations that modify catalog data or read diagnostic information are
protected with the ManageZCatalogEntries permission (accessible only to Managers
by default):
refreshCatalog()– rebuild catalog from contentreindexIndex()– re-apply a specific indexclearFindAndRebuild()– clear all catalog data and rebuildmanage_get_catalog_summary()– ZMI summary datamanage_get_catalog_objects()– ZMI object listingmanage_get_object_detail()– ZMI object detailmanage_get_indexes_and_metadata()– ZMI index listing
Unrestricted search¶
unrestrictedSearchResults() bypasses security filtering (no allowedRolesAndUsers,
no effectiveRange).
It is declared private via ClassSecurityInfo, making it
callable only from trusted Python code (not through-the-web, not from restricted
Python scripts, not from page templates).
Security filters¶
Every call to searchResults() injects two security filters before the query
reaches build_query():
allowedRolesAndUsers: Plone’s security index. The current user’s roles and group memberships are passed as a keyword query, filtering results to objects the user is allowed to see. This is a KeywordIndex query using JSONB?|overlap, matching the same semantics as ZCatalog’sallowedRolesAndUsersindex.effectiveRange: Plone’s publication date filtering. Content with a futureeffectivedate or a pastexpiresdate is excluded unless the user has theAccessInactivePortalContentpermission. This is a DateRangeIndex query using thepgcatalog_to_timestamptz()expression index.
These filters are injected by apply_security_filters() in query.py and cannot
be bypassed through the public searchResults() API.
IPGIndexTranslator security contract¶
Custom index types are supported via IPGIndexTranslator named utilities.
These
utilities return raw SQL fragments that are inserted directly into WHERE clauses.
This is a deliberate design choice for flexibility, but it places security
responsibility on the translator implementation.
The security contract is documented in the IPGIndexTranslator interface:
All user-supplied values MUST use
%(name)sparameter placeholders with corresponding entries in the returned params dict.SQL fragments must never interpolate user input directly.
Index and column identifiers in the fragment should be hardcoded constants or validated via
validate_identifier().
Translator authors who violate this contract introduce SQL injection vulnerabilities. The interface docstring makes this explicit:
The
query()method returns a raw SQL fragment that is appended directly to the WHERE clause – never interpolate user input into this fragment.
Connection safety¶
Closed connection checks¶
release_request_connection() checks conn.closed before returning a connection
to the pool.
A connection that was closed by PostgreSQL (due to timeout, server
restart, or network failure) is not returned to the pool, preventing subsequent
queries from failing with a stale connection error.
Autocommit mode¶
Pool connections operate in autocommit mode. This prevents long-running transactions from holding locks or accumulating WAL overhead. Each query is its own transaction, committed immediately.
The exception is the storage instance connection (used for read queries), which runs inside a REPEATABLE READ transaction managed by zodb-pgjsonb. This transaction has a well-defined lifecycle tied to the ZODB connection, and is cleaned up when the ZODB connection is released.
Advisory lock serialization¶
ZODB’s tpc_finish uses PostgreSQL advisory locks to serialize transaction ID (TID)
generation.
This prevents race conditions where two concurrent commits could generate
the same TID.
The advisory lock is held only for the duration of TID assignment and
released immediately, minimizing contention.