Database schema reference¶
This page documents the PostgreSQL schema extensions, indexes, JSONB structure, and SQL functions installed by plone.pgcatalog.
object_state table extensions¶
plone.pgcatalog extends the object_state table (owned by zodb-pgjsonb)
with the following columns:
Column |
Type |
Purpose |
|---|---|---|
|
|
Physical object path (for example, |
|
|
Parent path (for path queries) |
|
|
Path depth (for depth-limited queries) |
|
|
All index and metadata values |
|
|
Weighted full-text search vector |
|
|
BM25 fallback column (when BM25 active) |
|
|
Per-language BM25 column (when BM25 active) |
The first five columns are always present.
The search_bm25 and
per-language search_bm25_{lang} columns are created only when
VectorChord-BM25 extensions are detected at startup.
PostgreSQL indexes¶
Core indexes¶
Index Name |
Type |
Expression |
Purpose |
|---|---|---|---|
|
B-tree |
|
Path column lookup (brain construction) |
|
GIN |
|
JSONB containment/existence queries |
|
GIN |
|
Full-text search |
Path expression indexes¶
Index Name |
Type |
Expression |
Purpose |
|---|---|---|---|
|
B-tree |
|
Path equality queries |
|
B-tree ( |
|
Path LIKE prefix queries |
|
B-tree |
|
Parent path queries (depth=1) |
|
B-tree |
|
Depth-limited queries |
Date expression indexes¶
Index Name |
Type |
Expression |
Purpose |
|---|---|---|---|
|
B-tree |
|
Date sorting/filtering |
|
B-tree |
|
Date sorting/filtering |
|
B-tree |
|
Date sorting/filtering |
|
B-tree |
|
Date sorting/filtering |
Field expression indexes¶
Index Name |
Type |
Expression |
Purpose |
|---|---|---|---|
|
B-tree |
|
Title sorting |
|
B-tree |
|
Type filtering |
|
B-tree |
|
Workflow state filtering |
|
B-tree |
|
UUID lookup |
Text expression indexes¶
Index Name |
Type |
Expression |
Purpose |
|---|---|---|---|
|
GIN |
|
Title word-level matching |
|
GIN |
|
Description word-level matching |
Additional GIN expression indexes are autocreated at startup for any
addon ZCTextIndex fields discovered in portal_catalog.
These follow
the naming pattern idx_os_cat_{key}_tsv and use the 'simple'
regconfig for word-level matching.
BM25 indexes (optional)¶
When VectorChord-BM25 extensions are installed:
Index Name |
Type |
Expression |
Purpose |
|---|---|---|---|
|
BM25 |
|
Fallback BM25 ranking |
|
BM25 |
|
Per-language BM25 ranking |
All indexes use WHERE idx IS NOT NULL or WHERE searchable_text IS NOT NULL
partial index predicates to exclude uncataloged rows.
idx JSONB structure¶
All standard Plone catalog indexes and metadata columns are stored together in a single JSONB document. Example for a typical Plone Page:
{
"UID": "abc123-def456-789",
"Title": "My Document",
"Description": "A sample document",
"portal_type": "Document",
"review_state": "published",
"Creator": "admin",
"Subject": ["Python", "Plone"],
"created": "2025-01-15T10:00:00+00:00",
"modified": "2025-02-20T14:30:00+00:00",
"effective": "2025-01-15T10:00:00+00:00",
"expires": null,
"is_folderish": false,
"is_default_page": false,
"sortable_title": "my document",
"allowedRolesAndUsers": ["Anonymous"],
"Language": "en",
"path": "/plone/my-document",
"path_parent": "/plone",
"path_depth": 2,
"getObjPositionInParent": 5,
"image_scales": null
}
Key conventions:
Index values (used for PG queries) are converted to JSON-safe types: dates become ISO 8601 strings, etc. These live at the top level of idx.
Metadata values that are JSON-native (str, int, float, bool, None, and lists/dicts of these) also live at the top level of idx.
Non-JSON-native metadata (Zope
DateTime, stdlibdatetime,date, etc.) is encoded via the Rust codec (zodb-json-codec) into a nested"@meta"key. This preserves original Python types so thatbrain.effectivereturns aDateTimeobject, not a string. The@metadict uses codec type markers (for example@dt,@cls+@s) and is decoded once per brain on first access (cached thereafter).Multi-value fields (for example,
Subject,allowedRolesAndUsers) are stored as JSON arrays.Boolean fields are stored as JSON
true/false.nullvalues are stored explicitly where the object has no value for that field.Path fields (
path,path_parent,path_depth) are stored in both the dedicated table columns and the idx JSONB for unified path query support.Fields that are both indexes and metadata (for example
effective) appear in both places: top-level idx holds the converted ISO string (for PG queries), while@metaholds the originalDateTime(for brain attribute access).
Text Extraction Queue (Optional)¶
Created when PGCATALOG_TIKA_URL is set.
Provides a PostgreSQL-backed
job queue for asynchronous text extraction via Apache Tika.
text_extraction_queue Table¶
Column |
Type |
Default |
Purpose |
|---|---|---|---|
|
|
auto |
Primary key |
|
|
— |
Object zoid (references |
|
|
— |
Transaction ID (identifies the blob version) |
|
|
— |
MIME type (for example, |
|
|
|
Job status: |
|
|
|
Number of processing attempts |
|
|
|
Maximum retry attempts before marking as |
|
|
— |
Error message from the last failed attempt |
|
|
|
Job creation timestamp |
|
|
|
Last status change timestamp |
Constraints: UNIQUE(zoid, tid) prevents duplicate jobs for the same
object version.
Queue Indexes¶
Index Name |
Type |
Expression |
Purpose |
|---|---|---|---|
|
B-tree (partial) |
|
Fast dequeue of pending jobs |
Queue Trigger¶
A NOTIFY trigger fires on every INSERT, sending a
text_extraction_ready notification with the job ID.
This wakes the
extraction worker instantly without polling.
CREATE TRIGGER trg_notify_extraction
AFTER INSERT ON text_extraction_queue
FOR EACH ROW EXECUTE FUNCTION notify_extraction_ready();
See Tika text extraction architecture for the full architecture and Enable Tika text extraction for setup instructions.
SQL functions¶
See SQL functions reference for the full reference of
pgcatalog_to_timestamptz(), pgcatalog_lang_to_regconfig(),
pgcatalog_merge_extracted_text(), and rrule functions.
rrule_plpgsql schema¶
Installed automatically at startup. Provides a pure PL/pgSQL implementation of RFC 5545 RRULE expansion for DateRecurringIndex queries. No C extensions required.
The schema is created idempotently using CREATE SCHEMA IF NOT EXISTS
with exception handling for type definitions.
Functions are installed
from the vendored rrule_schema.sql file in the package.
See SQL functions reference for rrule."between"() and rrule."after"()
function signatures.
Schema installation¶
Schema is applied automatically at startup via the
CatalogStateProcessor.get_schema_sql() method, called by zodb-pgjsonb
when the state processor is registered.
The installation sequence is:
Catalog columns (ALTER TABLE ... ADD COLUMN IF NOT EXISTS)
2.
SQL functions (CREATE OR REPLACE FUNCTION)
3.
Catalog indexes (CREATE INDEX IF NOT EXISTS)
4. rrule schema (idempotent CREATE SCHEMA IF NOT EXISTS)
5.
BM25 extensions and columns (if detected)
6.
Text extraction queue and merge function (if PGCATALOG_TIKA_URL is set)
All DDL is idempotent and safe to re-execute on an existing database.
The install_catalog_schema() function in schema.py executes each
DDL block via conn.execute().