Enable BM25 ranking
Overview
BM25 provides probabilistic relevance ranking, improving search quality over the ts_rank_cd() ranking used with tsvector.
The backend is autodetected at startup, so no code changes are needed.
When the BM25 extensions are available, plone.pgcatalog switches from TsvectorBackend to BM25Backend automatically.
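The selection rule can be pictured as a small pure function. This is a minimal sketch, assuming detection reduces to a set-membership check on the names listed in pg_available_extensions; choose_backend and REQUIRED_BM25_EXTENSIONS are hypothetical names, not part of plone.pgcatalog's API.

```python
# Hypothetical sketch of the startup autodetection rule, not plone.pgcatalog's code.
REQUIRED_BM25_EXTENSIONS = frozenset({"vchord_bm25", "pg_tokenizer"})


def choose_backend(available_extensions):
    """Return a backend name given the names found in pg_available_extensions."""
    # Both extensions must be available; otherwise fall back to tsvector ranking.
    if REQUIRED_BM25_EXTENSIONS <= set(available_extensions):
        return "BM25Backend"
    return "TsvectorBackend"


print(choose_backend({"plpgsql", "vchord_bm25", "pg_tokenizer"}))  # BM25Backend
print(choose_backend({"plpgsql", "vchord_bm25"}))  # TsvectorBackend
```

Requiring both extensions, rather than either one, mirrors the rule stated in the manual installation notes.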
Step 1: install VectorChord-BM25
Docker (recommended)
docker run -d --name plone-pg-bm25 \
-e POSTGRES_USER=zodb \
-e POSTGRES_PASSWORD=zodb \
-e POSTGRES_DB=zodb \
-p 5432:5432 \
tensorchord/vchord-suite:pg17-latest
Manual installation
Install the vchord_bm25 and pg_tokenizer PostgreSQL extensions.
Both must appear in pg_available_extensions for autodetection to succeed.
See the VectorChord-BM25 documentation for build instructions.
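Whichever installation route you use, you can confirm that PostgreSQL sees both extensions before restarting Zope. The helper below is a hypothetical sketch (missing_bm25_extensions is not a plone.pgcatalog function): it runs a plain query against pg_available_extensions through any DB-API 2.0 cursor, for example one from psycopg connected with the Docker credentials above.

```python
# Hypothetical helper: report which required BM25 extensions PostgreSQL can see.
# Works with any DB-API 2.0 cursor (for example from psycopg or psycopg2).
REQUIRED = ("vchord_bm25", "pg_tokenizer")

QUERY = (
    "SELECT name FROM pg_available_extensions "
    "WHERE name IN ('vchord_bm25', 'pg_tokenizer')"
)


def missing_bm25_extensions(cursor):
    """Return the sorted list of required extensions absent from pg_available_extensions."""
    cursor.execute(QUERY)
    found = {row[0] for row in cursor.fetchall()}
    return sorted(set(REQUIRED) - found)
```

An empty list means both extensions are visible, which is the condition autodetection checks.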
Step 2: configure languages
Set the PGCATALOG_BM25_LANGUAGES environment variable before starting Zope:
# Explicit language list
export PGCATALOG_BM25_LANGUAGES=en,de,fr
# Auto-detect from Plone's portal_languages
export PGCATALOG_BM25_LANGUAGES=auto
# Single language (default if not set)
export PGCATALOG_BM25_LANGUAGES=en
Each language gets a dedicated search_bm25_{lang} column with a language-specific tokenizer (Snowball stemmer for Western languages, jieba/lindera for CJK).
A fallback search_bm25 column is always created for unmapped languages and cross-language search.
See the Search backends reference for the full LANG_TOKENIZER_MAP.
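The mapping from the environment variable to column names can be sketched as follows. This is an illustration under stated assumptions, not plone.pgcatalog's parser: bm25_languages and bm25_columns are hypothetical, and the "auto" case is delegated to a caller-supplied function standing in for the portal_languages lookup (assumed to return codes such as ["en", "de"]).

```python
import os

# Hypothetical sketch: turn PGCATALOG_BM25_LANGUAGES into BM25 column names.


def bm25_languages(value=None, detect_auto=None):
    """Return language codes from the setting; default to ["en"] when unset or empty."""
    if value is None:
        value = os.environ.get("PGCATALOG_BM25_LANGUAGES", "")
    raw = value.strip()
    if not raw:
        return ["en"]
    if raw == "auto":
        # Stand-in for reading the languages configured in portal_languages.
        if detect_auto is None:
            raise ValueError("'auto' needs a portal_languages lookup")
        return list(detect_auto())
    return [code.strip() for code in raw.split(",") if code.strip()]


def bm25_columns(languages):
    """Per-language columns plus the always-present fallback column."""
    return [f"search_bm25_{lang}" for lang in languages] + ["search_bm25"]


print(bm25_columns(bm25_languages("en,de,fr")))
```

Keeping the fallback column unconditional matches the rule above that search_bm25 is always created.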
Step 3: restart Zope
On restart, plone.pgcatalog autodetects the extensions and:
1. Creates per-language search_bm25_{lang} columns on object_state
2. Sets up tokenizers via pg_tokenizer (create_tokenizer())
3. Creates BM25 indexes for each column
4. Switches the active backend from TsvectorBackend to BM25Backend
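To make the column and index steps concrete, here is a sketch that generates illustrative DDL. Several assumptions apply: the bm25vector type, the bm25 index method and the bm25_ops operator class follow the VectorChord-BM25 README; the idx_ index names are hypothetical; the pg_tokenizer setup is omitted; and the statements plone.pgcatalog actually emits may differ.

```python
# Illustrative DDL only; plone.pgcatalog's real statements may differ.
# Assumes vchord_bm25 provides the bm25vector type and the bm25 index method
# with the bm25_ops operator class. Index names are hypothetical.


def bm25_ddl(languages, table="object_state"):
    """Return ALTER TABLE / CREATE INDEX statements for each BM25 column."""
    columns = [f"search_bm25_{lang}" for lang in languages] + ["search_bm25"]
    statements = []
    for column in columns:
        statements.append(
            f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS {column} bm25vector"
        )
        statements.append(
            f"CREATE INDEX IF NOT EXISTS idx_{column} ON {table} "
            f"USING bm25 ({column} bm25_ops)"
        )
    return statements


for stmt in bm25_ddl(["en", "de"]):
    print(stmt + ";")
```

The IF NOT EXISTS clauses keep the sketch idempotent, which suits a step that runs on every restart.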
Check the log for:
BM25 search backend activated (languages=['en', 'de', 'fr'])
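If you prefer to check this from a script, the sketch below scans log lines for that message and returns the language list it reports. It is a hypothetical helper: it assumes only that the message text appears as shown, and it makes no assumption about the log's location or line prefix.

```python
import ast
import re

# Hypothetical helper: scan Zope log lines for the BM25 activation message.
ACTIVATED = re.compile(r"BM25 search backend activated \(languages=(\[.*?\])\)")


def bm25_activation_languages(log_lines):
    """Return the languages from the most recent activation message, or None."""
    languages = None
    for line in log_lines:
        match = ACTIVATED.search(line)
        if match:
            # The list is logged as a Python literal, e.g. ['en', 'de', 'fr'].
            languages = ast.literal_eval(match.group(1))
    return languages
```

A result of None suggests the tsvector backend is still active after the restart.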
Step 4: rebuild the catalog
A full reindex is required to populate the new BM25 columns:
1. In the ZMI, go to portal_catalog and open the Advanced tab.
2. Click “Clear and Rebuild”.
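Or via a script run with bin/instance run, which binds the Zope root object as app:
import transaction
from zope.component.hooks import setSite
portal = app.Plone  # assumption: the site id is "Plone"; adjust to yours
setSite(portal)
portal.portal_catalog.clearFindAndRebuild()
transaction.commit()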
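After the rebuild, counting non-NULL values in each BM25 column is a quick sanity check. This is a hypothetical sketch (populated_counts is not a plone.pgcatalog function) that takes any DB-API 2.0 cursor and the column names described in this guide. A zero in a language column is not necessarily an error if the site has no content in that language.

```python
import re

# Hypothetical post-rebuild check, not part of plone.pgcatalog.
# Pass any DB-API 2.0 cursor (for example from psycopg).
_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def populated_counts(cursor, languages, table="object_state"):
    """Count rows with a non-NULL value in each BM25 column."""
    columns = [f"search_bm25_{lang}" for lang in languages] + ["search_bm25"]
    for name in [table, *columns]:
        # Identifiers cannot be bound as query parameters, so validate them.
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe SQL identifier: {name!r}")
    counts = {}
    for column in columns:
        cursor.execute(f"SELECT count(*) FROM {table} WHERE {column} IS NOT NULL")
        counts[column] = cursor.fetchone()[0]
    return counts
```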
Switching back to Tsvector
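Uninstall the VectorChord-BM25 extensions from the PostgreSQL server (or switch to a standard postgres:17 image) and restart Zope.
plone.pgcatalog automatically falls back to TsvectorBackend when the extensions are not detected in pg_available_extensions.
Running DROP EXTENSION alone is not enough: pg_available_extensions lists every extension installed on the server, whether or not it has been created in a database.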
The existing tsvector searchable_text column is always maintained regardless of which backend is active, so no rebuild is needed when switching back.