Custom types with blob fields and Tika

When PGCATALOG_TIKA_URL is configured, plone.pgcatalog overrides the SearchableText indexer for IFile to skip the synchronous portal_transforms pipeline. The Tika async worker extracts text from blobs instead.

What’s covered automatically

  • File content type (IFile)—the override handles this.

What’s NOT covered

Custom Dexterity content types with NamedBlobFile primary fields that do NOT provide IFile. If such a type has a custom SearchableText indexer that calls portal_transforms, the transforms will still run synchronously.

How to add Tika support for custom types

Register a conditional indexer similar to the built-in override:

from plone.app.contenttypes.indexers import SearchableText
from plone.indexer import indexer
from my.package.interfaces import IMyCustomType

import os

# Import your original indexer
from my.package.indexers import SearchableText_mycustomtype as _original


@indexer(IMyCustomType)
def SearchableText_mycustomtype_tika(obj):
    tika_url = os.environ.get("PGCATALOG_TIKA_URL", "").strip()
    if tika_url:
        return SearchableText(obj)
    return _original(obj)

Register it in your package’s overrides.zcml:

<adapter
    factory=".indexers.SearchableText_mycustomtype_tika"
    name="SearchableText"
    />

This ensures:

  • With Tika: only Title + Description are indexed synchronously; Tika extracts blob text async.

  • Without Tika: the original transform-based indexer runs as before.