Architecture¶
cloud-vinyl is a Kubernetes operator that manages Vinyl Cache clusters (the FOSS HTTP cache formerly known as Varnish Cache) as first-class Kubernetes resources. It is designed around the Kubernetes operator pattern with a central controller instead of per-pod sidecars.
Components¶
```mermaid
graph TD
  subgraph "Kubernetes cluster"
    OP[cloud-vinyl operator<br/>Deployment]
    VC[VinylCache CR]
    SS[StatefulSet<br/>Vinyl Cache pods]
    AG[vinyl-agent<br/>sidecar per pod]
    PX[Purge/BAN proxy<br/>port 8090]
    WH[Admission webhook<br/>port 9443]
  end
  VC -->|reconcile| OP
  OP -->|manages| SS
  SS -->|contains| AG
  OP -->|pushes VCL| AG
  AG -->|manages Vinyl Cache| VCL[Vinyl Cache process]
  PX -->|PURGE/BAN broadcast| SS
  WH -->|defaults + validates| VC
```
cloud-vinyl operator¶
The operator is a single Deployment that runs the reconcile loop. It:
- Watches `VinylCache` custom resources.
- Reconciles StatefulSets, Services, Secrets, EndpointSlices, and NetworkPolicies.
- Generates VCL from the `VinylCache` spec using Go templates.
- Pushes generated VCL to each ready pod via the vinyl-agent HTTP API.
- Exposes metrics on port `8080` and serves admission webhooks on port `9443`.
- Runs a Purge/BAN proxy on port `8090` that broadcasts cache-invalidation requests to all pods.
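A minimal `VinylCache` manifest might look like the following. The API group/version and the `replicas` field are illustrative assumptions; the `debounce`, `retry`, and `invalidation` fields are the ones referenced elsewhere on this page:

```yaml
apiVersion: cloud-vinyl.io/v1alpha1   # hypothetical group/version
kind: VinylCache
metadata:
  name: my-cache
  namespace: app
spec:
  replicas: 3
  debounce:
    duration: 1s            # coalesce rapid endpoint churn (default 1 s)
  retry: {}                 # exponential-backoff settings for failed VCL pushes
  invalidation:
    purge:
      allowedSources: ["10.0.0.0/8"]   # see the SNAT caveat below
```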
vinyl-agent¶
A lightweight HTTP server running as a sidecar in each Vinyl Cache pod. It wraps the Vinyl Cache admin interface (port 6082) and exposes:
- `POST /vcl/push` — compile and activate a new VCL.
- `GET /vcl/active` — return the hash of the currently active VCL.
- `POST /ban` — issue a ban command.
Communication between the operator and vinyl-agent is authenticated with a Bearer token.
The token is stored in a per-namespace Kubernetes Secret (cloud-vinyl-agent-token),
shared by all VinylCache instances in the same namespace. The operator reads the token
from the Secret before each push request.
Purge/BAN proxy¶
The operator exposes an HTTP endpoint on port 8090 that accepts:
- `PURGE /<path>` — HTTP PURGE broadcast to all Vinyl Cache pods.
- `POST /ban` or `BAN` method — validated ban expression, forwarded to vinyl-agent `/ban` on all pods.
- `POST /purge/xkey` — xkey-based purge, one `PURGE` per key with an `X-Xkey-Purge` header.
Upstream services send a single request; the operator fans it out to all pods in parallel.
How the proxy finds the right cache¶
Since the operator is a single cluster-wide Deployment, a request hitting port 8090 could belong to any VinylCache in any namespace. The proxy disambiguates via the HTTP `Host` header, and the operator arranges Kubernetes DNS so that this works transparently for callers.
For each VinylCache named my-cache in namespace app, the reconciler creates:
- A `Service` named `my-cache-invalidation` in `app` (ClusterIP, port 8090).
- An `EndpointSlice` for that Service whose only endpoint is the operator’s own pod IP (sourced from the `POD_IP` env var via the downward API).
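Sketched as manifests, the pair looks roughly like the following (the pod IP, port names, and the slice's `metadata.name` are illustrative; the `kubernetes.io/service-name` label is what binds a manually managed EndpointSlice to a selectorless Service):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-cache-invalidation
  namespace: app
spec:                      # no selector: endpoints are managed by the operator
  type: ClusterIP
  ports:
    - port: 8090
      targetPort: 8090
      protocol: TCP
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: my-cache-invalidation
  namespace: app
  labels:
    kubernetes.io/service-name: my-cache-invalidation
addressType: IPv4
ports:
  - port: 8090
    protocol: TCP
endpoints:
  - addresses:
      - 10.2.3.4           # operator pod IP from the POD_IP downward-API env var
```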
Result: a caller that sends

```
PURGE /path
Host: my-cache-invalidation.app.svc.cluster.local
```

hits the Service → kube-proxy → operator pod, because the Service’s only endpoint is the operator itself. The proxy inside the operator runs a `RegisteredRouter` that the controller fills with one entry per `VinylCache`:

```
my-cache-invalidation.app                          → {app, my-cache}
my-cache-invalidation.app.svc.cluster.local        → {app, my-cache}
other-cache-invalidation.billing                   → {billing, other-cache}
other-cache-invalidation.billing.svc.cluster.local → {billing, other-cache}
```
On each request, the proxy:
1. Reads the `Host` header (stripping the port).
2. Looks it up in the `RegisteredRouter` to obtain `{namespace, cacheName}`; no match → `404`.
3. Looks up the list of ready pod IPs for that cache in a parallel `PodMap`.
4. Broadcasts the PURGE/BAN to every pod and aggregates per-pod results into the JSON response.
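The Host-to-cache lookup can be sketched as follows. Only the `RegisteredRouter` name comes from the text above; its internals, `CacheRef`, and the method names are assumptions:

```go
package main

import (
	"net"
)

// CacheRef identifies one VinylCache instance.
type CacheRef struct{ Namespace, Name string }

// RegisteredRouter maps invalidation hostnames to caches; the controller
// registers every VinylCache it reconciles.
type RegisteredRouter struct{ byHost map[string]CacheRef }

func NewRegisteredRouter() *RegisteredRouter {
	return &RegisteredRouter{byHost: map[string]CacheRef{}}
}

// Register adds both the short and fully-qualified hostnames for one cache.
func (r *RegisteredRouter) Register(ref CacheRef) {
	short := ref.Name + "-invalidation." + ref.Namespace
	r.byHost[short] = ref
	r.byHost[short+".svc.cluster.local"] = ref
}

// Lookup resolves a raw Host header (possibly host:port) to a cache.
// A miss corresponds to the proxy's 404 path.
func (r *RegisteredRouter) Lookup(hostHeader string) (CacheRef, bool) {
	host := hostHeader
	if h, _, err := net.SplitHostPort(hostHeader); err == nil {
		host = h // strip the port, as the proxy does
	}
	ref, ok := r.byHost[host]
	return ref, ok
}
```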
Because the discriminator is the FQDN, two caches with the same name in different namespaces (`foo` in `a` vs `foo` in `b`) are unambiguous: `foo-invalidation.a` and `foo-invalidation.b` are distinct hosts.
**Gotcha — source IPs after SNAT.** The proxy’s source-CIDR ACL (`spec.invalidation.{purge,ban}.allowedSources`) sees the IP of whoever connected to the Service, which for most CNIs means the kube-proxy SNAT source, not the original caller. If you want tight source filtering, place it at an ingress/policy layer that preserves the client IP, not in the VinylCache spec.
Why a central operator instead of sidecars?¶
A per-pod signaller sidecar design that watches the Kubernetes Endpoints API and triggers VCL reloads has several structural problems:
- **Chicken-and-egg at startup:** pods with readiness gates cannot become ready until the sidecar receives the first VCL push — but the sidecar needs other pods to be ready to build the peer list.
- **Silent drop on failure:** if a VCL push fails, the sidecar logs and continues. Pods silently run stale VCL.
- **No debouncing:** rapid endpoint churn (rolling restarts) triggers many VCL regenerations back-to-back.
cloud-vinyl solves all three:
- The operator pushes VCL after the pod passes its health probes — no readiness-gate deadlock.
- The reconcile loop retries on failure with exponential backoff (configurable via `spec.retry`).
- Debouncing is built in (`spec.debounce.duration`, default 1 s).
RBAC scope¶
The operator requires a ClusterRole because it creates resources in user namespaces
(the namespace where the VinylCache resource lives, which may differ from the operator namespace).
Specifically it manages: StatefulSets, Services, Secrets, EndpointSlices, NetworkPolicies,
and Leases (for leader election).