codex-pdf
Structured PDF extraction API that turns complex files into consistent JSON.
Collate aligns separation plates against an approved 1-up — or N candidate PDFs against a reference sheet — and reports exactly where they differ: per-ink coverage and geometry deltas, which separations are missing or extra, and a raw match score. It states the numbers; your policy engine decides what's in tolerance.
AGPL-3.0 · plate ↔ 1-up + document ↔ document · measured deltas, never a verdict · built on codex
How it works
Collate is the objective comparison layer of the stack: it reads facts from codex, measures where two files differ, and hands the numbers to the policy and viewer engines. Each engine owns exactly one job.
codex reads each file into stable, schema-versioned facts — per-separation coverage, screen ruling and angle, Pantone spot intents, dieline geometry. Collate consumes those facts; it owns no raster primitives of its own.
Collate aligns the two sides (a plate set against a 1-up, or N candidates against a reference sheet), normalizes ink names so the namespaces match, then measures: coverage delta per ink, geometry delta in millimetres, and whether each separation is present on both sides or on only one.
lint takes the raw deltas and decides what's acceptable. Tolerances — the coverage percent or millimetre shift that fails a job — live in lint as policy, never in collate. The objective layer states the numbers; policy judges them.
lens renders the result for a prepress operator: per-ink separations, the difference image, and the spots that moved. It supplies the visual inspection that turns a measured delta into a confident decision.
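The collate step in the pipeline above (normalize ink names, then measure a coverage delta per ink and record where each separation is present) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: normalize_ink is a stand-in for the ink normalization collate reuses from codex, InkDelta and compare_coverage are hypothetical names, and the sign convention (plate minus PDF) is a guess. Only the field name coverage_delta_percent and the presence values come from the text. Geometry deltas are left out.

```python
from dataclasses import dataclass
from typing import Optional


def normalize_ink(name: str) -> str:
    """Hypothetical normalizer: unify underscores, whitespace, and case."""
    return " ".join(name.replace("_", " ").split()).casefold()


@dataclass(frozen=True)
class InkDelta:
    ink: str
    presence: str  # "both", "plate_only", or "pdf_only"
    coverage_delta_percent: Optional[float]  # None unless presence == "both"


def compare_coverage(plate: dict[str, float], pdf: dict[str, float]) -> list[InkDelta]:
    """Compare per-ink coverage (in percent) between the plate and PDF sides."""
    plate_n = {normalize_ink(k): v for k, v in plate.items()}
    pdf_n = {normalize_ink(k): v for k, v in pdf.items()}
    deltas = []
    # Walk the union of ink names so one-sided separations are reported too.
    for ink in sorted(plate_n.keys() | pdf_n.keys()):
        if ink in plate_n and ink in pdf_n:
            # Assumed sign convention: plate coverage minus PDF coverage.
            delta = round(plate_n[ink] - pdf_n[ink], 4)
            deltas.append(InkDelta(ink, "both", delta))
        elif ink in plate_n:
            deltas.append(InkDelta(ink, "plate_only", None))
        else:
            deltas.append(InkDelta(ink, "pdf_only", None))
    return deltas
```

Note that the sketch returns measurements only. It has no threshold and no pass/fail field, which mirrors the split the text describes between collate and lint.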
Built for prepress teams and web-to-print platforms that need to know exactly where a file differs from the approved proof — without an engine guessing at the verdict for them.
Align a set of decoded separation plates (1-bit TIFF / Esko LEN) against an approved 1-up PDF. Collate reports per-ink coverage and geometry deltas and which separations are missing or extra — with an optional per-ink visual difference image.
Collate's “global vision” compare: one or more candidate PDFs against one reference — a single 1-up or a stepped, gang, or imposed sheet. Inks auto-align and each candidate is mapped to the reference instance it best matches.
Collate ships coverage_delta_percent, geometry_delta_mm, and presence ∈ {both, plate_only, pdf_only}. No tolerance constant, no coverage_mismatch flag, no overall_match roll-up. It measures; lint decides whether a delta is out of tolerance.
Document compares carry a per-candidate match_score: a raw similarity in [0, 1] derived from the mean coverage delta over shared inks, minus a small penalty for each missing or extra separation. It is a number, not a judgement: whether a score is acceptable is policy's call.
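To make the shape of such a score concrete, here is one way it could be computed. This is a sketch under stated assumptions, not collate's actual formula: the scaling of the mean delta by 100, the penalty value of 0.05, the clamping, and the handling of the no-shared-inks case are all illustrative choices, and the function signature is hypothetical.

```python
def match_score(coverage_deltas_percent: list[float],
                unmatched_separations: int,
                penalty: float = 0.05) -> float:
    """Hypothetical similarity in [0, 1] from per-ink coverage deltas.

    coverage_deltas_percent: deltas for inks present on both sides.
    unmatched_separations: count of missing or extra separations.
    """
    if coverage_deltas_percent:
        mean_delta = sum(abs(d) for d in coverage_deltas_percent) / len(coverage_deltas_percent)
    else:
        # Assumption: with no shared inks, treat the pair as maximally different.
        mean_delta = 100.0
    score = 1.0 - mean_delta / 100.0 - penalty * unmatched_separations
    # Clamp so the result always stays inside [0, 1].
    return max(0.0, min(1.0, score))
```

Under these assumptions, two shared inks with deltas of 2 and -4 percentage points plus one extra separation would score about 0.92. Whether 0.92 is good enough is still a question for policy, not for the scoring function.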
Errors are returned as Problem Details (application/problem+json), never a stack trace. When Ghostscript is unavailable the PDF side self-skips: the response carries the plate-side facts plus pdf_rendered: false and a note, so a consumer never mistakes a non-render for a clean compare.
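A consumer therefore has three cases to tell apart: a Problem Details error, a partial result where the PDF side was skipped, and a full compare. The sketch below shows one way to branch on them. The function name and return strings are hypothetical, and the assumption that pdf_rendered sits at the top level of the result body is ours. The title and status members are standard Problem Details fields.

```python
import json


def interpret_response(content_type: str, body: str) -> str:
    """Classify a collate response body (hypothetical consumer-side helper)."""
    payload = json.loads(body)
    # Ignore parameters such as "; charset=utf-8" when matching the media type.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/problem+json":
        title = payload.get("title", "unknown problem")
        return f"error: {title} ({payload.get('status')})"
    # Assumption: pdf_rendered is a top-level boolean in the result.
    if payload.get("pdf_rendered") is False:
        return "partial: PDF side not rendered"
    return "compared"
```

The explicit `is False` check matters: a result that simply omits the flag is not treated as a skipped render.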
AGPL-3.0 OSS you can run on Docker with Ghostscript, or call the in-process client straight from lint — one call shape whether collate is a sidecar or a library. Or use managed Print With Synergy hosting; same engine, managed and metered.
Where it fits
codex extracts the facts, collate measures the differences, lint owns the tolerances and verdicts, lens shows the result. Collate stays strictly in the objective layer — so the numbers it reports are never coloured by someone else's policy.
Collate owns no raster primitives of its own. It reuses codex for plate decode, the Ghostscript tiffsep separation render, ink normalization, and the Pantone catalogue. codex stays the extraction layer; collate is comparison built on top — no duplicated decode/coverage code to drift.
Collate never carries a coverage or millimetre threshold. It states the deltas; lint applies its LPDF_PLATE_CMP_* policy and returns the pass/fail. A non-render floors to INCONCLUSIVE in lint, not here. That clean split — measure here, judge there — is exactly why collate exists.
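The measure-here, judge-there split can be sketched from the policy side. This is an illustration only, not lint's implementation: the judge function, its two tolerance parameters, the PASS and FAIL labels, the payload layout (an inks list with per-ink deltas), and the rule that a one-sided separation fails are all assumptions. The real policy settings live under lint's LPDF_PLATE_CMP_* names, which are not reproduced here.

```python
def judge(result: dict,
          max_coverage_delta_percent: float,
          max_geometry_delta_mm: float) -> str:
    """Hypothetical policy: turn collate's measured deltas into a verdict."""
    # A skipped PDF render means nothing was actually compared.
    if result.get("pdf_rendered") is False:
        return "INCONCLUSIVE"
    for ink in result["inks"]:
        # Assumption: this policy fails any separation found on one side only.
        if ink["presence"] != "both":
            return "FAIL"
        if abs(ink["coverage_delta_percent"]) > max_coverage_delta_percent:
            return "FAIL"
        if abs(ink["geometry_delta_mm"]) > max_geometry_delta_mm:
            return "FAIL"
    return "PASS"
```

Because the tolerances arrive as arguments, the same measured result can pass under one policy and fail under another without the comparison being rerun.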
The neutral-facts contract (CollateCompareResult) is versioned by COMPARE_SCHEMA_VERSION, independent of the engine version. Additive fields don't bump the schema, so a downstream consumer can read new signals without a breaking upgrade.
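For a downstream consumer, this suggests a tolerant-reader pattern: check the schema version, take the fields you know, and ignore anything new. The sketch below assumes the payload exposes the version under a schema_version key and that versions are integers. Both are guesses, as are the helper name and the set of known fields.

```python
# Hypothetical: the schema versions this consumer was written against.
SUPPORTED_SCHEMA_VERSIONS = {1}

# Fields this consumer understands; anything else is ignored, not rejected.
KNOWN_FIELDS = ("pdf_rendered", "inks")


def read_result(payload: dict) -> dict:
    """Extract the known fields from a compare result, tolerating additions."""
    version = payload.get("schema_version")  # assumed key name
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise ValueError(f"unsupported compare schema version: {version!r}")
    return {key: payload[key] for key in KNOWN_FIELDS if key in payload}
```

An additive field passes straight through this reader untouched, while a schema bump fails loudly instead of being misread.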
The CollateClient is HTTP-first with an in-process fallback, so lint uses one call shape whether collate runs as a sidecar service or as a library on the same host. Compare over HTTP in the cloud, or in process where lint and collate share a deployment.
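The HTTP-first, in-process-fallback pattern can be sketched with two interchangeable callables behind one method. This is not the real CollateClient: the class name, constructor, and request and response shapes are hypothetical, and the choice to fall back only on ConnectionError is an assumption.

```python
from typing import Callable, Optional

# Both transports share one call shape: request dict in, result dict out.
Compare = Callable[[dict], dict]


class SketchCollateClient:
    """Illustrative client: try HTTP first, then the in-process engine."""

    def __init__(self, http_compare: Optional[Compare], local_compare: Compare):
        self._http = http_compare
        self._local = local_compare

    def compare(self, request: dict) -> dict:
        if self._http is not None:
            try:
                return self._http(request)
            except ConnectionError:
                # Sidecar unreachable: fall through to the in-process path.
                pass
        return self._local(request)
```

Catching only connection failures is deliberate in this sketch: an error response from a reachable service is a real answer and should surface to the caller rather than silently trigger a local rerun.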
Pricing
Run the comparison engine as a managed hosted service, or self-host the open source — same engine, you pick who runs it.
AGPL-3.0 · your infrastructure
Run the whole comparison engine yourself on your own Docker host. No quotas, no per-compare fees — ever.
metered on the managed platform
Add managed comparison to your workspace. We run the service, Ghostscript, and the scaling — you call the API.
The open-source edition is AGPL-3.0 and free forever. Managed pricing and any metered rates are shown when you connect a workspace.
Open source · managed hosting
A toolkit of focused, standalone PDF utilities: extraction, preflight, viewing, assembly, imposition planning, and an asset store. Each one plugs into the prepress workflow you already run. Run the open-source edition yourself, or let us host any single tool for you on host.withsynergy.io.
Structured PDF extraction API that turns complex files into consistent JSON.
Objective file-comparison engine — measured coverage and geometry differences for plate ↔ 1-up and document ↔ document, never a verdict.
Detection-only PDF preflight engine — 500+ checks plus the PDF/X-4 conformance suite.
Embeddable PDF viewer with separations, TAC, layers, and annotation overlays.
GWG 2022 conformance assay — benchmark a preflight engine against the spec.
Content-addressed digital-asset plane — versioned blobs, a presigned data plane, and on-prem agent recall.
The print-data integration hub — canonical jobs, orders, and customers kept in sync across your MIS, ERP, and prepress tools.