How Split PDF is built
These are the engineering notes for Split PDF: the technologies it is built on, what each one is, and how it is used in the tool.
Tech used
The PDF document model
A PDF is not a flat image of pages; it is a graph of objects, in which page objects reference resources (fonts, images, content streams) that several pages may share. Splitting one is therefore not a matter of slicing bytes at a page boundary: it means copying the selected page objects, together with everything they reference, into a new document. That is why the tool uses a library that understands the object graph instead of cutting the file.
pdf-lib
The work is done by pdf-lib, a pure-JavaScript PDF library. The source bytes (from file.arrayBuffer()) are parsed with PDFDocument.load(bytes), and getPageCount() reports the page total. To extract, a fresh document is created with PDFDocument.create(), and the chosen pages are deep-copied into it with out.copyPages(src, indices). That call returns copies of the pages that belong to the new document but are not yet part of its page list, so each one is appended with out.addPage(p). Finally, out.save() serializes the result to a Uint8Array, which is wrapped in a Blob for download. load, create, copyPages and save are all asynchronous and return Promises.
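A minimal sketch of that pipeline in TypeScript, assuming the npm pdf-lib package. The function name extractPages is hypothetical, and it returns the raw bytes so the caller can wrap them in a Blob:

```typescript
import { PDFDocument } from "pdf-lib";

// Hypothetical helper: copy the pages at the given 0-based indices out of
// srcBytes into a brand-new document and serialize it.
async function extractPages(
  srcBytes: ArrayBuffer | Uint8Array,
  indices: number[],
): Promise<Uint8Array> {
  // Parse the source; pdf-lib rejects if the bytes are not a loadable PDF.
  const src = await PDFDocument.load(srcBytes);
  // Start from an empty document rather than editing the source.
  const out = await PDFDocument.create();
  // Deep-copy each selected page plus the objects it references into `out`.
  // The returned pages belong to `out` but are not in its page list yet.
  const pages = await out.copyPages(src, indices);
  for (const page of pages) out.addPage(page);
  // Serialize the new document to bytes.
  return out.save();
}
```

In the browser the bytes would come from the picked file, e.g. extractPages(await file.arrayBuffer(), [0, 1, 2, 4]) for the spec 1-3, 5.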
Parsing the page range
The selection is a typed range like 1-3, 5. A small parser turns that 1-based spec into ascending, de-duplicated 0-based indices (via a Set): it accepts singletons and ranges, normalizes a descending range, and rejects anything malformed or out of range before any pages are touched.
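One way such a parser could look, sketched in TypeScript. The name parsePageRange, the comma separator and the tolerance for spaces are assumptions; the behaviour follows the description above (a Set for de-duplication, descending ranges swapped, errors thrown before any page is copied):

```typescript
// Hypothetical parser: turns a 1-based spec like "1-3, 5" into sorted,
// de-duplicated 0-based page indices. Throws on malformed or out-of-range parts.
function parsePageRange(spec: string, pageCount: number): number[] {
  const indices = new Set<number>();
  for (const raw of spec.split(",")) {
    const part = raw.trim();
    // Each part is either "N" or "N-M" (spaces around "-" allowed).
    const m = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(part);
    if (!m) throw new Error(`Malformed range part: "${part}"`);
    let start = Number(m[1]);
    let end = m[2] === undefined ? start : Number(m[2]);
    // Normalize a descending range such as "5-3" to "3-5".
    if (start > end) [start, end] = [end, start];
    if (start < 1 || end > pageCount) {
      throw new Error(`Out of range: "${part}" (document has ${pageCount} pages)`);
    }
    // Convert 1-based page numbers to 0-based indices; the Set drops repeats.
    for (let p = start; p <= end; p++) indices.add(p - 1);
  }
  return Array.from(indices).sort((a, b) => a - b);
}
```

For example, parsePageRange("1-3, 5", 10) yields [0, 1, 2, 4], and parsePageRange("5-3", 10) yields [2, 3, 4]. Validating the whole spec up front means a typo fails fast instead of producing a partial split.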
Shell
The shell is the same as in the other tools: a static Astro site with a Preact island, served as a PWA through a Service Worker (see the HEIC notes). pdf-lib runs on the main thread; there is no Web Worker. Because the tool renders no page thumbnails, it does not need pdf.js either.
Implementation & operational notes
A fresh document, so the output is clean. Because the output is a new PDFDocument holding only the copied pages and the objects they reference, it is a self-contained, valid PDF. The trade-off: source document-level metadata (title, author) isn't carried over; only the pages are.
Encrypted PDFs are reported, not cracked. A load failure whose message matches /encrypt/i is surfaced as “password-protected”; any other failure as “not a readable PDF.” pdf-lib does not decrypt.
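A sketch of that classification in TypeScript, assuming the value rejected by PDFDocument.load is passed in unchanged. classifyLoadError and the LoadFailure labels are hypothetical names, not the tool's code:

```typescript
// Hypothetical user-facing categories; the tool's actual UI strings may differ.
type LoadFailure = "password-protected" | "not a readable PDF";

// Map a PDFDocument.load rejection to one of the two categories.
function classifyLoadError(err: unknown): LoadFailure {
  // Use the message of an Error; fall back to String() for anything else.
  const message = err instanceof Error ? err.message : String(err);
  // pdf-lib does not decrypt, so an encryption failure is simply reported.
  return /encrypt/i.test(message) ? "password-protected" : "not a readable PDF";
}
```

In the load path this would sit in the catch branch of a try/catch around PDFDocument.load. Matching on the message, as the notes describe, keeps the check independent of any particular pdf-lib error class.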
In memory. The whole file is read into memory and the output is built there before download (an object URL plus a synthetic <a download>), so very large PDFs are bounded by available RAM; there is no streaming.
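The download hand-off can be sketched as follows (TypeScript against the browser DOM; downloadBlob is a hypothetical name). Nothing is streamed: the object URL only points at the Blob that is already held in memory.

```typescript
// Hypothetical helper: save an in-memory Blob via an object URL and a
// temporary <a download> element.
function downloadBlob(blob: Blob, filename: string): void {
  // blob: URL that refers to the bytes already in memory.
  const url = URL.createObjectURL(blob);
  const a = document.createElement("a");
  a.href = url;
  // Ask the browser to save the target instead of navigating to it.
  a.download = filename;
  document.body.appendChild(a);
  a.click();
  a.remove();
  // Release the Blob reference once the click has been handed off
  // (deferred as a precaution).
  setTimeout(() => URL.revokeObjectURL(url), 0);
}
```

The caller wraps the Uint8Array from out.save() first, e.g. new Blob([bytes], { type: "application/pdf" }). Until the URL is revoked and the Blob is dropped, the output stays in RAM alongside the source.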
Try it / source
- Tool: Split PDF
- Source: github.com/GeppettoAndRomero