How PDF Redaction Works — Privacy vs Convenience
PDF redaction sounds simple. Put a black box over the sensitive part, save the file, done. But the gap between what looks redacted and what actually is redacted is significant — and misunderstanding it has caused real, embarrassing, and sometimes legally consequential information leaks.
This guide breaks down how PDF redaction actually works, what separates a properly redacted document from one that just looks that way, and why the most convenient redaction tools are often the least private ones.
Drawing black boxes is not redaction
A PDF is not an image. It's a structured document format that stores text, fonts, images, and drawing instructions as separate objects inside the file. When you open a PDF in a viewer, the software renders those objects together into what looks like a page.
When most PDF editors add a "redaction" — including many tools that use the word explicitly — what they're actually doing is adding a black rectangle to the visual layer on top of the text. The text itself remains in the file, completely intact, exactly where it was. The black box is cosmetic.
Anyone who wants to recover that text can do it trivially. Open the file in a different viewer, select all text, copy and paste it into a text editor. Search the document. Use a developer tool to inspect the raw file. The "redacted" content is still there, just hidden from casual view.
Multiple high-profile document leaks have happened exactly this way — PDFs with black boxes over sensitive text, where the text was fully recoverable because the underlying data was never actually removed.
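To make the problem concrete, here is a minimal sketch of what a "covered" page can look like inside the file. The content stream below is a hypothetical, hand-written example, left uncompressed so it stays readable; real PDFs usually compress their streams and may encode text through font-specific encodings, so a real extraction tool would use a PDF library rather than a regular expression. The function names are illustrative only.

```python
import re

# Hypothetical, uncompressed page content stream: one line of text,
# followed by a filled black rectangle drawn over the same area.
CONTENT_STREAM = b"""
BT
/F1 12 Tf
72 700 Td
(SSN: 123-45-6789) Tj
ET
0 0 0 rg
70 695 160 16 re
f
"""

# A literal string operand followed by the Tj (show text) operator.
# Assumes the string has no nested or escaped parentheses.
SHOW_TEXT = re.compile(rb"\(([^()\\]*)\)\s*Tj")


def extract_shown_text(stream: bytes) -> list[str]:
    """Return every literal string passed to Tj in the stream."""
    return [m.group(1).decode("latin-1") for m in SHOW_TEXT.finditer(stream)]


def has_filled_rectangle(stream: bytes) -> bool:
    """Report whether the stream defines a rectangle and then fills it."""
    return re.search(rb"\bre\s+f\b", stream) is not None


recovered = extract_shown_text(CONTENT_STREAM)
print(recovered)  # ['SSN: 123-45-6789']
```

The rectangle is just one more drawing instruction placed after the text. Nothing about it touches the string that came before it, which is why copy and paste still works.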
What true redaction actually requires
For redaction to be real, the underlying data has to be destroyed: not hidden, not covered, destroyed. The text either exists in the file or it doesn't. Dedicated redaction tools can delete the covered text from a native PDF while keeping the rest of the text layer live, but that approach is easy to get wrong and hard to verify from the outside, because fragments can survive in annotations, metadata, or earlier saved revisions of the file.
The simplest method to get right and to verify is to flatten the document. Each page gets rendered as a pixel image, the same way a screenshot captures what's on screen, with redaction boxes already applied as part of that image. The resulting image contains no text data at all, only pixels. That image is then used to rebuild the PDF.
This is called rasterization. The output PDF looks identical to the original except for the blacked-out areas, but it contains no text layer. There is nothing to copy, nothing to search, nothing to recover, because the text was never written into the new file — the new file contains only images of pages.
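The key property of burning a redaction into pixels is that the original values are overwritten, not covered. The sketch below shows that idea on a tiny grayscale buffer standing in for a rendered page. It is an illustration under simplifying assumptions (8-bit grayscale, row-major layout, a hypothetical `burn_redaction` helper), not the code of any particular tool; a real pipeline would first render the page with a PDF renderer and then encode the result as an image.

```python
def burn_redaction(pixels: bytearray, width: int, height: int,
                   box: tuple[int, int, int, int]) -> None:
    """Overwrite every pixel inside box with black (0), in place.

    pixels is a row-major 8-bit grayscale buffer of width * height bytes.
    box is (left, top, right, bottom) in pixels; right and bottom are exclusive.
    """
    left, top, right, bottom = box
    # Clamp the box to the page so out-of-range coordinates are harmless.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    if right <= left:
        return
    for y in range(top, bottom):
        start = y * width + left
        # Replace the row segment with zero bytes; the old values are gone.
        pixels[start:start + (right - left)] = bytes(right - left)


WIDTH, HEIGHT = 8, 4
# Stand-in for rendered page pixels: 32 distinct gray values.
page = bytearray(range(100, 100 + WIDTH * HEIGHT))
burn_redaction(page, WIDTH, HEIGHT, (2, 1, 6, 3))
```

After the call, the bytes inside the box are zero and there is no second copy of the old values anywhere in the buffer. Whatever is built from `page` afterwards cannot contain what was under the box.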
Metadata — the part most people forget
Even if you rasterize correctly, the output PDF can still carry metadata from the original file if the tool copies it across: the author's name, the software used to create it, creation and modification dates, sometimes even the original document title. This information lives in the PDF's document information dictionary and, in many files, an XMP metadata stream, both separate from the page content.
A properly redacted document should have this metadata stripped as well. Not because it reveals the content you redacted, but because metadata can reveal information you didn't intend to share — who created the original document, when, and with what software.
When RedactPDF builds the output file, it explicitly clears all standard metadata fields — title, author, subject, keywords, producer, creator, creation date, and modification date — before saving. The output file is clean in every sense of the word.
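The fields listed above correspond to the standard keys of a PDF document information dictionary. As a rough spot check, the sketch below scans raw file bytes for those key names. It is a crude heuristic, not how RedactPDF itself works: it can report false positives (for example, bookmark entries also use `/Title`), and it cannot see keys stored inside compressed object streams or XMP metadata. `find_info_keys` and the sample byte strings are hypothetical.

```python
import re

# Standard key names in a PDF document information dictionary.
INFO_KEYS = (b"/Title", b"/Author", b"/Subject", b"/Keywords",
             b"/Producer", b"/Creator", b"/CreationDate", b"/ModDate")


def find_info_keys(pdf_bytes: bytes) -> list[str]:
    """Return the info-dictionary key names that appear in the raw bytes."""
    found = []
    for key in INFO_KEYS:
        # The lookahead stops a key from matching as a prefix of a longer name.
        if re.search(re.escape(key) + rb"(?![A-Za-z0-9])", pdf_bytes):
            found.append(key.decode("ascii"))
    return found


# Hypothetical info dictionaries, before and after stripping.
SAMPLE_INFO = (b"<< /Author (Jane Doe) /Producer (Some Writer 1.0) "
               b"/CreationDate (D:20240101120000Z) >>")
CLEAN_INFO = b"<< >>"

print(find_info_keys(SAMPLE_INFO))  # ['/Author', '/Producer', '/CreationDate']
print(find_info_keys(CLEAN_INFO))   # []
```

For a real file you would read it with `open(path, "rb").read()` and treat an empty result as a hint, not a guarantee.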
The upload problem — convenience at the cost of privacy
Most free online redaction tools work the same way: you upload your file, their servers process it, you download the result. This is the convenient approach, and for many documents it's probably fine. But if the document contains genuinely sensitive information, uploading it to a server you don't control creates a real privacy problem.
When you upload a document to an online tool:
- Your file travels over the internet to their servers
- It sits on their infrastructure while being processed
- It may be stored in logs, backups, or caches after processing
- Their privacy policy governs what happens to it — and you probably didn't read it
- If their servers are breached, your document could be exposed
Many tools claim they delete files immediately after processing. Some are probably telling the truth. But "probably" is not an acceptable standard when the document contains a Social Security number, a salary figure, a medical record, or privileged legal information. You have no way to verify what actually happens after you click upload.
The AI problem — a third party you didn't invite
AI-powered redaction tools add another layer to this problem. These tools promise to automatically detect and redact sensitive information — names, numbers, dates — without you having to identify it manually. Convenient in theory.
In practice, to analyze your document, the AI has to read it. Unless the model runs entirely on your own device, that means your document gets sent to wherever the model is hosted, typically a third-party API from a major provider. Your document has now left the redaction tool's servers and traveled to an entirely separate company's infrastructure.
Did you read the AI provider's privacy policy? Did you read the redaction tool's terms to understand which AI provider they're using and what that provider's data retention policy is? Is your document being used to train future AI models? You almost certainly don't know the answer to any of these questions. The redaction tool's privacy policy might not even tell you — it depends on their agreement with the AI provider, which you're not party to.
This isn't a hypothetical concern. It's a structural problem with any tool that processes your data through a third-party AI API. The convenience of automatic detection comes with a chain of data exposure you can't fully audit.
How RedactPDF approaches this differently
RedactPDF processes everything locally in your browser. Your file is never uploaded anywhere. When you open a PDF, it loads directly into your browser's memory using open-source libraries. When you draw redaction boxes and download the result, the entire process happens on your device: no network requests, no servers, no third parties.
You can verify this yourself. Load the site, then disconnect from the network. The tool still works completely. Or open your browser's developer tools, go to the Network tab, and use the tool. You'll see zero outbound requests carrying your document data. There is nothing to intercept because nothing leaves your machine.
The redaction itself uses true rasterization — each page is rendered to a PNG image with your redactions burned in as pixels, then those images are assembled into a new PDF. The text layer from the original document is never written into the output file. There is no text to recover.
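One way to spot-check any redacted output, from any tool, is to search it for a string you know was under a box. The sketch below looks for such a "canary" in the raw bytes and in every stream that inflates as zlib data. The helper names and the fake streams are assumptions made for illustration. A hit is conclusive; a miss is not proof, because text can also be stored hex-encoded, through custom font encodings, or under filters this sketch does not decode.

```python
import re
import zlib
from collections.abc import Iterator

# Naive stream finder; a real PDF parser would honor each stream's /Length.
STREAM = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def inflated_streams(pdf_bytes: bytes) -> Iterator[bytes]:
    """Yield the decompressed contents of every stream that is zlib data."""
    for match in STREAM.finditer(pdf_bytes):
        try:
            # decompressobj tolerates a missing or extra trailing byte.
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data, or a filter this sketch ignores


def contains_canary(pdf_bytes: bytes, canary: bytes) -> bool:
    """Report whether canary appears verbatim in the file or its streams."""
    if canary in pdf_bytes:
        return True
    return any(canary in data for data in inflated_streams(pdf_bytes))


def fake_stream(payload: bytes) -> bytes:
    """Build a hypothetical Flate-compressed stream object body for the demo."""
    return (b"<< /Filter /FlateDecode >>\nstream\n"
            + zlib.compress(payload) + b"\nendstream\n")


secret = b"Account 4417-1234"
covered = fake_stream(b"BT (" + secret + b") Tj ET 0 0 0 rg 0 0 200 20 re f")
rasterized = fake_stream(bytes(64))  # stand-in for pixel data only

print(contains_canary(covered, secret))     # True
print(contains_canary(rasterized, secret))  # False
```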
Metadata is explicitly stripped from the output — all standard fields are cleared before the file is saved to your device.
The tradeoff is that you identify what to redact yourself. There's no AI scanning your document for sensitive patterns. For most personal use cases, such as redacting a salary, an account number, or a name, that takes thirty seconds and is not a meaningful burden. For the use cases where AI detection would matter most, the convenience is probably not worth the privacy cost anyway.
Privacy and convenience rarely coexist. Most tools that make redaction easier do so by taking custody of your document. The only way to keep a document truly private during redaction is to never let it leave your device.
Redact privately — everything stays on your device
No upload, no AI, no third parties. Free, open source, works offline.