Deterministic PDF
Summary
Can a PDF be created in a 'deterministic' way, so that generating it twice from the same input yields output files that are 'byte-for-byte identical'
(i.e. files that have the same hash value)?
Why or why not?
Other initial thoughts
- Even if the 'format' / specification says this is possible, only some tools may actually be able to accomplish it.
- If it is not possible, can the content of a PDF (all PDFs or only a specific subset of versions / types) be exported and 'normalized' so that duplicates can be identified and similar versions can be easily compared for substantive differences (ideally automatically / without human intervention)?
- Regardless, a simple desktop utility could be written to graphically show the metadata (internal dates, content IDs and other 'automatic' metadata, filesystem metadata such as dates and size, and user-entered metadata such as titles and creator names) and other derived / calculated info (e.g. 'content hashes', number of pages / words / images / etc., page sizes / orientations, image resolutions, etc.) of two or more PDF files (six or fewer is probably a practical limit from a UX perspective).
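As a rough starting point for such a utility, the filesystem-level part of the comparison can be sketched with the Python standard library alone. The function names `file_summary` and `group_identical` are hypothetical and chosen only for illustration. This sketch reads no PDF-internal metadata; it only collects size, modification time and a raw SHA-256 hash, and groups files that are byte-for-byte identical.

```python
import hashlib
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path


def file_summary(path):
    """Collect filesystem metadata and a raw content hash for one file."""
    p = Path(path)
    stat = p.stat()
    return {
        "name": p.name,
        "size_bytes": stat.st_size,
        # Modification time as reported by the filesystem, shown in UTC.
        "modified_utc": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
        # Hash of the raw bytes: equal only for byte-for-byte identical files.
        "sha256": hashlib.sha256(p.read_bytes()).hexdigest(),
    }


def group_identical(paths):
    """Return lists of paths whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_summary(path)["sha256"]].append(str(path))
    return [group for group in groups.values() if len(group) > 1]
```

Calling `group_identical(["a.pdf", "b.pdf", "c.pdf"])` (placeholder file names) would return one list per set of exact duplicates. Files that differ only in internal dates or IDs would not be grouped, which is exactly the limitation the question above is about.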
Details
Variable data
- Compression algorithms
- PDF metadata / internal IDs (e.g. the /CreationDate and /ModDate entries and the file identifier /ID in the trailer)
- Order of dictionary items
- ... ?
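To illustrate how some of the variable data listed above might be neutralized before hashing, here is a minimal and deliberately naive sketch. It assumes the date entries and the trailer /ID are stored as plain text in uncompressed, unencrypted objects and that the /ID uses hex strings; it does not handle object streams, XMP metadata, compression differences or dictionary ordering. The name `normalized_hash` is hypothetical.

```python
import hashlib
import re

# Heuristic patterns for fields that commonly differ between otherwise
# identical exports. Assumption: they appear as plain text in the file.
VOLATILE_PATTERNS = [
    # /CreationDate (...) and /ModDate (...) literal strings.
    re.compile(rb"/(?:CreationDate|ModDate)\s*\((?:\\.|[^\\)])*\)"),
    # Trailer file identifier written as two hex strings: /ID [<..> <..>].
    re.compile(rb"/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]"),
]


def normalized_hash(pdf_bytes: bytes) -> str:
    """Hash the file after removing known volatile fields.

    The stripped bytes are only used for comparison; they are not a
    valid PDF, because removing bytes invalidates cross-reference offsets.
    """
    masked = pdf_bytes
    for pattern in VOLATILE_PATTERNS:
        masked = pattern.sub(b"", masked)
    return hashlib.sha256(masked).hexdigest()
```

Two exports that differ only in these fields would then produce the same `normalized_hash`, while their raw hashes differ. Any other source of variation in the list still changes the result, so this is a way to probe the problem rather than a solution to it.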
See also
References