Finding image redactions in PDFs

anderspeders · April 1, 2017, 6:35pm

Hi everyone,

I have approximately 1,000 image-based PDFs and am trying to find out whether there’s an automated process for identifying redactions in the page images, understood here as blacked-out areas of the image.

Think, for example, of redactions in Freedom of Information requests: MuckRock's FOIA redaction hall of shame.

Any suggestions would be most welcome.

Best,
Anders

todrobbins · April 6, 2017, 9:37pm

Here is the process for redacting documents via Adobe Acrobat DC. It might be useful for thinking through how the PDFs were created. As for identifying redaction boxes, I asked around for you but am still waiting for an answer:

https://twitter.com/todrobbins/status/850099962918559744

todrobbins · April 6, 2017, 10:20pm

https://twitter.com/k_grons/status/850107082246369280
https://twitter.com/k_grons/status/850109330552705025
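
For the blacked-out areas described in the original question, one possible starting point is to look for large, solidly filled dark regions on each page. The sketch below is only an illustration and not something proposed in this thread. It assumes each page has already been rasterized to a grid of grayscale values (for instance with a tool such as pdftoppm), and the function name `find_redaction_boxes` and all three thresholds are hypothetical and would need tuning on real scans.

```python
from collections import deque


def find_redaction_boxes(page, dark_max=40, min_area=400, min_fill=0.9):
    """Return (top, left, bottom, right) boxes of large, solid dark regions.

    page: list of rows, each a list of grayscale ints (0 = black, 255 = white).
    dark_max, min_area and min_fill are illustrative guesses, not tuned values.
    """
    height = len(page)
    width = len(page[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []

    for y in range(height):
        for x in range(width):
            if seen[y][x] or page[y][x] > dark_max:
                continue

            # Flood-fill one 4-connected component of dark pixels,
            # tracking its pixel count and bounding box as we go.
            seen[y][x] = True
            queue = deque([(y, x)])
            count = 0
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                count += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and page[ny][nx] <= dark_max):
                        seen[ny][nx] = True
                        queue.append((ny, nx))

            # Keep only regions that are big and almost completely filled:
            # text strokes and thin rules are small or sparse in their box.
            bbox_area = (bottom - top + 1) * (right - left + 1)
            if bbox_area >= min_area and count / bbox_area >= min_fill:
                boxes.append((top, left, bottom, right))

    return boxes
```

Calling `find_redaction_boxes(page)` on each page and flagging pages where the result is non-empty would give a rough shortlist for manual review. The fill-ratio check is what separates a solid black bar from dense text or a table border, but skewed scans, grey or white redaction boxes, and dark photographs would all need extra handling.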