Finding image redactions in PDFs


#1

Hi everyone,

For approx. 1,000 image-based PDFs, I am trying to find out whether there is an automated process for identifying redactions made to the page images, understood here as blacked-out areas of the image.

Think for example of redactions in Freedom of Information requests: https://www.muckrock.com/news/archives/2016/mar/14/muckrocks-redaction-hall-shame/

Any suggestions would be very welcome.

Best,
Anders


#2

Here is the process for redacting documents via Adobe Acrobat DC. It might be useful for thinking through how the PDFs were created. As for identifying redaction boxes, I asked around for you, but I am still waiting for an answer.
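
In the meantime, here is one possible starting point, offered as a rough sketch rather than a tested solution. It assumes each PDF page has already been rendered to a grayscale image and loaded as a 2D list of 0-255 values (the rendering step is not shown). The function name find_dark_boxes and the thresholds dark_max, min_area and min_fill are made up for illustration and would need tuning on real scans. The idea is to group connected dark pixels and keep only the groups that are large and fill their bounding box almost completely, which a blacked-out rectangle does and ordinary text usually does not.

```python
from collections import deque


def find_dark_boxes(gray, dark_max=40, min_area=200, min_fill=0.9):
    """Return (top, left, bottom, right) boxes of solid dark regions.

    gray is a 2D list of 0-255 grayscale values for one rendered page.
    All thresholds are illustrative guesses, not calibrated values.
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or gray[y0][x0] > dark_max:
                continue
            # Flood-fill one 4-connected component of dark pixels.
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            count = 0
            top, left, bottom, right = y0, x0, y0, x0
            while queue:
                y, x = queue.popleft()
                count += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and gray[ny][nx] <= dark_max):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Keep components that are big and nearly fill their bounding box.
            box_area = (bottom - top + 1) * (right - left + 1)
            if count >= min_area and count / box_area >= min_fill:
                boxes.append((top, left, bottom, right))
    return boxes
```

Calling find_dark_boxes(page) on each rendered page and flagging the pages where the list is non-empty would give a first-pass list of candidates to check by hand. Dark photographs, thick rules and scan borders would likely show up as false positives, and redactions drawn in white or grey would be missed entirely.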


#3