I'm not sure where to put this - I figured this was the best place. And this is a shot in the dark. I am working on my laptop which runs Ubuntu 9.10.
I have a series of PDF files I need to extract graphics from. There are a couple hundred - I know I can do it manually. But I was curious if there was a way to run a script that would extract the pages I need either in TIFF or PDF.
Basically I need to find any page with the date DEC 01/07 or OCT 01/08 and extract those pages. Any ideas? I'll settle for getting them in PDF or TIFF. I have to go through and crop them anyway, so I'm not too picky.
I would use something like PDF Toolkit (pdftk) to split them into individual pages and then grep for those two blocks of text. If the date is part of an embedded image, say a datestamp on a photograph inside the PDF, then things get complicated and you're looking at OCR experiments that probably won't work.
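Here is a rough sketch of that idea in Python, in case it helps. It is not tested against your files and makes a few assumptions: that pdftotext (from poppler-utils) and pdftk are both installed, that Python 3.7 or newer is available, and that the dates are real text rather than part of an image. Instead of grepping the split PDFs directly, it greps the output of pdftotext, because text inside a PDF is often stored compressed and a plain grep can miss it. The function names (matching_pages, pdftk_command, extract) and the "_dated.pdf" suffix are made up for illustration.

```python
import os
import subprocess
import sys

# Date stamps taken from the question above.
DATES = ("DEC 01/07", "OCT 01/08")


def matching_pages(text, dates=DATES):
    """Return 1-based page numbers whose text contains any of the dates.

    Assumes `text` is pdftotext output, where pages are separated by
    form feed characters.
    """
    pages = text.split("\f")
    return [n for n, page in enumerate(pages, start=1)
            if any(d in page for d in dates)]


def pdftk_command(src, pages, dest):
    """Build a pdftk command line that copies the given pages into dest."""
    return ["pdftk", src, "cat", *map(str, pages), "output", dest]


def extract(src, dest):
    """Write the dated pages of src to dest; return the page numbers found."""
    # pdftotext writes to stdout when the output file is "-".
    result = subprocess.run(["pdftotext", "-layout", src, "-"],
                            capture_output=True, encoding="utf-8",
                            errors="replace", check=True)
    pages = matching_pages(result.stdout)
    if pages:
        subprocess.run(pdftk_command(src, pages, dest), check=True)
    return pages


if __name__ == "__main__":
    # Hypothetical usage: python extract_dated.py *.pdf
    for path in sys.argv[1:]:
        base, _ext = os.path.splitext(path)
        print(path, extract(path, base + "_dated.pdf"))
```

Each input file that has matching pages gets a companion file containing only those pages. Converting that result to TIFF for cropping would be a separate step.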
Perfect, I searched and searched and couldn't figure anything out. The dates are at the bottom of a page that has a graphic on it. Basically I have to extract the pages into TIFF and then crop them. That should do it nicely. Appreciate it.