24 Feb, 2010, Keirath wrote in the 1st comment:

Votes: 0

I'm not sure where to put this - I figured this was the best place. And this is a shot in the dark. I am working on my laptop which runs Ubuntu 9.10.

I have a series of PDF files I need to extract graphics from. There are a couple hundred - I know I can do it manually. But I was curious if there was a way to run a script that would extract the pages I need either in TIFF or PDF.

Basically, I need to find every page with the date DEC 01/07 or OCT 01/08 and extract those pages. Any ideas? I'll settle for getting them as PDF or TIFF. I have to go through and crop them anyway, so I'm not too picky.

24 Feb, 2010, Barm wrote in the 2nd comment:

Votes: 0

I'm assuming you don't mean 'graphics' as in actual image files but instead single pages of a long PDF file?

Here's a screencast of someone using grep with PDF files:
http://www.linuxjournal.com/video/search...

I would use something like PDFToolkit (pdftk) to split them into individual pages and then grep for those two blocks of text. If the date is part of an embedded image, say a datestamp on a photograph inside the PDF, then things get complicated and you're looking at OCR experiments that probably won't work.
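Here's a rough sketch of how that could be scripted, in case it helps. It assumes the dates are real text in the PDF (not part of an image) and that both pdftotext (from poppler-utils) and pdftk are installed. Rather than splitting first, it dumps the text of the whole file, relies on pdftotext putting a form feed at the end of each page to work out page numbers, and then asks pdftk for just the matching pages. The function names and the "_dated.pdf" output suffix are made up for illustration.

```python
import subprocess
import sys
from pathlib import Path

# The two date strings from the original question.
DATES = ("DEC 01/07", "OCT 01/08")


def matching_pages(text, dates=DATES):
    """Return 1-based page numbers whose text contains any of the dates.

    Assumes pages are separated by form feeds, as in pdftotext output.
    """
    pages = text.split("\f")
    return [n for n, page in enumerate(pages, start=1)
            if any(d in page for d in dates)]


def pdftk_command(src, pages, dest):
    """Build the pdftk argument list that copies the given pages to dest."""
    return ["pdftk", str(src), "cat", *map(str, pages), "output", str(dest)]


def extract(src, dest):
    """Write the matching pages of src to dest; return the page numbers."""
    result = subprocess.run(
        ["pdftotext", str(src), "-"],  # "-" sends the text to stdout
        check=True, capture_output=True,
        encoding="utf-8", errors="replace",
    )
    pages = matching_pages(result.stdout)
    if pages:
        subprocess.run(pdftk_command(src, pages, dest), check=True)
    return pages


if __name__ == "__main__":
    # Hypothetical usage: python extract_dated.py *.pdf
    for name in sys.argv[1:]:
        src = Path(name)
        dest = src.with_name(src.stem + "_dated.pdf")
        print(src, "->", extract(src, dest) or "no matching pages")
```

If the extracted text breaks the date up oddly (extra spaces, line breaks), the plain substring match will miss it, so it's worth checking the pdftotext output for one file by hand first.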

24 Feb, 2010, Keirath wrote in the 3rd comment:

Votes: 0

Perfect. I searched and searched and couldn't figure anything out. The dates are at the bottom of a page that has a graphic on it. Basically, I have to extract the pages into TIFF and then crop them. That should do it well. Appreciate it.
