pdfsandwich: A tool to make "sandwich" OCR pdf files

Pdfsandwich generates "sandwich" OCR pdf files, i.e. pdf files which contain only images (no text) will be processed by optical character recognition (OCR) and the text will be added to each page invisibly "behind" the images. Pdfsandwich is a command line tool which is supposed to be useful to OCR scanned books or journals.

Background: I receive many PDF/A files from other people. (Unfortunately, these are almost always not marked as PDF/A or non-PDF/A.) I need to annotate these files (or a copy of them) in DT. If they are PDF/A, then with the very first annotation the text isn't "readable" anymore (all characters are "?"). Until now I used an Automator workflow to "flatten" these files before importing them into DEVONthink, after which annotating was possible. Another way was to save them with Adobe Acrobat. It would be a great simplification if I could recognize in DT that a file is a PDF/A, so that I didn't have to first check with Adobe Acrobat whether it is one. I already tried many command line tools to extract info from the files, but the format (PDF/X, PDF/A, …) couldn't be read. Does anyone have a solution or hint?

I generated some pdf/a files using ABBYY FineReader, and used the command line tool pdfinfo to look at the result.

First, plain old pdf:

~ % pdfinfo -meta pdfnoa.pdf
Page size: 515.5 x 696.5 pts (rotated 0 degrees)

Next, a file encoded with one of the versions of pdf/a:

~ % pdfinfo -meta pdf3.pdf
(XML metadata; the tags did not survive, only these values: application/pdf, Unknown Title, ABBYY FineReader PDF For Mac, ABBYY FineReader PDF For Mac, uuid:00002A8F-0281-78AE-1219-662C1A4CEFBB, PDF/UA Universal Accessibility Schema, "which part of ISO 14289 standard is followed", 1)

Detection of the pdf/a claim means scanning the metadata for pdfaSchema tags. Validating the claim is much harder, but there is no need to do that here. Probably easiest to install xpdf (using brew, if you have that), and call pdfinfo from an AppleScript.
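As a rough sketch of the detection step discussed in this thread (scanning the `pdfinfo -meta` output for a PDF/A claim), something like the following Python might work. It assumes `pdfinfo` (from xpdf or poppler) is on the PATH. The function names `pdfa_claim` and `pdfa_claim_for_file` are made up for illustration. Besides the pdfaSchema tags mentioned above, it also looks for the `pdfaid:part` and `pdfaid:conformance` properties, which is where the PDF/A identification schema records the claimed part and conformance level. This only detects the claim; it does not validate it.

```python
import re
import subprocess

# Match both the element form <pdfaid:part>1</pdfaid:part>
# and the attribute form pdfaid:part="1".
_PART = re.compile(r'pdfaid:part\s*(?:=\s*["\']|>)\s*(\d)')
_CONF = re.compile(r'pdfaid:conformance\s*(?:=\s*["\']|>)\s*([A-Za-z])')


def pdfa_claim(xmp_text):
    """Return a label such as 'PDF/A-1b' if the metadata claims PDF/A, else None."""
    part = _PART.search(xmp_text)
    if part:
        conf = _CONF.search(xmp_text)
        level = conf.group(1).lower() if conf else ""
        return "PDF/A-" + part.group(1) + level
    # Fallback heuristic taken from the thread: pdfaSchema tags are present.
    # Assumption: their presence is treated as a claim without a stated part.
    if "pdfaSchema" in xmp_text:
        return "PDF/A (part not stated)"
    return None


def pdfa_claim_for_file(path):
    """Run `pdfinfo -meta` on a file and inspect its output for a PDF/A claim."""
    result = subprocess.run(
        ["pdfinfo", "-meta", path],
        capture_output=True, text=True, check=True,
    )
    return pdfa_claim(result.stdout)
```

Calling `pdfa_claim_for_file("pdf3.pdf")` would then return either a label or `None`, and an AppleScript could call the same script via `do shell script` to tag records in DEVONthink.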
I cannot find anything in DEVONthink or in Mac OS X, e.g. "Info" in Finder, that can distinguish between PDF and PDF/A. In Preview, PDF/A documents are not editable (as intended by the PDF/A standard, but I don't know where that is "enforced"). Internet searching came up with tools available from Adobe and others. You can probably write your own tool if you know the content spec of PDF/A and PDF (probably published somewhere).

Re the OCR question, I ran an experiment. I created a PDF and a PDF/A of a 1-page document using a Brother ADS-1700W. Both files could be OCR-ed with DEVONthink's OCR tool into new files. Both files with OCR info (PDF+Text as labeled in DEVONthink) of course grew in size.

There could be some way in DEVONthink that I don't know about. Probably some sort of code could be written (AppleScript, Python, Perl, … whatever), using the specification for PDF vs. PDF/A, to be integrated with DEVONthink. Others can comment on both.
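For the "write your own tool" idea, a very small Python sketch that needs no external program could sort a folder of files by whether they appear to claim PDF/A. It relies on an assumption that will not hold for every file: that the XMP metadata packet is stored uncompressed, so the `pdfaid:part` property name shows up in the raw bytes. The names `looks_like_pdfa` and `split_by_claim` are hypothetical.

```python
from pathlib import Path

# XMP property that carries the PDF/A part number.
PDFA_MARKER = b"pdfaid:part"


def looks_like_pdfa(path):
    """Heuristic: True if the raw file bytes contain the PDF/A identification property."""
    return PDFA_MARKER in Path(path).read_bytes()


def split_by_claim(folder):
    """Return two lists of file names: those that claim PDF/A and the rest."""
    pdfa, other = [], []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        (pdfa if looks_like_pdfa(pdf) else other).append(pdf.name)
    return pdfa, other
```

A file that lands in the second list is not proven to be plain PDF; it only means the marker was not found in uncompressed form, so checking it with pdfinfo or Acrobat remains the safer route.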