Document Optimisation Demonstration
A raw PDF file that is scanned directly from an image contains a lot of noise and background data that is not necessary for the user's purpose, and its file size can be considerable. In that case, the PDF document can be optimised to eliminate noise, speckles and background images or information. The following image shows one of the properties of optimisation:
As a rule, if the electronic documents are not expected to go to print and are intended solely for on-screen reading, optimisation can greatly reduce overheads such as file size. As you can see, the edges of the characters in the optimised image are much more clearly defined than in the one without optimisation. It should also be noted that the optimised PDF is almost four times smaller than the original (30 KB against 115 KB). These results do, however, depend on the image itself; a more colourful document might give a different result. The files are available for viewing and download via the links below:
PDF without Optimisation - Original scan from sample document [ File Size: 115 KB ]
PDF with Optimisation - Reduced File Size / Optimised Image Quality [ File Size: 30 KB ]
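The size figures quoted for the two sample files can be checked with a little arithmetic. The short Python sketch below is only an illustration: the function name size_reduction is hypothetical and is not part of any optimisation tool, and the only inputs are the two file sizes listed above.

```python
def size_reduction(original_kb: float, optimised_kb: float) -> tuple[float, float]:
    """Return (how many times smaller, percentage of the original size saved)."""
    ratio = original_kb / optimised_kb
    percent_saved = (1 - optimised_kb / original_kb) * 100
    return ratio, percent_saved


# File sizes taken from the two sample PDFs listed above.
ratio, percent_saved = size_reduction(115, 30)
print(f"{ratio:.2f} times smaller, {percent_saved:.1f}% of the original size saved")
```

For the sample document this works out to a little under four times smaller, or roughly three quarters of the original size saved. A different source image would give different numbers.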
Notes on Test Conditions
Standard A4 recycled paper was used, since the purpose of this demonstration is to exhibit the capabilities of OCR. The scan was conducted at a resolution of 300 DPI with the image settings Black and White / Text. You will need the latest version of Adobe Reader to view the above PDF documents.