1025 Monarch Street Suite 170 Beaumont Centre Lexington, KY 40513 tel. 859.252.6225 fax 859.252.6528

Exploring the Best File Formats for Document Management

Hello, my name is Sarah Smith and I am part of the ISV Business Development team at Fujitsu Computer Products of America. My primary role as an ISV BD manager is to work with ISVs like TrinSoft to ensure that we have a seamless solution between our scanners and their software. I have learned quite a bit about document imaging and document management in this role over the years, and I have found that sometimes my greatest value is sharing what I have learned.

That being said, TrinSoft has asked me to be a guest writer and proposed the topic of PDF vs. TIFF in terms of strengths and weaknesses. Apparently there are a lot of questions out there as to which format is better for ECM (Enterprise Content Management) solutions. I have done a little research and have also consulted my ECM guru, Pam Doyle, to gather information and answer this question.

So, let’s start with the history of the two formats. TIFF was created in the 1980s as a standard file format for storing scanned images. It was originally created by Aldus Corporation, but is now controlled by Adobe. When we refer to the TIFF format in document imaging, there are two formats being used. There is Group 4, which is the standard TIFF format used for bitonal (black and white) images, and there is JPEG, which comes into play when dealing with color images.

PDF was designed by Adobe in 1993 and became an open standard that was officially published in July of 2008 by the ISO (International Organization for Standardization). PDF is your standard PDF file; PDF/A is a format approved by AIIM and ISO and designed for the long-term archiving of electronic documents. That means PDF/A documents will be able to be reproduced exactly the same way in years to come. The PDF/A standard will not go away.
TIFF has been the favored format in document imaging for as long as document imaging has existed, but it appears as though that might be changing. This may be because the PDF format was developed to improve on TIFF: it can store both image and text, and it has an easy migration path to long-term preservation via PDF/A.

So, getting back to the specifics, I picked out what I think are the most significant features of the two file formats.

File Size - Probably one of the biggest concerns, since some users are looking to store massive amounts of information in their ECM or document management solution. In looking at file size we are going to consider TIFF Group 4 (standard TIFF format for scanned, bitonal images), JPEG (standard TIFF file format for scanned color images), image-only PDFs and searchable PDFs (a PDF that has been run through an OCR process, which takes the PDF from being just an image to being an image with recognizable text). Typically a TIFF Group 4 image will be your smallest file type and a searchable PDF will be your largest. When it comes to color documents, a PDF will be smaller than a JPEG. This is because scanning to PDF in color applies a compression function (JPEG 2000) that is not available when scanning to JPEG.

Searchability - When looking at searchability you have two options: do you want the entire document searchable, or only keywords? In an environment where scanning is a major part of the organization’s business process, users will usually want to store only key bits of information related to each document, which gives rhyme and reason to the way these documents are searched and retrieved. If this sounds like a method that your organization will like, then TIFF or image-only PDF is going to be the way to go. The other option is to OCR the entire document so that you can search for and find the document based on any word in it. This would be a searchable PDF.
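To make the two search options concrete, here is a minimal Python sketch. Everything in it (the `ScannedDocument` class, the field names, the sample files) is hypothetical and invented for illustration; it is not taken from any particular ECM product. It only shows the difference between looking a document up by a few stored index fields and searching every word that OCR recognized.

```python
from dataclasses import dataclass


@dataclass
class ScannedDocument:
    """A scanned file plus the two kinds of searchable data discussed above."""
    filename: str
    index_fields: dict[str, str]  # key bits of information stored with the file
    ocr_text: str = ""            # full page text; empty for TIFF or image-only PDF


def find_by_index(docs, field, value):
    """Keyword-style retrieval: match one index field exactly (case-insensitive)."""
    return [d.filename for d in docs
            if d.index_fields.get(field, "").lower() == value.lower()]


def find_by_full_text(docs, word):
    """Full-text retrieval: match any text the OCR engine recognized."""
    return [d.filename for d in docs if word.lower() in d.ocr_text.lower()]


# Hypothetical sample data, invented for illustration only.
docs = [
    ScannedDocument("inv-001.tif", {"vendor": "Acme Supply", "type": "invoice"}),
    ScannedDocument("letter-007.pdf", {"vendor": "Acme Supply", "type": "letter"},
                    ocr_text="Please note our change of address effective June 1."),
]

print(find_by_index(docs, "vendor", "acme supply"))
print(find_by_full_text(docs, "address"))
```

In this sketch the index lookup finds both files because both carry a vendor field, while the full-text search can only find the file that actually has OCR text stored with it.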
Of course, a searchable PDF sounds like the easiest way to go, but understand that it is not a very organized, methodical way of searching or retrieving documents on a larger scale. If you are looking at doing this for your personal records like bills and change of address forms, go for it. But remember that searchable PDFs are large files, so they are going to take up a lot of space.

Metadata - Think of metadata as a very important aspect of searchability. Metadata is also known as index information: key information related to the document that is used to store and search for that document. Think of a folder holding a bunch of documents, all related to Sarah Smith and her employment with FCPA. On the top of that folder there is likely to be a tab with the words Sarah Smith HR. Those words, Sarah Smith HR, would be the metadata, or index information, for those documents. A more typical example would be an invoice from a vendor. When indexing an invoice one would typically choose to identify the date, the name of the vendor, the invoice number and maybe the dollar amount of the invoice. When it comes to automatically indexing documents, as high-end capture solutions do, TIFF is the preferred format. If you plan on manually indexing these documents, both TIFF and PDF will allow it. Just know that if you plan on making all of your documents searchable PDFs (again, not a recommended method for document imaging), you can also assign metadata to those PDFs, but it is kind of pointless and redundant.

Viewing - If the documents being scanned are going to be shared with people outside of your department or organization, you are going to want them to be in a file format the recipient can open.
Both TIFF and PDF have widely available viewers, but remember that your average person with no document imaging experience will know what a PDF is and exactly how to open it. They are less likely to know what a TIFF is or how to open it. Also, if you have documents with multiple pages, you are less likely to run into issues viewing multi-page PDFs than multi-page TIFFs.

Below is a high-level overview of each file format we have discussed.

TIFF - This format is great for black and white documents, smaller file sizes and very specific metadata or index information. Users that are scanning a lot of documents and have a lot of automated processes will usually go with the TIFF format. Also note that most advanced capture solutions are optimized for the TIFF format.

JPEG - It is a standard color format that is widely known and used, but it produces large files because of the color. The point here is to scan documents in color only when really needed. If the color is not important, allow the scanner to scan in black and white.

PDF and PDF/A - It is becoming more popular for organizations to leverage this file format for their document management solution. It might be a slightly larger file size, but almost anyone will be able to open and view the file. PDF files support both color and black and white. PDF also supports metadata/indexing and it can also be very secure. You can assign a password to open it, and if any changes are ever made to the PDF, those changes will be noted in the document properties. (TIFF does not have this capability.)

Searchable PDF - This is a great format if you want to edit the document or extract a lot of information beyond metadata. Just remember that running an OCR engine takes time and can bog down a system. Also remember that searchable PDFs are large files.
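The overview above can be condensed into a rough decision helper. The Python sketch below is only one possible reading of it: the function name `suggest_format`, its three yes/no parameters and the order of the rules are assumptions made for illustration, not a rule from Fujitsu or TrinSoft.

```python
def suggest_format(color: bool, full_text_search: bool, shared_externally: bool) -> str:
    """Suggest a scan format from three yes/no questions.

    The name, parameters and rule order are assumptions made for this
    sketch; they paraphrase the overview above and are not an official rule.
    """
    if full_text_search:
        # Searching on any word requires OCR text stored with the image.
        return "Searchable PDF"
    if shared_externally or color:
        # PDF opens almost anywhere and handles color as well as black and white.
        return "PDF"
    # Black and white, internal use, index-based retrieval: the smallest files.
    return "TIFF Group 4"


print(suggest_format(color=False, full_text_search=False, shared_externally=False))
print(suggest_format(color=True, full_text_search=False, shared_externally=False))
print(suggest_format(color=False, full_text_search=True, shared_externally=True))
```

A real deployment would weigh more factors, such as long-term archiving (PDF/A) or what the capture software is optimized for, so treat this as a starting point for discussion and not as a rule.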
There really isn’t a right or wrong file format when it comes to scanning images; it is just a matter of understanding the strengths and weaknesses of each format so that you know which one to use when. If you have further questions on this matter, or any questions related to scanning documents, please feel free to contact me. We have lots of experts here at Fujitsu and are happy to share our knowledge where we can. Thank you for taking the time to read this article.

Sarah Smith
Fujitsu Computer Products of America
ISV Business Development
[email protected]
trindocs.com