Monday, November 3, 2008

Google Making Scanned Docs Searchable Worldwide

Being able to search for scanned documents within your organization is a necessary part of any successful document imaging system. But maybe, in focusing so much on intra-organizational search, we're missing the bigger picture.

For instance, Internet giant Google recently revamped its technology in hopes of making scanned documents as searchable as any other Web page.

As this post indicates and your experience may corroborate, PDF files have been showing up in Google search results for a while now. Until now, though, Google was merely reading the "metadata" of these files, i.e., the tags attached to them, such as document title and keywords.

Now Google has created a solution whereby the actual scanned documents are "read" by its search technology through a method called "Optical Character Recognition" (OCR), which turns the image of a page into text that can be indexed. Google describes how this is being done on the Google blog. Right now, it works only on PDF docs.
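To make the difference concrete, here is a toy sketch in Python of metadata-only search versus search that also covers the text recovered from the page. Everything in it is an assumption made for illustration: the `ScannedDoc` class, the sample documents, and the `ocr_text` field, which simply stands in for whatever an OCR step would produce. It says nothing about how Google's system actually works.

```python
from dataclasses import dataclass, field


@dataclass
class ScannedDoc:
    """Hypothetical stand-in for a scanned PDF."""
    title: str
    keywords: list = field(default_factory=list)
    # Text an OCR step would recover from the page image (assumed given here).
    ocr_text: str = ""


def tokens(text):
    """Split text into lowercase words with surrounding punctuation removed."""
    words = {w.strip(".,;:!?\"'()").lower() for w in text.split()}
    return words - {""}


def metadata_terms(doc):
    """Terms visible when only the tags (title, keywords) are read."""
    return tokens(doc.title) | {k.lower() for k in doc.keywords}


def full_text_terms(doc):
    """Terms visible once the OCR text is read as well."""
    return metadata_terms(doc) | tokens(doc.ocr_text)


def search(docs, query, terms_fn):
    """Return titles of documents whose terms include the query word."""
    q = query.lower()
    return [d.title for d in docs if q in terms_fn(d)]


docs = [
    ScannedDoc("Quarterly Report", ["finance"],
               "Revenue from the Denver office rose sharply."),
    ScannedDoc("Lease Agreement", ["legal"],
               "The tenant shall pay rent monthly."),
]

print(search(docs, "denver", metadata_terms))   # metadata only
print(search(docs, "denver", full_text_terms))  # metadata plus OCR text
```

A word that appears only on the scanned page, like "Denver" here, is invisible to the metadata-only search and turns up once the page text is included.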

This is a very interesting development, and perhaps a huge one. Even today, think of all the paper documents that are not searchable to the world at large. As humongous as the Web already is, it's growing by leaps and bounds every day!

We are going to have to keep a sharp eye on this developing story.
