6 messages in com.googlegroups.google-desktop-developer
Re: Saving an image as a text file

From      Sent On
Socrates  11 Feb 2006 11:51
Socrates  11 Feb 2006 12:07
Cat       11 Feb 2006 16:36
Socrates  12 Feb 2006 03:35
Tommi     13 Feb 2006 11:39
Socrates  13 Feb 2006 12:27

Subject:Re: Saving an image as a text file
From:Tommi (toma@gmail.com)
Date:02/13/2006 11:39:38 AM
List:com.googlegroups.google-desktop-developer

I took a look at the text file on your site, and it seems to contain a lot of repeated words, which will compress fairly well. The test Cat did used a dictionary of words, which I think will give more realistic results.
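
To illustrate why repeated words matter, here is a minimal Python sketch. It is not Cat's actual test, whose details aren't given here. It compares zlib compression of text built from a tiny, heavily repeated vocabulary against text drawn from a large vocabulary of random "words". The vocabularies, sizes and seed are hypothetical choices made only for illustration.

```python
import random
import zlib

def compression_ratio(text: str) -> float:
    """Return compressed size / original size using zlib (DEFLATE), level 9."""
    data = text.encode("ascii")
    return len(zlib.compress(data, 9)) / len(data)

rng = random.Random(42)  # fixed seed so runs are repeatable

# Hypothetical small vocabulary: only 8 distinct words, heavily repeated.
small_vocab = ["red", "green", "blue", "dark", "light", "pixel", "row", "end"]
repetitive = " ".join(rng.choice(small_vocab) for _ in range(5000))

# Hypothetical large vocabulary: 5000 random 6-letter "words".
letters = "abcdefghijklmnopqrstuvwxyz"
big_vocab = ["".join(rng.choice(letters) for _ in range(6)) for _ in range(5000)]
varied = " ".join(rng.choice(big_vocab) for _ in range(5000))

print(f"repetitive text: {compression_ratio(repetitive):.2f}")
print(f"varied text:     {compression_ratio(varied):.2f}")
```

The repetitive text should shrink to a much smaller fraction of its size than the varied text. Exact ratios depend on the seed and the zlib version.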

I'm very interested in your algorithm for converting an image to text. Is the original image already compressed (e.g. with an image format such as JPEG, GIF or PNG), or is it perhaps an uncompressed BMP?

One common way of converting binary data to text is base64 encoding. Base64 uses an alphabet of 64 characters, so each character carries 6 bits: every 3 input bytes (24 bits) become 4 output characters. The output is therefore always about 33% larger than the input. You can of course compress the base64 output back into binary, e.g. with zip compression, but in my experience the result will not be smaller than the original if the original was already compressed.
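
Here is a minimal Python sketch of both points, using only the standard library. It uses `os.urandom` bytes as a stand-in for an already-compressed image, on the assumption that JPEG or PNG data is close to random. It is not a real image file. The helper name `base64_round_trip_sizes` is hypothetical.

```python
import base64
import os
import zlib

def base64_round_trip_sizes(payload: bytes) -> tuple[int, int, int]:
    """Return (original, base64-encoded, zlib-compressed base64) sizes in bytes."""
    encoded = base64.b64encode(payload)       # every 3 input bytes -> 4 ASCII characters
    recompressed = zlib.compress(encoded, 9)  # DEFLATE the text back into binary
    return len(payload), len(encoded), len(recompressed)

# Assumption: random bytes approximate already-compressed image data.
already_compressed = os.urandom(30_000)
orig, b64, rezipped = base64_round_trip_sizes(already_compressed)
print(orig, b64, rezipped)  # b64 is exactly 40000; rezipped stays above 30000
```

Base64 always takes 4 * ceil(n / 3) characters for n input bytes. Each character carries only 6 bits of information, so compressing the text can at best squeeze it back to about the original size. It cannot go below that, because the data was already compressed.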

Can you share more on the algorithm?