I took a look at the text file on your site and it seems to have a lot
of repeating words, which will compress fairly well. The test that Cat
did used a dictionary, which I think will give more realistic results.
I'm very interested in knowing about your algorithm for converting an
image to text. Is the original image already compressed, e.g. saved in
a compressed format such as JPEG, GIF or PNG, or is it an uncompressed
BMP perhaps?
One common way of converting binary data to text is base64 encoding.
Base64 maps every 3 bytes of input to 4 characters drawn from a
64-character alphabet (6 bits per character), which means that the
output is always about 33% larger than the input. You can of course
compress the base64 output back into binary, e.g. with zip, but the
result is not (at least in my experience) going to be smaller than the
original if the original was already compressed.
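To make the size argument concrete, here is a rough Python sketch. It
is not your algorithm, just an illustration of the base64 point. I am
assuming random bytes from os.urandom are a fair stand-in for an
already-compressed image (both look like noise to a compressor), and
the function names are my own.

```python
import base64
import os
import zlib


def base64_overhead(raw: bytes) -> float:
    """Return the ratio of base64 output length to input length."""
    return len(base64.b64encode(raw)) / len(raw)


def roundtrip_sizes(raw: bytes) -> dict:
    """Encode raw bytes as base64, then compress that text with zlib."""
    encoded = base64.b64encode(raw)
    recompressed = zlib.compress(encoded, 9)  # 9 = strongest compression
    return {
        "raw": len(raw),
        "base64": len(encoded),
        "base64_then_zlib": len(recompressed),
    }


# Assumption: random bytes stand in for an already-compressed image.
sample = os.urandom(30000)
sizes = roundtrip_sizes(sample)
print(sizes)
```

The base64 figure comes out at 4/3 of the input, and compressing the
base64 text gets back close to the original size but not below it,
since the input had no redundancy left to remove.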
Can you share more on the algorithm?