15 messages in com.mysql.lists.mysqlRE: MySQL as document storage?
FromSent OnAttachments
Steve Folly07 Jan 2004 12:55 
Ian O'Rourke07 Jan 2004 13:06 
Leonardo Javier Belén07 Jan 2004 13:20 
Japheth Cleaver07 Jan 2004 13:35 
col...@dreamwerx.net07 Jan 2004 13:51 
Richard Davies07 Jan 2004 14:21 
Rob Brahier07 Jan 2004 14:24 
Steve Folly07 Jan 2004 15:13 
Steve Folly07 Jan 2004 15:18 
Colbey07 Jan 2004 15:24 
Andy Eastham08 Jan 2004 02:08 
Richard Davies08 Jan 2004 06:47 
Troy T. Hall08 Jan 2004 07:25 
Fred van Engen08 Jan 2004 08:49 
Richard Davies08 Jan 2004 09:47 
Subject:RE: MySQL as document storage?
From:Rob Brahier (ra@custom-mobility.com)
Date:01/07/2004 02:24:08 PM
List:com.mysql.lists.mysql

I made an online file manager using PHP and MySQL some years ago, and am now embedding something similar into my office's database front-end. I decided to store our files in the file system rather than the database in order to keep the DB size low. A benefit of this is it takes less time to restore a backup of the database than it would if I were dealing with the extra gigabytes of embedded files (which I can restore on an individual basis).

As for indexing, a lot of the document retrieval solutions out there just go by metadata when you do a file search. Business class scanning systems offer you the option of embedding user-supplied metadata in your scanned files so adding your own keywords is an option. In my experience you are better off going by just some supplied keywords and metadata rather than the full text of a document because you end up with more relevant results. The exception to this is when you are just dying to know how many documents contain the word "pie". If you find that this is the case then you obviously have the free time needed to build some extra indexes... ;)

Thanks,

-----Original Message----- From: Steve Folly [mailto:mys@spfweb.co.uk] Sent: Wednesday, January 07, 2004 3:56 PM To: MySQL MySQL Subject: MySQL as document storage?

Hi,

(disclaimer - this thread could easily go off topic; I'm interested only in the MySQL aspects of what follows...)

At work we are currently investigating ways of filing all our electronic documents.

There is commercial software that will do this I know, but I was wondering whether MySQL would be suitable for this type of thing.

The 'documents' could be literally any binary file. My idea would be to create a table with a blob column for the document itself, and document title, reference number, keywords, other meta-data. And a web-based front-end to search and serve documents.

Although the documents could be any file, the majority would be textual documents (Word documents, PDF, etc). How would one go about indexing such data, since full text searches operate on textual columns?

How to cope with columns exceeding the max packet length? Why is there a max_packet_length setting; surely this is low-level stuff that shouldn't affect query and result sizes?

Is storing the actual documents in the database such a good idea anyway? Perhaps store the file in a file system somwhere and just store the filename?

If anyone has experience in doing (or been dissuaded from doing) this kind of application your thoughts and comments would be appreciated. (If only to tell me "don't be so stupid, it'll never work" :)

Thanks.