18 messages in com.mysql.lists.win32: Re: how scalable in reality?

| From | Sent On | Attachments |
|---|---|---|
| Rodrigo A. Escobar P. | 23 Aug 2002 19:16 | |
| DL Neil | 24 Aug 2002 02:19 | |
| Doru-Catalin Togea | 24 Aug 2002 13:36 | |
| leo g. divinagracia iii | 26 Aug 2002 19:49 | |
| Erko[VATEVER.ee] | 26 Aug 2002 22:22 | |
| Ryan P. Rogers | 27 Aug 2002 05:59 | |
| Iikka Meriläinen | 27 Aug 2002 07:12 | |
| Faulkingham, Colin | 27 Aug 2002 08:11 | |
| Joshua Nicholson | 27 Aug 2002 08:55 | |
| Faulkingham, Colin | 27 Aug 2002 10:50 | |
| Ryan P. Rogers | 27 Aug 2002 17:08 | |
| oceanare pte ltd | 27 Aug 2002 22:00 | |
| Faulkingham, Colin | 28 Aug 2002 05:43 | |
| Jools Chesters | 29 Aug 2002 04:15 | |
| Dave Watkinson | 29 Aug 2002 04:38 | |
| Peter Goggin | 02 Sep 2002 20:05 | |
| Support | 03 Sep 2002 06:27 | |
| Andre Moll | 03 Sep 2002 23:00 |

| Subject: | Re: how scalable in reality? |
|---|---|
| From: | Iikka Meriläinen (Iikk...@pato.vaala.fi) |
| Date: | 08/27/2002 07:12:56 AM |
| List: | com.mysql.lists.win32 |
Hi,
Before planning anything further, consider one thing seriously: BLOBs are painfully slow! I suppose searching Google for any benchmarks will show the pure numbers, but the performance increase you will get using only references to OS files in the database is huge. It may be easier to manage a single database with all the data you need, but it will be quite slow for real use.
With appropriate hardware (=LOTS of RAM and dual/quad CPUs) MySQL can scale up to thousands of simultaneous users, but if you use BLOBs a lot, the performance will be a real problem.
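The "references to OS files" approach can be sketched roughly as follows. This is only an illustration, not something from the thread: it uses Python and the standard-library `sqlite3` module as a stand-in for MySQL so that it runs on its own, and the table name `media_file`, the functions `store_file` and `load_file`, and the hash-based directory layout are all hypothetical choices.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def store_file(conn, root, name, data):
    """Write the bytes to disk; keep only path, size and hash in the database."""
    digest = hashlib.sha256(data).hexdigest()
    # Fan out by hash prefix so no single directory grows too large (assumed layout).
    rel_path = Path(digest[:2]) / digest
    abs_path = Path(root) / rel_path
    abs_path.parent.mkdir(parents=True, exist_ok=True)
    if not abs_path.exists():
        abs_path.write_bytes(data)
    cur = conn.execute(
        "INSERT INTO media_file (name, rel_path, size_bytes, sha256) "
        "VALUES (?, ?, ?, ?)",
        (name, rel_path.as_posix(), len(data), digest),
    )
    conn.commit()
    return cur.lastrowid


def load_file(conn, root, file_id):
    """Look up the stored path by id and read the payload from the filesystem."""
    row = conn.execute(
        "SELECT rel_path FROM media_file WHERE id = ?", (file_id,)
    ).fetchone()
    if row is None:
        raise KeyError(file_id)
    return (Path(root) / row[0]).read_bytes()


def demo():
    # In-memory SQLite stands in for the real database server here.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE media_file ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, rel_path TEXT NOT NULL, "
        "size_bytes INTEGER NOT NULL, sha256 TEXT NOT NULL)"
    )
    with tempfile.TemporaryDirectory() as root:
        file_id = store_file(conn, root, "shot_001.tif", b"\x00\x01binary payload")
        return load_file(conn, root, file_id)
```

The database row stays small however large the file is, so queries over names, sizes and versions never have to touch the binary payload.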
About those "data blades" then, MySQL has no such built-in feature. I'm not aware of any 3rd party modules specifically tailored for MySQL use and this purpose. Many commercial GIS solutions use Oracle Spatial as their imagery engine. But especially if using external binary files for the real data instead of BLOBs, one could easily create a module that scans the images for contrast/color changes etc.
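A module of that kind might look something like the sketch below. It is an assumption-laden illustration: it takes already-decoded 8-bit grayscale pixel values as input (decoding the image file is out of scope), uses the standard deviation of those values as a crude contrast measure, and keeps the index in memory. The names `image_features` and `FeatureIndex` are hypothetical, and the file ids are assumed to be the unique identifiers stored in the database.

```python
import statistics


def image_features(gray_pixels):
    """Return (mean brightness, contrast) for 8-bit grayscale pixel values."""
    mean = statistics.fmean(gray_pixels)
    # Population standard deviation serves as a simple contrast measure.
    contrast = statistics.pstdev(gray_pixels)
    return mean, contrast


class FeatureIndex:
    """External index that refers to files only by their database id."""

    def __init__(self):
        self._features = {}  # file_id -> (mean brightness, contrast)

    def add(self, file_id, gray_pixels):
        self._features[file_id] = image_features(gray_pixels)

    def find_by_contrast(self, minimum):
        """Return the ids of all files whose contrast is at least `minimum`."""
        return sorted(
            file_id
            for file_id, (_, contrast) in self._features.items()
            if contrast >= minimum
        )


index = FeatureIndex()
index.add(1, [128] * 16)     # flat gray image, no contrast
index.add(2, [0, 255] * 8)   # alternating black and white pixels
high_contrast = index.find_by_contrast(50)
```

Because the index holds only ids and derived numbers, it can live entirely outside the database and be rebuilt from the files at any time.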
Regards, Iikka
On Tue, 27 Aug 2002, Ryan P. Rogers wrote:
Hi folks!
I'm considering using MySQL for an upcoming project. It is a digital production management server, which will need to scale into the terabyte+ range. A vast majority (99%+) of the data taking up this space will be binary large objects, some of which could range to many gigabytes in size, but most "files" will be in the 1 MB to 100 MB range. Basically think of it as a source control system for digital media files, with version and history management, check-in/check-out, branching/merging, searching, etc. The db has to be able to run on Linux and Windows at a minimum.
Does anybody think MySQL is not up to the task here? This database won't be overly complex in model I don't think, but it will get HUGE both in total size and number of records (some files could get hundreds of versions, and remember, they are binary...tough to diff well). I plan on not using transactions, nor do we need to wait for the SP functionality. The database will be "front-ended" by a single app server, this server will handle client requests, and will be written in Java (thus, we would be using J/Connector). The server will be responsible for synchronization / transaction issues of consistent objects, thus we do not need TX support out of the db. If someday in the unlikely event we ever need to scale the appserver, it will do distributed synchronization / lock management. The thinking is we want to keep this out of the db, since the db is going to probably be the bottleneck here.
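As a rough illustration of keeping check-in/check-out synchronization in the app server rather than in the database, a single-process lock registry could look like the sketch below. It is written in Python for brevity even though the server described here would be Java, the name `CheckoutRegistry` is hypothetical, and it deliberately ignores persistence and the distributed case mentioned above.

```python
import threading


class CheckoutRegistry:
    """Tracks which user holds the check-out on each file, inside the app server."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._owners = {}  # file_id -> user currently holding the check-out

    def check_out(self, file_id, user):
        """Grant the check-out unless another user already holds it."""
        with self._mutex:
            owner = self._owners.get(file_id)
            if owner is not None and owner != user:
                return False
            self._owners[file_id] = user
            return True

    def check_in(self, file_id, user):
        """Release the check-out; only the current holder may do so."""
        with self._mutex:
            if self._owners.get(file_id) != user:
                return False
            del self._owners[file_id]
            return True


registry = CheckoutRegistry()
first = registry.check_out(7, "ryan")      # granted
second = registry.check_out(7, "iikka")    # refused, file 7 is held by ryan
released = registry.check_in(7, "ryan")    # released by the holder
third = registry.check_out(7, "iikka")     # granted now
```

With this arrangement the database only needs to store the resulting version rows, which is consistent with not requiring transaction support from it.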
One performance concern I have is redundant data transfer. Since we will have several clients, some very thin (web) and others very thick, I want them to be very dumb as to the db structure; it needs to be encapsulated in the appserver so that the API exposed to the client is very simple. However, this does mean that when a file is requested, it will need to be transferred out of MySQL to the appserver, then from the appserver to the client. Now, if the appserver and db are two processes on the same machine, no biggie (although this still isn't great). But, if they are on different machines, we now have the data going across the network twice. Now, for scalability reasons, it would be nice to offer the ability to run the appserver and db on different machines, but I think it will be far more common that people just want to use one monster box to simplify things. Therefore, it would be desirable to run MySQL embedded, right? I'm thinking we could implement a simple C++ wrapper which embeds both the JVM and the db, so transfer of data between the two is lightning quick? Will we see a performance difference vs running two processes on the same machine? It's obviously a more complex solution, since the "wrapper" would have to be configurable to be told whether or not the MySQL should be created locally in process, or whether it exists elsewhere (so he doesn't create it), and then pass the info into the JVM via system property or something. Does anybody think this is overkill? Is MySQL embeddable like this?
On a side note, eventually we would like to be able to develop "data blades" which can inspect the data in different ways and build custom indices for special types of searching, like based on contrast, color, etc of the data in a picture for instance. Does MySQL have this data blade / plugin concept, or would this be a 100% external solution which simply points into our db by the unique identifiers?
Thanks in advance, Ryan
Iikka Meriläinen
E-mail: Iikk...@pato.vaala.fi
Vaala, Finland




