Kyle R. Burton on Wed, 12 Jun 2002 09:38:37 -0400



Re: [PLUG] Optimizations? MySQL, 1.2Gb tables, lots of random selects...


> Depending on the specifics of the application, it may make more sense to
> store your data structure right on the filesystem
> (/path/to/mydata/PRIMARYKEY or similar).  I've found that reading a few
> kilobytes from a blob is quite slow: overhead in translating from
> BDB/InnoDB/whatever to raw data, then transferring it to your app. 
> You'll probably waste some unused inode space because of your small file
> sizes, but it might be worth it.  However, I could certainly see that a
> directory with a million files would cause many problems of its own.
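(For reference, the layout being suggested amounts to roughly the following --
just a sketch, with made-up paths and record format:)

    import os

    DATA_ROOT = "/path/to/mydata"    # hypothetical storage root

    def write_record(primary_key, blob):
        # one small file per record, named by its primary key
        with open(os.path.join(DATA_ROOT, primary_key), "wb") as f:
            f.write(blob)

    def read_record(primary_key):
        # a lookup is a single open()/read() -- no SQL layer in between
        with open(os.path.join(DATA_ROOT, primary_key), "rb") as f:
            return f.read()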

Unfortunately we're using the SQL database as an easy way to help achieve
some parallelization and distribution of the workload.  Direct filesystem
storage won't work for us right now.

What we're doing is a matching exercise between multiple databases.  We load
the data records into a binary format that is convenient for our software
to use, and store them in a database.  We have a separate process that
produces candidate sets of record ids that are likely to match, then we hand
out the candidate sets to our server farm to perform the exhaustive matching
(which is a CPU-intensive process).  The boxes that do the matching then
pull the records from the MySQL databases, do the matching, and insert
consolidated entities into an output database.  Our candidate sets range in
size anywhere from 1 to a few hundred records.  The nature of the candidate
set selection basically means that all of the record retrieval will be random
access to the databases.
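
In rough Python, each worker's inner loop looks something like this (just a
sketch -- the table names, the MySQLdb driver, and match_records() are
stand-ins for our real code):

    import MySQLdb   # assumed DB-API driver; any would do

    def process_candidate_set(records_conn, output_conn, record_ids,
                              match_records):
        # pull the binary records for one candidate set -- effectively
        # random access by primary key
        cur = records_conn.cursor()
        placeholders = ",".join(["%s"] * len(record_ids))
        cur.execute("SELECT id, record_blob FROM records WHERE id IN (%s)"
                    % placeholders, record_ids)
        records = cur.fetchall()

        # the CPU-intensive exhaustive matching step
        consolidated = match_records(records)

        # write the consolidated entities to the output database
        out = output_conn.cursor()
        for entity in consolidated:
            out.execute("INSERT INTO entities (data) VALUES (%s)", (entity,))
        output_conn.commit()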

> How are connections to the database handled?  Does each lookup call for
> a new connection, or are persistent shared connections (like in PHP)
> used?  This makes a big difference in performance.

They're persistent.  Once the worker process connects to the database, the
connection stays open.
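
That is, each worker does something like the following once at startup (a
sketch; the MySQLdb driver and the connection parameters are placeholders):

    import MySQLdb   # assumed driver

    # connect once when the worker starts, and reuse the handle for the
    # life of the process -- no per-lookup connect/disconnect overhead
    conn = MySQLdb.connect(host="db-host", user="worker",
                           passwd="secret", db="records")

    def fetch_record(record_id):
        cur = conn.cursor()
        cur.execute("SELECT record_blob FROM records WHERE id = %s",
                    (record_id,))
        row = cur.fetchone()
        return row[0] if row else None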

> > I'm certainly not a performance guru, but the only thing that comes to
> > mind off the top of my head is that it might be a bit more efficient to
> > store the primary key as a CHAR(15) instead of a VARCHAR.
> 
> Though I doubt that the suggestion above would be very effective alone,
> if you were to use my suggestion above about moving the blob data into
> the filesystem and then use a fixed-length primary key, MySQL would be
> able to store each row with a fixed length.  This should increase
> performance quite a bit (faster seeking to desired record).
> 
> Then again, if you were to store the blob data as files, I guess there'd
> be little need for MySQL.
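
(For the record, the fixed-width layout being described would look something
like this -- a sketch, with invented column names, assuming MyISAM's fixed
row format kicks in once no variable-length columns remain:)

    # CHAR(15) instead of VARCHAR, and the BLOB moved out to the filesystem,
    # so every remaining column is fixed-width; MyISAM can then find row N
    # by a simple offset calculation (N * row_length) instead of walking
    # variable-length rows.
    DDL = """
    CREATE TABLE records_fixed (
        id CHAR(15) NOT NULL PRIMARY KEY
        -- ...any other fixed-width columns...
    )
    """
    # run with cur.execute(DDL) on an existing connection's cursor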


Thanks for the advice.

Kyle R. Burton



-- 

------------------------------------------------------------------------------
Wisdom and Compassion are inseparable.
        -- Christmas Humphreys
mortis@voicenet.com                            http://www.voicenet.com/~mortis
------------------------------------------------------------------------------

______________________________________________________________________
Philadelphia Linux Users Group       -      http://www.phillylinux.org
Announcements-http://lists.phillylinux.org/mail/listinfo/plug-announce
General Discussion  -  http://lists.phillylinux.org/mail/listinfo/plug