JP Vossen on 2 Apr 2010 12:28:42 -0700


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: [PLUG] postgres data loading


> Date: Fri, 2 Apr 2010 14:53:54 -0400
> From: "John Karr" <brainbuz@brainbuz.org>
> 
> I'm working on a database that involves periodic replacement of a 10 million
> record database from an external source. 
> 
> The postgres bulk load (copy from) is very sensitive to errors and crashes
> the entire operation for just one error in an operation, so my import script
> adds records one at a time so I can deal with or ignore failures. 
> 
> On my test machine it takes nearly a day to load the database this way. I
> tried using transactions, but only got about a 30% speed boost at the cost
> of a single error in a batch crashing the whole batch. 
> 
> Any ideas on how to do this load faster? Normally this load would occur
> about once every 3 months, but in times peak activity it could be weekly and
> it wouldn't be acceptable to have the application down for a day for every
> load.

I'm *not* a DBA nor do I play one on TV.  Having said that...

First, validate your data before you attempt the import.  There's gotta 
be a way to do a quick Perl (Python, whatever) script to sanity check, 
then either remove those records or fail before the import so you can 
fix it.  (I may be able to help with that if needed.)

Second, when I have to do similar things at work (which real DBAs help 
me with :), I sometimes load into a temp table, then do the swap inside 
the DB really fast.  So you can mess around with a bulk load into a temp 
table and if it crashes, so what?  Keep trying until the temp table is 
correctly loaded, then do the swap inside the DB with a stored procedure 
or something.

Good luck,
JP
----------------------------|:::======|-------------------------------
JP Vossen, CISSP            |:::======|      http://bashcookbook.com/
My Account, My Opinions     |=========|      http://www.jpsdomain.org/
----------------------------|=========|-------------------------------
"Microsoft Tax" = the additional hardware & yearly fees for the add-on
software required to protect Windows from its own poorly designed and
implemented self, while the overhead incidentally flattens Moore's Law.
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug