Re: [PLUG] postgres data loading

Walt Mankowski on 2 Apr 2010 14:44:26 -0700

From: Walt Mankowski <waltman@pobox.com>

To: plug@lists.phillylinux.org

Subject: Re: [PLUG] postgres data loading

Date: Fri, 2 Apr 2010 17:44:21 -0400

Reply-to: Philadelphia Linux User's Group Discussion List <plug@lists.phillylinux.org>

Sender: plug-bounces@lists.phillylinux.org

User-agent: Mutt/1.5.20 (2009-06-14)

On Fri, Apr 02, 2010 at 04:04:36PM -0400, John Karr wrote:
> My preprocessor is pretty good right now, (I crunch 153 fields to 67 that
> are split to 6 smaller tables and make a bunch of corrections) the main
> error I get is duplicate records within the dump, which is fixable. But no
> matter how tight I can make my preprocessor I know that the data source will
> find a new error to throw at me, and if a few records out of 10 million
> don't import I can ignore the problem, 1 bad record can stop an entire
> County like Philadelphia or Allegheny from importing (if I break it into
> arbitrary batches of 10,000 what about the other 9,999 records), so I
> strongly prefer an import method that isn't broken by a few bad records.

Could you split the input by county or zip code and then bulk import each
file separately? That way you'd at least have a smaller file to fix.

If not, it seems to me the best method (suggested already) is to do a bulk
load into a test database. When you find errors, fix them there and try
again. Presumably the bulk load is fast, so you could go through several
rounds of debugging and fixing errors in the time it currently takes you to
load the data one record at a time. Then when it's ready, loading it into
production should be easy.

Walt
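A minimal Python sketch of the split-then-load idea, assuming the cleaned dump is CSV with a column named `county` (a hypothetical column name). `split_csv_by_column` and `load_each` are illustrative helpers, not part of any library, and `load` is a placeholder for whatever bulk loader you actually use; against PostgreSQL it might wrap a `COPY ... FROM STDIN WITH CSV HEADER` into a test database, for example via psycopg2's `cursor.copy_expert`. Because a single `COPY` aborts entirely on the first bad row by default, splitting confines a bad record to its own county's batch instead of the whole dump.

```python
import csv
import io
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def split_csv_by_column(text: str, column: str) -> Dict[str, str]:
    """Split CSV text into one CSV document per distinct value of `column`.

    Each output document repeats the original header row, so it can be
    bulk-loaded on its own.
    """
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in reader:
        groups[row[column]].append(row)

    out: Dict[str, str] = {}
    for key, rows in groups.items():
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        out[key] = buf.getvalue()
    return out


def load_each(batches: Dict[str, str],
              load: Callable[[str, str], None]) -> Tuple[List[str], Dict[str, str]]:
    """Try to load every batch independently.

    A failing batch is recorded and skipped, so it does not stop the
    remaining batches from loading.
    """
    loaded: List[str] = []
    failed: Dict[str, str] = {}
    for key in sorted(batches):
        try:
            load(key, batches[key])
        except Exception as exc:  # record the error and move on to the next batch
            failed[key] = str(exc)
        else:
            loaded.append(key)
    return loaded, failed


# Usage with made-up sample data and a fake loader.
data = "county,name\nPhiladelphia,A\nAllegheny,B\nPhiladelphia,C\n"
batches = split_csv_by_column(data, "county")


def fake_load(county: str, csv_text: str) -> None:
    # Hypothetical stand-in for a real bulk load (e.g. COPY into a test DB).
    if county == "Allegheny":
        raise ValueError("duplicate key")


loaded, failed = load_each(batches, fake_load)
print(loaded)  # ['Philadelphia']
print(failed)  # {'Allegheny': 'duplicate key'}
```

The failed batches are small enough to inspect and fix by hand, then feed back through `load_each` on the test database until everything loads cleanly.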

___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug

References:

[PLUG] postgres data loading
From: "John Karr" <brainbuz@brainbuz.org>

Re: [PLUG] postgres data loading
From: Eric <eric@lucii.org>

Re: [PLUG] postgres data loading
From: "John Karr" <brainbuz@brainbuz.org>
