Re: Populating large DB from Perl script - Mailing list pgsql-general

From andy
Subject Re: Populating large DB from Perl script
Date
Msg-id 472B5DAB.1020306@squeakycode.net
In response to Populating large DB from Perl script  ("Kynn Jones" <kynnjo@gmail.com>)
Responses Re: Populating large DB from Perl script  ("Kynn Jones" <kynnjo@gmail.com>)
List pgsql-general
Kynn Jones wrote:
> Hi.  This is a recurrent problem that I have not been able to find a
> good solution for.  I have a large database that needs to be built from
> scratch roughly once every month.  I use a Perl script to do this.
>
> The tables are very large, so I avoid as much as possible using
> in-memory data structures, and instead I rely heavily on temporary
> flat files.
>
> The problem is the population of tables that refer to "internal" IDs
> on other tables.  By "internal" I mean IDs that have no meaning
> external to the database; they exist only to enable relational
> referencing.  They are always defined as serial integers.  So the
> script either must create and keep track of them, or it must populate
> the database in stages, letting Pg assign the serial IDs, and query
> the database for these IDs during subsequent stages.
>
> I have solved this general problem in various ways, all of them
> unwieldy (in the latest version, the script generates the serial ids
> and uses Perl's so-called "tied hashes" to retrieve them when needed).
>
> But it occurred to me that this is a generic enough problem, and that
> I'm probably re-inventing a thoroughly invented wheel.  Are there
> standard techniques or resources or Pg capabilities to deal with this
> sort of situation?
>
> TIA!
>
> kj
>

(Sorry if this double-posts; I wasn't subscribed the first time.)

I have done this exact same thing.  I started with tied hashes, and even
tried BerkeleyDB.  They only helped up to a point: once they got big
enough (a couple of gigs, if I recall correctly) they actually slowed
things down.  In the end I used a stored proc to do the lookup and
insert.  In the beginning it's not as fast, but by the time the db hits
20 gigs it's still going strong, whereas my BerkeleyDB setup had become
painfully slow.  (I recently thought of trying an SQLite table, since
I've had good luck with them; they can get pretty big and still be very
fast.  But I never got around to it.)
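
For reference, here is roughly what that lookup-and-insert function
looks like.  This is just a sketch, not my exact code; the table and
column names (terms, name) are made up for illustration:

  -- Hypothetical lookup table: external key -> internal serial id.
  CREATE TABLE terms (
      id   serial PRIMARY KEY,
      name text   NOT NULL UNIQUE
  );

  -- Return the internal id for a name, inserting a new row if one
  -- doesn't exist yet.
  CREATE OR REPLACE FUNCTION get_term_id(p_name text) RETURNS integer AS $$
  DECLARE
      v_id integer;
  BEGIN
      SELECT id INTO v_id FROM terms WHERE name = p_name;
      IF NOT FOUND THEN
          INSERT INTO terms (name) VALUES (p_name) RETURNING id INTO v_id;
      END IF;
      RETURN v_id;
  END;
  $$ LANGUAGE plpgsql;

The loading script then just does SELECT get_term_id(?) (or calls the
function directly in the INSERT for the referring table) instead of
keeping the id map in Perl.  The unique index on name is what keeps the
lookups fast as the table grows.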

So... not really an answer (other than that I used a stored proc), but
I'd be interested in alternatives too.

-Andy


