Re: Populating large DB from Perl script - Mailing list pgsql-general

From Dawid Kuroczko
Subject Re: Populating large DB from Perl script
Date
Msg-id 758d5e7f0711021157g7eca84e8y7c682ba83e86cb04@mail.gmail.com
Whole thread Raw
In response to Populating large DB from Perl script  ("Kynn Jones" <kynnjo@gmail.com>)
List pgsql-general
On 11/1/07, Kynn Jones <kynnjo@gmail.com> wrote:
> Hi.  This is a recurrent problem that I have not been able to find a
> good solution for.  I have  large database that needs to be built from
> scratch roughly once every month.  I use a Perl script to do this.
>
> The tables are very large, so I avoid as much as possible using
> in-memory data structures, and instead I rely heavily on temporary
> flat files.
>
> The problem is the population of tables that refer to "internal" IDs
> on other tables.  By "internal" I mean IDs that have no meaning
> external to the database; they exist only to enable relational
> referencing.  They are always defined as serial integers.  So the
> script either must create and keep track of them, or it must populate
> the database in stages, letting Pg assign the serial IDs, and query
> the database for these IDs during subsequent stages.

If it is possible, perhaps you could load "raw" data into temporary
table and then create ids using these tables.
For instance:
CREATE TEMP TABLE foo_raw (host text, city text, who text, value int);
INSERT INTO hosts (host) SELECT DISTINCT host FROM foo; -- group by perhaps?
INSERT INTO [...]
INSERT INTO foo SELECT host_id,city_id,who_id,value
   FROM foo_raw
   JOIN hosts USING (host)
   JOIN cities USING (city)
   JOIN who USING (who);
This may or may not work, depending on your setup.

But perhaps a better approach, while needing more work would be:

Your script establishes two DB connections, one for "processing data"
and one for maintaining IDs.  Now whenever you need to get and ID do:
  1) query memcached if found, return it
  2) query database if found return it and insert into memcached
  3) insert into database, and insert into memcached, and perhaps commit it.
Befriend thyself with Cache::* perl modules. :)

   Regards,
      Dawid

pgsql-general by date:

Previous
From: "Kynn Jones"
Date:
Subject: Re: Populating large DB from Perl script
Next
From: andy
Date:
Subject: Re: Populating large DB from Perl script