Home > mailing lists

Re: Populating large DB from Perl script - Mailing list pgsql-general

From	andy
Subject	Re: Populating large DB from Perl script
Date	November 2, 2007 20:33:34
Msg-id	472B8990.4050700@squeakycode.net Whole thread Raw
In response to	Re: Populating large DB from Perl script ("Kynn Jones" <kynnjo@gmail.com>)
List	pgsql-general

Tree view

Kynn Jones wrote:
>> So... not really an answer (other than I used a stored proc)
>
> Actually, I'm interested in your solution.
>
> Just to make sure I understood what you did: you bulk-populated (i.e.
> with $dbh->do('COPY...'), $dbh->pg_putline(...), $dbh->pg_endcopy) the
> referring tables, with their fkey constraints disabled; then you ran
> stored procedure(s) that went through these referring tables and
> filled in the missing fkeys; and finally you activated their fkey
> constraints.  Is this right?
>
> I'm very much of a stored procedures dunce, so if the code for your
> stored procedure is "postable", please do.
>
> TIA,
>
> kj
>
> PS: As an aside to the list, as a programmer, when I'm starting out in
> language, I learn more than I can say from reading source code written
> by the experts, but for some reason I have had a hard time coming
> across expertly written PostgreSQL stored procedures, other than the
> occasionally didactic snippet in the docs.  All these expertly-written
> procedures seem to be very STORED away indeed!  If, on the contrary,
> it's just the case that I haven't looked in the right places, please
> hurl me a cluebrick!

I'm afraid it was nothing that super.  My scripts where to insert a
bunch of test data into a db so I could play with some very large db's.

I have a people table, and a car table.  I wanted to add ownership of
cars to people, but thought nobody should own more than 5 cars (to make
sure my "random" generator didnt bunch too many cars per person)

My perl looks like:
my $carcount = new BerkeleyDB::Btree( -Filename => 'carcount.dbm',
        -Flags => DB_CREATE ) or die "Cannot open file: $!";

<..later...>
    if (rand() > 0.10) {
        my $person = int(rand($maxperson));
        my $ok = 1;

        $i = 0;
        $carcount->db_get($person, $i);


        if ($i >= 5)
        {
            #shall we allow more than 5 cars?
            if (rand() > 0.90) {
                $ok = 1;
            } else {
                $ok = 0;
            }
        }

        if ($ok)
        {
            ... do the insert ...
        }


So I was keeping personid => carcount map.  This worked great, as I
said, until I got into the gigbytes size for the BerkeleyDB.

Instead I created a stored proc:

CREATE OR REPLACE FUNCTION addowner(xpersonid integer, xcarid integer)
returns void AS $$
declare
   cc integer;
begin
   select into cc count(*)
     from ownership
     where personid = xpersonid;

   if cc < 5 then
     insert into ownership(personid, carid) values (xpersonid, xcarid);
   end if;

   return;
end
$$ LANGUAGE plpgsql;


... and here is another stored proc I wrote for our website.  We have
company shirts n'stuff that employee's can order.  So the webpage is
kinda like a shopping cart thing.

create or replace function updatecart(xuserid integer, xgroupid integer,
xprodid integer, xsizeid integer, xcolorid integer, xqty integer)
returns void as $$
declare
   xid integer;
begin
   select into xid rowid from vcs_ordertable where userid = xuserid and
groupid = xgroupid and prodid = xprodid and sizeid = xsizeid and colorid
= xcolorid;
   if not found then
     insert into vcs_ordertable(userid, groupid, prodid, sizeid,
colorid, quant) values (xuserid, xgroupid, xprodid, xsizeid, xcolorid,
xqty);
   else
     update vcs_ordertable set quant = quant + xqty where rowid = xid;
   end if;
   return;
end
$$ LANGUAGE plpgsql;


When you select a shirt (including the size, color, etc) I only wanted
one row per (user, group, productid, size and color), if one already
exists in the table, I just bump its count, if not I insert it.

These two procs are about as complex as I've needed to get.

-Andy

pgsql-general by date:

From: "Dawid Kuroczko"
Date: 02 November 2007, 18:58:09
Subject: Re: Populating large DB from Perl script

From: "Jeffrey W. Baker"
Date: 02 November 2007, 21:15:08
Subject: setting for maximum acceptable plan cost?

Re: Populating large DB from Perl script - Mailing list pgsql-general

Previous

Next