Re: Populating large DB from Perl script - Mailing list pgsql-general
From | andy |
---|---|
Subject | Re: Populating large DB from Perl script |
Date | |
Msg-id | 472B8990.4050700@squeakycode.net Whole thread Raw |
In response to | Re: Populating large DB from Perl script ("Kynn Jones" <kynnjo@gmail.com>) |
List | pgsql-general |
Kynn Jones wrote: >> So... not really an answer (other than I used a stored proc) > > Actually, I'm interested in your solution. > > Just to make sure I understood what you did: you bulk-populated (i.e. > with $dbh->do('COPY...'), $dbh->pg_putline(...), $dbh->pg_endcopy) the > referring tables, with their fkey constraints disabled; then you ran > stored procedure(s) that went through these referring tables and > filled in the missing fkeys; and finally you activated their fkey > constraints. Is this right? > > I'm very much of a stored procedures dunce, so if the code for your > stored procedure is "postable", please do. > > TIA, > > kj > > PS: As an aside to the list, as a programmer, when I'm starting out in > language, I learn more than I can say from reading source code written > by the experts, but for some reason I have had a hard time coming > across expertly written PostgreSQL stored procedures, other than the > occasionally didactic snippet in the docs. All these expertly-written > procedures seem to be very STORED away indeed! If, on the contrary, > it's just the case that I haven't looked in the right places, please > hurl me a cluebrick! I'm afraid it was nothing that super. My scripts where to insert a bunch of test data into a db so I could play with some very large db's. I have a people table, and a car table. I wanted to add ownership of cars to people, but thought nobody should own more than 5 cars (to make sure my "random" generator didnt bunch too many cars per person) My perl looks like: my $carcount = new BerkeleyDB::Btree( -Filename => 'carcount.dbm', -Flags => DB_CREATE ) or die "Cannot open file: $!"; <..later...> if (rand() > 0.10) { my $person = int(rand($maxperson)); my $ok = 1; $i = 0; $carcount->db_get($person, $i); if ($i >= 5) { #shall we allow more than 5 cars? if (rand() > 0.90) { $ok = 1; } else { $ok = 0; } } if ($ok) { ... do the insert ... } So I was keeping personid => carcount map. This worked great, as I said, until I got into the gigbytes size for the BerkeleyDB. Instead I created a stored proc: CREATE OR REPLACE FUNCTION addowner(xpersonid integer, xcarid integer) returns void AS $$ declare cc integer; begin select into cc count(*) from ownership where personid = xpersonid; if cc < 5 then insert into ownership(personid, carid) values (xpersonid, xcarid); end if; return; end $$ LANGUAGE plpgsql; ... and here is another stored proc I wrote for our website. We have company shirts n'stuff that employee's can order. So the webpage is kinda like a shopping cart thing. create or replace function updatecart(xuserid integer, xgroupid integer, xprodid integer, xsizeid integer, xcolorid integer, xqty integer) returns void as $$ declare xid integer; begin select into xid rowid from vcs_ordertable where userid = xuserid and groupid = xgroupid and prodid = xprodid and sizeid = xsizeid and colorid = xcolorid; if not found then insert into vcs_ordertable(userid, groupid, prodid, sizeid, colorid, quant) values (xuserid, xgroupid, xprodid, xsizeid, xcolorid, xqty); else update vcs_ordertable set quant = quant + xqty where rowid = xid; end if; return; end $$ LANGUAGE plpgsql; When you select a shirt (including the size, color, etc) I only wanted one row per (user, group, productid, size and color), if one already exists in the table, I just bump its count, if not I insert it. These two procs are about as complex as I've needed to get. -Andy
pgsql-general by date: