Working with very large datasets - Mailing list pgsql-sql

From Wilkinson Charlie E
Subject Working with very large datasets
Date
Msg-id 9DDDC002B736D711A7320004AC3699EB01C192C7@nct0010mb03.nc.no.irs.gov
Whole thread Raw
Responses Re: Working with very large datasets
Re: Working with very large datasets
List pgsql-sql
<p><font size="2">Greetings,</font><br /><font size="2">Can anyone enlighten me or point me at resources concerning use
ofpgsql with</font><br /><font size="2">very large datasets?</font><p><font size="2">My specific problem is
this:</font><p><fontsize="2">I have two tables, one with about 100 million rows and one with about 22,000</font><br
/><fontsize="2">rows.  My plan was to inner join the two tables on an integer key and output</font><br /><font
size="2">the4 significant columns, excluding the keys. (Those with a better understanding</font><br /><font size="2">of
pgsqlinternals, feel free to laugh.)  The result was a big angry psql that</font><br /><font size="2">grew to 800+MB
beforeI had to kill it.</font><p><font size="2">Example:</font><p><font size="2">CREATE TABLE users (</font><br /><font
size="2">       userid             INTEGER NOT NULL,</font><br /><font size="2">        fileid             INTEGER NOT
NULL</font><br/><font size="2">);</font><br /><font size="2">CREATE INDEX "users_userid_key" on "users" using btree (
"userid""int4_ops" );</font><p><font size="2">CREATE TABLE files</font><br /><font size="2">(</font><br /><font
size="2">       fileid          SERIAL NOT NULL,</font><br /><font size="2">        name           
VARCHAR(120),</font><br/><font size="2">        size            INTEGER,</font><br /><font size="2">       
mtime          INTEGER</font><br /><font size="2">);</font><br /><font size="2">-- files.fileid will automagically get
indexed</font><p><fontsize="2">[100 million inserts to 'users', 22,000 inserts to 'files'...]</font><p><font
size="2">SELECTuserid,name,size,mtime FROM users INNER JOIN files ON users.fileid = files.fileid;</font><p><font
size="2">[attackof the 50 foot psql...]</font><p><font size="2">Did I miss sonething important?  Is there a better
way? Any clues appreciated.</font><p><font size="2">-cw-</font><p><font size="2">(Apologies for any HTML that gets
tackedonto this message.  I spec'd plaintext!)</font><br /><font size="2">--</font><br /><font size="2">Charlie
Wilkinson</font><br/><font size="2">TRIS Development Systems Administrator</font><br /><font
size="2">IS:SD:CT:CC:TD</font><br/><font size="2">Phone: 202-283-3241</font><br /><font size="2">MSMail:  
Charlie.E.Wilkinson@irs.gov</font><br/><font size="2">SMTP:  cwilkins@tris.irs.gov</font><br /><font size="2">Home: 
cwilkins@boinklabs.com</font><br/><font size="2">This message constructed from 90% post-consumer electrons. </font> 

pgsql-sql by date:

Previous
From: Wei Weng
Date:
Subject: What is wrong with this identification configuration?
Next
From: "Susan"
Date:
Subject: adding not null constraints on columns