Re: optimising data load - Mailing list pgsql-novice
From | Patrick Hatcher |
---|---|
Subject | Re: optimising data load |
Date | |
Msg-id | OF90D8E83C.A706D781-ON88256BC1.0054A6F0@fds.com Whole thread Raw |
In response to | optimising data load (John Taylor <postgres@jtresponse.co.uk>) |
List | pgsql-novice |
Dump the records from the other dbase to a text file and then use the COPY command for Pg. I update tables nightly with 400K+ records and it only takes 1 -2 mins. You should drop and re-add your indexes and then do a vacuum analyze Patrick Hatcher Macys.Com Legacy Integration Developer 415-932-0610 office HatcherPT - AIM John Taylor <postgres@jtresponse.co To: pgsql-novice@postgresql.org .uk> cc: Sent by: Subject: [NOVICE] optimising data load pgsql-novice-owner@post gresql.org 05/22/2002 06:45 AM Hi all, I'm (still) working on an application to regularly populate my database with some data provided from a third party DB. I'm still having really bad performance problems. There are 300,000 records to be inserted, but I'm only getting 10,000/hour. 30 hours to populate a single table is just not on. There must be something seriously bad going on. I have to update to different databases: live, and an update log. I've included explains for both the db's below. I have a few questions: 1) Can it really be right that it is going this slow ? I'm running Redhat Linux with kernel 2.4.7, and postgres 7.1.3 2) Can anyone see a way to optimise these queries further ? 3) Is it likely to be quicker if I use a query to obtain ol.line, and then generate a CSV file for use with COPY ? Thanks JohnT --- LIVE --- explain INSERT INTO orderlines (theorder,type,stock,line,ordercurrent,sellingquant,price,discount,vatrate,comment) SELECT oh.theorder,'P',' 0310',coalesce(ol.line+1,1),5,0,.52,0,0,'' FROM orderheader oh LEFT OUTER JOIN orderlines ol ON oh.theorder = ol.theorder WHERE oh.account=' MILN1' AND oh.delivery=1 AND oh.thedate='2002-06-01' AND oh.ordertype='O' ORDER BY ol.line DESC LIMIT 1; NOTICE: QUERY PLAN: Subquery Scan *SELECT* (cost=47.41..47.41 rows=1 width=12) -> Limit (cost=47.41..47.41 rows=1 width=12) -> Sort (cost=47.41..47.41 rows=1 width=12) -> Nested Loop (cost=0.00..47.40 rows=1 width=12) -> Index Scan using orderheader_account on orderheader oh (cost=0.00..21.64 rows=1 width=4) -> Index Scan using orderlines_pkey on orderlines ol (cost=0.00..25.54 rows=17 width=8) EXPLAIN --- UPDATE LOG --- explain INSERT INTO orderlinesupdates (theorder,type,stock,line,ordercurrent,sellingquant,price,discount,vatrate,comment,updated,utype,origin) SELECT oh.theorder,'P',' 0310',coalesce(ol.line,ol2.line+1,1),5,0,.52,0,0,'',128,'+','C' FROM orderheaderupdates oh LEFT OUTER JOIN orderlinesupdates ol ON oh.theorder = ol.theorder AND ol.stock=' 0310' LEFT OUTER JOIN orderlinesupdates ol2 ON oh.theorder = ol2.theorder WHERE oh.account=' MILN1' AND oh.delivery=1 AND oh.thedate='2002-06-01' AND oh.ordertype='O' ORDER BY oh.updated DESC, ol.line DESC, ol2.line DESC LIMIT 1; NOTICE: QUERY PLAN: Subquery Scan *SELECT* (cost=81.29..81.29 rows=1 width=36) -> Limit (cost=81.29..81.29 rows=1 width=36) -> Sort (cost=81.29..81.29 rows=1 width=36) -> Nested Loop (cost=0.00..81.28 rows=1 width=36) -> Nested Loop (cost=0.00..52.47 rows=1 width=28) -> Index Scan using orderheaderupdates_account on orderheaderupdates oh (cost=0.00..23.62 rows=1 width=8) -> Index Scan using orderlinesupdates_theorder on orderlinesupdates ol (cost=0.00..28.60 rows=17 width=20) -> Index Scan using orderlinesupdates_theorder on orderlinesupdates ol2 (cost=0.00..28.60 rows=17 width=8) EXPLAIN ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
pgsql-novice by date: