Re: optimising data load - Mailing list pgsql-novice

From Rasmus Mohr
Subject Re: optimising data load
Date
Msg-id 910513A5A944D5118BE900C04F67CB5A0BFDB2@MAIL
In response to optimising data load  (John Taylor <postgres@jtresponse.co.uk>)
List pgsql-novice
I think setting fsync = off in postgresql.conf and dropping the indexes
before the load should boost performance. It certainly helped when we had a
similar problem populating our database with data from an old Access database.
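If you do go the COPY route (your question 3), a small script can generate the
data file up front. A minimal Python sketch, with a made-up sample row and file
name: the column order follows the orderlines INSERT from your mail, everything
else here is a placeholder.

```python
# Sketch: build a tab-delimited data file for COPY instead of 300,000 INSERTs.
# COPY's default text format is tab-separated, with \N for NULL.
# Column order matches the orderlines INSERT from the original post;
# the sample row and file name are placeholders.

def write_copy_file(rows, path="orderlines.dat"):
    """Write rows for: COPY orderlines FROM '/path/orderlines.dat';"""
    with open(path, "w") as f:
        for r in rows:
            f.write("\t".join("\\N" if v is None else str(v) for v in r) + "\n")

# (theorder, type, stock, line, ordercurrent, sellingquant,
#  price, discount, vatrate, comment)
rows = [(1001, "P", "  0310", 1, 5, 0, 0.52, 0, 0, "")]
write_copy_file(rows)
```

Load the file with COPY after dropping the indexes, then recreate them and run
VACUUM ANALYZE; that should cut the load from hours to minutes.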

--------------------------------------------------------------
Rasmus T. Mohr            Direct  :             +45 36 910 122
Application Developer     Mobile  :             +45 28 731 827
Netpointers Intl. ApS     Phone   :             +45 70 117 117
Vestergade 18 B           Fax     :             +45 70 115 115
1456 Copenhagen K         Email   : mailto:rmo@netpointers.com
Denmark                   Website : http://www.netpointers.com

"Remember that there are no bugs, only undocumented features."
--------------------------------------------------------------

> -----Original Message-----
> From: pgsql-novice-owner@postgresql.org
> [mailto:pgsql-novice-owner@postgresql.org]On Behalf Of John Taylor
> Sent: Wednesday, May 22, 2002 3:46 PM
> To: pgsql-novice@postgresql.org
> Subject: [NOVICE] optimising data load
>
>
>
> Hi all,
>
> I'm (still) working on an application to regularly populate
> my database with some data provided from
> a third party DB.
>
> I'm still having really bad performance problems.
> There are 300,000 records to be inserted, but I'm only
> getting 10,000/hour.
> 30 hours to populate a single table is just not on. There
> must be something seriously bad going on.
>
> I have to update two different databases: live, and an update log.
>
> I've included EXPLAIN output for both DBs below.
>
> I have a few questions:
>
> 1) Can it really be right that it is going this slow?
>    I'm running Redhat Linux with kernel 2.4.7, and postgres 7.1.3
>
> 2) Can anyone see a way to optimise these queries further?
>
> 3) Is it likely to be quicker if I use a query to obtain
> ol.line, and then generate a CSV file for use with COPY?
>
> Thanks
> JohnT
>
> --- LIVE ---
> explain INSERT INTO orderlines
> (theorder,type,stock,line,ordercurrent,sellingquant,price,discount,vatrate,comment)
> SELECT oh.theorder,'P','  0310',coalesce(ol.line+1,1),5,0,.52,0,0,''
> FROM orderheader oh
> LEFT OUTER JOIN orderlines ol ON oh.theorder = ol.theorder
> WHERE oh.account=' MILN1' AND oh.delivery=1
>   AND oh.thedate='2002-06-01' AND oh.ordertype='O'
> ORDER BY ol.line DESC LIMIT 1;
> NOTICE:  QUERY PLAN:
>
> Subquery Scan *SELECT*  (cost=47.41..47.41 rows=1 width=12)
>   ->  Limit  (cost=47.41..47.41 rows=1 width=12)
>         ->  Sort  (cost=47.41..47.41 rows=1 width=12)
>               ->  Nested Loop  (cost=0.00..47.40 rows=1 width=12)
>                     ->  Index Scan using orderheader_account on orderheader oh  (cost=0.00..21.64 rows=1 width=4)
>                     ->  Index Scan using orderlines_pkey on orderlines ol  (cost=0.00..25.54 rows=17 width=8)
>
> EXPLAIN
>
> --- UPDATE LOG ---
> explain INSERT INTO orderlinesupdates
> (theorder,type,stock,line,ordercurrent,sellingquant,price,discount,vatrate,comment,updated,utype,origin)
> SELECT oh.theorder,'P','  0310',coalesce(ol.line,ol2.line+1,1),5,0,.52,0,0,'',128,'+','C'
> FROM orderheaderupdates oh
> LEFT OUTER JOIN orderlinesupdates ol ON oh.theorder = ol.theorder AND ol.stock='  0310'
> LEFT OUTER JOIN orderlinesupdates ol2 ON oh.theorder = ol2.theorder
> WHERE oh.account=' MILN1' AND oh.delivery=1
>   AND oh.thedate='2002-06-01' AND oh.ordertype='O'
> ORDER BY oh.updated DESC, ol.line DESC, ol2.line DESC LIMIT 1;
> NOTICE:  QUERY PLAN:
>
> Subquery Scan *SELECT*  (cost=81.29..81.29 rows=1 width=36)
>   ->  Limit  (cost=81.29..81.29 rows=1 width=36)
>         ->  Sort  (cost=81.29..81.29 rows=1 width=36)
>               ->  Nested Loop  (cost=0.00..81.28 rows=1 width=36)
>                     ->  Nested Loop  (cost=0.00..52.47 rows=1 width=28)
>                           ->  Index Scan using orderheaderupdates_account on orderheaderupdates oh  (cost=0.00..23.62 rows=1 width=8)
>                           ->  Index Scan using orderlinesupdates_theorder on orderlinesupdates ol  (cost=0.00..28.60 rows=17 width=20)
>                     ->  Index Scan using orderlinesupdates_theorder on orderlinesupdates ol2  (cost=0.00..28.60 rows=17 width=8)
>
> EXPLAIN
>
>
>
>
