Re: optimising data load - Mailing list pgsql-novice

From Ron Johnson
Subject Re: optimising data load
Date
Msg-id 1022089203.19119.26.camel@rebel
In response to optimising data load  (John Taylor <postgres@jtresponse.co.uk>)
Responses Re: optimising data load  (John Taylor <john.taylor@jtresponse.co.uk>)
List pgsql-novice
On Wed, 2002-05-22 at 08:45, John Taylor wrote:
>
> Hi all,
>
> I'm (still) working on an application to regularly populate my database with some data provided from
> a third party DB.
>
> I'm still having really bad performance problems.
> There are 300,000 records to be inserted, but I'm only getting 10,000/hour.
> 30 hours to populate a single table is just not on. There must be something seriously bad going on.

From what I see of your INSERT statement, you are joining orderheader
to orderlines and then inserting those records into orderlines.
Is that correct?

I think I'd make a view from the SELECT portion.  Then, nightly:
1. COPY out the view.
2. Drop the index(es) on orderlines.
3. COPY into orderlines.
4. Recreate the index(es) on orderlines.

Do the same for orderlinesupdates.  The thing is that
modifying indexes is a _very_ expensive operation, no matter
what RDBMS or hardware configuration you have...
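
Roughly what I have in mind (completely untested; the view, staging
table, file and index names below are placeholders I made up, and on
an old server COPY has no column list, so the dumped file has to line
up with every column of orderlines):

-- One time: wrap the SELECT portion in a view.  You'd obviously want
-- it to cover the whole nightly batch, not one hard-coded account/date.
CREATE VIEW new_orderlines AS
  SELECT oh.theorder AS theorder, 'P' AS type, '  0310' AS stock,
         coalesce(ol.line+1,1) AS line, 5 AS ordercurrent,
         0 AS sellingquant, .52 AS price, 0 AS discount,
         0 AS vatrate, '' AS comment
  FROM orderheader oh
  LEFT OUTER JOIN orderlines ol ON oh.theorder = ol.theorder;

-- Nightly:
-- 1. Stage the view's rows and dump them to a flat file.  (COPY won't
--    read a view directly, and server-side COPY needs superuser;
--    psql's \copy is the client-side alternative.)
SELECT * INTO TEMP orderlines_stage FROM new_orderlines;
COPY orderlines_stage TO '/tmp/orderlines_copy.dat';

-- 2. Drop the indexes so the load isn't maintaining them row by row.
--    (The primary key's index may need more care; you could leave it.)
DROP INDEX orderlines_stock_idx;

-- 3. Bulk-load the file.
COPY orderlines FROM '/tmp/orderlines_copy.dat';

-- 4. Rebuild the indexes in one pass each.
CREATE INDEX orderlines_stock_idx ON orderlines (stock);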

> I have to update to different databases: live, and an update log.
>
> I've included explains for both the db's below.
>
> I have a few questions:
>
> 1) Can it really be right that it is going this slow ?
>    I'm running Redhat Linux with kernel 2.4.7, and postgres 7.1.3
>
> 2) Can anyone see a way to optimise these queries further ?
>
> 3) Is it likely to be quicker if I use a query to obtain ol.line, and then generate a CSV file for use with COPY ?
>
> Thanks
> JohnT
>
> --- LIVE ---
> explain INSERT INTO orderlines (theorder,type,stock,line,ordercurrent,sellingquant,price,discount,vatrate,comment)
> SELECT oh.theorder,'P','  0310',coalesce(ol.line+1,1),5,0,.52,0,0,''  FROM orderheader oh
> LEFT OUTER JOIN orderlines ol ON oh.theorder = ol.theorder
> WHERE oh.account=' MILN1' AND oh.delivery=1 AND oh.thedate='2002-06-01' AND oh.ordertype='O'
> ORDER BY ol.line DESC LIMIT 1;
> NOTICE:  QUERY PLAN:
>
> Subquery Scan *SELECT*  (cost=47.41..47.41 rows=1 width=12)
>   ->  Limit  (cost=47.41..47.41 rows=1 width=12)
>         ->  Sort  (cost=47.41..47.41 rows=1 width=12)
>               ->  Nested Loop  (cost=0.00..47.40 rows=1 width=12)
>                     ->  Index Scan using orderheader_account on orderheader oh  (cost=0.00..21.64 rows=1 width=4)
>                     ->  Index Scan using orderlines_pkey on orderlines ol  (cost=0.00..25.54 rows=17 width=8)
>
> EXPLAIN
>
> --- UPDATE LOG ---
> explain INSERT INTO orderlinesupdates (theorder,type,stock,line,ordercurrent,sellingquant,price,discount,vatrate,comment,updated,utype,origin)
> SELECT oh.theorder,'P','  0310',coalesce(ol.line,ol2.line+1,1),5,0,.52,0,0,'',128,'+','C' FROM orderheaderupdates oh
> LEFT OUTER JOIN orderlinesupdates ol ON oh.theorder = ol.theorder AND ol.stock='  0310'
> LEFT OUTER JOIN orderlinesupdates ol2 ON oh.theorder = ol2.theorder
> WHERE oh.account=' MILN1' AND oh.delivery=1 AND oh.thedate='2002-06-01' AND oh.ordertype='O'
> ORDER BY oh.updated DESC, ol.line DESC, ol2.line DESC LIMIT 1;
> NOTICE:  QUERY PLAN:
>
> Subquery Scan *SELECT*  (cost=81.29..81.29 rows=1 width=36)
>   ->  Limit  (cost=81.29..81.29 rows=1 width=36)
>         ->  Sort  (cost=81.29..81.29 rows=1 width=36)
>               ->  Nested Loop  (cost=0.00..81.28 rows=1 width=36)
>                     ->  Nested Loop  (cost=0.00..52.47 rows=1 width=28)
>                           ->  Index Scan using orderheaderupdates_account on orderheaderupdates oh  (cost=0.00..23.62 rows=1 width=8)
>                           ->  Index Scan using orderlinesupdates_theorder on orderlinesupdates ol  (cost=0.00..28.60 rows=17 width=20)
>                     ->  Index Scan using orderlinesupdates_theorder on orderlinesupdates ol2  (cost=0.00..28.60 rows=17 width=8)
>
> EXPLAIN

--
+---------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net     |
| Jefferson, LA  USA      http://ronandheather.dhs.org:81 |
|                                                         |
| "I have created a government of whirled peas..."        |
|   Maharishi Mahesh Yogi, 12-May-2002,                   |
|   CNN, Larry King Live                                  |
+---------------------------------------------------------+

