Thread: performance problems with subselects

performance problems with subselects

From

John Taylor

Date:

23 April 2002, 09:36:27

Hi,

I have a rather complicated problem, that I hope someone can help with.

My application keeps track of updates to an order database running on another machine.

For reasons that are too complicated to go into, the orderlines are supplied with some of the
values missing, so I have to look them up before inserting into my database.

I'm now at the prototype stage, and I'm having performance problems.
It currently takes 10 hours to insert 44,000 records. This is almost 1 sec/record which is much too slow.
I have run vacuum, but that doesn't seem to make much difference.


The two tables concered are as follows:

                                      Table "orderheaderupdates"
  Attribute   |          Type          |                           Modifier
--------------+------------------------+--------------------------------------------------------------
 account      | character varying(6)   |
 delivery     | smallint               |
 thedate      | date                   |
 theorder     | integer                | not null default nextval('"orderheader_theorder_seq"'::text)
 deliverynote | character varying(10)  |
 invoicenote  | character varying(10)  |
 grnpod       | character varying(10)  |
 comments     | character varying(240) |
 dcomments    | character varying(100) |
 round        | character varying(3)   |
 tempno       | smallint               |
 status       | character(1)           |
 ordertype    | character(1)           |
 transno      | integer                |
 updated      | integer                |
 utype        | character(1)           |
 origin       | character(1)           |
Indices: orderheaderupdates_account,
         orderheaderupdates_delivery,
         orderheaderupdates_ordertype,
         orderheaderupdates_origin,
         orderheaderupdates_thedate,
         orderheaderupdates_theorder,
         orderheaderupdates_updated


            Table "orderlinesupdates"
  Attribute   |          Type          | Modifier
--------------+------------------------+----------
 theorder     | integer                | not null
 type         | character(1)           |
 stock        | character varying(6)   |
 line         | integer                | not null
 ordercurrent | integer                |
 sellingquant | integer                |
 price        | numeric(7,2)           |
 discount     | numeric(5,2)           |
 vatrate      | smallint               |
 comment      | character varying(100) |
 updated      | integer                |
 utype        | character(1)           |
 origin       | character(1)           |
Indices: orderlinesupdates_line,
         orderlinesupdates_origin,
         orderlinesupdates_stock,
         orderlinesupdates_theorder,
         orderlinesupdates_updated


To insert into the orderlines tables I am currently performing the following:

INSERT INTO orderlinesupdates
   (theorder,type,stock,line,ordercurrent,sellingquant,price,discount,vatrate,comment,updated,utype,origin)
VALUES
   (
     (SELECT theorder FROM orderheaderupdates WHERE account='  1003' AND delivery=1 AND thedate='2002-02-26' AND
ordertype='O'ORDER BY updated DESC LIMIT 1), 
     'P',
     'ITEM1',
     (select coalesce
             (
                (select line from orderlinesupdates where theorder=(SELECT theorder FROM orderheaderupdates WHERE
account=' 1003' AND delivery=1 AND thedate='2002-02-26' AND ordertype='O'  ORDER BY updated DESC LIMIT 1) AND
stock='ITEM1'limit 1), 
                (select max(line)+1 from orderlinesupdates where theorder in (SELECT theorder FROM orderheader WHERE
account=' 1003' AND delivery=1  AND thedate='2002-02-26' AND ordertype='O' LIMIT 1)), 
                1
             )
     ),
     1,
     1,
     1.1,
     0.0,
     0,
     '',
     0,
     '+',
     'C'
   );

The first subselect is to find the correct ordernumber (theorder) from the orderheader.
The second subselect tries to locate the line number, by searching for the orderline in case this line has been
insertedpreviously. 
The third subselect finds the next free line number when it is the first time this orderline has been added.
The orderline defaults to 1, if this is the first orderline entered for the matching orderheader.

I am aware that I am using almost the same select from orderheader 3 times, but I'm not sure how I can improve on this.
Would a stored procedure help ?

Running explain on the insert gives the following:


Result  (cost=0.00..0.00 rows=0 width=0)
  InitPlan
    ->  Limit  (cost=2.04..2.04 rows=1 width=8)
          ->  Sort  (cost=2.04..2.04 rows=1 width=8)
                ->  Index Scan using orderheaderupdates_account on orderheaderupdates  (cost=0.00..2.03 rows=1 width=8)
    ->  Result  (cost=0.00..0.00 rows=0 width=0)
          InitPlan
            ->  Limit  (cost=0.00..12.64 rows=1 width=4)
                  InitPlan
                    ->  Limit  (cost=2.04..2.04 rows=1 width=8)
                          ->  Sort  (cost=2.04..2.04 rows=1 width=8)
                                ->  Index Scan using orderheaderupdates_account on orderheaderupdates  (cost=0.00..2.03
rows=1width=8) 
                  ->  Index Scan using orderlinesupdates_theorder on orderlinesupdates  (cost=0.00..12.64 rows=1
width=4)
            ->  Limit  (cost=0.00..12.64 rows=1 width=4)
                  InitPlan
                    ->  Limit  (cost=2.04..2.04 rows=1 width=8)
                          ->  Sort  (cost=2.04..2.04 rows=1 width=8)
                                ->  Index Scan using orderheaderupdates_account on orderheaderupdates  (cost=0.00..2.03
rows=1width=8) 
                  ->  Index Scan using orderlinesupdates_theorder on orderlinesupdates  (cost=0.00..12.64 rows=1
width=4)
            ->  Aggregate  (cost=45959.26..45959.26 rows=1 width=4)
                  ->  Seq Scan on orderlinesupdates  (cost=0.00..45849.26 rows=44000 width=4)
                        SubPlan
                          ->  Materialize  (cost=1.01..1.01 rows=1 width=4)
                                ->  Limit  (cost=0.00..1.01 rows=1 width=4)
                                      ->  Result  (cost=0.00..2.03 rows=2 width=4)
                                            ->  Append  (cost=0.00..2.03 rows=2 width=4)
                                                  ->  Seq Scan on orderheader  (cost=0.00..0.00 rows=1 width=4)
                                                  ->  Index Scan using orderheaderupdates_account on orderheaderupdates
orderheader (cost=0.00..2.03 rows=1 width=4) 
            ->  Aggregate  (cost=45959.26..45959.26 rows=1 width=4)
                  ->  Seq Scan on orderlinesupdates  (cost=0.00..45849.26 rows=44000 width=4)
                        SubPlan
                          ->  Materialize  (cost=1.01..1.01 rows=1 width=4)
                                ->  Limit  (cost=0.00..1.01 rows=1 width=4)
                                      ->  Result  (cost=0.00..2.03 rows=2 width=4)
                                            ->  Append  (cost=0.00..2.03 rows=2 width=4)
                                                  ->  Seq Scan on orderheader  (cost=0.00..0.00 rows=1 width=4)
                                                  ->  Index Scan using orderheaderupdates_account on orderheaderupdates
orderheader (cost=0.00..2.03 rows=1 width=4) 

Any ideas would be greatly appeciated.

Thanks
JohnT

Re: performance problems with subselects

From

Tom Lane

Date:

23 April 2002, 10:04:59

John Taylor <postgres@jtresponse.co.uk> writes:
> I am aware that I am using almost the same select from orderheader 3
> times, but I'm not sure how I can improve on this.

Can you do this?

INSERT INTO target SELECT foo, bar, baz FROM source WHERE condition

To the extent that you are computing multiple fields from the same
row in the source table, this will help.

Also, avoid that IN construct.

            regards, tom lane

Re: performance problems with subselects

From

John Taylor

Date:

23 April 2002, 10:41:08

On Tuesday 23 April 2002 15:04, Tom Lane wrote:
> John Taylor <postgres@jtresponse.co.uk> writes:
> > I am aware that I am using almost the same select from orderheader 3
> > times, but I'm not sure how I can improve on this.
>
> Can you do this?
>
> INSERT INTO target SELECT foo, bar, baz FROM source WHERE condition
>
> To the extent that you are computing multiple fields from the same
> row in the source table, this will help.

Where would I do that ?
Do you mean to select the order number (theorder) into a temporary table ?
The problem is that I need to use the order number in 3 seperate subselects.

>
> Also, avoid that IN construct.

OK. Thats done.

Thanks
JohnT

Re: performance problems with subselects

From

John Taylor

Date:

24 April 2002, 11:34:39

On Tuesday 23 April 2002 15:04, Tom Lane wrote:
> John Taylor <postgres@jtresponse.co.uk> writes:
> > I am aware that I am using almost the same select from orderheader 3
> > times, but I'm not sure how I can improve on this.
>

Thanks to advice from Tom Lane, and Andrew McMillan I have now restructured
my query to go twice as fast !

I now have:

INSERT INTO orderlinesupdates
(theorder,type,stock,line,ordercurrent,sellingquant,price,discount,vatrate,comment,updated,utype,origin)
SELECT oh.theorder,'P','stock2',coalesce(ol.line,ol2.line+1,1),12,2,1.2,0.2,1,'testing',0,'+','C'
  FROM orderheaderupdates oh
  LEFT OUTER JOIN orderlinesupdates ol
  ON oh.theorder = ol.theorder AND ol.stock='stock2'
  LEFT OUTER JOIN orderlinesupdates ol2
  ON oh.theorder = ol2.theorder
  WHERE oh.account='account1' AND oh.delivery=1 AND oh.thedate='2002-02-01' AND oh.ordertype='O'
  ORDER BY oh.updated DESC, ol.line DESC, ol2.line DESC
  LIMIT 1;

The only way I can see to improve this any more is to reduce the JOINS and coalesce()

What I want to do is:
  If there is a matching orderline, use the same line number
  otherwise, find the largest line number in the orderlines with the same orderheader, and use 1 greater
  otherwise, this is the first orderline for this orderheader, so use 1

Is there a way I can do this without two outer joins ?
As one JOIN is a subset of the other, is there a way to get postgres to use the results from the ol2 query
and then apply the additional ol restriction without performing the join again ?

Maybe I could use a subselect here instead ?

Thanks
JohnT

Re: performance problems with subselects

From

Andrew McMillan

Date:

24 April 2002, 18:49:20

On Thu, 2002-04-25 at 03:27, John Taylor wrote:
>
> Thanks to advice from Tom Lane, and Andrew McMillan I have now restructured
> my query to go twice as fast !
>
> I now have:
>
> INSERT INTO orderlinesupdates
> (theorder,type,stock,line,ordercurrent,sellingquant,price,discount,vatrate,comment,updated,utype,origin)
> SELECT oh.theorder,'P','stock2',coalesce(ol.line,ol2.line+1,1),12,2,1.2,0.2,1,'testing',0,'+','C'
>   FROM orderheaderupdates oh
>   LEFT OUTER JOIN orderlinesupdates ol
>   ON oh.theorder = ol.theorder AND ol.stock='stock2'
>   LEFT OUTER JOIN orderlinesupdates ol2
>   ON oh.theorder = ol2.theorder
>   WHERE oh.account='account1' AND oh.delivery=1 AND oh.thedate='2002-02-01' AND oh.ordertype='O'
>   ORDER BY oh.updated DESC, ol.line DESC, ol2.line DESC
>   LIMIT 1;
>
> The only way I can see to improve this any more is to reduce the JOINS and coalesce()
>
> What I want to do is:
>   If there is a matching orderline, use the same line number
>   otherwise, find the largest line number in the orderlines with the same orderheader, and use 1 greater
>   otherwise, this is the first orderline for this orderheader, so use 1
>
> Is there a way I can do this without two outer joins ?
> As one JOIN is a subset of the other, is there a way to get postgres to use the results from the ol2 query
> and then apply the additional ol restriction without performing the join again ?
>
> Maybe I could use a subselect here instead ?

Possibly a small pl/pgsql function would be better:

CREATE OR REPLACE FUNCTION get_next_olnum( INT, TEXT )
  RETURNS INT AS '
DECLARE
   i_ordernum ALIAS FOR $1;
   i_stockcode ALIAS FOR $2;
   o_nextnum INT DEFAULT 1;
BEGIN
  SELECT line INTO o_nextnum
         FROM orderlinesupdates
         WHERE theorder = i_ordernum
           AND stock = i_stockcode;
  IF NOT FOUND THEN
    SELECT line + 1 INTO o_nextnum
           FROM orderlinesupdates
           WHERE theorder = i_ordernum;
    IF NOT FOUND THEN
      o_nextnum := 1;
    END IF;
  END IF;
  RETURN o_nextnum;
END;
' LANGUAGE 'plpgsql';

This way the second select will not be executed in the event that the
first select is successful.

Then you can just use it in place of the coalesce:

..., get_next_olnum( oh.theorder, 'stock2' ), ...

Regards,
                    Andrew.
--
--------------------------------------------------------------------
Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/        PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201    MOB: +64(21)635-694    OFFICE: +64(4)499-2267
       Are you enrolled at http://schoolreunions.co.nz/ yet?