Thread: update slow

update slow

From
Josué Maldonado
Date:
Hello list,

I have 7.3.4 on RH 8, server hardware is a dual processor Intel Xeon 2.4
Ghz, 2G RAM, 27Gb HD available on postgres partition.

Table cost2est2003 with 99350 recs is described as:
code char(4)    -- index
tcos numeric(12,4)
mes char(6) -- index

Table estprod with 355513 recs is described as:
pk_estprod  -- index
epr_periodo char(6)  -- Index
epr_venta numeric(12,4)
epr_costo numeric(12,4)
pro_code char(4)


This is the update command:
update estprod set epr_costo=(select tcos from cost2est2003 where
code=pro_code and mes=epr_periodo) where epr_periodo >='200301'

The above filters the records to affect only 99157 rows, the update
takes hours and don't get done (I did cancel it), I changed the filter
to "epr_periodo ='200301'" to update only 9756 rows but still has more
than 10 minutes working.

I wonder if there is something I'm doing wrong, any help will be
appreciated. BTW the server is not in production is not doing anything
else. Thanks in advance



--
Josué Maldonado.



Re: update slow

From
Richard Huxton
Date:
On Tuesday 11 November 2003 19:46, Josué Maldonado wrote:
> This is the update command:
> update estprod set epr_costo=(select tcos from cost2est2003 where
> code=pro_code and mes=epr_periodo) where epr_periodo >='200301'
>
> The above filters the records to affect only 99157 rows, the update
> takes hours and don't get done (I did cancel it), I changed the filter
> to "epr_periodo ='200301'" to update only 9756 rows but still has more
> than 10 minutes working.

Can you post EXPLAIN ANALYSE <query> on this shorter one - that will show what
PG is doing.

One thing that might be worth looking at is using PG's non-standard
UPDATE t1 SET c1=... FROM t2 ...

--
  Richard Huxton
  Archonet Ltd

Re: update slow

From
Josué Maldonado
Date:
Hi Richard,

Here are both ways:

dbmund=# EXPLAIN ANALYSE update estprod set
dbmund-# epr_costo=(select tcos from cost2est2003 where code=pro_code
and mes=epr_periodo)
dbmund-# where epr_periodo='200311';

  Seq Scan on estprod  (cost=0.00..9177.91 rows=8080 width=163) (actual
time=440.99..627139.23 rows=6867 loops=1)
    Filter: (epr_periodo = '200311'::bpchar)
    SubPlan
      ->  Index Scan using c2emes on cost2est2003  (cost=0.00..1532.53
rows=2 width=16) (actual time=66.60..91.19 rows=1 loops=6867)
            Index Cond: (mes = $1)
            Filter: (code = $0)
  Total runtime: 628557.56 msec
(7 rows)


dbmund=# explain analyze update estprod set
dbmund-# epr_costo= tcos
dbmund-# from cost2est2003
dbmund-# where code=pro_code and mes=epr_periodo
dbmund-# and epr_periodo='200311';

  Merge Join  (cost=10080.76..15930.98 rows=316 width=197) (actual
time=1191.89..4704.49 rows=6851 loops=1)
    Merge Cond: ("outer".code = "inner".pro_code)
    Join Filter: ("outer".mes = "inner".epr_periodo)
    ->  Index Scan using c2ecode on cost2est2003  (cost=0.00..4614.85
rows=99350 width=34) (actual time=0.16..2974.96 rows=99350 loops=1)
    ->  Sort  (cost=10080.76..10100.96 rows=8080 width=163) (actual
time=1191.62..1235.32 rows=55216 loops=1)
          Sort Key: estprod.pro_code
          ->  Seq Scan on estprod  (cost=0.00..9177.91 rows=8080
width=163) (actual time=396.88..1126.28 rows=6867 loops=1)
                Filter: (epr_periodo = '200311'::bpchar)
  Total runtime: 5990.34 msec
(9 rows)

Wow, update from is pretty faster than the first update, can't
understand why.

Thanks


Richard Huxton wrote:

> On Tuesday 11 November 2003 19:46, Josué Maldonado wrote:
>
>>This is the update command:
>>update estprod set epr_costo=(select tcos from cost2est2003 where
>>code=pro_code and mes=epr_periodo) where epr_periodo >='200301'
>>
>>The above filters the records to affect only 99157 rows, the update
>>takes hours and don't get done (I did cancel it), I changed the filter
>>to "epr_periodo ='200301'" to update only 9756 rows but still has more
>>than 10 minutes working.
>
>
> Can you post EXPLAIN ANALYSE <query> on this shorter one - that will show what
> PG is doing.
>
> One thing that might be worth looking at is using PG's non-standard
> UPDATE t1 SET c1=... FROM t2 ...
>


--
Josué Maldonado.



Re: update slow

From
Richard Huxton
Date:
On Tuesday 11 November 2003 20:55, Josué Maldonado wrote:
> Hi Richard,
>
> Here are both ways:

Excellent - OK, here's a crash course in understanding the output of EXPLAIN
ANALYSE...

> dbmund=# EXPLAIN ANALYSE update estprod set
> dbmund-# epr_costo=(select tcos from cost2est2003 where code=pro_code
> and mes=epr_periodo)
> dbmund-# where epr_periodo='200311';
>
>   Seq Scan on estprod  (cost=0.00..9177.91 rows=8080 width=163) (actual
> time=440.99..627139.23 rows=6867 loops=1)

PG predicted 8080 rows to update (each of 163 bytes). It was actually 6867
rows with a total time of 627139.23 milliseconds (over 10 minutes).

>     Filter: (epr_periodo = '200311'::bpchar)
>     SubPlan
>       ->  Index Scan using c2emes on cost2est2003  (cost=0.00..1532.53
> rows=2 width=16) (actual time=66.60..91.19 rows=1 loops=6867)

Each sub-select took 91.19ms but looping through 6867 calls, that's 626
seconds - almost all of your query time.

>             Index Cond: (mes = $1)
>             Filter: (code = $0)
>   Total runtime: 628557.56 msec

The only thing that might speed it up is to build an index on (mes,code) for
cost2est2003, and even then I wouldn't expect wonders.

> dbmund=# explain analyze update estprod set
> dbmund-# epr_costo= tcos
> dbmund-# from cost2est2003
> dbmund-# where code=pro_code and mes=epr_periodo
> dbmund-# and epr_periodo='200311';
>
>   Merge Join  (cost=10080.76..15930.98 rows=316 width=197) (actual
> time=1191.89..4704.49 rows=6851 loops=1)
>     Merge Cond: ("outer".code = "inner".pro_code)
>     Join Filter: ("outer".mes = "inner".epr_periodo)
>     ->  Index Scan using c2ecode on cost2est2003  (cost=0.00..4614.85
> rows=99350 width=34) (actual time=0.16..2974.96 rows=99350 loops=1)
>     ->  Sort  (cost=10080.76..10100.96 rows=8080 width=163) (actual
> time=1191.62..1235.32 rows=55216 loops=1)
>           Sort Key: estprod.pro_code
>           ->  Seq Scan on estprod  (cost=0.00..9177.91 rows=8080
> width=163) (actual time=396.88..1126.28 rows=6867 loops=1)
>                 Filter: (epr_periodo = '200311'::bpchar)
>   Total runtime: 5990.34 msec
> (9 rows)
>
> Wow, update from is pretty faster than the first update, can't
> understand why.

In the second example, PG gathers the target rows in estprod sorts them and
then joins using your index on c2ecode. This is clearly a better plan. Note
that since the rows from the previous test could still be in memory, you'd
expect an improvement anyway, but this plan just looks better.

PG obviously can't figure out that it can convert the first example into the
second, but I'm guessing there are good reasons for that. Although your
example is simple enough, there might be cases where it isn't safe to do so.
Presumably this is one reason why the non-standard UPDATE...FROM statement is
there.

Try the second form with your real query, VACUUM ANALYSE both tables and
perhaps try an index on (mes,code) and see what that does for you. If you get
any more performance issues, there is a mailing list specifically for them -
plenty of knowledgable types there.
--
  Richard Huxton
  Archonet Ltd