Slow table update

From: Laszlo Nagy
SQL:

update product set sz_category_id=null where am_style_kw1 is not null
and sz_category_id is not null

query plan:

"Seq Scan on product  (cost=0.00..647053.30 rows=580224 width=1609)"
"  Filter: ((am_style_kw1 IS NOT NULL) AND (sz_category_id IS NOT NULL))"

Information on the table:

row count ~ 2 million
table size: 4841 MB
toast table size: 277 MB
indexes size: 4434 MB
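For reference, these numbers can be reproduced with the built-in size functions (a sketch; only the table name "product" is from our schema):

```sql
-- heap size vs. total size (heap + toast + indexes)
SELECT pg_size_pretty(pg_relation_size('product'))       AS table_size,
       pg_size_pretty(pg_total_relation_size('product')) AS total_size;

-- per-index sizes, largest first
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'product'
ORDER BY pg_relation_size(indexrelid) DESC;
```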

Computer: FreeBSD 7.0 stable, dual quad-core Xeon 5420 2.5GHz, 8GB
memory, 6 ES SATA disks in hw RAID 6 (+2GB write-back cache) for the
database.

Autovacuum is enabled. We also run "vacuum analyze" on the database
each day.

Here are some non-default values from postgresql.conf:

shared_buffers=400MB
maintenance_work_mem = 256MB
max_fsm_pages = 1000000

There was almost no load on the machine (CPU: mostly idle, IO: approx.
5% total) when we started this update.

Maybe I'm wrong with this, but here is a quick calculation: the RAID
array should do at least 100MB/sec, so reading the whole 4841MB table
should not take much more than a minute. I think about 20% of the rows
should have been updated. Writing out all the changes should not take
too much time either. I believe this update should have completed
within 2-3 minutes.

In reality, I cancelled the query after 2600 seconds. We monitored
disk I/O, and it was near 100% the whole time.

What is wrong?

Thank you,

   Laszlo


Re: Slow table update

From: Laszlo Nagy
Laszlo Nagy wrote:
> SQL:
>
> update product set sz_category_id=null where am_style_kw1 is not null
> and sz_category_id is not null
Hmm, this query:

select count(*) from product where am_style_kw1 is not null and
sz_category_id is not null and sz_category_id<>4809

finishes in 10 seconds, yet the update would not finish in 2600
seconds. I don't understand.

L


Re: Slow table update

From: "Gregory Williamson"

Laszlo Nagy wrote:

>
> Laszlo Nagy wrote:
> > SQL:
> >
> > update product set sz_category_id=null where am_style_kw1 is not null
> > and sz_category_id is not null
> Hmm, this query:
>
> select count(*) from product where am_style_kw1 is not null and
> sz_category_id is not null and sz_category_id<>4809
>
> finishes in 10 seconds, yet the update would not finish in 2600
> seconds. I don't understand.

If the table has some sort of FK relations, it might be being slowed by the need to check whether each affected row has any children.

Perhaps triggers ?

If the table is very bloated with lots of dead rows, that would slow it down (but you did say you vacuum frequently -- do you check the results to make sure the vacuums are effective?).

A long running transaction elsewhere that is blocking the delete ? Did you check the locks ?
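Something like this against the standard catalogs will show ungranted lock requests (a sketch; column names per the 8.x system views, nothing schema-specific):

```sql
-- sessions waiting for a lock, joined to what they are running
SELECT l.pid, l.locktype, l.mode, a.current_query
FROM pg_locks l
JOIN pg_stat_activity a ON a.procpid = l.pid
WHERE NOT l.granted;
```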

HTH,

Greg Williamson
Senior DBA
DigitalGlobe

Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information and must be protected in accordance with those provisions. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.

(My corporate masters made me say this.)


Re: Slow table update

From: Laszlo Nagy
>
> If the table has some sort of FK relations it might be being slowed by
> the need to check a row meant to be deleted has any children.
>
If you look at my SQL, there is only one column being updated, and that
column has no foreign key constraint. (It should have one, but we did
not add it, in order to speed up updates.)
>
>
> Perhaps triggers ?
>
Table "product" has no triggers.
>
>
> If the table is very bloated with lots of dead rows (but you did say
> you vacuum frequently and check the results to make sure they are
> effective?) that would slow it down.
>
I'm not sure how to check whether the vacuum was effective. But we have
max_fsm_pages=1000000 in postgresql.conf, and the daily vacuum script
does not report any errors, so I presume the table hasn't got too many
dead rows.
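For the record, something like this should show whether vacuum keeps up (a sketch; only the table name is ours):

```sql
-- verbose vacuum reports removed/remaining dead row versions per table,
-- and at the end the free space map usage vs. max_fsm_pages
VACUUM VERBOSE product;

-- on 8.3 the statistics collector also estimates dead rows directly
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'product';
```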

Anyway, the table size is only 4GB. Even if half of the rows were dead,
the update should still run quite quickly. Another argument is that when
I "select count(*)" instead of UPDATE, I get the result in 10 seconds. I
don't think dead rows can make such a big difference between reading and
writing.

My other idea was that there are so many indexes on this table; maybe
the update is slow because of the indexes? The column being updated has
only one index on it, and that index is 200MB. But I have heard that
because of PostgreSQL's multi-version system, the system sometimes needs
to update indexes on columns that are not themselves being updated. I'm
not sure. Might this be the problem?
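The full index list is easy to pull from the catalog (a sketch; only the table name is ours):

```sql
-- one row per index, with its definition
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'product';
```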
>
>
> A long running transaction elsewhere that is blocking the delete ? Did
> you check the locks ?
>
Sorry, this was an update. A blocking transaction would never explain
why the disk I/O went up to 100% for 2600 seconds.

   L


Re: Slow table update

From: Laszlo Nagy
I just tested the same thing on a test machine. It has only one
processor, 1GB memory, and one SATA disk. The same "select count(*)"
took 58 seconds. I started the same UPDATE with EXPLAIN ANALYZE; it has
been running for 1000 seconds now. I'm now 100% sure that the problem is
with the database, because this machine has nothing but a postgresql
server running on it. I'll post the output of explain analyze later.


Re: Slow table update

From: Tom Lane
Laszlo Nagy <gandalf@shopzeus.com> writes:
>> If the table has some sort of FK relations it might be being slowed by
>> the need to check a row meant to be deleted has any children.
>>
> If you look at my SQL, there is only one column to be updated. That
> column has no foreign key constraint.

That was not the question that was asked.

> My other idea was that there are so many indexes on this table, maybe
> the update is slow because of the indexes?

Updating indexes is certainly very far from being free.  How many is
"many"?

            regards, tom lane

Re: Slow table update

From: "Nikolas Everett"
The thing to keep in mind is that every update creates a new row version that has to be entered into every index on the table, not just the indexes on the updated column.  You can test the weight of the indexes by copying the table (the copy will have no indexes) and trying your query against the copy.

I've heard tell that if you have a table that updates frequently but needs heavy indexing, you can gain some performance by breaking it into two tables with the same primary key: one table with the columns you index and another with the columns you update.
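A sketch of that split, with invented table and column names (only sz_category_id and am_style_kw1 are from the thread):

```sql
-- static, heavily indexed attributes: index these freely
CREATE TABLE product_static (
    product_id   int8 PRIMARY KEY,
    name         text,
    description  text
);

-- frequently updated attributes: keep indexing minimal
CREATE TABLE product_updates (
    product_id     int8 PRIMARY KEY
                   REFERENCES product_static (product_id),
    sz_category_id int8,
    am_style_kw1   text
);

-- an update now maintains only this table's few indexes
UPDATE product_updates
   SET sz_category_id = NULL
 WHERE am_style_kw1 IS NOT NULL
   AND sz_category_id IS NOT NULL;
```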

I hope this helps.


On Mon, Dec 22, 2008 at 8:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Laszlo Nagy <gandalf@shopzeus.com> writes:
>> If the table has some sort of FK relations it might be being slowed by
>> the need to check a row meant to be deleted has any children.
>>
> If you look at my SQL, there is only one column to be updated. That
> column has no foreign key constraint.

That was not the question that was asked.

> My other idea was that there are so many indexes on this table, maybe
> the update is slow because of the indexes?

Updating indexes is certainly very far from being free.  How many is
"many"?

                       regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Slow table update

From: Laszlo Nagy
>> My other idea was that there are so many indexes on this table, maybe
>> the update is slow because of the indexes?
>>
>
> Updating indexes is certainly very far from being free.  How many is
> "many"?
>
Number of indexes = 15.

Three indexes are on "text" columns, 500MB in size each.
The others are on int8 and timestamp columns, ca. 200MB each.



Re: Slow table update

From: "Gregory Williamson"

Laszlo Nagy wrote:
> >> My other idea was that there are so many indexes on this table, maybe
> >> the update is slow because of the indexes?
> >>    
> >
> > Updating indexes is certainly very far from being free.  How many is
> > "many"?
> >  
> Number of indexes = 15.
>
> Three indexes are on "text" columns, 500MB in size each.
> The others are on int8 and timestamp columns, ca. 200MB each.

To me, that's "many" ;-)

That's a lot when you think about what happens when indexed columns are changed, deleted or inserted -- a lot of background work that the database has to do.

In 8.3 the HOT feature may help, provided the columns being updated are not indexed ... what version of PostgreSQL is this again? (Forgive my lack of memory -- the last few days I've forgotten a lot, heh heh.)

Any chance to reduce those to a bare minimum, perhaps using conditional index strategies or even some form of replication, so the primary uses indexes related to the updates and the mirror uses indexes related to the read-only / reporting needs? Perhaps some form of staging table with no indexes to load, check data, etc., and then insert.

Any way to reduce those? Check the usage via the system stats on table/index use, then try removing some and test to see what makes a difference.
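The stats view makes the unused ones easy to spot (a sketch; only the table name is from the thread):

```sql
-- idx_scan counts how many times each index was used;
-- zero or near-zero scans mark drop candidates
SELECT indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE relname = 'product'
ORDER BY idx_scan;
```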

HTH

Greg Williamson
Senior DBA
DigitalGlobe


Re: Slow table update - SOLVED!

From: Laszlo Nagy
>
> In 8.3 the HOT feature may help, provided the columns being updated
> are not indexed ... what version of PostgreSQL is this again? (Forgive
> my lack of memory -- the last few days I've forgotten a lot, heh heh.)
>
8.3.5.

The column that was being updated is part of one small index only.
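On 8.3 the statistics collector shows how many updates were HOT; since the updated column here is indexed, this particular UPDATE cannot use HOT anyway (a sketch; only the table name is ours):

```sql
-- compare total updates to heap-only (HOT) updates (8.3+)
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'product';
```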

>
> Any chances to reduce those to a bare minimum, perhaps using
> conditional index strategies or even some form of replication, so the
> primary uses indexes related to the updates and the mirror uses
> indexes related to the read-only / reporting needs ? Perhaps some form
> of staging table with no indexes to load, check data, etc. and then
> insert.
>
> Any way to reduce those ? Check the usage via the system stats on
> table/index use and try removing some and testing to see what makes a
> difference.
>
We tried removing all the indexes on a test system, and the update was
speedy. We are also going to try to reduce the row size: move the static
description/name/textual data into a separate table, and leave the
frequently updated data in the original one. We tested this theoretical
version:

Query returned successfully: 182752 rows affected, 56885 ms execution time.

This is much faster. However, this table is used by hundreds of
programs, so restructuring it will not be trivial. Anyway, I got the
answer to my question.

Thank you!

   Laszlo