Thread: Slow table update
SQL: update product set sz_category_id=null where am_style_kw1 is not null and sz_category_id is not null query plan: "Seq Scan on product (cost=0.00..647053.30 rows=580224 width=1609)" " Filter: ((am_style_kw1 IS NOT NULL) AND (sz_category_id IS NOT NULL))" Information on the table: row count ~ 2 million table size: 4841 MB toast table size: 277mb indexes size: 4434 MB Computer: FreeBSD 7.0 stable, Dual Xeon Quad code 5420 2.5GHZ, 8GB memory, 6 ES SATA disks in hw RAID 6 (+2GB write back cache) for the database. Autovacuum is enabled. We also perform "vacuum analyze" on the database, each day. Here are some non-default values from postgresql.conf: shared_buffers=400MB maintenance_work_mem = 256MB max_fsm_pages = 1000000 There was almost no load on the machine (CPU: mostly idle, IO: approx. 5% total) when we started this update. Maybe I'm wrong with this, but here is a quick calculation: the RAID array should do at least 100MB/sec. Reading the whole table should not take more than 1 min. I think about 20% of the rows should have been updated. Writting out all changes should not take too much time. I believe that this update should have been completed within 2-3 minutes. In reality, after 2600 seconds I have cancelled the query. We monitored disk I/O and it was near 100% all the time. What is wrong? Thank you, Laszlo
Laszlo Nagy wrote: > SQL: > > update product set sz_category_id=null where am_style_kw1 is not null > and sz_category_id is not null Hmm, this query: select count(*) from product where am_style_kw1 is not null and sz_category_id is not null and sz_category_id<>4809 opens in 10 seconds. The update would not finish in 2600 seconds. I don't understand. L
Laszlo Nagy wrote:
>
> Laszlo Nagy wrote:
> > SQL:
> >
> > update product set sz_category_id=null where am_style_kw1 is not null
> > and sz_category_id is not null
> Hmm, this query:
>
> ?select count(*) from product where am_style_kw1 is not null and
> sz_category_id is not null and sz_category_id<>4809
>
> opens in 10 seconds. The update would not finish in 2600 seconds. I
> don't understand.
If the table has some sort of FK relations it might be being slowed by the need to check a row meant to be deleted has any children.
Perhaps triggers ?
If the table is very bloated with lots of dead rows (but you did say you vacuum frequently and check the results to make sure they are effective?) that would slow it down.
A long running transaction elsewhere that is blocking the delete ? Did you check the locks ?
HTH,
Greg Williamson
Senior DBA
DigitalGlobe
Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information and must be protected in accordance with those provisions. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.
(My corporate masters made me say this.)
> > If the table has some sort of FK relations it might be being slowed by > the need to check a row meant to be deleted has any children. > If you look at my SQL, there is only one column to be updated. That column has no foreign key constraint. (It should have, but we did not want to add that constraint in order to speed up updates.) > > > Perhaps triggers ? > Table "product" has no triggers. > > > If the table is very bloated with lots of dead rows (but you did say > you vacuum frequently and check the results to make sure they are > effective?) that would slow it down. > I'm not sure how to check if the vacuum was effective. But we have max_fsm_pages=1000000 in postgresql.conf, and I do not get any errors from the daily vacuum script, so I presume that the table hasn't got too many dead rows. Anyway, the table size is only 4GB. Even if half of the rows are dead, the update should run quite quickly. Another argument is that when I "select count(*)" instead of "UPDATE", then I get the result in 10 seconds. I don't think that dead rows can make such a big difference between reading and writing. My other idea was that there are so many indexes on this table, maybe the update is slow because of the indexes? The column being updated has only one index on it, and that is 200MB. But I have heard somewhere that because of PostgreSQL's multi version system, sometimes the system needs to update indexes with columns that are not being updated. I'm not sure. Might this be the problem? > > > A long running transaction elsewhere that is blocking the delete ? Did > you check the locks ? > Sorry, this was an update. A blocking transaction would never explain why the disk I/O went up to 100% for 2600 seconds. L
I just tested the same on a test machine. It only has one processor 1GB memory, and one SATA disk. The same "select count(*)" was 58 seconds. I started the same UPDATE with EXPLAIN ANALYZE. It is running since 1000 seconds. I'm now 100% sure that the problem is with the database, because this machine has nothing but a postgresql server running on it. I'll post the output of explain analyze later.
Laszlo Nagy <gandalf@shopzeus.com> writes: >> If the table has some sort of FK relations it might be being slowed by >> the need to check a row meant to be deleted has any children. >> > If you look at my SQL, there is only one column to be updated. That > column has no foreign key constraint. That was not the question that was asked. > My other idea was that there are so many indexes on this table, maybe > the update is slow because of the indexes? Updating indexes is certainly very far from being free. How many is "many"? regards, tom lane
I've heard tell that if you have a table that updates frequently but needs to be indexed you can get some performance by breaking it into two tables with the same primary key. One table with the stuff you index and another table with the stuff you update.
I hope this helps.
Laszlo Nagy <gandalf@shopzeus.com> writes:That was not the question that was asked.
>> If the table has some sort of FK relations it might be being slowed by
>> the need to check a row meant to be deleted has any children.
>>
> If you look at my SQL, there is only one column to be updated. That
> column has no foreign key constraint.Updating indexes is certainly very far from being free. How many is
> My other idea was that there are so many indexes on this table, maybe
> the update is slow because of the indexes?
"many"?
regards, tom lane
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
>> My other idea was that there are so many indexes on this table, maybe >> the update is slow because of the indexes? >> > > Updating indexes is certainly very far from being free. How many is > "many"? > Number of indexes = 15. 3 indexex are on "text" type column, 500MB in size each. Other are on int8 and timestamp columns, cca. 200MB each.
Laszlo Nagy wrote:
> >> My other idea was that there are so many indexes on this table, maybe
> >> the update is slow because of the indexes?
> >>
> >
> > Updating indexes is certainly very far from being free. How many is
> > "many"?
> >
> Number of indexes = 15.
>
> 3 indexex are on "text" type column, 500MB in size each.
> Other are on int8 and timestamp columns, cca. 200MB each.
To me, that's "many" ;-)
That's a lot when you think about what happens when indexed columns are changed, deleted or inserted -- a lot of background work that the database has to do.
Inf 8.3 the HOT feature may help if the columns being updated are indexed ... what version of PostgreSQL is this again ? (Forgive my lack of memory -- the last few days I've forgotten a lot, heh heh.)
Any chances to reduce those to a bare minimum, perhaps using conditional index strategies or even some form of replication, so the primary uses indexes related to the updates and the mirror uses indexes related to the read-only / reporting needs ? Perhaps some form of staging table with no indexes to load, check data, etc. and then insert.
Any way to reduce those ? Check the usage via the system stats on table/index use and try removing some and testing to see what makes a difference.
HTH
Greg Williamson
Senior DBA
DigitalGlobe
Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information and must be protected in accordance with those provisions. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.
(My corporate masters made me say this.)
> > Inf 8.3 the HOT feature may help if the columns being updated are > indexed ... what version of PostgreSQL is this again ? (Forgive my > lack of memory -- the last few days I've forgotten a lot, heh heh.) > 8.3.5. The colum that was being updated is part of one small index only. > > Any chances to reduce those to a bare minimum, perhaps using > conditional index strategies or even some form of replication, so the > primary uses indexes related to the updates and the mirror uses > indexes related to the read-only / reporting needs ? Perhaps some form > of staging table with no indexes to load, check data, etc. and then > insert. > > Any way to reduce those ? Check the usage via the system stats on > table/index use and try removing some and testing to see what makes a > difference. > We tried to remove all indexes on a test system and the update was speedy. We are going to try to reduce the row size also move static description/name/textual data into a separate table, and leave frequently updated data in the original one. We tested this theoretical version: Query returned successfully: 182752 rows affected, 56885 ms execution time. This is much faster. However, this table is used by hundreds of programs. Anyway, I got the answer to my question. Thank you! Laszlo