Thread: Why Does UPDATE Take So Long?
Working with PG 8.1 I'm trying to update a char(4) column, and it's taking a very long time; 15 minutes so far and no end in sight. From the explain, it doesn't seem like it should take that long, and this column is not indexed. Sure, there's 2.7 million records but it only takes a few minutes to scan the whole file. Is there some special overhead I should be aware of with an UPDATE? I VACUUMed and ANALYZEd first, too. Or am I just expecting too much? Here's the explain: explain UPDATE farms SET prog_year='2007'; QUERY PLAN ---------------------------------------------------------------- Seq Scan on farms (cost=0.00..59144.07 rows=2772207 width=54) (1 row) TIA, - Bill Thoen
On Tuesday 30 September 2008, Bill Thoen <bthoen@gisnet.com> wrote: > Working with PG 8.1 I'm trying to update a char(4) column, and it's > taking a very long time; 15 minutes so far and no end in sight. From the > explain, it doesn't seem like it should take that long, and this column > is not indexed. Sure, there's 2.7 million records but it only takes a > few minutes to scan the whole file. Is there some special overhead I > should be aware of with an UPDATE? I VACUUMed and ANALYZEd first, too. update creates new rows for all affected rows. If the table is indexed, it creates new index rows for all affected rows in every index. Slow updates is a common PostgreSQL complaint. -- Alan
Bill Thoen <bthoen@gisnet.com> schrieb: > Working with PG 8.1 I'm trying to update a char(4) column, and it's > taking a very long time; 15 minutes so far and no end in sight. From the > explain, it doesn't seem like it should take that long, and this column > is not indexed. Sure, there's 2.7 million records but it only takes a > few minutes to scan the whole file. Is there some special overhead I > should be aware of with an UPDATE? I VACUUMed and ANALYZEd first, too. > > Or am I just expecting too much? > > Here's the explain: > explain UPDATE farms SET prog_year='2007'; > QUERY PLAN > ---------------------------------------------------------------- > Seq Scan on farms (cost=0.00..59144.07 rows=2772207 width=54) > (1 row) Please provide us an EXPLAIN ANALYSE. But without a WHERE-condition a seq-scan are logical, and PG has to rewrite the whole table and the transaction-log. Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
On Tue, 2008-09-30 at 13:00 -0600, Bill Thoen wrote: > Working with PG 8.1 I'm trying to update a char(4) column, and it's > taking a very long time; 15 minutes so far and no end in sight. From the > explain, it doesn't seem like it should take that long, and this column > is not indexed. Sure, there's 2.7 million records but it only takes a > few minutes to scan the whole file. Is there some special overhead I > should be aware of with an UPDATE? I VACUUMed and ANALYZEd first, too. > In PostgreSQL, think of an UPDATE as an INSERT+DELETE. It's got to write the new versions of the tuples, and it has to keep the old versions until there are no more transactions that might reference those old versions. Imagine if you canceled the query halfway through, for example. Also, it has to create new index entries for the same reason, which is expensive. There are some optimizations in 8.3 for when the same tuple gets updated many times, but that won't help you in this case. Regards, Jeff Davis
On Tue, Sep 30, 2008 at 1:00 PM, Bill Thoen <bthoen@gisnet.com> wrote: > Working with PG 8.1 I'm trying to update a char(4) column, and it's taking a > very long time; 15 minutes so far and no end in sight. From the explain, it > doesn't seem like it should take that long, and this column is not indexed. > Sure, there's 2.7 million records but it only takes a few minutes to scan > the whole file. Is there some special overhead I should be aware of with an > UPDATE? I VACUUMed and ANALYZEd first, too. > > Or am I just expecting too much? The problem is that on older versions of pgsql, the db had to update each index for each row updated as well as the rows. The latest version, with a low enough fill factor, can update non-indedexed fields by using the free space in each page and not have to hit the indexes. But on 8.1 you don't get that optimization.
Doesn't look like that's the problem. I moved my table over to another Linux box running PG 8.3 and update performance was pretty bad there as well. In the time that PG 8.3 was struggling with update there I created a copy of my table on my PG 8.1 machine and inserted all columns with one containing the altered values I wanted and that took less than two minutes. Meanwhile, a half-hour later, my PG 8.3 machine was still thrashing away trying to update that one column that's not even part of any index.. Something is really wrong with UPDATE in PostgreSQL I think. Scott Marlowe wrote: > On Tue, Sep 30, 2008 at 1:00 PM, Bill Thoen <bthoen@gisnet.com> wrote: > >> Working with PG 8.1 I'm trying to update a char(4) column, and it's taking a >> very long time; 15 minutes so far and no end in sight. From the explain, it >> doesn't seem like it should take that long, and this column is not indexed. >> Sure, there's 2.7 million records but it only takes a few minutes to scan >> the whole file. Is there some special overhead I should be aware of with an >> UPDATE? I VACUUMed and ANALYZEd first, too. >> >> Or am I just expecting too much? >> > > The problem is that on older versions of pgsql, the db had to update > each index for each row updated as well as the rows. The latest > version, with a low enough fill factor, can update non-indedexed > fields by using the free space in each page and not have to hit the > indexes. But on 8.1 you don't get that optimization. > >
In response to Bill Thoen <bthoen@gisnet.com>: > Doesn't look like that's the problem. I moved my table over to another > Linux box running PG 8.3 and update performance was pretty bad there as > well. In the time that PG 8.3 was struggling with update there I created > a copy of my table on my PG 8.1 machine and inserted all columns with > one containing the altered values I wanted and that took less than two > minutes. Meanwhile, a half-hour later, my PG 8.3 machine was still > thrashing away trying to update that one column that's not even part of > any index.. > > Something is really wrong with UPDATE in PostgreSQL I think. That's an interesting theory, although it's completely wrong and founded in ridiculosity. If something were "really wrong with UPDATE" in every version of PostgreSQL, you'd be reading about it on the mailing lists, and you won't. What I suspect is that the typical tuning advice applies here. I don't see any information about your configuration or your hardware setup. * What are shared_buffers set at? * What do the checkpoint configs look like? * In general, what does your postgresql.conf look like, how much tuning have you done? * What is your hardware setup? You're not running RAID 5 are you? -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023
Bill Moran wrote: > What I suspect is that the typical tuning advice applies here. I don't > see any information about your configuration or your hardware setup. > * What are shared_buffers set at? > * What do the checkpoint configs look like? > * In general, what does your postgresql.conf look like, how much tuning > have you done? > * What is your hardware setup? You're not running RAID 5 are you? Also, how many indexes does this table have? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Sorry for the hyperbole; I should have qualified that ridiculous statement with "...on my machines." No doubt the problem has something to do with configuration, because I don't know much about that. One of my machines is running PG 8.1 on Linux Fedora Core 5. It's got an AMD 64bit CPU with a GB RAM and plenty of normal disk space (not running RAID 5). The other machine is running Linux FC9 and PG 8.3. It's got a i686 cpu with a GB RAM and also not using RAID. Since I don't understand much about configuring PostgreSQL, both of these machines use the default PostgreSQL configuration. I figured that it was optimized for general use but maybe since my files are large-ish (in the low multi-million record ranges) mayb ethta doesn't qualify as general use. Anyway, here's the configuration settings you mentioned. Shared_buffers are = 1000 #checkpoint_segments = 3 #checkpoint_timeout = 300 #checkpoint_warning = 30 What should I be looking for in the configuration to improve UPDATE performance? Thanks, - Bill Thoen Bill Moran wrote: > In response to Bill Thoen <bthoen@gisnet.com>: > > >> Doesn't look like that's the problem. I moved my table over to another >> Linux box running PG 8.3 and update performance was pretty bad there as >> well. In the time that PG 8.3 was struggling with update there I created >> a copy of my table on my PG 8.1 machine and inserted all columns with >> one containing the altered values I wanted and that took less than two >> minutes. Meanwhile, a half-hour later, my PG 8.3 machine was still >> thrashing away trying to update that one column that's not even part of >> any index.. >> >> Something is really wrong with UPDATE in PostgreSQL I think. >> > > That's an interesting theory, although it's completely wrong and founded > in ridiculosity. If something were "really wrong with UPDATE" in every > version of PostgreSQL, you'd be reading about it on the mailing lists, > and you won't. > > What I suspect is that the typical tuning advice applies here. I don't > see any information about your configuration or your hardware setup. > * What are shared_buffers set at? > * What do the checkpoint configs look like? > * In general, what does your postgresql.conf look like, how much tuning > have you done? > * What is your hardware setup? You're not running RAID 5 are you? > >
On Tue, Sep 30, 2008 at 2:51 PM, Bill Thoen <bthoen@gisnet.com> wrote: > Doesn't look like that's the problem. I moved my table over to another Linux > box running PG 8.3 and update performance was pretty bad there as well. In > the time that PG 8.3 was struggling with update there I created a copy of my > table on my PG 8.1 machine and inserted all columns with one containing the > altered values I wanted and that took less than two minutes. Meanwhile, a > half-hour later, my PG 8.3 machine was still thrashing away trying to update > that one column that's not even part of any index.. > > Something is really wrong with UPDATE in PostgreSQL I think. You'll remember I mentioned a low fill factor. With a 100% fillfactor you'll get no advantage from 8.3 The default tuning in postgresql allows it to run reasonably well on things like laptops and desktops. It's impossible to deliver it ready for a 32 CPU 200 drive megaserver with the same configuration file you'd use for a laptop. Do a quick google search on postgresql performance tuning and you'll turn up quite a few sites and wikis on it. The 5 minute version: set shared_buffers = 1/4 memory. set work_mem to something like 8 megs. Turn on the autovacuum daemon
On Tuesday 30 September 2008, Bill Thoen <bthoen@gisnet.com> wrote: > Sorry for the hyperbole; I should have qualified that ridiculous > statement with "...on my machines." No doubt the problem has something > to do with configuration, because I don't know much about that. One of > my machines is running PG 8.1 on Linux Fedora Core 5. It's got an AMD > 64bit CPU with a GB RAM and plenty of normal disk space You'll have to expand on the disk space thing ... the problem with updates is all the random I/O when adding tuples to all the indexes. A good RAID controller with write-back cache makes updates a lot less painful. -- Alan
Alvaro Herrera wrote: > Bill Moran wrote: > > >> What I suspect is that the typical tuning advice applies here. I don't >> see any information about your configuration or your hardware setup. >> * What are shared_buffers set at? >> * What do the checkpoint configs look like? >> * In general, what does your postgresql.conf look like, how much tuning >> have you done? >> * What is your hardware setup? You're not running RAID 5 are you? >> > > Also, how many indexes does this table have? > > Two, but the column I'm updating isn't included in either one of them.
On Tue, 2008-09-30 at 16:34 -0600, Bill Thoen wrote: > > Also, how many indexes does this table have? > > > > > Two, but the column I'm updating isn't included in either one of them. > Even if the column is not indexed, when a new row is created (which is the case with UPDATE) a new index entry must be made in each index to point to the new row. Regards, Jeff Davis
On Tue, Sep 30, 2008 at 4:37 PM, Jeff Davis <pgsql@j-davis.com> wrote: > On Tue, 2008-09-30 at 16:34 -0600, Bill Thoen wrote: >> > Also, how many indexes does this table have? >> > >> > >> Two, but the column I'm updating isn't included in either one of them. >> > > Even if the column is not indexed, when a new row is created (which is > the case with UPDATE) a new index entry must be made in each index to > point to the new row. Unless you're: running 8.3 or later AND have enough free space for the new tuple to go in the same page. for instance here's a sample from my db at work: select n_tup_upd, n_tup_hot_upd from pg_stat_user_tables where schemaname='public' order by n_tup_hot_upd desc limit 20; n_tup_upd | n_tup_hot_upd -----------+--------------- 52872193 | 5665884 4635216 | 3876594 264194 | 261693 159171 | 153360 242383 | 75591 97962 | 72665 86800 | 66914 57300 | 56013 284929 | 50079 43411 | 37527 43283 | 33285 30657 | 28132 31705 | 22572 26358 | 18495 19296 | 18411 22299 | 17065 16343 | 15981 23311 | 15748 13575 | 13330 12808 | 12536 If you notice some of those tables have well over 75% of the updates are HOT. Our load dropped from 15 or 20 to 1 or 2 going to 8.3.
Many thanks to everyone who helped me with this. It'll be a while before I understand enough to be able to do a good job of tuning my system's configuration, but there seem to be a few basics I can apply right away. Also pointing out how UPDATE actually works was very helpful. Since I'm at the data building stage, most of my updates will apply to an entire column and in cases like that it's much more efficient to simply use joins into a new table and delete the old. In this case: CREATE TABLE farm2 (LIKE farms); INSERT INTO farm2 (farm_id, fips_cd, farm_nbr, prog_year) SELECT farm_id, fips_cd, farm_nbr, '2007' FROM farms; DROP TABLE farms; ALTER TABLE farm2 RENAME TO farms; CREATE UNIQUE INDEX farms_id_key ON farms(farm_id); CREATE UNIQUE INDEX farms_fips_nbr_key ON farms(fips_cd,farm_nbr); takes only a few minutes for this 2.77 million record table. The alternative UPDATE farms SET prog_year='2007'; takes hours! I don't know how many because I gave up after waiting for 1.5 hrs. Thanks all, - Bill Thoen
"object references" and renaming was: Why Does UPDATE Take So Long?
From
Ivan Sergio Borgonovo
Date:
On Wed, 01 Oct 2008 08:32:16 -0600 Bill Thoen <bthoen@gisnet.com> wrote: > CREATE TABLE farm2 (LIKE farms); > INSERT INTO farm2 (farm_id, fips_cd, farm_nbr, prog_year) SELECT > farm_id, fips_cd, farm_nbr, '2007' FROM farms; > DROP TABLE farms; > ALTER TABLE farm2 RENAME TO farms; > CREATE UNIQUE INDEX farms_id_key ON farms(farm_id); > CREATE UNIQUE INDEX farms_fips_nbr_key ON farms(fips_cd,farm_nbr); Is this kind of stuff going to affect any reference to the farm table? eg. inside functions, triggers etc? what if: create table farm_rel ( farm_id [sometype] references farm(farm_id) on delete cascade, ... ); and I alter table farm rename to farm_t; alter table farm2 rename to farm; drop table farm_t; or similar situations... where could I incur in troubles using RENAME (for tables, columns etc...)? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: "object references" and renaming was: Why Does UPDATE Take So Long?
From
"Filip Rembiałkowski"
Date:
2008/10/1 Ivan Sergio Borgonovo <mail@webthatworks.it>: > On Wed, 01 Oct 2008 08:32:16 -0600 > Bill Thoen <bthoen@gisnet.com> wrote: > >> CREATE TABLE farm2 (LIKE farms); >> INSERT INTO farm2 (farm_id, fips_cd, farm_nbr, prog_year) SELECT >> farm_id, fips_cd, farm_nbr, '2007' FROM farms; >> DROP TABLE farms; this will fail if there are FK references to farms table. >> ALTER TABLE farm2 RENAME TO farms; >> CREATE UNIQUE INDEX farms_id_key ON farms(farm_id); >> CREATE UNIQUE INDEX farms_fips_nbr_key ON farms(fips_cd,farm_nbr); > > Is this kind of stuff going to affect any reference to the farm > table? eg. inside functions, triggers etc? no, not in functions source. only FK references will be affected. FK triggers are handled internally. I don't know if any other kind of object references are handled this way. > what if: > create table farm_rel ( > farm_id [sometype] references farm(farm_id) on delete cascade, > ... > ); > > and I > > alter table farm rename to farm_t; > alter table farm2 rename to farm; > drop table farm_t; well, check it :) I did: filip@filip=# \d farm_rel Table "public.farm_rel" Column | Type | Modifiers ---------+---------+----------- farm_id | integer | Foreign-key constraints: "farm_rel_farm_id_fkey" FOREIGN KEY (farm_id) REFERENCES farm_t(farm_id) ON DELETE CASCADE filip@filip=# drop table farm_t; NOTICE: constraint farm_rel_farm_id_fkey on table farm_rel depends on table farm_t ERROR: cannot drop table farm_t because other objects depend on it HINT: Use DROP ... CASCADE to drop the dependent objects too. > > or similar situations... > > where could I incur in troubles using RENAME (for tables, columns > etc...)? if you reference renamed objects from, say, pl/pgsql function source, it can effect in broken code. filip@filip=# create function get_farm_id() returns int as $$SELECT farm_id from farm limit 1$$ language sql; CREATE FUNCTION filip@filip=# alter table farm rename to farm_t; ALTER TABLE filip@filip=# select get_farm_id(); ERROR: relation "farm" does not exist CONTEXT: SQL function "get_farm_id" during startup OTOH, your rename trick will work for such functions :) -- Filip Rembiałkowski
Re: "object references" and renaming was: Why Does UPDATE Take So Long?
From
Ivan Sergio Borgonovo
Date:
On Wed, 1 Oct 2008 22:19:29 +0200 "Filip Rembiałkowski" <plk.zuber@gmail.com> wrote: > > Is this kind of stuff going to affect any reference to the farm > > table? eg. inside functions, triggers etc? > no, not in functions source. I've read somewhere that create *or replace* should be used exactly because internally postgresql reference functions by id. dropping and recreating a function will change the id. Or was I daydreaming and I have memories of another DB? I think that postgresql store plpgsql functions as "text". I don't know if it caches plans, "compile" the function somehow etc... So I was wondering if renaming stuff referenced in a function may have some unexpected effect. > only FK references will be affected. FK triggers are handled > internally. I don't know if any other kind of object references handled internally? > are handled this way. > > what if: > > create table farm_rel ( > > farm_id [sometype] references farm(farm_id) on delete cascade, > > ... > > ); > > > > and I > > > > alter table farm rename to farm_t; > > alter table farm2 rename to farm; > > drop table farm_t; > well, check it :) I did: Well I wrote the example not to let someone check for me... ;) just to give an example of a "renaming" situation where I should be cautious... What else could be renamed other than tables and columns? triggers...? I'd expect renaming columns follow the same pattern of renaming tables. > > or similar situations... > > > > where could I incur in troubles using RENAME (for tables, columns > > etc...)? > OTOH, your rename trick will work for such functions :) I think the problem arise from dependencies following the name and dependencies following the object (table). It seems that in these cases what is chosen is due to the difficulty to do otherwise. -- Ivan Sergio Borgonovo http://www.webthatworks.it