Thread: Why Does UPDATE Take So Long?

Why Does UPDATE Take So Long?

From
Bill Thoen
Date:
Working with PG 8.1 I'm trying to update a char(4) column, and it's
taking a very long time; 15 minutes so far and no end in sight. From the
explain, it doesn't seem like it should take that long, and this column
is not indexed. Sure, there's 2.7 million records but it only takes a
few minutes to scan the whole file. Is there some special overhead I
should be aware of with an UPDATE? I VACUUMed and ANALYZEd first, too.

Or am I just expecting too much?

Here's the explain:
explain UPDATE farms SET prog_year='2007';
                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on farms  (cost=0.00..59144.07 rows=2772207 width=54)
(1 row)


TIA,
- Bill Thoen


Re: Why Does UPDATE Take So Long?

From
Alan Hodgson
Date:
On Tuesday 30 September 2008, Bill Thoen <bthoen@gisnet.com> wrote:
> Working with PG 8.1 I'm trying to update a char(4) column, and it's
> taking a very long time; 15 minutes so far and no end in sight. From the
> explain, it doesn't seem like it should take that long, and this column
> is not indexed. Sure, there's 2.7 million records but it only takes a
> few minutes to scan the whole file. Is there some special overhead I
> should be aware of with an UPDATE? I VACUUMed and ANALYZEd first, too.

update creates new rows for all affected rows. If the table is indexed, it
creates new index rows for all affected rows in every index. Slow updates
is a common PostgreSQL complaint.

--
Alan

Re: Why Does UPDATE Take So Long?

From
Andreas Kretschmer
Date:
Bill Thoen <bthoen@gisnet.com> schrieb:

> Working with PG 8.1 I'm trying to update a char(4) column, and it's
> taking a very long time; 15 minutes so far and no end in sight. From the
> explain, it doesn't seem like it should take that long, and this column
> is not indexed. Sure, there's 2.7 million records but it only takes a
> few minutes to scan the whole file. Is there some special overhead I
> should be aware of with an UPDATE? I VACUUMed and ANALYZEd first, too.
>
> Or am I just expecting too much?
>
> Here's the explain:
> explain UPDATE farms SET prog_year='2007';
>                           QUERY PLAN
> ----------------------------------------------------------------
> Seq Scan on farms  (cost=0.00..59144.07 rows=2772207 width=54)
> (1 row)

Please provide us an EXPLAIN ANALYSE. But without a WHERE-condition a
seq-scan are logical, and PG has to rewrite the whole table and the
transaction-log.


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: Why Does UPDATE Take So Long?

From
Jeff Davis
Date:
On Tue, 2008-09-30 at 13:00 -0600, Bill Thoen wrote:
> Working with PG 8.1 I'm trying to update a char(4) column, and it's
> taking a very long time; 15 minutes so far and no end in sight. From the
> explain, it doesn't seem like it should take that long, and this column
> is not indexed. Sure, there's 2.7 million records but it only takes a
> few minutes to scan the whole file. Is there some special overhead I
> should be aware of with an UPDATE? I VACUUMed and ANALYZEd first, too.
>

In PostgreSQL, think of an UPDATE as an INSERT+DELETE. It's got to write
the new versions of the tuples, and it has to keep the old versions
until there are no more transactions that might reference those old
versions. Imagine if you canceled the query halfway through, for
example. Also, it has to create new index entries for the same reason,
which is expensive.

There are some optimizations in 8.3 for when the same tuple gets updated
many times, but that won't help you in this case.

Regards,
    Jeff Davis




Re: Why Does UPDATE Take So Long?

From
"Scott Marlowe"
Date:
On Tue, Sep 30, 2008 at 1:00 PM, Bill Thoen <bthoen@gisnet.com> wrote:
> Working with PG 8.1 I'm trying to update a char(4) column, and it's taking a
> very long time; 15 minutes so far and no end in sight. From the explain, it
> doesn't seem like it should take that long, and this column is not indexed.
> Sure, there's 2.7 million records but it only takes a few minutes to scan
> the whole file. Is there some special overhead I should be aware of with an
> UPDATE? I VACUUMed and ANALYZEd first, too.
>
> Or am I just expecting too much?

The problem is that on older versions of pgsql, the db had to update
each index for each row updated as well as the rows.  The latest
version, with a low enough fill factor, can update non-indedexed
fields by using the free space in each page and not have to hit the
indexes.  But on 8.1 you don't get that optimization.

Re: Why Does UPDATE Take So Long?

From
Bill Thoen
Date:
Doesn't look like that's the problem. I moved my table over to another
Linux box running PG 8.3 and update performance was pretty bad there as
well. In the time that PG 8.3 was struggling with update there I created
a copy of my table on my PG 8.1 machine and inserted all columns with
one containing the altered values I wanted and that took less than two
minutes. Meanwhile, a half-hour later, my PG 8.3 machine was still
thrashing away trying to update that one column that's not even part of
any index..

Something is really wrong with UPDATE in PostgreSQL I think.


Scott Marlowe wrote:
> On Tue, Sep 30, 2008 at 1:00 PM, Bill Thoen <bthoen@gisnet.com> wrote:
>
>> Working with PG 8.1 I'm trying to update a char(4) column, and it's taking a
>> very long time; 15 minutes so far and no end in sight. From the explain, it
>> doesn't seem like it should take that long, and this column is not indexed.
>> Sure, there's 2.7 million records but it only takes a few minutes to scan
>> the whole file. Is there some special overhead I should be aware of with an
>> UPDATE? I VACUUMed and ANALYZEd first, too.
>>
>> Or am I just expecting too much?
>>
>
> The problem is that on older versions of pgsql, the db had to update
> each index for each row updated as well as the rows.  The latest
> version, with a low enough fill factor, can update non-indedexed
> fields by using the free space in each page and not have to hit the
> indexes.  But on 8.1 you don't get that optimization.
>
>


Re: Why Does UPDATE Take So Long?

From
Bill Moran
Date:
In response to Bill Thoen <bthoen@gisnet.com>:

> Doesn't look like that's the problem. I moved my table over to another
> Linux box running PG 8.3 and update performance was pretty bad there as
> well. In the time that PG 8.3 was struggling with update there I created
> a copy of my table on my PG 8.1 machine and inserted all columns with
> one containing the altered values I wanted and that took less than two
> minutes. Meanwhile, a half-hour later, my PG 8.3 machine was still
> thrashing away trying to update that one column that's not even part of
> any index..
>
> Something is really wrong with UPDATE in PostgreSQL I think.

That's an interesting theory, although it's completely wrong and founded
in ridiculosity.  If something were "really wrong with UPDATE" in every
version of PostgreSQL, you'd be reading about it on the mailing lists,
and you won't.

What I suspect is that the typical tuning advice applies here.  I don't
see any information about your configuration or your hardware setup.
* What are shared_buffers set at?
* What do the checkpoint configs look like?
* In general, what does your postgresql.conf look like, how much tuning
  have you done?
* What is your hardware setup?  You're not running RAID 5 are you?

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: Why Does UPDATE Take So Long?

From
Alvaro Herrera
Date:
Bill Moran wrote:

> What I suspect is that the typical tuning advice applies here.  I don't
> see any information about your configuration or your hardware setup.
> * What are shared_buffers set at?
> * What do the checkpoint configs look like?
> * In general, what does your postgresql.conf look like, how much tuning
>   have you done?
> * What is your hardware setup?  You're not running RAID 5 are you?

Also, how many indexes does this table have?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Why Does UPDATE Take So Long?

From
Bill Thoen
Date:
Sorry for the hyperbole; I should have qualified that ridiculous
statement with "...on my machines." No doubt the problem has something
to do with configuration, because I don't know much about that. One of
my machines is running PG 8.1 on Linux Fedora Core 5. It's got an AMD
64bit CPU with a GB RAM and plenty of normal disk space (not running
RAID 5). The other machine is running Linux FC9 and PG 8.3. It's got a
i686 cpu with a GB RAM and also not using RAID.

Since I don't understand much about configuring PostgreSQL, both of
these machines use the default PostgreSQL configuration. I figured that
it was optimized for general use but maybe since my files are large-ish
(in the low multi-million record ranges) mayb ethta doesn't qualify as
general use. Anyway, here's the configuration settings you mentioned.
Shared_buffers are = 1000
#checkpoint_segments = 3
#checkpoint_timeout = 300
#checkpoint_warning = 30

What should I be looking for in the configuration to improve UPDATE
performance?

Thanks,
- Bill Thoen

Bill Moran wrote:
> In response to Bill Thoen <bthoen@gisnet.com>:
>
>
>> Doesn't look like that's the problem. I moved my table over to another
>> Linux box running PG 8.3 and update performance was pretty bad there as
>> well. In the time that PG 8.3 was struggling with update there I created
>> a copy of my table on my PG 8.1 machine and inserted all columns with
>> one containing the altered values I wanted and that took less than two
>> minutes. Meanwhile, a half-hour later, my PG 8.3 machine was still
>> thrashing away trying to update that one column that's not even part of
>> any index..
>>
>> Something is really wrong with UPDATE in PostgreSQL I think.
>>
>
> That's an interesting theory, although it's completely wrong and founded
> in ridiculosity.  If something were "really wrong with UPDATE" in every
> version of PostgreSQL, you'd be reading about it on the mailing lists,
> and you won't.
>
> What I suspect is that the typical tuning advice applies here.  I don't
> see any information about your configuration or your hardware setup.
> * What are shared_buffers set at?
> * What do the checkpoint configs look like?
> * In general, what does your postgresql.conf look like, how much tuning
>   have you done?
> * What is your hardware setup?  You're not running RAID 5 are you?
>
>


Re: Why Does UPDATE Take So Long?

From
"Scott Marlowe"
Date:
On Tue, Sep 30, 2008 at 2:51 PM, Bill Thoen <bthoen@gisnet.com> wrote:
> Doesn't look like that's the problem. I moved my table over to another Linux
> box running PG 8.3 and update performance was pretty bad there as well. In
> the time that PG 8.3 was struggling with update there I created a copy of my
> table on my PG 8.1 machine and inserted all columns with one containing the
> altered values I wanted and that took less than two minutes. Meanwhile, a
> half-hour later, my PG 8.3 machine was still thrashing away trying to update
> that one column that's not even part of any index..
>
> Something is really wrong with UPDATE in PostgreSQL I think.

You'll remember I mentioned a low fill factor.  With a 100% fillfactor
you'll get no advantage from 8.3

The default tuning in postgresql allows it to run reasonably well on
things like laptops and desktops.  It's impossible to deliver it ready
for a 32 CPU 200 drive megaserver with the same configuration file
you'd use for a laptop.

Do a quick google search on postgresql performance tuning and you'll
turn up quite a few sites and wikis on it.  The 5 minute version:

set shared_buffers = 1/4 memory.
set work_mem to something like 8 megs.
Turn on the autovacuum daemon

Re: Why Does UPDATE Take So Long?

From
Alan Hodgson
Date:
On Tuesday 30 September 2008, Bill Thoen <bthoen@gisnet.com> wrote:
> Sorry for the hyperbole; I should have qualified that ridiculous
> statement with "...on my machines." No doubt the problem has something
> to do with configuration, because I don't know much about that. One of
> my machines is running PG 8.1 on Linux Fedora Core 5. It's got an AMD
> 64bit CPU with a GB RAM and plenty of normal disk space

You'll have to expand on the disk space thing ... the problem with updates
is all the random I/O when adding tuples to all the indexes. A good RAID
controller with write-back cache makes updates a lot less painful.


--
Alan

Re: Why Does UPDATE Take So Long?

From
Bill Thoen
Date:
Alvaro Herrera wrote:
> Bill Moran wrote:
>
>
>> What I suspect is that the typical tuning advice applies here.  I don't
>> see any information about your configuration or your hardware setup.
>> * What are shared_buffers set at?
>> * What do the checkpoint configs look like?
>> * In general, what does your postgresql.conf look like, how much tuning
>>   have you done?
>> * What is your hardware setup?  You're not running RAID 5 are you?
>>
>
> Also, how many indexes does this table have?
>
>
Two, but the column I'm updating isn't included in either one of them.


Re: Why Does UPDATE Take So Long?

From
Jeff Davis
Date:
On Tue, 2008-09-30 at 16:34 -0600, Bill Thoen wrote:
> > Also, how many indexes does this table have?
> >
> >
> Two, but the column I'm updating isn't included in either one of them.
>

Even if the column is not indexed, when a new row is created (which is
the case with UPDATE) a new index entry must be made in each index to
point to the new row.

    Regards,
        Jeff Davis


Re: Why Does UPDATE Take So Long?

From
"Scott Marlowe"
Date:
On Tue, Sep 30, 2008 at 4:37 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Tue, 2008-09-30 at 16:34 -0600, Bill Thoen wrote:
>> > Also, how many indexes does this table have?
>> >
>> >
>> Two, but the column I'm updating isn't included in either one of them.
>>
>
> Even if the column is not indexed, when a new row is created (which is
> the case with UPDATE) a new index entry must be made in each index to
> point to the new row.

Unless you're:

running 8.3 or later AND
have enough free space for the new tuple to go in the same page.

for instance here's a sample from my db at work:

select  n_tup_upd, n_tup_hot_upd  from pg_stat_user_tables where
schemaname='public' order by n_tup_hot_upd desc limit 20; n_tup_upd |
n_tup_hot_upd
-----------+---------------
  52872193 |       5665884
   4635216 |       3876594
    264194 |        261693
    159171 |        153360
    242383 |         75591
     97962 |         72665
     86800 |         66914
     57300 |         56013
    284929 |         50079
     43411 |         37527
     43283 |         33285
     30657 |         28132
     31705 |         22572
     26358 |         18495
     19296 |         18411
     22299 |         17065
     16343 |         15981
     23311 |         15748
     13575 |         13330
     12808 |         12536

If you notice some of those tables have well over 75% of the updates
are HOT.    Our load dropped from 15 or 20 to 1 or 2 going to 8.3.

Re: Why Does UPDATE Take So Long?

From
Bill Thoen
Date:
Many thanks to everyone who helped me with this. It'll be a while before
I understand enough to be able to do a good job of tuning my system's
configuration, but there seem to be a few basics I can apply right away.
Also pointing out how UPDATE actually works was very helpful. Since I'm
at the data building stage, most of my updates  will apply to an entire
column and in cases like that it's much more efficient to simply use
joins into a new table and delete the old. In this case:

CREATE TABLE farm2 (LIKE farms);
INSERT INTO farm2 (farm_id, fips_cd, farm_nbr, prog_year) SELECT
farm_id, fips_cd, farm_nbr, '2007' FROM farms;
DROP TABLE farms;
ALTER TABLE farm2 RENAME TO farms;
CREATE UNIQUE INDEX farms_id_key ON farms(farm_id);
CREATE UNIQUE INDEX farms_fips_nbr_key ON farms(fips_cd,farm_nbr);

takes only a few minutes for this 2.77 million record table. The alternative

UPDATE farms SET prog_year='2007';

takes hours! I don't know how many because I gave up after waiting for
1.5 hrs.

Thanks all,
- Bill Thoen


"object references" and renaming was: Why Does UPDATE Take So Long?

From
Ivan Sergio Borgonovo
Date:
On Wed, 01 Oct 2008 08:32:16 -0600
Bill Thoen <bthoen@gisnet.com> wrote:

> CREATE TABLE farm2 (LIKE farms);
> INSERT INTO farm2 (farm_id, fips_cd, farm_nbr, prog_year) SELECT
> farm_id, fips_cd, farm_nbr, '2007' FROM farms;
> DROP TABLE farms;
> ALTER TABLE farm2 RENAME TO farms;
> CREATE UNIQUE INDEX farms_id_key ON farms(farm_id);
> CREATE UNIQUE INDEX farms_fips_nbr_key ON farms(fips_cd,farm_nbr);

Is this kind of stuff going to affect any reference to the farm
table? eg. inside functions, triggers etc?
what if:
create table farm_rel (
  farm_id [sometype] references farm(farm_id) on delete cascade,
...
);

and I

alter table farm rename to farm_t;
alter table farm2 rename to farm;
drop table farm_t;

or similar situations...

where could I incur in troubles using RENAME (for tables, columns
etc...)?

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: "object references" and renaming was: Why Does UPDATE Take So Long?

From
"Filip Rembiałkowski"
Date:
2008/10/1 Ivan Sergio Borgonovo <mail@webthatworks.it>:
> On Wed, 01 Oct 2008 08:32:16 -0600
> Bill Thoen <bthoen@gisnet.com> wrote:
>
>> CREATE TABLE farm2 (LIKE farms);
>> INSERT INTO farm2 (farm_id, fips_cd, farm_nbr, prog_year) SELECT
>> farm_id, fips_cd, farm_nbr, '2007' FROM farms;
>> DROP TABLE farms;

this will fail if there are FK references to farms table.

>> ALTER TABLE farm2 RENAME TO farms;
>> CREATE UNIQUE INDEX farms_id_key ON farms(farm_id);
>> CREATE UNIQUE INDEX farms_fips_nbr_key ON farms(fips_cd,farm_nbr);
>
> Is this kind of stuff going to affect any reference to the farm
> table? eg. inside functions, triggers etc?

no, not in functions source.
only FK references will be affected. FK triggers are handled internally.
I don't know if any other kind of object references are handled this way.

> what if:
> create table farm_rel (
>  farm_id [sometype] references farm(farm_id) on delete cascade,
> ...
> );
>
> and I
>
> alter table farm rename to farm_t;
> alter table farm2 rename to farm;
> drop table farm_t;

well, check it :) I did:

filip@filip=# \d farm_rel
    Table "public.farm_rel"
 Column  |  Type   | Modifiers
---------+---------+-----------
 farm_id | integer |
Foreign-key constraints:
    "farm_rel_farm_id_fkey" FOREIGN KEY (farm_id) REFERENCES
farm_t(farm_id) ON DELETE CASCADE

filip@filip=# drop table farm_t;
NOTICE:  constraint farm_rel_farm_id_fkey on table farm_rel depends on
table farm_t
ERROR:  cannot drop table farm_t because other objects depend on it
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

>
> or similar situations...
>
> where could I incur in troubles using RENAME (for tables, columns
> etc...)?


if you reference renamed objects from, say, pl/pgsql function source,
it can effect in broken code.

filip@filip=# create function get_farm_id() returns int as $$SELECT
farm_id from farm limit 1$$ language sql;
CREATE FUNCTION
filip@filip=# alter table farm rename to farm_t;
ALTER TABLE
filip@filip=# select get_farm_id();
ERROR:  relation "farm" does not exist
CONTEXT:  SQL function "get_farm_id" during startup


OTOH, your rename trick will work for such functions :)



--
Filip Rembiałkowski

Re: "object references" and renaming was: Why Does UPDATE Take So Long?

From
Ivan Sergio Borgonovo
Date:
On Wed, 1 Oct 2008 22:19:29 +0200
"Filip Rembiałkowski" <plk.zuber@gmail.com> wrote:

> > Is this kind of stuff going to affect any reference to the farm
> > table? eg. inside functions, triggers etc?

> no, not in functions source.

I've read somewhere that create *or replace* should be used exactly
because internally postgresql reference functions by id.
dropping and recreating a function will change the id.
Or was I daydreaming and I have memories of another DB?

I think that postgresql store plpgsql functions as "text". I don't
know if it caches plans, "compile" the function somehow etc...
So I was wondering if renaming stuff referenced in a function may
have some unexpected effect.

> only FK references will be affected. FK triggers are handled
> internally. I don't know if any other kind of object references

handled internally?

> are handled this way.

> > what if:
> > create table farm_rel (
> >  farm_id [sometype] references farm(farm_id) on delete cascade,
> > ...
> > );
> >
> > and I
> >
> > alter table farm rename to farm_t;
> > alter table farm2 rename to farm;
> > drop table farm_t;

> well, check it :) I did:

Well I wrote the example not to let someone check for me... ;) just
to give an example of a "renaming" situation where I should be
cautious...
What else could be renamed other than tables and columns?
triggers...?
I'd expect renaming columns follow the same pattern of renaming
tables.

> > or similar situations...
> >
> > where could I incur in troubles using RENAME (for tables, columns
> > etc...)?

> OTOH, your rename trick will work for such functions :)

I think the problem arise from dependencies following the name and
dependencies following the object (table).
It seems that in these cases what is chosen is due to the difficulty
to do otherwise.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it