performance problems on updates on large tables with indexes - Mailing list pgsql-bugs

From Reinhard Max
Subject performance problems on updates on large tables with indexes
Date
Msg-id Pine.LNX.4.44.0202271651400.12673-100000@Wotan.suse.de
Whole thread Raw
Responses Re: performance problems on updates on large tables with indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Hi,

I am not sure whether the behaviour described below is to be expected
or to be considered as a bug, but I bet you can tell me ;)

I have a table defined as follows:

CREATE TABLE "foo" (
        "id" int NOT NULL,
        "name" character varying(256) NOT NULL,
        "street" character varying(256) NOT NULL,
        "zip" character varying(128) NOT NULL,
        "city" character varying(256) NOT NULL,
        "tel" character varying(256),
        "fax" character varying(30)
);

It has a unique index on id, non-unique indexes on all othe columns,
and contains approx. 350000 rows. The following statement touches all
rows except a few hundred and takes several minutes to complete unless
all the indexes have been dropped before:

    UPDATE foo SET fax=' ' WHERE fax=' ';

When I try to run the same statement for a second time, it takes even
longer (cancelled it after about half an hour).

While the update runs, the backend process is mostly I/O bound which
becomes even worse when a checkpoint process is being started.
I've been able to improve this a bit by setting

    shared_buffers = 16000
    wal_buffers = 16
    wal_files = 32
    checkpoint_segments = 16

in postgresql.conf .

The results of a "VACCUM ANALYZE" (which takes also unusually long)
done after the update looks like every single row has been rewritten
instead of just updated and all the indexes have been changed, too.

DEBUG:  --Relation foo--
DEBUG:  Index idx_foo_street: Pages 3244; Tuples 352072: Deleted 351742.
        CPU 0.48s/3.55u sec elapsed 25.31 sec.
DEBUG:  Index idx_foo_zip: Pages 2664; Tuples 352072: Deleted 351742.
        CPU 0.57s/3.89u sec elapsed 29.15 sec.
DEBUG:  Index idx_foo_tel: Pages 2645; Tuples 352072: Deleted 351742.
        CPU 0.59s/3.66u sec elapsed 17.31 sec.
DEBUG:  Index idx_foo_fax: Pages 2183; Tuples 352072: Deleted 351742.
        CPU 0.43s/3.08u sec elapsed 69.76 sec.
DEBUG:  Index idx_foo_name: Pages 3386; Tuples 352072: Deleted 351742.
        CPU 0.77s/4.15u sec elapsed 16.12 sec.
DEBUG:  Index idx_foo_city: Pages 3093; Tuples 352072: Deleted 351742.
        CPU 0.67s/3.66u sec elapsed 20.39 sec.
DEBUG:  Index foo_pkey: Pages 1815; Tuples 352072: Deleted 351742.
        CPU 0.43s/3.95u sec elapsed 8.03 sec.
DEBUG:  Removed 351742 tuples in 5740 pages.
        CPU 0.85s/0.85u sec elapsed 5.82 sec.
DEBUG:  Pages 11462: Changed 11462, Empty 0; Tup 352072: Vac 351742,
Keep 0, UnUsed 0.
        Total CPU 5.63s/27.13u sec elapsed 217.72 sec.
DEBUG:  Analyzing foo


Is that intentional? And if so is there any chance to avoid the
massive performance hit other than dropping the indexes before the
update and re-creating them afterwards?

The machine is a PIII/1GHz with 256MB of RAM and a UDMA100 disk
running PostgreSQL 7.2 on Linux 2.4.17.


Thanks in advance and greetings from Nuremberg,

    Reinhard

pgsql-bugs by date:

Previous
From: Jean-Paul ARGUDO
Date:
Subject: Re: function tree_level(varchar) (from OpenACS) no longer work under 7.2
Next
From: Tom Lane
Date:
Subject: Re: function tree_level(varchar) (from OpenACS) no longer work under 7.2