Slow update of indexed column with many nulls - Mailing list pgsql-performance

From bsamwel@xs4all.nl
Subject Slow update of indexed column with many nulls
Date
Msg-id 21139.194.109.187.67.1048441138.squirrel@webmail.xs4all.nl
Whole thread Raw
Responses Re: Slow update of indexed column with many nulls  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
Hi everybody,

I'm having a performance problem, PostgreSQL (7.3.2) is skipping some
optimisation options that it shouldn't IMO. It can be fully reproduced as
follows:

create table foo(
bar char(100),
baz integer
);

Now create a file with 1.2 million empty lines and do a \copy foo (bar)
from 'thatfile'. This should fill the table with 1.2 million rows. Now do:

insert into foo (baz) values (28);
create index foo_idx on foo(baz);
vacuum full analyze foo;

Now, we would expect that PostgreSQL is fully aware that there are not
many rows in foo that have "baz is not null". However:

bsamwel=> explain update foo set baz=null where baz is not null;
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..34470.09 rows=1286146 width=110)
   Filter: (baz IS NOT NULL)
(2 rows)


So, it thinks it must do a sequential scan on foo, even though it should
know by now that foo.baz is really mostly null. Even if I disable
sequential scan it still chooses this option! Why doesn't it use the
index? It doesn't use the index either when I try to select all rows that
are not null.

Just for completeness' sake I'll give you the explain analyze:

bsamwel=> explain analyze update foo set baz=null where baz is not null;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..34470.09 rows=1286146 width=110) (actual
time=19678.82..19678.84 rows=1 loops=1)
   Filter: (baz IS NOT NULL)
 Total runtime: 19750.21 msec
(3 rows)

Do you guys have any idea?

Regards,
Bart


pgsql-performance by date:

Previous
From: Mario Weilguni
Date:
Subject: Re: Page Size in Future Releases
Next
From: bsamwel@xs4all.nl
Date:
Subject: Adding a foreign key constraint is extremely slow