Thread: Slow update of indexed column with many nulls

Slow update of indexed column with many nulls

From
bsamwel@xs4all.nl
Date:
Hi everybody,

I'm having a performance problem, PostgreSQL (7.3.2) is skipping some
optimisation options that it shouldn't IMO. It can be fully reproduced as
follows:

create table foo(
bar char(100),
baz integer
);

Now create a file with 1.2 million empty lines and do a \copy foo (bar)
from 'thatfile'. This should fill the table with 1.2 million rows. Now do:

insert into foo (baz) values (28);
create index foo_idx on foo(baz);
vacuum full analyze foo;

Now, we would expect that PostgreSQL is fully aware that there are not
many rows in foo that have "baz is not null". However:

bsamwel=> explain update foo set baz=null where baz is not null;
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..34470.09 rows=1286146 width=110)
   Filter: (baz IS NOT NULL)
(2 rows)


So, it thinks it must do a sequential scan on foo, even though it should
know by now that foo.baz is really mostly null. Even if I disable
sequential scan it still chooses this option! Why doesn't it use the
index? It doesn't use the index either when I try to select all rows that
are not null.

Just for completeness' sake I'll give you the explain analyze:

bsamwel=> explain analyze update foo set baz=null where baz is not null;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..34470.09 rows=1286146 width=110) (actual
time=19678.82..19678.84 rows=1 loops=1)
   Filter: (baz IS NOT NULL)
 Total runtime: 19750.21 msec
(3 rows)

Do you guys have any idea?

Regards,
Bart


Re: Slow update of indexed column with many nulls

From
Josh Berkus
Date:
Bart,

> insert into foo (baz) values (28);
> create index foo_idx on foo(baz);
> vacuum full analyze foo;
>
> Now, we would expect that PostgreSQL is fully aware that there are not
> many rows in foo that have "baz is not null". However:

This is a known issue discussed several times on this list.   Try re-creating
your index as:

create index foo_idx on foo(baz) where foo is not null;

See the list archives for the reasons why.  This may improve in future
releases of PostgreSQL.

--
Josh Berkus
Aglio Database Solutions
San Francisco