Thread: Index tuple count != heap tuple count problem identified

Index tuple count != heap tuple count problem identified

From
Tom Lane
Date:
You'll probably recall reports of messages like this out of VACUUM:
NOTICE:  Index ind1: NUMBER OF INDEX' TUPLES (2002) IS NOT THE SAME AS HEAP' (3003).
I've figured out the cause (or at least a cause) of this condition.

Consider a table having some data and indices, eg "onek" from the
regression tests:

regression=# vacuum verbose analyze onek;
NOTICE:  --Relation onek--
NOTICE:  Pages 24: Changed 0, reaped 1, Empty 0, New 0; Tup 1000: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 32, MinLen 180,
MaxLen180; Re-using: Free/Avail. Space 5988/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.11u sec.
 
NOTICE:  Index onek_stringu1: Pages 28; Tuples 1000: Deleted 0. CPU 0.00s/0.01u sec.
NOTICE:  Index onek_hundred: Pages 12; Tuples 1000: Deleted 0. CPU 0.00s/0.01u sec.
NOTICE:  Index onek_unique2: Pages 18; Tuples 1000: Deleted 0. CPU 0.00s/0.02u sec.
NOTICE:  Index onek_unique1: Pages 17; Tuples 1000: Deleted 0. CPU 0.00s/0.01u sec.
VACUUM

In a second psql, start up a transaction and leave it open:

regression=# begin;
BEGIN
regression=# select 1;?column?
----------       1
(1 row)

regression=#

(It's necessary to actually select something so that the transaction
will get assigned an ID; "begin" alone won't do anything.)

Now return to the first psql and modify the table, doesn't matter how:

regression=# update onek set odd = odd+0;
UPDATE 1000
regression=#

At this point, onek contains 1000 committed updated tuples and 1000 dead
but not yet deleted tuples.  Moreover, because we have an open
transaction that should see those dead tuples if it looks at the table
(at least if it's in SERIALIZABLE mode), VACUUM knows it should not
delete those tuples:

regression=# vacuum verbose analyze onek;
NOTICE:  --Relation onek--
NOTICE:  Pages 47: Changed 47, reaped 0, Empty 0, New 0; Tup 2000: Vac 0, Keep/VTL 1000/0, Crash 0, UnUsed 0, MinLen
180,MaxLen 180; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.22u sec.
 
NOTICE:  Index onek_stringu1: Pages 28; Tuples 2000. CPU 0.01s/0.02u sec.
NOTICE:  Index onek_hundred: Pages 12; Tuples 2000. CPU 0.00s/0.02u sec.
NOTICE:  Index onek_unique2: Pages 18; Tuples 2000. CPU 0.00s/0.01u sec.
NOTICE:  Index onek_unique1: Pages 17; Tuples 2000. CPU 0.00s/0.01u sec.
VACUUM

But what if we create a new index while in this state?

regression=# create index toolate on onek(unique1);
CREATE

regression=# vacuum verbose analyze onek;
NOTICE:  --Relation onek--
NOTICE:  Pages 47: Changed 0, reaped 0, Empty 0, New 0; Tup 2000: Vac 0, Keep/VTL 1000/0, Crash 0, UnUsed 0, MinLen
180,MaxLen 180; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.01s/0.22u sec.
 
NOTICE:  Index toolate: Pages 5; Tuples 1000. CPU 0.00s/0.01u sec.
NOTICE:  Index toolate: NUMBER OF INDEX' TUPLES (1000) IS NOT THE SAME AS HEAP' (2000).       Recreate the index.
NOTICE:  Index onek_stringu1: Pages 28; Tuples 2000. CPU 0.00s/0.02u sec.
NOTICE:  Index onek_hundred: Pages 12; Tuples 2000. CPU 0.00s/0.02u sec.
NOTICE:  Index onek_unique2: Pages 18; Tuples 2000. CPU 0.01s/0.02u sec.
NOTICE:  Index onek_unique1: Pages 17; Tuples 2000. CPU 0.00s/0.02u sec.
VACUUM

The CREATE INDEX operation has only bothered to index the non-dead
tuples.  So, VACUUM's little sanity check fails.

I believe that this is not really a bug.  If that old transaction came
along and tried to use the index to scan for tuples, then we'd have a
problem, because it'd fail to find tuples that it should have found.
BUT: if that old transaction is serializable, it won't even believe that
the index exists, not so?  It can't see the index's entry in pg_class.
So I think CREATE INDEX's behavior is OK, and we just have an
insufficiently smart cross-check in VACUUM.

I am not sure if it is possible to make an exact cross-check at
reasonable cost.  A recently created index might contain entries for
all, none, or just some of the committed-dead tuples in its table.
Depending on how old the oldest open transaction is, VACUUM might be
able to remove some but not all of those dead tuples.  So in general I
don't see an easy way to cross-check the number of index tuples against
the number of table tuples exactly.

I am inclined to change the check to complain if there are more index
tuples than table tuples (that's surely wrong), or if there are fewer
index tuples than committed-live table tuples (ditto), but not to
complain if it's in between those limits.  Comments?
        regards, tom lane


Re: Index tuple count != heap tuple count problem identified

From
Bruce Momjian
Date:
> You'll probably recall reports of messages like this out of VACUUM:
> NOTICE:  Index ind1: NUMBER OF INDEX' TUPLES (2002) IS NOT THE SAME AS HEAP' (3003).
> I've figured out the cause (or at least a cause) of this condition.
> 
> I am inclined to change the check to complain if there are more index
> tuples than table tuples (that's surely wrong), or if there are fewer
> index tuples than committed-live table tuples (ditto), but not to
> complain if it's in between those limits.  Comments?

Sounds good to me.  I know I never considered such an interaction.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


RE: Index tuple count != heap tuple count problem identified

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On
> Behalf Of Tom Lane
> 
> You'll probably recall reports of messages like this out of VACUUM:
> NOTICE:  Index ind1: NUMBER OF INDEX' TUPLES (2002) IS NOT THE 
> SAME AS HEAP' (3003).
> I've figured out the cause (or at least a cause) of this condition.
> 
> The CREATE INDEX operation has only bothered to index the non-dead
> tuples.  So, VACUUM's little sanity check fails.
>

Is it wrong to change the implementation of CREATE INDEX ?
I have a fix.
It needs the change of duplicate check(tuplesort->btbuild) and
I've thougth that it would be better to change it after the release 
of 7.0.   

Regards.  

Hiroshi Inoue
Inoue@tpf.co.jp


Re: Index tuple count != heap tuple count problem identified]

From
Bruce Momjian
Date:
> > -----Original Message-----
> > From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On
> > Behalf Of Tom Lane
> > 
> > You'll probably recall reports of messages like this out of VACUUM:
> > NOTICE:  Index ind1: NUMBER OF INDEX' TUPLES (2002) IS NOT THE 
> > SAME AS HEAP' (3003).
> > I've figured out the cause (or at least a cause) of this condition.
> > 
> > The CREATE INDEX operation has only bothered to index the non-dead
> > tuples.  So, VACUUM's little sanity check fails.
>
> 
> Is it wrong to change the implementation of CREATE INDEX ?
> I have a fix.
> It needs the change of duplicate check(tuplesort->btbuild) and
> I've thougth that it would be better to change it after the release 
> of 7.0.   

Well, it seems we better do something about it before 7.0 is released. 
Now it seems we have to decide to change CREATE INDEX, or modify VACUUM.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


RE: Index tuple count != heap tuple count problem identified]

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
>
> > > -----Original Message-----
> > > From: pgsql-hackers-owner@hub.org
> [mailto:pgsql-hackers-owner@hub.org]On
> > > Behalf Of Tom Lane
> > >
> > > You'll probably recall reports of messages like this out of VACUUM:
> > > NOTICE:  Index ind1: NUMBER OF INDEX' TUPLES (2002) IS NOT THE
> > > SAME AS HEAP' (3003).
> > > I've figured out the cause (or at least a cause) of this condition.
> > >
> > > The CREATE INDEX operation has only bothered to index the non-dead
> > > tuples.  So, VACUUM's little sanity check fails.
> >
> >
> > Is it wrong to change the implementation of CREATE INDEX ?
> > I have a fix.
> > It needs the change of duplicate check(tuplesort->btbuild) and
> > I've thougth that it would be better to change it after the release
> > of 7.0.
>
> Well, it seems we better do something about it before 7.0 is released.
> Now it seems we have to decide to change CREATE INDEX, or modify VACUUM.
>

It's difficult for me to provide a fix for CREATE INDEX before 7.0 is
released.
It's not sufficiently checked and I don't remember details now.
I'm a little busy now and don't have enough time to look at it again.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp



Re: Index tuple count != heap tuple count problem identified]

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>>>> Is it wrong to change the implementation of CREATE INDEX ?
>>>> I have a fix.
>>>> It needs the change of duplicate check(tuplesort->btbuild) and
>>>> I've thougth that it would be better to change it after the release
>>>> of 7.0.
>> 
>> Well, it seems we better do something about it before 7.0 is released.
>> Now it seems we have to decide to change CREATE INDEX, or modify VACUUM.

> It's difficult for me to provide a fix for CREATE INDEX before 7.0 is
> released.
> It's not sufficiently checked and I don't remember details now.

Also, we'd need to change the other index access methods too.  That
doesn't seem to me like a good thing to tackle a week before release...
        regards, tom lane