Thread: pg_stat_have_stats() returns true for dropped indexes (or for index creation transaction rolled back)

Hi hackers,

While working on the relation stats split into table and index stats 
[1], I noticed that currently pg_stat_have_stats() returns true for 
dropped indexes (or for index creation transaction rolled back).

Example:

postgres=# create table bdt as select a from generate_series(1,1000) a;
SELECT 1000
postgres=# create index bdtidx on bdt(a);
CREATE INDEX
postgres=# select * from bdt where a = 30;
  a
----
  30
(1 row)

postgres=# SELECT 'bdtidx'::regclass::oid;
   oid
-------
  16395
(1 row)

postgres=# select pg_stat_have_stats('relation', 5, 16395);
  pg_stat_have_stats
--------------------
  t
(1 row)

postgres=# drop index bdtidx;
DROP INDEX
postgres=# select pg_stat_have_stats('relation', 5, 16395);
  pg_stat_have_stats
--------------------
  t
(1 row)


Please find attached a patch proposal to fix it.

It does contain additional calls to pgstat_create_relation() and 
pgstat_drop_relation() as well as additional TAP tests.

[1]: 
https://www.postgresql.org/message-id/5bfcf1a5-4224-9324-594b-725e704c95b1%40amazon.com

Regards,

-- 
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Attachment
Hi,

On 2022-08-22 18:39:07 +0200, Drouvot, Bertrand wrote:
> While working on the relation stats split into table and index stats [1], I
> noticed that currently pg_stat_have_stats() returns true for dropped indexes
> (or for index creation transaction rolled back).

Good catch.

I guess Horiguchi-san and/or I wrongly assumed it'd be taken care of by the
pgstat_create_relation() in heap_create_with_catalog(), but index_create()
doesn't use that.


> Please find attached a patch proposal to fix it.

Perhaps a better fix would be to move the pgstat_create_relation() from
heap_create_with_catalog() into heap_create()? Although I guess it's a bit
pointless to deduplicate given that you're going to split it up again...


> It does contain additional calls to pgstat_create_relation() and
> pgstat_drop_relation() as well as additional TAP tests.

Would be good to add a test for CREATE INDEX / DROP INDEX / REINDEX
CONCURRENTLY as well.

Might be worth adding a test to stats.sql or stats.spec in the main regression
tests. Perhaps that's best where the aforementioned things should be tested?


> @@ -2349,6 +2354,7 @@ index_drop(Oid indexId, bool concurrent, bool concurrent_lock_mode)
>      CatalogTupleDelete(indexRelation, &tuple->t_self);
>  
>      ReleaseSysCache(tuple);
> +
>      table_close(indexRelation, RowExclusiveLock);
>  
>      /*

Assume this was just an accident?


Greetings,

Andres Freund



Hi,

On 8/22/22 7:36 PM, Andres Freund wrote:
> On 2022-08-22 18:39:07 +0200, Drouvot, Bertrand wrote:
>> Please find attached a patch proposal to fix it.
> Perhaps a better fix would be to move the pgstat_create_relation() from
> heap_create_with_catalog() into heap_create()? Although I guess it's a bit
> pointless to deduplicate given that you're going to split it up again...

Thanks for looking at it!

Agree it's better to move it to heap_create(): it's done in the new 
version attached.

We'll see later on if it needs to be duplicated for the table/index 
split work.

>> It does contain additional calls to pgstat_create_relation() and
>> pgstat_drop_relation() as well as additional TAP tests.
> Would be good to add a test for CREATE INDEX / DROP INDEX / REINDEX
> CONCURRENTLY as well.
>
> Might be worth adding a test to stats.sql or stats.spec in the main regression
> tests. Perhaps that's best where the aforementioned things should be tested?

Yeah that sounds better, I'm also adding more tests around table 
creation while at it.

I ended up adding the new tests in stats.sql.

>
>> @@ -2349,6 +2354,7 @@ index_drop(Oid indexId, bool concurrent, bool concurrent_lock_mode)
>>        CatalogTupleDelete(indexRelation, &tuple->t_self);
>>
>>        ReleaseSysCache(tuple);
>> +
>>        table_close(indexRelation, RowExclusiveLock);
>>
>>        /*
> Assume this was just an accident?

Oops, thanks!

New version attached.

Regards,

-- 
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Attachment
Good catch, and thanks for the patch!

(The file name would correctly be v2-0001-...:)

At Tue, 23 Aug 2022 09:58:03 +0200, "Drouvot, Bertrand" <bdrouvot@amazon.com> wrote in 
> Agree it's better to move it to heap_create(): it's done in the new
> version attached.

+1 (not considering stats splitting)

> We'll see later on if it needs to be duplicated for the table/index
> split work.

The code changes looks good to me.

> >> It does contain additional calls to pgstat_create_relation() and
> >> pgstat_drop_relation() as well as additional TAP tests.
> > Would be good to add a test for CREATE INDEX / DROP INDEX / REINDEX
> > CONCURRENTLY as well.
> >
> > Might be worth adding a test to stats.sql or stats.spec in the main
> > regression
> > tests. Perhaps that's best where the aforementioned things should be
> > tested?
> 
> Yeah that sounds better, I'm also adding more tests around table
> creation while at it.
> 
> I ended up adding the new tests in stats.sql.

+-- pg_stat_have_stats returns true for table creation inserting data
+-- pg_stat_have_stats returns true for committed index creation
+

Not sure we need this, as we check that already in the same file. (In
other words, if we failed this, the should have failed earlier.) Maybe
we only need check for drop operations and reindex cases?

We have other variable-numbered stats kinds
FUNCTION/REPLSLOT/SUBSCRIPTION. Don't we need the same for these?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Hi,

On 8/25/22 4:47 AM, Kyotaro Horiguchi wrote:
> Good catch, and thanks for the patch!
>
> (The file name would correctly be v2-0001-...:)
>
> At Tue, 23 Aug 2022 09:58:03 +0200, "Drouvot, Bertrand" <bdrouvot@amazon.com> wrote in
>> Agree it's better to move it to heap_create(): it's done in the new
>> version attached.
> +1 (not considering stats splitting)
>
>> We'll see later on if it needs to be duplicated for the table/index
>> split work.
> The code changes looks good to me.

Thanks for looking at it!

> +-- pg_stat_have_stats returns true for table creation inserting data
> +-- pg_stat_have_stats returns true for committed index creation
> +
>
> Not sure we need this, as we check that already in the same file. (In
> other words, if we failed this, the should have failed earlier.)

That's right.

> Maybe
> we only need check for drop operations

Looking closer at it, I think we are already good for the drop case on 
the tables (by making direct use of the pg_stat_get_* functions on the 
before dropped oid).

So I think we can remove all the "table" new checks: new patch attached 
is doing so.

On the other hand, for the index case, I think it's better to keep the 
"committed index creation one".

Indeed, to check that the drop behaves correctly I think it's better in 
"the same test" to ensure we've had the desired result before the drop 
(I mean having pg_stat_have_stats() returning false after a drop does 
not really help if we are not 100% sure it was returning true for the 
exact same index before the drop).

> We have other variable-numbered stats kinds
> FUNCTION/REPLSLOT/SUBSCRIPTION. Don't we need the same for these?

I don't think we need more tests for the FUNCTION case (as it looks to 
me it is already covered in stat.sql by the pg_stat_get_function_calls() 
calls on the dropped functions oids).

For SUBSCRIPTION, i think this is covered in 026_stats.pl:

# Subscription stats for sub1 should be gone
is( $node_subscriber->safe_psql(
         $db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub1_oid))),
     qq(f),
     qq(Subscription stats for subscription '$sub1_name' should be 
removed.));

For REPLSLOT, I agree that we can add one test: I added it in 
contrib/test_decoding/sql/stats.sql. It relies on pg_stat_have_stats() 
(as relying on pg_stat_replication_slots and/or 
pg_stat_get_replication_slot() would not help that much for this test 
given that the slot has been removed from ReplicationSlotCtl)

Attaching v3-0001 (with the right "numbering" this time ;-) )

Regards,

-- 
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Attachment
At Thu, 25 Aug 2022 11:44:34 +0200, "Drouvot, Bertrand" <bdrouvot@amazon.com> wrote in
> Looking closer at it, I think we are already good for the drop case on
> the tables (by making direct use of the pg_stat_get_* functions on the
> before dropped oid).
>
> So I think we can remove all the "table" new checks: new patch
> attached is doing so.
>
> On the other hand, for the index case, I think it's better to keep the
> "committed index creation one".

I agree.

> Indeed, to check that the drop behaves correctly I think it's better
> in "the same test" to ensure we've had the desired result before the
> drop (I mean having pg_stat_have_stats() returning false after a drop
> does not really help if we are not 100% sure it was returning true for
> the exact same index before the drop).

Sounds reasonable.

> > We have other variable-numbered stats kinds
> > FUNCTION/REPLSLOT/SUBSCRIPTION. Don't we need the same for these?
>
> I don't think we need more tests for the FUNCTION case (as it looks to
> me it is already covered in stat.sql by the
> pg_stat_get_function_calls() calls on the dropped functions oids).

Right.

> For SUBSCRIPTION, i think this is covered in 026_stats.pl:
>
> # Subscription stats for sub1 should be gone
> is( $node_subscriber->safe_psql(
>         $db, qq(SELECT pg_stat_have_stats('subscription', 0,
> $sub1_oid))),
>     qq(f),
>     qq(Subscription stats for subscription '$sub1_name' should be
> removed.));
>
> For REPLSLOT, I agree that we can add one test: I added it in
> contrib/test_decoding/sql/stats.sql. It relies on pg_stat_have_stats()
> (as relying on pg_stat_replication_slots and/or
> pg_stat_get_replication_slot() would not help that much for this test
> given that the slot has been removed from ReplicationSlotCtl)

Thanks for the searching.

+-- pg_stat_have_stats returns true for regression_slot_stats1
+-- Its index is 1 in ReplicationSlotCtl->replication_slots
+select pg_stat_have_stats('replslot', 0, 1);

This is wrong. The index is actually 0.  We cannot know the id
reliably since we don't expose it at all. We could slightly increase
robustness by assuming the range of the id but that is just moving the
problem to another place. If the test is broken by a change of
replslot id assignment policy, it would be easily found and fixed.

So is it fine simply fixing the comment with the correct ID?

Or, contrarily we can be more sensitive to the change of ID assignment
policy by checking all the replication slots.

select count(n) from generate_series(0, 2) as n where pg_stat_have_stats('replslot', 0, n);

The number changes from 3 to 0 across the slots drop.. If any of the
slots has gone out of the range, the number before the drop decreases.

> Attaching v3-0001 (with the right "numbering" this time ;-) )

Yeah, Looks fine:p

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Hiu,

On 2022-08-25 11:44:34 +0200, Drouvot, Bertrand wrote:
> For REPLSLOT, I agree that we can add one test: I added it in
> contrib/test_decoding/sql/stats.sql. It relies on pg_stat_have_stats() (as
> relying on pg_stat_replication_slots and/or pg_stat_get_replication_slot()
> would not help that much for this test given that the slot has been removed
> from ReplicationSlotCtl)

As Horiguchi-san noted, we can't rely on specific indexes being used. I feel
ok with the current coverage in that area, but if we *really* feel we need to
test it, we'd need to count the number of indexes with slots before dropping
the slot and after the drop.


> +-- pg_stat_have_stats returns true for committed index creation

Maybe another test for an uncommitted index creation would be good too?

Could you try running this test with debug_discard_caches = 1 - it's pretty
easy to write tests in this area that aren't reliable timing wise.


> +CREATE table stats_test_tab1 as select generate_series(1,10) a;
> +CREATE index stats_test_idx1 on stats_test_tab1(a);
> +SELECT oid AS dboid from pg_database where datname = current_database() \gset

Since you introduced this, maybe convert the other instance of this query at
the end of the file as well?


Greetings,

Andres Freund



Hi,

On 8/31/22 9:10 AM, Kyotaro Horiguchi wrote:
Thanks for the searching.
+-- pg_stat_have_stats returns true for regression_slot_stats1
+-- Its index is 1 in ReplicationSlotCtl->replication_slots
+select pg_stat_have_stats('replslot', 0, 1);

This is wrong. The index is actually 0.  

Right, thanks for pointing out. (gdb) p get_replslot_index("regression_slot_stats1")
$1 = 0
(gdb) p get_replslot_index("regression_slot_stats2")
$2 = 1
(gdb) p get_replslot_index("regression_slot_stats3")

$3 = 2

We cannot know the id
reliably since we don't expose it at all.

Right.

 We could slightly increase
robustness by assuming the range of the id but that is just moving the
problem to another place. If the test is broken by a change of
replslot id assignment policy, it would be easily found and fixed.

So is it fine simply fixing the comment with the correct ID?

Or, contrarily we can be more sensitive to the change of ID assignment
policy by checking all the replication slots.

select count(n) from generate_series(0, 2) as n where pg_stat_have_stats('replslot', 0, n);

The number changes from 3 to 0 across the slots drop.. If any of the
slots has gone out of the range, the number before the drop decreases.

Thanks for the ideas! I'm coming up with a slightly different one (also based on Andre's feedback in [1]) in the upcoming v4.

[1]: https://www.postgresql.org/message-id/20220831192657.jqhphpud2mxbzbom%40awork3.anarazel.de

Regards,

-- 
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Hi,

On 8/31/22 9:26 PM, Andres Freund wrote:
Hiu,

On 2022-08-25 11:44:34 +0200, Drouvot, Bertrand wrote:
For REPLSLOT, I agree that we can add one test: I added it in
contrib/test_decoding/sql/stats.sql. It relies on pg_stat_have_stats() (as
relying on pg_stat_replication_slots and/or pg_stat_get_replication_slot()
would not help that much for this test given that the slot has been removed
from ReplicationSlotCtl)
As Horiguchi-san noted, we can't rely on specific indexes being used. 

Yeah.

I feel
ok with the current coverage in that area, but if we *really* feel we need to
test it, we'd need to count the number of indexes with slots before dropping
the slot and after the drop.

Thanks for the suggestion, I'm coming up with this proposal in v4 attached:

  • count the number of slots
  • ensure we have at least one for which pg_stat_have_stats() returns true
  • get the list of ids (true_ids) for which pg_stat_have_stats() returns true
  • drop all the slots
  • get the list of ids (false_ids) for which pg_stat_have_stats() returns false
  • ensure that both lists (true_ids and false_ids) are the same

I don't "really" feel the need we need to test it but i think that this thread was a good opportunity to try to test it.

That said, that's also fine for me if this test is not part of the patch.

Maybe a better/simpler option could be to expose a function to get the slot id based on its name and then write a "simple" test with it? (If so I think that would better to start another patch/thread).

+-- pg_stat_have_stats returns true for committed index creation
Maybe another test for an uncommitted index creation would be good too?

You mean in addition to the "-- pg_stat_have_stats returns false for rolled back index creation" one?

Could you try running this test with debug_discard_caches = 1 - it's pretty
easy to write tests in this area that aren't reliable timing wise.

Thanks for the suggestion. I did and it passed without any issues.

+CREATE table stats_test_tab1 as select generate_series(1,10) a;
+CREATE index stats_test_idx1 on stats_test_tab1(a);
+SELECT oid AS dboid from pg_database where datname = current_database() \gset
Since you introduced this, maybe convert the other instance of this query at
the end of the file as well?

yeah good point. In v4, I moved the dboid recording at the top and use it when appropriate.

Regards,
--

Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com
Attachment
Hi,

On 2022-09-01 08:40:54 +0200, Drouvot, Bertrand wrote:
> Thanks for the suggestion, I'm coming up with this proposal in v4 attached:

I pushed the bugfix / related test portion to 15, master. Thanks!

I left the replication stuff out - it seemed somewhat independent. Probably
will just push that to master, unless somebody thinks it should be in both?

Greetings,

Andres Freund



Hi,

On 9/23/22 10:45 PM, Andres Freund wrote:

> 
> 
> 
> Hi,
> 
> On 2022-09-01 08:40:54 +0200, Drouvot, Bertrand wrote:
>> Thanks for the suggestion, I'm coming up with this proposal in v4 attached:
> 
> I pushed the bugfix / related test portion to 15, master. Thanks!

Thanks!

> 
> I left the replication stuff out - it seemed somewhat independent.

Yeah.

> Probably
> will just push that to master, unless somebody thinks it should be in both?

Sounds good to me as this is not a bug and that seems unlikely to me 
that an issue in this area will be introduced later on on 15 without 
being introduced on master too.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On 9/26/22 10:23 AM, Drouvot, Bertrand wrote:
> Hi,
> 
> On 9/23/22 10:45 PM, Andres Freund wrote:
> 
>>
>>
>>
>> Hi,
>>
>> On 2022-09-01 08:40:54 +0200, Drouvot, Bertrand wrote:
>>> Thanks for the suggestion, I'm coming up with this proposal in v4 
>>> attached:
>>
>> I pushed the bugfix / related test portion to 15, master. Thanks!
> 
> Thanks!
> 

Forgot to say that with that being fixed, I'll come back with a patch 
proposal for the tables/indexes stats split (discovered the issue fixed 
in this current thread while working on the split patch.)

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com