Thread: pg_stat_statements: more test coverage

pg_stat_statements: more test coverage

From
Peter Eisentraut
Date:
pg_stat_statements has some significant gaps in test coverage, 
especially around the serialization of data across server restarts, so I 
wrote a test for that and also made some other smaller tweaks to 
increase the coverage a bit.  These patches are all independent of each 
other.
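
In SQL terms, the restart test boils down to something like this (only a
sketch; the actual test drives the restart from a TAP script, and the
table name is made up):

SELECT pg_stat_statements_reset();
CREATE TABLE t1 (a int);
SELECT a FROM t1;
-- the TAP script restarts the server at this point
SELECT count(*) FROM pg_stat_statements WHERE query LIKE '%t1%';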

After that, the only major function that isn't tested is gc_qtexts(). 
Maybe a future project.

Re: pg_stat_statements: more test coverage

From
Michael Paquier
Date:
On Sat, Dec 23, 2023 at 03:18:01PM +0100, Peter Eisentraut wrote:
> +/* LCOV_EXCL_START */
> +PG_FUNCTION_INFO_V1(pg_stat_statements);
>  PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
> +/* LCOV_EXCL_STOP */

The only reason why I've seen this used at the C level was to bump up
coverage numbers for some internal company projects.  I'm pretty sure
I have proposed at least one patch in the past that would make use of
that, but it got rejected.  It is not the only code area that has a
similar pattern, so why introduce it now?

> Subject: [PATCH v1 2/5] pg_stat_statements: Add coverage for
>  pg_stat_statements_1_8()
>
> [...]
>
> Subject: [PATCH v1 3/5] pg_stat_statements: Add coverage for
>  pg_stat_statements_reset_1_7

Yep, why not.

> +SELECT format('create table t%s (a int)', lpad(i::text, 3, '0')) FROM generate_series(1, 101) AS x(i) \gexec
> +create table t001 (a int)
> [...]
> +create table t101 (a int)

That's a lot of bloat.  This relies on pg_stat_statements.max's
default being 100.  Based on the regression tests, the maximum number
of rows we have reported from the view pg_stat_statements is 39, in
the utility tests.  I think that you should just:
- Use a DO block or a PL function to generate the N queries, say with
something like this after tweaking pg_stat_statements.track:
CREATE OR REPLACE FUNCTION create_tables(num_tables int)
  RETURNS VOID AS
  $func$
  BEGIN
  FOR i IN 1..num_tables LOOP
    EXECUTE format('
      CREATE TABLE IF NOT EXISTS %I (id int)', 't_' || i);
  END LOOP;
END
$func$ LANGUAGE plpgsql;
- Switch the minimum to be around 40~50 in the local
pg_stat_statements.conf used for the regression tests.

> +SELECT count(*) <= 100 AND count(*) > 0 FROM pg_stat_statements;

You could fetch the max value with \gset and reuse it here, as well.
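
For instance (untested, just to illustrate the idea):

SELECT current_setting('pg_stat_statements.max')::int AS max_entries \gset
SELECT count(*) <= :max_entries AND count(*) > 0 FROM pg_stat_statements;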

> +is( $node->safe_psql(
> +        'postgres',
> +        "SELECT count(*) FROM pg_stat_statements WHERE query LIKE '%t1%'"),
> +    '2',
> +    'pg_stat_statements data kept across restart');

Checking that the contents match would be a bit more verbose than
just a count, but also more informative.  One trick I've used is in
the patch for 027_stream_regress.pl, where there is a query grouping
the stats by the beginning of the query texts.  Not exact, but it
tells you more.
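
Roughly something like this, adapted to pg_stat_statements (not the
exact query from that patch):

SELECT left(query, 15) AS beginning, count(*)
  FROM pg_stat_statements GROUP BY 1 ORDER BY 1;
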
--
Michael


Re: pg_stat_statements: more test coverage

From
Peter Eisentraut
Date:
On 24.12.23 03:03, Michael Paquier wrote:
> On Sat, Dec 23, 2023 at 03:18:01PM +0100, Peter Eisentraut wrote:
>> +/* LCOV_EXCL_START */
>> +PG_FUNCTION_INFO_V1(pg_stat_statements);
>>   PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
>> +/* LCOV_EXCL_STOP */
> 
> The only reason why I've seen this used at the C level was to bump up
> coverage numbers for some internal company projects.  I'm pretty sure
> I have proposed at least one patch in the past that would make use of
> that, but it got rejected.  It is not the only code area that has a
> similar pattern, so why introduce it now?

What other code areas have similar patterns (meaning, extension entry 
points for upgrade support that are not covered by currently available 
extension installation files)?

> That's a lot of bloat.  This relies on pg_stat_statements.max's
> default being 100.

The default is 5000.  I set 100 explicitly in the configuration file for 
the test.

> - Use a DO block or a PL function to generate the N queries, say with
> something like this after tweaking pg_stat_statements.track:
> CREATE OR REPLACE FUNCTION create_tables(num_tables int)
>    RETURNS VOID AS
>    $func$
>    BEGIN
>    FOR i IN 1..num_tables LOOP
>      EXECUTE format('
>        CREATE TABLE IF NOT EXISTS %I (id int)', 't_' || i);
>    END LOOP;
> END
> $func$ LANGUAGE plpgsql;

I tried it like this first, but this doesn't register as separately 
executed commands for pg_stat_statements.

> - Switch the minimum to be around 40~50 in the local
> pg_stat_statements.conf used for the regression tests.

100 is the hardcoded minimum for the setting.

>> +is( $node->safe_psql(
>> +        'postgres',
>> +        "SELECT count(*) FROM pg_stat_statements WHERE query LIKE '%t1%'"),
>> +    '2',
>> +    'pg_stat_statements data kept across restart');
> 
> Checking that the contents match would be a bit more verbose than
> just a count, but also more informative.  One trick I've used is in
> the patch for 027_stream_regress.pl, where there is a query grouping
> the stats by the beginning of the query texts.  Not exact, but it
> tells you more.

Yeah, this could be expanded a bit.




Re: pg_stat_statements: more test coverage

From
Julien Rouhaud
Date:
Hi,

On Tue, Dec 26, 2023 at 10:03 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 24.12.23 03:03, Michael Paquier wrote:
> > - Use a DO block or a PL function to generate the N queries, say with
> > something like this after tweaking pg_stat_statements.track:
> > CREATE OR REPLACE FUNCTION create_tables(num_tables int)
> >    RETURNS VOID AS
> >    $func$
> >    BEGIN
> >    FOR i IN 1..num_tables LOOP
> >      EXECUTE format('
> >        CREATE TABLE IF NOT EXISTS %I (id int)', 't_' || i);
> >    END LOOP;
> > END
> > $func$ LANGUAGE plpgsql;
>
> I tried it like this first, but this doesn't register as separately
> executed commands for pg_stat_statements.

I was a bit surprised by that so I checked locally.  It does work as
expected provided that you set pg_stat_statements.track to all:
=# select create_tables(5);
=# select queryid, query from pg_stat_statements where query like 'CREATE%';
       queryid        |                  query
----------------------+-----------------------------------------
 -4985234599080337259 | CREATE TABLE IF NOT EXISTS t_5 (id int)
  -790506371630237058 | CREATE TABLE IF NOT EXISTS t_2 (id int)
 -1104545884488896333 | CREATE TABLE IF NOT EXISTS t_3 (id int)
 -2961032912789520428 | CREATE TABLE IF NOT EXISTS t_4 (id int)
  7273321309563119428 | CREATE TABLE IF NOT EXISTS t_1 (id int)
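
(For completeness, the session above had something like this run first,
create_tables() being the function quoted earlier:)

=# SET pg_stat_statements.track = 'all';
=# SELECT pg_stat_statements_reset();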



Re: pg_stat_statements: more test coverage

From
Peter Eisentraut
Date:
On 27.12.23 09:08, Julien Rouhaud wrote:
> Hi,
> 
> On Tue, Dec 26, 2023 at 10:03 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> On 24.12.23 03:03, Michael Paquier wrote:
>>> - Use a DO block or a PL function to generate the N queries, say with
>>> something like this after tweaking pg_stat_statements.track:
>>> CREATE OR REPLACE FUNCTION create_tables(num_tables int)
>>>     RETURNS VOID AS
>>>     $func$
>>>     BEGIN
>>>     FOR i IN 1..num_tables LOOP
>>>       EXECUTE format('
>>>         CREATE TABLE IF NOT EXISTS %I (id int)', 't_' || i);
>>>     END LOOP;
>>> END
>>> $func$ LANGUAGE plpgsql;
>>
>> I tried it like this first, but this doesn't register as separately
>> executed commands for pg_stat_statements.
> 
> I was a bit surprised by that so I checked locally.  It does work as
> expected provided that you set pg_stat_statements.track to all:

Ok, here is an updated patch set that does it that way.

I have committed the patches 0002 and 0003 from the previous patch set 
already.

I have also enhanced the TAP test a bit to check the actual content of 
the output across restarts.

I'm not too hung up on the 0001 patch if others don't like that approach.

Re: pg_stat_statements: more test coverage

From
Julien Rouhaud
Date:
On Wed, Dec 27, 2023 at 8:53 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 27.12.23 09:08, Julien Rouhaud wrote:
> >
> > I was a bit surprised by that so I checked locally.  It does work as
> > expected provided that you set pg_stat_statements.track to all:
>
> Ok, here is an updated patch set that does it that way.

It looks good to me.  One minor complaint: I'm a bit dubious about
queries like this one:

SELECT count(*) <= 100 AND count(*) > 0 FROM pg_stat_statements;

Is it to actually test that pg_stat_statements won't store more than
pg_stat_statements.max records?  Note also that the count can never be
0, as the currently executed query will already have an entry added
during post_parse_analyze.  Maybe a comment saying what this is
actually testing would help.
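
Something as simple as this would already help (wording up to you):

-- There should never be more than pg_stat_statements.max entries, and
-- at least one, since this query itself is already in there.
SELECT count(*) <= 100 AND count(*) > 0 FROM pg_stat_statements;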

It would also be good to test that pg_stat_statements_info.dealloc is
more than 0 once enough statements have been issued.
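
Something along these lines (the alias is just for readability):

SELECT dealloc > 0 AS entries_evicted FROM pg_stat_statements_info;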

> I have committed the patches 0002 and 0003 from the previous patch set
> already.
>
> I have also enhanced the TAP test a bit to check the actual content of
> the output across restarts.

Nothing much to say about this one, it all looks good.

> I'm not too hung up on the 0001 patch if others don't like that approach.

I agree with Michael on this one; the only times I've seen this
pattern were to comply with some company-internal policy on minimum
coverage numbers.



Re: pg_stat_statements: more test coverage

From
Peter Eisentraut
Date:
On 29.12.23 06:14, Julien Rouhaud wrote:

> It looks good to me.  One minor complaint: I'm a bit dubious about
> queries like this one:
> 
> SELECT count(*) <= 100 AND count(*) > 0 FROM pg_stat_statements;
> 
> Is it to actually test that pg_stat_statements won't store more than
> pg_stat_statements.max records?  Note also that the count can never be
> 0, as the currently executed query will already have an entry added
> during post_parse_analyze.  Maybe a comment saying what this is
> actually testing would help.

Yeah, I think I added that query before I added the queries to check the 
contents of pg_stat_statements.query itself, so it's a bit obsolete.  I 
reworked that part.

> It would also be good to test that pg_stat_statements_info.dealloc is
> more than 0 once enough statements have been issued.

I added that.

> 
>> I have committed the patches 0002 and 0003 from the previous patch set
>> already.
>>
>> I have also enhanced the TAP test a bit to check the actual content of
>> the output across restarts.
> 
> Nothing much to say about this one, it all looks good.

Ok, I have committed these two patches.

>> I'm not too hung up on the 0001 patch if others don't like that approach.
> 
> I agree with Michael on this one; the only times I've seen this
> pattern were to comply with some company-internal policy on minimum
> coverage numbers.

Ok, skipped that.



Re: pg_stat_statements: more test coverage

From
Michael Paquier
Date:
On Sat, Dec 30, 2023 at 08:39:47PM +0100, Peter Eisentraut wrote:
> On 29.12.23 06:14, Julien Rouhaud wrote:
>> I agree with Michael on this one; the only times I've seen this
>> pattern were to comply with some company-internal policy on minimum
>> coverage numbers.
>
> Ok, skipped that.

Just to close the loop here.  I thought that I had sent a patch on the
lists that made use of these markers, but it looks like that's not the
case.  The only thread I've found is this one:
https://www.postgresql.org/message-id/d8f6bdd536df403b9b33816e9f7e0b9d@G08CNEXMBPEKD05.g08.fujitsu.local

(FWIW, I'm still skeptical about the idea of painting more backend
code with these outside the parsing areas, but I'm OK with being
outnumbered.)
--
Michael


Re: pg_stat_statements: more test coverage

From
Michael Paquier
Date:
On Sat, Dec 30, 2023 at 08:39:47PM +0100, Peter Eisentraut wrote:
> Ok, I have committed these two patches.

Please note that the buildfarm has turned red, as in:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=pipit&dt=2023-12-31%2001%3A12%3A22&stg=misc-check

pg_stat_statements's regression.diffs holds more details:
SELECT query FROM pg_stat_statements WHERE query LIKE '%t001%' OR query LIKE '%t098%' ORDER BY query;
         query
 --------------------
- select * from t001
  select * from t098
-(2 rows)
+(1 row)
--
Michael


Re: pg_stat_statements: more test coverage

From
Julien Rouhaud
Date:
On Sun, Dec 31, 2023 at 2:28 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sat, Dec 30, 2023 at 08:39:47PM +0100, Peter Eisentraut wrote:
> > Ok, I have committed these two patches.
>
> Please note that the buildfarm has turned red, as in:
> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=pipit&dt=2023-12-31%2001%3A12%3A22&stg=misc-check
>
> pg_stat_statements's regression.diffs holds more details:
> SELECT query FROM pg_stat_statements WHERE query LIKE '%t001%' OR query LIKE '%t098%' ORDER BY query;
>          query
>  --------------------
> - select * from t001
>   select * from t098
> -(2 rows)
> +(1 row)

That's surprising.  I wanted to see if there was any specific
configuration but I get a 403.  I'm wondering if this is only due to
other tests being run concurrently evicting an entry earlier than
planned.



Re: pg_stat_statements: more test coverage

From
Peter Eisentraut
Date:
On 31.12.23 10:26, Julien Rouhaud wrote:
> On Sun, Dec 31, 2023 at 2:28 PM Michael Paquier <michael@paquier.xyz> wrote:
>>
>> On Sat, Dec 30, 2023 at 08:39:47PM +0100, Peter Eisentraut wrote:
>>> Ok, I have committed these two patches.
>>
>> Please note that the buildfarm has turned red, as in:
>> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=pipit&dt=2023-12-31%2001%3A12%3A22&stg=misc-check
>>
>> pg_stat_statements's regression.diffs holds more details:
>> SELECT query FROM pg_stat_statements WHERE query LIKE '%t001%' OR query LIKE '%t098%' ORDER BY query;
>>           query
>>   --------------------
>> - select * from t001
>>    select * from t098
>> -(2 rows)
>> +(1 row)
> 
> That's surprising.  I wanted to see if there was any specific
> configuration but I get a 403.  I'm wondering if this is only due to
> other tests being run concurrently evicting an entry earlier than
> planned.

These tests are run in a separate instance and serially, so I don't 
think concurrency is an issue.

It looks like the failing configurations are exactly all the big-endian 
ones: s390x, sparc, powerpc.  So it's possible that this is actually a 
bug?  But unless someone can reproduce this locally and debug it, we 
should probably revert this for now.




Re: pg_stat_statements: more test coverage

From
Tom Lane
Date:
Peter Eisentraut <peter@eisentraut.org> writes:
> It looks like the failing configurations are exactly all the big-endian 
> ones: s390x, sparc, powerpc.  So it's possible that this is actually a 
> bug?  But unless someone can reproduce this locally and debug it, we 
> should probably revert this for now.

I see it failed on my animal mamba, so I should be able to reproduce
it there.  Will look.

            regards, tom lane



Re: pg_stat_statements: more test coverage

From
Tom Lane
Date:
Peter Eisentraut <peter@eisentraut.org> writes:
> It looks like the failing configurations are exactly all the big-endian
> ones: s390x, sparc, powerpc.  So it's possible that this is actually a
> bug?  But unless someone can reproduce this locally and debug it, we
> should probably revert this for now.

The reason for the platform-dependent behavior is that we're dealing
with a bunch of entries with identical "usage", so it's just about
random which ones actually get deleted.  I do not think our qsort()
has platform-dependent behavior --- but the initial contents of
entry_dealloc's array are filled in hash_seq_search order, and that
*is* platform-dependent.

Now, the test script realizes that hazard.  The bug seems to be that
it's wrong about how many competing usage-count-1 entries there are.
Instrumenting the calls to entry_alloc (which'll call entry_dealloc
when we hit 100 entries), I see

2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max LOG:  calling entry_alloc for 'SELECT pg_stat_statements_reset() IS NOT', cur hash size 0
2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max LOG:  calling entry_alloc for '$1', cur hash size 1
2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max CONTEXT:  SQL expression "1"
    PL/pgSQL function inline_code_block line 3 at FOR with integer loop variable
2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max LOG:  calling entry_alloc for 'format($3, lpad(i::text, $4, $5))', cur hash size 2
2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max CONTEXT:  SQL expression "format('select * from t%s', lpad(i::text, 3, '0'))"
    PL/pgSQL function inline_code_block line 4 at EXECUTE
2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max LOG:  calling entry_alloc for 'select * from t001', cur hash size 3
2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max CONTEXT:  SQL statement "select * from t001"
    PL/pgSQL function inline_code_block line 4 at EXECUTE
...
2023-12-31 13:21:39.165 EST client backend[3764867] pg_regress/max LOG:  calling entry_alloc for 'select * from t098', cur hash size 100
2023-12-31 13:21:39.165 EST client backend[3764867] pg_regress/max CONTEXT:  SQL statement "select * from t098"
    PL/pgSQL function inline_code_block line 4 at EXECUTE
2023-12-31 13:21:39.165 EST client backend[3764867] pg_regress/max LOG:  entry_dealloc: zapping 10 of 100 victims

So the dealloc happens sooner than the script expects, and it's pure
chance that the test appeared to work anyway.

The test case needs to be rewritten to allow for more competing
usage-count-1 entries than it currently does.  Backing "98" down to
"95" might be enough, but I've not experimented (and I'd recommend
leaving more than the minimum amount of headroom, in case plpgsql
changes behavior about how many subexpressions get put into the
table).

Strongly recommend that while fixing the test, you stick in some
debugging elogs to verify when the dealloc calls actually happen
rather than assuming you predicted it correctly.  I did it as
attached.

            regards, tom lane

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 6f62a531f7..ffdc14a1eb 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -1373,6 +1373,10 @@ pgss_store(const char *query, uint64 queryId,
         if (!stored)
             goto done;

+        elog(LOG, "calling entry_alloc for '%.40s', cur hash size %ld",
+             norm_query ? norm_query : query,
+             hash_get_num_entries(pgss_hash));
+
         /* OK to create a new hashtable entry */
         entry = entry_alloc(&key, query_offset, query_len, encoding,
                             jstate != NULL);
@@ -2160,6 +2164,8 @@ entry_dealloc(void)
     nvictims = Max(10, i * USAGE_DEALLOC_PERCENT / 100);
     nvictims = Min(nvictims, i);

+    elog(LOG, "entry_dealloc: zapping %d of %d victims", nvictims, i);
+
     for (i = 0; i < nvictims; i++)
     {
         hash_search(pgss_hash, &entries[i]->key, HASH_REMOVE, NULL);