Re: pg_stat_statements: more test coverage - Mailing list pgsql-hackers

From Tom Lane
Subject Re: pg_stat_statements: more test coverage
Date
Msg-id 3768231.1704048155@sss.pgh.pa.us
Whole thread Raw
In response to Re: pg_stat_statements: more test coverage  (Peter Eisentraut <peter@eisentraut.org>)
List pgsql-hackers
Peter Eisentraut <peter@eisentraut.org> writes:
> It looks like the failing configurations are exactly all the big-endian
> ones: s390x, sparc, powerpc.  So it's possible that this is actually a
> bug?  But unless someone can reproduce this locally and debug it, we
> should probably revert this for now.

The reason for the platform-dependent behavior is that we're dealing
with a bunch of entries with identical "usage", so it's just about
random which ones actually get deleted.  I do not think our qsort()
has platform-dependent behavior --- but the initial contents of
entry_dealloc's array are filled in hash_seq_search order, and that
*is* platform-dependent.

Now, the test script realizes that hazard.  The bug seems to be that
it's wrong about how many competing usage-count-1 entries there are.
Instrumenting the calls to entry_alloc (which'll call entry_dealloc
when we hit 100 entries), I see

2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max LOG:  calling entry_alloc for 'SELECT
pg_stat_statements_reset()IS NOT', cur hash size 0 
2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max LOG:  calling entry_alloc for '$1', cur hash size 1
2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max CONTEXT:  SQL expression "1"
    PL/pgSQL function inline_code_block line 3 at FOR with integer loop variable
2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max LOG:  calling entry_alloc for 'format($3,
lpad(i::text,$4, $5))', cur hash size 2 
2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max CONTEXT:  SQL expression "format('select * from
t%s',lpad(i::text, 3, '0'))" 
    PL/pgSQL function inline_code_block line 4 at EXECUTE
2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max LOG:  calling entry_alloc for 'select * from t001',
curhash size 3 
2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max CONTEXT:  SQL statement "select * from t001"
    PL/pgSQL function inline_code_block line 4 at EXECUTE
...
2023-12-31 13:21:39.165 EST client backend[3764867] pg_regress/max LOG:  calling entry_alloc for 'select * from t098',
curhash size 100 
2023-12-31 13:21:39.165 EST client backend[3764867] pg_regress/max CONTEXT:  SQL statement "select * from t098"
    PL/pgSQL function inline_code_block line 4 at EXECUTE
2023-12-31 13:21:39.165 EST client backend[3764867] pg_regress/max LOG:  entry_dealloc: zapping 10 of 100 victims

So the dealloc happens sooner than the script expects, and it's pure
chance that the test appeared to work anyway.

The test case needs to be rewritten to allow for more competing
usage-count-1 entries than it currently does.  Backing "98" down to
"95" might be enough, but I've not experimented (and I'd recommend
leaving more than the minimum amount of headroom, in case plpgsql
changes behavior about how many subexpressions get put into the
table).

Strongly recommend that while fixing the test, you stick in some
debugging elogs to verify when the dealloc calls actually happen
rather than assuming you predicted it correctly.  I did it as
attached.

            regards, tom lane

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 6f62a531f7..ffdc14a1eb 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -1373,6 +1373,10 @@ pgss_store(const char *query, uint64 queryId,
         if (!stored)
             goto done;

+        elog(LOG, "calling entry_alloc for '%.40s', cur hash size %ld",
+             norm_query ? norm_query : query,
+             hash_get_num_entries(pgss_hash));
+
         /* OK to create a new hashtable entry */
         entry = entry_alloc(&key, query_offset, query_len, encoding,
                             jstate != NULL);
@@ -2160,6 +2164,8 @@ entry_dealloc(void)
     nvictims = Max(10, i * USAGE_DEALLOC_PERCENT / 100);
     nvictims = Min(nvictims, i);

+    elog(LOG, "entry_dealloc: zapping %d of %d victims", nvictims, i);
+
     for (i = 0; i < nvictims; i++)
     {
         hash_search(pgss_hash, &entries[i]->key, HASH_REMOVE, NULL);

pgsql-hackers by date:

Previous
From: Melanie Plageman
Date:
Subject: Re: Streaming I/O, vectored I/O (WIP)
Next
From: Jeff Davis
Date:
Subject: Re: [17] CREATE SUBSCRIPTION ... SERVER