Re: pg_stat_statements: more test coverage - Mailing list pgsql-hackers
From: Tom Lane
Subject: Re: pg_stat_statements: more test coverage
Date:
Msg-id: 3768231.1704048155@sss.pgh.pa.us
In response to: Re: pg_stat_statements: more test coverage (Peter Eisentraut <peter@eisentraut.org>)
List: pgsql-hackers
Peter Eisentraut <peter@eisentraut.org> writes:
> It looks like the failing configurations are exactly all the big-endian
> ones: s390x, sparc, powerpc.  So it's possible that this is actually a
> bug?  But unless someone can reproduce this locally and debug it, we
> should probably revert this for now.

The reason for the platform-dependent behavior is that we're dealing with
a bunch of entries with identical "usage", so it's just about random
which ones actually get deleted.  I do not think our qsort() has
platform-dependent behavior --- but the initial contents of
entry_dealloc's array are filled in hash_seq_search order, and that *is*
platform-dependent.

Now, the test script is aware of that hazard.  The bug seems to be that
it's wrong about how many competing usage-count-1 entries there are.
Instrumenting the calls to entry_alloc (which'll call entry_dealloc when
we hit 100 entries), I see

2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max LOG:  calling entry_alloc for 'SELECT pg_stat_statements_reset() IS NOT', cur hash size 0
2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max LOG:  calling entry_alloc for '$1', cur hash size 1
2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max CONTEXT:  SQL expression "1"
	PL/pgSQL function inline_code_block line 3 at FOR with integer loop variable
2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max LOG:  calling entry_alloc for 'format($3, lpad(i::text,$4, $5))', cur hash size 2
2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max CONTEXT:  SQL expression "format('select * from t%s',lpad(i::text, 3, '0'))"
	PL/pgSQL function inline_code_block line 4 at EXECUTE
2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max LOG:  calling entry_alloc for 'select * from t001', cur hash size 3
2023-12-31 13:21:39.160 EST client backend[3764867] pg_regress/max CONTEXT:  SQL statement "select * from t001"
	PL/pgSQL function inline_code_block line 4 at EXECUTE
...
2023-12-31 13:21:39.165 EST client backend[3764867] pg_regress/max LOG:  calling entry_alloc for 'select * from t098', cur hash size 100
2023-12-31 13:21:39.165 EST client backend[3764867] pg_regress/max CONTEXT:  SQL statement "select * from t098"
	PL/pgSQL function inline_code_block line 4 at EXECUTE
2023-12-31 13:21:39.165 EST client backend[3764867] pg_regress/max LOG:  entry_dealloc: zapping 10 of 100 victims

So the dealloc happens sooner than the script expects, and it's pure
chance that the test appeared to work anyway.

The test case needs to be rewritten to allow for more competing
usage-count-1 entries than it currently does.  Backing "98" down to "95"
might be enough, but I've not experimented (and I'd recommend leaving
more than the minimum amount of headroom, in case plpgsql changes
behavior about how many subexpressions get put into the table).  I
strongly recommend that, while fixing the test, you stick in some
debugging elogs to verify when the dealloc calls actually happen, rather
than assuming you've predicted it correctly.  I did it as attached.

			regards, tom lane

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 6f62a531f7..ffdc14a1eb 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -1373,6 +1373,10 @@ pgss_store(const char *query, uint64 queryId,
 	if (!stored)
 		goto done;
 
+	elog(LOG, "calling entry_alloc for '%.40s', cur hash size %ld",
+		 norm_query ? norm_query : query,
+		 hash_get_num_entries(pgss_hash));
+
 	/* OK to create a new hashtable entry */
 	entry = entry_alloc(&key, query_offset, query_len, encoding,
 						jstate != NULL);
@@ -2160,6 +2164,8 @@ entry_dealloc(void)
 	nvictims = Max(10, i * USAGE_DEALLOC_PERCENT / 100);
 	nvictims = Min(nvictims, i);
 
+	elog(LOG, "entry_dealloc: zapping %d of %d victims", nvictims, i);
+
 	for (i = 0; i < nvictims; i++)
 	{
 		hash_search(pgss_hash, &entries[i]->key, HASH_REMOVE, NULL);
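
For illustration, the loop that the CONTEXT lines point at appears to have
roughly the following shape; a minimal sketch of the suggested adjustment,
with the bound backed down from 98 to 95, might look like this.  The
DO-block shape is inferred from the log output above; the table names and
setup are assumptions, not the actual contents of the regression script.

-- Sketch only: loop shape inferred from the PL/pgSQL CONTEXT lines above.
-- Assumes tables t001 .. t095 were created earlier in the test; the bound
-- of 95 is the untested suggestion from the message, leaving headroom
-- below the 100-entry limit the test runs with.
DO $$
BEGIN
    FOR i IN 1..95 LOOP
        EXECUTE format('select * from t%s', lpad(i::text, 3, '0'));
    END LOOP;
END $$;

Whatever bound is chosen, the advice above still applies: verify with the
debugging elogs when entry_dealloc actually fires rather than counting the
competing entries by hand.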