Thread: Force testing of query jumbling code in TAP tests
Hi all, As mentioned two times on this thread, there is not much coverage for the query jumbling code, even if it is in core: https://www.postgresql.org/message-id/Y5BHOUhX3zTH/ig6@paquier.xyz Ths issue is that we have the options to enable it, but only pg_stat_statements is able to enable and stress it. This causes coverage to be missed for all query patterns that are not covered directly by pg_stat_statements, like XML expressions, various DML patterns, etc. More aggressive testing would also ensure that no nodes are marked as no_query_jumble while they should be included in a computation. Attached is a patch to improve that. The main regression database is able to cover everything, basically, so I'd like to propose the addition of some extra configuration in 027_stream_regress.pl to enable pg_stat_statements. This could be added in the pg_upgrade tests, but that felt a bit less adapted here. Or can people think about cases where checking pg_stat_statements makes more sense after an upgrade or on a standby? One thing that makes sense for a standby is to check that the contents of pg_stat_statements are empty? With this addition, the query jumbling gets covered at 95%~, while https://coverage.postgresql.org/src/backend/nodes/index.html reports currently 35%. Thoughts or comments? -- Michael
Attachment
Hi, On 2023-02-13 14:00:36 +0900, Michael Paquier wrote: > With this addition, the query jumbling gets covered at 95%~, while > https://coverage.postgresql.org/src/backend/nodes/index.html reports > currently 35%. > > Thoughts or comments? Shouldn't there at least be some basic verification of pg_stat_statements output being sane after running the test? Even if that's perhaps just actually printing the statements. Greetings, Andres Freund
On Mon, Feb 13, 2023 at 09:45:12AM -0800, Andres Freund wrote: > Shouldn't there at least be some basic verification of pg_stat_statements > output being sane after running the test? Even if that's perhaps just actually > printing the statements. There is a total of 20k entries in pg_stat_statements if the max is high enough to store everything. Only dumping the first 100 characters of each query generates at least 1MB worth of logs, which would bloat a lot of the buildfarm in each run. So I would not do that. One thing may be perhaps to show a count of the queries in five categories: select, insert, delete, update and the rest? -- Michael
Attachment
Hi, On 2023-02-14 16:04:16 +0900, Michael Paquier wrote: > On Mon, Feb 13, 2023 at 09:45:12AM -0800, Andres Freund wrote: > > Shouldn't there at least be some basic verification of pg_stat_statements > > output being sane after running the test? Even if that's perhaps just actually > > printing the statements. > > There is a total of 20k entries in pg_stat_statements if the max is > high enough to store everything. Only dumping the first 100 > characters of each query generates at least 1MB worth of logs, which > would bloat a lot of the buildfarm in each run. So I would not do > that. One thing may be perhaps to show a count of the queries in five > categories: select, insert, delete, update and the rest? I didn't mean printing in the sense of outputting the statements to the tap log. Maybe creating a temp table or such for all the queries. And yes, then doing some top-level analysis on it like you describe sounds like a good idea. Greetings, Andres Freund
On Tue, Feb 14, 2023 at 10:11:21AM -0800, Andres Freund wrote: > I didn't mean printing in the sense of outputting the statements to the tap > log. Maybe creating a temp table or such for all the queries. And yes, then > doing some top-level analysis on it like you describe sounds like a good idea. One idea would be something like that, that makes sure that reports are generated for the most common query patterns: WITH select_stats AS (SELECT upper(substr(query, 1, 6)) AS select_query FROM pg_stat_statements WHERE upper(substr(query, 1, 6)) IN ('SELECT', 'UPDATE', 'INSERT', 'DELETE', 'CREATE')) SELECT select_query, count(select_query) > 1 AS some_rows FROM select_stats GROUP BY select_query ORDER BY select_query; Other ideas are welcome. At least this would be a start. -- Michael
Attachment
On Thu, Feb 16, 2023 at 02:08:42PM +0900, Michael Paquier wrote: > Other ideas are welcome. At least this would be a start. The main idea of the patch is here: > +# Check some data from pg_stat_statements. > +$node_primary->safe_psql('postgres', 'CREATE EXTENSION pg_stat_statements'); > +# This gathers data based on the first characters for some common query types, > +# providing coverage for SELECT, DMLs, and some DDLs. > +my $result = $node_primary->safe_psql( > + 'postgres', > + qq{WITH select_stats AS > + (SELECT upper(substr(query, 1, 6)) AS select_query > + FROM pg_stat_statements > + WHERE upper(substr(query, 1, 6)) IN ('SELECT', 'UPDATE', > + 'INSERT', 'DELETE', > + 'CREATE')) > + SELECT select_query, count(select_query) > 1 AS some_rows > + FROM select_stats > + GROUP BY select_query ORDER BY select_query;}); > +is( $result, qq(CREATE|t > +DELETE|t > +INSERT|t > +SELECT|t > +UPDATE|t), 'check contents of pg_stat_statements on regression database'); Are there any objections to do what's proposed in the patch and improve the testing coverage of query jumbling by default? -- Michael
Attachment
On Tue, Feb 28, 2023 at 02:06:05PM +0900, Michael Paquier wrote: > Are there any objections to do what's proposed in the patch and > improve the testing coverage of query jumbling by default? Well, done this one as of d28a449. More validation tests could always be added later if there are better ideas. Coverage of this code has gone up to 94.4% at the end: https://coverage.postgresql.org/src/backend/nodes/queryjumblefuncs.funcs.c.gcov.html -- Michael