Re: pgsql: Move pg_stat_statements query jumbling to core. - Mailing list pgsql-committers

From Julien Rouhaud
Subject Re: pgsql: Move pg_stat_statements query jumbling to core.
Date
Msg-id 20210408105609.ducxkeqidufygwqr@nol
Whole thread Raw
In response to Re: pgsql: Move pg_stat_statements query jumbling to core.  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-committers
On Thu, Apr 08, 2021 at 03:52:02PM +0530, Amit Kapila wrote:
> On Thu, Apr 8, 2021 at 3:11 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Thu, Apr 08, 2021 at 02:52:26PM +0530, Amit Kapila wrote:
> > >
> > > Yeah, we do pass instrumentation flags from leader to worker but not
> > > the one allocated via pg_stat_staments (pgss_ExecutorStart {...
> > > queryDesc->totaltime ...}.
> >
> > Ah indeed, I now remember that problem that I had to deal with in some
> > extensions [1].  For the record this was initially raised by Tomas in [2], but
> > unfortunately that thread got forgotten and probably all third-party extensions
> > ended up allocating some shmem for their need.
> >
> > > However, I guess prior to this patch those
> > > were anyway not allocated separately because queryid was not passed to
> > > workers. So, I think you are right it shouldn't change the previous
> > > behavior.
> >
> > I think that previously parallel workers always added the required
> > instrumentation if pg_stat_statements is loaded,
> >
> 
> How? AFAICS, in pgss_ExecutorStart(), we allocate the instrumentation
> structure only if queryid is non-zero which prior to this patch
> shouldn't happen.

Doh right, I think I need some sleep :)

So I tried to run that scenario against pg13 and master + the ignore parallel
worker patch:

CREATE TABLE pgss AS SELECT generate_series(1, 10000) id;

ALTER TABLE pgss SET (max_parallel_workers = 10);
SET force_parallel_mode = on;
SET parallel_leader_participation = off;
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;

SELECT version();
SELECT pg_prewarm('pgss');

EXPLAIN (ANALYZE, VERBOSE, BUFFERS, COSTS OFF) SELECT COUNT(*) FROM pgss;
SELECT pg_stat_statements_reset();
SELECT COUNT(*) FROM pgss;
SELECT query, shared_blks_hit FROM pg_stat_statements WHERE query LIKE '%pgss%';

On pg13, I get:

 pg_prewarm
------------
         45
(1 row)

                                                 version                                                 
---------------------------------------------------------------------------------------------------------
 PostgreSQL 13.2@49076fd3ba on x86_64-pc-linux-gnu, compiled by gcc (Gentoo 10.2.0-r5 p6) 10.2.0, 64-bit
(1 row)

                                    QUERY PLAN                                     
-----------------------------------------------------------------------------------
 Gather (actual time=8.124..9.004 rows=1 loops=1)
   Output: (count(*))
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   Buffers: shared hit=45
   ->  Aggregate (actual time=3.649..3.650 rows=1 loops=1)
         Output: count(*)
         Buffers: shared hit=45
         Worker 0:  actual time=3.649..3.650 rows=1 loops=1
           Buffers: shared hit=45
         ->  Seq Scan on public.pgss (actual time=0.057..2.331 rows=10000 loops=1)
               Output: id
               Buffers: shared hit=45
               Worker 0:  actual time=0.057..2.331 rows=10000 loops=1
                 Buffers: shared hit=45
 Planning:
   Buffers: shared hit=15 read=5
 Planning Time: 0.211 ms
 Execution Time: 9.051 ms
(20 rows)

 pg_stat_statements_reset 
--------------------------
 
(1 row)

 count 
-------
 10000
(1 row)

           query           | shared_blks_hit 
---------------------------+-----------------
 SELECT COUNT(*) FROM pgss |              45
(1 row)


On head + patch:

 pg_prewarm
------------
         45
(1 row)

------------------------------------------------------------------------------------------------------>
 PostgreSQL 14devel-O0@45885bee58 on x86_64-pc-linux-gnu, compiled by gcc (Gentoo 10.2.0-r5 p6) 10.2.0,>
(1 row)

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Gather (actual time=39.278..42.912 rows=1 loops=1)
   Output: (count(*))
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   Buffers: shared hit=45
   ->  Aggregate (actual time=20.251..20.254 rows=1 loops=1)
         Output: count(*)
         Buffers: shared hit=45
         Worker 0:  actual time=20.251..20.254 rows=1 loops=1
           Buffers: shared hit=45
         ->  Seq Scan on public.pgss (actual time=0.214..13.746 rows=10000 loops=1)
               Output: id
               Buffers: shared hit=45
               Worker 0:  actual time=0.214..13.746 rows=10000 loops=1
                 Buffers: shared hit=45
 Query Identifier: 5605505304133078562
 Planning:
   Buffers: shared hit=21
 Planning Time: 0.857 ms
 Execution Time: 43.237 ms
(21 rows)

 pg_stat_statements_reset
--------------------------

(1 row)

 count
-------
 10000
(1 row)

           query           | shared_blks_hit
---------------------------+-----------------
 SELECT COUNT(*) FROM pgss |              45
(1 row)

And I get the exact same results if I change the scenario to SET
force_parallel_mode = off.  So it seems that it was working as expected as I
can see the 45 buffers being access, and is still working as expected, although
I'm not exactly sure why right now if queryDesc->totaltime isn't somehow
gathered by the parallel worker instrastructure. (Note that 45885bee58 is my
local commit for "Ignore parallel workers in pg_stat_statements").

Also note that the buffers shown in the planning part are due to the catcache
population, so they're not expected on the 2nd query planning.



pgsql-committers by date:

Previous
From: David Rowley
Date:
Subject: pgsql: Cleanup partition pruning step generation
Next
From: Thomas Munro
Date:
Subject: pgsql: Remove read_page callback from XLogReader.