Thread: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity

[PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity

From
"Imseih (AWS), Sami"
Date:

Adding a plan_id to pg_stat_activity allows users to determine if the plan for a particular statement has changed and if the new plan is performing better or worse.

There are several ways the plan_id in pg_stat_activity can be used:

1. In extensions that expose the plan text. This will allow users to map a plan_id from pg_stat_activity to the plan text.

2. In EXPLAIN output, including auto_explain.


3. In statement logging.


Computing the plan_id can be done using the same routines as for query jumbling, except that plan nodes will be jumbled. This approach was inspired by work done in the pg_stat_plans extension:
https://github.com/2ndQuadrant/pg_stat_plans/
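
To illustrate the idea, here is a minimal self-contained sketch (toy node types and a simple FNV-1a fold; not the actual queryjumble.c code): walk the plan tree and hash only the shape-defining fields of each node, skipping volatile details such as cost estimates.

#include <stdint.h>
#include <stdio.h>

typedef enum { T_SeqScan, T_IndexScan, T_NestLoop } PlanTag;

typedef struct PlanNode
{
    PlanTag          tag;        /* node type */
    uint32_t         relid;      /* relation scanned; 0 for joins */
    struct PlanNode *lefttree;   /* outer child, NULL if none */
    struct PlanNode *righttree;  /* inner child, NULL if none */
} PlanNode;

/* FNV-1a style fold of one 64-bit value into the hash */
static uint64_t
jumble_u64(uint64_t h, uint64_t v)
{
    for (int i = 0; i < 8; i++)
    {
        h ^= (v >> (i * 8)) & 0xff;
        h *= 0x100000001b3ULL;
    }
    return h;
}

/*
 * Fold the shape-defining fields of each node into the hash, ignoring
 * volatile details (costs, row estimates), so the same plan shape
 * always produces the same plan_id.
 */
static uint64_t
jumble_plan(uint64_t h, const PlanNode *node)
{
    if (node == NULL)
        return jumble_u64(h, 0);        /* keep the tree shape significant */
    h = jumble_u64(h, (uint64_t) node->tag + 1);
    h = jumble_u64(h, node->relid);
    h = jumble_plan(h, node->lefttree);
    h = jumble_plan(h, node->righttree);
    return h;
}

int
main(void)
{
    PlanNode scan = {T_IndexScan, 42, NULL, NULL};

    printf("plan_id: %lld\n",
           (long long) jumble_plan(0xcbf29ce484222325ULL, &scan));
    return 0;
}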


Attached is a POC patch that computes the plan_id and presents the top-level plan_id in pg_stat_activity.

The patch still has work left:
- Perhaps moving the plan jumbler outside of queryjumble.c?
- In the POC, the compute_query_id GUC determines if a
  plan_id is to be computed. Should this be a separate GUC?

-- Below is the output of sampling pg_stat_activity
-- with a pgbench workload running. The patch
-- introduces the plan_id column.

select count(*),
   query,
   query_id,
   plan_id
from pg_stat_activity
where state='active'
and plan_id is not null and query_id is not null
group by query, query_id, plan_id
order by 1 desc limit 1;

-[ RECORD 1 ]--------------------------------------------------------------------------------------------------------
count    | 1
query    | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (7, 8, 242150, -1471, CURRENT_TIMESTAMP);
query_id | 4535829358544711074
plan_id  | -4913142083940981109

-- Also, a new view called pg_stat_statements_plan, which
-- includes all the same columns as pg_stat_statements but
-- with statistics shown per plan.

postgres=# select substr(query, 1, 10) as query, queryid, planid, calls from pg_stat_statements_plan where queryid = 4535829358544711074;
-[ RECORD 1 ]-----------------
query   | INSERT INT
queryid | 4535829358544711074
planid  | -4913142083940981109
calls   | 4274428

-- The existing pg_stat_statements view shows stats
-- aggregated at the queryid level. This is current behavior.

postgres=# select substr(query, 1, 10) as query, queryid, calls from pg_stat_statements where queryid = 4535829358544711074;
-[ RECORD 1 ]----------------
query   | INSERT INT
queryid | 4535829358544711074
calls   | 4377142

-- The "%Q" log_line_prefix flag will also include the planid as part of the output;
-- the format will be "query_id/plan_id".

-- An example of using auto_explain with the "%Q" flag in log_line_prefix:

2022-06-14 17:08:10.485 CDT [76955] [4912312221998332774/-2294484545013135901] LOG:  duration: 0.144 ms  plan:
    Query Text: UPDATE pgbench_tellers SET tbalance = tbalance + -1952 WHERE tid = 32;
    Update on public.pgbench_tellers  (cost=0.27..8.29 rows=0 width=0)
      ->  Index Scan using pgbench_tellers_pkey on public.pgbench_tellers  (cost=0.27..8.29 rows=1 width=10)
            Output: (tbalance + '-1952'::integer), ctid
            Index Cond: (pgbench_tellers.tid = 32)

-- The output for EXPLAIN VERBOSE also shows a plan id.

postgres=# explain verbose select 1;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=4)
   Output: 1
 Query Identifier: -2698492627503961632
 Plan Identifier: -7861780579971713347
(4 rows)

Thanks,

Sami Imseih
Amazon Web Services

Attachment

Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity

From
Julien Rouhaud
Date:
Hi,

On Wed, Jun 15, 2022 at 06:45:38PM +0000, Imseih (AWS), Sami wrote:
> Adding a plan_id to pg_stat_activity allows users
> to determine if the plan for a particular statement
> has changed and if the new plan is performing better
> or worse.
> [...]
> Attached is a POC patch that computes the plan_id
> and presents the top-level plan_id in pg_stat_activity.

AFAICS you're proposing to add an identifier for a specific plan, but no way to
know what that plan was?  How are users supposed to use the information if they
know something changed but don't know what changed exactly?

> - In the POC, the compute_query_id GUC determines if a
>   plan_id is to be computed. Should this be a separate GUC?

Probably, as computing it will likely be quite expensive.  Some benchmark on
various workloads would be needed here.

I only had a quick look at the patch, but I see that you have some code to
avoid storing the query text multiple times with different planid.  How does it
work exactly, and does it ensure that the query text is only removed once the
last entry that uses it is removed?  It seems that you identify a specific
query text by queryid, but that seems wrong as collision can (easily?) happen
in different databases.  The real identifier of a query text should be (dbid,
queryid).
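
In C terms, a sketch of the composite key this implies (illustrative only, not code from the patch):

#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative only: queryid alone can collide across databases, so a
 * lookup key for stored query texts needs the database OID as well.
 */
typedef struct QtextKey
{
    uint32_t dbid;     /* database OID */
    uint64_t queryid;  /* query fingerprint; unique only within a database */
} QtextKey;

int
main(void)
{
    /* the same queryid in two databases yields two distinct keys */
    QtextKey a = {16384, 4535829358544711074ULL};
    QtextKey b = {16385, 4535829358544711074ULL};

    printf("distinct: %d\n", a.dbid != b.dbid);
    return 0;
}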

Note that this problem already exists, as the query texts are now stored per
(userid, dbid, queryid, istoplevel).  Maybe this part could be split in a
different commit as it could already be useful without a planid.



Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity

From
Andrey Lepikhov
Date:
On 15/6/2022 21:45, Imseih (AWS), Sami wrote:
> Adding a plan_id to pg_stat_activity allows users
> to determine if the plan for a particular statement
> has changed and if the new plan is performing better
> or worse.
> There are several ways the plan_id in pg_stat_activity
In general, your idea is quite useful.
But, as discussed earlier [1], extensions could implement many useful
things if they could add some custom data into a plan.
Maybe implement your feature with a private list of nodes in the plan
structure instead of a single-purpose plan_id field?

[1] 
https://www.postgresql.org/message-id/flat/e0de3423-4bba-1e69-c55a-f76bf18dbd74%40postgrespro.ru

-- 
regards,
Andrey Lepikhov
Postgres Professional



Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity

From
"Imseih (AWS), Sami"
Date:
>    AFAICS you're proposing to add an identifier for a specific plan, but no way to
>    know what that plan was?  How are users supposed to use the information if they
>    know something changed but don't know what changed exactly?

I see this as a start to do more useful things with plans.

The patch right out of the gate exposes the plan_id in EXPLAIN output
and auto_explain.
This will also be useful for extensions that will provide the plan text.
It is also conceivable that pg_stat_statements could provide an option
to store the plan text?

>    > - In the POC, the compute_query_id GUC determines if a
>    >   plan_id is to be computed. Should this be a separate GUC?

>    Probably, as computing it will likely be quite expensive.  Some benchmark on
>    various workloads would be needed here.

Yes, more benchmarks will be needed here with more complex plans. I have
only benchmarked with pgbench at this point.
However, separating this into its own GUC is what I am leaning towards as well,
and I will update the patch.

>    I only had a quick look at the patch, but I see that you have some code to
>    avoid storing the query text multiple times with different planid.  How does it
>    work exactly, and does it ensure that the query text is only removed once the
>    last entry that uses it is removed?  It seems that you identify a specific
>    query text by queryid, but that seems wrong as collision can (easily?) happen
>    in different databases.  The real identifier of a query text should be (dbid,
>    queryid).

The idea is to look up the offsets and length of the text in the external file by queryid
only. Therefore we can store the same query text for multiple pgss_hash entries
only once.

When a new entry in pgss is not found, the new qtext_hash is consulted to
see if it has information about the offsets/length for the queryid. If found in
qtext_hash, the new pgss_hash entry is created with the offsets found.

If not found in qtext_hash, the query text will be normalized and stored in
the external file. Then, a new entry will be created in qtext_hash and
an entry in pgss_hash.
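
To illustrate, a simplified self-contained sketch of that lookup flow (toy structures and names, assumed for illustration; not the patch's actual code):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct QtextEntry
{
    uint64_t queryid;  /* key: fingerprint of the normalized query text */
    long     offset;   /* location of the text in the external file */
    size_t   length;   /* length of the stored text */
    int      used;
} QtextEntry;

#define QTEXT_SLOTS 128
static QtextEntry qtext_hash[QTEXT_SLOTS];  /* stand-in for a real hash table */

static QtextEntry *
qtext_lookup(uint64_t queryid)
{
    for (int i = 0; i < QTEXT_SLOTS; i++)
        if (qtext_hash[i].used && qtext_hash[i].queryid == queryid)
            return &qtext_hash[i];
    return NULL;
}

/*
 * Called when creating a new pgss entry: reuse the stored text location
 * if this queryid was already written, otherwise append the normalized
 * text to the external file and remember where it went.
 */
static void
register_query_text(uint64_t queryid, const char *norm_text,
                    FILE *qfile, long *file_end)
{
    QtextEntry *e = qtext_lookup(queryid);

    if (e != NULL)
    {
        /* text already on disk: the new pgss entry reuses these offsets */
        printf("reuse: offset=%ld len=%zu\n", e->offset, e->length);
        return;
    }

    size_t len = strlen(norm_text);

    fseek(qfile, *file_end, SEEK_SET);
    fwrite(norm_text, 1, len, qfile);
    for (int i = 0; i < QTEXT_SLOTS; i++)
        if (!qtext_hash[i].used)
        {
            qtext_hash[i] = (QtextEntry){ queryid, *file_end, len, 1 };
            break;
        }
    *file_end += (long) len;
}

int
main(void)
{
    FILE *f = tmpfile();
    long  end = 0;

    register_query_text(1, "select pg_sleep($1)", f, &end);
    register_query_text(1, "select pg_sleep($1)", f, &end); /* reused */
    fclose(f);
    return 0;
}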

Of course we also need to handle the gc_qtexts cleanups, entry_reset, startup
and shutdown scenarios. The patch does this, but I will go back and do more
testing.

>    Note that this problem already exists, as the query texts are now stored per
>    (userid, dbid, queryid, istoplevel).  Maybe this part could be split in a
>    different commit as it could already be useful without a planid.

Good point. I will separate this patch.

Regards, 

Sami Imseih
Amazon Web Services




Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity

From
"Imseih (AWS), Sami"
Date:
>    Good point. I will separate this patch.

I separated the pg_stat_statements patch. The patch
introduces a secondary hash that tracks locations of
a query (by queryid) in the external file. The hash
remains in lockstep with the pgss_hash using a
synchronization routine. For the default
pg_stat_statements.max = 5000, this hash requires 2MB
of additional shared memory.

My testing does not show any regression for workloads
in which statements are not issued by multiple users/databases.

However, it shows good improvement, 10-15%, when there
are similar statements that are issued by multiple
users/databases/tracking levels.

Besides this improvement, this will open up the opportunity
to also track plan_id's as discussed earlier in the thread.

Thanks for the feedback.

Regards, 

Sami Imseih
Amazon Web Services





Attachment

Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity

From
Julien Rouhaud
Date:
Hi,

On Tue, Jun 21, 2022 at 08:04:01PM +0000, Imseih (AWS), Sami wrote:
>
> I separated the pg_stat_statements patch. The patch
> introduces a secondary hash that tracks locations of
> a query (by queryid) in the external file.

I still think that's wrong.

> The hash
> remains in lockstep with the pgss_hash using a
> synchronization routine.

Can you describe how it's kept in sync, and how it makes sure that the property
is maintained over restart / gc?  I don't see any change in the code for the
2nd part so I don't see how it could work (and after a quick test it indeed
doesn't).

I also don't see any change in the heuristics for need_gc_qtexts(); isn't that
going to lead to too frequent gc?

> My testing does not show any regression for workloads
> in which statements are not issued by multiple users/databases.
>
> However, it shows good improvement, 10-15%, when there
> are similar statements that are issued by multiple
> users/databases/tracking levels.

"no regression" and "10-15% improvement" on what?

Can you share more details on the benchmarks you did?  Did you also run
benchmark on workloads that induce entry eviction, with and without need for
gc?  Eviction in pgss is already very expensive, and making things worse just
to save a bit of disk space doesn't seem like a good compromise.



Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity

From
"Imseih (AWS), Sami"
Date:
>    Can you describe how it's kept in sync, and how it makes sure that the property
>    is maintained over restart / gc?  I don't see any change in the code for the
>    2nd part so I don't see how it could work (and after a quick test it indeed
>    doesn't).

There is a routine called qtext_hash_sync which removes all entries from
the qtext_hash and reloads it with all the queryids from pgss_hash.

This routine is called during:

1. gc_qtexts()
2. entry_reset()
3. entry_dealloc(), although this can be moved to the end of entry_alloc() instead.
4. pgss_shmem_startup()

At all the points where it is called, an exclusive lock is held. This allows syncing all
the present queryids in pgss_hash with qtext_hash.
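
To illustrate, a simplified sketch of that rebuild (toy structures, assumed for illustration; not the patch's actual code):

#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES 64

typedef struct { uint64_t queryid; long offset; size_t length; int used; } QtextEntry;
typedef struct { uint64_t queryid; long offset; size_t length; } PgssEntry;

static QtextEntry qtext_hash[MAX_ENTRIES];

/*
 * Runs with the exclusive lock already held: discard the offset map and
 * rebuild it from whatever currently survives in pgss_hash, so both
 * hashes always describe the same set of queryids.
 */
static void
qtext_hash_sync(const PgssEntry *pgss, int n_pgss)
{
    memset(qtext_hash, 0, sizeof(qtext_hash));
    for (int i = 0; i < n_pgss && i < MAX_ENTRIES; i++)
    {
        qtext_hash[i].queryid = pgss[i].queryid;
        qtext_hash[i].offset  = pgss[i].offset;
        qtext_hash[i].length  = pgss[i].length;
        qtext_hash[i].used    = 1;
    }
}

int
main(void)
{
    PgssEntry pgss[2] = {{101, 0, 24}, {202, 24, 31}};

    qtext_hash_sync(pgss, 2);
    return 0;
}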

>    2nd part so I don't see how it could work (and after a quick test it indeed
>    doesn't).

Can you tell me what test you used to determine it is not in sync?


>    Can you share more details on the benchmarks you did?  Did you also run
>    benchmark on workloads that induce entry eviction, with and without need for
>    gc?  Eviction in pgss is already very expensive, and making things worse just
>    to save a bit of disk space doesn't seem like a good compromise.

Sorry, this was poorly explained by me. I went back and did some benchmarks. Attached are
the script and results. But here is a summary:
On an EC2 r5.2xlarge, the benchmark I performed is:
1. create 10k tables
2. create 5 users
3. run a pgbench script that, per transaction, performs a select on
a randomly chosen table for each of the 5 users
4. 2 variants of the test executed: 1 variant with the default pg_stat_statements.max = 5000
and one test with a larger pg_stat_statements.max = 10000.

So 10-15% is not accurate. I originally tested on a less powerful machine. For this
benchmark I see a 6% increase in TPS (732k vs 683k transactions) when a larger
pg_stat_statements.max is used and there are fewer gc/deallocations.
Both tests show a drop in gc/deallocations and a net increase
in tps. Less GC makes sense since the external file has fewer duplicate SQLs.



##################################
## pg_stat_statements.max = 15000
##################################

## with patch

transaction type: /tmp/wl.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
maximum number of tries: 1
duration: 360 s
number of transactions actually processed: 732604
number of failed transactions: 0 (0.000%)
latency average = 9.828 ms
initial connection time = 33.349 ms
tps = 2035.051541 (without initial connection time)

42 garbage collections
3473 deallocations

## no patch

transaction type: /tmp/wl.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
maximum number of tries: 1
duration: 360 s
number of transactions actually processed: 683434
number of failed transactions: 0 (0.000%)
latency average = 10.535 ms
initial connection time = 32.788 ms
tps = 1898.452025 (without initial connection time)

154 garbage collections
3239 deallocations

##################################
## pg_stat_statements.max = 5000
##################################


## with patch

transaction type: /tmp/wl.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
maximum number of tries: 1
duration: 360 s
number of transactions actually processed: 673135
number of failed transactions: 0 (0.000%)
latency average = 10.696 ms
initial connection time = 32.908 ms
tps = 1869.829843 (without initial connection time)

400 garbage collections
12501 deallocations

## no patch

transaction type: /tmp/wl.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
maximum number of tries: 1
duration: 360 s
number of transactions actually processed: 656160
number of failed transactions: 0 (0.000%)
latency average = 10.973 ms
initial connection time = 33.275 ms
tps = 1822.678069 (without initial connection time)

580 garbage collections
12180 deallocations

Thanks

Sami
Amazon Web Services



Attachment

Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity

From
Julien Rouhaud
Date:
On Wed, Jun 22, 2022 at 11:05:54PM +0000, Imseih (AWS), Sami wrote:
> >    Can you describe how it's kept in sync, and how it makes sure that the property
> >    is maintained over restart / gc?  I don't see any change in the code for the
> >    2nd part so I don't see how it could work (and after a quick test it indeed
> >    doesn't).
>
> There is a routine called qtext_hash_sync which removes all entries from
> the qtext_hash and reloads it with all the queryids from pgss_hash.
> [...]
> At all the points where it is called, an exclusive lock is held. This allows syncing all
> the present queryids in pgss_hash with qtext_hash.

So your approach is to let the current gc / file loading behavior happen as
before and construct your mapping hash using the resulting query text / offset
info.  That can't work.

> >    2nd part so I don't see how it could work (and after a quick test it indeed
> >    doesn't).
>
> Can you tell me what test you used to determine it is not in sync?

What test did you use to determine it is in sync?  Have you checked how the gc/
file loading actually work?

In my case I just checked the size of the query text file after running some
script that makes sure that there are the same few queries ran by multiple
different roles, then:

Size of $PGDATA/pg_stat_tmp/pgss_query_texts.stat: 559B
pg_ctl restart
Size of $PGDATA/pg_stat_tmp/pgss_query_texts.stat: 2383B

> >    Can you share more details on the benchmarks you did?  Did you also run
> >    benchmark on workloads that induce entry eviction, with and without need for
> >    gc?  Eviction in pgss is already very expensive, and making things worse just
> >    to save a bit of disk space doesn't seem like a good compromise.
>
> Sorry, this was poorly explained by me. I went back and did some benchmarks. Attached are
> the script and results. But here is a summary:
> On an EC2 r5.2xlarge, the benchmark I performed is:
> 1. create 10k tables
> 2. create 5 users
> 3. run a pgbench script that, per transaction, performs a select on
> a randomly chosen table for each of the 5 users
> 4. 2 variants of the test executed: 1 variant with the default pg_stat_statements.max = 5000
> and one test with a larger pg_stat_statements.max = 10000.

But you wrote:

> ##################################
> ## pg_stat_statements.max = 15000
> ##################################

So which one is it?

>
> So 10-15% is not accurate. I originally tested on a less powerful machine. For this
> benchmark I see a 6% increase in TPS (732k vs 683k transactions) when a larger
> pg_stat_statements.max is used and there are fewer gc/deallocations.
> Both tests show a drop in gc/deallocations and a net increase
> in tps. Less GC makes sense since the external file has fewer duplicate SQLs.

On the other hand you're rebuilding the new query_offset hashtable every time
there's an entry eviction, which seems quite expensive.

Also, as I mentioned, you didn't change any of the heuristics for
need_gc_qtexts().  So if the query texts are indeed deduplicated, doesn't it
mean that gc will artificially be called less often?  The wanted target of
"50% bloat" will become "50% bloat assuming no deduplication is done", and the
average query text file size will stay the same whether the query texts are
deduplicated or not.
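
To make that concrete, here is a rough sketch of that kind of size-based trigger (simplified and assumed; not the actual need_gc_qtexts() code):

#include <stdbool.h>
#include <stddef.h>

/*
 * Collect garbage once the external file is well beyond the size the
 * live entries should need.  If the texts on disk are deduplicated but
 * mean_query_len still reflects per-entry lengths, this threshold is
 * reached later, so gc fires less often for the same workload.
 */
static bool
need_gc(size_t file_extent, size_t mean_query_len, int max_entries)
{
    size_t expected = mean_query_len * (size_t) max_entries;

    return file_extent > 2 * expected;   /* i.e. roughly 50% garbage */
}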

I'm wondering whether the improvements you see are due to the patch or simply due to
artificially calling gc less often.  What are the results if instead of using
vanilla pg_stat_statements you patch it to perform roughly the same number of
gc runs as your version does?

Also, your benchmark workload is very friendly to your feature; what are the
results with other workloads?  Having the results with query texts that aren't
artificially long would be interesting, for instance, after fixing the problems
mentioned previously.

Also, you said that if you run that benchmark with a single user you don't see
any regression.  I don't see how rebuilding an extra hashtable in
entry_dealloc(), i.e. while holding an exclusive lwlock, without saving any
other work elsewhere, could be free.

Looking at the script, you have:
echo "log_min_messages=debug1" >> $PGDATA/postgresql.conf; \

Is that really necessary?  Couldn't you upgrade the gc message to a higher
level for your benchmark needs, or expose some new counter in
pg_stat_statements_info maybe?  Did you run the benchmark using a debug build
or a normal build?



Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity

From
Antonin Houska
Date:
Shouldn't the patch status be set to "Waiting on Author"?

(I was curious if this is a patch that I can review.)

Julien Rouhaud <rjuju123@gmail.com> wrote:

> [...]

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity

From
Julien Rouhaud
Date:
Hi,

On Thu, Jul 14, 2022 at 08:51:24AM +0200, Antonin Houska wrote:
> Shouldn't the patch status be set to "Waiting on Author"?
>
> (I was curious if this is a patch that I can review.)

Ah indeed, I just updated the CF entry!



Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity

From
Jacob Champion
Date:
This entry has been waiting on author input for a while (our current
threshold is roughly two weeks), so I've marked it Returned with
Feedback.

Once you think the patchset is ready for review again, you (or any
interested party) can resurrect the patch entry by visiting

    https://commitfest.postgresql.org/38/3700/

and changing the status to "Needs Review", and then changing the
status again to "Move to next CF". (Don't forget the second step;
hopefully we will have streamlined this in the near future!)

Thanks,
--Jacob