Thread: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity
Adding a plan_id to pg_stat_activity allows users
to determine whether the plan for a particular statement
has changed, and whether the new plan is performing
better or worse.
There are several ways the plan_id in pg_stat_activity
can be used:
1. In extensions that expose the plan text.
This will allow users to map a plan_id
from pg_stat_activity to the plan text.
2. In EXPLAIN output, including auto_explain.
3. In statement logging.
Computing the plan_id can be done using the same
routines used for query jumbling, except that plan
nodes are jumbled instead of query nodes. This
approach was inspired by
work done in the extension pg_stat_plans,
https://github.com/2ndQuadrant/pg_stat_plans/
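As a rough illustration of the idea, here is a standalone sketch that walks a
plan tree and folds each node's type into a 64-bit hash while skipping
volatile details such as costs. The node layout, hash constants, and names
below are assumptions for demonstration only; the actual patch reuses the
jumbling infrastructure.

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

/* Illustrative plan node: just a type tag and the two standard children. */
typedef struct PlanNode
{
    int              tag;        /* node type, e.g. an index scan */
    struct PlanNode *lefttree;   /* outer child */
    struct PlanNode *righttree;  /* inner child */
} PlanNode;

static void
jumble_plan(uint64_t *hash, const PlanNode *node)
{
    if (node == NULL)
        return;
    /* FNV-1a style mixing of the node type; costs and row estimates are ignored */
    *hash = (*hash ^ (uint64_t) node->tag) * UINT64_C(1099511628211);
    jumble_plan(hash, node->lefttree);
    jumble_plan(hash, node->righttree);
}

int
main(void)
{
    /* a toy plan: an Update node over an Index Scan node */
    PlanNode indexscan = {2, NULL, NULL};
    PlanNode update = {1, &indexscan, NULL};
    uint64_t plan_id = UINT64_C(14695981039346656037);  /* FNV offset basis */

    jumble_plan(&plan_id, &update);
    printf("plan_id: %" PRId64 "\n", (int64_t) plan_id);
    return 0;
}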
Attached is a POC patch that computes the plan_id
and presents the top-level plan_id in pg_stat_activity.
The patch still has work left:
- Perhaps move the plan jumbler outside of queryjumble.c?
- In the POC, the compute_query_id GUC determines if a
plan_id is to be computed. Should this be a separate GUC?
-- Below is the output of sampling pg_stat_activity
-- with a pgbench workload running. The patch
-- introduces the plan_id column.
select count(*),
query,
query_id,
plan_id
from pg_stat_activity
where state='active'
and plan_id is not null and query_id is not null
group by query, query_id, plan_id
order by 1 desc limit 1;
-[ RECORD 1 ]--------------------------------------------------------------------------------------------------------
count | 1
query | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (7, 8, 242150, -1471, CURRENT_TIMESTAMP);
query_id | 4535829358544711074
plan_id | -4913142083940981109
-- Also, a new view called pg_stat_statements_plan is added, which
-- includes all the same columns as pg_stat_statements, but
-- with statistics shown per plan.
postgres=# select substr(query, 1, 10) as query, queryid, planid, calls from pg_stat_statements_plan where queryid = 4535829358544711074;
-[ RECORD 1 ]-----------------
query | INSERT INT
queryid | 4535829358544711074
planid | -4913142083940981109
calls | 4274428
-- The existing pg_stat_statements view
-- shows stats aggregated at
-- the queryid level. This is the current behavior.
postgres=# select substr(query, 1, 10) as query, queryid, calls from pg_stat_statements where queryid = 4535829358544711074;
-[ RECORD 1 ]----------------
query | INSERT INT
queryid | 4535829358544711074
calls | 4377142
-- The "%Q" log_line_prefix flag will also include the plan_id as part of the output;
-- the format will be "query_id/plan_id".
-- An example of using auto_explain with the "%Q" flag in log_line_prefix:
2022-06-14 17:08:10.485 CDT [76955] [4912312221998332774/-2294484545013135901] LOG: duration: 0.144 ms plan:
Query Text: UPDATE pgbench_tellers SET tbalance = tbalance + -1952 WHERE tid = 32;
Update on public.pgbench_tellers (cost=0.27..8.29 rows=0 width=0)
-> Index Scan using pgbench_tellers_pkey on public.pgbench_tellers (cost=0.27..8.29 rows=1 width=10)
Output: (tbalance + '-1952'::integer), ctid
Index Cond: (pgbench_tellers.tid = 32)
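-- For reference, log output like the above could be produced with settings along
-- these lines (an assumed, illustrative configuration, not taken from the patch):
shared_preload_libraries = 'auto_explain'
compute_query_id = on
log_line_prefix = '%m [%p] [%Q] '
auto_explain.log_min_duration = 0
auto_explain.log_verbose = on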
-- The output of EXPLAIN VERBOSE also shows a plan id.
postgres=# explain verbose select 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.01 rows=1 width=4)
Output: 1
Query Identifier: -2698492627503961632
Plan Identifier: -7861780579971713347
(4 rows)
Thanks,
Sami Imseih
Amazon Web Services
Hi,

On Wed, Jun 15, 2022 at 06:45:38PM +0000, Imseih (AWS), Sami wrote:
> Adding a plan_id to pg_stat_activity allows users
> to determine whether the plan for a particular statement
> has changed, and whether the new plan is performing
> better or worse.
> [...]
> Attached is a POC patch that computes the plan_id
> and presents the top-level plan_id in pg_stat_activity.

AFAICS you're proposing to add an identifier for a specific plan, but no way to
know what that plan was? How are users supposed to use the information if they
know something changed but don't know what changed exactly?

> - In the POC, the compute_query_id GUC determines if a
> plan_id is to be computed. Should this be a separate GUC?

Probably, as computing it will likely be quite expensive. Some benchmark on
various workloads would be needed here.

I only had a quick look at the patch, but I see that you have some code to
avoid storing the query text multiple times with different planid. How does it
work exactly, and does it ensure that the query text is only removed once the
last entry that uses it is removed? It seems that you identify a specific
query text by queryid, but that seems wrong as collisions can (easily?) happen
in different databases. The real identifier of a query text should be (dbid,
queryid).

Note that this problem already exists, as the query texts are now stored per
(userid, dbid, queryid, istoplevel). Maybe this part could be split into a
different commit as it could already be useful without a planid.
On 15/6/2022 21:45, Imseih (AWS), Sami wrote:
> Adding a plan_id to pg_stat_activity allows users
> to determine whether the plan for a particular statement
> has changed, and whether the new plan is performing
> better or worse.
> There are several ways the plan_id in pg_stat_activity

In general, your idea is quite useful. But, as discussed earlier [1],
extensions could implement many useful things if they were able to add some
custom data into a plan. Maybe implement your feature with some private list
of nodes in the plan structure instead of a single-purpose plan_id field?

[1] https://www.postgresql.org/message-id/flat/e0de3423-4bba-1e69-c55a-f76bf18dbd74%40postgrespro.ru

--
regards,
Andrey Lepikhov
Postgres Professional
Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity
> AFAICS you're proposing to add an identifier for a specific plan, but no way to
> know what that plan was? How are users supposed to use the information if they
> know something changed but don't know what changed exactly?

I see this as a start to do more useful things with plans. The patch right out
of the gate exposes the plan_id in EXPLAIN output and auto_explain. This will
also be useful for extensions that will provide the plan text. It is also
conceivable that pg_stat_statements could provide an option to store the plan
text?

> - In the POC, the compute_query_id GUC determines if a
> plan_id is to be computed. Should this be a separate GUC?

> Probably, as computing it will likely be quite expensive. Some benchmark on
> various workloads would be needed here.

Yes, more benchmarks will be needed here with more complex plans. I have only
benchmarked with pgbench at this point. However, separating this into its own
GUC is what I am leaning towards as well, and I will update the patch.

> I only had a quick look at the patch, but I see that you have some code to
> avoid storing the query text multiple times with different planid. How does it
> work exactly, and does it ensure that the query text is only removed once the
> last entry that uses it is removed? It seems that you identify a specific
> query text by queryid, but that seems wrong as collisions can (easily?) happen
> in different databases. The real identifier of a query text should be (dbid,
> queryid).

The idea is to look up the offsets and length of the text in the external file
by queryid only. Therefore we can store the same query text for multiple
pgss_hash entries only once. When a new entry in pgss is not found, the new
qtext_hash is consulted to see if it has information about the offsets/length
for the queryid. If found in qtext_hash, the new pgss_hash entry is created
with the offsets found. If not found in qtext_hash, the query text is
normalized and stored in the external file; then a new entry is created in
qtext_hash and an entry in pgss_hash. Of course we also need to handle the
gc_qtext cleanups, entry_reset, startup and shutdown scenarios. The patch does
this, but I will go back and do more testing.

> Note that this problem already exists, as the query texts are now stored per
> (userid, dbid, queryid, istoplevel). Maybe this part could be split into a
> different commit as it could already be useful without a planid.

Good point. I will separate this patch.

Regards,

Sami Imseih
Amazon Web Services
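For illustration, a minimal, self-contained sketch of the lookup flow
described above; the structure names, the fixed-size table, and the simulated
file are assumptions for demonstration and do not mirror the patch's actual
shared-memory code.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative only: a secondary map keyed by queryid records where a query
 * text already lives in the external file, so additional entries for the same
 * queryid (different user, database, or plan) can reuse the stored
 * offset/length instead of writing the text again.
 */
typedef struct QtextEntry
{
    uint64_t    queryid;        /* key */
    size_t      offset;         /* location of the text in the external file */
    size_t      length;         /* length of the stored text */
    int         used;
} QtextEntry;

#define QTEXT_SLOTS 64

static QtextEntry qtext_hash[QTEXT_SLOTS];
static size_t file_end = 0;     /* simulated end of the external file */

/* Return existing offsets for a queryid, or "store" the text and remember it. */
static QtextEntry *
qtext_lookup_or_store(uint64_t queryid, const char *normalized_text)
{
    size_t      i = (size_t) (queryid % QTEXT_SLOTS);

    for (int probes = 0; probes < QTEXT_SLOTS; probes++, i = (i + 1) % QTEXT_SLOTS)
    {
        QtextEntry *e = &qtext_hash[i];

        if (e->used && e->queryid == queryid)
            return e;           /* text already stored once: reuse its offsets */
        if (!e->used)
        {
            /* not found: append the text to the file and record its location */
            e->queryid = queryid;
            e->offset = file_end;
            e->length = strlen(normalized_text);
            e->used = 1;
            file_end += e->length + 1;
            return e;
        }
    }
    return NULL;                /* table full; real code would evict or gc */
}

int
main(void)
{
    /* two pgss entries (different users) for the same queryid share one text */
    QtextEntry *a = qtext_lookup_or_store(UINT64_C(4535829358544711074),
                                          "INSERT INTO pgbench_history ...");
    QtextEntry *b = qtext_lookup_or_store(UINT64_C(4535829358544711074),
                                          "INSERT INTO pgbench_history ...");

    printf("same offsets reused: %s\n", (a == b) ? "yes" : "no");
    return 0;
}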
Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity
> Good point. I will separate this patch.

I separated the pg_stat_statements patch. The patch
introduces a secondary hash that tracks the locations of
a query (by queryid) in the external file. The hash
remains in lockstep with the pgss_hash using a
synchronization routine. For the default
pg_stat_statements.max = 5000, this hash requires 2 MB
of additional shared memory.

My testing does not show any regression for workloads
in which statements are not issued by multiple users/databases.

However, it shows a good improvement, 10-15%, when there
are similar statements that are issued by multiple
users/databases/tracking levels.

Besides this improvement, this will open up the opportunity
to also track plan_ids as discussed earlier in the thread.

Thanks for the feedback.

Regards,

Sami Imseih
Amazon Web Services
Hi,

On Tue, Jun 21, 2022 at 08:04:01PM +0000, Imseih (AWS), Sami wrote:
>
> I separated the pg_stat_statements patch. The patch
> introduces a secondary hash that tracks the locations of
> a query (by queryid) in the external file.

I still think that's wrong.

> The hash
> remains in lockstep with the pgss_hash using a
> synchronization routine.

Can you describe how it's kept in sync, and how it makes sure that the property
is maintained over restart / gc? I don't see any change in the code for the
2nd part so I don't see how it could work (and after a quick test it indeed
doesn't).

I also don't see any change in the heuristics for need_gc_qtexts(), isn't that
going to lead to too frequent gc?

> My testing does not show any regression for workloads
> in which statements are not issued by multiple users/databases.
>
> However, it shows a good improvement, 10-15%, when there
> are similar statements that are issued by multiple
> users/databases/tracking levels.

"no regression" and "10-15% improvement" on what? Can you share more details
on the benchmarks you did? Did you also run benchmarks on workloads that induce
entry eviction, with and without need for gc? Eviction in pgss is already very
expensive, and making things worse just to save a bit of disk space doesn't
seem like a good compromise.
Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity
> Can you describe how it's kept in sync, and how it makes sure that the property
> is maintained over restart / gc? I don't see any change in the code for the
> 2nd part so I don't see how it could work (and after a quick test it indeed
> doesn't).

There is a routine called qtext_hash_sync which removes all entries from
the qtext_hash and reloads it with all the queryids from pgss_hash.

This routine is called during:
1. gc_qtexts()
2. entry_reset()
3. entry_dealloc(), although this can be moved to the end of entry_alloc() instead.
4. pgss_shmem_startup()

At all the points where it is called, an exclusive lock is held. This allows
syncing all the queryids present in pgss_hash with qtext_hash.

> 2nd part so I don't see how it could work (and after a quick test it indeed
> doesn't).

Can you tell me what test you used to determine it is not in sync?

> Can you share more details on the benchmarks you did? Did you also run
> benchmarks on workloads that induce entry eviction, with and without need for
> gc? Eviction in pgss is already very expensive, and making things worse just
> to save a bit of disk space doesn't seem like a good compromise.

Sorry, this was poorly explained by me. I went back and did some benchmarks.
Attached are the script and results, but here is a summary.

On an EC2 r5.2xlarge, the benchmark I performed is:
1. create 10k tables
2. create 5 users
3. run a pgbench script that, per transaction, performs a select on
   a randomly chosen table for each of the 5 users
4. 2 variants of the test were executed: one variant with the default
   pg_stat_statements.max = 5000 and one test with a larger
   pg_stat_statements.max = 10000

So 10-15% is not accurate; I originally tested on a less powerful machine. For
this benchmark I see a 6% increase in TPS (732k vs 683k transactions) when a
larger pg_stat_statements.max is used and there are fewer gc/deallocations.
Both tests show a drop in gc/deallocations and a net increase in tps. Less GC
makes sense since the external file has fewer duplicate SQLs.
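For illustration, a minimal, self-contained sketch of the rebuild step
described above: drop everything from the secondary queryid map and repopulate
it from the entries that survive in the primary statistics hash. The types,
names, and fixed-size table are assumptions for demonstration, not the patch's
shared-memory code.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct PgssEntry
{
    uint64_t    queryid;    /* key of a surviving primary hash entry */
    size_t      offset;     /* where its query text lives in the external file */
    size_t      length;
} PgssEntry;

typedef struct QtextSlot
{
    uint64_t    queryid;
    size_t      offset;
    size_t      length;
    int         used;
} QtextSlot;

#define QTEXT_SLOTS 64

static QtextSlot qtext_hash[QTEXT_SLOTS];

/* Rebuild the secondary map from the surviving primary entries. */
static void
qtext_hash_sync(const PgssEntry *entries, size_t nentries)
{
    /* the caller is assumed to hold the exclusive lock protecting both hashes */
    memset(qtext_hash, 0, sizeof(qtext_hash));      /* drop all existing entries */

    for (size_t n = 0; n < nentries; n++)
    {
        size_t  i = (size_t) (entries[n].queryid % QTEXT_SLOTS);

        /* linear probing; duplicate queryids collapse into a single slot */
        while (qtext_hash[i].used && qtext_hash[i].queryid != entries[n].queryid)
            i = (i + 1) % QTEXT_SLOTS;

        qtext_hash[i].queryid = entries[n].queryid;
        qtext_hash[i].offset = entries[n].offset;
        qtext_hash[i].length = entries[n].length;
        qtext_hash[i].used = 1;
    }
}

int
main(void)
{
    /* two surviving entries that share one query text, plus a distinct one */
    PgssEntry surviving[] = {
        {UINT64_C(4535829358544711074), 0, 108},
        {UINT64_C(4535829358544711074), 0, 108},
        {UINT64_C(4912312221998332774), 109, 76},
    };

    qtext_hash_sync(surviving, sizeof(surviving) / sizeof(surviving[0]));
    printf("qtext_hash rebuilt from %zu surviving pgss entries\n",
           sizeof(surviving) / sizeof(surviving[0]));
    return 0;
}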
##################################
## pg_stat_statements.max = 15000
##################################

## with patch
transaction type: /tmp/wl.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
maximum number of tries: 1
duration: 360 s
number of transactions actually processed: 732604
number of failed transactions: 0 (0.000%)
latency average = 9.828 ms
initial connection time = 33.349 ms
tps = 2035.051541 (without initial connection time)
42 gc_qtext calls
3473 deallocations

## no patch
transaction type: /tmp/wl.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
maximum number of tries: 1
duration: 360 s
number of transactions actually processed: 683434
number of failed transactions: 0 (0.000%)
latency average = 10.535 ms
initial connection time = 32.788 ms
tps = 1898.452025 (without initial connection time)
154 garbage collections
3239 deallocations

##################################
## pg_stat_statements.max = 5000
##################################

## with patch
transaction type: /tmp/wl.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
maximum number of tries: 1
duration: 360 s
number of transactions actually processed: 673135
number of failed transactions: 0 (0.000%)
latency average = 10.696 ms
initial connection time = 32.908 ms
tps = 1869.829843 (without initial connection time)
400 garbage collections
12501 deallocations

## no patch
transaction type: /tmp/wl.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
maximum number of tries: 1
duration: 360 s
number of transactions actually processed: 656160
number of failed transactions: 0 (0.000%)
latency average = 10.973 ms
initial connection time = 33.275 ms
tps = 1822.678069 (without initial connection time)
580 garbage collections
12180 deallocations

Thanks

Sami
Amazon Web Services
On Wed, Jun 22, 2022 at 11:05:54PM +0000, Imseih (AWS), Sami wrote:
> > Can you describe how it's kept in sync, and how it makes sure that the property
> > is maintained over restart / gc? I don't see any change in the code for the
> > 2nd part so I don't see how it could work (and after a quick test it indeed
> > doesn't).
>
> There is a routine called qtext_hash_sync which removes all entries from
> the qtext_hash and reloads it with all the queryids from pgss_hash.
> [...]
> At all the points where it is called, an exclusive lock is held. This allows
> syncing all the queryids present in pgss_hash with qtext_hash.

So your approach is to let the current gc / file loading behavior happen as
before and construct your mapping hash using the resulting query text / offset
info. That can't work.

> > 2nd part so I don't see how it could work (and after a quick test it indeed
> > doesn't).
>
> Can you tell me what test you used to determine it is not in sync?

What test did you use to determine it is in sync? Have you checked how the gc /
file loading actually work?

In my case I just checked the size of the query text file after running some
script that makes sure that there are the same few queries run by multiple
different roles, then:

Size of $PGDATA/pg_stat_tmp/pgss_query_texts.stat: 559B
pg_ctl restart
Size of $PGDATA/pg_stat_tmp/pgss_query_texts.stat: 2383B

> > Can you share more details on the benchmarks you did? Did you also run
> > benchmarks on workloads that induce entry eviction, with and without need for
> > gc? Eviction in pgss is already very expensive, and making things worse just
> > to save a bit of disk space doesn't seem like a good compromise.
>
> Sorry, this was poorly explained by me. I went back and did some benchmarks.
> Attached are the script and results, but here is a summary.
> On an EC2 r5.2xlarge, the benchmark I performed is:
> 1. create 10k tables
> 2. create 5 users
> 3. run a pgbench script that, per transaction, performs a select on
>    a randomly chosen table for each of the 5 users
> 4. 2 variants of the test were executed: one variant with the default pg_stat_statements.max = 5000
>    and one test with a larger pg_stat_statements.max = 10000

But you wrote:

> ##################################
> ## pg_stat_statements.max = 15000
> ##################################

So which one is it?

> So 10-15% is not accurate; I originally tested on a less powerful machine. For
> this benchmark I see a 6% increase in TPS (732k vs 683k transactions) when a
> larger pg_stat_statements.max is used and there are fewer gc/deallocations.
> Both tests show a drop in gc/deallocations and a net increase in tps. Less GC
> makes sense since the external file has fewer duplicate SQLs.

On the other hand you're rebuilding the new query_offset hashtable every time
there's an entry eviction, which seems quite expensive.

Also, as I mentioned, you didn't change any of the heuristics for
need_gc_qtexts(). So if the query texts are indeed deduplicated, doesn't it
mean that gc will artificially be called less often? The wanted target of "50%
bloat" will become "50% bloat assuming no deduplication is done", and the
average query text file size will stay the same whether the query texts are
deduplicated or not.

I'm wondering whether the improvements you see are due to the patch or simply
due to artificially calling gc less often? What are the results if, instead of
using vanilla pg_stat_statements, you patch it to perform roughly the same
number of gc as your version does?

Also, your benchmark workload is very friendly to your feature; what are the
results with other workloads? Having the results with query texts that aren't
artificially long would be interesting for instance, after fixing the problems
mentioned previously.

Also, you said that if you run that benchmark with a single user you don't see
any regression. I don't see how rebuilding an extra hashtable in
entry_dealloc(), so while holding an exclusive lwlock, while not saving any
other work elsewhere could be free?

Looking at the script, you have:

echo "log_min_messages=debug1" >> $PGDATA/postgresql.conf; \

Is that really necessary? Couldn't you upgrade the gc message to a higher
level for your benchmark need, or expose some new counter in
pg_stat_statements_info maybe? Have you done the benchmark using a debug build
or a normal build?
Shouldn't the patch status be set to "Waiting on Author"?

(I was curious if this is a patch that I can review.)

Julien Rouhaud <rjuju123@gmail.com> wrote:
> [...]

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Hi,

On Thu, Jul 14, 2022 at 08:51:24AM +0200, Antonin Houska wrote:
> Shouldn't the patch status be set to "Waiting on Author"?
>
> (I was curious if this is a patch that I can review.)

Ah indeed, I just updated the CF entry!
This entry has been waiting on author input for a while (our current threshold
is roughly two weeks), so I've marked it Returned with Feedback.

Once you think the patchset is ready for review again, you (or any interested
party) can resurrect the patch entry by visiting
https://commitfest.postgresql.org/38/3700/ and changing the status to "Needs
Review", and then changing the status again to "Move to next CF". (Don't
forget the second step; hopefully we will have streamlined this in the near
future!)

Thanks,
--Jacob