Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity - Mailing list pgsql-hackers
From: Imseih (AWS), Sami
Subject: Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity
Date:
Msg-id: 545ABFC1-D5E9-4048-9737-2A7AB2C29174@amazon.com
In response to: Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity (Julien Rouhaud <rjuju123@gmail.com>)
Responses: Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity
List: pgsql-hackers
> Can you describe how it's kept in sync, and how it makes sure that the property
> is maintained over restart / gc? I don't see any change in the code for the
> 2nd part so I don't see how it could work (and after a quick test it indeed
> doesn't).

There is a routine called qtext_hash_sync which removes all entries from qtext_hash and reloads it with all the queryids from pgss_hash. This routine is called during:

1. gc_qtexts()
2. entry_reset()
3. entry_dealloc(), although this can be moved to the end of entry_alloc() instead
4. pgss_shmem_startup()

At every point where it is called, an exclusive lock is held, which allows syncing all the queryids present in pgss_hash with qtext_hash (a rough sketch of such a routine follows below, after the benchmark summary).

> 2nd part so I don't see how it could work (and after a quick test it indeed
> doesn't).

Can you tell me what test you used to determine it is not in sync?

> Can you share more details on the benchmarks you did? Did you also run
> benchmark on workloads that induce entry eviction, with and without need for
> gc? Eviction in pgss is already very expensive, and making things worse just
> to save a bit of disk space doesn't seem like a good compromise.

Sorry, this was poorly explained by me. I went back and did some benchmarks. The script and results are attached; here is a summary.

On an EC2 r5.2xlarge, the benchmark I performed is:

1. Create 10k tables.
2. Create 5 users.
3. Run a pgbench script that, per transaction, performs a select on a randomly chosen table for each of the 5 users (a sketch of the script's shape is shown before the results below).
4. Execute 2 variants of the test: one with the default pg_stat_statements.max = 5000 and one with a larger pg_stat_statements.max = 15000.

So the earlier 10-15% figure is not accurate; I originally tested on a less powerful machine. For this benchmark I see a ~7% increase in TPS (732k vs. 683k transactions) when the larger pg_stat_statements.max is used and there is less gc/deallocation activity. Both tests show a drop in garbage collections and a net increase in TPS. Less gc makes sense, since the external file contains fewer duplicate SQL texts.
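To make the syncing described above concrete, here is a minimal sketch of what such a routine could look like, assuming qtext_hash and pgss_hash are both dynahash tables and qtext_hash is keyed by queryid. This is illustrative only, not the actual patch code; the pgssQueryTextEntry struct and its fields are hypothetical.

/*
 * Hypothetical entry type for qtext_hash: maps a queryid to the location
 * of its text in the external query file, so duplicate texts need not be
 * stored.  (Names and fields are illustrative, not from the patch.)
 */
typedef struct pgssQueryTextEntry
{
	uint64		queryid;		/* hash key */
	Size		query_offset;	/* text offset in external file */
	int			query_len;		/* length of the query text */
} pgssQueryTextEntry;

/*
 * Rebuild qtext_hash from pgss_hash.  Caller must hold the pgss exclusive
 * lock, as is the case at all four call sites listed above.
 */
static void
qtext_hash_sync(void)
{
	HASH_SEQ_STATUS hash_seq;
	pgssEntry  *entry;
	pgssQueryTextEntry *qt;
	bool		found;

	/* Pass 1: remove every existing entry from qtext_hash. */
	hash_seq_init(&hash_seq, qtext_hash);
	while ((qt = (pgssQueryTextEntry *) hash_seq_search(&hash_seq)) != NULL)
		hash_search(qtext_hash, &qt->queryid, HASH_REMOVE, NULL);

	/* Pass 2: reload it with every queryid present in pgss_hash. */
	hash_seq_init(&hash_seq, pgss_hash);
	while ((entry = (pgssEntry *) hash_seq_search(&hash_seq)) != NULL)
	{
		qt = (pgssQueryTextEntry *) hash_search(qtext_hash,
												&entry->key.queryid,
												HASH_ENTER, &found);

		/*
		 * Several pgss entries (same statement run by different users or
		 * databases) can share a queryid; keep the first text reference.
		 */
		if (!found)
		{
			qt->query_offset = entry->query_offset;
			qt->query_len = entry->query_len;
		}
	}
}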
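For reference, the per-transaction shape of the pgbench script is roughly as follows. This is a reconstruction with made-up table/role names (t1..t10000, u1..u5); the attached /tmp/wl.sql is the actual script:

\set uid random(1, 5)
\set tid random(1, 10000)
SET ROLE u:uid;
SELECT count(*) FROM t:tid;
RESET ROLE;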
##################################
## pg_stat_statements.max = 15000
##################################

## with patch

transaction type: /tmp/wl.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
maximum number of tries: 1
duration: 360 s
number of transactions actually processed: 732604
number of failed transactions: 0 (0.000%)
latency average = 9.828 ms
initial connection time = 33.349 ms
tps = 2035.051541 (without initial connection time)

42 gc_qtext calls
3473 deallocations

## no patch

transaction type: /tmp/wl.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
maximum number of tries: 1
duration: 360 s
number of transactions actually processed: 683434
number of failed transactions: 0 (0.000%)
latency average = 10.535 ms
initial connection time = 32.788 ms
tps = 1898.452025 (without initial connection time)

154 garbage collections
3239 deallocations

##################################
## pg_stat_statements.max = 5000
##################################

## with patch

transaction type: /tmp/wl.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
maximum number of tries: 1
duration: 360 s
number of transactions actually processed: 673135
number of failed transactions: 0 (0.000%)
latency average = 10.696 ms
initial connection time = 32.908 ms
tps = 1869.829843 (without initial connection time)

400 garbage collections
12501 deallocations

## no patch

transaction type: /tmp/wl.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
maximum number of tries: 1
duration: 360 s
number of transactions actually processed: 656160
number of failed transactions: 0 (0.000%)
latency average = 10.973 ms
initial connection time = 33.275 ms
tps = 1822.678069 (without initial connection time)

580 garbage collections
12180 deallocations

Thanks

Sami
Amazon Web Services