Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity - Mailing list pgsql-hackers

From Imseih (AWS), Sami
Subject Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity
Date
Msg-id 545ABFC1-D5E9-4048-9737-2A7AB2C29174@amazon.com
In response to Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity
List pgsql-hackers
>    Can you describe how it's kept in sync, and how it makes sure that the property
>    is maintained over restart / gc?  I don't see any change in the code for the
>    2nd part so I don't see how it could work (and after a quick test it indeed
>    doesn't).

There is a routine called qtext_hash_sync which removes all entries from
the qtext_hash and reloads it with all the query ids from pgss_hash.

This routine is called during:

1. gc_qtexts()
2. entry_reset()
3. entry_dealloc(), although this can be moved to the end of entry_alloc() instead.
4. pgss_shmem_startup()

At all the points where it is called, an exclusive lock is held; this allows syncing all
the query ids present in pgss_hash with qtext_hash.
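
To make that concrete, here is a rough sketch of what such a routine could look like. This is only an illustration based on the description above, not the actual patch: the layout of qtext_hash and its entry type (queryIdEntry below) are assumptions, and the code is assumed to live in pg_stat_statements.c with the exclusive lock on pgss->lock already held by the caller.

/* Sketch only: rebuild the queryid -> query text mapping from pgss_hash. */
typedef struct queryIdEntry
{
	uint64		queryid;		/* hash key: query id taken from pgss_hash */
	Size		query_offset;	/* offset of the text in the external qtext file */
	int			query_len;		/* length of the stored query text */
} queryIdEntry;

static void
qtext_hash_sync(void)
{
	HASH_SEQ_STATUS hash_seq;
	pgssEntry  *entry;
	queryIdEntry *qentry;

	/* Remove every existing entry from qtext_hash ... */
	hash_seq_init(&hash_seq, qtext_hash);
	while ((qentry = hash_seq_search(&hash_seq)) != NULL)
		hash_search(qtext_hash, &qentry->queryid, HASH_REMOVE, NULL);

	/* ... then repopulate it with the query ids currently in pgss_hash. */
	hash_seq_init(&hash_seq, pgss_hash);
	while ((entry = hash_seq_search(&hash_seq)) != NULL)
	{
		bool		found;

		qentry = hash_search(qtext_hash, &entry->key.queryid,
							 HASH_ENTER, &found);
		if (!found)
		{
			qentry->query_offset = entry->query_offset;
			qentry->query_len = entry->query_len;
		}
	}
}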

>    2nd part so I don't see how it could work (and after a quick test it indeed
>    doesn't).

Can you tell me what test you used to determine it is not in sync?


>    Can you share more details on the benchmarks you did?  Did you also run
>    benchmark on workloads that induce entry eviction, with and without need for
>    gc?  Eviction in pgss is already very expensive, and making things worse just
>    to save a bit of disk space doesn't seem like a good compromise.

Sorry, this was poorly explained by me. I went back and did some benchmarks; the script
and the results are attached. Here is a summary.

The benchmark was run on an EC2 r5.2xlarge and does the following:
1. create 10k tables
2. create 5 users
3. run a pgbench script that, per transaction, performs a select on
a randomly chosen table for each of the 5 users.
4. execute 2 variants of the test: one with the default pg_stat_statements.max = 5000
and one with a larger pg_stat_statements.max = 15000.

So 10-15% is not accurate; I originally tested on a less powerful machine. For this
benchmark I see a ~7% increase in TPS (732k vs 683k transactions processed) when the larger
pg_stat_statements.max is used and there are fewer gc/deallocations.
Both tests show a drop in gc/deallocations and a net increase
in TPS. Less GC makes sense since the external file contains fewer duplicate SQL texts.



##################################
## pg_stat_statements.max = 15000
##################################

## with patch

transaction type: /tmp/wl.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
maximum number of tries: 1
duration: 360 s
number of transactions actually processed: 732604
number of failed transactions: 0 (0.000%)
latency average = 9.828 ms
initial connection time = 33.349 ms
tps = 2035.051541 (without initial connection time)

42 garbage collections
3473 deallocations

## no patch

transaction type: /tmp/wl.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
maximum number of tries: 1
duration: 360 s
number of transactions actually processed: 683434
number of failed transactions: 0 (0.000%)
latency average = 10.535 ms
initial connection time = 32.788 ms
tps = 1898.452025 (without initial connection time)

154 garbage collections
3239 deallocations

##################################
## pg_stat_statements.max = 5000
##################################


## with patch

transaction type: /tmp/wl.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
maximum number of tries: 1
duration: 360 s
number of transactions actually processed: 673135
number of failed transactions: 0 (0.000%)
latency average = 10.696 ms
initial connection time = 32.908 ms
tps = 1869.829843 (without initial connection time)

400 garbage collections
12501 deallocations

## no patch

transaction type: /tmp/wl.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
maximum number of tries: 1
duration: 360 s
number of transactions actually processed: 656160
number of failed transactions: 0 (0.000%)
latency average = 10.973 ms
initial connection time = 33.275 ms
tps = 1822.678069 (without initial connection time)

580 garbage collections
12180 deallocations

Thanks

Sami
Amazon Web Services


