Thread: RE: minimizing pg_stat_statements performance overhead

RE: minimizing pg_stat_statements performance overhead

From
Raymond Martin
Date:
Hi Fabien,
Thank you for your time. Apologies for not being more specific about my testing methodology.

> > PGSS not loaded: 0.18ms
>
> This means 0.0018 ms latency per transaction, which seems rather fast, on my laptop I have typically 0.0XX ms...

This actually means 0.18 milliseconds. I agree that this is a bit high, so I instead created an Ubuntu VM to get
resultsthat would align with yours.  

> I could not reproduce these results on my ubuntu laptop. Could you be more precise about the test? Did you use
pgbench?Did it run in parallel? What options were used? What is the test script? 

I did not use pgbench. It is important to call pg_stat_statements_reset before every query. This simulates a user that
isperforming distinct and non-repeated queries on their database. If you use prepared statements or the same set of
querieseach time, you would remove the contention on the pgss query text file.  
I re-tested this on an Ubuntu machine with 4cores and 14GB ram. I did not run it in parallel. I used a python script
thatimplements the follow logic:  
    - select pg_stat_statements_reset() -- this is important because we are making pgss treat the 'select 1' like a new
querywhich it has not cached into pgss_hash.  
    - time 'select 1'
Repeat 100 times for each configuration.

Here are my Ubuntu results:
  pgss unloaded
  Mean: 0.076
  Standard Deviation: 0.038

  pgss.track=none
  Mean: 0.099
  Standard Deviation: 0.040

  pgss.track=top
  Mean: 0.098
  Standard Deviation: 0.107

  pgss.track=none + patch
  Mean: 0.078
  Standard Deviation: 0.042

The results are less noticeable, but I still see about a 20% performance improvement here.

> There I have an impact of 10% in these ideal testing conditions wrt latency where the DB does basically nothing, thus
whichwould not warrant to disable pg_stat_statements given the great service this extension brings to performance
analysis.

I agree that pg_stat_statements should not be disabled based on these performance results.

> Note that this does not mean that the patch should not be applied, it looks like an oversight, but really I do not
havethe performance degradation you are suggesting. 

I appreciate your input and I want to come up with a canonical test that makes this contention more obvious.
Unfortunately, it is difficult because the criteria that causes this slow down (large query sizes and distinct
non-repeatedqueries) are difficult to reproduce with pgbench. I would be open to any suggestions here.  

So even though the performance gains in this specific scenario are not as great, do you still think it would make sense
tosubmit a patch like this?  

--
Raymond Martin
ramarti@microsoft.com
Azure Database for PostgreSQL



RE: minimizing pg_stat_statements performance overhead

From
legrand legrand
Date:
my test case:

drop table a;
create table a ();

DO
$$
DECLARE
i int;
BEGIN
for i in 1..20
loop
execute 'alter table a add column a'||i::text||' int';
end loop;
END
$$;

select  pg_stat_statements_reset();
set pg_stat_statements.track='none';

DO
$$
DECLARE
i int;
j int;
BEGIN
for j in 1..20
loop
for i in 1..20
loop
execute 'select a'||i::text||',a'||j::text||' from a where 1=2';
end loop;
end loop;
END
$$;




--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



RE: minimizing pg_stat_statements performance overhead

From
Fabien COELHO
Date:
Hello Raymond,

>> Note that this does not mean that the patch should not be applied, it 
>> looks like an oversight, but really I do not have the performance 
>> degradation you are suggesting.
>
> I appreciate your input and I want to come up with a canonical test that 
> makes this contention more obvious. Unfortunately, it is difficult 
> because the criteria that causes this slow down (large query sizes and 
> distinct non-repeated queries) are difficult to reproduce with pgbench. 
> I would be open to any suggestions here.
>
> So even though the performance gains in this specific scenario are not 
> as great, do you still think it would make sense to submit a patch like 
> this?

Sure, it definitely makes sense to reduce the overhead when the extension 
is disabled. I wanted to understand the source of performance issue, and 
your explanations where not enough for reproducing it.

-- 
Fabien.



RE: minimizing pg_stat_statements performance overhead

From
Raymond Martin
Date:
Hi Fabien,

> Sure, it definitely makes sense to reduce the overhead when the extension is disabled. I wanted to understand the
sourceof performance issue, and your explanations where not enough for reproducing it. 
Thanks again Fabien. I am attaching the patch to this email in the hope of getting it approved during the next commit
fest. 
I will continue trying to find a simple performance test to exemplify the performance degradation that I have seen with
morecomplex workloads.  

--
Raymond Martin
ramarti@Microsoft.com
Azure Database for PostgreSQL


Attachment

RE: minimizing pg_stat_statements performance overhead

From
legrand legrand
Date:
Hi,

it seems that your patch is not readable.
If you want it to be included in a commitfest, you should add it by yourself
in https://commitfest.postgresql.org/

Not sure that there is any room left in pg12 commitfest.

Regard
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: minimizing pg_stat_statements performance overhead

From
Christoph Berg
Date:
Re: Raymond Martin 2019-04-01 <BN8PR21MB121708579A3782866DF1F745B1550@BN8PR21MB1217.namprd21.prod.outlook.com>
> Thanks again Fabien. I am attaching the patch to this email in the hope of getting it approved during the next commit
fest.
 

Raymond,

you sent the patch as UTF-16, could you re-send it as plain ascii?

Christoph



Re: minimizing pg_stat_statements performance overhead

From
Robert Haas
Date:
On Tue, Apr 2, 2019 at 5:37 AM Christoph Berg <myon@debian.org> wrote:
> Re: Raymond Martin 2019-04-01 <BN8PR21MB121708579A3782866DF1F745B1550@BN8PR21MB1217.namprd21.prod.outlook.com>
> > Thanks again Fabien. I am attaching the patch to this email in the hope of getting it approved during the next
commitfest.
 
>
> you sent the patch as UTF-16, could you re-send it as plain ascii?

One thing that needs some thought here is what happens if the value of
pgss_enabled() changes.  For example we don't want a situation where
if the value changes from off to on between one stage of processing
and another, the server crashes.

I don't know whether that's a risk here or not; it's just something to
think about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: minimizing pg_stat_statements performance overhead

From
legrand legrand
Date:
Robert Haas wrote
> On Tue, Apr 2, 2019 at 5:37 AM Christoph Berg <

> myon@

> > wrote:
>> Re: Raymond Martin 2019-04-01 <

> BN8PR21MB121708579A3782866DF1F745B1550@.outlook

> >
>> > Thanks again Fabien. I am attaching the patch to this email in the hope
>> of getting it approved during the next commit fest.
>>
>> you sent the patch as UTF-16, could you re-send it as plain ascii?
> 
> One thing that needs some thought here is what happens if the value of
> pgss_enabled() changes.  For example we don't want a situation where
> if the value changes from off to on between one stage of processing
> and another, the server crashes.
> 
> I don't know whether that's a risk here or not; it's just something to
> think about.
> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

Hi, here is a simple test where I commented that line
in pgss_post_parse_analyze
   to force return; (as if pgss_enabled() was disabled)
but kept pgss_enabled() every where else 

    /* Safety check... */
//    if (!pgss || !pgss_hash || !pgss_enabled())
        return;

This works without crash as you can see here after:


postgres=# select pg_stat_statements_reset();
 pg_stat_statements_reset
--------------------------

(1 row)


postgres=# show pg_stat_statements.track;
 pg_stat_statements.track
--------------------------
 all
(1 row)


postgres=# create table a(id int);
CREATE TABLE


postgres=# select * from a where id=1;
 id
----
(0 rows)


postgres=# select queryid,query,calls from pg_stat_statements;
       queryid       |             query             | calls
---------------------+-------------------------------+-------
 1033669194118974675 | show pg_stat_statements.track |     1
 3022461129400094363 | create table a(id int)        |     1
(2 rows)

regards
PAscal





--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



RE: minimizing pg_stat_statements performance overhead

From
Raymond Martin
Date:
Hi Christoph,

> you sent the patch as UTF-16, could you re-send it as plain ascii?

Apologies. I re-attached the plain ascii version.

--
Raymond Martin
ramarti@microsoft.com
Azure Database for PostgreSQL


Attachment

RE: minimizing pg_stat_statements performance overhead

From
Raymond Martin
Date:
From: Robert Haas
>>
>> One thing that needs some thought here is what happens if the value of
>> pgss_enabled() changes.  For example we don't want a situation where
>> if the value changes from off to on between one stage of processing
>> and another, the server crashes.
>>
>> I don't know whether that's a risk here or not; it's just something to
>> think about.
This is definitely an important consideration for this change. A hook could
have the implicit assumption that a query ID is always generated.

From: PAscal
> Hi, here is a simple test where I commented that line in pgss_post_parse_analyze
>  to force return; (as if pgss_enabled() was disabled) but kept pgss_enabled() every where else
>
>    /* Safety check... */
> //    if (!pgss || !pgss_hash || !pgss_enabled())
>        return;
>
> This works without crash as you can see here after:

In theory, the rest of the hooks look solid.
As mentioned above, I think the major concern would be a hook that depends
on a variable generated in pgss_post_parse_analyze. Two hooks
(pgss_ExecutorStart, pgss_ExecutorEnd) depend on the query ID
generated from pgss_post_parse_analyze. Fortunately, both of these
functions already check for query ID before doing work.

I really appreciate you putting this change into practice.
Great to see your results align with mine.
Thanks Pascal!!!

--
Raymond Martin
ramarti@microsoft.com
Azure Database for PostgreSQL



Re: minimizing pg_stat_statements performance overhead

From
Raymond Martin
Date:
Hi, 
Apologies, but I had already created a commit here: https://commitfest.postgresql.org/23/2080/ . 
Any preference on which to keep?

Thanks, 
Raymond Martin 
ramarti@microsoft.com

RE: minimizing pg_stat_statements performance overhead

From
Fabien COELHO
Date:
Hello Raymond,

>> Sure, it definitely makes sense to reduce the overhead when the extension is disabled. I wanted to understand the
sourceof performance issue, and your explanations where not enough for reproducing it.
 
> Thanks again Fabien. I am attaching the patch to this email in the hope of getting it approved during the next commit
fest.
> I will continue trying to find a simple performance test to exemplify the performance degradation that I have seen
withmore complex workloads.
 

Patch applies and compiles cleanly. Global and local make check ok.

The patch adds an early exit in one of the hook when pgss is not enabled 
on a given query. This seems to be a long time oversight of some earlier 
additions which only had some (small or large depending) performance 
impact.

About the comment "...and" -> "... and" (add a space)

Otherwise all is well.

-- 
Fabien.



Re: minimizing pg_stat_statements performance overhead

From
Jeff Davis
Date:
On Wed, 2019-04-03 at 23:20 +0000, Raymond Martin wrote:
> Hi Christoph, 
> 
> > you sent the patch as UTF-16, could you re-send it as plain ascii?
> 
> Apologies. I re-attached the plain ascii version. 

Committed. Thanks!

Regards,
    Jeff Davis