Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans (v02) - Mailing list pgsql-hackers

From Julian Markwort
Subject Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans (v02)
Msg-id permail-201803021707328218e1ae00007aa7-j_mark05@message-id.uni-muenster.de
In response to Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans (v02)  (Andres Freund <andres@anarazel.de>)
Responses Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans (v02)  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Andres Freund wrote on 2018-03-01:
> I think the patch probably doesn't apply anymore, due to other changes
> to pg_stat_statements since its posting. Could you refresh?

pgss_plans_v02.patch applies cleanly to master; there have been no changes to pg_stat_statements since the copyright updates at the beginning of January.
(pgss_plans_v02.patch is attached to message 1bd396a9-4573-55ad-7ce8-fe7adffa1bd9@uni-muenster.de and can also be found in the current commitfest.)

> I've not done any sort of review. Scrolling through I noticed //
> comments which aren't pg coding style.

I'll fix that along with any other problems that might be found in a review.


> I'd like to see a small benchmark showing the overhead of the feature.
> Both in runtime and storage size.

I've tried to gather some meaningful results. However, either my testing methodology was flawed (the variance between all my pgbench passes was rather high), or the takeaway is that the feature adds only a little overhead.
This is what I ran on my workstation (a Ryzen 1700, 16GB of RAM, and an old Samsung 840 Evo as the boot drive, which also held the database):
The database used for the tests was dropped and pgbench was initialized anew for each test (pgss off, pgss on, pgss on with plan collection), using a scale factor of 16437704*0.003~=50 (roughly what the Phoronix Test Suite uses for a buffer test).
Also similar to the Phoronix Test Suite, I used 8 jobs and 32 client connections for a normal multithreaded load.

I then ran 10 passes, each lasting 60 seconds with a 30-second pause between them, as well as another test that ran for 10 minutes.
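For anyone trying to reproduce this, the procedure above can be sketched as a small driver script. This is a minimal sketch, not the script actually used: the database name `bench` is hypothetical, running the 10-minute test right after the ten short passes is an assumption, and switching pg_stat_statements on or off (which needs a server restart) is left out. It sticks to standard `dropdb`, `createdb` and `pgbench` options (`-i -s` to initialize, `-c`, `-j`, `-T` for timed runs) and only prints the commands unless `dry_run` is turned off.

```python
import shlex
import subprocess
import time

DB_NAME = "bench"  # hypothetical database name, not given in the message
SCALE = 50         # pgbench scale factor used for the tests
CLIENTS = 32       # -c: concurrent client connections
THREADS = 8        # -j: pgbench worker threads ("jobs")


def reset_commands(db=DB_NAME, scale=SCALE):
    """Drop and recreate the database, then initialize the pgbench tables."""
    return [
        ["dropdb", "--if-exists", db],
        ["createdb", db],
        ["pgbench", "-i", "-s", str(scale), db],
    ]


def run_command(seconds, db=DB_NAME, clients=CLIENTS, threads=THREADS):
    """A single timed pgbench run using the built-in TPC-B-like script."""
    return ["pgbench", "-c", str(clients), "-j", str(threads), "-T", str(seconds), db]


def schedule(passes=10, seconds=60, pause=30, long_run=600):
    """(command, pause after it) pairs: reset, short passes with pauses, one long run."""
    steps = [(cmd, 0) for cmd in reset_commands()]
    steps += [(run_command(seconds), pause) for _ in range(passes)]
    steps.append((run_command(long_run), 0))
    return steps


def main(dry_run=True):
    for cmd, pause in schedule():
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
            time.sleep(pause)


if __name__ == "__main__":
    main()
```

It would be run once per configuration (pgss off, pgss on, pgss on with plan collection), after restarting the server with the matching settings.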

With pg_stat_statements on, the latter test (10 minutes) resulted in 1833 tps, while the patched version resulted in 1700 tps, so a little over 7% overhead? Well, the "control run" without pg_stat_statements delivered only 1806 tps, so the variance seems to be quite high.
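The arithmetic behind these percentages, using the three throughput figures from the 10-minute runs, is spelled out below. By the same measure, the unmodified pg_stat_statements comes out about 1.5% faster than the control run, which suggests the run-to-run noise is of the same order as the effect being measured.

```python
# Throughput (tps) of the 10-minute runs reported above.
CONTROL_TPS = 1806  # pg_stat_statements not loaded
PGSS_TPS = 1833     # unmodified pg_stat_statements
PLANS_TPS = 1700    # pg_stat_statements with plan collection


def overhead(baseline_tps, candidate_tps):
    """Relative throughput loss of candidate vs. baseline (negative means faster)."""
    return (baseline_tps - candidate_tps) / baseline_tps


print(f"plans vs. pgss:    {overhead(PGSS_TPS, PLANS_TPS):.1%}")     # 7.3%
print(f"plans vs. control: {overhead(CONTROL_TPS, PLANS_TPS):.1%}")  # 5.9%
print(f"pgss vs. control:  {overhead(CONTROL_TPS, PGSS_TPS):.1%}")   # -1.5%
```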

The results of the ten successive tests, each running for 60 seconds followed by a 30-second pause, are shown in the attached plot.
I've been tinkering with different pgbench settings for quite some time now, and all I can come up with are runs with high variance between them.

If anybody has any recommendations for a setup that generates less variance, I'll try this again.
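One way to judge whether a difference between configurations exceeds the run-to-run noise is to compare the per-pass results rather than single runs. The sketch below is a crude rule of thumb (the gap between mean tps compared against a Welch-style combined standard error), not a formal significance test; the function names and the factor `k=2` are illustrative choices.

```python
import statistics


def summarize(tps_runs):
    """Mean, sample standard deviation and coefficient of variation of per-pass tps."""
    mean = statistics.mean(tps_runs)
    stdev = statistics.stdev(tps_runs)  # requires at least two passes
    return mean, stdev, stdev / mean


def differs_beyond_noise(a_runs, b_runs, k=2.0):
    """True if the gap between the mean tps of a and b exceeds k combined standard errors."""
    mean_a, sd_a, _ = summarize(a_runs)
    mean_b, sd_b, _ = summarize(b_runs)
    std_err = (sd_a ** 2 / len(a_runs) + sd_b ** 2 / len(b_runs)) ** 0.5
    return abs(mean_a - mean_b) > k * std_err
```

Fed the ten per-pass tps values from each configuration, a coefficient of variation that is large relative to the few-percent effect under test would indicate that more or longer passes are needed before drawing conclusions.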

Finally, the more interesting metric for this patch is the size of the pg_stat_statements.stat file, which stores all the metrics while the database is shut down. I reckon that the size of pgss_query_texts.stat (which holds only the query strings and plan strings while the database is running) will be similar; however, it might fluctuate more, since new strings are simply appended to the file until the garbage collector decides that it has to be cleaned up.
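For context on when that clean-up happens, the trigger can be modeled roughly as below. This is a simplified sketch of the two thresholds that `need_gc_qtexts()` in `contrib/pg_stat_statements` is assumed here to apply (the file exceeds 512 bytes per possible entry, and exceeds about twice the mean text length times `pg_stat_statements.max`); the exact conditions should be checked against the source, and how the patch's plan strings feed into the mean length is not modeled.

```python
def need_gc_qtexts(extent_bytes, mean_query_len, pgss_max):
    """Rough model of whether the query-text file is due for garbage collection.

    extent_bytes:   current size of pgss_query_texts.stat (new texts are appended)
    mean_query_len: mean length of the texts still referenced by live entries
    pgss_max:       pg_stat_statements.max, the maximum number of tracked entries
    """
    # Assumed threshold 1: skip while the file is under 512 bytes per possible entry.
    if extent_bytes < 512 * pgss_max:
        return False
    # Assumed threshold 2: skip unless the file is about twice its "live" size (~50% bloat).
    if extent_bytes < mean_query_len * pgss_max * 2:
        return False
    return True
```

With the default `pg_stat_statements.max` of 5000, for example, this model collects nothing before the file reaches 2,560,000 bytes, and longer stored texts raise the second threshold further.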
After running the aforementioned tests, the file was 8566 bytes for pgss in its unmodified form, versus 32607 bytes for the pgss that also collects plans. This seems reasonable, as plan strings are usually longer than the statements they result from. In the worst case, pg_stat_statements.stat holds two plans for each type of statement.
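To make the size comparison repeatable, here is a minimal sketch that reads both files' sizes from a data directory and computes the growth. The paths are assumptions about the PostgreSQL 10/11 layout (`pg_stat/` for the shutdown dump, `pg_stat_tmp/` for the query texts); the dump is only expected to exist while the server is stopped, so a missing file is reported as `None`.

```python
import os

# Assumed locations relative to the data directory (PostgreSQL 10/11 layout):
# the dump written at shutdown, and the query-text file used while running.
DUMP_FILE = os.path.join("pg_stat", "pg_stat_statements.stat")
TEXT_FILE = os.path.join("pg_stat_tmp", "pgss_query_texts.stat")


def pgss_file_sizes(data_dir):
    """Map each file's relative path to its size in bytes, or None if it is absent."""
    sizes = {}
    for rel in (DUMP_FILE, TEXT_FILE):
        path = os.path.join(data_dir, rel)
        sizes[rel] = os.path.getsize(path) if os.path.isfile(path) else None
    return sizes


def growth(before_bytes, after_bytes):
    """Size ratio and absolute increase between two measurements."""
    return after_bytes / before_bytes, after_bytes - before_bytes


# Sizes of pg_stat_statements.stat reported above (unmodified vs. plan-collecting).
ratio, extra = growth(8566, 32607)
print(f"{ratio:.2f}x larger, {extra} extra bytes")  # prints "3.81x larger, 24041 extra bytes"
```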
I haven't tested the file size with other plan formats, such as JSON, YAML, or XML, but I don't expect hugely different results.

Greetings
Julian

Attachment
