Thread: [PATCH] Enhancements to pg_stat_statements contrib extension

[PATCH] Enhancements to pg_stat_statements contrib extension

From
Yoan SULTAN
Date:
Hi -hackers,

This is my first patch here in the mailing list, so I tried to explain the "why" and the "how" of this enhancement.


Needs :
Formerly, Oracle DBA, I used to query v$sql to know the latest queries issued by each session with their timestamp. I found postgresql pg_stat_statements very useful for this need, but the aggregation did not always permitted me to analyze correctly the queries issued (at least for a buffer stats per query overview). So, I enhanced the existing pg_stat_statements.

Changes overview :
 - new configuration pg_stat_statements.track_every = (TRUE|FALSE)
 -> generating per query data in a new view : pg_stat_sql
 -> can be resetted by using : pg_stat_sql_reset(userid,dbid,queryid)
 - added the timestamp per query to the view (replacing the number of calls of pg_stat_statements)
 -> the view column itself :

userid oid,
dbid oid,
queryid bigint,
query text,
start timestamp, *
total_time float8,
rows int8,
shared_blks_hit int8,     
shared_blks_read int8,     
shared_blks_dirtied int8,     
shared_blks_written int8,     
local_blks_hit int8,     
local_blks_read int8,     
local_blks_dirtied int8,     
local_blks_written int8,     
temp_blks_read int8,     
temp_blks_written int8,     
blk_read_time float8,     
blk_write_time float8,     
wal_records int8,     
wal_fpi int8,     
wal_bytes numeric


 - data are stored in a hash in a new shared memory area. 
 - query texts are still stored in the same file.

The goal was to avoid generating too much data with track_every option enabled.

 - added some tests to the sql/pg_stat_statements.sql
 - added views to pg_stat_statements--1.7--1.8.sql

Bug fix :
 - with UTF8 encoding, the "\0" to delimit the end of the query text was buggy; modified to query[query_len]=0;

Note :
I didn't want to change version number by myself, the attached files are still pointing to 1.8
This is my first code for pgsql.


I wanted to share with you this enhancement, hope you'll find it useful.

--
Regards,
Yoan SULTAN
Attachment

Re: [PATCH] Enhancements to pg_stat_statements contrib extension

From
Julien Rouhaud
Date:
Hi,

On Sun, Mar 28, 2021 at 09:05:10AM +0200, Yoan SULTAN wrote:
> 
> *Needs :*
> Formerly, Oracle DBA, I used to query v$sql to know the latest queries
> issued by each session with their timestamp. I found postgresql
> pg_stat_statements very useful for this need, but the aggregation did not
> always permitted me to analyze correctly the queries issued (at least for a
> buffer stats per query overview). So, I enhanced the existing
> pg_stat_statements.

I'm not an Oracle DBA so I may be lacking some background, but I don't really
understand what this feature is about.  Is it about having the statistics for
the last query execution for each backend, or simply the last N queries
executed even if they're all from the same backend?  More details of what this
feature should do and how it should interact with existing version would be
welcome.

> *Changes overview :*
>  - new configuration pg_stat_statements.track_every = (TRUE|FALSE)
>  -> generating per query data in a new view : pg_stat_sql
>  -> can be resetted by using : pg_stat_sql_reset(userid,dbid,queryid)
>  - added the timestamp per query to the view (replacing the number of calls
> of pg_stat_statements)
>  -> the view column itself :
> 
> *userid oid,*
> *dbid oid,queryid bigint,query text,start timestamp,
> *total_time float8,rows int8,shared_blks_hit int8,
> shared_blks_read int8,     shared_blks_dirtied int8,
> shared_blks_written int8,     local_blks_hit int8,     local_blks_read
> int8,     local_blks_dirtied int8,     local_blks_written int8,
> temp_blks_read int8,     temp_blks_written int8,     blk_read_time
> float8,     blk_write_time float8,     wal_records int8,     wal_fpi
> int8,     wal_bytes numeric*
> 
> 
>  - data are stored in a hash in a new shared memory area.
>  - query texts are still stored in the same file.*

If this is about storing the last query (or N queries), shouldn't it be stored
on some kind of ring buffer rather than a hash table?  I also don't understand
what is the EVERY_FACTOR supposed to do, given that pgss_every_max seems to be
used to request more memory but then never used to actually store additional
entries in the hash table.

> *Bug fix :*
>  - with UTF8 encoding, the "\0" to delimit the end of the query text was
> buggy; modified to query[query_len]=0;

Do you have an example on how to reproduce that?

Also, if there's a bug it should be fixed outside of a new feature proposal.

> *Note :*
> I didn't want to change version number by myself, the attached files are
> still pointing to 1.8
> This is my first code for pgsql.

I tried to have a look at the patch but it's unfortunately a bit hard to do
with the current format.  You should instead submit a diff (e.g. with "git
diff") or a patch (git format-patch) to ease review, ideally with a version
number.  If you're looking for pointers you can look at
https://wiki.postgresql.org/wiki/Submitting_a_Patch, and for larger
documentation maybe
https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F and
https://wiki.postgresql.org/wiki/Developer_FAQ.