
From Tomas Vondra
Subject Re: idea: log_statement_sample_rate - bottom limit for sampling
Date
Msg-id 20190727221948.irg6sfqh57dynoc7@development
In response to Re: idea: log_statement_sample_rate - bottom limit for sampling  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers
Hi,

I've started reviewing this patch, thinking that maybe I could get it
committed as it's marked as RFC. In general I agree with having this
feature, but I think we need to rethink the GUCs, because the current
approach is just confusing.

The way the current patch works is that we have three GUCs:

  log_min_duration_statement
  log_statement_sample_limit
  log_statement_sample_rate

and it essentially works like this:

- If the duration exceeds log_min_duration_statement, we start sampling
  the commands at log_statement_sample_rate.

- If the duration exceeds log_statement_sample_limit, we just log the
  command every time (i.e. we disable sampling, using sample rate 1.0).

IMO that's bound to be confusing for users, because one threshold
behaves as a minimum while the other behaves as a maximum.


What I think we should do instead is to use two minimum thresholds.

1) log_min_duration_sample - enables sampling of commands, using the
existing GUC log_statement_sample_rate

2) log_min_duration_statement - logs all commands exceeding this
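
Again, only a standalone sketch of what I have in mind (same caveats as
above, and I'm assuming log_min_duration_sample would normally be set
below log_min_duration_statement):

    static bool
    should_log_statement_v2(double duration_ms,
                            double log_min_duration_statement,
                            double log_min_duration_sample,
                            double log_statement_sample_rate,
                            double rand_fraction)  /* uniform in [0, 1) */
    {
        if (duration_ms >= log_min_duration_statement)
            return true;    /* always logged above this threshold */

        if (duration_ms >= log_min_duration_sample)
            return rand_fraction < log_statement_sample_rate;  /* sampled */

        return false;       /* below both thresholds, not logged */
    }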


I think this is going to be much easier for users to understand.


The one difference between those approaches is in how they work with
existing settings. That is, let's say you have

  log_min_duration_statement = 1000
  log_statement_sample_rate = 0.01

then no queries below 1000ms will be logged, and 1% of longer queries
will be sampled. With the original approach (as proposed in v3 of the
patch), this config would still work the same way.

With the new approach (two minimum thresholds) it'd behave differently,
because we'd log *all* queries longer than 1000ms (not just 1%). And
whether we'd sample any queries (using log_statement_sample_rate) would
depend on how we'd pick the default value for the other threshold.
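
For illustration (the 5000ms value is entirely made up, just to have
something concrete), the closest equivalent under the new scheme would
be something like

  log_min_duration_sample = 1000
  log_statement_sample_rate = 0.01
  log_min_duration_statement = 5000

i.e. sample 1% of queries over 1s, and log every query over 5s
unconditionally. So the defaults we pick for the two thresholds matter.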


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 


