Thread: log_duration_sample config option patch

log_duration_sample config option patch

From
Timo Savola
Date:
Hello.  The attached patch has made it more feasible for us to gather
profiling data on a production system for analysis with pgFouine.  It
has been written against PostgreSQL 8.3.5 and tested on Linux.  Comments
welcome.

timo
>From a4d5b489f462ad31b62774ec98af08f184cb0754 Mon Sep 17 00:00:00 2001
From: Timo Savola <timo.savola@dynamoid.com>
Date: Thu, 22 Jan 2009 13:02:10 +0200
Subject: [PATCH] log_duration_sample config option

When used with the log_duration or the log_min_duration_statement option,
it specifies the percentage of statements to log using random sampling.
---
 src/backend/tcop/postgres.c  |   30 +++++++++++++++++++++++++++---
 src/backend/utils/misc/guc.c |   12 ++++++++++++
 src/include/utils/guc.h      |    2 ++
 3 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 12ebce4..14f472a 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -22,6 +22,9 @@
 #include <unistd.h>
 #include <signal.h>
 #include <fcntl.h>
+#include <byteswap.h>
+#include <endian.h>
+#include <sys/time.h>
 #include <sys/socket.h>
 #ifdef HAVE_SYS_SELECT_H
 #include <sys/select.h>
@@ -2001,6 +2004,8 @@ check_log_statement(List *stmt_list)
 int
 check_log_duration(char *msec_str, bool was_logged)
 {
+    int retval = 0;
+
     if (log_duration || log_min_duration_statement >= 0)
     {
         long        secs;
@@ -2028,13 +2033,32 @@ check_log_duration(char *msec_str, bool was_logged)
             snprintf(msec_str, 32, "%ld.%03d",
                      secs * 1000 + msecs, usecs % 1000);
             if (exceeded && !was_logged)
-                return 2;
+                retval = 2;
             else
-                return 1;
+                retval = 1;
+        }
+
+        if (retval > 0 && log_duration_sample < 1.0)
+        {
+            static uint64_t seed;
+
+            if (seed == 0)
+            {
+                struct timeval tv;
+
+                gettimeofday(&tv, NULL);
+                seed = (uint64_t) tv.tv_sec * 1000000 + tv.tv_usec;
+#if BYTE_ORDER != LITTLE_ENDIAN
+                bswap_64(seed);
+#endif
+            }
+
+            if (erand48((unsigned short *) &seed) > log_duration_sample)
+                retval = 0;
         }
     }

-    return 0;
+    return retval;
 }

 /*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index b1c6e85..1e76cd5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -225,6 +225,8 @@ int            tcp_keepalives_idle;
 int            tcp_keepalives_interval;
 int            tcp_keepalives_count;

+double        log_duration_sample = 1.0;
+
 /*
  * These variables are all dummies that don't do anything, except in some
  * cases provide the value for SHOW to display.  The real state is elsewhere
@@ -1884,6 +1886,16 @@ static struct config_real ConfigureNamesReal[] =
         0.5, 0.0, 1.0, NULL, NULL
     },

+    {
+        {"log_duration_sample", PGC_SUSET, LOGGING_WHEN,
+            gettext_noop("Sets the sampling frequency (probability) used for "
+                         "logging statement durations."),
+            gettext_noop("1.0 prints all queries.")
+        },
+        &log_duration_sample,
+        1.0, 0.0, 1.0, NULL, NULL
+    },
+
     /* End-of-list marker */
     {
         {NULL, 0, 0, NULL, NULL}, NULL, 0.0, 0.0, 0.0, NULL, NULL
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 49b88f5..fde6a26 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -145,6 +145,8 @@ extern int    tcp_keepalives_idle;
 extern int    tcp_keepalives_interval;
 extern int    tcp_keepalives_count;

+extern double log_duration_sample;
+
 extern void SetConfigOption(const char *name, const char *value,
                 GucContext context, GucSource source);

--
1.5.6.5


log_duration_sample config option patch

From
Timo Savola
Date:
Sorry for the attachment; here's the patch inline.

timo



Re: log_duration_sample config option patch

From
Euler Taveira de Oliveira
Date:
Timo Savola escreveu:
> Hello.  The attached patch has made it more feasible for us to gather
> profiling data on a production system for analysis with pgFouine.  It
> has been written against PostgreSQL 8.3.5 and tested on Linux.  Comments
> welcome.
> 
IIRC pgFouine shows exact percentage of query by type; this new GUC implies we
can not rely on it anymore. Also, you could skip us from logging long (and
bad) running queries. Another statistics data in pgFouine will suffer from the
same problem. I see your point in reducing amount of log data but I think it
will limit the tool usability (you can always start/stop statement collection
at runtime).


--  Euler Taveira de Oliveira http://www.timbira.com/


Re: log_duration_sample config option patch

From
Timo Savola
Date:
Euler Taveira de Oliveira wrote:

> IIRC pgFouine shows exact percentage of query by type; this new GUC implies we
> can not rely on it anymore. Also, you could skip us from logging long (and
> bad) running queries. Another statistics data in pgFouine will suffer from the
> same problem. I see your point in reducing amount of log data but I think it
> will limit the tool usability (you can always start/stop statement collection
> at runtime).

The accuracy of numbers suffers - yes.  But this option basically gives 
you the choice of logging all queries for a short time (leaving out 
queries before and after that time period) or some queries for a long 
time (leaving out randomly selected queries in between).  If you're 
interested in a bigger picture, the numbers will be more meaningful. 
(Also, if the number of queries is high enough for someone to use this 
option, it means that there will be a lot of samples anyway.)

For example, it might be more useful to log 1% of queries during the 
peak hours of each day of the month, than all queries during the peak 
hours of one day.  Or you could log 0.01% of all queries always... 
Especially queries which are executed many times can be found just as 
well with this approach, and chronically slow queries will eventually 
pop up also.  But if someone wants to specifically track any and all 
long queries, then there's the option to not enable log_duration_sample 
but to set log_min_duration_statement to something appropriate instead.

Another point is that during the time of the biggest load (= the most 
interesting time) full logging would increase I/O load and bias the 
results (and degrade the service quality).  By spreading out the load, 
the bias is smaller and the logging can actually be enabled at all 
during peak hours.

Thanks for the feedback.

timo