Re: pg_stat_statements: calls under-estimation propagation - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: pg_stat_statements: calls under-estimation propagation
Date
Msg-id CAEYLb_Uic+mfDHokX0UWKSSx-5WcFzNjU5ngBJ-MAHOambL9Qw@mail.gmail.com
Whole thread Raw
In response to pg_stat_statements: calls under-estimation propagation  (Daniel Farina <drfarina@acm.org>)
Responses Re: pg_stat_statements: calls under-estimation propagation
List pgsql-hackers
On 28 December 2012 11:43, Daniel Farina <drfarina@acm.org> wrote:
> Without further ado, the cover letter taken from the top of the patch:
>
> This tries to establish a maximum under-estimate of the number of
> calls for a given pg_stat_statements entry.  That means the number of
> calls to the canonical form of the query is between 'calls' and 'calls
> + calls_underest'.

Cool.

One possible issue I see with this is that this code:

+
+         /* propagate calls under-estimation bound */
+         entry->counters.calls_underest = pgss->calls_max_underest;
+

which appears within entry_alloc(). So if the first execution of the
query results in an error during planning or (much more likely)
execution, as in the event of an integrity constraint violation,
you're going to get a dead sticky entry that no one will ever see.
Now, we currently decay sticky entries much more aggressively, so the
mere fact that we waste an entry isn't a real problem. This was
established by this commit:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d5375491f8e391224b48e4bb449995a4642183ea

However, with this approach, calls_underest values might appear to the
user in what might be considered an astonishing order. Now, I'm not
suggesting that that's a real problem - just that they may not be the
semantics we want, particularly as we can reasonably defer assigning a
calls_underest until a sticky entry is "unstuck", and an entry becomes
user-visible, within pgss_store().

Also, it seems like you should initialise pgss->calls_max_underest,
within pgss_shmem_startup(). You should probably serialise the value
to disk, and initialise it to 0 if there is no such value to begin
with.

I wonder if the way that pg_stat_statements throws its hands up when
it comes to crash safety (i.e. the contents of the hash table are
completely lost) could be a concern here. In other words, a program
tasked with tracking execution costs over time and graphing
time-series data from snapshots has to take responsibility for
ensuring that there hasn't been a crash (or, indeed, a reset).

Another issue is that I don't think that what you've done here solves
the problem of uniquely identify each entry over time, in the same way
that simply exposing the hash value would. I'm concerned with the
related problem to the problem solved here - simply identifying the
entry uniquely. As we've already discussed, the query string is an
imperfect proxy for each entry, even with constants swapped with '?'
characters (and even when combined with the userid and dbid values -
it's still not the same as the hashtable key, in a way that is likely
to bother people that use pg_stat_statements for long enough).

I think you probably should have created a
PG_STAT_STATEMENTS_COLS_V1_1 macro, since that version of the module
is now legacy, like *V1_0 is in HEAD.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: proposal: ANSI SQL 2011 syntax for named parameters
Next
From: Alvaro Herrera
Date:
Subject: Re: XLByte* usage