Re: pg_stat_statements: calls under-estimation propagation - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: pg_stat_statements: calls under-estimation propagation
Date:
Msg-id: CAEYLb_Uic+mfDHokX0UWKSSx-5WcFzNjU5ngBJ-MAHOambL9Qw@mail.gmail.com
In response to: pg_stat_statements: calls under-estimation propagation (Daniel Farina <drfarina@acm.org>)
Responses: Re: pg_stat_statements: calls under-estimation propagation
List: pgsql-hackers
On 28 December 2012 11:43, Daniel Farina <drfarina@acm.org> wrote:
> Without further ado, the cover letter taken from the top of the patch:
>
> This tries to establish a maximum under-estimate of the number of
> calls for a given pg_stat_statements entry. That means the number of
> calls to the canonical form of the query is between 'calls' and 'calls
> + calls_underest'.

Cool. One possible issue I see is with this code:

+
+	/* propagate calls under-estimation bound */
+	entry->counters.calls_underest = pgss->calls_max_underest;
+

which appears within entry_alloc(). If the first execution of the query results in an error during planning or (much more likely) execution, as in the event of an integrity constraint violation, you're going to get a dead sticky entry that no one will ever see. Now, we currently decay sticky entries much more aggressively, so the mere fact that we waste an entry isn't a real problem. That was established by this commit:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d5375491f8e391224b48e4bb449995a4642183ea

However, with this approach, calls_underest values might appear to the user in what could be considered an astonishing order. I'm not suggesting that that's a real problem - just that these may not be the semantics we want, particularly as we could reasonably defer assigning a calls_underest until a sticky entry is "unstuck", and the entry becomes user-visible, within pgss_store().

Also, it seems like you should initialise pgss->calls_max_underest within pgss_shmem_startup(). You should probably serialise the value to disk, and initialise it to 0 if there is no such value to begin with.

I wonder if the way that pg_stat_statements throws its hands up when it comes to crash safety (i.e. the contents of the hash table are completely lost) could be a concern here.
In other words, a program tasked with tracking execution costs over time and graphing time-series data from snapshots has to take responsibility for ensuring that there hasn't been a crash (or, indeed, a reset).

Another issue is that I don't think what you've done here solves the problem of uniquely identifying each entry over time, in the way that simply exposing the hash value would. I'm concerned with a problem related to the one solved here - simply identifying each entry uniquely. As we've already discussed, the query string is an imperfect proxy for an entry, even with constants swapped with '?' characters (and even when combined with the userid and dbid values - it's still not the same as the hashtable key, in a way that is likely to bother people who use pg_stat_statements for long enough).

I think you probably should have created a PG_STAT_STATEMENTS_COLS_V1_1 macro, since that version of the module is now legacy, like *V1_0 is in HEAD.

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services