
From: Peter Geoghegan
Subject: Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Msg-id: CAEYLb_WuSdWferKgx3WZWbPLLTvpKkH315+nA7UnyP3PtF4gSQ@mail.gmail.com
In response to: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation) (Peter Geoghegan <peter@2ndquadrant.com>)
List: pgsql-hackers
On 28 March 2012 15:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Is there any actual benefit in providing the
> "pg_stat_statements.string_key" GUC?  It looks to me more like something
> that was thrown in because it was easy than because anybody would want
> it.  I'd just as soon leave it out and avoid the incremental API
> complexity increase.  (While on that subject, I see no documentation
> updates in the patch...)

Personally, I don't care for it, and I'm sure most users wouldn't
either, but I thought that someone somewhere might be relying on the
existing behaviour.

I will produce a doc-patch. It would have been premature to do so
until quite recently.

> Also, I'm not terribly happy with the "sticky entries" hack.  The way
> you had it set up with a 1e10 bias for a sticky entry was completely
> unreasonable IMO, because if the later pgss_store call never happens
> (which is quite possible if the statement contains an error detected
> during planning or execution), that entry is basically never going to
> age out, and will just uselessly consume a precious table slot for a
> long time.  In the extreme, somebody could effectively disable query
> tracking by filling the hashtable with variants of "SELECT 1/0".
> The best quick answer I can think of is to reduce USAGE_NON_EXEC_STICK
> to maybe 10 or so, but I wonder whether there's some less klugy way to
> get the result in the first place.  I thought about keeping the
> canonicalized query string around locally in the backend rather than
> having the early pgss_store call at all, but am not sure it's worth
> the complexity of an additional local hashtable or some such to hold
> such pending entries.

I was troubled by that too, and had considered various ways of at
least polishing the kludge. Maybe a better approach would be to start
with a usage of 1e10 (or something rather high, anyway), but to apply
a much more aggressive multiplier than USAGE_DECREASE_FACTOR to sticky
entries only. That way, in earlier calls of entry_dealloc() the sticky
entries, easily identifiable as having 0 calls, are almost impossible
to evict, but after a relatively small number of entry_dealloc() calls
they become far more readily evictable.
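
To make that concrete, here is a rough sketch of the decay step in
entry_dealloc(). STICKY_DECREASE_FACTOR and its value of 0.50 are
invented for illustration; the surrounding scan loop and
USAGE_DECREASE_FACTOR (0.99) are as in the module today:

/*
 * Sketch only: decay sticky entries (recognisable by zero calls)
 * much faster than "real" entries during entry_dealloc()'s scan.
 */
#define STICKY_DECREASE_FACTOR	(0.50)		/* invented value */

	hash_seq_init(&hash_seq, pgss_hash);
	while ((entry = hash_seq_search(&hash_seq)) != NULL)
	{
		entries[i++] = entry;
		if (entry->counters.calls == 0)
			entry->counters.usage *= STICKY_DECREASE_FACTOR;
		else
			entry->counters.usage *= USAGE_DECREASE_FACTOR;
	}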

This seems better than simply giving sticky entries a much lower
initial usage that is only a few times the value of USAGE_INIT.

Let's suppose we set sticky entries to have a usage value of 10. If
all other queries have more than 10 calls, which is not unlikely
(under the current usage accounting, 1.0 of usage corresponds to 1
call, at least until entry_dealloc() dampens usage), then when
entry_dealloc() runs, the sticky entry might as well have a usage of
1: it sits at the bottom of the sort order either way, and it has no
way of increasing its usage short of becoming a "real" entry.
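
To spell out why: eviction order is decided purely by raw usage. The
comparator that entry_dealloc() sorts with is essentially the
following (quoted from memory, so treat it as a sketch rather than the
exact code):

static int
entry_cmp(const void *lhs, const void *rhs)
{
	double		l_usage = (*(pgssEntry *const *) lhs)->counters.usage;
	double		r_usage = (*(pgssEntry *const *) rhs)->counters.usage;

	if (l_usage < r_usage)
		return -1;
	else if (l_usage > r_usage)
		return +1;
	else
		return 0;
}

A sticky entry stuck at usage 10 therefore sorts below every entry
with more than 10 calls, no matter how stale those entries are.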

On the other hand, with the multiplier trick, how close a sticky
entry is to eviction is, importantly, influenced far more strongly by
the number of entry_dealloc() calls, which in turn reflects churn in
the system, than by how the magic sticky usage value happens to
compare with the usage values of some random set of "real" entries on
some random database. If entries really are precious, the sticky
entry is freed much sooner. If not, why not allow the sticky entry to
stick around pending its execution and promotion to a "real" entry?

It would probably be pretty inexpensive to maintain the largest usage
value currently in the hash table, computed at entry_dealloc() time;
that would likely be a far more suitable starting value for sticky
entries than 1e10, and might even work well. We could perhaps cut
that value in half at every entry_dealloc().
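
Roughly, and with the caveat that pgss->max_usage is an invented field
on the shared pgssSharedState struct:

/*
 * Sketch only: track the largest usage seen during entry_dealloc()'s
 * scan, halving the previous high-water mark so a stale spike cannot
 * dominate indefinitely, and use the result as the starting usage for
 * new sticky entries.
 */
	double		max_usage = 0.0;

	hash_seq_init(&hash_seq, pgss_hash);
	while ((entry = hash_seq_search(&hash_seq)) != NULL)
	{
		entries[i++] = entry;
		max_usage = Max(max_usage, entry->counters.usage);
		/* ... usual usage decay here ... */
	}
	pgss->max_usage = Max(pgss->max_usage / 2.0, max_usage);

	/* later, in pgss_store(), when creating a sticky entry: */
	entry->counters.usage = pgss->max_usage;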

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

