From: Tom Lane
Subject: Re: Less than ideal error reporting in pg_stat_statements
Msg-id: 32385.1443976035@sss.pgh.pa.us
In response to: Re: Less than ideal error reporting in pg_stat_statements (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
Peter Geoghegan <pg@heroku.com> writes:
> Attached, revised patch deals with the issues around removing the
> query text file when garbage collection encounters trouble. There is
> no reason to be optimistic about any error within gc_qtexts() not
> recurring during a future garbage collection. OOM might be an
> exception, but probably not, since gc_qtexts() is reached when a new
> entry is created with a new query text, which in general makes OOM
> progressively more likely.

Hm.  I'm unconvinced by the aspects of this that involve using
mean_query_len as a filter on which texts will be accepted.  While that's
not necessarily bad in the abstract, there are way too many implementation
artifacts here that will result in random-seeming decisions about whether
to normalize.  For instance:

* mean_query_len only gets recomputed in entry_dealloc(), which is only
run if we exceed pgss_max, and gc_qtexts(), which is only run if we decide
the query texts file is more than 50% bloat.  So there could be quite a
long startup transient before the value gets off its initialization
minimum, and I'm suspicious that there might be plausible use-cases where
it never does.  So it's not so much "restrict to a multiple of the mean
query len" as "restrict to some number that might once upon a time have
had some relation to the mean query len, or maybe not".

* One could expect that after changing mean_query_len, the population of
query texts would change character as a result of the filter behavior
changing, so that convergence to stable behavior over the long haul is
not exactly self-evident.

* As you've got it here, entry_dealloc() and gc_qtexts() don't compute
mean_query_len the same way, because only one of them discriminates
against sticky entries.  So the value would bounce about rather randomly
based on which one had run last.  (A sketch of the two computations
follows this list.)

* I'm not exactly convinced that sticky entries should be ignored for
this purpose anyway.
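
To make the third point concrete, here's a paraphrase of the two
computations as I read the patch (a sketch, not the actual diff;
pgssEntry, pgss_hash, and ASSUMED_LENGTH_INIT are the existing
pg_stat_statements definitions, and "sticky" means calls == 0):

    HASH_SEQ_STATUS hash_seq;
    pgssEntry  *entry;
    Size        total_len;
    int         n_entries;

    /* entry_dealloc() as patched skips sticky entries ... */
    total_len = 0;
    n_entries = 0;
    hash_seq_init(&hash_seq, pgss_hash);
    while ((entry = hash_seq_search(&hash_seq)) != NULL)
    {
        if (entry->counters.calls == 0)
            continue;           /* sticky: excluded from the mean */
        total_len += entry->query_len;
        n_entries++;
    }
    pgss->mean_query_len = (n_entries > 0) ?
        total_len / n_entries : ASSUMED_LENGTH_INIT;

    /* ... whereas gc_qtexts() averages over every entry: */
    total_len = 0;
    n_entries = 0;
    hash_seq_init(&hash_seq, pgss_hash);
    while ((entry = hash_seq_search(&hash_seq)) != NULL)
    {
        total_len += entry->query_len;      /* sticky or not */
        n_entries++;
    }
    pgss->mean_query_len = (n_entries > 0) ?
        total_len / n_entries : ASSUMED_LENGTH_INIT;

On a workload with many large sticky texts, those two averages can
differ by an integer factor, and whichever function ran last wins.
(ASSUMED_LENGTH_INIT, 1024, is the initialization minimum complained
about in the first point.)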


Taking a step back, ISTM the real issue you're fighting here is lots of
orphaned sticky entries, but the patch doesn't do anything directly to fix
that problem.  I wonder if we should do something like making
entry_dealloc() and/or gc_qtexts() aggressively remove sticky entries,
or at least those with "large" texts.
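
To be concrete, I'm imagining something along these lines inside
entry_dealloc(), ahead of the usual usage-sorted victim selection
(sketch only; LARGE_TEXT_FACTOR is a made-up knob, and dynahash does
permit deleting the current element during a seqscan):

    #define LARGE_TEXT_FACTOR   4       /* hypothetical cutoff */

    /*
     * Summarily discard sticky entries (calls == 0) whose texts are much
     * longer than the current mean; orphaned ones are the main source of
     * query-text-file bloat.
     */
    hash_seq_init(&hash_seq, pgss_hash);
    while ((entry = hash_seq_search(&hash_seq)) != NULL)
    {
        if (entry->counters.calls == 0 &&
            entry->query_len > LARGE_TEXT_FACTOR * pgss->mean_query_len)
            hash_search(pgss_hash, &entry->key, HASH_REMOVE, NULL);
    }

That would attack the orphans directly, rather than trying to keep
their texts out of the file after the fact.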

I think the aspects of this patch that are reasonably uncontroversial are
increasing the allowed malloc attempt size in gc_qtexts, flushing the
query text file on malloc failure, fixing the missing cleanup steps after
a gc failure, and making entry_dealloc's recomputation of mean_query_len
sane (which I'll define for the moment as "the same as gc_qtexts would
get").  Since we're hard against a release deadline, I propose to commit
just those changes, and we can consider the idea of a query size filter
and/or redefining mean_query_len at leisure.
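
For the record, the missing cleanup I have in mind amounts to something
like this at the bottom of gc_qtexts() (again a sketch, not a diff;
qbuffer/qfile are the function's existing locals, and I'm assuming a
negative query_len can serve as a "text unavailable" marker):

    gc_fail:
        /* Release whatever we were holding. */
        if (qbuffer)
            free(qbuffer);
        if (qfile)
            FreeFile(qfile);

        /*
         * The file's contents are now uncertain, so mark every entry's
         * text as dropped rather than risk handing back garbage.
         */
        hash_seq_init(&hash_seq, pgss_hash);
        while ((entry = hash_seq_search(&hash_seq)) != NULL)
        {
            entry->query_offset = 0;
            entry->query_len = -1;
        }

        /*
         * Start over with an empty text file, and reset the bookkeeping
         * so need_gc_qtexts() doesn't demand another doomed GC pass
         * right away.  (Locking elided in this sketch.)
         */
        (void) unlink(PGSS_TEXT_FILE);
        pgss->extent = 0;
        pgss->gc_count++;

The same "give up and empty the file" response seems appropriate for
the malloc-failure case as well.
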
        regards, tom lane


