Less than ideal error reporting in pg_stat_statements

From
Jim Nasby
Date:
A client was getting some hard-to-diagnose out-of-memory errors. What
made this especially confusing was that there was no context reported at
all, other than the (enormous) statement that triggered the error.

At first I thought the lack of context indicated a palloc had failed
during ereport() (since we apparently just toss the previous error when
that happens), but it turns out there's some error reporting in
pg_stat_statements that's less than ideal. Attached patch fixes, though
I'm not sure if %lld is portable or not.

I'll also argue that this is a bug and should be backpatched, but I'm
not hell-bent on that.

At the same time I looked for other messages that don't explicitly
reference pg_stat_statements; the only others are in
pg_stat_statements_internal() complaining about being called in an
inappropriate function context. Presumably at that point there's a
reasonable error context stack so I didn't bother with them.

This still seems a bit fragile to me though. Anyone working in here has
to notice that almost every errmsg mentions pg_stat_statements and decide
there's a good reason for that. ISTM it'd be better to push a new
ErrorContextCallback onto the stack any time we enter the module. If
folks think that's a good idea I'll pursue it as a separate patch.
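
For reference, the usual pattern is something like this (sketch only;
the callback name is invented):

    static void
    pgss_error_callback(void *arg)
    {
        errcontext("pg_stat_statements");
    }

    ...
    ErrorContextCallback errcallback;

    errcallback.callback = pgss_error_callback;
    errcallback.arg = NULL;
    errcallback.previous = error_context_stack;
    error_context_stack = &errcallback;
    /* ... do the module's work ... */
    error_context_stack = errcallback.previous;
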
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Attachment

Re: Less than ideal error reporting in pg_stat_statements

From
David Rowley
Date:
On 23 September 2015 at 10:16, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
 
> Attached patch fixes, though I'm not sure if %lld is portable or not.


I think you could probably use INT64_FORMAT, and cast the stat.st_size to int64 too.

There's an example in FileRead() in fd.c
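
i.e. something like this (untested):

    errdetail("file \"%s\": size " INT64_FORMAT,
              PGSS_TEXT_FILE, (int64) stat.st_size)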

Regards

David Rowley

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
 

Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Tue, Sep 22, 2015 at 3:16 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> At first I thought the lack of context indicated a palloc had failed during
> ereport() (since we apparently just toss the previous error when that
> happens), but it turns out there's some error reporting in
> pg_stat_statements that's less than ideal. Attached patch fixes, though I'm
> not sure if %lld is portable or not.

+ ereport(LOG,
+              (errcode(ERRCODE_OUT_OF_MEMORY),
+               errmsg("out of memory attempting to pg_stat_statement file"),
+               errdetail("file \"%s\": size %lld", PGSS_TEXT_FILE,
stat.st_size)));

Uh, what?

I'm not opposed to this basic idea, but I think the message should be
reworded, and that the presence of two separate ereport() call sites
like the above is totally unnecessary. The existing MaxAllocSize check
is just defensive; no user-visible distinction needs to be made.

-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Jim Nasby
Date:
On 9/22/15 5:58 PM, Peter Geoghegan wrote:
> On Tue, Sep 22, 2015 at 3:16 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> At first I thought the lack of context indicated a palloc had failed during
>> ereport() (since we apparently just toss the previous error when that
>> happens), but it turns out there's some error reporting in
>> pg_stat_statements that's less than ideal. Attached patch fixes, though I'm
>> not sure if %lld is portable or not.
>
> + ereport(LOG,
> +              (errcode(ERRCODE_OUT_OF_MEMORY),
> +               errmsg("out of memory attempting to pg_stat_statement file"),
> +               errdetail("file \"%s\": size %lld", PGSS_TEXT_FILE,
> stat.st_size)));
>
> Uh, what?

Oops. I'll fix that and address David's concern tomorrow.

> I'm not opposed to this basic idea, but I think the message should be
> reworded, and that the presence of two separate ereport() call sites
> like the above is totally unnecessary. The existing MaxAllocSize check
> is just defensive; no user-visible distinction needs to be made.

I disagree. If you're running this on a 200+GB machine with plenty of 
free memory and get that error you're going to be scratching your head. 
I can see an argument for using the OOM SQLSTATE, but treating an 
artificial limit the same as a system error seems pretty bogus.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Less than ideal error reporting in pg_stat_statements

From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 23 September 2015 at 10:16, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>>> Attached patch fixes, though I'm not sure if %lld is portable or not.

It is not.

> I think you could probably use INT64_FORMAT,

Not in a message you expect to be translatable.

There are ways around that, but TBH I do not think that including the file
size in the errdetail is valuable enough to be worth the trouble.  I'd
just leave it out.  "insufficient memory to load statement file" seems
quite enough.
        regards, tom lane



Re: Less than ideal error reporting in pg_stat_statements

From
Tom Lane
Date:
Peter Geoghegan <pg@heroku.com> writes:
> I'm not opposed to this basic idea, but I think the message should be
> reworded, and that the presence of two separate ereport() call sites
> like the above is totally unnecessary. The existing MaxAllocSize check
> is just defensive; no user-visible distinction needs to be made.

I wonder whether the real problem here is failure to truncate statement
texts to something sane.  Do we really need to record the whole text of
multi-megabyte statements?  Especially if doing so could render the entire
feature nonfunctional?
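
If we went that way, a clamp at store time would be enough -- sketch
only, with an invented limit:

    if (query_len > PGSS_MAX_TEXT_LEN)
        query_len = pg_mbcliplen(query, query_len, PGSS_MAX_TEXT_LEN);
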
        regards, tom lane



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Tue, Sep 22, 2015 at 4:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wonder whether the real problem here is failure to truncate statement
> texts to something sane.  Do we really need to record the whole text of
> multi-megabyte statements?  Especially if doing so could render the entire
> feature nonfunctional?

I recently encountered a 9.4 customer database that had an insanely
large query text stored by pg_stat_statements, apparently created as
part of a process of kicking the tires of their new installation. I
don't know how large it actually was, but it caused psql to stall for
over 10 seconds. Insane queries happen, so truncating query text could
conceal the extent of how unreasonable a query is.

I think that the real problem here is that garbage collection needs to
deal with OOM more appropriately. That's the only way there could be a
problem with an in-flight query as opposed to a query that looks at
pg_stat_statements, which seems to be Nasby's complaint.

My guess is that this very large query involved a very large number of
constants, possibly contained inside an " IN ( )". Slight variants of
the same query, which a human would probably consider to be equivalent,
have caused artificial pressure on garbage collection.

-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Tue, Sep 22, 2015 at 5:01 PM, Peter Geoghegan <pg@heroku.com> wrote:
> My guess is that this very large query involved a very large number of
> constants, possibly contained inside an " IN ( )". Slight variants of
> the same query, which a human would probably consider to be equivalent,
> have caused artificial pressure on garbage collection.

I could write a patch to do compaction in-place. The basic idea is
that there'd be a slow path in the event of an OOM-like condition
(i.e. an actual OOM, or when the MaxAllocSize limitation is violated)
that first scans through entries, and determines the exact required
buffer size for every non-garbage query text. As this
iteration/scanning occurs, the entries' offsets in shared memory are
rewritten assuming that the first entry starts at 0, the second at 0 +
length of first + 1 (for NUL sentinel byte), and so on. We then
allocate a minimal buffer, lseek() and copy into the buffer, so that
the expectation of finding query texts at those offsets is actually
met. Finally, unlink() old file, create new one, and write new buffer
out. I think I wanted to do things that way originally.
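
In rough code, the sizing pass might look like this (sketch only;
field names follow pg_stat_statements.c, locking and error handling
elided):

    HASH_SEQ_STATUS hash_seq;
    pgssEntry  *entry;
    Size        required = 0;

    /* Pass 1: compute the exact buffer size for live query texts */
    hash_seq_init(&hash_seq, pgss_hash);
    while ((entry = hash_seq_search(&hash_seq)) != NULL)
        required += entry->query_len + 1;   /* +1 for NUL sentinel */

    /*
     * Pass 2 (not shown): malloc(required); for each entry, lseek()
     * to its old offset, copy the text to the next free spot in the
     * buffer, and only then overwrite entry->query_offset with its
     * new offset.  Finally, unlink() the old file and write the
     * buffer out as the new one.
     */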

If even that exact, minimal buffer size cannot be allocated, then ISTM
that the user is out of luck. That will be very rare in practice, but
should it occur we log the issue and give up on storing query texts
entirely, so as to avoid thrashing while still giving the user
something to go on. This new code path is never hit until a garbage
collection is required, so hopefully the garbage created was not a
pathological issue with a weird workload, but rather something that
will not recur for a very long time.

That seems to me to be better than getting into the business of
deciding how long of a query text is too long.

I'm doubtful that this had anything to do with MaxAllocSize. You'd
certainly need a lot of bloat to be affected by that in any way. I
wonder how high pg_stat_statements.max was set to on this system, and
how long each query text was on average.

-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Alvaro Herrera
Date:
Peter Geoghegan wrote:

> My guess is that this very large query involved a very large number of
> constants, possibly contained inside an " IN ( )". Slight variants of
> the same query, which a human would probably consider to be equivalent,
> have caused artificial pressure on garbage collection.

So if I have multiple queries like

SELECT foo FROM bar WHERE baz IN (a, b)
SELECT foo FROM bar WHERE baz IN (a, b, c)

they are not normalized down to the same?  That seems odd.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Tue, Sep 22, 2015 at 6:55 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> So if I have multiple queries like
>
> SELECT foo FROM bar WHERE baz IN (a, b)
> SELECT foo FROM bar WHERE baz IN (a, b, c)
>
> they are not normalized down to the same?  That seems odd.

Yes, although in practice it's usually down to a variable number of
constants appearing within the "IN ( )", which is more odd IMV.

We discussed changing this before. I don't have strong feelings either way.

-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Jim Nasby
Date:
On 9/22/15 8:01 PM, Peter Geoghegan wrote:
> I'm doubtful that this had anything to do with MaxAllocSize. You'd
> certainly need a lot of bloat to be affected by that in any way. I
> wonder how high pg_stat_statements.max was set to on this system, and
> how long each query text was on average.

max was set to 10000. I don't know about average query text size, but 
the command that was causing the error was a very large number of 
individual INSERT ... VALUES statements all in one command.

The machine had plenty of free memory and no ulimit, so I don't see how 
this could have been anything but MaxAllocSize, unless there's some 
other failure mode in malloc I don't know about.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Less than ideal error reporting in pg_stat_statements

From
Jim Nasby
Date:
On 9/22/15 6:27 PM, Jim Nasby wrote:
>> + ereport(LOG,
>> +              (errcode(ERRCODE_OUT_OF_MEMORY),
>> +               errmsg("out of memory attempting to pg_stat_statement
>> file"),
>> +               errdetail("file \"%s\": size %lld", PGSS_TEXT_FILE,
>> stat.st_size)));
>>
>> Uh, what?
>
> Oops. I'll fix that and address David's concern tomorrow.

New patch attached. I stripped the size reporting out and simplified the
conditionals a bit as well.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Attachment

Re: Less than ideal error reporting in pg_stat_statements

From
Marti Raudsepp
Date:
On Wed, Sep 23, 2015 at 3:01 AM, Peter Geoghegan <pg@heroku.com> wrote:
> I think that the real problem here is that garbage collection needs to
> deal with OOM more appropriately.

+1

I've also been seeing lots of log messages saying "LOG:  out of
memory" on a server that's hosting development databases. I put off
debugging this until now because it didn't seem to have any adverse
effects on the system.

The file on my system is currently 5.1GB (!). I don't know how it got
there -- under normal circumstances we don't have any enormous
queries, but perhaps our application bugs during development triggered
that.

The configuration on this system is pg_stat_statements.max = 10000 and
pg_stat_statements.track = all.

----
The comment near gc_qtexts says:
* This won't be called often in the typical case, since it's likely that
* there won't be too much churn, and besides, a similar compaction process
* occurs when serializing to disk at shutdown or as part of resetting.
* Despite this, it seems prudent to plan for the edge case where the file
* becomes unreasonably large, with no other method of compaction likely to
* occur in the foreseeable future.
[...]
* Load the old texts file.  If we fail (out of memory, for instance) just
* skip the garbage collection.

So, as I understand it: if the system runs low on memory for an
extended period, and/or the file grows beyond 1GB (MaxAllocSize), garbage
collection stops entirely, meaning it starts leaking disk space until
a manual intervention.

It's very frustrating when debugging aids cause further problems on a
system. If the in-line compaction doesn't materialize (or it's decided
not to backport it), I would propose instead to add a test to
pgss_store() to avoid growing the file beyond MaxAllocSize (or perhaps
even a lower limit). Surely dropping some statistics is better than
this pathology.
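
Something like this in qtext_store(), say (sketch only; the exact
placement and limit are debatable):

    /* hypothetically: refuse to grow the texts file past the limit */
    if (pgss->extent + query_len + 1 > MaxAllocSize)
        return false;       /* caller gives up on storing this text */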

Regards,
Marti



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Fri, Sep 25, 2015 at 8:51 AM, Marti Raudsepp <marti@juffo.org> wrote:
> I've also been seeing lots of log messages saying "LOG:  out of
> memory" on a server that's hosting development databases. I put off
> debugging this until now because it didn't seem to have any adverse
> effects on the system.
>
> The file on my system is currently 5.1GB (!). I don't know how it got
> there -- under normal circumstances we don't have any enormous
> queries, but perhaps our application bugs during development triggered
> that.

It could be explained by legitimate errors during planning, for
example. The query text is still stored.

> So, as I understand it: if the system runs low on memory for an
> extended period, and/or the file grows beyond 1GB (MaxAllocSize), garbage
> collection stops entirely, meaning it starts leaking disk space until
> a manual intervention.

I don't think that there is much more to discuss here: this is a bug.
I will try and write a patch to fix it shortly. It will be
non-trivial, and I'm quite busy right now, so it might take a while. A
short-term remediation is to call pg_stat_statements_reset() on
systems affected like this.

-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Fri, Sep 25, 2015 at 11:37 AM, Peter Geoghegan <pg@heroku.com> wrote:
>> So, as I understand it: if the system runs low on memory for an
>> extended period, and/or the file grows beyond 1GB (MaxAllocSize), garbage
>> collection stops entirely, meaning it starts leaking disk space until
>> a manual intervention.
>
> I don't think that there is much more to discuss here: this is a bug.
> I will try and write a patch to fix it shortly.

I should add that it only leaks disk space at the rate at which new
queries are observed that are not stored within pg_stat_statements
(due to an error originating in the planner or something -- they
remain "sticky" entries). The reason we've not heard far more problem
reports is that it usually doesn't get out of hand in the first place.

Come to think of it, you'd have to repeatedly have new queries that
are never "unstickied"; if you have substantively the same query as an
error-during-planning "sticky" entry, it will still probably be able
to use that existing entry (it will become "unstickied" by this second
execution of what the fingerprinting logic considers to be the same
query).

In short, you have to have just the right workload to hit the bug.

-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Tue, Sep 22, 2015 at 6:01 PM, Peter Geoghegan <pg@heroku.com> wrote:
> I'm doubtful that this had anything to do with MaxAllocSize. You'd
> certainly need a lot of bloat to be affected by that in any way. I
> wonder how high pg_stat_statements.max was set to on this system, and
> how long each query text was on average.

To clarify: I think it probably starts off not having much to do with
MaxAllocSize. However, it might well be the case that transient memory
pressure results in the problematic code path hitting the MaxAllocSize
limitation. So it starts with malloc() returning NULL, which
temporarily blocks garbage collection, but in bad cases the
MaxAllocSize limitation becomes a permanent barrier to performing a
garbage collection (without a manual intervention).

-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Fri, Sep 25, 2015 at 8:51 AM, Marti Raudsepp <marti@juffo.org> wrote:
> I've also been seeing lots of log messages saying "LOG:  out of
> memory" on a server that's hosting development databases. I put off
> debugging this until now because it didn't seem to have any adverse
> effects on the system.

That's unfortunate.

> It's very frustrating when debugging aids cause further problems on a
> system. If the in-line compaction doesn't materialize (or it's decided
> not to backport it), I would propose instead to add a test to
> pgss_store() to avoid growing the file beyond MaxAllocSize (or perhaps
> even a lower limit). Surely dropping some statistics is better than
> this pathology.

I heard a customer complaint today that seems similar. A Heroku
customer attempted a migration from MySQL to PostgreSQL in this
manner:

mysqldump | psql

This at least worked well enough to cause problems for
pg_stat_statements (some queries were not rejected by the parser, I
suppose).

While I'm opposed to arbitrary limits for tools like
pg_stat_statements, I think the following defensive measure might be
useful on top of better OOM defenses:

Test for query text length within pgss_store() where a pgssJumbleState
is passed by caller (the post-parse-analysis hook caller -- not
executor hook caller that has query costs to store). If it is greater
than, say, 10 * Max(ASSUMED_LENGTH_INIT, pgss->mean_query_len), do
not bother to normalize the query text, or store anything at all.
Simply return.
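
In rough code (sketch only; the multiplier is as arbitrary as it
sounds):

    if (jstate != NULL &&
        query_len > 10 * Max(ASSUMED_LENGTH_INIT, pgss->mean_query_len))
        return;     /* no normalization, no entry */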

Any entry we create at that point will be a sticky entry (pending
actually completing execution without the entry being evicted), and it
doesn't seem worth worrying about normalizing very large query texts,
which tend to be qualitatively similar to utility statements from the
user's perspective anyway. Besides, query text normalization always
occurred on a best-effort basis. It's not very uncommon for
pg_stat_statements to fail to normalize query texts today for obscure
reasons.

This would avoid storing very large query texts again and again when a
very large DML statement repeatedly fails (e.g. due to a data
integration task that can run into duplicate violations) and is
repeatedly rerun with tweaks. Maybe there is a substantively distinct
table in each case, because a CREATE TABLE is performed as part of the
same transaction, so each failed query gets a new pg_stat_statements
entry, and a new, large query text.

I should probably also assume that sticky entries have a generic
length (existing pgss->mean_query_len) for the purposes of
accumulating query text length within entry_dealloc(). They should not
get to contribute to median query length (which throttles garbage
collection to prevent thrashing).
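
Within entry_dealloc()'s scan, that would be roughly (sketch;
variable names illustrative):

    if (entry->counters.calls == 0)     /* sticky */
        totlen += pgss->mean_query_len; /* assume a generic length */
    else
        totlen += entry->query_len + 1;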

Anyone have an opinion on that?

-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Tue, Sep 22, 2015 at 6:01 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Tue, Sep 22, 2015 at 5:01 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> My guess is that this very large query involved a very large number of
>> constants, possibly contained inside an " IN ( )". Slight variants of
>> the same query, which a human would probably consider to be equivalent,
>> have caused artificial pressure on garbage collection.
>
> I could write a patch to do compaction in-place.

In the end, I decided on a simpler approach to fixing this general
sort of problem with the attached patch. See commit message for
details.

I went this way not because compaction in-place was necessarily a bad
idea, but because I think that a minimal approach will work just as
well in real world cases.

It would be nice to get this committed before the next point releases
are tagged, since I've now heard a handful of complaints like this.

--
Peter Geoghegan

Attachment

Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Fri, Oct 2, 2015 at 2:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
> It would be nice to get this committed before the next point releases
> are tagged, since I've now heard a handful of complaints like this.

I grep'd for SIZE_MAX to make sure that it was something available on
all supported platforms, since it's C99. What I originally took for a
use in our code turns out to actually be a code-like comment within
aset.c.

I think that SIZE_MAX should be replaced by MaxAllocHugeSize before
the patch is committed. That should be perfectly portable.
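
That is, the check would presumably read:

    /* Allocate buffer; beware that off_t might be wider than size_t */
    if (stat.st_size <= MaxAllocHugeSize)
        buf = (char *) malloc(stat.st_size);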

-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Tom Lane
Date:
Peter Geoghegan <pg@heroku.com> writes:
> It would be nice to get this committed before the next point releases
> are tagged, since I've now heard a handful of complaints like this.

I'm not too impressed with this bit:

        /* Allocate buffer; beware that off_t might be wider than size_t */
-       if (stat.st_size <= MaxAllocSize)
+       if (stat.st_size <= SIZE_MAX)
                buf = (char *) malloc(stat.st_size);

because there are no, zero, not one uses of SIZE_MAX in our code today,
and I do not see such a symbol required by the POSIX v2 spec either.
Perhaps this will work, but you're asking us to introduce a brand new
portability hazard just hours before a wrap deadline.  That is not
happening.

Other than that, this seems roughly sane, though I've not read it in
detail or tested it.  Does anyone have an objection to trying to squeeze
in something along this line?
        regards, tom lane



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Fri, Oct 2, 2015 at 2:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm not too impressed with this bit:
>
>         /* Allocate buffer; beware that off_t might be wider than size_t */
> -       if (stat.st_size <= MaxAllocSize)
> +       if (stat.st_size <= SIZE_MAX)
>                 buf = (char *) malloc(stat.st_size);
>
> because there are no, zero, not one uses of SIZE_MAX in our code today,
> and I do not see such a symbol required by the POSIX v2 spec either.

See my remarks just now. This can easily be fixed.


-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Tom Lane
Date:
Peter Geoghegan <pg@heroku.com> writes:
> I think that SIZE_MAX should be replaced by MaxAllocHugeSize before
> the patch is committed. That should be perfectly portable.

Hmm ... only back to 9.4, but I guess that's far enough.
        regards, tom lane



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Fri, Oct 2, 2015 at 2:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <pg@heroku.com> writes:
>> I think that SIZE_MAX should be replaced by MaxAllocHugeSize before
>> the patch is committed. That should be perfectly portable.
>
> Hmm ... only back to 9.4, but I guess that's far enough.

I just realized that the existing gc_fail handler label within
gc_qtexts() lacks something like this, too:

unlink(PGSS_TEXT_FILE);
(void) AllocateFile(PGSS_TEXT_FILE, PG_BINARY_W);
SpinLockAcquire(&s->mutex);
pgss->extent = 0;
SpinLockRelease(&s->mutex);

I think it should do this anyway, but it makes particular sense in
light of the proposed changes. All existing failure cases within
gc_qtexts() seem like a good reason to give up forever.

The spinlock acquisition above is actually necessary despite the
n_writers trick, because that's only used by qtext_store().

-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Fri, Oct 2, 2015 at 3:57 PM, Peter Geoghegan <pg@heroku.com> wrote:
> (void) AllocateFile(PGSS_TEXT_FILE, PG_BINARY_W);

Naturally, a FreeFile() call will also be required for any
successfully allocated file.

-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Fri, Oct 2, 2015 at 3:57 PM, Peter Geoghegan <pg@heroku.com> wrote:
> The spinlock acquisition above is actually necessary despite the
> n_writers trick, because that's only used by qtext_store().

Actually, isn't that another bug? The fact that we don't do the same
from within gc_qtexts() in normal cases, even with an exclusive lock
held? We do this:

/* Reset the shared extent pointer */
pgss->extent = extent;

I saw one really weird case on a customer database, with an enormous
although totally repetitive query text and one entry total (I
mentioned this in passing up-thread). Although I'd be willing to
believe it was just a very odd use of the database, since apparently
they were doing some kind of stress-test, perhaps it could be better
explained by a bug like this.

To recap, for other people: pg_stat_statements_internal() may do this
without any shared lock held:

/* No point in loading file now if there are active writers */
if (n_writers == 0)
    qbuffer = qtext_load_file(&qbuffer_size);

-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Fri, Oct 2, 2015 at 4:11 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Actually, isn't that another bug? The fact that we don't do the same
> from within gc_qtexts() in normal cases, even with an exclusive lock
> held? We do this:

Ah, no. We check pgss->gc_count in any case, so it should be fine.
That will also make it safe to do the unlink() as outlined already,
because a new qtext_load_file() call from
pg_stat_statements_internal() (due to gc_count bump) will allocate the
file again by name.

-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Tom Lane
Date:
... do you want to produce an updated patch?  I'm not planning to look at
it until tomorrow anyway.
        regards, tom lane



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Fri, Oct 2, 2015 at 4:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ... do you want to produce an updated patch?  I'm not planning to look at
> it until tomorrow anyway.

I'll post a new patch by about midday tomorrow west coast time.
Hopefully that works for you.

-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Fri, Oct 2, 2015 at 4:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ... do you want to produce an updated patch?  I'm not planning to look at
> it until tomorrow anyway.

Attached, revised patch deals with the issues around removing the
query text file when garbage collection encounters trouble. There is
no reason to be optimistic about any error within gc_qtexts() not
recurring during a future garbage collection. OOM might be an
exception, but probably not, since gc_qtexts() is reached when a new
entry is created with a new query text, which in general makes OOM
progressively more likely.

Thanks for accommodating me here.

--
Peter Geoghegan

Attachment

Re: Less than ideal error reporting in pg_stat_statements

From
Tom Lane
Date:
Peter Geoghegan <pg@heroku.com> writes:
> Attached, revised patch deals with the issues around removing the
> query text file when garbage collection encounters trouble. There is
> no reason to be optimistic about any error within gc_qtexts() not
> recurring during a future garbage collection. OOM might be an
> exception, but probably not, since gc_qtexts() is reached when a new
> entry is created with a new query text, which in general makes OOM
> progressively more likely.

Hm.  I'm unconvinced by the aspects of this that involve using
mean_query_len as a filter on which texts will be accepted.  While that's
not necessarily bad in the abstract, there are way too many implementation
artifacts here that will result in random-seeming decisions about whether
to normalize.  For instance:

* mean_query_len only gets recomputed in entry_dealloc(), which is only
run if we exceed pgss_max, and gc_qtexts(), which is only run if we decide
the query texts file is more than 50% bloat.  So there could be quite a
long startup transient before the value gets off its initialization
minimum, and I'm suspicious that there might be plausible use-cases where
it never does.  So it's not so much "restrict to a multiple of the mean
query len" as "restrict to some number that might once upon a time have
had some relation to the mean query len, or maybe not".

* One could expect that after changing mean_query_len, the population of
query texts would change character as a result of the filter behavior
changing, so that convergence to stable behavior over the long haul is
not exactly self-evident.

* As you've got it here, entry_dealloc() and gc_qtexts() don't compute
mean_query_len the same way, because only one of them discriminates
against sticky entries.  So the value would bounce about rather randomly
based on which one had run last.

* I'm not exactly convinced that sticky entries should be ignored for
this purpose anyway.


Taking a step back, ISTM the real issue you're fighting here is lots of
orphaned sticky entries, but the patch doesn't do anything directly to fix
that problem.  I wonder if we should do something like making
entry_dealloc() and/or gc_qtexts() aggressively remove sticky entries,
or at least those with "large" texts.
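
(For instance -- purely a sketch, with an invented threshold --
entry_dealloc() could rank large sticky entries as the cheapest
things to evict:

    if (entry->counters.calls == 0 &&
        entry->query_len > 10 * pgss->mean_query_len)
        entry->counters.usage = 0;      /* first out the door */

though I've not thought about the side effects.)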

I think the aspects of this patch that are reasonably uncontroversial are
increasing the allowed malloc attempt size in gc_qtexts, flushing the
query text file on malloc failure, fixing the missing cleanup steps after
a gc failure, and making entry_dealloc's recomputation of mean_query_len
sane (which I'll define for the moment as "the same as gc_qtexts would
get").  Since we're hard against a release deadline, I propose to commit
just those changes, and we can consider the idea of a query size filter
and/or redefining mean_query_len at leisure.
        regards, tom lane



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Sun, Oct 4, 2015 at 9:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hm.  I'm unconvinced by the aspects of this that involve using
> mean_query_len as a filter on which texts will be accepted.  While that's
> not necessarily bad in the abstract, there are way too many implementation
> artifacts here that will result in random-seeming decisions about whether
> to normalize.

There are already plausible race conditions that can make query text
normalization not occur. I think that's much more likely in practice
to cause a failure to normalize than anything proposed here; I've
personally observed such things in the wild a few times already. Also,
note that mean_query_len is not used directly there -- I decided on
Max(ASSUMED_LENGTH_INIT, mean_query_len) instead. mean_query_len is
just for cases with very large query texts.

> * mean_query_len only gets recomputed in entry_dealloc(), which is only
> run if we exceed pgss_max, and gc_qtexts(), which is only run if we decide
> the query texts file is more than 50% bloat.  So there could be quite a
> long startup transient before the value gets off its initialization
> minimum, and I'm suspicious that there might be plausible use-cases where
> it never does.  So it's not so much "restrict to a multiple of the mean
> query len" as "restrict to some number that might once upon a time have
> had some relation to the mean query len, or maybe not".

ASSUMED_LENGTH_INIT * 5 is a pretty conservative lower bound, I'd say.
mean_query_len is only really used for cases where query texts are
much longer on average. So in order for that to be a problem, you'd
have to have what are, in an absolute sense, very large query texts. I
think I noticed no more than a handful of changes in the regression
tests, for example.

> * One could expect that after changing mean_query_len, the population of
> query texts would change character as a result of the filter behavior
> changing, so that convergence to stable behavior over the long haul is
> not exactly self-evident.

FWIW, I think that there is a feedback loop today, and that in problem
cases that was what allowed it to get out of hand.

> * As you've got it here, entry_dealloc() and gc_qtexts() don't compute
> mean_query_len the same way, because only one of them discriminates
> against sticky entries.  So the value would bounce about rather randomly
> based on which one had run last.

entry_dealloc() will naturally run far more frequently than
gc_qtexts(). That said, it would be better if they matched.

> * I'm not exactly convinced that sticky entries should be ignored for
> this purpose anyway.

I think that data integration transactions that fail repeatedly are
strongly implicated here in practice. That's behind the query size
filter thing that you may also take issue with, as well as this.

> Taking a step back, ISTM the real issue you're fighting here is lots of
> orphaned sticky entries, but the patch doesn't do anything directly to fix
> that problem.  I wonder if we should do something like making
> entry_dealloc() and/or gc_qtexts() aggressively remove sticky entries,
> or at least those with "large" texts.

Sticky entries are (almost by definition) always aggressively removed,
and I hesitate to give certain ones a lower usage_count to begin with,
which is the only way of being more directly aggressive that might work
better.

> I think the aspects of this patch that are reasonably uncontroversial are
> increasing the allowed malloc attempt size in gc_qtexts, flushing the
> query text file on malloc failure, fixing the missing cleanup steps after
> a gc failure, and making entry_dealloc's recomputation of mean_query_len
> sane (which I'll define for the moment as "the same as gc_qtexts would
> get").  Since we're hard against a release deadline, I propose to commit
> just those changes, and we can consider the idea of a query size filter
> and/or redefining mean_query_len at leisure.

I'm not clear on what you actually propose to do to "make
entry_dealloc's recomputation of mean_query_len sane", but I think you
are talking about something distinct from what I've proposed based on
your separate remarks about entry_dealloc and the extra discrimination
against sticky entries there (vis-a-vis calculating mean query
length). I can't decide exactly what you mean, though: neither
entry_dealloc nor gc_qtexts care about orphaned query texts in my
patch (or in master). Please clarify.

I'd be quite happy if you did everything listed, and also left the
extra discrimination against sticky entries within entry_dealloc in --
consider what happens when a huge malloc() ends up swapping with an
exclusive lock held, and consider that repeated, failed data
integration transactions are implicated in this in a big way when a
problem appears in the wild. A big part of the problem here was that
garbage collection did not run often enough.

In other words, I'd be fine with *not* doing the query size filter
thing for now, since that is something that seems like an extra
defense and not core to the problem. I was kind of ambivalent about
doing that part myself, actually.

-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Tom Lane
Date:
Peter Geoghegan <pg@heroku.com> writes:
> I'm not clear on what you actually propose to do to "make
> entry_dealloc's recomputation of mean_query_len sane", but I think you
> are talking about something distinct from what I've proposed

Ah, right, sorry.  I meant to make its result match what gc_texts would
get, by not falsely counting entries with dropped texts.  That's not
what you have in your patch but it seems like an easy enough fix.

> I'd be quite happy if you did everything listed, and also left the
> extra discrimination against sticky entries within entry_dealloc in --
> consider what happens when a huge malloc() ends up swapping with an
> exclusive lock held, and consider that repeated, failed data
> integration transactions are implicated in this in a big way when a
> problem appears in the wild. A big part of the problem here was that
> garbage collection did not run often enough.

Hm.  The problem I've got with this is that then mean_query_len means
something significantly different after entry_dealloc than it does
after gc_texts.

I'd be okay with changing *both* of those functions to ignore sticky
entries in the calculation, if that seems reasonable to you.

> In other words, I'd be fine with *not* doing the query size filter
> thing for now, since that is something that seems like an extra
> defense and not core to the problem. I was kind of ambivalent about
> doing that part myself, actually.

Agreed on that part.
        regards, tom lane



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Sun, Oct 4, 2015 at 1:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ah, right, sorry.  I meant to make its result match what gc_texts would
> get, by not falsely counting entries with dropped texts.  That's not
> what you have in your patch but it seems like an easy enough fix.

I'm trying to make mean_query_len representative of *useful* entry
query length. I guess I don't have that within gc_texts in my patch,
but I do have it within entry_dealloc (up to and including considering
dropped texts), which FWIW is far more important.

>> I'd be quite happy if you did everything listed, and also left the
>> extra discrimination against sticky entries within entry_dealloc in --
>> consider what happens when a huge malloc() ends up swapping with an
>> exclusive lock held, and consider that repeated, failed data
>> integration transactions are implicated in this in a big way when a
>> problem appears in the wild. A big part of the problem here was that
>> garbage collection did not run often enough.
>
> Hm.  The problem I've got with this is that then mean_query_len means
> something significantly different after entry_dealloc than it does
> after gc_texts.
>
> I'd be okay with changing *both* of those functions to ignore sticky
> entries in the calculation, if that seems reasonable to you.

That seems perfectly reasonable, yes. Should I leave that to you?

>> In other words, I'd be fine with *not* doing the query size filter
>> thing for now, since that is something that seems like an extra
>> defense and not core to the problem. I was kind of ambivalent about
>> doing that part myself, actually.
>
> Agreed on that part.

We're in full agreement on what needs to happen for the next point
release, then. Excellent.

-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Tom Lane
Date:
Peter Geoghegan <pg@heroku.com> writes:
> On Sun, Oct 4, 2015 at 1:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hm.  The problem I've got with this is that then mean_query_len means
>> something significantly different after entry_dealloc than it does
>> after gc_texts.
>> 
>> I'd be okay with changing *both* of those functions to ignore sticky
>> entries in the calculation, if that seems reasonable to you.

> That seems perfectly reasonable, yes. Should I leave that to you?

Sure, I can take it.
        regards, tom lane



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Sun, Oct 4, 2015 at 1:12 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Sun, Oct 4, 2015 at 1:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Ah, right, sorry.  I meant to make its result match what gc_texts would
>> get, by not falsely counting entries with dropped texts.  That's not
>> what you have in your patch but it seems like an easy enough fix.
>
> I'm trying to make mean_query_len representative of *useful* entry
> query length. I guess I don't have that within gc_texts in my patch,
> but I do have it within entry_dealloc (up to and including considering
> dropped texts), which FWIW is far more important.

To be clear: I wasn't sure why you thought I falsely count entries with
dropped texts within entry_dealloc(). I suppose my sense was that
dropped texts ought to not make garbage collection occur too
frequently, which could also be a problem.

Garbage collection ought to occur when the size of the query text file
becomes excessive relative to useful entries. I was worried about the
thrashing risk from dropped text entries. Maybe we could, as an
alternative, not forget the original size of dropped query texts,
relying only on their offset to indicate the text is invalid. Dropped
query texts would then not be special in that sense, which seems like
a good thing all around.

-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Tom Lane
Date:
Peter Geoghegan <pg@heroku.com> writes:
> On Sun, Oct 4, 2015 at 1:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hm.  The problem I've got with this is that then mean_query_len means
>> something significantly different after entry_dealloc than it does
>> after gc_texts.
>> 
>> I'd be okay with changing *both* of those functions to ignore sticky
>> entries in the calculation, if that seems reasonable to you.

> That seems perfectly reasonable, yes. Should I leave that to you?

After a closer look I decided that wasn't reasonable at all.  Discounting
sticky texts would then mean that after a GC cycle, we might still think
the query texts file is bloated and issue another GC request, which of
course would not shrink the file, so that we'd be doing GC cycles every
time we added a new query.  The mean query len recorded by gc_qtexts()
*has to* match the mean length of what it actually put in the file, not
the mean length of what we might wish it would put in the file.
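
For reference, the gc trigger is approximately

    /* need_gc_qtexts(), paraphrased */
    if (extent < 512 * pgss_max)
        return false;           /* trivially small file */
    if (extent < pgss->mean_query_len * pgss_max * 2)
        return false;           /* less than ~50% bloat */
    return true;

so an artificially low mean_query_len translates directly into extra
GC requests.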

By the same token, I'm back to believing that it's fundamentally bogus for
entry_dealloc() to compute mean_query_len that way.  The most likely
result of that is useless GC cycles.  The only thing that will actually
free memory when you've got a lot of dead sticky entries is getting rid of
the sticky hashtable entries, and the only way to do that is to wait for
entry_dealloc() to get rid of 'em.  You were unenthused about making that
removal more aggressive, which is fine, but you can't have it both ways.

It does strike me that when we do get rid of the sticky entries, cleanup
of the texts file might lag a bit longer than it needs to because
mean_query_len is computed before, not after, deleting whatever entries
we're going to delete.  On average, that shouldn't matter ... but if we
are tossing a bunch of dead sticky entries, maybe they would have a higher
mean length than the rest?  Not sure about it.  I put a comment about
this into entry_dealloc() for the moment, but we can revisit whether it
is worth adding code/cycles to get a more up-to-date mean length.

Anyway, I've committed the aspects of this that we're agreed on.
        regards, tom lane



Re: Less than ideal error reporting in pg_stat_statements

From
Tom Lane
Date:
Peter Geoghegan <pg@heroku.com> writes:
> To be clear: I wasn't sure why you thought I falsely count entries with
> dropped texts within entry_dealloc().

In the existing^H^H^Hprevious code, dropped-text entries would essentially
act as length-zero summands in the average calculation, whereas I think
we agree that they ought to be ignored; otherwise they decrease the
computed mean and thereby increase the probability of (useless) GC cycles.
In the worst case where the hashtable is mostly dropped-text entries,
which would for instance be the prevailing situation shortly after a GC
failure, we'd be calculating ridiculously small mean values and that'd
prompt extra GC cycles, no?
        regards, tom lane



Re: Less than ideal error reporting in pg_stat_statements

From
Andrew Dunstan
Date:

On 10/04/2015 06:16 PM, Tom Lane wrote:
> Peter Geoghegan <pg@heroku.com> writes:
>> To be clear: I wasn't sure why you thought I falsely count entries with
>> dropped texts within entry_dealloc().
> In the existing^H^H^Hprevious code, dropped-text entries would essentially
> act as length-zero summands in the average calculation, whereas I think
> we agree that they ought to be ignored; otherwise they decrease the
> computed mean and thereby increase the probability of (useless) GC cycles.
> In the worst case where the hashtable is mostly dropped-text entries,
> which would for instance be the prevailing situation shortly after a GC
> failure, we'd be calculating ridiculously small mean values and that'd
> prompt extra GC cycles, no?


Sorry, I'm a bit late to this party. Does what you have committed mean 
people are less likely to see "Out of Memory" coming from 
pg_stat_statements? If not, what can be done about them short of a 
restart? And what bad effects follow from an event generating them?

The docs seem to be quite silent on these points.

cheers

andrew



Re: Less than ideal error reporting in pg_stat_statements

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Sorry, I'm a bit late to this party. Does what you have committed mean 
> people are less likely to see "Out of Memory" coming from 
> pg_stat_statements? If not, what can be done about them short of a 
> restart? And what bad effects follow from an event generating them?

The main thing we've done that will alleviate that is increase the size of
the query text file that the garbage-collection routine can cope with, from
MaxAllocSize (1GB) to MaxAllocHugeSize (at least 2GB, lots more on 64-bit
machines, though on 32-bit you probably can't get to 2GB anyway ...).

Also, what will now happen if you do get an out-of-memory is that the code
will discard stored query texts and truncate the file, so that the problem
doesn't recur (at least not till you build up a new set of stored query
texts).  At this point you still have statistics, but they can only be
identified by query ID since the text has been forgotten.  I'm not sure
how useful that situation really is ...

> The docs seem to be quite silent on these points.

The docs probably ought to describe this situation and recommend reducing
pg_stat_statements.max if you want to preserve query texts.
        regards, tom lane



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Sun, Oct 4, 2015 at 3:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> That seems perfectly reasonable, yes. Should I leave that to you?
>
> After a closer look I decided that wasn't reasonable at all.  Discounting
> sticky texts would then mean that after a GC cycle, we might still think
> the query texts file is bloated and issue another GC request, which of
> course would not shrink the file, so that we'd be doing GC cycles every
> time we added a new query.

That seems unlikely to me. Firstly, there is a generic check of 512
bytes per entry within need_gc_qtexts() -- we never GC if that generic
limit is not exceeded, and that's far from tiny. Secondly, how long do
you think those sticky entries will stay around in the hashtable to
continue to cause trouble, and dominate over regular entries? There'd
have to be constant evictions/cache pressure for this situation to
occur, because the threshold within need_gc_qtexts() is based on
pg_stat_statements.max, and yet many evictions aggressively evict
sticky entries very quickly. Seems like a very unusual workload to me.

I think that you're overestimating the cost of discounting sticky
entries, which are usually very much in the minority (at least when it
matters, with cache pressure), and underestimating the cost of
continuing to weigh sticky entries' contribution to mean query text length.

As I said, my proposal to not have sticky entries contribute to
overall mean query text length is based on a problem report involving
a continually failing data integration process. That in particular is
what I hope to stop having a negative impact on mean query length -- a
unique queryid makes for an entry that is bound to remain useless
forever (as opposed to just failing the first few times). With the
larger limit of MaxAllocHugeSize, I worry about swapping with the
exclusive lock held.

The fact that there is a big disparity between mean query length for
sticky and non-sticky entries is weird. It was seen to happen in the
wild only because the sticky entries that clogged things up were not
really distinct to a human -- only to the fingerprinting/jumbling
code, more or less by accident, which in a practical sense caused a
distortion.  That's what I'm targeting. I have a hard time imagining
any harm from discounting sticky entries with a realistic case.

> The mean query len recorded by gc_qtexts()
> *has to* match the mean length of what it actually put in the file, not
> the mean length of what we might wish it would put in the file.

Why? Why not simply care about whether or not the file was
unreasonably large relative to available, useful query statistics? I
think that mean_query_len isn't something that swings all that much
with realistic workloads and 5,000 representative entries -- it
certainly should not swing wildly. The problem to an extent was that
that accidentally stopped being true.

> By the same token, I'm back to believing that it's fundamentally bogus for
> entry_dealloc() to compute mean_query_len that way.  The most likely
> result of that is useless GC cycles.  The only thing that will actually
> free memory when you've got a lot of dead sticky entries is getting rid of
> the sticky hashtable entries, and the only way to do that is to wait for
> entry_dealloc() to get rid of 'em.  You were unenthused about making that
> removal more aggressive, which is fine, but you can't have it both ways.

Meanwhile, mean_query_len grows as more "distinct" entries are
created, pushing out their garbage collection further and further.
Especially when there is a big flood of odd queries that stay sticky.
Once again, I'm having a really hard time imagining a minority of
current, non-sticky hashtable entries dominating a majority of
current, sticky hashtable entries come garbage collection time.
Garbage collection is linked to creation of entries, which is also
linked to eviction, which aggressively evicts sticky entries in this
thrashing scenario that you describe.

> It does strike me that when we do get rid of the sticky entries, cleanup
> of the texts file might lag a bit longer than it needs to because
> mean_query_len is computed before, not after, deleting whatever entries
> we're going to delete.  On average, that shouldn't matter ... but if we
> are tossing a bunch of dead sticky entries, maybe they would have a higher
> mean length than the rest?

Yes, that is something that was observed to happen in the problem case
I looked into. You know my solution already.

-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Sun, Oct 4, 2015 at 3:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <pg@heroku.com> writes:
>> To be clear: I wasn't sure why you thought I falsely count entries with
>> dropped texts within entry_dealloc().
>
> In the existing^H^H^Hprevious code, dropped-text entries would essentially
> act as length-zero summands in the average calculation, whereas I think
> we agree that they ought to be ignored; otherwise they decrease the
> computed mean and thereby increase the probability of (useless) GC cycles.
> In the worst case where the hashtable is mostly dropped-text entries,
> which would for instance be the prevailing situation shortly after a GC
> failure, we'd be calculating ridiculously small mean values and that'd
> prompt extra GC cycles, no?

Yes, but my patch changed that, too. I suggested that first.

-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Wed, Sep 23, 2015 at 1:41 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> max was set to 10000. I don't know about average query text size, but the
> command that was causing the error was a very large number of individual
> INSERT ... VALUES statements all in one command.
>
> The machine had plenty of free memory and no ulimit, so I don't see how this
> could have been anything but MaxAllocSize, unless there's some other failure
> mode in malloc I don't know about.

The patch that Tom committed does __nothing__ to actually address
*how* you were able to get into this position in the first place.
Sure, now you might be able to recover if you're not affected by the
MaxAllocSize limitation, but: What did it take in the first place to
need memory > MaxAllocSize to do a garbage collection? Whatever it was
that allowed things to get that bad before a garbage collection was
attempted is not even considered in the patch committed. The patch
just committed is, in short, nothing more than a band-aid, and may do
more harm than good due to allocating huge amounts of memory while a
very contended exclusive LWLock is held.

You say you have plenty of memory and no ulimit, and I believe that
you're right that there was never a conventional OOM where malloc()
returns NULL. If we conservatively assume that you were only barely
over the MaxAllocSize limitation before a garbage collection was first
attempted, then the average query text length would have to be about
50 kilobytes, since only then would 20,000 texts (pg_stat_statements.max *
2) put us over: 20,000 * 50kB is roughly 1GB, the MaxAllocSize limit.

Does anyone actually think that the *average* SQL query on Jim's
client's application was in excess of 50% of the size of the ~3,000-line
file pg_stat_statements.c? It beggars belief.

I'm annoyed and disappointed that the patch committed does not even
begin to address the underlying problem -- it just adds an escape
hatch, and fixes another theoretical issue that no one was affected
by. Honestly, next time I won't bother.

-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Tom Lane
Date:
Peter Geoghegan <pg@heroku.com> writes:
> I'm annoyed and disappointed that the patch committed does not even
> begin to address the underlying problem -- it just adds an escape
> hatch, and fixes another theoretical issue that no one was affected
> by. Honestly, next time I won't bother.

The problem as I see it is that what you submitted is a kluge that will
have weird and unpredictable side effects.  Moreover, it seems to be
targeting an extremely narrow problem case, ie large numbers of queries
that (a) have long query texts and (b) are distinct to the fingerprinting
code and (c) fail.  It seems to me that you could get into equal trouble
with situations where (c) is not satisfied, and what then?

I'm certainly amenable to doing further work on this problem.  But I do
not think that what we had was well-enough-thought-out to risk pushing
it just hours before a release deadline.  Let's arrive at a more
carefully considered fix in a leisurely fashion.
        regards, tom lane



Re: Less than ideal error reporting in pg_stat_statements

From
Andrew Dunstan
Date:

On 10/05/2015 11:15 AM, Tom Lane wrote:
> Peter Geoghegan <pg@heroku.com> writes:
>> I'm annoyed and disappointed that the patch committed does not even
>> begin to address the underlying problem -- it just adds an escape
>> hatch, and fixes another theoretical issue that no one was affected
>> by. Honestly, next time I won't bother.
> The problem as I see it is that what you submitted is a kluge that will
> have weird and unpredictable side effects.  Moreover, it seems to be
> targeting an extremely narrow problem case, ie large numbers of queries
> that (a) have long query texts and (b) are distinct to the fingerprinting
> code and (c) fail.  It seems to me that you could get into equal trouble
> with situations where (c) is not satisfied, and what then?
>
> I'm certainly amenable to doing further work on this problem.  But I do
> not think that what we had was well-enough-thought-out to risk pushing
> it just hours before a release deadline.  Let's arrive at a more
> carefully considered fix in a leisurely fashion.
>         regards, tom lane


FWIW, (a) and (b) but not (c) is probably the right description for my 
client who has been seeing problems here.

cheers

andrew



Re: Less than ideal error reporting in pg_stat_statements

From
Alvaro Herrera
Date:
Andrew Dunstan wrote:
> 
> On 10/05/2015 11:15 AM, Tom Lane wrote:
> >Peter Geoghegan <pg@heroku.com> writes:
> >>I'm annoyed and disappointed that the patch committed does not even
> >>begin to address the underlying problem -- it just adds an escape
> >>hatch, and fixes another theoretical issue that no one was affected
> >>by. Honestly, next time I won't bother.
> >The problem as I see it is that what you submitted is a kluge that will
> >have weird and unpredictable side effects.  Moreover, it seems to be
> >targeting an extremely narrow problem case, ie large numbers of queries
> >that (a) have long query texts and (b) are distinct to the fingerprinting
> >code and (c) fail.  It seems to me that you could get into equal trouble
> >with situations where (c) is not satisfied, and what then?

> FWIW, (a) and (b) but not (c) is probably the right description for my
> client who has been seeing problems here.

I think the fact that long IN lists are fingerprinted differently
according to the number of elements in the list makes the scenario
quite likely -- not particularly narrow.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Less than ideal error reporting in pg_stat_statements

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Andrew Dunstan wrote:
>> FWIW, (a) and (b) but not (c) is probably the right description for my
>> client who has been seeing problems here.

> I think the fact that long IN lists are fingerprinted differently
> according to the number of elements in the list makes the scenario
> rather very likely -- not particularly narrow.

That's certainly something worth looking at, but I think it's probably
more complicated than that.  If you just write "WHERE x IN (1,2,3,4)",
that gets folded to a ScalarArrayOp with a single array constant, which
the existing code would deal with just fine.  We need to identify
situations where that's not the case and yet we shouldn't distinguish.
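
For illustration, assuming the folding behavior just described (the
table and queries here are hypothetical):

-- All-constant list: folded to a single array constant at parse
-- analysis, so list length shouldn't affect the fingerprint and these
-- two should collapse into one pg_stat_statements entry:
SELECT * FROM t WHERE x IN (1, 2);
SELECT * FROM t WHERE x IN (1, 2, 3, 4);

-- A non-constant element presumably defeats the folding, leaving an
-- array expression whose element count gets fingerprinted, so these
-- two would be tracked as distinct entries:
SELECT * FROM t WHERE x IN (1, 2, 1 + 1);
SELECT * FROM t WHERE x IN (1, 2, 3, 1 + 1);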

In any case, that's just a marginal tweak for one class of query.
I suspect the right fix for the core problem is the one Peter mentioned
in passing earlier, namely make it possible to do garbage collection
without having to slurp the entire file into memory at once.  It'd be
slower, without a doubt, but we could continue to use the existing code
path unless the file gets really large.
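
Roughly, a standalone sketch of the chunked copy (TextRef, copy_text,
and gc_texts are made-up names, not the module's actual code; the real
thing would also need to hold the appropriate lock and rename the new
file into place):

#include <stdio.h>

/* Hypothetical bookkeeping: where each live query text currently lives. */
typedef struct TextRef
{
    long    offset;     /* byte offset of the text in the old file */
    size_t  length;     /* text length, including the trailing '\0' */
} TextRef;

/*
 * Copy one text from the old file to the new one through a small
 * fixed-size buffer, so peak memory stays bounded no matter how big
 * the file is.  Returns the text's offset in the new file, or -1.
 */
static long
copy_text(FILE *fold, FILE *fnew, const TextRef *ref)
{
    char    buf[8192];
    long    new_offset = ftell(fnew);
    size_t  remaining = ref->length;

    if (new_offset < 0 || fseek(fold, ref->offset, SEEK_SET) != 0)
        return -1;

    while (remaining > 0)
    {
        size_t  chunk = remaining < sizeof(buf) ? remaining : sizeof(buf);

        if (fread(buf, 1, chunk, fold) != chunk ||
            fwrite(buf, 1, chunk, fnew) != chunk)
            return -1;
        remaining -= chunk;
    }
    return new_offset;
}

/* Copy every live text, updating each entry to its new offset. */
static int
gc_texts(const char *oldpath, const char *newpath,
         TextRef *refs, int nrefs)
{
    FILE   *fold = fopen(oldpath, "rb");
    FILE   *fnew = fopen(newpath, "wb");
    int     i,
            ok = (fold != NULL && fnew != NULL);

    for (i = 0; ok && i < nrefs; i++)
    {
        long    off = copy_text(fold, fnew, &refs[i]);

        if (off < 0)
            ok = 0;
        else
            refs[i].offset = off;   /* point entry at its new home */
    }
    if (fold)
        fclose(fold);
    if (fnew)
        fclose(fnew);
    return ok;
}
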
        regards, tom lane



Re: Less than ideal error reporting in pg_stat_statements

From
Peter Geoghegan
Date:
On Mon, Oct 5, 2015 at 8:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> That's certainly something worth looking at, but I think it's probably
> more complicated than that.  If you just write "WHERE x IN (1,2,3,4)",
> that gets folded to a ScalarArrayOp with a single array constant, which
> the existing code would deal with just fine.  We need to identify
> situations where that's not the case but yet we shouldn't distinguish.

I have a certain amount of sympathy for doing that kind of thing, but
significantly less than I have for cases with many failed queries,
which is why I focused on that.

> In any case, that's just a marginal tweak for one class of query.

Sometimes narrow cases are also important and representative. I
don't care if someone has thrashing-type issues with
pg_stat_statements when they're doing something really odd that calls
into question the purpose of using it to begin with [1]. The two
classes of queries we talked about (1. Many aborted data integration
transactions, and 2. A variable number of constants) are interesting
because a reasonable person could have those cases, and run into
trouble with pg_stat_statements as a consequence.

> I suspect the right fix for the core problem is the one Peter mentioned
> in passing earlier, namely make it possible to do garbage collection
> without having to slurp the entire file into memory at once.  It'd be
> slower, without a doubt, but we could continue to use the existing code
> path unless the file gets really large.

While it would be nice to not lose query texts on OOM, that's not my
primary concern. My primary concern is infrequent garbage collection.
This fix certainly isn't going to help with the fact that garbage
collection can be stalled for far too long in at least
quasi-reasonable cases.

[1] http://www.postgresql.org/message-id/52E9D98A.4000007@lab.ntt.co.jp
-- 
Peter Geoghegan



Re: Less than ideal error reporting in pg_stat_statements

From
Jim Nasby
Date:
On 10/5/15 10:50 AM, Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Andrew Dunstan wrote:
>>> FWIW, (a) and (b) but not (c) is probably the right description for my
>>> client who has been seeing problems here.
>
>> I think the fact that long IN lists are fingerprinted differently
>> according to the number of elements in the list makes the scenario
>> rather very likely -- not particularly narrow.
>
> That's certainly something worth looking at, but I think it's probably
> more complicated than that.  If you just write "WHERE x IN (1,2,3,4)",
> that gets folded to a ScalarArrayOp with a single array constant, which
> the existing code would deal with just fine.  We need to identify
> situations where that's not the case but yet we shouldn't distinguish.
>
> In any case, that's just a marginal tweak for one class of query.
> I suspect the right fix for the core problem is the one Peter mentioned
> in passing earlier, namely make it possible to do garbage collection
> without having to slurp the entire file into memory at once.  It'd be
> slower, without a doubt, but we could continue to use the existing code
> path unless the file gets really large.

To address what Peter raised up-thread, according to my client the 
process that was doing this was generating 10,000 inserts per 
transaction and sending them all as a single command string. They tried 
cutting it to 1000 inserts and it still had the problem. Each overall 
command string could have been megabytes in size. Perhaps it's not worth 
supporting that, but if that's the decision then there needs to at least 
be better error reporting around this.
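
For concreteness, the command strings were shaped roughly like this
(hypothetical table and values):

INSERT INTO log VALUES (1, 'a');
INSERT INTO log VALUES (2, 'b');
-- ... and so on, thousands of INSERTs in one command string

If each constituent statement has the whole source string attributed to
it as its query text, that alone would put the stored texts far beyond
the 50 kB-per-entry average Peter estimated.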

I'll try to test Tom's patch next week to see what effect it has on this.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Less than ideal error reporting in pg_stat_statements

From
Jim Nasby
Date:
On 10/4/15 6:10 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Sorry, I'm a bit late to this party. Does what you have committed mean
>> people are less likely to see "Out of Memory" coming from
>> pg_stat_statements? If not, what can be done about them short of a
>> restart? And what bad effects follow from an event generating them?
> The main thing we've done that will alleviate that is increase the size of
> query text file that the garbage-collection routine can cope with from
> MaxAllocSize (1GB) to MaxAllocHugeSize (at least 2GB, lots more on 64bit
> machines, though on 32-bit you probably can't get to 2GB anyway ...).

FWIW, I've verified on $CLIENT's system that this works as Tom 
described. The truncation happened somewhere a bit north of 3GB, which 
seems odd as this is a 64-bit system. But at least there were no OOM errors.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com