Re: [GENERAL] Bad planning data resulting in OOM killing of postgres - Mailing list pgsql-general

From Tom Lane
Subject Re: [GENERAL] Bad planning data resulting in OOM killing of postgres
Msg-id 2270.1487115399@sss.pgh.pa.us
In response to Re: [GENERAL] Bad planning data resulting in OOM killing of postgres  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
I wrote:
> David Hinkle <hinkle@cipafilter.com> writes:
>> Thanks guys, here's the information you requested:
>> psql:postgres@cipafilter = show work_mem;
>> work_mem
>> ──────────
>> 10MB
>> (1 row)

> [ squint... ]  It should absolutely not have tried to hash a 500M-row
> table if it thought work_mem was only 10MB.  I wonder if there's an
> integer-overflow problem or something like that.

Ah, nah, scratch that: I was thinking of the hash aggregation case,
where there's no run-time ability to spill to disk so the planner
will not risk using hash aggregation if it estimates the hash table
would exceed work_mem.  Hash joins do have the ability to restrict
memory consumption by increasing the number of batches, so the planner
doesn't worry about it in that case.

I think what must be happening is that there's some one value of
log_raw.titleid that occurs a ridiculous number of times, so that
the executor is unable to split up that particular hash bucket,
leading to OOM when it tries to load all those rows to process
the hash bucket.
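
If that's what's happening, the offending value should show up directly. A quick way to check (just a sketch, using the table and column names from earlier in this thread) is:

```sql
-- Look for a single titleid that accounts for an outsized share of the rows.
SELECT titleid, count(*) AS n
FROM log_raw
GROUP BY titleid
ORDER BY n DESC
LIMIT 10;
```

Fair warning: on a 500M-row table that's a full scan, so it may take a while; the planner's own statistics (pg_stats) are a cheaper first look.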

The planner does attempt to estimate the worst-case bucket size,
but in what now seems like brain fade, it doesn't do more with that
information than charge an appropriate number of tuple comparisons.
That would somewhat discourage use of a hash join, but a merge join
on this many tuples would be pretty expensive too, so it's not overly
surprising that it went with hashing anyway.

I am thinking we ought to fix it so it rejects (or at least heavily
penalizes) a hash join if it estimates that the rows containing
the inner side's most common value wouldn't all fit in work_mem.

What's not completely certain however is whether this diagnosis is
accurate for your case, or whether the proposed remedy would fix it.
It would be useful to see the contents of pg_stats.most_common_freqs
for log_raw.titleid.
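
Something along these lines should pull that out (a sketch; adjust for schema qualification if needed):

```sql
-- Per-column statistics gathered by ANALYZE, including the
-- most-common-values list and their estimated frequencies.
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'log_raw'
  AND attname = 'titleid';
```

Multiplying the first entry of most_common_freqs by the table's estimated row count gives roughly the number of rows the executor would have to hold in memory for that one hash bucket.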

            regards, tom lane

