Re: [GENERAL] Bad planning data resulting in OOM killing of postgres - Mailing list pgsql-general

From David Hinkle
Subject Re: [GENERAL] Bad planning data resulting in OOM killing of postgres
Date
Msg-id CACw4T0pNRus3gaxUu4wmsb+m4WER9YerqtmvPfmSRQ4WnF5TiQ@mail.gmail.com
Whole thread Raw
In response to Re: [GENERAL] Bad planning data resulting in OOM killing of postgres  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [GENERAL] Bad planning data resulting in OOM killing of postgres  (David Hinkle <hinkle@cipafilter.com>)
List pgsql-general
Yep, 420ish million records out of 540 million records have a titleid
of 1.   There are about 880,000 other unique values, but most of the
records are 1.   Of course, n_distinct is only 292.   I'm surprised
it's not eliminating the duplicates while it builds that hash table.

This is what I'm doing for a work around right now.  Getting
n_distinct right seems to be preventing the system from breaking.
It's going to be executed once a week during the weekly maintenance.
It's setting the n_distinct of each column to the number of rows in
the associated table.

    CREATE OR REPLACE FUNCTION patch_ndistinct(_table varchar, _column
varchar, _string_table varchar)
      RETURNS real AS
    $$
         DECLARE _cnt REAL;
    BEGIN
         SELECT reltuples INTO _cnt from pg_class where relname = _string_table;
       EXECUTE 'ALTER TABLE ' || _table || ' ALTER COLUMN ' || _column
|| ' SET (n_distinct=' || _cnt || ')';
       RETURN _cnt;
    END
    $$ LANGUAGE plpgsql;
    select patch_ndistinct('log_raw', 'titleid', 'titles');
    select patch_ndistinct('log_raw', 'urlid', 'urls');
    select patch_ndistinct('log_raw', 'hostid', 'hosts');
    ANALYZE log_raw;

On Thu, Feb 16, 2017 at 10:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Hinkle <hinkle@cipafilter.com> writes:
>> Tom, there are three columns in this table that exhibit the problem,
>> here is the statistics data after an analyze, and the real data to
>> compare it to.
>
>>  attname | n_distinct |  most_common_freqs
>
>>  titleid |        292 | {0.767167}
>
> Ouch.  That's saying there's some single value of titleid that accounts
> for more than three-quarters of the entries ... does that square with
> reality?  That'd certainly explain why a hash join goes nuts.
>
>                         regards, tom lane



--
David Hinkle

Senior Software Developer

Phone:  800.243.3729x3000

Email:  hinkle@cipafilter.com

Hours:  Mon-Fri   8:00AM-5:00PM (CT)


pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: [GENERAL] disk writes within a transaction
Next
From: Tom Lane
Date:
Subject: Re: [GENERAL] Autovacuum stuck for hours, blocking queries