Re: [GENERAL] Bad planning data resulting in OOM killing of postgres - Mailing list pgsql-general

From David Hinkle
Subject Re: [GENERAL] Bad planning data resulting in OOM killing of postgres
Date
Msg-id CACw4T0rHY6EnyW0A+4-C-YPkhkFxvwhy-C12GYFd4dZcQ=AGuQ@mail.gmail.com
Whole thread Raw
In response to Re: [GENERAL] Bad planning data resulting in OOM killing of postgres  (David Hinkle <hinkle@cipafilter.com>)
List pgsql-general
Maybe also useful to know:

cipafilter=# select reltuples from pg_class where relname = 'log_raw';
  reltuples
-------------
 5.40531e+08
(1 row)

On Wed, Feb 15, 2017 at 7:55 PM, David Hinkle <hinkle@cipafilter.com> wrote:
> Thanks for your help!
>
> Karsten: The system does fill up swap before it blows up.  This
> particular model has 8G of ram and 4G of swap and runs kernel
> 4.4.0-53-generic #74~14.04.1-Ubuntu.
>
> Tom, there are three columns in this table that exhibit the problem,
> here is the statistics data after an analyze, and the real data to
> compare it to.
>
> cipafilter=# select attname, n_distinct, most_common_freqs from
> pg_stats where tablename = 'log_raw' and (attname = 'urlid' or attname
> = 'titleid' or attname = 'hostid');
>  attname | n_distinct |
>
>
>
>
>
>
>  most_common_freqs
>
---------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  urlid   |      38963 |
>
{0.0188,0.009,0.00853333,0.0079,0.00686667,0.0068,0.00616667,0.0061,0.00606667,0.005,0.0049,0.00483333,0.00463333,0.00456667,0.0044,0.0044,0.0039,0.0032,0.00263333,0.00263333,0.00256667,0.00256667,0.00246667,0.0023,0.0023,0.00223333,0.00203333,0.002,0.002,0.002,0.0019,0.00186667,0.00183333,0.0018,0.0018,0.00176667,0.00176667,0.00176667,0.00176667,0.00173333,0.00173333,0.00173333,0.0017,0.0017,0.00166667,0.0016,0.00156667,0.00153333,0.0015,0.00146667,0.00143333,0.0014,0.0014,0.0014,0.0014,0.0014,0.0014,0.00136667,0.00133333,0.0013,0.0013,0.0013,0.0013,0.00126667,0.00126667,0.00123333,0.00123333,0.00123333,0.00123333,0.0012,0.0012,0.00113333,0.0011,0.00106667,0.00106667,0.001,0.001,0.001,0.001,0.000966667,0.000966667,0.000966667,0.000966667,0.000966667,0.000933333,0.000933333,0.000933333,0.0009,0.0009,0.0009,0.0009,0.0009,0.0009,0.0009,0.0009,0.0009,0.000866667,0.000866667,0.000866667,0.000866667}
>  hostid  |       2478 |
>
{0.2109,0.0330667,0.0321667,0.0245,0.0223,0.0177667,0.0165,0.0152,0.0151333,0.014,0.0132,0.0115667,0.00993333,0.00963333,0.00873333,0.00853333,0.00853333,0.00746667,0.0074,0.00653333,0.0065,0.0065,0.00646667,0.00513333,0.00506667,0.00503333,0.00496667,0.00493333,0.0049,0.00486667,0.00483333,0.00453333,0.0044,0.00433333,0.00426667,0.00413333,0.00396667,0.00386667,0.00383333,0.00363333,0.00363333,0.00363333,0.00346667,0.00323333,0.00323333,0.0032,0.00316667,0.00303333,0.00303333,0.0029,0.00286667,0.00273333,0.00273333,0.00266667,0.00263333,0.00263333,0.0026,0.0026,0.0025,0.00243333,0.00243333,0.0024,0.00236667,0.00226667,0.00223333,0.00216667,0.00213333,0.0021,0.0021,0.00206667,0.00206667,0.00203333,0.002,0.002,0.00193333,0.0019,0.0019,0.00186667,0.00186667,0.00186667,0.0018,0.0018,0.0018,0.00176667,0.00173333,0.0017,0.0017,0.00166667,0.00166667,0.00166667,0.00163333,0.00163333,0.00153333,0.0015,0.0015,0.0015,0.00146667,0.00146667,0.00146667,0.00146667}
>  titleid |        292 | {0.767167}
> (3 rows)
>
> I have to patch the pg_stats table to get postgres to run the
> following queries without crashing:
>
> cipafilter=#     UPDATE pg_statistic AS s
> cipafilter-#         SET stadistinct = (select reltuples from pg_class
> where relname = 'titles')
> cipafilter-#     FROM pg_class c, pg_attribute a where c.oid =
> s.starelid and c.oid = a.attrelid and a.attnum = s.staattnum and
> cipafilter-#         relname = 'log_raw' and attname = 'titleid';
> UPDATE 1
> cipafilter=#     UPDATE pg_statistic AS s
> cipafilter-#         SET stadistinct = (select reltuples from pg_class
> where relname = 'urls')
> cipafilter-#     FROM pg_class c, pg_attribute a where c.oid =
> s.starelid and c.oid = a.attrelid and a.attnum = s.staattnum and
> cipafilter-#         relname = 'log_raw' and attname = 'urlid';
> UPDATE 1
> cipafilter=#     UPDATE pg_statistic AS s
> cipafilter-#         SET stadistinct = (select reltuples from pg_class
> where relname = 'hosts')
> cipafilter-#     FROM pg_class c, pg_attribute a where c.oid =
> s.starelid and c.oid = a.attrelid and a.attnum = s.staattnum and
> cipafilter-#         relname = 'log_raw' and attname = 'hostid';
> UPDATE 1
>
> cipafilter=#  select attname, n_distinct from pg_stats where tablename
> = 'log_raw' and (attname = 'urlid' or attname = 'titleid' or attname =
> 'hostid');
>  attname | n_distinct
> ---------+-------------
>  urlid   | 1.51452e+08
>  hostid  |      303756
>  titleid |      879485
>
> cipafilter=# select titleid, count(titleid) from log_raw group by
> titleid order by count(titleid) desc limit 10;
>  titleid |   count
> ---------+-----------
>        1 | 423848049
>    49547 |    403432
>      238 |    188151
>       12 |    151640
>        5 |    149524
>     6196 |    139445
>    32014 |    123828
>      200 |     88682
>    58921 |     86451
>       10 |     84264
> (10 rows)
>
> cipafilter=# select urlid, count(urlid) from log_raw group by urlid
> order by count(urlid) desc limit 10;
>  urlid  |  count
> --------+----------
>  129991 | 10843088
>       1 |  4953757
>      21 |  4345503
>    2765 |  4266981
>      12 |  3654127
>     920 |  3609054
>    1135 |  3562185
>      20 |  3495023
>  283567 |  3019675
>    2494 |  2655301
> (10 rows)
>
> cipafilter=# select hostid, count(hostid) from log_raw group by hostid
> order by count(hostid) desc limit 10;
>  hostid |   count
> --------+-----------
>       7 | 117342686
>       5 |  18016481
>      53 |  17408992
>      57 |  12947564
>     543 |  12698269
>       1 |  10068246
>  127544 |   8746204
>      27 |   8618595
>      40 |   8507278
>      36 |   7424412
> (10 rows)
>
> Fun fact, hostid 7 is 'google.com'.
>
> On Wed, Feb 15, 2017 at 6:11 AM, Karsten Hilbert
> <Karsten.Hilbert@gmx.net> wrote:
>> On Wed, Feb 15, 2017 at 01:04:51PM +0100, Karsten Hilbert wrote:
>>
>>> > Nope, that pops too.  The query runs for a long time at a somewhat
>>> > normal rate of ram consumption, using ~1G of RSS then suddenly spikes
>>> > to about 6G, at which point the OOM killer pops it.  Box has 8G of ram
>>> > and 4G of swap.
>>>
>>> By any chance:
>>>
>>> - when it happens has the kernel considered using swap ?
>>>
>>> - which kernel are you running ?
>>>
>>> There's been (for some workloads) massive problems with RAM
>>> exhaustion / swapping / OOM killer going wild with
>>> 4.7/4.8/some 4.9 kernels.
>>
>> I guess what I'm trying to say is that it may actually not be
>> PostgreSQL's fault but rather the kernel invoking the OOM
>> killer way prematurely.
>>
>> Karsten
>> --
>> GPG key ID E4071346 @ eu.pool.sks-keyservers.net
>> E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
>
>
> --
> David Hinkle
>
> Senior Software Developer
>
> Phone:  800.243.3729x3000
>
> Email:  hinkle@cipafilter.com
>
> Hours:  Mon-Fri   8:00AM-5:00PM (CT)



--
David Hinkle

Senior Software Developer

Phone:  800.243.3729x3000

Email:  hinkle@cipafilter.com

Hours:  Mon-Fri   8:00AM-5:00PM (CT)


pgsql-general by date:

Previous
From: David Hinkle
Date:
Subject: Re: [GENERAL] Bad planning data resulting in OOM killing of postgres
Next
From: Patrick B
Date:
Subject: Re: [GENERAL] How to evaluate "explain analyze" correctly after"explain" for the same statement ?