Re: [GENERAL] Bad planning data resulting in OOM killing of postgres - Mailing list pgsql-general

From Jeff Janes
Subject Re: [GENERAL] Bad planning data resulting in OOM killing of postgres
Date
Msg-id CAMkU=1zdL_04dNCZMgxRYs9w4-tJHumj+o+_3-uGVf584f7wWw@mail.gmail.com
Whole thread Raw
In response to Re: [GENERAL] Bad planning data resulting in OOM killing of postgres  (David Hinkle <hinkle@cipafilter.com>)
List pgsql-general
On Thu, Feb 16, 2017 at 3:27 PM, David Hinkle <hinkle@cipafilter.com> wrote:
I guess this doesn't work, latest test run crashed.  It still uses the
bad plan for the hostid column even after n_distinct is updated.

cipafilter=# select attname, n_distinct from pg_stats where tablename
cipafilter-# = 'log_raw' and (attname = 'urlid' or attname = 'titleid'
or attname =
cipafilter(# 'hostid');
 attname | n_distinct
---------+-------------
 urlid   | 1.51625e+08
 hostid  |      304710
 titleid |      886499
(3 rows)

cipafilter=# explain DELETE FROM hosts WHERE NOT EXISTS ( SELECT
log_raw.hostid FROM log_raw WHERE log_raw.hostid = hosts.hostid );
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Delete on hosts  (cost=22249475.67..74844813.47 rows=1 width=12)
   ->  Hash Anti Join  (cost=22249475.67..74844813.47 rows=1 width=12)
         Hash Cond: (hosts.hostid = log_raw.hostid)
         ->  Seq Scan on hosts  (cost=0.00..5017.10 rows=304710 width=10)
         ->  Hash  (cost=12799395.52..12799395.52 rows=543645052 width=10)
               ->  Seq Scan on log_raw  (cost=0.00..12799395.52
rows=543645052 width=10)
(6 rows)

I guess I will also try throwing in 'set enable_hashjoin = false;' and
see if that gets these purges to go.



Another option would be to force the de-dup to happen, with:

explain with t as (select distinct hostid from log_raw) delete from hosts where not exists (select 1 from t where t.hostid=hosts.hostid)

That way you can use the hash join without running out of memory, in case the hash join is actually faster than the merge join.  Also, it just seems cleaner than fiddling with enable_* parameters and then having to remember to reset them when done.

Cheers,

Jeff

pgsql-general by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: [GENERAL] postgresql how to duplicate rows in result.
Next
From: Tom Lane
Date:
Subject: Re: [GENERAL] postgresql how to duplicate rows in result.