Home > mailing lists

Re: [GENERAL] Bad planning data resulting in OOM killing of postgres - Mailing list pgsql-general

From	Jeff Janes
Subject	Re: [GENERAL] Bad planning data resulting in OOM killing of postgres
Date	February 21, 2017 23:46:12
Msg-id	CAMkU=1zdL_04dNCZMgxRYs9w4-tJHumj+o+_3-uGVf584f7wWw@mail.gmail.com Whole thread
In response to	Re: [GENERAL] Bad planning data resulting in OOM killing of postgres (David Hinkle <hinkle@cipafilter.com>)
List	pgsql-general

Tree view

On Thu, Feb 16, 2017 at 3:27 PM, David Hinkle <hinkle@cipafilter.com> wrote:

I guess this doesn't work, latest test run crashed. It still uses the
bad plan for the hostid column even after n_distinct is updated.

cipafilter=# select attname, n_distinct from pg_stats where tablename
cipafilter-# = 'log_raw' and (attname = 'urlid' or attname = 'titleid'
or attname =
cipafilter(# 'hostid');
attname | n_distinct
---------+-------------
urlid | 1.51625e+08
hostid | 304710
titleid | 886499
(3 rows)

cipafilter=# explain DELETE FROM hosts WHERE NOT EXISTS ( SELECT
log_raw.hostid FROM log_raw WHERE log_raw.hostid = hosts.hostid );
QUERY PLAN
-----------------------------------------------------------------------------------------
Delete on hosts (cost=22249475.67..74844813.47 rows=1 width=12)
-> Hash Anti Join (cost=22249475.67..74844813.47 rows=1 width=12)
Hash Cond: (hosts.hostid = log_raw.hostid)
-> Seq Scan on hosts (cost=0.00..5017.10 rows=304710 width=10)
-> Hash (cost=12799395.52..12799395.52 rows=543645052 width=10)
-> Seq Scan on log_raw (cost=0.00..12799395.52
rows=543645052 width=10)
(6 rows)

I guess I will also try throwing in 'set enable_hashjoin = false;' and
see if that gets these purges to go.

Another option would be to force the de-dup to happen, with:

explain with t as (select distinct hostid from log_raw) delete from hosts where not exists (select 1 from t where t.hostid=hosts.hostid)

That way you can use the hash join without running out of memory, in case the hash join is actually faster than the merge join. Also, it just seems cleaner than fiddling with enable_* parameters and then having to remember to reset them when done.

Cheers,

Jeff

pgsql-general by date:

From: Merlin Moncure
Date: 21 February 2017, 23:14:36
Subject: Re: [GENERAL] postgresql how to duplicate rows in result.

From: Tom Lane
Date: 21 February 2017, 23:58:00
Subject: Re: [GENERAL] postgresql how to duplicate rows in result.

Re: [GENERAL] Bad planning data resulting in OOM killing of postgres - Mailing list pgsql-general

Previous

Next