Thread: Query became very slow after 9.6 -> 10 upgrade
Hello!
What about:
select name,setting from pg_settings where name like '%_cost';
--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From: Dmitry Shalashov [mailto:skaurus@gmail.com]
Sent: Wednesday, November 22, 2017 5:14 PM
To: pgsql-performance@postgresql.org
Subject: Query became very slow after 9.6 -> 10 upgrade
Hi!
I've seen a few letters like this on the mailing list and for some reason thought that it probably wouldn't happen to us, but here I am lol.
It's a "nestloop hits again" situation.
I'll try to provide the plan from 9.6 later, but right now I only have the plan from 10.1.
The query was running in under 3 seconds (that's our default timeout) and now it runs for 12 minutes.
\d adroom: https://pastebin.com/vBrPGtxT (3800 rows)
\d adroom_stat: https://pastebin.com/CkBArCC9 (47 mln rows; 1.5 mln satisfy the condition on the day column)
\d domains: https://pastebin.com/65hk7YCm (73000 rows)
All three tables are analyzed.
EXPLAIN ANALYZE: https://pastebin.com/PenHEgf0
EXPLAIN ANALYZE with nestloop off: https://pastebin.com/zX35CPCV (0.8s)
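(For reference, the nestloop-off plan is produced with a session-level switch; a sketch of the procedure, shown here with the 12-minute query described below and its inner subquery flattened for readability:)

SET enable_nestloop = off;  -- session-local: steers the planner toward hash/merge joins
EXPLAIN ANALYZE
SELECT adroom_id, domain_id, shows, clicks
FROM adroom_stat
WHERE day BETWEEN date_trunc('day', current_timestamp - interval '1 week')
              AND date_trunc('day', current_timestamp)
  AND domain_id IN (SELECT DISTINCT unnest(domain_ids)
                    FROM adroom
                    WHERE groups->0->>'provider' ~ '^target_mail_ru'
                      AND NOT is_paused
                      AND current_timestamp BETWEEN start_ts AND stop_ts);
RESET enable_nestloop;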
Regarding server parameters - it's a mighty beast with 2x E5-2630 v3, 192 GB of RAM and two very, very fast NVMe server-class SSDs in RAID 1.
What can I do with it?
Also maybe this will be useful:
1st query, runs in under 1 ms:
select title, id, groups->0->>'provider' provider, domain_ids from adroom where groups->0->>'provider' ~ '^target_mail_ru' and not is_paused and current_timestamp between start_ts and stop_ts
2nd query, which builds on the 1st one, runs in under 3 ms:
select distinct unnest(domain_ids) FROM (select title, id, groups->0->>'provider' provider, domain_ids from adroom where groups->0->>'provider' ~ '^target_mail_ru' and not is_paused and current_timestamp between start_ts and stop_ts) t1
3rd query, which returns 1.5 mln rows, runs in about 0.6 s:
SELECT adroom_id, domain_id, shows, clicks FROM adroom_stat WHERE day between date_trunc('day', current_timestamp - interval '1 week') and date_trunc('day', current_timestamp)
BUT if I add one additional condition to the 3rd query, which is basically the 2nd query, it runs for the same 12 minutes:
SELECT adroom_id, domain_id, shows, clicks FROM adroom_stat WHERE day between date_trunc('day', current_timestamp - interval '1 week') and date_trunc('day', current_timestamp) AND domain_id IN (select distinct unnest(domain_ids) FROM (select title, id, groups->0->>'provider' provider, domain_ids from adroom where groups->0->>'provider' ~ '^target_mail_ru' and not is_paused and current_timestamp between start_ts and stop_ts) t1)
Plan of last query:
Nested Loop  (cost=88.63..25617.31 rows=491 width=16) (actual time=3.512..733248.271 rows=1442797 loops=1)
  ->  HashAggregate  (cost=88.06..88.07 rows=1 width=4) (actual time=3.380..13.561 rows=3043 loops=1)
        Group Key: (unnest(adroom.domain_ids))
        ->  HashAggregate  (cost=88.03..88.04 rows=1 width=4) (actual time=2.199..2.607 rows=3043 loops=1)
              Group Key: unnest(adroom.domain_ids)
              ->  ProjectSet  (cost=0.28..87.78 rows=100 width=4) (actual time=0.701..1.339 rows=3173 loops=1)
                    ->  Index Scan using adroom_active_idx on adroom  (cost=0.28..87.27 rows=1 width=167) (actual time=0.688..1.040 rows=4 loops=1)
                          Index Cond: ((CURRENT_TIMESTAMP >= start_ts) AND (CURRENT_TIMESTAMP <= stop_ts))
                          Filter: (((groups -> 0) ->> 'provider'::text) ~ '^target_mail_ru'::text)
                          Rows Removed by Filter: 41
  ->  Index Scan using adroom_stat_day_adroom_id_domain_id_url_id_is_wlabp_idx on adroom_stat  (cost=0.58..25524.33 rows=491 width=16) (actual time=104.847..240.846 rows=474 loops=3043)
        Index Cond: ((day >= date_trunc('day'::text, (CURRENT_TIMESTAMP - '7 days'::interval))) AND (day <= date_trunc('day'::text, CURRENT_TIMESTAMP)) AND (domain_id = (unnest(adroom.domain_ids))))
Planning time: 1.580 ms
Execution time: 733331.740 ms
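In the meantime we can work around the bad estimate by materializing the domain list first, so the planner sees a real row count instead of guessing 1 (a sketch, not tested here; the temp table name is made up):

CREATE TEMP TABLE tmp_domains AS
  SELECT DISTINCT unnest(domain_ids) AS domain_id
  FROM adroom
  WHERE groups->0->>'provider' ~ '^target_mail_ru'
    AND NOT is_paused
    AND current_timestamp BETWEEN start_ts AND stop_ts;
ANALYZE tmp_domains;  -- statistics now say ~3000 rows, not 1

SELECT s.adroom_id, s.domain_id, s.shows, s.clicks
FROM adroom_stat s
JOIN tmp_domains d ON d.domain_id = s.domain_id
WHERE s.day BETWEEN date_trunc('day', current_timestamp - interval '1 week')
                AND date_trunc('day', current_timestamp);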
Dmitry Shalashov, relap.io & surfingbird.ru
Here is my select right after initdb:
postgres=# select name,setting from pg_settings where name like '%_cost';
name | setting
----------------------+---------
cpu_index_tuple_cost | 0.005
cpu_operator_cost | 0.0025
cpu_tuple_cost | 0.01
parallel_setup_cost | 1000
parallel_tuple_cost | 0.1
random_page_cost | 4
seq_page_cost | 1
Can you generate the plan with random_page_cost = 4?
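(A per-session override is enough for such an experiment; a sketch:)

BEGIN;
SET LOCAL random_page_cost = 4;  -- reverts automatically at COMMIT/ROLLBACK
-- EXPLAIN ANALYZE the slow query from upthread here
ROLLBACK;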
--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From: Dmitry Shalashov [mailto:skaurus@gmail.com]
Sent: Wednesday, November 22, 2017 5:29 PM
To: Alex Ignatov <a.ignatov@postgrespro.ru>
Cc: pgsql-performance@postgresql.org
Subject: Re: Query became very slow after 9.6 -> 10 upgrade
Sure, here it goes:
name | setting
----------------------+---------
cpu_index_tuple_cost | 0.005
cpu_operator_cost | 0.0025
cpu_tuple_cost | 0.01
parallel_setup_cost | 1000
parallel_tuple_cost | 0.1
random_page_cost | 1
seq_page_cost | 1
Dmitry Shalashov, relap.io & surfingbird.ru
IMHO the problems here are due to poor cardinality estimates. For example
in the first query, the problem is here:

  ->  Nested Loop  (cost=0.42..2.46 rows=1 width=59) (actual time=2.431..91.330 rows=3173 loops=1)
        ->  CTE Scan on b  (cost=0.00..0.02 rows=1 width=40) (actual time=2.407..23.115 rows=3173 loops=1)
        ->  Index Scan using domains_pkey on domains d  (cost=0.42..2.44 rows=1 width=19) (actual time=0.018..0.018 rows=1 loops=3173)

That is, the database expects the CTE to return 1 row, but it returns 3173
of them, which makes the nested loop very inefficient. Similarly for the
other query, where this happens:

  Nested Loop  (cost=88.63..25617.31 rows=491 width=16) (actual time=3.512..733248.271 rows=1442797 loops=1)
    ->  HashAggregate  (cost=88.06..88.07 rows=1 width=4) (actual time=3.380..13.561 rows=3043 loops=1)

That is, about a 1:3000 difference in both cases.

Those estimation errors seem to be caused by a condition that is almost
impossible to estimate, because in both queries it does this:

  groups->0->>'provider' ~ '^something'

That is, it's a regexp on an expression. You might try creating an index
on the expression (which is the only way to add expression statistics),
and reformulating the condition as LIKE (which I believe we can estimate
better than regular expressions, but I haven't tried). So something like

  CREATE INDEX ON adroom ((groups->0->>'provider'));

  WHERE groups->0->>'provider' LIKE 'something%';

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
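Putting those two suggestions together might look like the sketch below; the index name and the explicit ANALYZE are illustrative assumptions, not part of the advice above:

-- the expression index mainly makes ANALYZE collect statistics on
-- groups->0->>'provider', which the planner can then estimate against
CREATE INDEX adroom_provider_expr_idx ON adroom ((groups->0->>'provider'));
ANALYZE adroom;

-- the anchored regexp rewritten as a prefix LIKE
SELECT id, title, domain_ids
FROM adroom
WHERE groups->0->>'provider' LIKE 'target_mail_ru%'
  AND NOT is_paused
  AND current_timestamp BETWEEN start_ts AND stop_ts;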
Dmitry Shalashov <skaurus@gmail.com> writes:
> BUT if I add one additional condition to the 3rd query, which is basically
> the 2nd query, it runs for the same 12 minutes:
> SELECT adroom_id, domain_id, shows, clicks FROM adroom_stat WHERE day
> between date_trunc('day', current_timestamp - interval '1 week') and
> date_trunc('day', current_timestamp) AND domain_id IN (select distinct
> unnest(domain_ids) FROM (select title, id, groups->0->>'provider' provider,
> domain_ids from adroom where groups->0->>'provider' ~ '^target_mail_ru' and
> not is_paused and current_timestamp between start_ts and stop_ts) t1)
> Plan of last query:
> Nested Loop (cost=88.63..25617.31 rows=491 width=16) (actual
> time=3.512..733248.271 rows=1442797 loops=1)
> -> HashAggregate (cost=88.06..88.07 rows=1 width=4) (actual
> time=3.380..13.561 rows=3043 loops=1)
> Group Key: (unnest(adroom.domain_ids))
> -> HashAggregate (cost=88.03..88.04 rows=1 width=4) (actual
> time=2.199..2.607 rows=3043 loops=1)
> Group Key: unnest(adroom.domain_ids)
> -> ProjectSet (cost=0.28..87.78 rows=100 width=4) (actual
> time=0.701..1.339 rows=3173 loops=1)
Hm, seems like the problem is that that lower HashAggregate is estimated
as having only one row out, which is way off and doesn't sound like a
particularly bright default estimate anyway. (And then we're doing an
additional HashAggregate on top of that, which is useless --- implies
that something isn't realizing that the output of the SELECT DISTINCT
is already distinct.)
I'm suspicious that this is breakage from the work that was done on
targetlist SRFs in v10, but that's just a guess at this point.
Trying simple test queries involving WHERE x IN (SELECT DISTINCT
unnest(foo) FROM ...), I do not see a behavior like this, so there is some
not-very-obvious contributing factor in your situation. Can you put
together a self-contained test case that produces a bogus one-row
estimate? Extra points if it produces duplicate HashAgg steps.
regards, tom lane
Dmitry Shalashov <skaurus@gmail.com> writes:
> Turns out we had not 9.6 but 9.5.

I'd managed to reproduce the weird planner behavior locally in the
regression database:

regression=# create table foo (f1 int[], f2 int);
CREATE TABLE
regression=# explain select * from tenk1 where unique2 in (select distinct unnest(f1) from foo where f2=1);
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Nested Loop  (cost=30.85..80.50 rows=6 width=244)
   ->  HashAggregate  (cost=30.57..30.63 rows=6 width=4)
         Group Key: (unnest(foo.f1))
         ->  HashAggregate  (cost=30.42..30.49 rows=6 width=4)
               Group Key: unnest(foo.f1)
               ->  ProjectSet  (cost=0.00..28.92 rows=600 width=4)
                     ->  Seq Scan on foo  (cost=0.00..25.88 rows=6 width=32)
                           Filter: (f2 = 1)
   ->  Index Scan using tenk1_unique2 on tenk1  (cost=0.29..8.30 rows=1 width=244)
         Index Cond: (unique2 = (unnest(foo.f1)))
(10 rows)

Digging into it, the reason for the duplicate HashAggregate step was that
query_supports_distinctness() punted on SRFs-in-the-targetlist, basically
on the argument that it wasn't worth extra work to handle that case.
Thinking a bit harder, it seems to me that the correct analysis is:

1. If we are proving distinctness on the grounds of a DISTINCT clause,
then it doesn't matter whether there are any SRFs, because DISTINCT
removes duplicates after tlist SRF expansion.

2. But tlist SRFs break the ability to prove distinctness on the grounds
of GROUP BY, unless all of them are within grouping columns.

It still seems like detecting the second case is harder than it's worth,
but we can trivially handle the first case, with little more than some
code rearrangement.

The other problem is that the output rowcount of the sub-select (ie, of
the HashAggregate) is being estimated as though the SRF weren't there.
This turns out to be because estimate_num_groups() doesn't consider the
possibility of SRFs in the grouping columns.  It never has, but in 9.6
and before the problem was masked by the fact that grouping_planner
scaled up the result rowcount by tlist_returns_set_rows() *after*
performing grouping.  Now we're effectively doing that in the other
order, which is more correct, but that means estimate_num_groups() has
to apply some sort of adjustment.  I suggest that it just multiply its
old estimate by the maximum of the SRF expansion counts.  That's likely
to be an overestimate, but it's really hard to do better without
specific knowledge of the individual SRF's behavior.

In short, I propose the attached fixes.  I've checked this and it seems
to fix Dmitry's original problem according to the test case he sent
off-list.

			regards, tom lane

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 5b0da14..5783f90 100644
*** a/src/backend/optimizer/plan/analyzejoins.c
--- b/src/backend/optimizer/plan/analyzejoins.c
*************** rel_is_distinct_for(PlannerInfo *root, R
*** 744,751 ****
  bool
  query_supports_distinctness(Query *query)
  {
! 	/* we don't cope with SRFs, see comment below */
! 	if (query->hasTargetSRFs)
  		return false;

  	/* check for features we can prove distinctness with */
--- 744,751 ----
  bool
  query_supports_distinctness(Query *query)
  {
! 	/* SRFs break distinctness except with DISTINCT, see below */
! 	if (query->hasTargetSRFs && query->distinctClause == NIL)
  		return false;

  	/* check for features we can prove distinctness with */
*************** query_is_distinct_for(Query *query, List
*** 787,806 ****
  	Assert(list_length(colnos) == list_length(opids));

  	/*
- 	 * A set-returning function in the query's targetlist can result in
- 	 * returning duplicate rows, if the SRF is evaluated after the
- 	 * de-duplication step; so we play it safe and say "no" if there are any
- 	 * SRFs.  (We could be certain that it's okay if SRFs appear only in the
- 	 * specified columns, since those must be evaluated before de-duplication;
- 	 * but it doesn't presently seem worth the complication to check that.)
- 	 */
- 	if (query->hasTargetSRFs)
- 		return false;
-
- 	/*
  	 * DISTINCT (including DISTINCT ON) guarantees uniqueness if all the
  	 * columns in the DISTINCT clause appear in colnos and operator semantics
! 	 * match.
  	 */
  	if (query->distinctClause)
  	{
--- 787,796 ----
  	Assert(list_length(colnos) == list_length(opids));

  	/*
  	 * DISTINCT (including DISTINCT ON) guarantees uniqueness if all the
  	 * columns in the DISTINCT clause appear in colnos and operator semantics
! 	 * match.  This is true even if there are SRFs in the DISTINCT columns or
! 	 * elsewhere in the tlist.
  	 */
  	if (query->distinctClause)
  	{
*************** query_is_distinct_for(Query *query, List
*** 820,825 ****
--- 810,825 ----
  	}

  	/*
+ 	 * Otherwise, a set-returning function in the query's targetlist can
+ 	 * result in returning duplicate rows, despite any grouping that might
+ 	 * occur before tlist evaluation.  (If all tlist SRFs are within GROUP BY
+ 	 * columns, it would be safe because they'd be expanded before grouping.
+ 	 * But it doesn't currently seem worth the effort to check for that.)
+ 	 */
+ 	if (query->hasTargetSRFs)
+ 		return false;
+
+ 	/*
  	 * Similarly, GROUP BY without GROUPING SETS guarantees uniqueness if all
  	 * the grouped columns appear in colnos and operator semantics match.
  	 */
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 4bbb4a8..edff6da 100644
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
*************** estimate_num_groups(PlannerInfo *root, L
*** 3361,3366 ****
--- 3361,3367 ----
  				   List **pgset)
  {
  	List	   *varinfos = NIL;
+ 	double		srf_multiplier = 1.0;
  	double		numdistinct;
  	ListCell   *l;
  	int			i;
*************** estimate_num_groups(PlannerInfo *root, L
*** 3394,3399 ****
--- 3395,3401 ----
  	foreach(l, groupExprs)
  	{
  		Node	   *groupexpr = (Node *) lfirst(l);
+ 		double		this_srf_multiplier;
  		VariableStatData vardata;
  		List	   *varshere;
  		ListCell   *l2;
*************** estimate_num_groups(PlannerInfo *root, L
*** 3402,3407 ****
--- 3404,3424 ----
  		if (pgset && !list_member_int(*pgset, i++))
  			continue;

+ 		/*
+ 		 * Set-returning functions in grouping columns are a bit problematic.
+ 		 * The code below will effectively ignore their SRF nature and come up
+ 		 * with a numdistinct estimate as though they were scalar functions.
+ 		 * We compensate by scaling up the end result by the largest SRF
+ 		 * rowcount estimate.  (This will be an overestimate if the SRF
+ 		 * produces multiple copies of any output value, but it seems best to
+ 		 * assume the SRF's outputs are distinct.  In any case, it's probably
+ 		 * pointless to worry too much about this without much better
+ 		 * estimates for SRF output rowcounts than we have today.)
+ 		 */
+ 		this_srf_multiplier = expression_returns_set_rows(groupexpr);
+ 		if (srf_multiplier < this_srf_multiplier)
+ 			srf_multiplier = this_srf_multiplier;
+
  		/* Short-circuit for expressions returning boolean */
  		if (exprType(groupexpr) == BOOLOID)
  		{
*************** estimate_num_groups(PlannerInfo *root, L
*** 3467,3475 ****
--- 3484,3498 ----
  	 */
  	if (varinfos == NIL)
  	{
+ 		/* Apply SRF multiplier as we would do in the long path */
+ 		numdistinct *= srf_multiplier;
+ 		/* Round off */
+ 		numdistinct = ceil(numdistinct);
  		/* Guard against out-of-range answers */
  		if (numdistinct > input_rows)
  			numdistinct = input_rows;
+ 		if (numdistinct < 1.0)
+ 			numdistinct = 1.0;
  		return numdistinct;
  	}
*************** estimate_num_groups(PlannerInfo *root, L
*** 3638,3643 ****
--- 3661,3670 ----
  		varinfos = newvarinfos;
  	} while (varinfos != NIL);

+ 	/* Now we can account for the effects of any SRFs */
+ 	numdistinct *= srf_multiplier;
+
+ 	/* Round off */
  	numdistinct = ceil(numdistinct);

  	/* Guard against out-of-range answers */
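If the patch behaves as described, the repro above should plan with a single HashAggregate, and its row estimate should be scaled up by the SRF's declared rowcount (100 per call for unnest, consistent with the ProjectSet estimates in the plans upthread). A quick check against the same test tables:

EXPLAIN
SELECT * FROM tenk1
WHERE unique2 IN (SELECT DISTINCT unnest(f1) FROM foo WHERE f2 = 1);
-- expected: one HashAggregate instead of two, with a rows estimate
-- multiplied by the SRF factor (capped at the ProjectSet row estimate)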
Dmitry Shalashov <skaurus@gmail.com> writes:
> We tried to apply the patch on 10.1 source, but something seems wrong:
> patch -p1 < ../1.patch
> (Stripping trailing CRs from patch; use --binary to disable.)
> patching file src/backend/optimizer/plan/analyzejoins.c
> (Stripping trailing CRs from patch; use --binary to disable.)
> patching file src/backend/utils/adt/selfuncs.c
> Hunk #1 succeeded at 3270 (offset -91 lines).
> Hunk #2 succeeded at 3304 (offset -91 lines).
> Hunk #3 succeeded at 3313 (offset -91 lines).
> Hunk #4 succeeded at 3393 (offset -91 lines).
> patch unexpectedly ends in middle of line
> Hunk #5 succeeded at 3570 with fuzz 1 (offset -91 lines).

The line number offsets are expected when applying to v10, but it looks
like you failed to transfer the attachment cleanly ... there were
certainly not CRs in it when I mailed it. The output on v10 should just
look like

patching file src/backend/optimizer/plan/analyzejoins.c
patching file src/backend/utils/adt/selfuncs.c
Hunk #1 succeeded at 3270 (offset -91 lines).
Hunk #2 succeeded at 3304 (offset -91 lines).
Hunk #3 succeeded at 3313 (offset -91 lines).
Hunk #4 succeeded at 3393 (offset -91 lines).
Hunk #5 succeeded at 3570 (offset -91 lines).

			regards, tom lane
Dmitry Shalashov <skaurus@gmail.com> writes:
> It looks like the patch helps us. Tom, thank you!
> I'm still testing it though, just in case.

Excellent, please follow up if you learn anything new.

> What is the PostgreSQL schedule for releasing fixes like this? Can I
> expect that it will be in 10.2, and when can I expect 10.2, approximately
> of course?

I haven't pushed it to the git repo yet, but I will shortly, and then it
will be in the next minor release. That will probably be in early
February, per our release policy:
https://www.postgresql.org/developer/roadmap/

			regards, tom lane
On Sat, Nov 25, 2017 at 8:54 PM, Dmitry Shalashov <skaurus@gmail.com> wrote:
> Is it completely safe to use a manually patched version in production?

Patching upstream PostgreSQL to fix a critical bug is something that can
of course be done. And to reach a state where you think something is safe
to use in production, first be sure to test it thoroughly on a staging
instance. The author has also been working on Postgres for 20 years, so
that gives some insurance.
--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
> On Sat, Nov 25, 2017 at 8:54 PM, Dmitry Shalashov <skaurus@gmail.com> wrote:
>> Is it completely safe to use a manually patched version in production?

> Patching upstream PostgreSQL to fix a critical bug is something that
> can of course be done. And to reach a state where you think something
> is safe to use in production, first be sure to test it thoroughly on a
> staging instance. The author has also been working on Postgres for 20
> years, so that gives some insurance.

It's not like there's some magic dust that we sprinkle on the code at
release time ;-). If there's a problem with that patch, it's much more
likely that you'd discover it through field testing than that we would
notice it during development (we missed the original problem, after all).
So you can do that field testing now, or after 10.2 comes out. The former
seems preferable, if you are comfortable with building a patched copy at
all.

I don't know what your normal source of Postgres executables is, but all
the common packaging technologies make it pretty easy to rebuild a
package from source with patch(es) added. Modifying your vendor's SRPM
(or the equivalent concept if you're not on Red Hat) is a good skill to
have.

			regards, tom lane