Re: BUG #18365: Inconsistent cost function between materialized and non-materialized CTE - Mailing list pgsql-bugs

From David Rowley
Subject Re: BUG #18365: Inconsistent cost function between materialized and non-materialized CTE
Date
Msg-id CAApHDvqBTtdtE_w__HeOvzen6UqKUVzcNpTm74_ioJOfdOJ50A@mail.gmail.com
In response to BUG #18365: Inconsistent cost function between materialized and non-materialized CTE  (PG Bug reporting form <noreply@postgresql.org>)
List pgsql-bugs
On Tue, 27 Feb 2024 at 02:44, PG Bug reporting form
<noreply@postgresql.org> wrote:
> I have run into an issue where a query with a CTE performs a sequential scan
> on a large table (42M rows, ~1min on our prod), while explicitly
> materializing the CTE simply performs an index only scan (~2s). When I set
> `enable_seqscan=off` and compare the costs, it turns out the query planner
> grossly overestimates the cost of a Nested Loop compared to the (as far as I
> can tell) exact same Nested Loop when the CTE is materialized. I know that
> the query planner acts on heuristics, so this might not be considered a bug,
> but the cost values are so wildly different for what should be essentially
> the same operation, that it might warrant further investigation.

> Is this a bug?

I don't see any bugs here. What seems to be going on is that with the
materialized CTE, the planner underestimates the number of rows left
after making the CTE results distinct so that it can use a plain join
rather than a semi-join.
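For reference, since PostgreSQL 12 the CTE materialization behaviour can be forced either way with the MATERIALIZED / NOT MATERIALIZED keywords. A generic sketch of the two query shapes being compared (table and column names here are made up for illustration, not the reporter's actual schema):

```sql
-- Forced materialization: the CTE is computed once into a work table,
-- and the outer join is planned against the CTE's own row estimate.
WITH ids AS MATERIALIZED (
    SELECT DISTINCT product_id FROM recent_orders
)
SELECT h.*
FROM stock_history h
JOIN ids USING (product_id);

-- Inlined (the default when the CTE is referenced once and has no
-- side effects): the planner folds the CTE into the outer query and
-- may arrive at a different row estimate, and so a different plan.
WITH ids AS NOT MATERIALIZED (
    SELECT DISTINCT product_id FROM recent_orders
)
SELECT h.*
FROM stock_history h
JOIN ids USING (product_id);
```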

That's seen in:

->  HashAggregate  (cost=2576.28..2679.33 rows=10305 width=16) (actual
time=765.793..893.761 rows=166060 loops=1)

Due to that row underestimation, the planner thinks a parameterized
nested loop is the best way to join the two relations, as it thinks the
stock_history_date_product_id_idx index only needs to be looked up
10305 times rather than 166060 times.  With the non-materialized
version, the planner estimates 103051 lookups.  That causes it to think
the Nested Loop -> index scan on stock_history_date_product_id_idx is
too expensive and that a hash join with a seq scan is cheaper.  As it
turns out, looking up the index *is* faster, even when that's done
166060 times.

The two parameters that drive the planner's decision on this are
random_page_cost (you might want to consider lowering that) and
effective_cache_size.  A rough guideline for effective_cache_size is
75% of the machine's RAM, though much depends on your
shared_buffers setting and what else runs concurrently on the
machine.  Some people have found lowering random_page_cost to as
low as 1.1 helps.  The default is 4.0, which has remained unchanged
since the HDD days; for SSDs, it's often too large.
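As a sketch, the settings above could be tried per-session first (the values below are illustrative starting points only, not recommendations for any particular machine):

```sql
-- Experiment at session level before persisting anything.
SET random_page_cost = 1.1;          -- closer to seq_page_cost; common for SSDs
SET effective_cache_size = '24GB';   -- e.g. roughly 75% of a 32GB machine

-- Re-run the problem query and compare plans/costs.
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- the query in question
```

Once a value is verified to help, `ALTER SYSTEM SET random_page_cost = 1.1;` followed by `SELECT pg_reload_conf();` applies it server-wide.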

I've attached the EXPLAINs I trimmed down and compared to reach this conclusion.

David
