BUG #18365: Inconsistent cost function between materialized and non-materialized CTE - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #18365: Inconsistent cost function between materialized and non-materialized CTE
Date
Msg-id 18365-ea9982ad685d4cb9@postgresql.org
Whole thread Raw
Responses Re: BUG #18365: Inconsistent cost function between materialized and non-materialized CTE
Re: BUG #18365: Inconsistent cost function between materialized and non-materialized CTE
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18365
Logged by:          Sjors Gielen
Email address:      postgresql@sjorsgielen.nl
PostgreSQL version: 16.2
Operating system:   Linux (native amd64 and Docker for Mac arm64)
Description:

Dear all,

I have run into an issue where a query with a CTE performs a sequential scan
on a large table (42M rows, ~1min on our prod), while explicitly
materializing the CTE simply performs an index only scan (~2s). When I set
`enable_seqscan=off` and compare the costs, it turns out the query planner
grossly overestimates the cost of a Nested Loop compared to the (as far as I
can tell) exact same Nested Loop when the CTE is materialized. I know that
the query planner acts on heuristics, so this might not be considered a bug,
but the cost values are so wildly different for what should be essentially
the same operation, that it might warrant further investigation.

I can reproduce the issue on PostgreSQL 15.2, 15.6, 16.2 and 17devel as of
20240223.1636.gd360e3c.

I have reported the issue, the schema, the query and the query plan outputs
at

<https://dba.stackexchange.com/questions/335570/why-is-postgresql-performing-a-sequential-scan-except-when-my-cte-is-materializ>.
Also, in order to facilitate reproduction, I have uploaded two reproduction
scripts and a 1 GB sample of my dataset which still reproduces the issue at
<https://sjorsgielen.nl/psql-bug-report.tar.gz>. `bring-up.sh` runs a given
version of the PostgreSQL Docker image listening on local port 15432.
`run-test.sh` creates the `testing` schema, creates the DDL, and imports the
data from data.sql.bz2. It should be easy to run `run-test.sh` against any
Postgres server by changing the DSN at the top of the script. The `testing`
schema is expected not to exist.

By running `./bring-up.sh 16.2 && ./run-test.sh`, some minutes later, you
should see the output of three `EXPLAIN (ANALYZE, BUFFERS, VERBOSE)`. I
would expect the cost of the Nested Loop in the first case (materialized
view), which is `2246..76657` (around 75k), to be more or less the same as
that of the Nested Loop in the third case (sequential scan disabled), which
is 230310..797581 (~570k). Because this cost is much higher, even surpassing
the Hash Join cost of ~290k, PostgreSQL decides to use a sequential scan in
the second case (non-materialized view) which ends up causing a slowdown of
~6.5 times.

Is this a bug?

Thank you,
Sjors Gielen


pgsql-bugs by date:

Previous
From: Cees van Zeeland
Date:
Subject: Re: BUG #18362: unaccent rules and Old Greek text
Next
From: Tom Lane
Date:
Subject: Re: BUG #18364: psql execution error: Segmentation fault