Home > mailing lists

Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions - Mailing list pgsql-general

From	Dimitrios Apostolou
Subject	Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
Date	May 13, 2024 12:41:05
Msg-id	3efce60b-48c0-160d-0444-474b02f76739@gmx.net Whole thread Raw
In response to	Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions (David Rowley <dgrowleyml@gmail.com>)
Responses	Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
List	pgsql-general

Tree view

On Sat, 11 May 2024, David Rowley wrote:

> On Sat, 11 May 2024 at 13:11, Dimitrios Apostolou <jimis@gmx.net> wrote:
>> Indeed that's an awful estimate, the table has more than 1M of unique
>> values in that column. Looking into pg_stat_user_tables, I can't see the
>> partitions having been vacuum'd or analyzed at all. I think they should
>> have been auto-analyzed, since they get a ton of INSERTs
>> (no deletes/updates though) and I have the default autovacuum settings.
>> Could it be that autovacuum starts, but never
>> finishes? I can't find something in the logs.
>
> It's not the partitions getting analyzed you need to worry about for
> an ndistinct estimate on the partitioned table. It's auto-analyze or
> ANALYZE on the partitioned table itself that you should care about.
>
> If you look at [1], it says "Tuples changed in partitions and
> inheritance children do not trigger analyze on the parent table."

Thanks

>
>> In any case, even after the planner decides to execute the terrible plan
>> with the parallel seqscans, why doesn't it finish right when it finds 10
>> distinct values?
>
> It will. It's just that Sorting requires fetching everything from its subnode.

Isn't it plain wrong to have a sort step in the plan than? The different
partitions contain different value ranges with no overlap, and the last
query I posted doesn't even contain an ORDER BY clause, just a DISTINCT
clause on an indexed column.

Even with bad estimates, even with seq scan instead of index scan, the
plan should be such that it concludes all parallel work as soon as it
finds the 10 distinct values. And this is actually achieved if I disable
parallel plans. Could it be a bug in the parallel plan generation?

Dimitris

pgsql-general by date:

From: Dimitrios Apostolou
Date: 13 May 2024, 12:27:57
Subject: Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions

From: Dimitrios Apostolou
Date: 13 May 2024, 12:46:12
Subject: Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions

Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions - Mailing list pgsql-general

Previous

Next