Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions - Mailing list pgsql-general

From Dimitrios Apostolou
Subject Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
Date
Msg-id 3efce60b-48c0-160d-0444-474b02f76739@gmx.net
Whole thread Raw
In response to Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
List pgsql-general
On Sat, 11 May 2024, David Rowley wrote:

> On Sat, 11 May 2024 at 13:11, Dimitrios Apostolou <jimis@gmx.net> wrote:
>> Indeed that's an awful estimate, the table has more than 1M of unique
>> values in that column. Looking into pg_stat_user_tables, I can't see the
>> partitions having been vacuum'd or analyzed at all. I think they should
>> have been auto-analyzed, since they get a ton of INSERTs
>> (no deletes/updates though) and I have the default autovacuum settings.
>> Could it be that autovacuum starts, but never
>> finishes? I can't find something in the logs.
>
> It's not the partitions getting analyzed you need to worry about for
> an ndistinct estimate on the partitioned table. It's auto-analyze or
> ANALYZE on the partitioned table itself that you should care about.
>
> If you look at [1], it says "Tuples changed in partitions and
> inheritance children do not trigger analyze on the parent table."

Thanks

>
>> In any case, even after the planner decides to execute the terrible plan
>> with the parallel seqscans, why doesn't it finish right when it finds 10
>> distinct values?
>
> It will. It's just that Sorting requires fetching everything from its subnode.

Isn't it plain wrong to have a sort step in the plan than? The different
partitions contain different value ranges with no overlap, and the last
query I posted doesn't even contain an ORDER BY clause, just a DISTINCT
clause on an indexed column.

Even with bad estimates, even with seq scan instead of index scan, the
plan should be such that it concludes all parallel work as soon as it
finds the 10 distinct values. And this is actually achieved if I disable
parallel plans. Could it be a bug in the parallel plan generation?


Dimitris




pgsql-general by date:

Previous
From: Dimitrios Apostolou
Date:
Subject: Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
Next
From: Dimitrios Apostolou
Date:
Subject: Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions