Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions - Mailing list pgsql-general

From Dimitrios Apostolou
Subject Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
Date
Msg-id 69077f15-4125-2d63-733f-21ce6eac4f01@gmx.net
Whole thread Raw
In response to SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions  (Dimitrios Apostolou <jimis@gmx.net>)
Responses Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
List pgsql-general
On Fri, 10 May 2024, Dimitrios Apostolou wrote:

> I noticed that the following query is very very slow (too long to wait for it
> to finish):
>
> SELECT DISTINCT workitem_n FROM test_runs_raw ORDER BY workitem_n DESC LIMIT 10;

Update: even the simplest SELECT DISTINCT query shows similar behaviour:

EXPLAIN
SELECT DISTINCT workitem_n FROM test_runs_raw LIMIT 10;

  Limit  (cost=724518979.52..724518979.92 rows=10 width=4)
    ->  Unique  (cost=724518979.52..724518987.52 rows=200 width=4)
          ->  Sort  (cost=724518979.52..724518983.52 rows=1600 width=4)               Sort Key:
test_runs_raw.workitem_n
                ->  Gather  (cost=724518732.37..724518894.37 rows=1600 width=4)
                      Workers Planned: 4
                      ->  HashAggregate  (cost=724517732.37..724517734.37 rows=200 width=4)
                            Group Key: test_runs_raw.workitem_n
                            ->  Parallel Append  (cost=0.00..704131546.90 rows=8154474186 width=4)
                                  ->  Parallel Index Only Scan using test_runs_raw__part_max9600k_pkey on
test_runs_raw__part_max9600ktest_runs_raw_480  (cost=0.57..1429238.50 rows=16811660 width=4) 
                                  ->  Parallel Index Only Scan using test_runs_raw__part_max10140k_pkey on
test_runs_raw__part_max10140ktest_runs_raw_507  (cost=0.57..1081827.27 rows=12896836 width=4) 
                                  ->  Parallel Seq Scan on test_runs_raw__part_max9500k test_runs_raw_475
(cost=0.00..2717185.06rows=32060806 width=4) 
                                  ->  Parallel Seq Scan on test_runs_raw__part_max11180k test_runs_raw_559
(cost=0.00..2610814.95rows=30806095 width=4) 


It also takes ages to return, so I have to interrupt it.

I believe it should exit early, as soon as it finds 10 distinct values
(which should be rather easy even with parallel seqscans, given the
pattern followed when inserting the data).


Thanks,
Dimitris



pgsql-general by date:

Previous
From: Dimitrios Apostolou
Date:
Subject: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
Next
From: Dimitrios Apostolou
Date:
Subject: Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions