"micro bucket sort" ... - Mailing list pgsql-hackers
From | Hans-Jürgen Schönig |
---|---|
Subject | "micro bucket sort" ... |
Date | |
Msg-id | F90E257C-AD3B-4928-A90E-1DE67D432741@cybertec.at Whole thread Raw |
Responses |
Re: "micro bucket sort" ...
Re: "micro bucket sort" ... Re: "micro bucket sort" ... |
List | pgsql-hackers |
hello all ... i am bugged with a small issue which is basically like this ... test=# create table t_test as select x, x % 5 as y from generate_series(1, 1000000) AS x; SELECT test=# create index idx_aaaaa on t_test (x) ; CREATE INDEX test=# ANALYZE ; ANALYZE test=# explain analyze select * from t_test order by x; QUERYPLAN ------------------------------------------------------------------------------------------------------------------------------------Index Scanusing idx_aaaaa on t_test (cost=0.00..30408.36 rows=1000000 width=8) (actual time=0.057..311.832 rows=1000000 loops=1)Totalruntime: 392.943 ms (2 rows) we know that we get sorted output from the index and thus we do the index traversal here ... if you add a condition to the sorting you will naturally get a sort in postgres because y is clearly now known to be sorted. test=# explain analyze select * from t_test order by x, y; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------Sort (cost=141431.84..143931.84 rows=1000000 width=8) (actual time=1086.014..1271.257 rows=1000000 loops=1) Sort Key: x, y SortMethod: external sort Disk: 17608kB -> Seq Scan on t_test (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.024..143.474rows=1000000 loops=1)Total runtime: 1351.848 ms (5 rows) same with limit ... test=# explain analyze select * from t_test order by x, y limit 20; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------Limit (cost=41034.64..41034.69 rows=20 width=8) (actual time=317.939..317.943 rows=20 loops=1) -> Sort (cost=41034.64..43534.64rows=1000000 width=8) (actual time=317.934..317.936 rows=20 loops=1) Sort Key: x, y Sort Method: top-N heapsort Memory: 26kB -> Seq Scan on t_test (cost=0.00..14425.00 rows=1000000 width=8) (actualtime=0.019..144.109 rows=1000000 loops=1)Total runtime: 317.995 ms (6 rows) now, the problem is: i cannot easily create additional indexes as i have too many possible "second" conditions here. what makes it even more funny: i don't have enough space to do the resort of the entire thing (X TB). so, a more expensive index traversal is my only option. my question is: is there already a concept out there to make this work or does anybody know of a patch out there addressingan issue like that? some idea is heavily appreciated. it seems our sort key infrastructure is not enough for this. many thanks, hans -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
pgsql-hackers by date: