Claudio Freire wrote:
> Min-max indexes always require a sequential scan of the min-max index
> itself when querying.
I'm worried about the number of heap pages that will be scanned.
My understanding is that, given random input, the index will not be
selective enough and will end up requiring a lot of page scanning
to get to the relevant rows.

That is: the "selectivity" of the min/max values will be very low, given
the high cardinality and randomness of the input. I'm afraid that in
most "page ranges" the min will be lower than the queried value and the
max higher, resulting in lots of heap page scans.
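As a sanity check on that intuition, here is a small simulation in Python rather than SQL (the 100-rows-per-range and 10M-value-domain figures are assumptions chosen to match the SQL test below):

```python
import random

random.seed(42)
V = 10_000_000        # size of the value domain (assumption, as below)
N_RANGES = 10_000     # 1M rows at 100 rows per "page range"
PER_RANGE = 100

# Summarize each page range of random values by its (min, max) pair,
# mimicking what a min-max index stores per range.
summaries = []
for _ in range(N_RANGES):
    vals = [random.randrange(V) for _ in range(PER_RANGE)]
    summaries.append((min(vals), max(vals)))

# For 50 random point queries, count how many ranges' [min, max]
# intervals contain the queried value, i.e. would need scanning.
fracs = []
for _ in range(50):
    q = random.randrange(V)
    hits = sum(1 for mn, mx in summaries if mn <= q <= mx)
    fracs.append(hits / N_RANGES)

# In expectation this is (n-1)/(n+1) ~ 0.98 for n = 100 uniform rows
# per range, i.e. nearly every range must be scanned.
frac = sum(fracs) / len(fracs)
print(f"average fraction of page ranges scanned: {frac:.3f}")
```

This is only a sketch of the same experiment outside the database, but it shows the effect does not depend on the SQL being right.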
Quick test: a table with 1M rows, with random values drawn from a
10M-value range:

create table t as
  select generate_series(1, 1000000) as i,
         trunc(random() * 10000000) as n;
Suppose a page range contains 100 rows (my understanding is that a
minmax index would actually use a much bigger row count). Let's find how
many page ranges would have to be scanned to look up each of a series of
50 random values from the same range:

with cte as (
    select min(n) as minn, max(n) as maxn, i/100
    from t
    group by i/100
), inp as (
    select generate_series(1, 50) as iinp,
           trunc(random() * 10000000) as s
)
select count(*)
from inp left outer join cte on inp.s between minn and maxn
group by iinp;
I get > 9000 page ranges for 49 of the 50 values, which means scanning
over 90% of the table.

Either my SQL is not correct (likely), or my understanding of the minmax
index is not correct (even more likely), or the minmax index is not
usable in a random-input scenario.
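For what it's worth, the >9000-of-10000 figure lines up with a back-of-envelope expectation: for n i.i.d. uniform values per page range over a domain of size V, the expected min and max are V/(n+1) and nV/(n+1), so the expected fraction of the domain covered by one [min, max] interval is

```latex
\mathbb{E}\!\left[\frac{\max - \min}{V}\right]
  = \frac{n}{n+1} - \frac{1}{n+1}
  = \frac{n-1}{n+1}
  \approx 0.98 \qquad (n = 100)
```

so a uniformly random query value should land inside roughly 98% of the page ranges, consistent with the observed result.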
--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5777009.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.