Re: Fast insertion indexes: why no developments - Mailing list pgsql-hackers

From Leonardo Francalanci
Subject Re: Fast insertion indexes: why no developments
Date
Msg-id 1383661732562-5777009.post@n5.nabble.com
Whole thread Raw
In response to Re: Fast insertion indexes: why no developments  (Claudio Freire <klaussfreire@gmail.com>)
Responses Re: Fast insertion indexes: why no developments  (Claudio Freire <klaussfreire@gmail.com>)
Re: Fast insertion indexes: why no developments  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
Claudio Freire wrote
> Min-max indexes always require a sequential scan of the min-max index
> itself when querying. 

I'm worried about the number of heap pages that will be scanned.
My understanding is that given the random input, the index will
not be selective enough, and will end up requiring a lot of page 
scanning to get the relevant rows.

That is: the "selectivity" of the min/max values will be very low, given
the high cardinality and randomness of the input; I'm afraid that, in
most "page ranges", the min will be lower than the queried ones,
and the max higher, resulting in lots of heap page scans.

Quick test:
a table with 1M rows, with random values from 1 to 10000000:
create table t as select generate_series(1, 1000000) as i, trunc(random() *
10000000) as n;

suppose a page range contains 100 rows (but my understanding is that minmax
index will use a much bigger row count...), let's find how many "page
ranges"
should be scanned to find a series of 50 values (again, random in the
1-10000000 range):

with cte as (select min(n) as minn, max(n) as maxn, i/100 from t group by i/100),inp as (select generate_series(1, 50)
iinp,trunc(random() * 10000000) as
 
s)select count(*) from inp  left outer join cte on inp.s between minn and maxn group by iinp


I get > 9000 pages for 49 values out of 50... which means scanning 90% of
the table.

Either my sql is not correct (likely), or my understanding of the minmax
index is
not correct (even more likely), or the minmax index is not usable in a
random inputs
scenario.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5777009.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: logical column order and physical column order
Next
From: Tom Lane
Date:
Subject: Re: [PATCH] configure: add git describe output to PG_VERSION when building a git tree