Home > mailing lists

Re: Fast insertion indexes: why no developments - Mailing list pgsql-hackers

From	Leonardo Francalanci
Subject	Re: Fast insertion indexes: why no developments
Date	November 5, 2013 14:28:58
Msg-id	1383661732562-5777009.post@n5.nabble.com Whole thread Raw
In response to	Re: Fast insertion indexes: why no developments (Claudio Freire <klaussfreire@gmail.com>)
Responses	Re: Fast insertion indexes: why no developments Re: Fast insertion indexes: why no developments
List	pgsql-hackers

Tree view

Claudio Freire wrote
> Min-max indexes always require a sequential scan of the min-max index
> itself when querying. 

I'm worried about the number of heap pages that will be scanned.
My understanding is that given the random input, the index will
not be selective enough, and will end up requiring a lot of page 
scanning to get the relevant rows.

That is: the "selectivity" of the min/max values will be very low, given
the high cardinality and randomness of the input; I'm afraid that, in
most "page ranges", the min will be lower than the queried ones,
and the max higher, resulting in lots of heap page scans.

Quick test:
a table with 1M rows, with random values from 1 to 10000000:
create table t as select generate_series(1, 1000000) as i, trunc(random() *
10000000) as n;

suppose a page range contains 100 rows (but my understanding is that minmax
index will use a much bigger row count...), let's find how many "page
ranges"
should be scanned to find a series of 50 values (again, random in the
1-10000000 range):

with cte as (select min(n) as minn, max(n) as maxn, i/100 from t group by i/100),inp as (select generate_series(1, 50)
iinp,trunc(random() * 10000000) as
 
s)select count(*) from inp  left outer join cte on inp.s between minn and maxn group by iinp


I get > 9000 pages for 49 values out of 50... which means scanning 90% of
the table.

Either my sql is not correct (likely), or my understanding of the minmax
index is
not correct (even more likely), or the minmax index is not usable in a
random inputs
scenario.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5777009.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

pgsql-hackers by date:

From: Alvaro Herrera
Date: 05 November 2013, 14:22:20
Subject: Re: logical column order and physical column order

From: Tom Lane
Date: 05 November 2013, 14:35:52
Subject: Re: [PATCH] configure: add git describe output to PG_VERSION when building a git tree

Re: Fast insertion indexes: why no developments - Mailing list pgsql-hackers

Previous

Next