Re: Fast insertion indexes: why no developments - Mailing list pgsql-hackers

From Claudio Freire
Subject Re: Fast insertion indexes: why no developments
Date
Msg-id CAGTBQpZg7Dpav4dKsL3CKYJ8Ur1PfZPuikapSntrSWnsYkCRBw@mail.gmail.com
Whole thread Raw
In response to Re: Fast insertion indexes: why no developments  (Leonardo Francalanci <m_lists@yahoo.it>)
Responses Re: Fast insertion indexes: why no developments  (Leonardo Francalanci <m_lists@yahoo.it>)
List pgsql-hackers
On Tue, Nov 5, 2013 at 6:57 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
> Simon Riggs wrote
>> Minmax indexes seem to surprise many people, so broad generalisations
>> aren't likely to be useful.
>>
>> I think the best thing to do is to publish some SQL requests that
>> demonstrate in detail what you are trying to achieve and test them
>> against minmax indexes. That way we can discuss what does work and
>> what doesn't work well enough yet.
>
> While I do believe in testing (since "In theory there is no difference
> between theory and practice. In practice there is"), I would like to know
> the "properties" of the minmax index before trying it.
> What is it supposed to be good at? What are the pros/cons? We can't ask all
> the users to just "try" the index and see if it works for them.
> As I said, my understanding is that is very efficient (both in insertion and
> in searching) when data is somehow ordered in the table. But maybe I got it
> wrong...

Well, for one, random inserts (with random data) on a min-max index
have a roughly 1/N chance of requiring a write to disk, and (N-1)/N
chance of being completely free (or maybe a read to verify a write
isn't needed, but that'll probably hit shared buffers), where N is the
number of tuples per page. Per index page that is.

Of course, non-random workloads are a different matter.

Min-max indexes always require a sequential scan of the min-max index
itself when querying. That works when you intend to query enough
tuples to make up the cost (that is, more tuples than M * N *
random_cost / seq_cost), where M is the number of pages in the index.
Well, actually, since they result in better io patterns as well, the
tradeoff is probably a little bit more tricky than that, in favor of
min-max indexes.

Min-max indexes tend to be very compact, so M is usually low.



pgsql-hackers by date:

Previous
From: Albe Laurenz
Date:
Subject: Re: UTF8 national character data type support WIP patch and list of open issues.
Next
From: Sameer Thakur
Date:
Subject: Re: pg_stat_statements: calls under-estimation propagation