Re: [RFC] Minmax indexes - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: [RFC] Minmax indexes |
Date | |
Msg-id | 20130617202340.GF3537@eldon.alvh.no-ip.org Whole thread Raw |
In response to | Re: [RFC] Minmax indexes (Josh Berkus <josh@agliodbs.com>) |
Responses |
Re: [RFC] Minmax indexes
|
List | pgsql-hackers |
Josh Berkus wrote: > > Value changes in columns that are part of a minmax index, and tuple insertion > > in summarized pages, would invalidate the stored min/max values. To support > > this, each minmax index has a validity map; a range can only be considered in a > > scan if it hasn't been invalidated by such changes (A range "not considered" in > > the scan needs to be returned in whole regardless of the stored min/max values, > > that is, it cannot be pruned per query quals). The validity map is very > > similar to the visibility map in terms of performance characteristics: quick > > enough that it's not contentious, allowing updates and insertions to proceed > > even when data values violate the minmax index conditions. An invalidated > > range can be made valid by re-summarization (see below). > > This begins to sound like these indexes are only useful on append-only > tables. Not that there aren't plenty of those, but ... But what? This is a useful use-case; one that's not served by any other type of index. Sure, you can have btrees over append-only tables, but they are really large and have huge maintainance costs. A smaller lossy index is useful if low-maintenance and if it avoids keeping the cost of scanning the table low enough. These indexes can be considered a sort of partitioning of a large table. Only instead of having to define CHECK (insert_date >= 'a month') for each partition, you just create the index on the insert_date column, and the index will allow a seqscan of the table to skip pages that don't match the query's WHERE clause on that column. > > Re-summarization is relatively expensive, because the complete page range has > > to be scanned. > > Why? Why can't we just update the affected pages in the index? The page range has to be scanned in order to find out the min/max values for the indexed columns on the range; and then, with these data, update the index. > > To avoid this, a table having a minmax index would be > > configured so that inserts only go to the page(s) at the end of the table; this > > avoids frequent invalidation of ranges in the middle of the table. We provide > > a table reloption that tweaks the FSM behavior, so that summarized pages are > > not candidates for insertion. > > We haven't had an index type which modifies table insertion behavior > before, and I'm not keen to start now; imagine having two indexes on the > same table each with their own, conflicting, requirements. This is not a requirement. It merely makes the index more effective. > If we're going to start adding reloptions for specific table behavior, > I'd rather think of all of the optimizations we might have for a > prospective "append-only table" and bundle those, rather than tying it > to whether a certain index exists or not. Eh? Sure, my intention for this reloption is for the user to be able to state their intention for the table, and each feature that has append-only table optimization does its thing. I wasn't thinking in anything automatic. > Also, I hate the name ... Feel free to propose other names; that way I can hate your proposals too (or maybe not). -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: