Re: Minmax indexes - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Minmax indexes |
Date | |
Msg-id | CAA4eK1J3eMAoRTk2CVoJ_csHB8=4xcM823ptJ6LQ9bj_uhEojA@mail.gmail.com Whole thread Raw |
In response to | Minmax indexes (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Responses |
Re: Minmax indexes
|
List | pgsql-hackers |
On Sun, Sep 15, 2013 at 5:44 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Hi, > > Here's a reviewable version of what I've dubbed Minmax indexes. Some > people said they would like to use some other name for this feature, but > I have yet to hear usable ideas, so for now I will keep calling them > this way. I'm open to proposals, but if you pick something that cannot > be abbreviated "mm" I might have you prepare a rebased version which > renames the files and structs. > > The implementation here has been simplified from what I originally > proposed at 20130614222805.GZ5491@eldon.alvh.no-ip.org -- in particular, > I noticed that there's no need to involve aggregate functions at all; we > can just use inequality operators. So the pg_amproc entries are gone; > only the pg_amop entries are necessary. > > I've somewhat punted on the question of doing resummarization separately > from vacuuming. Right now, resummarization (as well as other necessary > index cleanup) takes place in amvacuumcleanup. This is not optimal; I > have stated elsewhere that I'd like to create separate maintenance > actions that can be carried out by autovacuum. That would be useful > both for Minmax indexes and GIN indexes (pending insertion list); maybe > others. That's not part of this patch, however. > > The design of this stuff is in the file "minmax-proposal" at the top of > the tree. That file is up to date, though it still contains some open > questions that were present in the original proposal. (I have not fixed > some bogosities pointed out by Noah, for instance. I will do that > shortly.) In a final version, that file would be applied as > src/backend/access/minmax/README, most likely. > > One area on which I needed to modify core code is IndexBuildHeapScan. I > needed a version that was able to scan only a certain range of pages, > not the entire table, so I introduced a new IndexBuildHeapRangeScan, and > added a quick "heap_scansetlimits" function. I haven't tested that this > works outside of the HeapRangeScan thingy, so it's probably completely > bogus; I'm open to suggestions if people think this should be > implemented differently. In any case, keeping that implementation > together with vanilla IndexBuildHeapScan makes a lot of sense. > > One thing still to tackle is when to mark ranges as unsummarized. Right > now, any new tuple on a page range would cause a new index entry to be > created and a new revmap update. This would cause huge index bloat if, > say, a page is emptied and vacuumed and filled with new tuples with > increasing values outside the original range; each new tuple would > create a new index tuple. I have two ideas about this (1. mark range as > unsummarized if 3rd time we touch the same page range; Why only at 3rd time? Doesn't it need to be precise, like if someone inserts a row having value greater than max value of corresponding index tuple, then that index tuple's corresponding max value needs to be updated and I think its updated with the help of validity map. For example: considering we need to store below info for each index tuple: In each index tuple (corresponding to onepage range), we store: - first block this tuple applies to - last block this tuple applies to - for each indexedcolumn: * min() value across all tuples in the range * max() value across all tuples in the range Assume first and last block for index tuple is same (assume block no. 'x') and min value is 5 and max is 10. Now user insert/update value in block 'x' such that max value of index col. is 11, if we don't update corresponding index tuple or at least invalidate it, won't it lead to wrong results? > 2. vacuum the > affected index page if it's full, so we can maintain the index always up > to date without causing unduly bloat), but I haven't implemented > anything yet. > > The "amcostestimate" routine is completely bogus; right now it returns > constant 0, meaning the index is always chosen if it exists. I think for first version, you might want to keep things simple, but there should be some way for optimizer to select this index. So rather than choose if it is present, we can make optimizerchoose when some-one says set enable_minmax index to true. How about keeping this up-to-date during foreground operations. Vacuum/Maintainer task maintaining things usually have problems of bloat and then we need optimize/workaround issues. Lot of people have raised this or similar point previously and what I read you are of opinion that it seems to be slow. I really don't think that it can be so slow that adding so much handling to get it up-to-date by some maintainer task is useful. Currently there are systems like Oracle where index clean-up is mainly done during foreground operation, so this alone cannot be reason for slowness. Comparing the logic with IOS is also not completely right as for IOS, we need to know each tuple's visibility, which is not the case here. Now it can so happen that min and max values are sometimes not right because later the operation is rolled back, but I think such cases will be less and we can find some way to handle such cases may be maintainer task only, but the handling will be quite simpler. On Windows, patch gives below compilation errors: src\backend\access\minmax\mmtuple.c(96): error C2057: expected constant expression src\backend\access\minmax\mmtuple.c(96): error C2466: cannot allocate an array of constant size 0 src\backend\access\minmax\mmtuple.c(96): error C2133: 'values' : unknown size src\backend\access\minmax\mmtuple.c(97):error C2057: expected constant expression src\backend\access\minmax\mmtuple.c(97): error C2466: cannot allocate an array of constant size 0 src\backend\access\minmax\mmtuple.c(97): error C2133: 'nulls' : unknown size src\backend\access\minmax\mmtuple.c(102):error C2057: expected constant expression src\backend\access\minmax\mmtuple.c(102): error C2466: cannot allocate an array of constant size 0 src\backend\access\minmax\mmtuple.c(102): error C2133: 'phony_nullbitmap' : unknown size src\backend\access\minmax\mmtuple.c(110): warning C4034: sizeof returns 0 src\backend\access\minmax\mmtuple.c(246):error C2057: expected constant expression src\backend\access\minmax\mmtuple.c(246): error C2466: cannot allocate an array of constant size 0 src\backend\access\minmax\mmtuple.c(246): error C2133: 'values' : unknown size src\backend\access\minmax\mmtuple.c(247):error C2057: expected constant expression src\backend\access\minmax\mmtuple.c(247): error C2466: cannot allocate an array of constant size 0 src\backend\access\minmax\mmtuple.c(247): error C2133: 'allnulls' : unknown size src\backend\access\minmax\mmtuple.c(248): error C2057: expected constant expression src\backend\access\minmax\mmtuple.c(248): error C2466: cannot allocate an array of constant size 0 src\backend\access\minmax\mmtuple.c(248): error C2133: 'hasnulls' : unknown size With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: