Re: DISTINCT with btree skip scan - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: DISTINCT with btree skip scan
Msg-id: CA+Tgmobb3uN0xDqTRu7f7WdjGRAXpSFxeAQnvNr=OK5_kC_SSg@mail.gmail.com
In response to: Re: DISTINCT with btree skip scan (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-hackers
On Wed, Nov 23, 2016 at 4:19 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Oct 12, 2016 at 4:19 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> Here it is, now split into two parts: one patch to add an amskip
>> operation, and another to consider using it to implement DISTINCT
>> when it already has an index only scan path on an index that
>> supports skipping.
>
> Those patches add an amskip operation and then teach the planner and
> executor how to do this, given a table foo (a, b) with an index on
> (a) or (a, b):
>
>   SELECT DISTINCT a FROM foo
>
>   Index Only Scan for distinct values of (a) using foo_pkey on foo

Cool!

> In just the right circumstances that is vastly faster than scanning
> every tuple and uniquifying.  I was thinking that the next step in
> this experiment might be to introduce a kind of plan that can handle
> queries where the index's leading column is not in the WHERE clause,
> building on the above plan, and behaving conceptually a little bit
> like a Nested Loop, like so:
>
>   SELECT * FROM foo WHERE b = 42
>
>   Index Skip Scan
>     ->  Index Only Scan for distinct values of (a) using foo_pkey on foo
>     ->  Index Only Scan using foo_pkey on foo
>           Index Cond: ((a = outer.a) AND (b = 42))

Blech, that seems really ugly and probably inefficient.

> I suppose you could instead have a single node that knows how to
> perform the whole scan itself so that you don't finish up searching
> for each distinct value in both the inner and outer subplans, but it
> occurred to me that building the plan out of Lego bricks like this
> might generalise better to more complicated queries.  I suppose it
> might also parallelise nicely with a Parallel Index Only Scan as the
> outer plan.

I think putting all the smarts in a single node is likely to be
better, faster, and cleaner.

This patch has gotten favorable comments from several people, but
somebody really needs to REVIEW it.  Oh, well, since I'm here...

+       if (ScanDirectionIsForward(dir))
+       {
+               so->currPos.moreLeft = false;
+               so->currPos.moreRight = true;
+       }
+       else
+       {
+               so->currPos.moreLeft = true;
+               so->currPos.moreRight = false;
+       }

The lack of comments makes it hard for me to understand what the
motivation for this is, but I bet it's wrong.  Suppose there's a
cursor involved here and the user tries to back up.

Instead of having a separate amskip operation, maybe there should be
a flag attached to the scan indicating whether it should return only
distinct results.  Otherwise, you're allowing for the possibility
that the same scan might sometimes skip and other times not skip, but
then it seems hard for the scan to survive cursor operations.
Anyway, why would that be useful?
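To sketch the flag idea concretely (the field name here is
hypothetical, not something in the patch):

    /* hypothetical addition to IndexScanDescData in access/relscan.h */
    bool        xs_distinct_only;   /* if set, return only the first
                                     * tuple for each distinct value of
                                     * the scan's key prefix */

Then btgettuple() would consult the flag on every call and do the
skipping internally, in either direction, so a cursor could be walked
forwards and backwards over the same scan without the executor having
to issue separate skip operations.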
+               if (ScanDirectionIsForward(dir))
+                       offnum = OffsetNumberPrev(offnum);
+               if (!_bt_readpage(scan, dir, offnum))
+               {
+                       if (!_bt_steppage(scan, dir))
+                       {
+                               return false;
+                       }
+               }

As the TODO:TM comment at the top of the function sort of implies
without directly coming out and saying so, this is ugly and suggests
that you've picked the wrong API.  If the idea I proposed above isn't
appealing, I still think we need to get rid of this by some other
means.

+       /*
+        * TODO:TM For now, we use _bt_search to search from the root; in
+        * theory we should be able to do a local traversal ie from the
+        * current page, but I don't know if it would actually be better
+        * in general.
+        */

I think that this depends a lot on the number of duplicates.  If we
end up merely fast-forwarding within the same page to find the next
unique value, re-traversing from the root sucks.  However, if the
next distinct key *isn't* on the same page, then traversing from the
root is a good bet.  A typical btree is only a few levels deep (like
3), so once you don't find what you're looking for on the same page,
it's probably fairly sound to re-traverse from the root.  Of course
you lose at least a little bit in the case where you would have found
the next distinct key within a page or two, but in other cases you
win big.  I wonder if that would be a suitable heuristic: first check
the current page, and if all the keys there are equal, re-traverse
from the root.
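To make that concrete, in rough pseudo-C (hand-waving away buffer
locking, and with made-up helper names):

    /*
     * Sketch of the heuristic only, not real nbtree code: position the
     * scan on the next key that differs from the current one, checking
     * the page we already have before paying for a descent from the
     * root.
     */
    static bool
    _bt_skip_to_next_distinct(IndexScanDesc scan, ScanDirection dir)
    {
        BTScanOpaque so = (BTScanOpaque) scan->opaque;

        /* Cheap case: some remaining key on this page already differs. */
        if (current_page_has_distinct_key(so, dir))     /* hypothetical */
            return step_within_page(scan, dir);         /* hypothetical */

        /*
         * All remaining keys on this page are equal, so the next
         * distinct key must be on some other page; re-descend from the
         * root, as the patch already does via _bt_search().
         */
        return skip_via_root_descent(scan, dir);        /* hypothetical */
    }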
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company