Re: [HACKERS] PG10 Crash-safe and replicable Hash Indexes and UNIQUE - Mailing list pgsql-hackers
| From | Amit Kapila |
|---|---|
| Subject | Re: [HACKERS] PG10 Crash-safe and replicable Hash Indexes and UNIQUE |
| Date | |
| Msg-id | CAA4eK1+f=KpUW3TOW0P9LUCA8xhFujcjKhEnYRtoaB83LSsSMg@mail.gmail.com |
| In response to | Re: [HACKERS] PG10 Crash-safe and replicable Hash Indexes and UNIQUE (Chapman Flack <chap@anastigmatix.net>) |
| Responses | Re: [HACKERS] PG10 Crash-safe and replicable Hash Indexes and UNIQUE |
| List | pgsql-hackers |
On Mon, May 22, 2017 at 8:31 AM, Chapman Flack <chap@anastigmatix.net> wrote:
> On 05/19/17 11:41, Tom Lane wrote:
>
>> No, nobody's done anything about allowing hash indexes to support
>> uniqueness AFAIK. I don't have a clear picture of how much work
>> it would be, but it would likely be more than trivial effort;
>
> I see what you mean. Because of the way hash values are ordered
> (to allow binary search) within a page, but not between pages of
> a bucket, insertion as it stands now is able to stop as soon as
> it finds any page with room for the entry, but a unique-insertion
> will have to check every page of the bucket for matching hashes,
> and then (because only the hash and tid are in the index) chase
> any of those to the heap to compare the value.
>
> Maybe both hash collisions and overflow pages are rare enough
> in practice with reasonable data that the performance impact
> of that would be small, but still the possibility has to be
> accounted for, and the locking may get hairier (do you now keep
> the lock you have on the page where room was found for the entry,
> and use another lock to walk the remaining pages until you are
> sure there's no duplicate?).
>
> At least I see that interest in UNIQUE for hash indexes has been
> shown on -hackers several times over the years, and is on the TODO.
> Neil Conway seems to have had an idea [1] for making the locking work,
> 14 years ago (however relevant that might be to today's code).
>

I think that won't be directly applicable now, as we have changed the
locking mechanism: instead of acquiring heavyweight locks it uses
LWLocks, and for inserts we don't hold a lock on more than one bucket
page at a time. To make the unique check work, there are a few
options:

(a) Unique insertions can always acquire a cleanup lock on the bucket
page. This will work because scans (to find duplicate values in
overflow pages) will hold a pin on the bucket page. The drawback is
that unique insertions need to wait even when some unrelated scan is
in progress.

(b) Unique insertions can hold locks on bucket pages while traversing
the bucket chain. Retaining locks on multiple bucket pages is bad for
concurrency, but in practice, unique indexes won't have many overflow
pages.

(c) Keep a flag in the bucket page to indicate that a unique-index
insertion is in progress; if such a flag is set, other insertions need
to wait. I don't think this is a preferable way, because we would need
to take care of clearing such a flag not only after the operation but
also after crash recovery.

Any better ideas?
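Whatever the locking scheme, the scan itself has the shape you
describe above: binary search within each page, a linear walk across
the bucket's overflow pages, and a heap recheck whenever a hash code
matches. Here is a toy, self-contained model of that check
(illustrative C only; all names and layouts are made up, not the
actual hash AM code):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_CAPACITY 4

/* Toy stand-in for a hash index page; not a PostgreSQL structure. */
typedef struct Page
{
    uint32_t    hashes[PAGE_CAPACITY];      /* sorted within the page */
    const char *heap_values[PAGE_CAPACITY]; /* stands in for heap TIDs */
    int         nitems;
    struct Page *next_overflow;             /* bucket chain */
} Page;

/* Binary search: first slot whose hash is >= h.  Valid within one page
 * only, because pages of a bucket are not ordered relative to each
 * other. */
static int
page_lower_bound(const Page *p, uint32_t h)
{
    int lo = 0, hi = p->nitems;

    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (p->hashes[mid] < h)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* True if some entry in the bucket chain duplicates (h, value).  Every
 * page must be visited; no early exit is possible on a miss. */
static bool
bucket_has_duplicate(const Page *bucket, uint32_t h, const char *value)
{
    for (const Page *p = bucket; p != NULL; p = p->next_overflow)
    {
        for (int i = page_lower_bound(p, h);
             i < p->nitems && p->hashes[i] == h;
             i++)
        {
            /* A matching hash is not enough: chase the "TID" to the
             * heap and compare the actual value. */
            if (strcmp(p->heap_values[i], value) == 0)
                return true;
        }
    }
    return false;
}

int
main(void)
{
    Page overflow = {{42}, {"apple"}, 1, NULL};
    Page bucket = {{7, 42}, {"pear", "plum"}, 2, &overflow};

    /* "plum" collides on hash 42 but differs in heap value, and the
     * real duplicate sits on the overflow page. */
    printf("%d\n", bucket_has_duplicate(&bucket, 42, "apple")); /* 1 */
    printf("%d\n", bucket_has_duplicate(&bucket, 42, "mango")); /* 0 */
    return 0;
}

If collisions and overflow pages are rare, the inner loop inspects at
most a few entries, so the heap recheck might indeed be cheap in
practice, as you suggest.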
> ... and one inquiry last year [2] did seem to get tabled because of the
> lack of WAL logging, which is now a non-blocker.
>
> I haven't seen much discussion of /why/ one would want hash-based UNIQUE.
> I know my own reasons, but I'm not sure how persuasive they are in light
> of the implementation realities, so maybe that makes such a discussion
> worthwhile. I can start; these are the two reasons I had:
>
> 1. To a naive intuition (especially one raised more on in-memory data
> structures than the guts of databases), it just seems natural:
> hashing seems like the canonical approach to uniqueness testing
> where there's no need for ordering, intuition suggests a performance
> advantage, and so the least-astonishment principle suffers upon finding
> it isn't supported.
>
> 2. When developing a custom data type, it feels like tedious
> busy-work to have to bang out a full set of ordering operators
> for a btree operator class if there is no meaningful order for
> the type.
>
> Maybe the intuitions behind (1) are just misinformed, the performance
> ones at least, in light of Craig Ringer's low opinion of whether "hash
> indexes are better than btree for anything" [3], and André Barbosa's
> more recent performance comparison [4] (which does show some advantages
> for hash in some circumstances, but mostly not large. The only large
> advantage was in initial creation; would that be hashsort.c at work?).
>
> But then, both [3] and [4] predate the recent projects on hash indexes
> that have "made them crash safe and are on the way to making them
> performant" [5], so maybe an updated comparison would be timely, or some
> addition to the docs to better characterize the circumstances where hash
> could be good.
>

I think the performance characteristics of hash indexes have changed,
especially for read-only cases, after the recent work. We have done
many tests which indicate that hash indexes perform better than btree
when the column value is unique. You might want to try that for your
use case.

> (Every index method newer than btree and hash has its own
> part VII Internals chapter; for completeness, might it make sense to have
> those for btree and hash also, even if only to broadly discuss
> the conditions under which they perform especially well or poorly?)
>
> For all sorts of indexes, would there be any use for some CREATE INDEX
> syntax for a multicolumn index to say that some of its rightmost columns
> aren't there to participate in the indexing scheme, but only to benefit
> index-only scans? Applied to a hash index, that might offer another useful
> kind of multicolumn support, which otherwise seems limited to queries
> where you have the exact values of all indexed columns.
>

Agreed, but even if we had such syntax, making it work for hash
indexes would be tricky, because we currently store the hash code in
the index, not the original index key.
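To illustrate why: conceptually, a hash index entry carries only the
hash code and the heap pointer, so not even the indexed key value, let
alone extra payload columns, can be returned by an index-only scan.
(The struct below is a simplified model, not the actual on-disk index
tuple layout.)

/* Simplified model of a hash index entry; not the real layout. */
typedef struct HashEntryModel
{
    uint32_t hash_code;    /* hash of the key; the key itself is absent */
    uint64_t heap_pointer; /* block number + offset of the heap tuple */
} HashEntryModel;

Supporting payload columns would mean widening this tuple format, not
just adding new syntax.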
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com