Re: Hash Indexes - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Hash Indexes |
Date | |
Msg-id | CA+TgmoacGhy_MpNbTzeTgyOjyx7=BgjCjSq271nm4NE=+wQsaA@mail.gmail.com Whole thread Raw |
In response to | Re: Hash Indexes (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Hash Indexes
|
List | pgsql-hackers |
On Fri, Sep 16, 2016 at 2:38 PM, Andres Freund <andres@anarazel.de> wrote: >> I think that exploring it well requires good code. If the code is good, >> why not commit it? > > Because getting there requires a lot of effort, debugging it afterwards > would take effort, and maintaining it would also takes a fair amount? > Adding code isn't free. Of course not, but nobody's saying you have to be the one to put in any of that effort. I was a bit afraid that nobody outside of EnterpriseDB was going to take any interest in this patch, and I'm really pretty pleased by the amount of interest that it's generated. It's pretty clear that multiple smart people are working pretty hard to break this, and Amit is fixing it, and at least for me that makes me a lot less scared that the final result will be horribly broken. It will probably have some bugs, but they probably won't be worse than the status quo: WARNING: hash indexes are not WAL-logged and their use is discouraged Personally, I think it's outright embarrassing that we've had that limitation for years; it boils down to "hey, we have this feature but it doesn't work", which is a pretty crummy position for the world's most advanced open-source database to take. > I'm rather unenthused about having a hash index implementation that's > mildly better in some corner cases, but otherwise doesn't have much > benefit. That'll mean we'll have to step up our user education a lot, > and we'll have to maintain something for little benefit. If it turns out that it has little benefit, then we don't really need to step up our user education. People can just keep using btree like they do now and that will be fine. The only time we *really* need to step up our user education is if it *does* have a benefit. I think that's a real possibility, because it's pretty clear to me - based in part on off-list conversations with Amit - that the hash index code has gotten very little love compared to btree, and there are lots of optimizations that have been done for btree that have not been done for hash indexes, but which could be done. So I think there's a very good chance that once we fix hash indexes to the point where they can realistically be used, there will be further patches - either from Amit or others - which improve performance even more. Even the preliminary results are not bad, though. Also, Oracle offers hash indexes, and SQL Server offers them for memory-optimized tables. DB2 offers a "hash access path" which is not described as an index but seems to work like one. MySQL, like SQL Server, offers them only for memory-optimized tables. When all of the other database products that we're competing against offer something, it's not crazy to think that we should have it, too - and that it should actually work, rather than being some kind of half-supported wart. By the way, I think that one thing which limits the performance improvement we can get from hash indexes is the overall slowness of the executor. You can't save more by speeding something up than the percentage of time you were spending on it in the first place. IOW, if you're spending all of your time in src/backend/executor then you can't be spending it in src/backend/access, so making src/backend/access faster doesn't help much. However, as the executor gets faster, which I hope it will, the potential gains from a faster index go up. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: