Re: Hash Indexes - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Hash Indexes
Date
Msg-id CA+TgmoacGhy_MpNbTzeTgyOjyx7=BgjCjSq271nm4NE=+wQsaA@mail.gmail.com
Whole thread Raw
In response to Re: Hash Indexes  (Andres Freund <andres@anarazel.de>)
Responses Re: Hash Indexes
List pgsql-hackers
On Fri, Sep 16, 2016 at 2:38 PM, Andres Freund <andres@anarazel.de> wrote:
>> I think that exploring it well requires good code.  If the code is good,
>> why not commit it?
>
> Because getting there requires a lot of effort, debugging it afterwards
> would take effort, and maintaining it would also takes a fair amount?
> Adding code isn't free.

Of course not, but nobody's saying you have to be the one to put in
any of that effort.  I was a bit afraid that nobody outside of
EnterpriseDB was going to take any interest in this patch, and I'm
really pretty pleased by the amount of interest that it's generated.
It's pretty clear that multiple smart people are working pretty hard
to break this, and Amit is fixing it, and at least for me that makes
me a lot less scared that the final result will be horribly broken.
It will probably have some bugs, but they probably won't be worse than
the status quo:

WARNING: hash indexes are not WAL-logged and their use is discouraged

Personally, I think it's outright embarrassing that we've had that
limitation for years; it boils down to "hey, we have this feature but
it doesn't work", which is a pretty crummy position for the world's
most advanced open-source database to take.

> I'm rather unenthused about having a hash index implementation that's
> mildly better in some corner cases, but otherwise doesn't have much
> benefit. That'll mean we'll have to step up our user education a lot,
> and we'll have to maintain something for little benefit.

If it turns out that it has little benefit, then we don't really need
to step up our user education.  People can just keep using btree like
they do now and that will be fine.  The only time we *really* need to
step up our user education is if it *does* have a benefit.  I think
that's a real possibility, because it's pretty clear to me - based in
part on off-list conversations with Amit - that the hash index code
has gotten very little love compared to btree, and there are lots of
optimizations that have been done for btree that have not been done
for hash indexes, but which could be done.  So I think there's a very
good chance that once we fix hash indexes to the point where they can
realistically be used, there will be further patches - either from
Amit or others - which improve performance even more.  Even the
preliminary results are not bad, though.

Also, Oracle offers hash indexes, and SQL Server offers them for
memory-optimized tables.  DB2 offers a "hash access path" which is not
described as an index but seems to work like one.  MySQL, like SQL
Server, offers them only for memory-optimized tables.  When all of the
other database products that we're competing against offer something,
it's not crazy to think that we should have it, too - and that it
should actually work, rather than being some kind of half-supported
wart.

By the way, I think that one thing which limits the performance
improvement we can get from hash indexes is the overall slowness of
the executor.  You can't save more by speeding something up than the
percentage of time you were spending on it in the first place.  IOW,
if you're spending all of your time in src/backend/executor then you
can't be spending it in src/backend/access, so making
src/backend/access faster doesn't help much.  However, as the executor
gets faster, which I hope it will, the potential gains from a faster
index go up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: David Fetter
Date:
Subject: Re: PoC: Make it possible to disallow WHERE-less UPDATE and DELETE
Next
From: Robert Haas
Date:
Subject: Re: Declarative partitioning - another take