Re: GSoC on WAL-logging hash indexes - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: GSoC on WAL-logging hash indexes
Date
Msg-id CAMkU=1yhg806uMaRsJeaHgcUJa45+n=DMEt6BGwq9d_bRsDi8A@mail.gmail.com
Whole thread Raw
In response to Re: GSoC on WAL-logging hash indexes  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Wed, Apr 30, 2014 at 11:19 AM, Peter Geoghegan <pg@heroku.com> wrote:
On Wed, Apr 30, 2014 at 11:03 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> If we don't put in the work to make them useful, then they won't ever become
> useful.
>
> If we do put in the effort (and it would be considerable) then I think they
> will be.  But you may be correct that the effort required would perhaps be
> better used in making btree even more better.  I don't think we can conclude
> that definitively without putting in the work to do the experiment.

My argument doesn't hinge on there being more important work to do.
Rather, I simply don't think that there is never going to be a
compelling reason to use hash indexes in production.

I have an indexed text column with an average length of 50, a stddev length of 15, and a pronounced right skew.  Currently, the longest value in it is 811.  But inevitably someone will need to insert something longer than 2712.  When that day comes, I will drop the btree index and add a hash index (unless we remove that limitation from btree indexes in the mean time).  It lets me sleep at night knowing that I have that option today, even if it would complicate crash recovery.
 

Apart from the
obvious inflexibility, consider what it takes to make index creation
fast - insertion-style building of indexes is much slower. Consider
multi-key indexes.

I'm pretty sure hash indexes already implement a bulk creation fast path.  In any case, I've never noticed them being slow, and I've tested some pretty big ones.

 
Now, I'm not telling anyone what to work on, and if someone wants to
make hash indexes WAL-logged to plug that hole, don't let me stop you.
It probably makes sense as a project to learn more about Postgres
internals. However, it would be unfair to not speak up given my
misgivings around the practical utility of hash indexes.

Sure, and we all have our own opinions on that.  Should we summarize them somewhere easier to follow than a long email thread but more detailed than a TODO entry?  Whatever happened with the GSOC people?  That should be well under way by now, is anyone working on it?  Are the discussions of their efforts on-list, or is it between them and their mentors?

Cheers,

Jeff

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: GSoC on WAL-logging hash indexes
Next
From: Jeff Janes
Date:
Subject: Re: GSoC on WAL-logging hash indexes