Thread: Save Hash Indexes
Hi,

Here's an idea: when a user asks for a hash index, transparently build a
BTree index over a hash function instead.

Advantages:

- it works
- it's crash safe
- it's (much?) faster than a hash index anyway

Drawbacks:

- root access concurrency
- we need a hash_any function stable against pg_upgrade

After talking about it with Heikki, we don't seem to find ways in which
the root access concurrency pattern would be visible enough to matter.

Also, talking with Peter Geoghegan, it's unclear that there's a use case
where a hash index would be faster than a btree index over the hash
function.

Comments?
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Fri, Nov 01, 2013 at 01:31:10PM +0000, Dimitri Fontaine wrote:
> Hi,
>
> Here's an idea: when a user ask for an Hash Index transparently build a
> BTree index over an hash function instead.
>
> Advantages:
>
> - it works
> - it's crash safe
> - it's (much?) faster than a hash index anyways
>
> Drawbacks:
>
> - root access concurrency
> - we need a hash_any function stable against pg_upgrade
>
> After talking about it with Heikki, we don't seem to find ways in which
> the root access concurrency pattern would be visible enough to matter.
>
> Also, talking with Peter Geoghegan, it's unclear that there's a use case
> where a hash index would be faster than a btree index over the hash
> function.
>
> Comments?
> --
> Dimitri Fontaine
> http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

Hi Dimitri,

Using a function index as a "safe" substitute for a hash index has been
the standard advice for a while; check the previous threads. It is not a
true substitute, because a hash index is O(1) for lookups while a btree
is O(log n), so hash indexes have an advantage for very large numbers of
entries. In fact, a recent post compared both the btree substitute and
the current hash index, and for large indexes the hash index allowed
twice the lookups of the equivalent btree, which is what you would
expect. The use case is exactly very large tables/indexes where the
index does not fit in memory, to say nothing of the data itself.

Regards,
Ken
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
> Here's an idea: when a user ask for an Hash Index transparently build a
> BTree index over an hash function instead.

-1.  If someone asks for a hash index, they should get a hash index.
If you feel the documentation isn't sufficiently clear about the problems
involved, we can work on that.

The bigger picture here is that such an approach amounts to deciding that
no one will ever be allowed to fix hash indexes.  I'm not for that, even
if I'm not volunteering to be the fixer myself.

I also don't believe your claim that this would always be faster than a
real hash index.  What happened to O(1) vs O(log N)?

Lastly: what real-world problem are we solving by kicking that code
to the curb?

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> -1.  If someone asks for a hash index, they should get a hash index.
> If you feel the documentation isn't sufficiently clear about the problems
> involved, we can work on that.

Fair enough.

> Lastly: what real-world problem are we solving by kicking that code
> to the curb?

Well, how long have we been waiting for the fix already? The lack of
crash safety and replication support is not just a drawback; it's
killing user adoption.

Maybe we should implement UNLOGGED indexes, and then hash indexes would
only exist in the UNLOGGED fashion?

But well, yeah, ok, my idea is a non-starter.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 11/01/2013 09:49 AM, Tom Lane wrote:
> Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
>> Here's an idea: when a user ask for an Hash Index transparently build a
>> BTree index over an hash function instead.
>
> -1.  If someone asks for a hash index, they should get a hash index.
> If you feel the documentation isn't sufficiently clear about the problems
> involved, we can work on that.
>
> The bigger picture here is that such an approach amounts to deciding that
> no one will ever be allowed to fix hash indexes.  I'm not for that, even
> if I'm not volunteering to be the fixer myself.
>
> I also don't believe your claim that this would always be faster than a
> real hash index.  What happened to O(1) vs O(log N)?
>
> Lastly: what real-world problem are we solving by kicking that code
> to the curb?

Yeah, and there's this: I've had at least one client who switched to
using hash indexes and got a significant benefit from it precisely
because they aren't WAL logged. They could afford to rebuild the indexes
in the unlikely event of a crash, but the IO gain was worth it to them.
Neither could they have tolerated unlogged tables - they wanted crash
safety for their data. After talking through the various options with
them they decided this was the best choice, and it might be sad to
remove that choice from people.

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> Yeah, and there's this: I've had at least one client who switched to
> using hash indexes and got a significant benefit from it precisely
> because they aren't WAL logged. They could afford to rebuild the indexes
> in the unlikely event of a crash, but the IO gain was worth it to them.
> Neither could they have tolerated unlogged tables - they wanted crash
> safety for their data. After talking through the various options with
> them they decided this was the best choice, and it might be sad to
> remove that choice from people.

That's an interesting story, but it seems like what it points to is the
need for a general "unlogged index" feature, rather than depending on
what's universally agreed to be an implementation deficiency of hash
indexes.  So it sounds like an independent topic.

			regards, tom lane
On 2013-11-01 09:49:57 -0400, Tom Lane wrote:
> Lastly: what real-world problem are we solving by kicking that code
> to the curb?

It makes hashed lookups much easier to use. Currently, if you want
indexed access over wide columns and equality is all you need, you have
to write a rather awkward clause into each query, something like:

  WHERE hash(column) = hash('value') AND column = 'value'

So some magic that hides having to do that would be nice, even if it
only has O(log(n)), not O(1), lookup properties. The interesting part is
that the index gets much denser and smaller, and that the individual
comparisons are much cheaper.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
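[For concreteness, the workaround Andres describes can be spelled out as follows. The table and column names are invented for illustration; hashtext() is PostgreSQL's built-in string hash, whose output is not guaranteed stable across major versions - one reason the proposal above asks for a hash_any stable against pg_upgrade.]

```sql
-- A btree over a hash of a wide column, instead of over the column itself.
CREATE TABLE docs (body text);
CREATE INDEX docs_body_hash_idx ON docs ((hashtext(body)));

-- The awkward query shape: the hashed clause matches the expression
-- index, and the equality recheck on the real column filters out any
-- hash collisions.
SELECT *
  FROM docs
 WHERE hashtext(body) = hashtext('some long value')
   AND body = 'some long value';
```

The index stores only a 4-byte hash per row rather than the whole value, which is the "denser and smaller" property mentioned above.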
On Friday, November 1, 2013, ktm@rice.edu wrote:
> On Fri, Nov 01, 2013 at 01:31:10PM +0000, Dimitri Fontaine wrote:
> > Hi,
> >
> > Here's an idea: when a user ask for an Hash Index transparently build a
> > BTree index over an hash function instead.
> >
> > Advantages:
> >
> > - it works
> > - it's crash safe
> > - it's (much?) faster than a hash index anyways
> >
> > Drawbacks:
> >
> > - root access concurrency
> > - we need a hash_any function stable against pg_upgrade
> >
> > After talking about it with Heikki, we don't seem to find ways in which
> > the root access concurrency pattern would be visible enough to matter.
> >
> > Also, talking with Peter Geoghegan, it's unclear that there's a use case
> > where a hash index would be faster than a btree index over the hash
> > function.
> >
> > Comments?
> > --
> > Dimitri Fontaine
> > http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
>
> Hi Dimitri,
>
> This use of a function index as a "safe" hash index has been the substitute
> for a while. Check the previous threads. It is not a true substitute because
> a hash index is O(1) for lookups but a btree is O(log n) so hash indexes have
> an advantage for very large numbers of entries. In fact a recent post compared
> both the btree substitute and the current hash index and for large indexes the
> hash allowed 2X the lookups than the equivalent btree, which is what you would
> expect. The use-case is exactly for very large tables/indexes where the index
> does not fit in memory, to say nothing of the data itself.
>
> Regards,
> Ken
Hi,
I agree too. Theoretically, hash table lookups are constant time while
btree lookups are logarithmic. There may be cases where we get better
performance with a btree, but replacing hash indexes with them
completely is not really a good idea, IMHO.

Is the motivation behind this idea drawn from workloads where btrees
perform better, or are you trying to fix issues with hash indexes?
--
Regards,
Atri
l'apprenant
On Fri, Nov 1, 2013 at 6:31 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Hi,
>
> Here's an idea: when a user ask for an Hash Index transparently build a
> BTree index over an hash function instead.
Could something be added to the planner so that you can just build a
btree index on a hash expression, and have the planner automatically
realize it can be used for an equality query, because if a=b then
md5(a)=md5(b), at least for some data types? A general ability to
recognize things like that seems like it might have other uses as well.
Cheers,
Jeff
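[Jeff's idea, made concrete under assumed table and column names; md5() is used here because its output is stable across versions, unlike the internal hash functions.]

```sql
-- An ordinary btree over a hash expression.
CREATE TABLE urls (url text NOT NULL);
CREATE INDEX urls_url_md5_idx ON urls ((md5(url)));

-- Today the query must repeat the hash expression by hand to hit
-- the index:
SELECT *
  FROM urls
 WHERE md5(url) = md5('https://example.com/')
   AND url = 'https://example.com/';

-- The proposal: let the planner use the same index for the plain
--   SELECT * FROM urls WHERE url = 'https://example.com/';
-- by recognizing that url = 'x' implies md5(url) = md5('x').
```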
On Fri, Nov 1, 2013 at 6:31 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Hi,
>
> Here's an idea: when a user ask for an Hash Index transparently build a
> BTree index over an hash function instead.
>
> Advantages:
>
> - it works
> - it's crash safe
> - it's (much?) faster than a hash index anyways
>
> Drawbacks:
>
> - root access concurrency
> - we need a hash_any function stable against pg_upgrade
>
> After talking about it with Heikki, we don't seem to find ways in which
> the root access concurrency pattern would be visible enough to matter.
>
> Also, talking with Peter Geoghegan, it's unclear that there's a use case
> where a hash index would be faster than a btree index over the hash
> function.
>
> Comments?

I haven't met a single Heroku user that has stumbled into this
landmine. Normally I am very wary of such landmine-laden features, but
this one is there and doesn't seem to have detonated at any point. I
guess the obscure nature of those indexes and the stern warning in the
documentation have sufficed to discourage their use.

Given that experience, I think foreclosing fixing hash indexes is
premature, and doesn't seem to be hurting inexperienced users of this
stripe. Maybe there are yet other reasons to argue the subject,
though... it's not like the current user base relies on the behavior
as-is either, having seemingly steered clear just about altogether.
On Fri, Nov 1, 2013 at 8:52 AM, Daniel Farina <daniel@heroku.com> wrote:
> On Fri, Nov 1, 2013 at 6:31 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
>> Also, talking with Peter Geoghegan, it's unclear that there's a use case
>> where a hash index would be faster than a btree index over the hash
>> function.
>>
>> Comments?
>
> I haven't met a single Heroku user that has stumbled into this
> landmine. Normally I am very weary of such landmine laden features,
> but this one is there and doesn't seem to have detonated at any point.
> I guess the obscure nature of those indexes and the stern warning in
> the documentation has sufficed to discourage their use.
>
> Given that experience, I think foreclosing fixing hash indexes is
> premature, and doesn't seem to be hurting inexperienced users of this
> stripe. Maybe there are yet other reasons to argue the subject,
> though...it's not like the current user base relies on the behavior
> as-is either, having seemingly steered clear just about altogether.

In addendum, though, some users *have* done the hash-function + btree
trick to make keys smaller. They tend to resort to full-blown
cryptographic hashes and assume/enforce non-collision, but it's
somewhat awkward and finicky. So while commandeering the 'hash' index
type may be a bit over-aggressive, making that use case work better
would probably carry positive impact.

I can say most of my indexes over text are *never* used for range
queries, so one would think that hashing them down would be great. But
the fiddliness of applying expression indexes everywhere forecloses
getting the benefits that might be possible there: only certain heavy
workloads will receive that level of care.
On 11/01/2013 03:49 PM, Andres Freund wrote:
> On 2013-11-01 09:49:57 -0400, Tom Lane wrote:
>> Lastly: what real-world problem are we solving by kicking that code
>> to the curb?
>
> It makes hashed lookups much easier to use. Currently if you want
> indexed access over wide columns and equality is all you need you need
> to write rather awkward queries to make that happen in each query,
> something like:
> WHERE hash(column) = hash('value') AND column = 'value'.
>
> So some magic that hides having to do that would be nice, even it
> doesn't have O(log(n)) properties.

Isn't there a way to do this using a special operator class?

> The interesting part is that the
> index gets much denser and smaller, and that the individual comparisons
> are much cheaper.

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
On Fri, Nov 1, 2013 at 9:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The bigger picture here is that such an approach amounts to deciding that
> no one will ever be allowed to fix hash indexes.  I'm not for that, even
> if I'm not volunteering to be the fixer myself.

Yeah.  I have thought about doing some work on this, although I don't
know when I'll ever have the time.

As far as I can see, the basic problem is that we need a better way of
splitting buckets. Right now, splitting a bucket means locking it,
scanning the entire bucket chain and moving ~1/2 the tuples to the new
bucket, and then unlocking it. Since this is a long operation, we have
to use heavyweight locks to protect the buckets, which is bad for
concurrency. And since it involves a modification to an arbitrarily
large number of pages that has to be atomic, it's not possible to
WAL-log it sensibly -- in fact, I think this is really the major
obstacle to implementing WAL-logging for hash indexes.

I think we could work around this problem as follows. Suppose we have
buckets 1..N and the split will create bucket N+1. We need a flag that
can be set and cleared on the metapage, call it split-in-progress, and
a flag that can optionally be set on particular index tuples, call it
moved-by-split. We will allow scans of all buckets and insertions into
all buckets while the split is in progress, but (as now) we will not
allow more than one split to be in progress at the same time. We start
the split by updating the metapage to increment the number of buckets
and set the split-in-progress flag.

While the split-in-progress flag is set, any scan of bucket N+1 first
scans that bucket, ignoring any tuples flagged moved-by-split, and then
ALSO scans bucket (N+1)/2. The moved-by-split flag never has any effect
except when scanning the highest-numbered bucket that existed at the
start of that particular scan, and then only if the split-in-progress
flag was also set at that time.
Once the split operation has set the split-in-progress flag, it will
begin scanning bucket (N+1)/2. Every time it finds a tuple that
properly belongs in bucket N+1, it will insert a copy into bucket N+1
with the moved-by-split flag set. Tuples inserted by anything other
than a split operation will leave this flag clear, and tuples inserted
while the split is in progress will target the same bucket that they
would hit if the split were already complete. Thus, bucket N+1 will end
up with a mix of moved-by-split tuples, coming from bucket (N+1)/2, and
unflagged tuples coming from parallel insertion activity. When the scan
of bucket (N+1)/2 is complete, we know that bucket N+1 now contains all
the tuples that are supposed to be there, so we clear the
split-in-progress flag. Future scans of both buckets can proceed
normally.

Of course, there's a bit of a problem here: we haven't actually removed
any tuples from bucket (N+1)/2. That's not a correctness problem; we'll
check for an exact hash-value match before returning any tuples from
this bucket, so tuples that are still present in the bucket but no
longer belong there will just be ignored. But it's clearly something
that needs to be fixed before we can really declare victory.

We need to wait for the completion of any scans that began before we
finished populating bucket N+1, because otherwise we might remove
tuples that they're still expecting to find in bucket (N+1)/2. I'm not
sure if there's some clever trick we can use to accomplish this; e.g.
if the scan will always maintain a pin, we could take a buffer cleanup
lock on all the buffers in buckets N+1 and (N+1)/2 in the same sequence
a scan would visit them. Of course, even if that works, it might be a
while if somebody's left a cursor lying around. And if scans can't be
counted on to maintain a pin, a point about which I'm unsure off-hand,
then things get even trickier. But maybe there's some solution.
Anyway, once we out-wait concurrent scans, we can go back and nuke
everything from bucket (N+1)/2 that no longer maps onto that bucket. It
might be possible to make this work opportunistically, like HOT
cleanup: if we're scanning a buffer, and RecentGlobalXmin has
progressed enough that we know there can't be any pre-split scans in
progress (or if we can recognize in some other inexpensive way that old
scans must be gone), and we see a tuple there that no longer maps onto
that bucket, we scan the page and remove all such tuples.

This is full of hand-waving, but I'd be curious to know whether the
approach is perceived to have any merit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company