Thread: Save Hash Indexes
Hi,

Here's an idea: when a user asks for a hash index, transparently build a
BTree index over a hash function instead.

Advantages:

- it works
- it's crash safe
- it's (much?) faster than a hash index anyway

Drawbacks:

- root access concurrency
- we need a hash_any function stable against pg_upgrade

After talking about it with Heikki, we don't seem to find ways in which
the root access concurrency pattern would be visible enough to matter.

Also, talking with Peter Geoghegan, it's unclear that there's a use case
where a hash index would be faster than a btree index over the hash
function.

Comments?
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Fri, Nov 01, 2013 at 01:31:10PM +0000, Dimitri Fontaine wrote:
> Hi,
>
> Here's an idea: when a user ask for an Hash Index transparently build a
> BTree index over an hash function instead.
>
> Advantages:
>
> - it works
> - it's crash safe
> - it's (much?) faster than a hash index anyways
>
> Drawbacks:
>
> - root access concurrency
> - we need a hash_any function stable against pg_upgrade
>
> After talking about it with Heikki, we don't seem to find ways in which
> the root access concurrency pattern would be visible enough to matter.
>
> Also, talking with Peter Geoghegan, it's unclear that there's a use case
> where a hash index would be faster than a btree index over the hash
> function.
>
> Comments?
> --
> Dimitri Fontaine
> http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

Hi Dimitri,

Using a function index as a "safe" substitute for a hash index has been
the standard advice for a while; check the previous threads. It is not a
true substitute, because a hash index is O(1) for lookups while a btree
is O(log n), so hash indexes have an advantage for very large numbers of
entries. In fact, a recent post compared both the btree substitute and
the current hash index, and for large indexes the hash index allowed
twice the lookups of the equivalent btree, which is what you would
expect. The use case is exactly very large tables/indexes where the
index does not fit in memory, to say nothing of the data itself.

Regards,
Ken
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
> Here's an idea: when a user ask for an Hash Index transparently build a
> BTree index over an hash function instead.

-1.  If someone asks for a hash index, they should get a hash index.
If you feel the documentation isn't sufficiently clear about the problems
involved, we can work on that.

The bigger picture here is that such an approach amounts to deciding that
no one will ever be allowed to fix hash indexes.  I'm not for that, even
if I'm not volunteering to be the fixer myself.

I also don't believe your claim that this would always be faster than a
real hash index.  What happened to O(1) vs O(log N)?

Lastly: what real-world problem are we solving by kicking that code
to the curb?

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> -1.  If someone asks for a hash index, they should get a hash index.
> If you feel the documentation isn't sufficiently clear about the problems
> involved, we can work on that.

Fair enough.

> Lastly: what real-world problem are we solving by kicking that code
> to the curb?

Well, how long have we been waiting for the fix already? The lack of
crash safety and replication support is not just a drawback; it's
killing user adoption.

Maybe we should implement UNLOGGED indexes, and then hash indexes would
only exist in the UNLOGGED fashion?

But well, yeah, ok, my idea is a non-starter.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 11/01/2013 09:49 AM, Tom Lane wrote:
> Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
>> Here's an idea: when a user ask for an Hash Index transparently build a
>> BTree index over an hash function instead.
>
> -1.  If someone asks for a hash index, they should get a hash index.
> If you feel the documentation isn't sufficiently clear about the problems
> involved, we can work on that.
>
> The bigger picture here is that such an approach amounts to deciding that
> no one will ever be allowed to fix hash indexes.  I'm not for that, even
> if I'm not volunteering to be the fixer myself.
>
> I also don't believe your claim that this would always be faster than a
> real hash index.  What happened to O(1) vs O(log N)?
>
> Lastly: what real-world problem are we solving by kicking that code
> to the curb?

Yeah, and there's this: I've had at least one client who switched to
using hash indexes and got a significant benefit from it precisely
because they aren't WAL logged. They could afford to rebuild the indexes
in the unlikely event of a crash, but the IO gain was worth it to them.
Neither could they have tolerated unlogged tables - they wanted crash
safety for their data. After talking through the various options with
them they decided this was the best choice, and it might be sad to
remove that choice from people.

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> Yeah, and there's this: I've had at least one client who switched to
> using hash indexes and got a significant benefit from it precisely
> because they aren't WAL logged. They could afford to rebuild the indexes
> in the unlikely event of a crash, but the IO gain was worth it to them.
> Neither could they have tolerated unlogged tables - they wanted crash
> safety for their data. After talking through the various options with
> them they decided this was the best choice, and it might be sad to
> remove that choice from people.

That's an interesting story, but it seems like what it points to is the
need for a general "unlogged index" feature, rather than depending on
what's universally agreed to be an implementation deficiency of hash
indexes.  So it sounds like an independent topic.

			regards, tom lane
On 2013-11-01 09:49:57 -0400, Tom Lane wrote:
> Lastly: what real-world problem are we solving by kicking that code
> to the curb?

It makes hashed lookups much easier to use. Currently, if you want
indexed access over wide columns and equality is all you need, you have
to write a rather awkward clause into each query, something like:

  WHERE hash(column) = hash('value') AND column = 'value'

So some magic that hides having to do that would be nice, even if it
only has O(log(n)), not O(1), lookup properties. The interesting part is
that the index gets much denser and smaller, and that the individual
comparisons are much cheaper.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
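[For concreteness, the workaround Andres describes can be spelled out as follows. The table and column names are invented for illustration; hashtext() is PostgreSQL's built-in string hash, whose output is not guaranteed stable across major versions - one reason the proposal above asks for a hash_any stable against pg_upgrade.]

```sql
-- A btree over a hash of a wide column, instead of over the column itself.
CREATE TABLE docs (body text);
CREATE INDEX docs_body_hash_idx ON docs ((hashtext(body)));

-- The awkward query shape: the hashed clause matches the expression
-- index, and the equality recheck on the real column filters out any
-- hash collisions.
SELECT *
  FROM docs
 WHERE hashtext(body) = hashtext('some long value')
   AND body = 'some long value';
```

The index stores only a 4-byte hash per row rather than the whole value, which is the "denser and smaller" property mentioned above.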
On Friday, November 1, 2013, ktm@rice.edu wrote:
> On Fri, Nov 01, 2013 at 01:31:10PM +0000, Dimitri Fontaine wrote:
> > Hi,
> >
> > Here's an idea: when a user ask for an Hash Index transparently build a
> > BTree index over an hash function instead.
> >
> > Advantages:
> >
> > - it works
> > - it's crash safe
> > - it's (much?) faster than a hash index anyways
> >
> > Drawbacks:
> >
> > - root access concurrency
> > - we need a hash_any function stable against pg_upgrade
> >
> > After talking about it with Heikki, we don't seem to find ways in which
> > the root access concurrency pattern would be visible enough to matter.
> >
> > Also, talking with Peter Geoghegan, it's unclear that there's a use case
> > where a hash index would be faster than a btree index over the hash
> > function.
> >
> > Comments?
> > --
> > Dimitri Fontaine
> > http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
>
> Hi Dimitri,
>
> This use of a function index as a "safe" hash index has been the substitute
> for a while. Check the previous threads. It is not a true substitute because
> a hash index is O(1) for lookups but a btree is O(log n) so hash indexes have
> an advantage for very large numbers of entries. In fact a recent post compared
> both the btree substitute and the current hash index and for large indexes the
> hash allowed 2X the lookups than the equivalent btree, which is what you would
> expect. The use-case is exactly for very large tables/indexes where the index
> does not fit in memory, to say nothing of the data itself.
>
> Regards,
> Ken
Hi,
I agree too. Theoretically, hash table lookups are constant time while
btree lookups are logarithmic. There may be cases where we get better
performance with a btree, but replacing hash indexes with them
completely is not really a good idea, IMHO.

Is the motivation behind this idea drawn from workloads where btrees
perform better, or are you trying to fix issues with hash indexes?
--
Regards,
Atri
l'apprenant
On Fri, Nov 1, 2013 at 6:31 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Hi,
>
> Here's an idea: when a user ask for an Hash Index transparently build a
> BTree index over an hash function instead.
Could something be added to the planner so that you can just build a
btree index on a hash expression, and have the planner automatically
realize it can be used for an equality query, because if a=b then
md5(a)=md5(b), at least for some data types? A general ability to
recognize things like that seems like it might have other uses as well.
Cheers,
Jeff
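[Jeff's idea, made concrete under assumed table and column names; md5() is used here because its output is stable across versions, unlike the internal hash functions.]

```sql
-- An ordinary btree over a hash expression.
CREATE TABLE urls (url text NOT NULL);
CREATE INDEX urls_url_md5_idx ON urls ((md5(url)));

-- Today the query must repeat the hash expression by hand to hit
-- the index:
SELECT *
  FROM urls
 WHERE md5(url) = md5('https://example.com/')
   AND url = 'https://example.com/';

-- The proposal: let the planner use the same index for the plain
--   SELECT * FROM urls WHERE url = 'https://example.com/';
-- by recognizing that url = 'x' implies md5(url) = md5('x').
```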
On Fri, Nov 1, 2013 at 6:31 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Hi,
>
> Here's an idea: when a user ask for an Hash Index transparently build a
> BTree index over an hash function instead.
>
> Advantages:
>
> - it works
> - it's crash safe
> - it's (much?) faster than a hash index anyways
>
> Drawbacks:
>
> - root access concurrency
> - we need a hash_any function stable against pg_upgrade
>
> After talking about it with Heikki, we don't seem to find ways in which
> the root access concurrency pattern would be visible enough to matter.
>
> Also, talking with Peter Geoghegan, it's unclear that there's a use case
> where a hash index would be faster than a btree index over the hash
> function.
>
> Comments?

I haven't met a single Heroku user that has stumbled into this
landmine. Normally I am very wary of such landmine-laden features, but
this one is there and doesn't seem to have detonated at any point. I
guess the obscure nature of those indexes and the stern warning in the
documentation have sufficed to discourage their use.

Given that experience, I think foreclosing fixing hash indexes is
premature, and doesn't seem to be hurting inexperienced users of this
stripe. Maybe there are yet other reasons to argue the subject,
though... it's not like the current user base relies on the behavior
as-is either, having seemingly steered clear just about altogether.
On Fri, Nov 1, 2013 at 8:52 AM, Daniel Farina <daniel@heroku.com> wrote:
> On Fri, Nov 1, 2013 at 6:31 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
>> Also, talking with Peter Geoghegan, it's unclear that there's a use case
>> where a hash index would be faster than a btree index over the hash
>> function.
>>
>> Comments?
>
> I haven't met a single Heroku user that has stumbled into this
> landmine. Normally I am very weary of such landmine laden features,
> but this one is there and doesn't seem to have detonated at any point.
> I guess the obscure nature of those indexes and the stern warning in
> the documentation has sufficed to discourage their use.
>
> Given that experience, I think foreclosing fixing hash indexes is
> premature, and doesn't seem to be hurting inexperienced users of this
> stripe. Maybe there are yet other reasons to argue the subject,
> though...it's not like the current user base relies on the behavior
> as-is either, having seemingly steered clear just about altogether.

In addendum, though, some users *have* done the hash-function + btree
trick to make keys smaller. They tend to resort to full-blown
cryptographic hashes and assume/enforce non-collision, but it's
somewhat awkward and finicky. So while commandeering the 'hash' index
type may be a bit over-aggressive, making that use case work better
would probably carry positive impact.

I can say most of my indexes over text are *never* used for range
queries, so one would think that hashing them down would be great. But
the fiddliness of applying expression indexes everywhere forecloses
getting the benefits that might be possible there: only certain heavy
workloads will receive that level of care.
On 11/01/2013 03:49 PM, Andres Freund wrote:
> On 2013-11-01 09:49:57 -0400, Tom Lane wrote:
>> Lastly: what real-world problem are we solving by kicking that code
>> to the curb?
>
> It makes hashed lookups much easier to use. Currently if you want
> indexed access over wide columns and equality is all you need you need
> to write rather awkward queries to make that happen in each query,
> something like:
> WHERE hash(column) = hash('value') AND column = 'value'.
>
> So some magic that hides having to do that would be nice, even it
> doesn't have O(log(n)) properties.

Isn't there a way to do this using a special operator class?

> The interesting part is that the
> index gets much denser and smaller, and that the individual comparisons
> are much cheaper.

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
On Fri, Nov 1, 2013 at 9:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The bigger picture here is that such an approach amounts to deciding that
> no one will ever be allowed to fix hash indexes.  I'm not for that, even
> if I'm not volunteering to be the fixer myself.

Yeah.  I have thought about doing some work on this, although I don't
know when I'll ever have the time.

As far as I can see, the basic problem is that we need a better way of
splitting buckets. Right now, splitting a bucket means locking it,
scanning the entire bucket chain and moving ~1/2 the tuples to the new
bucket, and then unlocking it. Since this is a long operation, we have
to use heavyweight locks to protect the buckets, which is bad for
concurrency. And since it involves a modification to an arbitrarily
large number of pages that has to be atomic, it's not possible to
WAL-log it sensibly -- in fact, I think this is really the major
obstacle to implementing WAL-logging for hash indexes.

I think we could work around this problem as follows. Suppose we have
buckets 1..N and the split will create bucket N+1. We need a flag that
can be set and cleared on the metapage, call it split-in-progress, and
a flag that can optionally be set on particular index tuples, call it
moved-by-split. We will allow scans of all buckets and insertions into
all buckets while the split is in progress, but (as now) we will not
allow more than one split to be in progress at the same time. We start
the split by updating the metapage to increment the number of buckets
and set the split-in-progress flag.

While the split-in-progress flag is set, any scan of bucket N+1 first
scans that bucket, ignoring any tuples flagged moved-by-split, and then
ALSO scans bucket (N+1)/2. The moved-by-split flag never has any effect
except when scanning the highest-numbered bucket that existed at the
start of that particular scan, and then only if the split-in-progress
flag was also set at that time.
Once the split operation has set the split-in-progress flag, it will
begin scanning bucket (N+1)/2. Every time it finds a tuple that
properly belongs in bucket N+1, it will insert a copy into bucket N+1
with the moved-by-split flag set. Tuples inserted by anything other
than a split operation will leave this flag clear, and tuples inserted
while the split is in progress will target the same bucket that they
would hit if the split were already complete. Thus, bucket N+1 will end
up with a mix of moved-by-split tuples, coming from bucket (N+1)/2, and
unflagged tuples coming from parallel insertion activity. When the scan
of bucket (N+1)/2 is complete, we know that bucket N+1 now contains all
the tuples that are supposed to be there, so we clear the
split-in-progress flag. Future scans of both buckets can proceed
normally.

Of course, there's a bit of a problem here: we haven't actually removed
any tuples from bucket (N+1)/2. That's not a correctness problem; we'll
check for an exact hash-value match before returning any tuples from
this bucket, so tuples that are still present in the bucket but no
longer belong there will just be ignored. But it's clearly something
that needs to be fixed before we can really declare victory.

We need to wait for the completion of any scans that began before we
finished populating bucket N+1, because otherwise we might remove
tuples that they're still expecting to find in bucket (N+1)/2. I'm not
sure if there's some clever trick we can use to accomplish this; e.g.
if the scan will always maintain a pin, we could take a buffer cleanup
lock on all the buffers in buckets N+1 and (N+1)/2 in the same sequence
a scan would visit them. Of course, even if that works, it might be a
while if somebody's left a cursor lying around. And if scans can't be
counted on to maintain a pin, a point about which I'm unsure off-hand,
then things get even trickier. But maybe there's some solution.
Anyway, once we out-wait concurrent scans, we can go back and nuke
everything from bucket (N+1)/2 that no longer maps onto that bucket. It
might be possible to make this work opportunistically, like HOT
cleanup: if we're scanning a buffer, and RecentGlobalXmin has
progressed enough that we know there can't be any pre-split scans in
progress (or if we can recognize in some other inexpensive way that old
scans must be gone), and we see a tuple there that no longer maps onto
that bucket, we scan the page and remove all such tuples.

This is full of hand-waving, but I'd be curious to know whether the
approach is perceived to have any merit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company