Thread: [HACKERS] PG10 Crash-safe and replicable Hash Indexes and UNIQUE

[HACKERS] PG10 Crash-safe and replicable Hash Indexes and UNIQUE

From: Chapman Flack
Hi,

The item on hash indexes reminded me of an old comment from years
ago that I put in the code of the first custom PG datatype I ever
built at $work:

COMMENT ON OPERATOR CLASS puid_ops USING btree IS
'As puids are only identifiers, there is no obvious reason to define
ordering operators or support btree indexing. But for some curious
reason PostgreSQL 8.4 does not allow a hash index to support UNIQUE
constraints (this may be because, per the manual, hash index "operations
are not presently WAL-logged" so it could be risky to base constraints
on them). Therefore, the whole set of ordering operators must be
implemented to provide an operator class for the btree index method.';

Was my guess about the reason right? Does this PG10 announcement
also mean it will be possible to use UNIQUE constraints with some
pure-identifier, no-natural-ordering type that supports only hashing?

-Chap



Chapman Flack <chap@anastigmatix.net> writes:
> Was my guess about the reason right? Does this PG10 announcement
> also mean it will be possible to use UNIQUE constraints with some
> pure-identifier, no-natural-ordering type that supports only hashing?

No, nobody's done anything about allowing hash indexes to support
uniqueness AFAIK.  I don't have a clear picture of how much work
it would be, but it would likely be more than trivial effort;
there's definitely extra code in btree that supports that.

(You might be right about the big picture, in that no one wanted to
bother with working on that as long as hash indexes weren't crash
safe.  But there's not a technical connection.)
        regards, tom lane



Re: [HACKERS] PG10 Crash-safe and replicable Hash Indexes and UNIQUE

From: Chapman Flack
On 05/19/17 11:41, Tom Lane wrote:

> No, nobody's done anything about allowing hash indexes to support
> uniqueness AFAIK.  I don't have a clear picture of how much work
> it would be, but it would likely be more than trivial effort;

I see what you mean. Because of the way hash values are ordered
(to allow binary search) within a page, but not between pages of
a bucket, insertion as it stands now is able to stop as soon as
it finds any page with room for the entry, but a unique-insertion
will have to check every page of the bucket for matching hashes,
and then (because only the hash and tid are in the index) chase
any of those to the heap to compare the value.

Maybe both hash collisions and overflow pages are rare enough
in practice with reasonable data that the performance impact
of that would be small, but the possibility still has to be
accounted for, and the locking may get hairier (do you now keep
the lock you have on the page where room was found for the entry,
and use another lock to walk the remaining pages until you're sure
there's no duplicate?).
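As a toy illustration (in Python, with invented page and heap structures; the real code is C under src/backend/access/hash/), here is roughly why a unique insert can't stop at the first page with free space the way a plain insert can:

```python
PAGE_CAPACITY = 4  # arbitrary toy value

# Toy model of a hash-index bucket: a list of pages, each a list of
# (hash_code, heap_tid) entries.  All names are invented for
# illustration and do not reflect actual PostgreSQL structures.

def unique_insert(bucket_pages, heap, new_hash, new_tid, new_value):
    """Insert (new_hash, new_tid), failing if an equal value exists."""
    candidate_page = None
    for page in bucket_pages:
        # A plain insert could stop at the first page with room; a
        # unique insert must keep scanning every page of the bucket.
        if candidate_page is None and len(page) < PAGE_CAPACITY:
            candidate_page = page
        for hash_code, tid in page:
            if hash_code == new_hash:
                # Only the hash and tid are in the index, so a hash
                # match forces a trip to the heap to compare values.
                if heap[tid] == new_value:
                    raise ValueError("duplicate key value")
    if candidate_page is None:
        candidate_page = []          # add an overflow page
        bucket_pages.append(candidate_page)
    candidate_page.append((new_hash, new_tid))
```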

At least I see that interest in UNIQUE for hash indexes has been
shown on -hackers several times over the years, and is on the TODO.
Neil Conway seems to have had an idea [1] for making the locking work,
14 years ago (however relevant that might be to today's code).

... and one inquiry last year [2] did seem to get tabled because of the
lack of WAL logging, which is now a non-blocker.

I haven't seen much discussion of /why/ one would want hash-based UNIQUE.
I know my own reasons, but I'm not sure how persuasive they are in light
of the implementation realities, so maybe that makes such a discussion
worthwhile. I can start; these are the two reasons I had:

1. To a naive intuition (especially one raised more on in-memory data
   structures than the guts of databases), it just seems natural:
   hashing seems like the canonical approach to uniqueness testing
   where there's no need for ordering, intuition suggests a performance
   advantage, and so the least-astonishment principle suffers upon finding
   it isn't supported.

2. When developing a custom data type, it feels like tedious
   busy-work to have to bang out a full set of ordering operators
   for a btree operator class if there is no meaningful order for
   the type.

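The in-memory intuition behind (1) is essentially this (a trivial Python sketch, nothing PostgreSQL-specific):

```python
def all_unique(values):
    # Hash-based uniqueness test: no ordering of the elements is
    # needed, and the expected cost is O(n), versus the O(n log n)
    # of a sort-based (btree-like) approach.
    seen = set()
    for v in values:
        if v in seen:
            return False
        seen.add(v)
    return True
```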
Maybe the intuitions behind (1) are just misinformed, the performance
ones at least, in light of Craig Ringer's low opinion of whether "hash
indexes are better than btree for anything" [3], and André Barbosa's
more recent performance comparison [4] (which does show some advantages
for hash in some circumstances, but mostly not large. The only large
advantage was in initial creation; would that be hashsort.c at work?).

But then, both [3] and [4] predate the recent projects on hash indexes
that have "made them crash safe and are on the way to making them
performant" [5], so maybe an updated comparison would be timely, or some
addition to the docs to better characterize the circumstances where hash
could be good. (Every index method newer than btree and hash has its own
part VII Internals chapter; for completeness, might it make sense to have
those for btree and hash also, even if only to broadly discuss
the conditions under which they perform especially well or poorly?)

For all sorts of indexes, would there be any use for some CREATE INDEX
syntax for a multicolumn index to say that some of its rightmost columns
aren't there to participate in the indexing scheme, but only to benefit
index-only scans? Applied to a hash index, that might offer another useful
kind of multicolumn support, which otherwise seems limited to queries
where you have the exact values of all indexed columns.

Anyway, even if my performance assumption behind (1) was too optimistic,
the astonishment when a new user finds a hash can't support uniqueness
still seems real. A related astonishment is that a hash opclass can't
support DISTINCT, and that seems like something that could be changed
with much less work than making hash indexes amcanunique. Apparently,
array comparison already looks for a btree opclass but can fall back to
a hash one, if present, for equality comparisons. Would it be difficult
to do the same for DISTINCT?

As for my reason (2), the tedium of having to bang out btree operators
for a new type with no meaningful order (which, as I've just remembered,
is necessary not just for UNIQUE constraints but even just to make
SELECT DISTINCT work), maybe there's a solution that simply reduces
the tedium. After all, if a new type has no meaningful notion of order,
an arbitrary one imposed by copy/paste of some existing opclass
for a type with the same internallength might often be good enough.
Could there be some syntactic sugar for that, say,

CREATE OPERATOR CLASS btree_foo_ops
FOR TYPE foo USING btree LIKE int4_ops;

? A transformOpclassLike function could verify that foo and the opcintype
of int4_ops have the same typlen and typbyval, and that the operators and
support procs are backed by C functions, and return a list of
CREATE OPERATOR reusing the same functions, followed by the rewritten
CREATE OPERATOR CLASS.
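To make the idea concrete, here is a rough Python sketch of the checks such a (purely hypothetical) transformOpclassLike might make; the dicts stand in for pg_type / pg_opclass / pg_proc catalog lookups, and none of this is real PostgreSQL code:

```python
def transform_opclass_like(new_type, template_opclass, catalogs):
    opclass = catalogs["opclass"][template_opclass]
    a = catalogs["type"][new_type]
    b = catalogs["type"][opclass["opcintype"]]
    # Identical storage layout is what makes it safe to reuse the
    # template's C functions for the new type.
    if (a["typlen"], a["typbyval"]) != (b["typlen"], b["typbyval"]):
        raise ValueError("storage layouts differ; cannot borrow opclass")
    # Only C-language support functions can be blindly reused; a SQL
    # or PL function might depend on the semantics of the old type.
    for proc in opclass["procs"]:
        if catalogs["proc"][proc]["lang"] != "c":
            raise ValueError("%s is not a C function" % proc)
    # Stand-in for the list of CREATE OPERATOR statements reusing the
    # same functions, followed by the rewritten CREATE OPERATOR CLASS.
    return ["CREATE OPERATOR ... FUNCTION %s" % proc
            for proc in opclass["procs"]]
```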

Would it be helpful to link any part of these notes to the hash index
section of the TODO page?

-Chap


[1]: https://www.postgresql.org/message-id/1063758747.24276.29.camel%40tokyo
[2]: https://www.postgresql.org/message-id/23242.1454609521%40sss.pgh.pa.us
[3]: http://stackoverflow.com/questions/24568157 (in a comment)
[4]: http://blog.andrebarbosa.co/hash-indexes-on-postgres/
[5]:
https://www.postgresql.org/message-id/CA%2BTgmoZTMWkGdv8302RmvMNTdqhL9LkNzEXcswseUmhZ%3DO8wgA%40mail.gmail.com



Re: [HACKERS] PG10 Crash-safe and replicable Hash Indexes and UNIQUE

From: Amit Kapila
On Mon, May 22, 2017 at 8:31 AM, Chapman Flack <chap@anastigmatix.net> wrote:
> On 05/19/17 11:41, Tom Lane wrote:
>
>> No, nobody's done anything about allowing hash indexes to support
>> uniqueness AFAIK.  I don't have a clear picture of how much work
>> it would be, but it would likely be more than trivial effort;
>
> I see what you mean. Because of the way hash values are ordered
> (to allow binary search) within a page, but not between pages of
> a bucket, insertion as it stands now is able to stop as soon as
> it finds any page with room for the entry, but a unique-insertion
> will have to check every page of the bucket for matching hashes,
> and then (because only the hash and tid are in the index) chase
> any of those to the heap to compare the value.
>
> Maybe both hash collisions and overflow pages are rare enough
> in practice with reasonable data that the performance impact
> of that would be small, but the possibility still has to be
> accounted for, and the locking may get hairier (do you now keep
> the lock you have on the page where room was found for the entry,
> and use another lock to walk the remaining pages until you're sure
> there's no duplicate?).
>
> At least I see that interest in UNIQUE for hash indexes has been
> shown on -hackers several times over the years, and is on the TODO.
> Neil Conway seems to have had an idea [1] for making the locking work,
> 14 years ago (however relevant that might be to today's code).
>

I think that won't be directly applicable now, as we have changed the
locking mechanism such that instead of acquiring heavy-weight locks it
uses LWLocks, and for inserts we don't hold a lock on more than one
bucket page at a time.  I think to make the unique check work, there
are a few options: (a) unique insertions can always acquire a cleanup
lock on the bucket page; this will work because scans (to find
duplicate values in overflow pages) will hold a pin on the bucket
page.  The drawback is that unique insertions need to wait even when
some unrelated scan is in progress.  (b) unique insertions can hold
locks on bucket pages while traversing the bucket chain.  Retaining
locks on multiple bucket pages is bad for concurrency, but in
practice, unique indexes won't have many overflow pages.  (c) keep a
flag in the bucket page to indicate that a unique-index insertion is
in progress, and if such a flag is set, other insertions need to wait.
I think this is not a preferable way because we need to take care of
clearing such a flag not only after the operation but also after crash
recovery.
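A toy, single-file model of option (b), just to show the lock-retention idea (the lock primitives and structures here are invented; the real code uses LWLocks on buffer pages):

```python
import threading

class Page:
    def __init__(self):
        self.lock = threading.Lock()
        self.entries = []        # (hash_code, value) pairs

def unique_insert_option_b(bucket_chain, new_hash, new_value):
    # Retain the lock on every page already scanned until the
    # duplicate check is complete, so no concurrent inserter can
    # slip a duplicate into a page "behind" us.
    held = []
    try:
        for page in bucket_chain:
            page.lock.acquire()
            held.append(page)
            if any(h == new_hash and v == new_value
                   for h, v in page.entries):
                return False      # duplicate: reject the insert
        held[0].entries.append((new_hash, new_value))
        return True
    finally:
        for page in reversed(held):
            page.lock.release()
```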

Any better ideas?

> ... and one inquiry last year [2] did seem to get tabled because of the
> lack of WAL logging, which is now a non-blocker.
>
> I haven't seen much discussion of /why/ one would want hash-based UNIQUE.
> I know my own reasons, but I'm not sure how persuasive they are in light
> of the implementation realities, so maybe that makes such a discussion
> worthwhile. I can start; these are the two reasons I had:
>
> 1. To a naive intuition (especially one raised more on in-memory data
>    structures than the guts of databases), it just seems natural:
>    hashing seems like the canonical approach to uniqueness testing
>    where there's no need for ordering, intuition suggests a performance
>    advantage, and so the least-astonishment principle suffers upon finding
>    it isn't supported.
>
> 2. When developing a custom data type, it feels like tedious
>    busy-work to have to bang out a full set of ordering operators
>    for a btree operator class if there is no meaningful order for
>    the type.
>
> Maybe the intuitions behind (1) are just misinformed, the performance
> ones at least, in light of Craig Ringer's low opinion of whether "hash
> indexes are better than btree for anything" [3], and André Barbosa's
> more recent performance comparison [4] (which does show some advantages
> for hash in some circumstances, but mostly not large. The only large
> advantage was in initial creation; would that be hashsort.c at work?).
>
> But then, both [3] and [4] predate the recent projects on hash indexes
> that have "made them crash safe and are on the way to making them
> performant" [5], so maybe an updated comparison would be timely, or some
> addition to the docs to better characterize the circumstances where hash
> could be good.
>

I think the performance characteristics of hash indexes have changed
especially for read-only cases after recent work.  We have done many
tests which indicate that hash indexes perform better than btree when
the column value is unique.  You might want to try that based on your
use case.

> (Every index method newer than btree and hash has its own
> part VII Internals chapter; for completeness, might it make sense to have
> those for btree and hash also, even if only to broadly discuss
> the conditions under which they perform especially well or poorly?)
>
> For all sorts of indexes, would there be any use for some CREATE INDEX
> syntax for a multicolumn index to say that some of its rightmost columns
> aren't there to participate in the indexing scheme, but only to benefit
> index-only scans? Applied to a hash index, that might offer another useful
> kind of multicolumn support, which otherwise seems limited to queries
> where you have the exact values of all indexed columns.
>

Agreed, but even if we have any such syntax, making it work for hash
indexes is tricky, because we currently store the hash code in the
index, not the original hash index key.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] PG10 Crash-safe and replicable Hash Indexes and UNIQUE

From: Chapman Flack
On 05/22/2017 05:16 AM, Amit Kapila wrote:
> Agreed, but even if we have any such syntax, making it work for hash
> indexes is tricky, because we currently store the hash code in the
> index, not the original hash index key.

That was what gave me the idea in the first place, which then
I realized could be more generally useful. If I could say
something like

CREATE INDEX ON foo USING btree ( bar, baz ALSO quux );

so that only bar and baz are compared in insertion and search,
but quux is along for the ride and available to index-only scans,
then the (admittedly weird) syntax

CREATE INDEX ON foo USING hash ( bar ALSO bar );

could be taken to mean that the value of bar as well as its hash
is wanted in the index. I was first thinking of that to save
unique-insert from running to the heap to check hash collisions,
though on second thought if collisions are common enough for that
to be a win, maybe the hash function's just wrong. It could still
be useful for index-only scans.

-Chap




Re: [HACKERS] PG10 Crash-safe and replicable Hash Indexes and UNIQUE

From: Alvaro Herrera
Chapman Flack wrote:

> That was what gave me the idea in the first place, which then
> I realized could be more generally useful. If I could say
> something like
> 
> CREATE INDEX ON foo USING btree ( bar, baz ALSO quux );
> 
> so that only bar and baz are compared in insertion and search,
> but quux is along for the ride and available to index-only scans,

INCLUDING:
https://www.postgresql.org/message-id/56168952.4010101@postgrespro.ru

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] PG10 Crash-safe and replicable Hash Indexes and UNIQUE

From: Chapman Flack
On 05/22/17 18:39, Alvaro Herrera wrote:
> Chapman Flack wrote:
>> CREATE INDEX ON foo USING btree ( bar, baz ALSO quux );
>
> INCLUDING:
> https://www.postgresql.org/message-id/56168952.4010101@postgrespro.ru

I'd buy that.

-Chap



Re: [HACKERS] PG10 Crash-safe and replicable Hash Indexes and UNIQUE

From: Bruce Momjian
On Sun, May 21, 2017 at 11:01:57PM -0400, Chapman Flack wrote:
> ? A transformOpclassLike function could verify that foo and the opcintype
> of int4_ops have the same typlen and typbyval, and that the operators and
> support procs are backed by C functions, and return a list of
> CREATE OPERATOR reusing the same functions, followed by the rewritten
> CREATE OPERATOR CLASS.
> 
> Would it be helpful to link any part of these notes to the hash index
> section of the TODO page?

I added a link to this excellent summary email based on the first
paragraph alone:

    Add UNIQUE capability to hash indexes

then when I got to your suggestion at the bottom, it was already done.
:-)

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +