Re: [HACKERS] Hash Functions - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Hash Functions
Date
Msg-id 5544.1494614747@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] Hash Functions  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Hash Functions  (Robert Haas <robertmhaas@gmail.com>)
Re: [HACKERS] Hash Functions  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, May 12, 2017 at 1:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'd vote that it's not, which means that this whole approach to hash
>> partitioning is unworkable.  I agree with Andres that demanding hash
>> functions produce architecture-independent values will not fly.

> If we can't produce architecture-independent hash values, then what's
> the other option?

Forget hash partitioning.  There's no law saying that that's a good
idea and we have to have it.  With a different set of constraints,
maybe we could do it, but I think the existing design decisions have
basically locked it out --- and I doubt that hash partitioning is so
valuable that we should jettison other desirable properties to get it.

> One alternative would be to change the way that we dump and restore
> the data.  Instead of dumping the data with the individual partitions,
> dump it all out for the parent and let tuple routing sort it out at
> restore time.  Of course, this isn't very satisfying because now
> dump-and-restore hasn't really preserved the state of the database;
> indeed, you could make it into a hard failure by creating a foreign
> key referencing a partition hash partition.

Yeah, that isn't really appetizing at all.  If we were doing physical
partitioning below the user-visible level, we could make it fly.
But the existing design makes the partition boundaries user-visible
which means we have to insist that the partitioning rule is immutable
(in the strongest sense of being invariant across all installations
you could possibly wish to transport data between).

Now, we already have some issues of that sort with range partitioning;
if say you try to transport data to another system with a different
sort ordering, you have probably got problems.  But that issue already
existed with the old manual partitioning approach, ie if you have a
CHECK constraint like "(col >= 'foo' && col < 'gob')" then a transfer
to such a system could fail already.  So I'm okay with that.  But it
seems like hash partitioning is going to introduce new, exciting, and
hard-to-document-or-avoid portability gotchas.  Do we really need
to go there?
        regards, tom lane



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] Hash Functions
Next
From: Kenneth Marshall
Date:
Subject: Re: [HACKERS] Hash Functions