Re: pg_dump versus hash partitioning - Mailing list pgsql-hackers

From Tom Lane
Subject Re: pg_dump versus hash partitioning
Date
Msg-id 1416220.1675285933@sss.pgh.pa.us
In response to Re: pg_dump versus hash partitioning  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: pg_dump versus hash partitioning
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Feb 1, 2023 at 1:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In the meantime, I think we need to recognize that hash values are
>> not very portable.  I do not think we do our users a service by
>> letting them discover the corner cases the hard way.

> I think you're not really engaging with the argument that "not
> completely portable" and "totally broken" are two different things,
> and I still think that's an important point here.

I don't buy that argument.  From the user's perspective, it's broken
if her use-case fails.  "It works for most people" is cold comfort,
most especially so if there's no convenient path to fixing it after
a failure.

> I don't think the fact that our *traditional* standard for how stable
> a hash function needs to be has been XYZ carries any water.

Well, it wouldn't need to if we had a practical way of changing the
behavior of an existing hash function, but guess what: we don't.
Andrew's original proposal for fixing this was exactly to change the
behavior of hashenum().  There were some problems with the idea of
depending on enumsortorder instead of enum OID, but the really
fundamental issue is that you can't change hashing behavior without
breaking pg_upgrade completely.  Not only will your hash indexes be
corrupt, but your hash-partitioned tables will be broken, in exactly
the same way that we're trying to solve for dump/reload cases (which
of course will *also* be broken by redefining the hash function, if
you didn't use --load-via-partition-root).  Moreover, while we can
always advise people to reindex, there's no similarly easy way to fix
broken partitioning.
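
To make the failure mode concrete, here is a toy sketch in Python. It is
not PostgreSQL's hashing code: toy_hash, the enum labels, the OIDs, and
the sort-order values are all made up for illustration. Rows are placed
using a hash of the enum OID; the on-disk layout is then kept as-is (as
pg_upgrade would keep it) while routing switches to a hash of the sort
order.

```python
MODULUS = 4  # number of hash partitions in this toy model


def toy_hash(value: int) -> int:
    # Knuth-style multiplicative hash; a stand-in, NOT hashenum().
    return (value * 2654435761) % 2**32


# Hypothetical enum: label -> (OID, sort order). All values are invented.
ENUM = {"red": (16385, 1), "green": (16387, 2), "blue": (16389, 3)}


def partition_by_oid(label: str) -> int:
    # "Old" behavior in this sketch: hash the enum value's OID.
    return toy_hash(ENUM[label][0]) % MODULUS


def partition_by_sortorder(label: str) -> int:
    # "New" behavior in this sketch: hash the enum value's sort order.
    return toy_hash(ENUM[label][1]) % MODULUS


# Rows are stored while the old hash function is in effect.
partitions = {p: [] for p in range(MODULUS)}
for label in ENUM:
    partitions[partition_by_oid(label)].append(label)


def find(label: str, route) -> bool:
    # A lookup only inspects the partition the routing function names.
    return label in partitions[route(label)]


# The files stay where they are, but routing now uses the new function.
unreachable = [label for label in ENUM if not find(label, partition_by_sortorder)]
print(unreachable)
```

In this toy model "red" happens to land in the same partition under both
functions while "green" and "blue" do not, which is the "works for most
people" situation: nothing fails until a query or insert touches one of
the misplaced rows.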

That being the case, I don't think moving the goalposts for hash
function stability is going to lead to a workable solution.

> On the question of whether hash partitioning is a good feature in
> general, I can only say that I disagree with what seems to be your
> position, which as best as I can tell is "it sucks and we should kill
> it with fire".

As I said, I'm not prepared to litigate that case today ... but
I do have a sneaking suspicion that we will eventually reach that
conclusion.  In any case, if we don't want to reach that conclusion,
we need some practical solution to these dump/reload problems.
Have you got a better idea than --load-via-partition-root?
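
As a rough sketch of why loading through the root sidesteps the problem,
the toy Python model below contrasts restoring each row into the
partition it was dumped from with re-routing every row through the
partitioned table. It only models the idea behind pg_dump's
--load-via-partition-root option; toy_hash, the labels, and the OIDs are
hypothetical, and none of this is PostgreSQL code.

```python
MODULUS = 2  # number of hash partitions in this toy model


def toy_hash(value: int) -> int:
    # Stand-in multiplicative hash; NOT a PostgreSQL hash function.
    return (value * 2654435761) % 2**32


def route(oid: int) -> int:
    # Partition chosen for a row whose enum key has the given OID.
    return toy_hash(oid) % MODULUS


# Invented OIDs: the same enum labels get different OIDs after reload.
src_oids = {"red": 16385, "green": 16386}
dst_oids = {"red": 24602, "green": 24603}

# The dump records rows per partition, as they were stored in the source.
dump = {}
for label, oid in src_oids.items():
    dump.setdefault(route(oid), []).append(label)


def load_into_leaf(dump, oids):
    # Restore each row into the partition it came from; return the rows
    # that the destination's partition constraint would reject.
    return [
        label
        for part, rows in dump.items()
        for label in rows
        if route(oids[label]) != part
    ]


def load_via_root(dump, oids):
    # Ignore the source partition and let the destination re-route rows.
    restored = {}
    for rows in dump.values():
        for label in rows:
            restored.setdefault(route(oids[label]), []).append(label)
    return restored


rejected = load_into_leaf(dump, dst_oids)
restored = load_via_root(dump, dst_oids)
print(rejected, restored)
```

With these made-up OIDs both rows are rejected when loaded into their
original partitions, while routing through the root places each row
wherever the destination's hash values say it belongs.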

            regards, tom lane


