Re: [HACKERS] Hash Functions - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Hash Functions
Msg-id CA+TgmobY_N+svUcuf+3q0sGd3Xan+wRaq-dkEbZdgbbdv2XF4Q@mail.gmail.com
In response to Re: [HACKERS] Hash Functions  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: [HACKERS] Hash Functions  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, May 16, 2017 at 4:25 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> Why can't hash partitions be stored in tables the same way as we do TOAST?
> That should take care of the naming problem.

Hmm, yeah, something like that could be done, but every place where
you are currently allowed to refer to a partition by name would have
to be changed to accept some other syntax for specifying the
partition.  Even with all the things you propose to disallow, things
like CLUSTER, VACUUM, ANALYZE, etc. would still have to be accepted
on individual partitions.  I suspect even CREATE INDEX and DROP INDEX
would need to be accepted on individual partitions, because if an
index on one partition somehow becomes bloated while the
corresponding indexes on other partitions are OK, you'll want to
create a replacement index concurrently and drop the old one.  Of
course, for similar reasons, you'd need some way for \d on the parent
to display information on indexes on all the children, and all of
that output would have to be frobbed to use whatever syntax is now
required, in lieu of a name, to identify an individual partition.
Error messages would have to be adjusted in probably quite a few
places to use the new notation, too.  And on and on.  It's not
impossible to do, but we could end up chasing down loose ends for a
very long time.
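
Just to make that concrete, this is the kind of per-partition
maintenance I have in mind; the table and index names are made up
for illustration:

-- routine maintenance aimed at one partition by name
VACUUM ANALYZE measurements_2017_05;
CLUSTER measurements_2017_05 USING measurements_2017_05_ts_idx;

-- replacing a bloated index on just this partition: build a new
-- one concurrently, then drop the old one
CREATE INDEX CONCURRENTLY measurements_2017_05_ts_idx2
    ON measurements_2017_05 (logged_at);
DROP INDEX CONCURRENTLY measurements_2017_05_ts_idx;

Every one of those commands needs some way to name its target, and
none of them can reasonably be forced to operate on the whole
partition hierarchy at once.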

Beyond that, I think it's a bad idea to make hash partitions behave
completely differently from list and range partitions.  That's a lot
of extra code to maintain, and a lot of extra notional complexity for
users, for really not a lot of benefit.  I think we're taking a
significant but not overwhelming problem -- our current hash
functions aren't portable -- and, through a chain of logical links,
eventually arriving at the conclusion that the design of partitioning
needs to be totally overhauled.  I want to resist that conclusion.
I'm not
saying that the problem isn't a problem, or that there's not some
logic to each step in the chain, but it's not that hard to blow a
small problem up into a huge one by assuming the worst possible
consequences or the tightest possible requirements at each step.
http://tvtropes.org/pmwiki/pmwiki.php/Main/ForWantOfANail is not an
argument for stricter regulation of the blacksmith industry.

>> If Java has portable hash functions, why can't we?
>
> Java standardizes on a particular Unicode encoding (UTF-16). Are you
> suggesting that we do the same? Or is there another solution that I am
> missing?

Well, I've already said several times (and Peter Eisentraut has
agreed) that we don't really need the hash functions to be portable
across encodings.  I think there are at least three good arguments for
that position.

First, as Peter Geoghegan points out, the world is increasingly
standardizing on Unicode, and that trend seems likely to continue.  I
strongly suspect that UTF-8 is the most common database encoding by a
wide margin.  There may occasionally be reasons to avoid it if, for
example, you're using one of the Eastern languages that doesn't play
entirely nicely with UTF-8, or if you happen to be storing a large
number of characters that can be represented in a single byte in some
other encoding but which require multiple bytes in UTF-8, but for an
awful lot of people UTF-8 just works and there's no need to think any
further.  So, a lot of people will never hit the problem of needing to
migrate a database between encodings because they'll just use UTF-8.
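
For anyone curious what their own installation is doing, the encoding
is visible per database; these are existing commands, nothing
hypothetical:

-- encoding of the current database
SHOW server_encoding;

-- encodings of every database in the cluster
SELECT datname, pg_encoding_to_char(encoding) AS encoding
FROM pg_database;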

Second, if the previous argument turns out to be wrong and the world
abandons UTF-8 in favor of some new and better system (UTF-9?), or if
users frequently want to make some other encoding conversion like
Tom's original example of LATIN1 -> UTF-8, we've already got a
proposed workaround for that case which seems like it will work just
fine.  Just dump your data with pg_dump
--insert-hash-partitioned-data-into-parent and reload on the new
system.  This isn't absolutely guaranteed to work if you've done
something silly that will make the load of a particular row work on
one partition and fail on some other one, but you probably won't do
that because it would be dumb.  Also, it will be a bit slower than a
regular dump-and-reload cycle because tuple routing isn't free.
Neither of these problems really sounds very bad.

If we're going to start fixing things that could cause database
migrations/upgrades to occasionally fail in corner cases or run more
slowly than expected, there's a long list of things that you can do
in your DDL that will make pg_upgrade bomb out, and many of them are
things that bite users with some regularity (e.g. tablespaces inside
the data directory or inside other tablespaces, dependencies on
system objects that are changed in the new version, ALTER USER .. SET
ROLE).  For whatever reason, we haven't viewed those warts as really
high-priority items in need of fixing; in some cases, instead of
actually trying to improve usability, we've all but mocked the people
reporting those issues for having the temerity to configure the
system in a way that we think isn't very smart.  How can we then turn
around and say "it's absolutely unacceptable for there to be any case
where dump and reload fails even if there's a workaround switch for
pg_dump that will almost always solve the problem"?  That's holding
hash partitioning to a stricter standard than our existing features,
which seems unjustified.
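
Coming back to the proposed switch for a moment, the whole trick is
that the dump targets the parent rather than the leaves, so the
destination cluster's own hash function decides where each row lands.
Roughly (the option name is the one proposed upthread, not an
existing pg_dump flag, and the table names are made up):

-- what pg_dump emits today: data loaded straight into the leaf
COPY orders_part3 (order_id, customer) FROM stdin;
42	Åkesson
\.

-- what the proposed mode would emit instead: inserts through the
-- parent, so tuple routing on the new cluster picks the partition
INSERT INTO orders (order_id, customer) VALUES (42, 'Åkesson');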

Third, the fact that we support multiple encodings ought to be a
strong point of PostgreSQL, not an excuse for running away from
features.  This same issue came up with JSON support, and the question
was, well, what do we do about the places where JSON assumes that
everything is UTF-8, like in \uXXXX escape sequences, given that we
support multiple encodings?  We could have answered that question by
giving up and saying that JSON support is just too hard in our
multi-encoding environment, but we found some solution that let us
move forward.  I think the result is that JSON is not really quite
per-spec unless you are using UTF-8 as your encoding, but I am
confident that it's better to have made that trade-off than to just
not support JSON at all.  Similarly, I think that if we use the fact
that we support multiple encodings either as an excuse for why we
shouldn't have hash partitioning at all or for why the design needs to
be vastly more complex than would otherwise be required, we've made a
liability of what should be an asset.  Tom's willing to make those
arguments, but he doesn't think hash partitioning is worthwhile in the
first place.  If we believe it is worthwhile (and I do), then we
should be looking for a set of design decisions that make implementing
it reasonably practical, maybe with some trade-offs, rather than a set
of decisions that increase the complexity to the point where it will
take us 5-10 years to unravel all the problems.
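
For reference, this is the sort of trade-off that settlement left us
with: escape sequences for non-ASCII characters only behave fully
per-spec in a UTF-8 database.

-- resolves to "é" under UTF-8; in a non-UTF-8 database the same
-- cast can fail with "unsupported Unicode escape sequence"
SELECT '"\u00e9"'::jsonb;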

Also, it's worth noting that our decision here doesn't need to be
monolithic.  If we want, we can ship one opfamily that hashes strings
in the obvious way - i.e. fast but not portable across encodings - and
another that converts the input string to UTF-8 and then hashes the
result.  The latter will be somewhat slow, but it will be portable
across encodings, and I don't have any problem at all with providing
it if somebody wants to do the legwork.  It'll also fail in any
encodings that can't be converted to UTF-8, but it seems a bit much to
insist that hash partitioning must solve a problem that the Unicode
Consortium hasn't managed to overcome.  Anyway, if we do something
like that, then users can choose whether they want the portability for
which Tom advocates or the speed which I believe most users will
prefer.
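
A minimal sketch of what that second, portable variant could look
like, built from existing functions (convert_to, encode, hashtext)
but with made-up function and opclass names:

-- hash the string's UTF-8 bytes instead of its bytes in the
-- database encoding; the hex step gets us from bytea back to text,
-- and hex digits are plain ASCII, so the value being hashed is
-- identical in every server encoding
CREATE FUNCTION hash_text_as_utf8(val text) RETURNS integer AS $$
    SELECT hashtext(encode(convert_to(val, 'UTF8'), 'hex'))
$$ LANGUAGE sql IMMUTABLE STRICT;

-- a hash operator class built on it, which hash partitioning could
-- then be told to use instead of the default one
CREATE OPERATOR CLASS text_utf8_hash_ops FOR TYPE text USING hash AS
    OPERATOR 1 =,
    FUNCTION 1 hash_text_as_utf8(text);

The encoding conversion plus the hex detour is exactly where the
slowdown comes from, which is why I wouldn't want it to be the
default.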

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


