Re: [HACKERS] Hash Functions - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: [HACKERS] Hash Functions
Date
Msg-id CAMp0ubcQ3VYdU1kNUCOmpj225U4hk6ZEoaUVeReP8h60p+mv1Q@mail.gmail.com
In response to Re: [HACKERS] Hash Functions  (David Fetter <david@fetter.org>)
Responses Re: [HACKERS] Hash Functions  (Robert Haas <robertmhaas@gmail.com>)
Re: [HACKERS] Hash Functions  (David Fetter <david@fetter.org>)
Re: [HACKERS] Hash Functions  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Re: [HACKERS] Hash Functions  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
List pgsql-hackers
On Mon, May 15, 2017 at 1:04 PM, David Fetter <david@fetter.org> wrote:
> As the discussion has devolved here, it appears that there are, at
> least conceptually, two fundamentally different classes of partition:
> public, which is to say meaningful to DB clients, and "private", used
> for optimizations, but otherwise opaque to DB clients.
>
> Mashing those two cases together appears to cause more problems than
> it solves.

I concur at this point. I originally thought hash functions might be
made portable, but I think Tom and Andres showed that to be too
problematic -- the issue with different encodings is the real killer.
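
To make the encoding problem concrete, here is a minimal Python sketch. It
does not use PostgreSQL's actual hash functions: `toy_hash` is a hypothetical
stand-in built on SHA-256, and the partition count of 8 is arbitrary. The only
point is that a hash computed over the stored bytes can change when the same
text is stored in a different encoding.

```python
import hashlib

NUM_PARTITIONS = 8  # arbitrary, for illustration only

def toy_hash(raw: bytes) -> int:
    """Hypothetical stand-in for a byte-oriented hash function."""
    return int.from_bytes(hashlib.sha256(raw).digest()[:8], "big")

def partition_of(text: str, encoding: str) -> int:
    """Hash the encoded bytes of `text` and map the result to a partition."""
    return toy_hash(text.encode(encoding)) % NUM_PARTITIONS

value = "café"
utf8_bytes = value.encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = value.encode("latin-1")  # b'caf\xe9'

# Same text, different bytes on disk.
print(utf8_bytes != latin1_bytes)
# Different bytes feed different input to the hash.
print(toy_hash(utf8_bytes) != toy_hash(latin1_bytes))
# Pure-ASCII text encodes identically in both, so it routes the same way.
print(partition_of("abc", "utf-8") == partition_of("abc", "latin-1"))
```

A dump taken from a database in one encoding and restored into another could
therefore send the same row to a different partition, unless the data is
re-routed through the parent table.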

But I also believe hash partitioning is important and we shouldn't
give up on it yet.

That means we need to have a concept of hash partitions that's
different from range/list partitioning. The terminology
"public"/"private" does not seem appropriate. Logical/physical or
external/internal might be better.

With hash partitioning:
* User only specifies number of partitions of the parent table; does
not specify individual partition properties (modulus, etc.)
* Dump/reload goes through the parent table (though we may provide
options so pg_dump/restore can optimize this)
* We could provide syntax to adjust the number of partitions, which
would be expensive but still useful sometimes.
* All DDL should be on the parent table, including check constraints,
FKs, unique constraints, exclusion constraints, indexes, etc.
  - Unique and exclusion constraints would only be permitted if the
    keys are a superset of the partition keys.
  - FKs would only be permitted if the two tables' partition schemes
    match and the keys are members of the same hash opfamily (this could
    be relaxed slightly, but it gets a little confusing if so)
* No attach/detach of partitions
* All partitions have the same permissions
* Individual partitions would only be individually-addressable for
maintenance (like reindex and vacuum), but not for arbitrary queries
  - perhaps also COPY for bulk loading/dumping, in case we get clients
    smart enough to do their own hashing.
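
A rough Python sketch of the model in the list above, under stated
assumptions: `toy_hash`, `route`, `unique_allowed` and `rows_moved` are
hypothetical names for illustration, not PostgreSQL functions, and SHA-256
stands in for whatever hash the server would really use. It shows rows being
routed purely from the partition key and the partition count, the superset
rule for unique constraints, and why changing the number of partitions is
expensive.

```python
import hashlib

def toy_hash(key: tuple) -> int:
    """Hypothetical stand-in hash; not PostgreSQL's hash opclass functions."""
    digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def route(key: tuple, num_partitions: int) -> int:
    """Pick the partition for a row from its partition-key values."""
    return toy_hash(key) % num_partitions

def unique_allowed(constraint_cols, partition_cols) -> bool:
    """A unique constraint can be enforced per partition only if its columns
    include every partition-key column: equal constraint keys then imply
    equal partition keys, so duplicates always land in the same partition."""
    return set(partition_cols) <= set(constraint_cols)

def rows_moved(keys, old_n: int, new_n: int) -> int:
    """Count rows that change partition when the partition count changes."""
    return sum(1 for k in keys if route(k, old_n) != route(k, new_n))

keys = [(i,) for i in range(1000)]
print(unique_allowed(["id", "email"], ["id"]))  # constraint covers the partition key
print(unique_allowed(["email"], ["id"]))        # constraint misses the partition key
print(rows_moved(keys, 4, 8))                   # rows that would have to be rewritten
```

Because the user only ever specifies the partition count, the mapping from
row to partition stays an internal detail that a dump/reload through the
parent table can recompute.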

The only real downside is that it could surprise users -- why can I
add a CHECK constraint on my range-partitioned table but not the
hash-partitioned one? We should try to document this so users don't
find that out too far along. As long as they aren't surprised, I think
users will understand why these aren't quite the same concepts.

Regards,    Jeff Davis


