Re: [HACKERS] Hash Functions - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Hash Functions
Date
Msg-id CA+TgmoZ-WCDxkjiHfzKv4Fa18NK-M2d7g1-1pGK2EKMTqGJ1GA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Hash Functions  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: [HACKERS] Hash Functions  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Tue, May 16, 2017 at 11:10 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> With hash partitioning:
> * User only specifies number of partitions of the parent table; does
> not specify individual partition properties (modulus, etc.)
> * Dump/reload goes through the parent table (though we may provide
> options so pg_dump/restore can optimize this)
> * We could provide syntax to adjust the number of partitions, which
> would be expensive but still useful sometimes.
> * All DDL should be on the parent table, including check constraints,
> FKs, unique constraints, exclusion constraints, indexes, etc.
>   - Unique and exclusion constraints would only be permitted if the
> keys are a superset of the partition keys.
>   - FKs would only be permitted if the two table's partition schemes
> match and the keys are members of the same hash opfamily (this could
> be relaxed slightly, but it gets a little confusing if so)
> * No attach/detach of partitions
> * All partitions have the same permissions
> * Individual partitions would only be individually-addressable for
> maintenance (like reindex and vacuum), but not for arbitrary queries
>   - perhaps also COPY for bulk loading/dumping, in case we get clients
> smart enough to do their own hashing.

I don't really find this a very practical design.  If the table
partitions are spread across different relfilenodes, then those
relfilenodes have to have separate pg_class entries and separate
indexes, and those indexes also need to have separate pg_class
entries.  Otherwise, nothing works.  And if they do have separate
pg_class entries, then the partitions have to have their own names,
and likewise for their indexes, and a dump-and-reload has to preserve
those names.  If it doesn't, and those objects get new system-assigned
names after the dump-and-reload, then dump restoration can fail when a
system-assigned name collides with an existing name that is first
mentioned later in the dump.

If we had the ability to have anonymous pg_class entries -- relations
that have no names -- then maybe it would be possible to make
something like what you're talking about work.  But that does not seem
easy to do.  There's a unique index on (relname, relnamespace) for
good reason, and we can't make it partial on a system catalog.  We
could make the relname column allow nulls, but that would add overhead
to any code that needs to access the relation name, and there's a fair
amount of that.

Similarly, if we had the ability to associate multiple relfilenodes
with a single relation, and if index entries could point to
<which-relfilenode, block, offset> rather than just <block, offset>,
then we could also make this work.  But either of those things would
require significant re-engineering and would have downsides in other
cases.

If Java has portable hash functions, why can't we?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] Race conditions with WAL sender PID lookups
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] Adding support for Default partition in partitioning