Re: On partitioning - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: On partitioning
Msg-id 20141113063944.GY28859@tamriel.snowman.net
In response to Re: On partitioning  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: On partitioning  ("Amit Langote" <Langote_Amit_f8@lab.ntt.co.jp>)
Re: On partitioning  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Nov 12, 2014 at 5:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> Maybe as anyarray, but I think pg_node_tree
> >> might even be better.  That can also represent data of some arbitrary
> >> type, but it doesn't enforce that everything is uniform.
> >
> > Of course, the more general you make it, the more likely that it'll be
> > impossible to optimize well.

Agreed- a node tree seems a bit too far to make this really work well.
But I'm curious what you were thinking of specifically.  A node tree which
accepts an "argument" of the constant used in the original query and
then spits back a table might work reasonably well for that case, but
with declarative partitioning, I expect us to eventually be able to
eliminate complete partitions from consideration on both sides of a
partition-table join, and to optimize cases where two partitioned
tables are joined on a compatible join key by only actually joining
the partitions which overlap each other.  I don't see those
optimizations happening if a node tree is the only option we allow.  If
a node tree is just one option among other partitioning options, then
we can let users choose what suits their particular needs.
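To make the join case concrete, here is a rough sketch, assuming both tables are range-partitioned on the same key with sorted, non-overlapping, half-open bounds. The tuple layout and the `overlapping_pairs` name are hypothetical and this is not PostgreSQL code; the point is that the planner can only do this kind of merge when it can see the bounds, which an opaque node tree would hide.

```python
from typing import List, Tuple

# Hypothetical representation: (lower inclusive, upper exclusive, partition name),
# sorted by lower bound, with no two partitions of one table overlapping.
Partition = Tuple[int, int, str]


def overlapping_pairs(left: List[Partition], right: List[Partition]) -> List[Tuple[str, str]]:
    """Merge-style sweep that pairs only partitions whose key ranges intersect.

    For an equi-join on the partition key, a row in left partition A can only
    match a row in right partition B if their ranges overlap, so every other
    (A, B) combination can be skipped entirely.
    """
    pairs = []
    i = j = 0
    while i < len(left) and j < len(right):
        l_lo, l_hi, l_name = left[i]
        r_lo, r_hi, r_name = right[j]
        if l_lo < r_hi and r_lo < l_hi:  # half-open ranges intersect
            pairs.append((l_name, r_name))
        # Advance whichever range ends first; it cannot overlap anything later.
        if l_hi <= r_hi:
            i += 1
        else:
            j += 1
    return pairs
```

With three left partitions covering 0-30 in steps of 10 and two right partitions split at 15, this produces four partition-to-partition joins instead of six.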

> The point for me is just that range and list partitioning probably
> need different structure, and hash partitioning, if we want to support
> that, needs something else again.  Range partitioning needs an array
> of partition boundaries and an array of child OIDs.  List partitioning
> needs an array of specific values and a child table OID for each.
> Hash partitioning needs something probably quite different.  We might
> be able to do it as a pair of arrays - one of type anyarray and one of
> type OID - and meet all needs that way.
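As a rough illustration of the "pair of arrays" idea for range and list partitioning (hash is left out, since its structure is still open), both can be looked up by binary search over one sorted array with a parallel array of child OIDs. The function names and OIDs are made up for this sketch; it also assumes the first range partition is unbounded below and that list values are stored sorted.

```python
import bisect
from typing import List, Optional


def route_range(upper_bounds: List[int], child_oids: List[int], key: int) -> Optional[int]:
    """Range: upper_bounds[i] is the exclusive upper bound of child_oids[i].

    Assumes upper_bounds is sorted and the first partition has no lower bound.
    """
    i = bisect.bisect_right(upper_bounds, key)  # count of bounds <= key
    return child_oids[i] if i < len(child_oids) else None


def route_list(values: List[int], child_oids: List[int], key: int) -> Optional[int]:
    """List: values[i] (sorted) belongs to child_oids[i]; OIDs may repeat."""
    i = bisect.bisect_left(values, key)
    if i < len(values) and values[i] == key:
        return child_oids[i]
    return None  # no partition accepts this value
```

Both shapes fit one anyarray column plus one OID array column, which is what the proposal above suggests; only the lookup rule differs.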

I agree that these will require different structures in the catalog.
While reviewing the superuser checks, I expected to have a similar need
and discussed various options- having multiple catalog tables, having a
single table with multiple columns, having a single table with a 'type'
column and then a bytea blob.  In the end, it wasn't really necessary, as
the only thing which I expected to need more than 'yes/no' was the
directory permissions (which it looks like might end up killed anyway,
much to my sadness), but while considering the options, I continued to
feel like anything but independent tables was hacking around to try and
reduce the number of inodes used for folks who don't actually use these
features, and that's a terrible reason to complicate the catalog and
code, in my view.

It occurs to me that we might be able to come up with a better way to
address the inode concern separately, and therefore set it aside here.
There are other considerations involved in having more catalog tables,
but declarative partitioning is an important enough feature, in my view,
that I wouldn't care if it required 10 catalog tables to implement.
Misrepresenting it with a catalog that's got a bunch of columns, all but
one of which are NULL, or essentially removing the knowledge of the data
type from the system by using a type column with some binary blob, isn't
doing ourselves or our users any favors.  That's not to say that I'm
against a solution which only needs one catalog table, but let's not
completely throw away proper structure because of inode or other
resource concerns.  We have quite a few other catalog tables which are
rarely used, and it'd be good to address the resources those consume as
a separate issue.
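A small sketch of what "removing the knowledge of the data type" means in practice, contrasting a dedicated typed row with a 'type' column plus blob. Both row shapes and the `"<qI"` byte layout are hypothetical, not actual PostgreSQL catalogs; the point is that with the blob, the layout lives only in the code that encodes and decodes it.

```python
import struct
from dataclasses import dataclass

# Hypothetical catalog row shapes for illustration only.


@dataclass(frozen=True)
class RangeBoundRow:
    """Dedicated-table design: every column has a declared type."""
    parent_oid: int
    upper_bound: int
    child_oid: int


@dataclass(frozen=True)
class BlobRow:
    """'type' column plus opaque blob: the catalog no longer knows the types."""
    parent_oid: int
    strategy: str
    payload: bytes


def encode_range(parent_oid: int, upper_bound: int, child_oid: int) -> BlobRow:
    # The layout "<qI" (64-bit bound, 32-bit OID) exists only in this code,
    # not anywhere in the catalog definition.
    return BlobRow(parent_oid, "range", struct.pack("<qI", upper_bound, child_oid))


def decode_range(row: BlobRow) -> RangeBoundRow:
    # Every reader must know both the strategy tag and the byte layout.
    if row.strategy != "range":
        raise ValueError(f"not a range row: {row.strategy!r}")
    upper_bound, child_oid = struct.unpack("<qI", row.payload)
    return RangeBoundRow(row.parent_oid, upper_bound, child_oid)
```

Anything that inspects a `BlobRow` without that out-of-band convention sees only 12 opaque bytes, whereas the dedicated row is self-describing.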

I'm not a fan of using pg_class- there are a number of columns in there
which I would *not* wish to be allowed to be different per partition
(starting with relowner and relacl...).  Making those NULL would be just
as bad (probably worse, really, since we'd also need to add new columns
to pg_class to indicate the partitioning...) as having a sparsely
populated new catalog table.
Thanks!
    Stephen
