Re: On partitioning - Mailing list pgsql-hackers

From Amit Langote
Subject Re: On partitioning
Date
Msg-id 015b01d00eab$56d37700$047a6500$@lab.ntt.co.jp
Whole thread Raw
In response to Re: On partitioning  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: On partitioning  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Re: On partitioning  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: On partitioning  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi Robert,

From: Robert Haas [mailto:robertmhaas@gmail.com]
> > * Catalog schema:
> >
> > CREATE TABLE pg_catalog.pg_partitioned_rel
> > (
> >    partrelid                oid    NOT NULL,
> >    partkind                oid    NOT NULL,
> >    partissub              bool  NOT NULL,
> >    partkey                 int2vector NOT NULL, -- partitioning attributes
> >    partopclass         oidvector,
> >
> >    PRIMARY KEY (partrelid, partissub),
> >    FOREIGN KEY (partrelid)   REFERENCES pg_class (oid),
> >    FOREIGN KEY (partopclass) REFERENCES pg_opclass (oid)
> > )
> > WITHOUT OIDS ;
>
> So, we're going to support exactly two levels of partitioning?
> partitions with partissub=false and subpartitions with partissub=true?
>  Why not support only one level of partitioning here but then let the
> children have their own pg_partitioned_rel entries if they are
> subpartitioned?  That seems like a cleaner design and lets us support
> an arbitrary number of partitioning levels if we ever need them.
>

Yeah, that's what I thought at some point in favour of dropping partissub altogether.

However, not that this design solves it, there is one question - if we would want to support defining for a table both
partitionkey and sub-partition key in advance? That is, without having defined a first level partition yet; in that
case,what level do we associate sub-(sub-) partitioning key with or more to the point where do we keep it? One way is
toreplace partissub by partkeylevel with level 0 being the topmost-level partitioning key and so on while keeping the
partrelidequal to the pg_class.oid of the parent. That brings us to next question of managing hierarchies in
pg_partition_defcorresponding to partkeylevel in the definition of topmost partitioned relation. But I guess those are
implementationdetails rather than representational unless I am being too naïve. 

> > CREATE TABLE pg_catalog.pg_partition_def
> > (
> >    partitionid                      oid     NOT NULL,
> >    partitionparentrel       oid    NOT NULL,
> >    partitionisoverflow     bool  NOT NULL,
> >    partitionvalues             anyarray,
> >
> >    PRIMARY KEY (partitionid),
> >    FOREIGN KEY (partitionid) REFERENCES pg_class(oid)
> > )
> > WITHOUT OIDS;
> >
> > ALTER TABLE pg_catalog.pg_class ADD COLUMN relispartitioned;
>
> What is an overflow partition and why do we want that?
>

That would be a default partition. That is, where the tuples that don't belong elsewhere (other defined partitions) go.
VALUESclause of the definition for such a partition would look like: 

(a range partition) ... VALUES LESS THAN MAXVALUE
(a list partition) ... VALUES DEFAULT

There has been discussion about whether there shouldn't be such a place for tuples to go. That is, it should generate
anerror if a tuple can't go anywhere (or support auto-creating a new one like in interval partitioning?) 

> What are you going to do if the partitioning key has two columns of
> different data types?
>

Sorry, this totally eluded me. Perhaps, the 'values' needs some more thought. They are one of the most crucial elements
ofthe scheme. 

I wonder if your suggestion of pg_node_tree plays well here. This then could be a list of CONSTs or some such... And I
amthinking it's a concern only for range partitions, no? (that is, a multicolumn partition key) 

I think partkind switches the interpretation of the field as appropriate. Am I missing something? By the way, I had
mentionedwe could have two values fields each for range and list partition kind. 

> > * DDL syntax (no multi-column partitioning, sub-partitioning support as yet):
> >
> > -- create partitioned table and child partitions at once.
> > CREATE TABLE parent (...)
> > PARTITION BY [ RANGE | LIST ] (key_column) [ opclass ]
> > [ (
> >      PARTITION child
> >        {
> >            VALUES LESS THAN { ... | MAXVALUE } -- for RANGE
> >          | VALUES [ IN ] ( { ... | DEFAULT } ) -- for LIST
> >        }
> >        [ WITH ( ... ) ] [ TABLESPACE tbs ]
> >      [, ...]
> >   ) ] ;
>
> How are you going to dump and restore this, bearing in mind that you
> have to preserve a bunch of OIDs across pg_upgrade?  What if somebody
> wants to do pg_dump --table name_of_a_partition?
>

Assuming everything's (including partitioned relation and partitions at all levels) got a pg_class entry of its own,
wouldOIDs be a problem? Or what is the nature of this problem if it's possible that it may be. 

If someone pg_dump's an individual partition as a table, we could let it be dumped as just a plain table. I am thinking
weshould be able to do that or should be doing just that (?) 

> I actually think it will be much cleaner to declare the parent first
> and then have separate CREATE TABLE statements that glue the children
> in, like CREATE TABLE child PARTITION OF parent VALUES LESS THAN (1,
> 10000).
>

Oh, do you mean to do away without any syntax for defining partitions with CREATE TABLE parent?

By the way, do you mean the following:

CREATE TABLE child PARTITION OF parent VALUES LESS THAN (1, 10000)

Instead of,

CREATE PARTITION child ON parent VALUES LESS THAN 10000?

And as for the dump of a partitioned table, it does sound cleaner to do it piece by piece starting with the parent and
itspartitioning key (as ALTER on it?) followed by individual partitions using either of the syntax above. Moreover we
dumpa sub-partition as a partition on its parent partition. 

Thanks for your time and valuable input.

Regards,
Amit





pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [REVIEW] Re: Compression of full-page-writes
Next
From: Noah Misch
Date:
Subject: Re: using Core Foundation locale functions