Re: Declarative partitioning - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Declarative partitioning
Date
Msg-id CADkLM=f1zyL-cV_LK8GpXv=iS_=VbHs8bernUMa+8yHtUXjebQ@mail.gmail.com
Whole thread Raw
In response to Re: Declarative partitioning  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses Re: Declarative partitioning
List pgsql-hackers
Hm, I see.  How about multi-column keys?  Do we care enough about that use
case?  I don't see a nice-enough-looking range literal for such keys.
Consider for instance,

With the partitioned table defined as:

CREATE TABLE foo(c1 char(2), c2 char(2)) PARTITION BY RANGE (c1, c2);

Good question! I would assume that we'd use a syntax that presumes c1 and c2 are a hypothetical composite type. But what does that look like?

To answer it, I tried this:

# create type duple as (a text, b text);
CREATE TYPE
# create type duplerange as range (subtype = duple);
CREATE TYPE
# select '(beebop,alula)'::duple;
     duple
----------------
 (beebop,alula)
(1 row)

# select '("hey ho","letsgo")'::duple;
       duple
-------------------
 ("hey ho",letsgo)
(1 row)

analytics=# select duplerange('(beebop,alula)','("hey ho","letsgo")','(]');
                duplerange
------------------------------------------
 ("(beebop,alula)","(""hey ho"",letsgo)"]
(1 row) 

So I would assume that we'd use a syntax that presumed the columns were in a composite range type.

Which means your creates would look like (following Robert Haas's implied suggestion that we leave off the string literal quotes):

CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES (      , (b,2) );
CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (b,2), (b,3) );
CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (b,3), (b,4) );
CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (b,4), (c,2) );
CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (c,2), (c,3) );
CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (c,3), (c,4) );

That's not terrible looking.

We would want to also think about what subset of many permutations of this
syntax to accept range specs for new partitions.  Mostly to preserve the
non-overlapping invariant and I think it would also be nice to prevent gaps.

Gaps *might* be intentional. I can certainly see where we'd want to set up warnings for discontinuity, or perhaps some utility functions:
    pg_partitions_ranges_are_continuous('master_table_name')
    pg_partitions_are_adjacent('master_table_name','p1','p2')

But for the most part, range partitions evolve from splits when one partition grows too big, so that won't be such a problem.


Consider that once we create:

PARTITION FOR VALUES [current_date,);

Now to create a new partition starting at later date, we have to have a
"split partition" feature which would scan the above partition to
distribute the existing data rows appropriately to the resulting two
partitions. Right?

Correct. And depending on policy, that might be desirable and might be not.
If the table were for death records, we'd very much want to reject rows in the future, if only to avoid upsetting the person.
If the table were of movie release dates, we'd *expect* that only dates (,current_date] would be entered, but if someone chose to leak a release date, we'd want to capture that and deal with it later.
So yeah, we're going to (eventually) need a SPLIT PARTITION that migrates rows to a new partition.


IOW, one shouldn't create an unbounded partition if more partitions in the
unbounded direction are expected to be created.  It would be OK for
unbounded partitions to be on the lower end most of the times.

On this I'll have to disagree. My own use case where I use my range_partitioning extension starts off with a single partition () and all new partitions are splits of that. The ranges evolve over time as partitions grow and slow down. It's nice because we're not trying to predict where growth will be, we split where growth is.
 

> p.s. Sorry I haven't been able to kick the tires just yet. We have a very
> good use case for this, it's just a matter of getting a machine and the
> time to devote to it.

I would appreciate it.  You could wait a little more for my next
submission which will contain some revisions to the tuple routing code.


Ok, I'll wait a bit. In the mean time I can tell you a bit about the existing production system I'm hoping to replicate in true partitioning looks like this:

Big Master Table:
     Range partition by C collated text
           Date Range 
           Date Range
           ...
     Range partition by C collated text
           Date Range
           Date Range
           ...
    ...

Currently this is accomplished through my range_partitioning module, and then using pg_partman on those partitions. It works, but it's a lot of moving parts.

The machine will be a 32 core AWS box. As per usual with AWS, it will be have ample memory and CPU, and be somewhat starved for I/O.

Question: I haven't dove into the code, but I was curious about your tuple routing algorithm. Is there any way for the algorithm to begin it's scan of candidate partitions based on the destination of the last row inserted this statement? I ask because most use cases (that I am aware of) have data that would naturally cluster in the same partition.

pgsql-hackers by date:

Previous
From: Petr Jelinek
Date:
Subject: Re: Proposal: Generic WAL logical messages
Next
From: Pavel Stehule
Date:
Subject: Re: plpgsql - DECLARE - cannot to use %TYPE or %ROWTYPE for composite types