Thread: multicolumn partitioning help

multicolumn partitioning help

From
James Robertson
Date:
Hey folks,
I am having issues with multicolumn partitioning. For reference I am using the following link as my guide:
https://www.postgresql.org/docs/devel/sql-createtable.html

To demonstrate my problem, I created a simple table called humans. I want to partition by the year of the human birth and then the first character of the hash. So for each year I'll have year*16 partitions. (hex)

CREATE TABLE humans (
    hash bytea,
    fname text,
    dob date
    )PARTITION BY RANGE (EXTRACT(YEAR FROM dob),substring(hash::text, 1, 1));
   
Reading the documentation:   "When creating a range partition, the lower bound specified with FROM is an inclusive bound, whereas the upper bound specified with TO is an exclusive bound".

However I can't insert any of the following after the first one, because it says it overlaps. Do I need to do anything different when defining multi-column partitions?


This works:
CREATE TABLE humans_1968_0 PARTITION OF humans FOR VALUES FROM (1968, '0') TO (1969, '1');


These fail: 
CREATE TABLE humans_1968_1 PARTITION OF humans FOR VALUES FROM (1968, '1') TO (1969, '2');
CREATE TABLE humans_1968_2 PARTITION OF humans FOR VALUES FROM (1968, '2') TO (1969, '3');
CREATE TABLE humans_1968_3 PARTITION OF humans FOR VALUES FROM (1968, '3') TO (1969, '4');
CREATE TABLE humans_1968_4 PARTITION OF humans FOR VALUES FROM (1968, '4') TO (1969, '5');
CREATE TABLE humans_1968_5 PARTITION OF humans FOR VALUES FROM (1968, '5') TO (1969, '6');
CREATE TABLE humans_1968_6 PARTITION OF humans FOR VALUES FROM (1968, '6') TO (1969, '7');
CREATE TABLE humans_1968_7 PARTITION OF humans FOR VALUES FROM (1968, '7') TO (1969, '8');
CREATE TABLE humans_1968_8 PARTITION OF humans FOR VALUES FROM (1968, '8') TO (1969, '9');
CREATE TABLE humans_1968_9 PARTITION OF humans FOR VALUES FROM (1968, '9') TO (1969, 'a');
CREATE TABLE humans_1968_a PARTITION OF humans FOR VALUES FROM (1968, 'a') TO (1969, 'b');
CREATE TABLE humans_1968_b PARTITION OF humans FOR VALUES FROM (1968, 'b') TO (1969, 'c');
CREATE TABLE humans_1968_c PARTITION OF humans FOR VALUES FROM (1968, 'c') TO (1969, 'd');
CREATE TABLE humans_1968_d PARTITION OF humans FOR VALUES FROM (1968, 'd') TO (1969, 'e');
CREATE TABLE humans_1968_e PARTITION OF humans FOR VALUES FROM (1968, 'e') TO (1969, 'f');
CREATE TABLE humans_1968_f PARTITION OF humans FOR VALUES FROM (1968, 'f') TO (1969, 'g');
CREATE TABLE humans_1969_0 PARTITION OF humans FOR VALUES FROM (1969, '0') TO (1970, '1');
CREATE TABLE humans_1969_1 PARTITION OF humans FOR VALUES FROM (1969, '1') TO (1970, '2');
CREATE TABLE humans_1969_2 PARTITION OF humans FOR VALUES FROM (1969, '2') TO (1970, '3');
CREATE TABLE humans_1969_3 PARTITION OF humans FOR VALUES FROM (1969, '3') TO (1970, '4');
CREATE TABLE humans_1969_4 PARTITION OF humans FOR VALUES FROM (1969, '4') TO (1970, '5');
CREATE TABLE humans_1969_5 PARTITION OF humans FOR VALUES FROM (1969, '5') TO (1970, '6');
CREATE TABLE humans_1969_6 PARTITION OF humans FOR VALUES FROM (1969, '6') TO (1970, '7');
CREATE TABLE humans_1969_7 PARTITION OF humans FOR VALUES FROM (1969, '7') TO (1970, '8');
CREATE TABLE humans_1969_8 PARTITION OF humans FOR VALUES FROM (1969, '8') TO (1970, '9');
CREATE TABLE humans_1969_9 PARTITION OF humans FOR VALUES FROM (1969, '9') TO (1970, 'a');
CREATE TABLE humans_1969_a PARTITION OF humans FOR VALUES FROM (1969, 'a') TO (1970, 'b');
CREATE TABLE humans_1969_b PARTITION OF humans FOR VALUES FROM (1969, 'b') TO (1970, 'c');
CREATE TABLE humans_1969_c PARTITION OF humans FOR VALUES FROM (1969, 'c') TO (1970, 'd');
CREATE TABLE humans_1969_d PARTITION OF humans FOR VALUES FROM (1969, 'd') TO (1970, 'e');
CREATE TABLE humans_1969_e PARTITION OF humans FOR VALUES FROM (1969, 'e') TO (1970, 'f');
CREATE TABLE humans_1969_f PARTITION OF humans FOR VALUES FROM (1969, 'f') TO (1970, 'g');

Thank you for reviewing this problem.

Re: multicolumn partitioning help

From
Justin Pryzby
Date:
On Sun, Mar 12, 2023 at 01:59:32PM -0400, James Robertson wrote:
> Hey folks,
> I am having issues with multicolumn partitioning. For reference I am using
> the following link as my guide:
> https://www.postgresql.org/docs/devel/sql-createtable.html
> 
> Reading the documentation:   "When creating a range partition, the lower
> bound specified with FROM is an inclusive bound, whereas the upper bound
> specified with TO is an exclusive bound".
> 
> However I can't insert any of the following after the first one, because it
> says it overlaps. Do I need to do anything different when defining
> multi-column partitions?

The bounds are compared like rows:

When creating a range partition, the lower bound specified with FROM is
an inclusive bound, whereas the upper bound specified with TO is an
exclusive bound. That is, the values specified in the FROM list are
valid values of the corresponding partition key columns for this
partition, whereas those in the TO list are not. Note that this
statement must be understood according to the rules of row-wise
comparison (Section 9.24.5). For example, given PARTITION BY RANGE
(x,y), a partition bound FROM (1, 2) TO (3, 4) allows x=1 with any y>=2,
x=2 with any non-null y, and x=3 with any y<4.

https://www.postgresql.org/docs/current/functions-comparisons.html#ROW-WISE-COMPARISON

> This works:
> CREATE TABLE humans_1968_0 PARTITION OF humans FOR VALUES FROM (1968, '0')
> TO (1969, '1');

This table is everything from 1968 (starting with '0') to 1969

> These fail:
> CREATE TABLE humans_1968_1 PARTITION OF humans FOR VALUES FROM (1968, '1')
> TO (1969, '2');

Which is why these are overlapping.

> CREATE TABLE humans_1969_1 PARTITION OF humans FOR VALUES FROM (1969, '1')
> TO (1970, '2');

This one doesn't fail, but it "occupies" / subjugates all of 1969
starting with 1.

-- 
Justin



Re: multicolumn partitioning help

From
Laurenz Albe
Date:
On Sun, 2023-03-12 at 13:59 -0400, James Robertson wrote:
> I am having issues with multicolumn partitioning. For reference I am using the following link as my guide:
> https://www.postgresql.org/docs/devel/sql-createtable.html
>
> To demonstrate my problem, I created a simple table called humans. I want to partition by the year
> of the human birth and then the first character of the hash. So for each year I'll have year*16 partitions. (hex)
>
> CREATE TABLE humans (
>     hash bytea,
>     fname text,
>     dob date
>     )PARTITION BY RANGE (EXTRACT(YEAR FROM dob),substring(hash::text, 1, 1));
>    
> Reading the documentation:   "When creating a range partition, the lower bound specified with
> FROM is an inclusive bound, whereas the upper bound specified with TO is an exclusive bound".
>
> However I can't insert any of the following after the first one, because it says it overlaps.
> Do I need to do anything different when defining multi-column partitions?
>
>
> This works:
> CREATE TABLE humans_1968_0 PARTITION OF humans FOR VALUES FROM (1968, '0') TO (1969, '1');
>
>
> These fail: 
> CREATE TABLE humans_1968_1 PARTITION OF humans FOR VALUES FROM (1968, '1') TO (1969, '2');

Justin has explained what the problem is, let me supply a solution.

I think you want subpartitioning, like

  CREATE TABLE humans (
     hash bytea,
     fname text,
     dob date
  ) PARTITION BY LIST (EXTRACT (YEAR FROM dob));

  CREATE TABLE humans_2002
     PARTITION OF humans FOR VALUES IN (2002)
     PARTITION BY HASH (hash);

  CREATE TABLE humans_2002_0
     PARTITION OF humans_2002 FOR VALUES WITH (MODULUS 26, REMAINDER 0);

  [...]

  CREATE TABLE humans_2002_25
     PARTITION OF humans_2002 FOR VALUES WITH (MODULUS 26, REMAINDER 25);

and so on for the other years.

Yours,
Laurenz Albe



Re: multicolumn partitioning help

From
James Robertson
Date:
Laurenz, Justin,
Thank you both for thinking of this problem.

Laurenz your solution is how I thought I would work around my (lack of) understanding of partitioning. (nested partitions).
I was hesitant because I didn't know what sort of performance problems I would create for myself.

If we have true multi-column don't we get the benefit of:

TopLevelTable
|
|----> worker-thread 1
|
|----> worker-thread 2
|
|----> worker-thread n

Doesn't that give me more performance than:

TopLevelTable
|
|----> worker-thread 1
........|----> sub-table 1.1
........|----> sub-table 1.2
........|----> sub-table 1.n
|
|----> worker-thread 2
........|----> sub-table 2.1
........|----> sub-table 2.2
........|----> sub-table 2.n

or do we get?

TopLevelTable
|
|----> worker-thread 1 (default catch)
........|----> worker thread 2 -> sub-table 1.1
........|----> worker thread 3 -> sub-table 1.2
........|----> worker thread 4 -> sub-table 1.n
|
|----> worker-thread 5 (default catch)
........|----> worker thread 6 -> sub-table 2.1
........|----> worker thread 7 -> sub-table 2.2
........|----> worker thread 8 -> sub-table 2.n


Summary: 
1) if we create nested partitions, do we create performance issues:
2) if nested partitions are the solutions, what is the point of multi-column partitioning? 


wish list) wouldn't it be neat if we can do mult-mode multi-column? like PARTITION BY RANGE (EXTRACT(YEAR FROM dob)) LIST (SUBSTRING(hash, 1, 1));

On Tue, Mar 14, 2023 at 5:41 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Sun, 2023-03-12 at 13:59 -0400, James Robertson wrote:
> I am having issues with multicolumn partitioning. For reference I am using the following link as my guide:
> https://www.postgresql.org/docs/devel/sql-createtable.html
>
> To demonstrate my problem, I created a simple table called humans. I want to partition by the year
> of the human birth and then the first character of the hash. So for each year I'll have year*16 partitions. (hex)
>
> CREATE TABLE humans (
>     hash bytea,
>     fname text,
>     dob date
>     )PARTITION BY RANGE (EXTRACT(YEAR FROM dob),substring(hash::text, 1, 1));
>    
> Reading the documentation:   "When creating a range partition, the lower bound specified with
> FROM is an inclusive bound, whereas the upper bound specified with TO is an exclusive bound".
>
> However I can't insert any of the following after the first one, because it says it overlaps.
> Do I need to do anything different when defining multi-column partitions?
>
>
> This works:
> CREATE TABLE humans_1968_0 PARTITION OF humans FOR VALUES FROM (1968, '0') TO (1969, '1');
>
>
> These fail: 
> CREATE TABLE humans_1968_1 PARTITION OF humans FOR VALUES FROM (1968, '1') TO (1969, '2');

Justin has explained what the problem is, let me supply a solution.

I think you want subpartitioning, like

  CREATE TABLE humans (
     hash bytea,
     fname text,
     dob date
  ) PARTITION BY LIST (EXTRACT (YEAR FROM dob));

  CREATE TABLE humans_2002
     PARTITION OF humans FOR VALUES IN (2002)
     PARTITION BY HASH (hash);

  CREATE TABLE humans_2002_0
     PARTITION OF humans_2002 FOR VALUES WITH (MODULUS 26, REMAINDER 0);

  [...]

  CREATE TABLE humans_2002_25
     PARTITION OF humans_2002 FOR VALUES WITH (MODULUS 26, REMAINDER 25);

and so on for the other years.

Yours,
Laurenz Albe

Re: multicolumn partitioning help

From
Laurenz Albe
Date:
On Tue, 2023-03-14 at 19:33 -0400, James Robertson wrote:
> Laurenz your solution is how I thought I would work around my (lack of) understanding
> of partitioning. (nested partitions).
> I was hesitant because I didn't know what sort of performance problems I would create for myself.
>
> [...] more performance [...]

If you are thinking of subpartitioning primarily in terms of boosting performance,
you should know that you only get performance benefits from partitioning with
very special queries that effectively have to be designed together with the
partitioning strategy.  Other statements typically become somewhat slower
through partitioning.

So it is really impossible to discuss performance benefits without knowing
the exact query.  It may be best if you build a play database with realistic amounts
of test data and use EXPLAIN and EXPLAIN (ANALYZE) to see the effects that
partitioning has on your queries.

Yours,
Laurenz Albe



Re: multicolumn partitioning help

From
David Rowley
Date:
On Wed, 15 Mar 2023 at 10:41, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> I think you want subpartitioning, like
>
>   CREATE TABLE humans (
>      hash bytea,
>      fname text,
>      dob date
>   ) PARTITION BY LIST (EXTRACT (YEAR FROM dob));

This may be perfectly fine, but it is also important to highlight that
partitioning in this way may hinder partition pruning.

If the first level partitioned table was to be BY RANGE (dob); then
the partitions could be defined like FOR VALUES FROM ('2023-01-01') TO
('2024-01-01'). For a query that had something like WHERE dob =
'2023-03-16', then PostgreSQL could prune away all the partitions for
the other years.  The same wouldn't occur if the table was partitioned
by LIST (EXTRACT (YEAR FROM dob)) unless you added a AND EXTRACT (YEAR
FROM dob) = 2023 to the query's WHERE clause.

Robert, there are a few tips about partitioning in [1] that you may
wish to review.

David

[1] https://www.postgresql.org/docs/devel/ddl-partitioning.html



Re: multicolumn partitioning help

From
David Rowley
Date:
On Thu, 16 Mar 2023 at 00:47, James Robertson <james@jsrobertson.net> wrote:
> or do we get?
>
> TopLevelTable
> |
> |----> worker-thread 1 (default catch)
> ........|----> worker thread 2 -> sub-table 1.1
> ........|----> worker thread 3 -> sub-table 1.2
> ........|----> worker thread 4 -> sub-table 1.n
> |
> |----> worker-thread 5 (default catch)
> ........|----> worker thread 6 -> sub-table 2.1
> ........|----> worker thread 7 -> sub-table 2.2
> ........|----> worker thread 8 -> sub-table 2.n

The planner generally flattens out the scans to each partition into a
single Append or MergeAppend node.  Append nodes can be parallelised.
Assuming there's no reason that a particular partition can't support
it, the parallel workers can be distributed to assist without
restriction to which partition they help with. Multiple workers can
even help with a single partition. Workers can then move over to
helping with other partitions when they're done with the partition
they've been working on. I believe some other databases do or did at
least restrict parallelism to 1 worker per partition (which perhaps is
why you raised this).  There's no such restriction with PostgreSQL.

> Summary:
> 1) if we create nested partitions, do we create performance issues:

If you create too many partitions, it can create performance issues.
You should look at the partitioning best practices section of the
documents for details about that. I recommend a careful read of those.

> 2) if nested partitions are the solutions, what is the point of multi-column partitioning?

There are many reasons.  If you have multiple levels of partitioning,
then the partition pruning done during planning is going to have more
work to do as it'll be executed once, then once again for each
partitioned table remaining after running it for the first level.
Also, it seems much easier to PARTITION BY HASH(a,b) than to first do
HASH(a) then another level to HASH(b).  However, there may be
advantages to having multiple levels here as the planner would still
be able to prune partitions if the WHERE clause didn't contain any
quals like "b = <value>".  The key take away here is that they're
different, so we support both.

> wish list) wouldn't it be neat if we can do mult-mode multi-column? like PARTITION BY RANGE (EXTRACT(YEAR FROM dob))
LIST(SUBSTRING(hash, 1, 1));
 

Effectively, multi-level partitioning gives you that, It's just the
DDL is different from how you wrote it.

David