Thread: partitions vs indexes

partitions vs indexes

From
"Enrico Thierbach"
Date:

Hello list,

I run into some trouble with partitions:

I would like to convert a table with a primary key into a partitioned setup by a column which is not part of the primary key. Also, a column might hold a referenece to a parent row. So this is my current table setup, slimmed down:

CREATE TYPE statuses AS ENUM ('ready', ‘processing’, ‘done’);

CREATE TABLE mytable ( id          BIGSERIAL PRIMARY KEY NOT NULL,  parent_id   BIGINT REFERENCES mytable(id) ON DELETE CASCADE,   status statuses DEFAULT 'ready'
);

Since entries in the table are often frequented when status is not ‘done’ I would like to partition by state. However, if I want to do that:

CREATE TABLE mytable ( id          BIGSERIAL NOT NULL,  parent_id   BIGINT REFERENCES mytable(id) ON DELETE CASCADE,   status statuses DEFAULT 'ready'
 -- UNIQUE(id, status)        -- doesn’t work: can’t reference parent -- UNIQUE(id)                -- doesn’t work: can’t partition
) PARTITION BY LIST(status);

I need to add status to the primary key or another unique constraint. In that case, however, I can no longer have the foreign key constraint on parent_id.

Weirdly enough partitioning works fine if there is no unique constraint on that table:

CREATE TABLE mytable ( id          BIGSERIAL NOT NULL,  status statuses DEFAULT 'ready'
) PARTITION BY LIST(status);

So partitioning seems to require the column being in a unique constraint if and only if a unique constraint exist on the table. Also I cannot create multiple unique constraints on the table.

Here comes my question:

  • Do I miss something?
  • ThI don’t understand the requirement the partition value to be part of a unique constraint if such a constraint exists, since partitioning seems to work fine if the table has no unique constraints at all. Can someone shed some light on that? Is that maybe an artificial limitation that will go away on the future?
  • Any suggestions how I could proceed?

Thank you for any suggestion!

Best,
Eno

--
me on github: http://github.com/radiospiel

Re: partitions vs indexes

From
"Enrico Thierbach"
Date:

On 2 Oct 2019, at 22:09, Enrico Thierbach wrote:

Hello list,

I run into some trouble with partitions:

I would like to convert a table with a primary key into a partitioned setup by a column which is not part of the primary key. Also, a column might hold a referenece to a parent row. So this is my current table setup, slimmed down:

CREATE TYPE statuses AS ENUM ('ready', ‘processing’, ‘done’);

CREATE TABLE mytable ( id          BIGSERIAL PRIMARY KEY NOT NULL, parent_id   BIGINT REFERENCES mytable(id) ON DELETE CASCADE, status statuses DEFAULT 'ready'
);

Since entries in the table are often frequented when status is not ‘done’ I would like to partition by state. However, if I want to do that:

CREATE TABLE mytable ( id          BIGSERIAL NOT NULL, parent_id   BIGINT REFERENCES mytable(id) ON DELETE CASCADE, status statuses DEFAULT 'ready'
 -- UNIQUE(id, status)        -- doesn’t work: can’t reference parent -- UNIQUE(id)                -- doesn’t work: can’t partition
) PARTITION BY LIST(status);

I need to add status to the primary key or another unique constraint. In that case, however, I can no longer have the foreign key constraint on parent_id.

Weirdly enough partitioning works fine if there is no unique constraint on that table:

CREATE TABLE mytable ( id          BIGSERIAL NOT NULL, status statuses DEFAULT 'ready'
) PARTITION BY LIST(status);

So partitioning seems to require the column being in a unique constraint if and only if a unique constraint exist on the table. Also I cannot create multiple unique constraints on the table.

Here comes my question:

  • Do I miss something?
  • ThI don’t understand the requirement the partition value to be part of a unique constraint if such a constraint exists, since partitioning seems to work fine if the table has no unique constraints at all. Can someone shed some light on that? Is that maybe an artificial limitation that will go away on the future?
  • Any suggestions how I could proceed?

Thank you for any suggestion!

Best,
Eno

--
me on github: http://github.com/radiospiel

and, errm, forgot to mention thatI am on postgresql 11.3. Sorry for that omission.

Best,
eno

--
me on github: http://github.com/radiospiel

Re: partitions vs indexes

From
Michael Lewis
Date:
"I would like to convert a table with a primary key into a partitioned setup by a column which is not part of the primary key"

That isn't possible. The partition key must be contained by the primary key. That is, the primary key could be site_id, id and you can create hash partition on id or site_id but not created_on.

You could drop primary key and foreign keys and implement them via trigger functions as described in this blog series, but it seems questionable-

I do not assume the restriction would be dropped in future releases. I don't know that scanning all the partitions to figure out whether the primary key is violated would be advisable. Which is what the trigger functions described in the blog post has to do, right?

It might be noteworthy that partitioning with more than 10-100 partitions is MUCH faster in PG12 than PG11 (up to 4-8 thousand partitions) from testing shared by those working on that code.

Re: partitions vs indexes

From
"Enrico Thierbach"
Date:

On 2 Oct 2019, at 22:16, Michael Lewis wrote:

"I would like to convert a table with a primary key into a partitioned setup by a column which is not part of the primary key"

That isn't possible. The partition key must be contained by the primary key. That is, the primary key could be site_id, id and you can create hash partition on id or site_id but not created_on.

You could drop primary key and foreign keys and implement them via trigger functions as described in this blog series, but it seems questionable-

I do not assume the restriction would be dropped in future releases. I don't know that scanning all the partitions to figure out whether the primary key is violated would be advisable. Which is what the trigger functions described in the blog post has to do, right?

It might be noteworthy that partitioning with more than 10-100 partitions is MUCH faster in PG12 than PG11 (up to 4-8 thousand partitions) from testing shared by those working on that code.

Michael, thank you for your response. I think I now grasp the idea: if there is a uniqueness constraint, then the database would rather not visit all partitions to check for constraint violation, but want to identify the single partition that might fail that; hence any partitioning value must be a subset of or be identical to the uniqueness constraint.

I get that this makes sense if you have many partitions; however, I basically want to end up with two partitions, “hot” and “cold”. A row’s lifetime starts in a hot partition, and, after being processed, moves into the cold partition.

Most of the work actually happens in the hot partition, so I think having this as small as possible is probably helpful. For numbers: the hot partition would tyically contain ~10000 rows, the cold partition, on the other hand, will have 10s of millions. At the same time I still want to be able to look up a row by its id in the root relation, not in the concrete partitions. Having the database validate a uniqueness constraint in two tables instead of in one would be a worthwhile sacrifice for me.

Having said that I just realized I could probably reach my goal by setting up explicit hot and cold tables, move rows around manually whenever their “hotness” changes, and set up a view which combines both tables into a single relation. I would only have to drop all explicit FK references from the schema. A downside, certainly, but one that I could live with.

Best,
/eno

--
me on github: http://github.com/radiospiel

Re: partitions vs indexes

From
Laurenz Albe
Date:
On Wed, 2019-10-02 at 22:09 +0200, Enrico Thierbach wrote:
> would like to convert a table with a primary key into a partitioned 
> setup by a column which is not part of the primary key. Also, a
> column might hold a referenece to a parent row. So this is my current
> table setup, slimmed down:
> 
> CREATE TYPE statuses AS ENUM ('ready', ‘processing’, ‘done’);
> 
> CREATE TABLE mytable (
>   id          BIGSERIAL PRIMARY KEY NOT NULL, 
>   parent_id   BIGINT REFERENCES mytable(id) ON DELETE CASCADE,  
>   status statuses DEFAULT 'ready'
> );
> Since entries in the table are often frequented when status is not
> ‘done’ I would like to partition by state.

Have you considered using partial indexes?

CREATE INDEX ..... WHERE status <> 'done'.

Such indexes will be smaller, and any query where "status" is in
the WHERE clause can use the index.

If partitoining is really what you need, you could create the primary
key on the partitions and not on the partitioned table.

That won't guarantee you global uniqueness, but you could rely on
a sequence to do that.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com