Thread: pg12 - partition by column that might have null values

pg12 - partition by column that might have null values

From
Mariel Cherkassky
Date:
Hey,
In PG12 I'm trying to create partitions by range on a date column that might be null (indicates it is the most recent version of the object). My PK has to include the partition column, therefore I'm getting an error that I cant create a primary key with the specific column because it has null values.

For example : 
\d object_revision
                                          Table "public.object_revision"
   Column    |            Type             | Collation | Nullable |                    Default
-------------+-----------------------------+-----------+----------+-----------------------------------------------
 id          | integer                     |           | not null | nextval('mariel_dates_test_id_seq'::regclass)
 end_time    | timestamp without time zone |           |          |
 object_hash | text                        |           |          |
Indexes:
    "id_pk" PRIMARY KEY, btree (id)

Lets say that the same object (object_hash) can have many revisions, the end_time is the time it was last updated. I'm trying to create this table as a range partition on the end_time. However, when I try to add the pk I'm getting an error : 
ALTER TABLE object_revision ADD CONSTRAINT object_revision_id_end_time PRIMARY KEY (id,end_time);
ERROR:  column "end_time" contains null values

does someone familiar with a workaround ? I know that in postgresql as part of the primary key definition unique and not null constraints are enforced on each column and not on both of them. However, this might be problematic with pg12 partitions..


Re: pg12 - partition by column that might have null values

From
Laurenz Albe
Date:
Mariel Cherkassky wrote:
> In PG12 I'm trying to create partitions by range on a date column
> that might be null (indicates it is the most recent version of the
> object). My PK has to include the partition column, therefore I'm
> getting an error that I cant create a primary key with the specific
> column because it has null values.
> 
> For example : 
> \d object_revision
>                                           Table
> "public.object_revision"
>    Column    |            Type             | Collation | Nullable |  
>                  Default
> -------------+-----------------------------+-----------+----------+
> -----------------------------------------------
>  id          | integer                     |           | not null |
> nextval('mariel_dates_test_id_seq'::regclass)
>  end_time    | timestamp without time zone |           |          |
>  object_hash | text                        |           |          |
> Indexes:
>     "id_pk" PRIMARY KEY, btree (id)
> 
> Lets say that the same object (object_hash) can have many revisions,
> the end_time is the time it was last updated. I'm trying to create
> this table as a range partition on the end_time. However, when I try
> to add the pk I'm getting an error : 
> ALTER TABLE object_revision ADD CONSTRAINT
> object_revision_id_end_time PRIMARY KEY (id,end_time);
> ERROR:  column "end_time" contains null values
> 
> does someone familiar with a workaround ? I know that in postgresql
> as part of the primary key definition unique and not null constraints
> are enforced on each column and not on both of them. However, this
> might be problematic with pg12 partitions..

Can "end_time" ever be modified?

If yes, it is a bad choice for a partitioning column.

If the NULL in "end_time" is meant to signify "no end", use the
value "infinity" instead of NULL.  Then you can define the column
as NOT NULL.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: pg12 - partition by column that might have null values

From
Mariel Cherkassky
Date:
Whenever I have a new revision of that object, I update the end_time of the latest revision to be now() and I add a new record of that object with end_date null.
The null value is used to find most recent revisions of objects..
Thanks for the suggestion of infinity ! I'll try it.

RE: pg12 - partition by column that might have null values

From
"Mike Sofen"
Date:

From: Mariel Cherkassky <mariel.cherkassky@gmail.com>
Sent: Wednesday, October 02, 2019 12:37 AM
Whenever I have a new revision of that object, I update the end_time of the latest revision to be now() and I add a new record of that object with end_date null.

The null value is used to find most recent revisions of objects..

Thanks for the suggestion of infinity ! I'll try it.

 

My partitioning table design model always uses a partitioning column that is 100% static since that guarantees that rows are not constantly moving between partitions (with index update overhead etc).  In this scenario I’d use a “StartTime” column to anchor the row in a partition.  The relatively few rows with a null EndTime don’t need the power of partitioning, just an index to find them.

 

Mike Sofen

Re: pg12 - partition by column that might have null values

From
Mariel Cherkassky
Date:
but the start time doesnt indicates that the object is the most recent, it just indicates when the object was added to your table. If your queries involve the start_time I can understand why u set it as a partition column, otherwise is isnt useful. In most of my queries I query by one of 2 options :
1.Where end_time is null 
2.Where start_date>DATE and end_date <DATE

I think that doing the following will be the best option : 
partition by list (end_time) -  (1 for all non null (non infinity) and 1 default for all those who has end_time that isnt null)
on each partition I'll create range partition on the end_date so that I can search for revisions faster.

What do you think ? 

RE: pg12 - partition by column that might have null values

From
"Mike Sofen"
Date:

 

>but the start time doesnt indicates that the object is the most recent, it just indicates when the object was added to your table…

>on each partition I'll create range partition on the end_date so that I can search for revisions faster.

 

I believe you are confusing data storage with query optimization.  Rarely would there be more updated rows than aged/stable rows…in the normal system, having even 3% of the data in churn (updateable) state would be unusual and your description of the data dynamics on this table said that a row updated once, gets the end_date set and then a new row is created.

 

To me, that says, put an index on end_date so you can find/query them quickly, and create partitions on a static date so the rows (and indexes) aren’t always being updated.

 

Mike Sofen

Re: pg12 - partition by column that might have null values

From
Mariel Cherkassky
Date:
Not sure how data storage is relevant here, I was only focusing on query optimization. Lets say that most of the data isnt moving (history data). However, objects can be changed and therefore new revisions are added and the previous revisions updated (their end_date is updated). If you run queries that involve the end_date very common (in order to get the most recent revision of objects) it will be better to set this column as a partition column instead just having an index on this col. In this way, getting all the recent revisions of a specific object is reached by log(m) [m is the number of most recent revisions] instead of logn [n is the number of revisions u have] and n is by far bigger than m. Correct me I'f I'm wrong, this topic is quite interesting ..


Re: pg12 - partition by column that might have null values

From
Michael Lewis
Date:
Just create a partial index on id column where end_date = infinity (or null if you really prefer that pattern) and the system can quickly find the rows that are still most current revision. How many rows do you have in this table? Or foresee ever having? What took you down the road of partitioning the table? Theory only, or solving a real life optimization problem?