Thread: pg12 - partition by column that might have null values
Table "public.object_revision"
Column | Type | Collation | Nullable | Default
-------------+-----------------------------+-----------+----------+-----------------------------------------------
id | integer | | not null | nextval('mariel_dates_test_id_seq'::regclass)
end_time | timestamp without time zone | | |
object_hash | text | | |
Indexes:
"id_pk" PRIMARY KEY, btree (id)
ERROR: column "end_time" contains null values
Mariel Cherkassky wrote: > In PG12 I'm trying to create partitions by range on a date column > that might be null (indicates it is the most recent version of the > object). My PK has to include the partition column, therefore I'm > getting an error that I cant create a primary key with the specific > column because it has null values. > > For example : > \d object_revision > Table > "public.object_revision" > Column | Type | Collation | Nullable | > Default > -------------+-----------------------------+-----------+----------+ > ----------------------------------------------- > id | integer | | not null | > nextval('mariel_dates_test_id_seq'::regclass) > end_time | timestamp without time zone | | | > object_hash | text | | | > Indexes: > "id_pk" PRIMARY KEY, btree (id) > > Lets say that the same object (object_hash) can have many revisions, > the end_time is the time it was last updated. I'm trying to create > this table as a range partition on the end_time. However, when I try > to add the pk I'm getting an error : > ALTER TABLE object_revision ADD CONSTRAINT > object_revision_id_end_time PRIMARY KEY (id,end_time); > ERROR: column "end_time" contains null values > > does someone familiar with a workaround ? I know that in postgresql > as part of the primary key definition unique and not null constraints > are enforced on each column and not on both of them. However, this > might be problematic with pg12 partitions.. Can "end_time" ever be modified? If yes, it is a bad choice for a partitioning column. If the NULL in "end_time" is meant to signify "no end", use the value "infinity" instead of NULL. Then you can define the column as NOT NULL. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
From: Mariel Cherkassky <mariel.cherkassky@gmail.com>
Sent: Wednesday, October 02, 2019 12:37 AM
Whenever I have a new revision of that object, I update the end_time of the latest revision to be now() and I add a new record of that object with end_date null.
The null value is used to find most recent revisions of objects..
Thanks for the suggestion of infinity ! I'll try it.
My partitioning table design model always uses a partitioning column that is 100% static since that guarantees that rows are not constantly moving between partitions (with index update overhead etc). In this scenario I’d use a “StartTime” column to anchor the row in a partition. The relatively few rows with a null EndTime don’t need the power of partitioning, just an index to find them.
Mike Sofen
>but the start time doesnt indicates that the object is the most recent, it just indicates when the object was added to your table…
>on each partition I'll create range partition on the end_date so that I can search for revisions faster.
I believe you are confusing data storage with query optimization. Rarely would there be more updated rows than aged/stable rows…in the normal system, having even 3% of the data in churn (updateable) state would be unusual and your description of the data dynamics on this table said that a row updated once, gets the end_date set and then a new row is created.
To me, that says, put an index on end_date so you can find/query them quickly, and create partitions on a static date so the rows (and indexes) aren’t always being updated.
Mike Sofen