Thread: Question on Partition key
Hello Friends,
We are trying to create a monthly range partition table , partitioned on column PART_DATE. This will hold Orders and part_date is nothing but invoice date. Some Team mates are asking to use the "PART_DATE" column as data type "INTEGER" with "YYYYMM" format and also define partitions as below. Want to know experts' views on this. If the data type of the partition key matters here or not? Or if there is any downside of each approach in future?
-- With date data type, It will look like as below
CREATE TABLE TAB1( COLUMN1 VARCHAR(36) NOT NULL , PART_DATE DATE NOT NULL ) PARTITION BY RANGE (PART_DATE);
CREATE TABLE TAB1_202309 PARTITION OF TAB1 FOR VALUES FROM ('2023-09-01') TO ('2023-10-01');
CREATE TABLE TAB1_202310 PARTITION OF TAB1 FOR VALUES FROM ('2023-10-01') TO ('2023-11-01');
CREATE TABLE TAB1_202311 PARTITION OF TAB1 FOR VALUES FROM ('2023-11-01') TO ('2023-12-01');
ALTER TABLE TAB1 ADD CONSTRAINT PK_TAB1 PRIMARY KEY ( COLUMN1 , PART_DATE );
VS
-- With integer data type, It will look like as below
CREATE TABLE TAB1( COLUMN1 VARCHAR(36) NOT NULL , PART_DATE_YM_NM Integer NOT NULL ) PARTITION BY RANGE (PART_DATE_YM_NM);
CREATE TABLE TAB1_202309 PARTITION OF TAB1 FOR VALUES FROM ('202309') TO ('202310');
CREATE TABLE TAB1_202310 PARTITION OF TAB1 FOR VALUES FROM ('202310') TO ('202311');
CREATE TABLE TAB1_202311 PARTITION OF TAB1 FOR VALUES FROM ('202311') TO ('202312');
ALTER TABLE TAB1 ADD CONSTRAINT PK_TAB1 PRIMARY KEY ( COLUMN1 , PART_DATE_YM_NM );
We are trying to create a monthly range partition table , partitioned on column PART_DATE. This will hold Orders and part_date is nothing but invoice date. Some Team mates are asking to use the "PART_DATE" column as data type "INTEGER" with "YYYYMM" format and also define partitions as below. Want to know experts' views on this. If the data type of the partition key matters here or not? Or if there is any downside of each approach in future?
-- With date data type, It will look like as below
CREATE TABLE TAB1( COLUMN1 VARCHAR(36) NOT NULL , PART_DATE DATE NOT NULL ) PARTITION BY RANGE (PART_DATE);
CREATE TABLE TAB1_202309 PARTITION OF TAB1 FOR VALUES FROM ('2023-09-01') TO ('2023-10-01');
CREATE TABLE TAB1_202310 PARTITION OF TAB1 FOR VALUES FROM ('2023-10-01') TO ('2023-11-01');
CREATE TABLE TAB1_202311 PARTITION OF TAB1 FOR VALUES FROM ('2023-11-01') TO ('2023-12-01');
ALTER TABLE TAB1 ADD CONSTRAINT PK_TAB1 PRIMARY KEY ( COLUMN1 , PART_DATE );
VS
-- With integer data type, It will look like as below
CREATE TABLE TAB1( COLUMN1 VARCHAR(36) NOT NULL , PART_DATE_YM_NM Integer NOT NULL ) PARTITION BY RANGE (PART_DATE_YM_NM);
CREATE TABLE TAB1_202309 PARTITION OF TAB1 FOR VALUES FROM ('202309') TO ('202310');
CREATE TABLE TAB1_202310 PARTITION OF TAB1 FOR VALUES FROM ('202310') TO ('202311');
CREATE TABLE TAB1_202311 PARTITION OF TAB1 FOR VALUES FROM ('202311') TO ('202312');
ALTER TABLE TAB1 ADD CONSTRAINT PK_TAB1 PRIMARY KEY ( COLUMN1 , PART_DATE_YM_NM );
On 03/09/2023 00:35 CEST veem v <veema0000@gmail.com> wrote: > We are trying to create a monthly range partition table , partitioned on > column PART_DATE. This will hold Orders and part_date is nothing but invoice > date. Some Team mates are asking to use the "PART_DATE" column as data type > "INTEGER" with "YYYYMM" format [...] Why do your team mates favor integer over date? > Want to know experts' views on this. If the data type of the partition key > matters here or not? Both integer and date are stored as 4 bytes. There should be no difference regarding index size. I don't know if the data type makes a difference in partition pruning performance in this case, but I'd be surprised if it were the case. > Or if there is any downside of each approach in future? The downside of integer is that it allows invalid dates (e.g. 202313) unless you also add check constraints. But then just use date if you want to store dates. You get input validation and can use the date operators and functions that Postgres offers. -- Erik
Have your friends also mentioned how it is going to help to convert date field to integer !???
On Sun, Sep 3, 2023 at 3:51 AM Erik Wienhold <ewie@ewie.name> wrote:
On 03/09/2023 00:35 CEST veem v <veema0000@gmail.com> wrote:
> We are trying to create a monthly range partition table , partitioned on
> column PART_DATE. This will hold Orders and part_date is nothing but invoice
> date. Some Team mates are asking to use the "PART_DATE" column as data type
> "INTEGER" with "YYYYMM" format [...]
Why do your team mates favor integer over date?
> Want to know experts' views on this. If the data type of the partition key
> matters here or not?
Both integer and date are stored as 4 bytes. There should be no difference
regarding index size. I don't know if the data type makes a difference in
partition pruning performance in this case, but I'd be surprised if it were
the case.
> Or if there is any downside of each approach in future?
The downside of integer is that it allows invalid dates (e.g. 202313) unless
you also add check constraints. But then just use date if you want to store
dates. You get input validation and can use the date operators and functions
that Postgres offers.
--
Erik
Thank you so much for the clarification.
Actually team have used similar partitioning strategy on integer columns in past. So they are inclined towards that. I will still, double check with others if any business restrictions exists. But as you already mentioned, it's not good in terms of data quality perspective. I agree to this point.
Additionally, is it true that optimizer will also get fooled on getting the math correct during cardinality estimates, as because there is a big difference between , comparing or substracting, two date values VS two number values. And storing the dates in the number columns will pose this problem for the optimizer. Is my understanding correct here?
On Sun, 3 Sept, 2023, 2:02 pm Deep, <biswachk@gmail.com> wrote:
Have your friends also mentioned how it is going to help to convert date field to integer !???On Sun, Sep 3, 2023 at 3:51 AM Erik Wienhold <ewie@ewie.name> wrote:On 03/09/2023 00:35 CEST veem v <veema0000@gmail.com> wrote:
> We are trying to create a monthly range partition table , partitioned on
> column PART_DATE. This will hold Orders and part_date is nothing but invoice
> date. Some Team mates are asking to use the "PART_DATE" column as data type
> "INTEGER" with "YYYYMM" format [...]
Why do your team mates favor integer over date?
> Want to know experts' views on this. If the data type of the partition key
> matters here or not?
Both integer and date are stored as 4 bytes. There should be no difference
regarding index size. I don't know if the data type makes a difference in
partition pruning performance in this case, but I'd be surprised if it were
the case.
> Or if there is any downside of each approach in future?
The downside of integer is that it allows invalid dates (e.g. 202313) unless
you also add check constraints. But then just use date if you want to store
dates. You get input validation and can use the date operators and functions
that Postgres offers.
--
Erik
On Sun, 3 Sept 2023 at 23:52, veem v <veema0000@gmail.com> wrote: > Additionally, is it true that optimizer will also get fooled on getting the math correct during cardinality estimates,as because there is a big difference between , comparing or substracting, two date values VS two number values.And storing the dates in the number columns will pose this problem for the optimizer. Is my understanding correcthere? The query planner does not do any subtracting of values which are the target of the statistics. There are comparisons, but comparing a DATE or an INT are equally as cheap. To me, the design with the PART_DATE_YM_NM INT column looks very strange. Why bother partitioning by RANGE when there's just a single value? The partition pruning done for LIST partitioning will work equally as well when given ranges of values. Also, don't they ever want to store the day of the month anywhere in the table? The INT partitioned table won't allow that, but the DATE one will. Several jobs ago in a land far far away, I worked with someone who would tell engineers to not use EXISTs clauses in their SQLs as "they're not optimised very well". I questioned him about this and as it turned out, some version of Oracle once didn't optimise these very well and when he learned this, he took that knowledge and seemingly applied it to all versions of all RDBMSs in the universe. Rather bizarre, but perhaps that's what's going on here too. David