Thread: Question on Partition key

Question on Partition key

From
veem v
Date:
Hello Friends,
We are trying to create a monthly range partition table , partitioned on column  PART_DATE. This will hold Orders and part_date is nothing but invoice date. Some Team mates are asking to use the "PART_DATE" column as data type "INTEGER" with "YYYYMM" format and also define partitions as below. Want to know experts' views on this. If the data type of the partition key matters here or not? Or if there is any downside of each approach in future?

-- With date data type, It will look like as below
CREATE TABLE TAB1(  COLUMN1 VARCHAR(36) NOT NULL ,   PART_DATE DATE NOT NULL )    PARTITION BY RANGE (PART_DATE);

CREATE TABLE  TAB1_202309  PARTITION OF TAB1 FOR VALUES FROM ('2023-09-01') TO ('2023-10-01');
CREATE TABLE  TAB1_202310  PARTITION OF TAB1 FOR VALUES FROM ('2023-10-01') TO ('2023-11-01');
CREATE TABLE  TAB1_202311  PARTITION OF TAB1 FOR VALUES FROM ('2023-11-01') TO ('2023-12-01');

ALTER TABLE TAB1  ADD  CONSTRAINT PK_TAB1 PRIMARY KEY  ( COLUMN1 , PART_DATE  );


VS

-- With integer data type, 
It will look like as below

CREATE TABLE TAB1(  COLUMN1 VARCHAR(36) NOT NULL ,   PART_DATE_YM_NM Integer NOT NULL  )  PARTITION BY RANGE (PART_DATE_YM_NM);
CREATE TABLE  TAB1_202309  PARTITION OF TAB1 FOR VALUES FROM ('202309') TO ('202310');
CREATE TABLE  TAB1_202310  PARTITION OF TAB1 FOR VALUES FROM ('202310') TO ('202311');
CREATE TABLE  TAB1_202311  PARTITION OF TAB1 FOR VALUES FROM ('202311') TO ('202312');

ALTER TABLE TAB1   ADD  CONSTRAINT PK_TAB1 PRIMARY KEY  ( COLUMN1 , PART_DATE_YM_NM  );

Re: Question on Partition key

From
Erik Wienhold
Date:
On 03/09/2023 00:35 CEST veem v <veema0000@gmail.com> wrote:

> We are trying to create a monthly range partition table , partitioned on
> column PART_DATE. This will hold Orders and part_date is nothing but invoice
> date. Some Team mates are asking to use the "PART_DATE" column as data type
> "INTEGER" with "YYYYMM" format [...]

Why do your team mates favor integer over date?

> Want to know experts' views on this. If the data type of the partition key
> matters here or not?

Both integer and date are stored as 4 bytes.  There should be no difference
regarding index size.  I don't know if the data type makes a difference in
partition pruning performance in this case, but I'd be surprised if it were
the case.

> Or if there is any downside of each approach in future?

The downside of integer is that it allows invalid dates (e.g. 202313) unless
you also add check constraints.  But then just use date if you want to store
dates.  You get input validation and can use the date operators and functions
that Postgres offers.

--
Erik



Re: Question on Partition key

From
Deep
Date:
Have your friends also mentioned how it is going to help to convert date field to integer !???

On Sun, Sep 3, 2023 at 3:51 AM Erik Wienhold <ewie@ewie.name> wrote:
On 03/09/2023 00:35 CEST veem v <veema0000@gmail.com> wrote:

> We are trying to create a monthly range partition table , partitioned on
> column PART_DATE. This will hold Orders and part_date is nothing but invoice
> date. Some Team mates are asking to use the "PART_DATE" column as data type
> "INTEGER" with "YYYYMM" format [...]

Why do your team mates favor integer over date?

> Want to know experts' views on this. If the data type of the partition key
> matters here or not?

Both integer and date are stored as 4 bytes.  There should be no difference
regarding index size.  I don't know if the data type makes a difference in
partition pruning performance in this case, but I'd be surprised if it were
the case.

> Or if there is any downside of each approach in future?

The downside of integer is that it allows invalid dates (e.g. 202313) unless
you also add check constraints.  But then just use date if you want to store
dates.  You get input validation and can use the date operators and functions
that Postgres offers.

--
Erik


Re: Question on Partition key

From
veem v
Date:
Thank you so much for the clarification. 

Actually team have used similar partitioning strategy on integer columns in past. So they are inclined towards that. I will still, double check with others if any business restrictions exists. But as you already mentioned, it's not good in terms of data quality perspective. I agree to this point. 

Additionally, is it true that optimizer will also get fooled on getting the math correct during cardinality estimates, as because there is a big difference between , comparing or substracting, two date values VS two number values. And storing the dates in the number columns will pose this problem for the optimizer. Is my understanding correct here? 

On Sun, 3 Sept, 2023, 2:02 pm Deep, <biswachk@gmail.com> wrote:
Have your friends also mentioned how it is going to help to convert date field to integer !???

On Sun, Sep 3, 2023 at 3:51 AM Erik Wienhold <ewie@ewie.name> wrote:
On 03/09/2023 00:35 CEST veem v <veema0000@gmail.com> wrote:

> We are trying to create a monthly range partition table , partitioned on
> column PART_DATE. This will hold Orders and part_date is nothing but invoice
> date. Some Team mates are asking to use the "PART_DATE" column as data type
> "INTEGER" with "YYYYMM" format [...]

Why do your team mates favor integer over date?

> Want to know experts' views on this. If the data type of the partition key
> matters here or not?

Both integer and date are stored as 4 bytes.  There should be no difference
regarding index size.  I don't know if the data type makes a difference in
partition pruning performance in this case, but I'd be surprised if it were
the case.

> Or if there is any downside of each approach in future?

The downside of integer is that it allows invalid dates (e.g. 202313) unless
you also add check constraints.  But then just use date if you want to store
dates.  You get input validation and can use the date operators and functions
that Postgres offers.

--
Erik


Re: Question on Partition key

From
David Rowley
Date:
On Sun, 3 Sept 2023 at 23:52, veem v <veema0000@gmail.com> wrote:
> Additionally, is it true that optimizer will also get fooled on getting the math correct during cardinality
estimates,as because there is a big difference between , comparing or substracting, two date values VS two number
values.And storing the dates in the number columns will pose this problem for the optimizer. Is my understanding
correcthere? 

The query planner does not do any subtracting of values which are the
target of the statistics.  There are comparisons, but comparing a DATE
or an INT are equally as cheap.

To me, the design with the PART_DATE_YM_NM INT column looks very
strange.  Why bother partitioning by RANGE when there's just a single
value? The partition pruning done for LIST partitioning will work
equally as well when given ranges of values.  Also, don't they ever
want to store the day of the month anywhere in the table?  The INT
partitioned table won't allow that, but the DATE one will.

Several jobs ago in a land far far away, I worked with someone who
would tell engineers to not use EXISTs clauses in their SQLs as
"they're not optimised very well". I questioned him about this and as
it turned out, some version of Oracle once didn't optimise these very
well and when he learned this, he took that knowledge and seemingly
applied it to all versions of all RDBMSs in the universe.  Rather
bizarre, but perhaps that's what's going on here too.

David