Home > mailing lists

Re: Partitioning options - Mailing list pgsql-general

From	Jim Nasby
Subject	Re: Partitioning options
Date	February 8, 2024 20:34:34
Msg-id	4ef16a66-a54f-47b2-956d-803561c06e47@gmail.com Whole thread Raw
In response to	Re: Partitioning options (veem v <veema0000@gmail.com>)
List	pgsql-general

Tree view

On 2/8/24 1:43 PM, veem v wrote:
> 
> On Thu, 8 Feb 2024 at 20:08, Greg Sabino Mullane <htamfids@gmail.com 
> <mailto:htamfids@gmail.com>> wrote:
<snip>
> 
>         Should we go for simple daily range partitioning on the
>         transaction_date column?
> 
> 
>     This one gets my vote. That and some good indexes.
> 

> 
> Hello Greg,
> 
> Out of curiosity, As OP mentioned that there will be Joins and also 
> filters on column Customer_id column , so why don't you think that 
> subpartition by customer_id will be a good option? I understand List 
> subpartition may not be an option considering the new customer_ids gets 
> added slowly in the future(and default list may not be allowed) and also 
> OP mentioned, there is skewed distribution of data for customer_id 
> column. However what is the problem if OP will opt for HASH subpartition 
> on customer_id in this situation?
> 
> Is it because the number of partitions will be higher i.e.
> 
> If you go with simple range partitioning, for 5 months you will have 
> ~150 daily range partitions and with each index the count of partition 
> will gets double, for e.g if you will have 10 indexes, the total 
> partitions will be = ~150 table partition+ (10*150)index partition= 1650 
> total number of partitions.
> 
> If OP goes for , range-hash, and hash will mostly have to be 2^N, so say 
> 8, hash sub-partitions , then the total number of partitions will be = 
> (8*150) table partitions+ (8*150*10) index partitions= ~13200 partitions.
> 
> Though there are no theoretical limits to the number of partitions in 
> postgres, there are some serious issues noted in the past with higher 
> number of table partitions. One such is below. Is this the reason?
> 
> https://www.kylehailey.com/post/postgres-partition-pains-lockmanager-waits
<https://www.kylehailey.com/post/postgres-partition-pains-lockmanager-waits>

The issue with partitioning by customer_id is that it won't do much (if 
anything) to improve data locality. When partitioning by date, you can 
at least benefit from partition elimination *IF* your most frequent 
queries limit the number of days that the query will look at. Per the 
OP, all queries will include transaction date. Note that does NOT 
actually mean the number of days/partitions will be limited (ie, WHERE 
date > today - 150 will hit all the partitions), but if we assume that 
the majority of queries will limit themselves to the past few days then 
partitioning by date should greatly increase data locality.

Also, when it comes to customer partitioning... really what you probably 
want there isn't partitioning but sharding.
-- 
Jim Nasby, Data Architect, Austin TX

pgsql-general by date:

From: Alpaslan AKDAĞ
Date: 08 February 2024, 20:28:03
Subject: Re: archive command doesnt work

From: Alpaslan AKDAĞ
Date: 08 February 2024, 20:35:28
Subject: Re: archive command doesnt work

Re: Partitioning options - Mailing list pgsql-general

Previous

Next