Re: Partitioning options - Mailing list pgsql-general
From | Jim Nasby |
---|---|
Subject | Re: Partitioning options |
Date | |
Msg-id | 4ef16a66-a54f-47b2-956d-803561c06e47@gmail.com Whole thread Raw |
In response to | Re: Partitioning options (veem v <veema0000@gmail.com>) |
List | pgsql-general |
On 2/8/24 1:43 PM, veem v wrote: > > On Thu, 8 Feb 2024 at 20:08, Greg Sabino Mullane <htamfids@gmail.com > <mailto:htamfids@gmail.com>> wrote: <snip> > > Should we go for simple daily range partitioning on the > transaction_date column? > > > This one gets my vote. That and some good indexes. > > > Hello Greg, > > Out of curiosity, As OP mentioned that there will be Joins and also > filters on column Customer_id column , so why don't you think that > subpartition by customer_id will be a good option? I understand List > subpartition may not be an option considering the new customer_ids gets > added slowly in the future(and default list may not be allowed) and also > OP mentioned, there is skewed distribution of data for customer_id > column. However what is the problem if OP will opt for HASH subpartition > on customer_id in this situation? > > Is it because the number of partitions will be higher i.e. > > If you go with simple range partitioning, for 5 months you will have > ~150 daily range partitions and with each index the count of partition > will gets double, for e.g if you will have 10 indexes, the total > partitions will be = ~150 table partition+ (10*150)index partition= 1650 > total number of partitions. > > If OP goes for , range-hash, and hash will mostly have to be 2^N, so say > 8, hash sub-partitions , then the total number of partitions will be = > (8*150) table partitions+ (8*150*10) index partitions= ~13200 partitions. > > Though there are no theoretical limits to the number of partitions in > postgres, there are some serious issues noted in the past with higher > number of table partitions. One such is below. Is this the reason? > > https://www.kylehailey.com/post/postgres-partition-pains-lockmanager-waits <https://www.kylehailey.com/post/postgres-partition-pains-lockmanager-waits> The issue with partitioning by customer_id is that it won't do much (if anything) to improve data locality. When partitioning by date, you can at least benefit from partition elimination *IF* your most frequent queries limit the number of days that the query will look at. Per the OP, all queries will include transaction date. Note that does NOT actually mean the number of days/partitions will be limited (ie, WHERE date > today - 150 will hit all the partitions), but if we assume that the majority of queries will limit themselves to the past few days then partitioning by date should greatly increase data locality. Also, when it comes to customer partitioning... really what you probably want there isn't partitioning but sharding. -- Jim Nasby, Data Architect, Austin TX
pgsql-general by date: