Re: Should we warn against using too many partitions? - Mailing list pgsql-hackers
From | Justin Pryzby |
---|---|
Subject | Re: Should we warn against using too many partitions? |
Date | |
Msg-id | 20190608063858.GF3079@telsasoft.com Whole thread Raw |
In response to | Re: Should we warn against using too many partitions? (David Rowley <david.rowley@2ndquadrant.com>) |
Responses |
Re: Should we warn against using too many partitions?
|
List | pgsql-hackers |
I made another pass, hopefully it's useful and not too much of a pain. diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index cce1618fc1..be2ca3be48 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -4674,6 +4675,88 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01'; </itemizedlist> </para> </sect2> + + <sect2 id="ddl-partitioning-declarative-best-practices"> + <title>Declarative Partitioning Best Practices</title> + + <para> + The choice of how to partition a table should be made carefully as the + performance of query planning and execution can be negatively affected by + poorly made design decisions. Maybe just "poor design" + partitioned table. <literal>WHERE</literal> clause items that match and + are compatible with the partition key can be used to prune away unneeded remove "away" ? + requirements for the <literal>PRIMARY KEY</literal> or a + <literal>UNIQUE</literal> constraint. Removal of unwanted data is also + a factor to consider when planning your partitioning strategy as an entire + partition can be removed fairly quickly. However, if data that you want Can we just say "dropped" ? On my first (re)reading, I briefly thought this was now referring to "pruning" as "removal". + to keep exists in that partition then that means having to resort to using + <command>DELETE</command> instead of removing the partition. + </para> + + <para> + Choosing the target number of partitions by which the table should be + divided into is also a critical decision to make. Not having enough Should be: ".. target number .. into which .. should be divided .." + partitions may mean that indexes remain too large and that data locality + remains poor which could result in poor cache hit ratios. However, Change the 2nd remains to "is" and the second poor to "low" ? + dividing the table into too many partitions can also cause issues. + Too many partitions can mean slower query planning times and higher memory s/slower/longer/ + consumption during both query planning and execution. It's also important + to consider what changes may occur in the future when choosing how to + partition your table. For example, if you choose to have one partition Remove "when choosing ..."? Or say: |When choosing how to partition your table, it's also important to consider |what changes may occur in the future. + per customer and you currently have a small number of large customers, + what will the implications be if in several years you obtain a large + number of small customers. In this case, it may be better to choose to + partition by <literal>HASH</literal> and choose a reasonable number of + partitions rather than trying to partition by <literal>LIST</literal> and + hoping that the number of customers does not increase significantly over + time. + </para> It's an unusual thing for which to hope :) + <para> + Sub-partitioning can be useful to further divide partitions that are + expected to become larger than other partitions, although excessive + sub-partitioning can easily lead to large numbers of partitions and can + cause the problems mentioned in the preceding paragraph. + </para> cause the SAME problems ? + It is also important to consider the overhead of partitioning during + query planning and execution. The query planner is generally able to + handle partition hierarchies up a few thousand partitions fairly well, + provided that typical queries prune all but a small number of partitions + during query planning. Planning times become slower and memory s/slower/longer/ Hm, maybe say "typical queries ALLOW PRUNNING .." + consumption becomes higher when more partitions remain after the planner + performs partition pruning. This is particularly true for the Just say: "remain after planning" ? + <command>UPDATE</command> and <command>DELETE</command> commands. Also, + even if most queries are able to prune a large number of partitions during + query planning, it still may be undesirable to have a large number of may still ? + partitions as each partition requires metadata about the partition to be + stored in each session that touches it. If each session touches a large stored for ? + number of partitions over a period of time then the memory consumption for + this may become significant. + </para> Remove "over a period of time" ? Add a comma? Maybe say: |If each session touches a large number of partitions, then the memory |overhead may become significant. + <para> + With data warehouse type workloads it can make sense to use a larger + number of partitions than with an OLTP type workload. Generally, in data + warehouses, query planning time is less of a concern as the majority of VAST majority? Or "essentially all"? Or " .. query planning time is insignificant compared to the time spent during query execution. + processing time is spent during query execution. With either of these two + types of workload it is important to make the right decisions early as early COMMA Justin
pgsql-hackers by date: