Re: Should we warn against using too many partitions? - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: Should we warn against using too many partitions?
Date
Msg-id 20190608063858.GF3079@telsasoft.com
Whole thread Raw
In response to Re: Should we warn against using too many partitions?  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Should we warn against using too many partitions?
List pgsql-hackers
I made another pass, hopefully it's useful and not too much of a pain.

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index cce1618fc1..be2ca3be48 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -4674,6 +4675,88 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
    </itemizedlist>
    </para>
   </sect2>
+  
+  <sect2 id="ddl-partitioning-declarative-best-practices">
+   <title>Declarative Partitioning Best Practices</title>
+
+   <para>
+    The choice of how to partition a table should be made carefully as the
+    performance of query planning and execution can be negatively affected by
+    poorly made design decisions.

Maybe just "poor design"

+    partitioned table.  <literal>WHERE</literal> clause items that match and
+    are compatible with the partition key can be used to prune away unneeded

remove "away" ?

+    requirements for the <literal>PRIMARY KEY</literal> or a
+    <literal>UNIQUE</literal> constraint.  Removal of unwanted data is also
+    a factor to consider when planning your partitioning strategy as an entire
+    partition can be removed fairly quickly.  However, if data that you want

Can we just say "dropped" ?  On my first (re)reading, I briefly thought this
was now referring to "pruning" as "removal".

+    to keep exists in that partition then that means having to resort to using
+    <command>DELETE</command> instead of removing the partition.
+   </para>
+
+   <para>
+    Choosing the target number of partitions by which the table should be
+    divided into is also a critical decision to make.  Not having enough

Should be: ".. target number .. into which .. should be divided .."

+    partitions may mean that indexes remain too large and that data locality
+    remains poor which could result in poor cache hit ratios.  However,

Change the 2nd remains to "is" and the second poor to "low" ?

+    dividing the table into too many partitions can also cause issues.
+    Too many partitions can mean slower query planning times and higher memory

s/slower/longer/

+    consumption during both query planning and execution.  It's also important
+    to consider what changes may occur in the future when choosing how to
+    partition your table.  For example, if you choose to have one partition

Remove "when choosing ..."?  Or say:

|When choosing how to partition your table, it's also important to consider
|what changes may occur in the future.

+    per customer and you currently have a small number of large customers,
+    what will the implications be if in several years you obtain a large
+    number of small customers.  In this case, it may be better to choose to
+    partition by <literal>HASH</literal> and choose a reasonable number of
+    partitions rather than trying to partition by <literal>LIST</literal> and
+    hoping that the number of customers does not increase significantly over
+    time.
+   </para>

It's an unusual thing for which to hope :)

+   <para>
+    Sub-partitioning can be useful to further divide partitions that are
+    expected to become larger than other partitions, although excessive
+    sub-partitioning can easily lead to large numbers of partitions and can
+    cause the problems mentioned in the preceding paragraph.
+   </para>

cause the SAME problems ?

+    It is also important to consider the overhead of partitioning during
+    query planning and execution.  The query planner is generally able to
+    handle partition hierarchies up a few thousand partitions fairly well,
+    provided that typical queries prune all but a small number of partitions
+    during query planning.  Planning times become slower and memory

s/slower/longer/

Hm, maybe say "typical queries ALLOW PRUNNING .."

+    consumption becomes higher when more partitions remain after the planner
+    performs partition pruning.  This is particularly true for the

Just say: "remain after planning" ?

+    <command>UPDATE</command> and <command>DELETE</command> commands.  Also,
+    even if most queries are able to prune a large number of partitions during
+    query planning, it still may be undesirable to have a large number of

may still ?

+    partitions as each partition requires metadata about the partition to be
+    stored in each session that touches it.  If each session touches a large

stored for ?

+    number of partitions over a period of time then the memory consumption for
+    this may become significant.
+   </para>

Remove "over a period of time" ?
Add a comma?

Maybe say:

|If each session touches a large number of partitions, then the memory
|overhead may become significant.

+   <para>
+    With data warehouse type workloads it can make sense to use a larger
+    number of partitions than with an OLTP type workload.  Generally, in data
+    warehouses, query planning time is less of a concern as the majority of

VAST majority?  Or "essentially all"?  Or " .. query planning time is
insignificant compared to the time spent during query execution.

+    processing time is spent during query execution.  With either of these two
+    types of workload it is important to make the right decisions early as

early COMMA

Justin



pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: Bloom Indexes - bit array length and the total number of bits(or hash functions ?? ) !
Next
From: Thierry Husson
Date:
Subject: Re: Temp table handling after anti-wraparound shutdown (Was: BUG#15840)