Re: Should we warn against using too many partitions? - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Should we warn against using too many partitions?
Date
Msg-id CA+HiwqHZ_YTu5ZR7q_dcQ+XO-57_bc1j=q5gzgrmw+B+qNJnQw@mail.gmail.com
In response to Re: Should we warn against using too many partitions?  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Should we warn against using too many partitions?
List pgsql-hackers
Hi,

Thanks for the updated patches.

On Sun, Jun 9, 2019 at 5:29 AM David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On Fri, 7 Jun 2019 at 19:00, Amit Langote <amitlangote09@gmail.com> wrote:
> > Maybe:
> >
> > ...    Removal of unwanted data is also a factor to consider when
> > planning your partitioning strategy as an entire partition can be
> > removed fairly quickly, especially if the partition keys are chosen
> > such that all data that can be deleted together are grouped into
> > separate partitions.
>
> It seems like a good idea to change this to have this mention the
> benefits rather than the drawbacks. I've reworded it, but not using
> your exact words as it seems the "especially" means that a partition
> can be removed faster with properly chosen partition keys, which is
> not the case.
>
> I also split this out into its own paragraph since it's talking about
> something quite different from the previous paragraph.

Did you forget to split it out?  In the v4 patches, I still see this point
mentioned in the same paragraph as before:

+   <para>
+    One of the most critical design decisions will be the column or columns
+    by which you partition your data.  Often the best choice will be to
+    partition by the column or set of columns which most commonly appear in
+    <literal>WHERE</literal> clauses of queries being executed on the
+    partitioned table.  <literal>WHERE</literal> clause items that match and
+    are compatible with the partition key can be used to prune unneeded
+    partitions.  Removal of unwanted data is also a factor to consider when
+    planning your partitioning strategy.  An entire partition can be detached
+    fairly quickly, so it may be beneficial to design the partition strategy
+    in such a way that all data to be removed at once is located in a single
+    partition.
+   </para>
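
To make the two behaviours in that paragraph concrete, here is a toy model in Python. It is only a sketch and not how PostgreSQL implements partitioning: the class RangePartitionedTable and its methods are hypothetical names invented for illustration. It shows a bound on the partition key pruning unneeded partitions, and all rows in one partition being removed in a single step.

```python
from bisect import bisect_right
from datetime import date


class RangePartitionedTable:
    """Toy range-partitioned table (hypothetical, for illustration only)."""

    def __init__(self, bounds):
        # bounds = [b0, b1, ..., bn]; partition i holds keys in [b_i, b_{i+1}).
        self.bounds = list(bounds)
        self.parts = {i: [] for i in range(len(self.bounds) - 1)}

    def _index_for(self, key):
        # Find the partition whose range contains the key.
        i = bisect_right(self.bounds, key) - 1
        if i < 0 or i not in self.parts:
            raise ValueError(f"no partition for key {key!r}")
        return i

    def insert(self, key, row):
        self.parts[self._index_for(key)].append((key, row))

    def prune(self, lo, hi):
        # Keep only partitions whose range overlaps lo <= key < hi.
        return [i for i in sorted(self.parts)
                if self.bounds[i] < hi and lo < self.bounds[i + 1]]

    def select(self, lo, hi):
        # Scan the surviving partitions only; report which ones were scanned.
        scanned = self.prune(lo, hi)
        rows = [row for i in scanned
                for key, row in self.parts[i] if lo <= key < hi]
        return rows, scanned

    def detach(self, i):
        # Remove a whole partition at once, without visiting its rows.
        return self.parts.pop(i)


# One partition per month, keyed by order date.
t = RangePartitionedTable(
    [date(2019, 1, 1), date(2019, 2, 1), date(2019, 3, 1), date(2019, 4, 1)]
)
t.insert(date(2019, 1, 15), "jan order")
t.insert(date(2019, 2, 10), "feb order")
t.insert(date(2019, 3, 5), "mar order")

# A bound on the partition key touches only the February partition.
rows, scanned = t.select(date(2019, 2, 1), date(2019, 3, 1))

# Dropping all January data is a single operation on partition 0.
removed = t.detach(0)
```

Here the February query scans only partition 1, and detaching partition 0 removes every January row in one step.  Had the table been partitioned on some other column, neither shortcut would apply.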

> > 2.
> >
> > +    ... For example, if you choose to have one partition
> > +    per customer and you currently have a small number of large customers,
> > +    what will the implications be if in several years you obtain a large
> > +    number of small customers.
> >
> > The sentence could be rewritten a bit.  Maybe as:
> >
> > ... For example, choosing a design with one partition per customer,
> > because you currently have a small number of large customers, will not
> > scale well several years down the line when you might have a large
> > number of small customers.
> >
> > Btw, doesn't it suffice here to say "large number of customers"
> > instead of "large number of small customers"?
>
> I'm not really trying to imply to plan for business growth here, I'm
> trying to angle it as "what if your business changes".

Hmm, okay.  I thought you were intending this as an example of how a
particular partitioning design may not *scale with time*.

> I've reworded
> this slightly and it now says "what will the implications be if in
> several years you instead find yourself with a large number of small
> customers."

I suggest "consider the implications" in place of "what will the
implications be...".  Also, a user may choose a particular design (one
partition per customer) *because* of their business situation (small
number of large customers), so I suggest linking the two clauses with
"because".  With these two changes, the whole sentence reads as better
connected, imho:

For example, if you choose to have one partition per customer because
you currently have a small number of large customers, consider the
implications if in several years you instead find yourself with a
large number of small customers.

Thanks,
Amit


