Re: When to use PARTITION BY HASH? - Mailing list pgsql-general

From Oleksandr Shulgin
Subject Re: When to use PARTITION BY HASH?
Date
Msg-id CACACo5RxgQBO_cB97yFkWwLo_-Vs3WtUei2KszGg2LkJ95uiXA@mail.gmail.com
Whole thread Raw
In response to Re: When to use PARTITION BY HASH?  (Michel Pelletier <pelletier.michel@gmail.com>)
Responses Re: When to use PARTITION BY HASH?  (Jeff Janes <jeff.janes@gmail.com>)
Re: When to use PARTITION BY HASH?  (Michel Pelletier <pelletier.michel@gmail.com>)
List pgsql-general
(sticking to pgsql-general)

On Tue, Jun 2, 2020 at 7:45 PM Michel Pelletier <pelletier.michel@gmail.com> wrote:

On Tue, Jun 2, 2020 at 10:17 AM Oleksandr Shulgin <oleksandr.shulgin@zalando.de> wrote:

I was reading up on declarative partitioning[1] and I'm not sure what could be a possible application of Hash partitioning.

Is anyone actually using it?  What are typical use cases?  What benefits does such a partitioning scheme provide?

On its face, it seems that it can only give you a number of tables which are smaller than the un-partitioned one, but I fail to see how it would provide any of the potential advantages listed in the documentation.

 
From my point of view, hash partitioning is very useful for spreading out high insert/update load.

Do you also assign the partitions to different tablespaces as you've hinted below or do you see performance improvement from partitioning alone?  How does that work?  Does it give better  results than using a RAID to spread the disk IO, for example?

Yes its' true you end up with more smaller tables than one big large one, but remember the indexes are (often) tree data structures.  Smaller trees are faster than bigger trees.  By making the indexes smaller they are faster.  Since the planner can knows to only examine the specific index it needs, this ends up being a lot faster.

That sounds logical, but can it be demonstrated?  If the index(es) fit in memory fully, it doesn't make a measurable difference, I guess?

With hash partitioning you are not expected, in general, to end up with a small number of partitions being accessed more heavily than the rest.  So your indexes will also not fit into memory.

I have the feeling that using a hash function to distribute rows simply contradicts the basic assumption of when you would think of partitioning your table at all: that is to make sure the most active part of the table and indexes is small enough to be cached in memory.

Regards,
--
Alex

pgsql-general by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: Replication conflicts despite hot_standby_feedback = on?
Next
From: Laurenz Albe
Date:
Subject: Re: Replication conflicts despite hot_standby_feedback = on?