Re: When to use PARTITION BY HASH? - Mailing list pgsql-general

From Stephen Frost
Subject Re: When to use PARTITION BY HASH?
Date
Msg-id 20200602174712.GX6680@tamriel.snowman.net
Whole thread Raw
In response to When to use PARTITION BY HASH?  (Oleksandr Shulgin <oleksandr.shulgin@zalando.de>)
Responses Re: When to use PARTITION BY HASH?  (Oleksandr Shulgin <oleksandr.shulgin@zalando.de>)
List pgsql-general
Greetings,

Please don't cross post to multiple lists without any particular reason
for doing so- pick whichever list makes sense and post to that.

* Oleksandr Shulgin (oleksandr.shulgin@zalando.de) wrote:
> I was reading up on declarative partitioning[1] and I'm not sure what could
> be a possible application of Hash partitioning.

Yeah, I tend to agree with this.

> Is anyone actually using it?  What are typical use cases?  What benefits
> does such a partitioning scheme provide?

I'm sure folks are using it but that doesn't make it a good solution.

> On its face, it seems that it can only give you a number of tables which
> are smaller than the un-partitioned one, but I fail to see how it would
> provide any of the potential advantages listed in the documentation.

Having smaller tables can be helpful when it comes to dealing with
things like VACUUM (particularly since, even though we can avoid having
to scan the entire heap, we have to go through the indexes in order to
clean them up and generally larger tables have larger indexes),
however..

> With a reasonable hash function, the distribution of rows across partitions
> should be more or less equal, so I wouldn't expect any of the following to
> hold true:
> - "...most of the heavily accessed rows of the table are in a single
> partition or a small number of partitions."
> - "Bulk loads and deletes can be accomplished by adding or removing
> partitions...",
> etc.
>
> That *might* turn out to be the case with a small number of distinct values
> in the partitioning column(s), but then why rely on hash assignment instead
> of using PARTITION BY LIST in the first place?

You're entirely correct with this- there's certainly no small number of
situations where you end up with a 'hot' partition when using hashing
(which is true in other RDBMS's too, of course...) and that ends up
being pretty painful to deal with.

Also, you're right that you don't get to do bulk load/drop when using
hash partitioning, which is absolutely one of the largest benefits to
partitioning in the first place, so, yeah, their usefullness is.. rather
limited.  Better to do your own partitioning based on actual usage
patterns that you know and the database's hash function certainly
doesn't.

Thanks,

Stephen

Attachment

pgsql-general by date:

Previous
From: Michel Pelletier
Date:
Subject: Re: When to use PARTITION BY HASH?
Next
From: TALLURI Nareshkumar
Date:
Subject: RE: LOG: could not send data to client: Broken pipe