Home > mailing lists

Re: When to use PARTITION BY HASH? - Mailing list pgsql-general

From	David Rowley
Subject	Re: When to use PARTITION BY HASH?
Date	June 8, 2020 21:25:12
Msg-id	CAApHDvqsGdqZR4hMi3W3POvhOSMLcEB+fPprY_OgDZUN8gb01Q@mail.gmail.com Whole thread Raw
In response to	Re: When to use PARTITION BY HASH? (Ron <ronljohnsonjr@gmail.com>)
List	pgsql-general

Tree view

On Tue, 9 Jun 2020 at 01:07, Ron <ronljohnsonjr@gmail.com> wrote:
>
> On 6/8/20 3:40 AM, Oleksandr Shulgin wrote:
> [snip]
>
> I've found the original commit adding this feature in version 11:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1aba8e651ac3e37e1d2d875842de1e0ed22a651e
> It says:
>
> "Hash partitioning is useful when you want to partition a growing data
> set evenly.  This can be useful to keep table sizes reasonable, which
> makes maintenance operations such as VACUUM faster, or to enable
> partition-wise join."
>
>
> How does hashed (meaning "randomly?) distribution of records make partition-wise joins more efficient?

Hash partitioning certainly does not mean putting the tuple in some
random partition. It means putting the tuple in the partition with the
correct remainder value after dividing the hash value by the largest
partition modulus.

> Or -- since I interpret that as having to do with "locality of data" -- am I misunderstanding the meaning of
"partition-wisejoins"?

If it was not a partitioned table before then partition-wise joins
wouldn't be possible.  Having partition-wise joins could make joining
two identically partitioned tables faster. We need only look in the
corresponding partition on the other side of the join for join
partners for each tuple. For hash joins, hash tables can be smaller,
which can mean not having to batch, and possibly having the hash table
fit better into a CPU cache. For merge joins, sorts, having the data
partially pre-sorted in chunks means fewer operations for qsort which
can result in speedups.

David

pgsql-general by date:

From: Alistair Johnson
Date: 08 June 2020, 21:21:43
Subject: Is it possible to use keywords (date units) in a function definition?

From: "David G. Johnston"
Date: 08 June 2020, 22:10:42
Subject: Re: Is it possible to use keywords (date units) in a function definition?

Re: When to use PARTITION BY HASH? - Mailing list pgsql-general

Previous

Next