Home > mailing lists

Re: When to use PARTITION BY HASH? - Mailing list pgsql-performance

From	David Rowley
Subject	Re: When to use PARTITION BY HASH?
Date	June 8, 2020 00:23:10
Msg-id	CAApHDvq615Snu3bu7EyWP6z2WqhC=RZJW1dv4JSYvZVDQ_SCwA@mail.gmail.com Whole thread Raw
In response to	Re: When to use PARTITION BY HASH? (MichaelDBA <MichaelDBA@sqlexec.com>)
Responses	Re: When to use PARTITION BY HASH? ("Michaeldba@sqlexec.com" <Michaeldba@sqlexec.com>)
List	pgsql-performance

Tree view

On Sun, 7 Jun 2020 at 23:41, MichaelDBA <MichaelDBA@sqlexec.com> wrote:
> The article referenced below assumes a worst case scenario for bulk-loading with hash partitioned tables. It assumes
thatthe values being inserted are in strict ascending or descending order with no gaps (like a sequence number
incrementingby 1), thereby ensuring every partition is hit in order before repeating the process. If the values being
insertedare not strictly sequential with no gaps, then the performance is much better. Obviously, what part of the
tablesand indexes are in memory has a lot to do with it as well.

In PostgreSQL 12, COPY was modified to support bulk-inserts for
partitioned tables. This did speed up many scenarios. Internally, how
this works is that we maintain a series of multi insert buffers, one
per partition. We generally only flush those buffers to the table when
the buffer for the partition fills. However, there is a sort of
sanity limit [1] on the number of multi insert buffers we maintain at
once and currently, that is 32. Technically we could increase that
limit, but there would still need to be a limit. Unfortunately, for
this particular case, since we're most likely touching between 199-799
other partitions before hitting the first one again, that will mean
that we really don't get any multi-inserts, which is likely the reason
why the performance is worse for hash partitioning.

With PG12 and for this particular case, you're likely to see COPY
performance drop quite drastically when going from 32 to 33
partitions. The code was more designed for hitting partitions more
randomly rather than in this sort-of round-robin way that we're likely
to get from hash partitioning on a serial column.

David

[1] https://github.com/postgres/postgres/blob/master/src/backend/commands/copy.c#L2569

pgsql-performance by date:

From: MichaelDBA
Date: 07 June 2020, 14:41:28
Subject: Re: When to use PARTITION BY HASH?

From: "Michaeldba@sqlexec.com"
Date: 08 June 2020, 12:50:35
Subject: Re: When to use PARTITION BY HASH?

Re: When to use PARTITION BY HASH? - Mailing list pgsql-performance

Previous

Next