Thread: Re: When to use PARTITION BY HASH?

Re: When to use PARTITION BY HASH?

From
Justin Pryzby
Date:
> To: pgsql-general@lists.postgresql.org, pgsql-performance@lists.postgresql.org

Please don't cross post to multiple lists.

On Tue, Jun 02, 2020 at 07:17:11PM +0200, Oleksandr Shulgin wrote:
> I was reading up on declarative partitioning[1] and I'm not sure what could
> be a possible application of Hash partitioning.

It's a good question.  See Tom's complaint here.
https://www.postgresql.org/message-id/31605.1586112900%40sss.pgh.pa.us

It *does* provide the benefit of smaller indexes and smaller tables, which
might allow seq scans to outpeform index scans.

It's maybe only useful for equality conditions on the partition key, and not
for ranges.  Here, it scans a single partition:

postgres=# CREATE TABLE t(i int) PARTITION BY HASH(i); CREATE TABLE t1 PARTITION OF t FOR VALUES WITH (REMAINDER 0,
MODULUS3);
 
postgres=# CREATE TABLE t2 PARTITION OF t FOR VALUES WITH (MODULUS 3, REMAINDER 1);
postgres=# CREATE TABLE t3 PARTITION OF t FOR VALUES WITH (MODULUS 3, REMAINDER 2);
postgres=# INSERT INTO t SELECT i%9 FROM generate_series(1,9999)i; ANALYZE t;
postgres=# explain analyze SELECT * FROM t WHERE i=3;
 Seq Scan on t2  (cost=0.00..75.55 rows=2222 width=4) (actual time=0.021..0.518 rows=2222 loops=1)
   Filter: (i = 3)
   Rows Removed by Filter: 2222

-- 
Justin



Re: When to use PARTITION BY HASH?

From
Oleksandr Shulgin
Date:
On Tue, Jun 2, 2020 at 7:33 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> To: pgsql-general@lists.postgresql.org, pgsql-performance@lists.postgresql.org

Please don't cross post to multiple lists.

On Tue, Jun 02, 2020 at 07:17:11PM +0200, Oleksandr Shulgin wrote:
> I was reading up on declarative partitioning[1] and I'm not sure what could
> be a possible application of Hash partitioning.

It's a good question.  See Tom's complaint here.
https://www.postgresql.org/message-id/31605.1586112900%40sss.pgh.pa.us

It *does* provide the benefit of smaller indexes and smaller tables, which
might allow seq scans to outpeform index scans.

It's maybe only useful for equality conditions on the partition key, and not
for ranges.  Here, it scans a single partition:

postgres=# CREATE TABLE t(i int) PARTITION BY HASH(i); CREATE TABLE t1 PARTITION OF t FOR VALUES WITH (REMAINDER 0, MODULUS 3);
postgres=# CREATE TABLE t2 PARTITION OF t FOR VALUES WITH (MODULUS 3, REMAINDER 1);
postgres=# CREATE TABLE t3 PARTITION OF t FOR VALUES WITH (MODULUS 3, REMAINDER 2);
postgres=# INSERT INTO t SELECT i%9 FROM generate_series(1,9999)i; ANALYZE t;
postgres=# explain analyze SELECT * FROM t WHERE i=3;
 Seq Scan on t2  (cost=0.00..75.55 rows=2222 width=4) (actual time=0.021..0.518 rows=2222 loops=1)
   Filter: (i = 3)
   Rows Removed by Filter: 2222

I see.  So it works with low cardinality in the partitioned column.  With high cardinality an index scan on an unpartitioned table would be preferable I guess.

The documentation page I've linked only contains examples around partitioning BY RANGE.  I believe it'd be helpful to extend it with some meaningful examples for LIST and HASH partitioning.

Regards,
-- 
Alex

Re: When to use PARTITION BY HASH?

From
Justin Pryzby
Date:
On Wed, Jun 03, 2020 at 09:45:48AM +0200, Oleksandr Shulgin wrote:
> I see.  So it works with low cardinality in the partitioned column.  With
> high cardinality an index scan on an unpartitioned table would be
> preferable I guess.
> 
> The documentation page I've linked only contains examples around
> partitioning BY RANGE.  I believe it'd be helpful to extend it with some
> meaningful examples for LIST and HASH partitioning.

I agree.  I think it would also be useful to mention the "benefits" which
aren't likely to apply to hash partitioning.

Would you want to propose an example to include ?
Eventually it needs to be submitted as a patch to -hackers.

-- 
Justin