Thread: hash partitioning

hash partitioning

From

"David West"

Date:

03 September 2008, 13:24:30

Hi folks,

I’m wondering why the postgres planner is not capable of determining the correct partition for a simple select for the following partitioning scheme, in which I’d like to automatically divide rows into four sub-tables, ie, a simple form of hash partitioning.

Any ideas why this doesn’t work, or a work around to make it work? I would have expected the query plan below to only query the test_1 table.

Regards

David

CREATE TABLE test (

id int not null primary key

);

CREATE TABLE test_0 ( CHECK ( id % 4 = 0) ) INHERITS (test);

CREATE TABLE test_1 ( CHECK ( id % 4 = 1) ) INHERITS (test);

CREATE TABLE test_2 ( CHECK ( id % 4 = 2) ) INHERITS (test);

CREATE TABLE test_3 ( CHECK ( id % 4 = 3) ) INHERITS (test);

CREATE RULE test_0 AS ON INSERT TO test WHERE ( id % 4 = 0 ) DO INSTEAD INSERT INTO test_0 VALUES ( NEW.id );

CREATE RULE test_1 AS ON INSERT TO test WHERE ( id % 4 = 1 ) DO INSTEAD INSERT INTO test_1 VALUES ( NEW.id );

CREATE RULE test_2 AS ON INSERT TO test WHERE ( id % 4 = 2 ) DO INSTEAD INSERT INTO test_2 VALUES ( NEW.id );

CREATE RULE test_3 AS ON INSERT TO test WHERE ( id % 4 = 3 ) DO INSTEAD INSERT INTO test_3 VALUES ( NEW.id );

insert into test values(1);

explain analyse select * from test;

"Result (cost=0.00..170.00 rows=12000 width=4) (actual time=0.027..0.042 rows=1 loops=1)"

" -> Append (cost=0.00..170.00 rows=12000 width=4) (actual time=0.020..0.032 rows=1 loops=1)"

" -> Seq Scan on test (cost=0.00..34.00 rows=2400 width=4) (actual time=0.002..0.002 rows=0 loops=1)"

" -> Seq Scan on test_0 test (cost=0.00..34.00 rows=2400 width=4) (actual time=0.001..0.001 rows=0 loops=1)"

" -> Seq Scan on test_1 test (cost=0.00..34.00 rows=2400 width=4) (actual time=0.007..0.009 rows=1 loops=1)"

" -> Seq Scan on test_2 test (cost=0.00..34.00 rows=2400 width=4) (actual time=0.001..0.001 rows=0 loops=1)"

" -> Seq Scan on test_3 test (cost=0.00..34.00 rows=2400 width=4) (actual time=0.001..0.001 rows=0 loops=1)"

"Total runtime: 0.115 ms"

Re: hash partitioning

From

"Scott Marlowe"

Date:

03 September 2008, 13:57:40

On Wed, Sep 3, 2008 at 10:24 AM, David West <david.west@cusppoint.com> wrote:
> Hi folks,
>
>
>
> I'm wondering why the postgres planner is not capable of determining the
> correct partition for a simple select for the following partitioning scheme,
> in which I'd like to automatically divide rows into four sub-tables, ie, a
> simple form of hash partitioning.

Have you got constraint_exclusion turned on?

Re: hash partitioning

From

Tom Lane

Date:

03 September 2008, 14:21:32

"David West" <david.west@cusppoint.com> writes:
> I'm wondering why the postgres planner is not capable of determining the
> correct partition for a simple select for the following partitioning scheme,

The planner doesn't know anything about the behavior of %.
Heed the fine manual's advice:

    Keep the partitioning constraints simple, else the planner may not be
    able to prove that partitions don't need to be visited. Use simple
    equality conditions for list partitioning, or simple range tests for
    range partitioning, as illustrated in the preceding examples. A good
    rule of thumb is that partitioning constraints should contain only
    comparisons of the partitioning column(s) to constants using
    B-tree-indexable operators.


            regards, tom lane

Re: hash partitioning

From

William Garrison

Date:

03 September 2008, 14:36:41

When I attended the PostgreSQL East conference, someone presented a way of doing this that they used for http://www.mailermailer.com/ and they did this:

SET constraint_exclusion = on;
EXPLAIN
SELECT
    *
FROM
    test
WHERE
    id = 7
    AND id % 4 = 3

Their business layer then generated the "AND id % 4 = 3" part of the SQL. :(

Does anyone know if Oracle or any other database can handle this?

Does this work with stored procs? Ex, suppose a stored procedure like this:
    get_from_test(id int, id_mod_4 int)
    SELECT id FROM test WHERE id = $1 and id % 4 = $2;
Would the optimizer know the correct table to use in that case?

David West wrote:

Hi folks,

I’m wondering why the postgres planner is not capable of determining the correct partition for a simple select for the following partitioning scheme, in which I’d like to automatically divide rows into four sub-tables, ie, a simple form of hash partitioning.

Any ideas why this doesn’t work, or a work around to make it work? I would have expected the query plan below to only query the test_1 table.

Regards
David

CREATE TABLE test (
    id         int not null primary key
);

CREATE TABLE test_0 ( CHECK ( id % 4 = 0) ) INHERITS (test);
CREATE TABLE test_1 ( CHECK ( id % 4 = 1) ) INHERITS (test);
CREATE TABLE test_2 ( CHECK ( id % 4 = 2) ) INHERITS (test);
CREATE TABLE test_3 ( CHECK ( id % 4 = 3) ) INHERITS (test);

CREATE RULE test_0 AS ON INSERT TO test WHERE ( id % 4 = 0 ) DO INSTEAD INSERT INTO test_0 VALUES ( NEW.id );
CREATE RULE test_1 AS ON INSERT TO test WHERE ( id % 4 = 1 ) DO INSTEAD INSERT INTO test_1 VALUES ( NEW.id );
CREATE RULE test_2 AS ON INSERT TO test WHERE ( id % 4 = 2 ) DO INSTEAD INSERT INTO test_2 VALUES ( NEW.id );
CREATE RULE test_3 AS ON INSERT TO test WHERE ( id % 4 = 3 ) DO INSTEAD INSERT INTO test_3 VALUES ( NEW.id );

insert into test values(1);

explain analyse select * from test;
"Result (cost=0.00..170.00 rows=12000 width=4) (actual time=0.027..0.042 rows=1 loops=1)"
" -> Append (cost=0.00..170.00 rows=12000 width=4) (actual time=0.020..0.032 rows=1 loops=1)"
"        -> Seq Scan on test (cost=0.00..34.00 rows=2400 width=4) (actual time=0.002..0.002 rows=0 loops=1)"
"        -> Seq Scan on test_0 test (cost=0.00..34.00 rows=2400 width=4) (actual time=0.001..0.001 rows=0 loops=1)"
"        -> Seq Scan on test_1 test (cost=0.00..34.00 rows=2400 width=4) (actual time=0.007..0.009 rows=1 loops=1)"
"        -> Seq Scan on test_2 test (cost=0.00..34.00 rows=2400 width=4) (actual time=0.001..0.001 rows=0 loops=1)"
"        -> Seq Scan on test_3 test (cost=0.00..34.00 rows=2400 width=4) (actual time=0.001..0.001 rows=0 loops=1)"
"Total runtime: 0.115 ms"

Re: hash partitioning

From

"Ian Harding"

Date:

03 September 2008, 16:15:09

On Wed, Sep 3, 2008 at 10:36 AM, William Garrison <postgres@mobydisk.com> wrote:
> When I attended the PostgreSQL East conference, someone presented a way of
> doing this that they used for http://www.mailermailer.com/ and they did
> this:
>
> SET constraint_exclusion = on;
> EXPLAIN
> SELECT
>     *
> FROM
>     test
> WHERE
>     id = 7
>     AND id % 4 = 3
>
> Their business layer then generated the "AND id % 4 = 3" part of the SQL.
> :(
>
> Does anyone know if Oracle or any other database can handle this?
>

Oracle has support for hash partitioning like so:

CREATE TABLE sales_hash
(salesman_id  NUMBER(5),
salesman_name VARCHAR2(30),
sales_amount  NUMBER(10),
week_no       NUMBER(2))
PARTITION BY HASH(salesman_id)
PARTITIONS 4
STORE IN (data1, data2, data3, data4);

There is no need to specify which partition to search or reference any
hash function in queries, it's all magic.

>
> David West wrote:
>
> Hi folks,
>
>
>
> I'm wondering why the postgres planner is not capable of determining the
> correct partition for a simple select for the following partitioning scheme,
> in which I'd like to automatically divide rows into four sub-tables, ie, a
> simple form of hash partitioning.
>
>
>
> Any ideas why this doesn't work, or a work around to make it work?  I would
> have expected the query plan below to only query the test_1 table.
>
>
>