Thread: Partitioning and performance

Partitioning and performance

From
Ravi Krishna
Date:
I am testing partitioning of a large table. I am INHERITING child tables.
It is using a range
partitioning based on a sequence col, which also acts as the primary
key. For inserts I am using a trigger which will redirect insert to
the right table based on the value of the primary key.

Based on my testing, I see that the insert speed is less than 10%
different than a non partitioned table. I am using  SET
constraint_exclusion = on and I checked that via ANALYZE that the
planner does not consider non qualifying child tables.

yet, selects and updates based on the primary key show anywhere from
40 to 200% slowness as compared to non partition. One thing I notice
is that, even with partition pruning, the planner scans the base table
and the table matching the condition. Is that the additional overhead.

I am attaching below the output of analyze.

===========================
On a non partitioned table

explain select count(*) from tstesting.account where account_row_inst = 101 ;
Aggregate (cost=8.16..8.17 rows=1 width=0)
-> Index Only Scan using account_pkey on account (cost=0.14..8.16
rows=1 width=0)
Index Cond: (account_row_inst = 101)
(3 rows)


With partition pruning:

Aggregate (cost=8.45..8.46 rows=1 width=0)
-> Append (cost=0.00..8.44 rows=2 width=0)
-> Seq Scan on account (cost=0.00..0.00 rows=1 width=0)
Filter: (account_row_inst = 101)
-> Index Only Scan using account_part1_pkey on account_part1
(cost=0.42..8.44 rows=1 width=0)
Index Cond: (account_row_inst = 101)
(6 rows)

On a partitioned table, with no partition pruning.

explain analyze select count(*) from tstesting.account where
account_row_inst = 101 ;
Aggregate (cost=29.77..29.78 rows=1 width=0) (actual time=0.032..0.032
rows=1 loops=1)
-> Append (cost=0.00..29.76 rows=5 width=0) (actual time=0.029..0.029
rows=0 loops=1)
-> Seq Scan on account (cost=0.00..0.00 rows=1 width=0) (actual
time=0.000..0.000 rows=0 loops=1)
Filter: (account_row_inst = 101)
-> Index Only Scan using account_part1_pkey on account_part1
(cost=0.42..4.44 rows=1 width=0) (actual time=0.008..0.008 rows=0
loops=1)
Index Cond: (account_row_inst = 101)
Heap Fetches: 0
-> Index Only Scan using account_part2_pkey on account_part2
(cost=0.42..8.44 rows=1 width=0) (actual time=0.007..0.007 rows=0
loops=1)
Index Cond: (account_row_inst = 101)
Heap Fetches: 0
-> Index Only Scan using account_part3_pkey on account_part3
(cost=0.42..8.44 rows=1 width=0) (actual time=0.007..0.007 rows=0
loops=1)
Index Cond: (account_row_inst = 101)
Heap Fetches: 0
-> Index Only Scan using account_part4_pkey on account_part4
(cost=0.42..8.44 rows=1 width=0) (actual time=0.006..0.006 rows=0
loops=1)
Index Cond: (account_row_inst = 101)
Heap Fetches: 0
Planning time: 0.635 ms
Execution time: 0.137 ms
(18 rows)


Re: Partitioning and performance

From
Ravi Krishna
Date:
> Have you set up constraints on the partitions? The planner needs to know
> what is in the child tables so it can avoid scanning them.

Yes. each child table is defined as follows

CREATE TABLE TSTESTING.ACCOUNT_PART1

 ( CHECK (ACCOUNT_ROW_INST BETWEEN 1001 and 271660))

 INHERITS (TSTESTING.ACCOUNT);

ALTER TABLE TSTESTING.ACCOUNT_PART1 ADD CONSTRAINT ACCOUNT_PART1_PKEY
PRIMARY KEY (ACCOUNT_ROW_INST);

Perhaps I was not clear. The planner is excluding partitions which can
not contain the rows looked up in the WHERE clause. However it is
still scanning the parent table.

Aggregate (cost=8.45..8.46 rows=1 width=0)
-> Append (cost=0.00..8.44 rows=2 width=0)
-> Seq Scan on account (cost=0.00..0.00 rows=1 width=0)
Filter: (account_row_inst = 101)
-> Index Only Scan using account_part1_pkey on account_part1
(cost=0.42..8.44 rows=1 width=0)
Index Cond: (account_row_inst = 101)
(6 rows)


Re: Partitioning and performance

From
Jan Lentfer
Date:


Am 28. Mai 2015 17:15:22 MESZ, schrieb Ravi Krishna <sravikrishna3@gmail.com>:
I am testing partitioning of a large table. I am INHERITING child tables.
It is using a range
partitioning based on a sequence col, which also acts as the primary
key. For inserts I am using a trigger which will redirect insert to
the right table based on the value of the primary key.

Based on my testing, I see that the insert speed is less than 10%
different than a non partitioned table. I am using SET
constraint_exclusion = on and I checked that via ANALYZE that the
planner does not consider non qualifying child tables.

yet, selects and updates based on the primary key show anywhere from
40 to 200% slowness as compared to non partition. One thing I notice
is that, even with partition pruning, the planner scans the base table
and the table matching the condition. Is that the additional overhead.

I am attaching below the output of analyze.

===========================
On a non partitioned table

explain select count(*) from tstesting.account where account_row_inst = 101 ;
Aggregate (cost=8.16..8.17 rows=1 width=0)
-> Index Only Scan using account_pkey on account (cost=0.14..8.16
rows=1 width=0)
Index Cond: (account_row_inst = 101)
(3 rows)


With partition pruning:

Aggregate (cost=8.45..8.46 rows=1 width=0)
-> Append (cost=0.00..8.44 rows=2 width=0)
-> Seq Scan on account (cost=0.00..0.00 rows=1 width=0)
Filter: (account_row_inst = 101)
-> Index Only Scan using account_part1_pkey on account_part1
(cost=0.42..8.44 rows=1 width=0)
Index Cond: (account_row_inst = 101)
(6 rows)

On a partitioned table, with no partition pruning.

explain analyze select count(*) from tstesting.account where
account_row_inst = 101 ;
Aggregate (cost=29.77..29.78 rows=1 width=0) (actual time=0.032..0.032
rows=1 loops=1)
-> Append (cost=0.00..29.76 rows=5 width=0) (actual time=0.029..0.029
rows=0 loops=1)
-> Seq Scan on account (cost=0.00..0.00 rows=1 width=0) (actual
time=0.000..0.000 rows=0 loops=1)
Filter: (account_row_inst = 101)
-> Index Only Scan using account_part1_pkey on account_part1
(cost=0.42..4.44 rows=1 width=0) (actual time=0.008..0.008 rows=0
loops=1)
Index Cond: (account_row_inst = 101)
Heap Fetches: 0
-> Index Only Scan using account_part2_pkey on account_part2
(cost=0.42..8.44 rows=1 width=0) (actual time=0.007..0.007 rows=0
loops=1)
Index Cond: (account_row_inst = 101)
Heap Fetches: 0
-> Index Only Scan using account_part3_pkey on account_part3
(cost=0.42..8.44 rows=1 width=0) (actual time=0.007..0.007 rows=0
loops=1)
Index Cond: (account_row_inst = 101)
Heap Fetches: 0
-> Index Only Scan using account_part4_pkey on account_part4
(cost=0.42..8.44 rows=1 width=0) (actual time=0.006..0.006 rows=0
loops=1)
Index Cond: (account_row_inst = 101)
Heap Fetches: 0
Planning time: 0.635 ms
Execution time: 0.137 ms
(18 rows)


Have you set up constraints on the partitions? The planner needs to know what is in the child tables so it can avoid scanning them.

Jan


Diese Nachricht wurde von meinem Android-Mobiltelefon mit K-9 Mail gesendet.

Re: Partitioning and performance

From
Melvin Davidson
Date:
Generally, when you partition, data should only be in child tables, and the parent table should be empty, otherwise you defeat the purpose of parttioning.`

On Thu, May 28, 2015 at 12:25 PM, Ravi Krishna <sravikrishna3@gmail.com> wrote:
> Have you set up constraints on the partitions? The planner needs to know
> what is in the child tables so it can avoid scanning them.

Yes. each child table is defined as follows

CREATE TABLE TSTESTING.ACCOUNT_PART1

 ( CHECK (ACCOUNT_ROW_INST BETWEEN 1001 and 271660))

 INHERITS (TSTESTING.ACCOUNT);

ALTER TABLE TSTESTING.ACCOUNT_PART1 ADD CONSTRAINT ACCOUNT_PART1_PKEY
PRIMARY KEY (ACCOUNT_ROW_INST);

Perhaps I was not clear. The planner is excluding partitions which can
not contain the rows looked up in the WHERE clause. However it is
still scanning the parent table.

Aggregate (cost=8.45..8.46 rows=1 width=0)
-> Append (cost=0.00..8.44 rows=2 width=0)
-> Seq Scan on account (cost=0.00..0.00 rows=1 width=0)
Filter: (account_row_inst = 101)
-> Index Only Scan using account_part1_pkey on account_part1
(cost=0.42..8.44 rows=1 width=0)
Index Cond: (account_row_inst = 101)
(6 rows)


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: Partitioning and performance

From
Jan Lentfer
Date:


Am 28. Mai 2015 18:25:42 MESZ, schrieb Ravi Krishna <sravikrishna3@gmail.com>:
Have you set up constraints on the partitions? The planner needs to know
what is in the child tables so it can avoid scanning them.

Yes. each child table is defined as follows

CREATE TABLE TSTESTING.ACCOUNT_PART1

( CHECK (ACCOUNT_ROW_INST BETWEEN 1001 and 271660))

INHERITS (TSTESTING.ACCOUNT);

ALTER TABLE TSTESTING.ACCOUNT_PART1 ADD CONSTRAINT ACCOUNT_PART1_PKEY
PRIMARY KEY (ACCOUNT_ROW_INST);

Perhaps I was not clear. The planner is excluding partitions which can
not contain the rows looked up in the WHERE clause. However it is
still scanning the parent table.

Aggregate (cost=8.45..8.46 rows=1 width=0)
-> Append (cost=0.00..8.44 rows=2 width=0)
-> Seq Scan on account (cost=0.00..0.00 rows=1 width=0)
Filter: (account_row_inst = 101)
-> Index Only Scan using account_part1_pkey on account_part1
(cost=0.42..8.44 rows=1 width=0)
Index Cond: (account_row_inst = 101)
(6 rows)

You can have a look at pg_partman. It makes setting up partitioning quite easy and provides a tool to easily move existing data from parent to child tables.

Jan

Re: Partitioning and performance

From
Tom Lane
Date:
Ravi Krishna <sravikrishna3@gmail.com> writes:
> Perhaps I was not clear. The planner is excluding partitions which can
> not contain the rows looked up in the WHERE clause. However it is
> still scanning the parent table.

Sure, because you don't have a constraint forbidding the parent from
having a matching row, no?

In older versions of PG there wasn't any way around this, but recent
versions allow you to mark a constraint as NO INHERIT, which would
let you attach such a constraint to the parent only.

By and large, though, this doesn't really matter, since an empty
parent table won't cost anything much to scan.  If it's significant
relative to the child table access time then you probably didn't
need partitioning in the first place.

            regards, tom lane


Re: Partitioning and performance

From
Ravi Krishna
Date:
On Thu, May 28, 2015 at 12:42 PM, Melvin Davidson <melvin6925@gmail.com> wrote:
>
> Generally, when you partition, data should only be in child tables, and the parent table should be empty, otherwise
youdefeat the purpose of parttioning.` 

yes of course the parent table is empty. The trigger on insert is
redirecting it to the proper child table.

select count(*) from only tstesting.account ;
 count
-------
     0
(1 row)

select count(*) from only tstesting.account_part1 ;
 count
-------
 83659
(1 row)

select count(*) from only tstesting.account_part5 ;
 count
-------
 83659
(1 row)


Re: Partitioning and performance

From
Ravi Krishna
Date:
> By and large, though, this doesn't really matter, since an empty
> parent table won't cost anything much to scan.  If it's significant
> relative to the child table access time then you probably didn't
> need partitioning in the first place.

Is there a rule of thumb as to at what size does the partitioning
start performing better than non partitioned table.


Re: Partitioning and performance

From
Ravi Krishna
Date:
On Thu, May 28, 2015 at 12:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Sure, because you don't have a constraint forbidding the parent from
> having a matching row, no?

As suggested by you, I included a bogus condition in the parent table
which will prevent any row addition in the parent table and made the
constraint NO INHERIT.

i run this

SET constraint_exclusion = on;
explain select * from tstesting.account where account_row_inst = 1001 ;



Append  (cost=0.14..8.16 rows=1 width=832)
   ->  Index Scan using account_part1_pkey on account_part1
(cost=0.14..8.16 rows=1 width=832)
         Index Cond: (account_row_inst = 1001)
(3 rows)

The planner shows this for the non partitioned table

 Index Scan using account_pkey on account  (cost=0.14..8.16 rows=1 width=832)
   Index Cond: (account_row_inst = 1001)
(2 rows)

So cost wise they both  look same, still when i run the sql in a loop
in large numbers, it takes rougly 1.8 to 2 times more than non
partitioned table.


Re: Partitioning and performance

From
Tom Lane
Date:
Ravi Krishna <sravikrishna3@gmail.com> writes:
> Is there a rule of thumb as to at what size does the partitioning
> start performing better than non partitioned table.

Personally I'd not worry about partitioning until I had a table
approaching maybe a billion (1e9) rows.  You could argue that
an order of magnitude either way, but it's just not worth the
extra complexity for data volumes very much less than that.

            regards, tom lane


Re: Partitioning and performance

From
Tom Lane
Date:
Ravi Krishna <sravikrishna3@gmail.com> writes:
> So cost wise they both  look same, still when i run the sql in a loop
> in large numbers, it takes rougly 1.8 to 2 times more than non
> partitioned table.

If you're testing cases that only involve fetching a single row,
the discrepancy could well be down to extra planning time.  Proving
that the other partitions don't need to be scanned is far from free.

It's also worth realizing that for queries that fetch just one or
a few rows, it's very unlikely that partitioning can beat an unpartitioned
table, period.  Basically, partitioning replaces a runtime search of the
top levels of a large index with a plan-time proof that other partitions
need not be visited.  That is not going to be cheaper and could well be a
lot more expensive.

The situations where partitioning is useful boil down to:

1. You have repetitive, stylized requirements such as "every month,
delete all data older than X months" that can be mapped to "drop
the oldest partition" instead of doing an expensive table scan.

2. You can arrange things so that certain partitions are accessed
far more often than others, thus directing most disk traffic to
specific child tables that will remain in RAM cache most of the time.
(In principle, you could get similar cache-friendly behavior from a
clustered unpartitioned table, but it's usually too hard to ensure
that such a table stays clustered.)

It does not sound like your test case is exercising either of those
win scenarios, and all you're measuring is the overhead of partitioning,
which as I said is substantial.

            regards, tom lane