Thread: Partitioning and performance
I am testing partitioning of a large table. I am INHERITING child tables. It is using a range partitioning based on a sequence col, which also acts as the primary key. For inserts I am using a trigger which will redirect insert to the right table based on the value of the primary key. Based on my testing, I see that the insert speed is less than 10% different than a non partitioned table. I am using SET constraint_exclusion = on and I checked that via ANALYZE that the planner does not consider non qualifying child tables. yet, selects and updates based on the primary key show anywhere from 40 to 200% slowness as compared to non partition. One thing I notice is that, even with partition pruning, the planner scans the base table and the table matching the condition. Is that the additional overhead. I am attaching below the output of analyze. =========================== On a non partitioned table explain select count(*) from tstesting.account where account_row_inst = 101 ; Aggregate (cost=8.16..8.17 rows=1 width=0) -> Index Only Scan using account_pkey on account (cost=0.14..8.16 rows=1 width=0) Index Cond: (account_row_inst = 101) (3 rows) With partition pruning: Aggregate (cost=8.45..8.46 rows=1 width=0) -> Append (cost=0.00..8.44 rows=2 width=0) -> Seq Scan on account (cost=0.00..0.00 rows=1 width=0) Filter: (account_row_inst = 101) -> Index Only Scan using account_part1_pkey on account_part1 (cost=0.42..8.44 rows=1 width=0) Index Cond: (account_row_inst = 101) (6 rows) On a partitioned table, with no partition pruning. explain analyze select count(*) from tstesting.account where account_row_inst = 101 ; Aggregate (cost=29.77..29.78 rows=1 width=0) (actual time=0.032..0.032 rows=1 loops=1) -> Append (cost=0.00..29.76 rows=5 width=0) (actual time=0.029..0.029 rows=0 loops=1) -> Seq Scan on account (cost=0.00..0.00 rows=1 width=0) (actual time=0.000..0.000 rows=0 loops=1) Filter: (account_row_inst = 101) -> Index Only Scan using account_part1_pkey on account_part1 (cost=0.42..4.44 rows=1 width=0) (actual time=0.008..0.008 rows=0 loops=1) Index Cond: (account_row_inst = 101) Heap Fetches: 0 -> Index Only Scan using account_part2_pkey on account_part2 (cost=0.42..8.44 rows=1 width=0) (actual time=0.007..0.007 rows=0 loops=1) Index Cond: (account_row_inst = 101) Heap Fetches: 0 -> Index Only Scan using account_part3_pkey on account_part3 (cost=0.42..8.44 rows=1 width=0) (actual time=0.007..0.007 rows=0 loops=1) Index Cond: (account_row_inst = 101) Heap Fetches: 0 -> Index Only Scan using account_part4_pkey on account_part4 (cost=0.42..8.44 rows=1 width=0) (actual time=0.006..0.006 rows=0 loops=1) Index Cond: (account_row_inst = 101) Heap Fetches: 0 Planning time: 0.635 ms Execution time: 0.137 ms (18 rows)
> Have you set up constraints on the partitions? The planner needs to know > what is in the child tables so it can avoid scanning them. Yes. each child table is defined as follows CREATE TABLE TSTESTING.ACCOUNT_PART1 ( CHECK (ACCOUNT_ROW_INST BETWEEN 1001 and 271660)) INHERITS (TSTESTING.ACCOUNT); ALTER TABLE TSTESTING.ACCOUNT_PART1 ADD CONSTRAINT ACCOUNT_PART1_PKEY PRIMARY KEY (ACCOUNT_ROW_INST); Perhaps I was not clear. The planner is excluding partitions which can not contain the rows looked up in the WHERE clause. However it is still scanning the parent table. Aggregate (cost=8.45..8.46 rows=1 width=0) -> Append (cost=0.00..8.44 rows=2 width=0) -> Seq Scan on account (cost=0.00..0.00 rows=1 width=0) Filter: (account_row_inst = 101) -> Index Only Scan using account_part1_pkey on account_part1 (cost=0.42..8.44 rows=1 width=0) Index Cond: (account_row_inst = 101) (6 rows)
Am 28. Mai 2015 17:15:22 MESZ, schrieb Ravi Krishna <sravikrishna3@gmail.com>:
I am testing partitioning of a large table. I am INHERITING child tables.
It is using a range
partitioning based on a sequence col, which also acts as the primary
key. For inserts I am using a trigger which will redirect insert to
the right table based on the value of the primary key.
Based on my testing, I see that the insert speed is less than 10%
different than a non partitioned table. I am using SET
constraint_exclusion = on and I checked that via ANALYZE that the
planner does not consider non qualifying child tables.
yet, selects and updates based on the primary key show anywhere from
40 to 200% slowness as compared to non partition. One thing I notice
is that, even with partition pruning, the planner scans the base table
and the table matching the condition. Is that the additional overhead.
I am attaching below the output of analyze.
===========================
On a non partitioned table
explain select count(*) from tstesting.account where account_row_inst = 101 ;
Aggregate (cost=8.16..8.17 rows=1 width=0)
-> Index Only Scan using account_pkey on account (cost=0.14..8.16
rows=1 width=0)
Index Cond: (account_row_inst = 101)
(3 rows)
With partition pruning:
Aggregate (cost=8.45..8.46 rows=1 width=0)
-> Append (cost=0.00..8.44 rows=2 width=0)
-> Seq Scan on account (cost=0.00..0.00 rows=1 width=0)
Filter: (account_row_inst = 101)
-> Index Only Scan using account_part1_pkey on account_part1
(cost=0.42..8.44 rows=1 width=0)
Index Cond: (account_row_inst = 101)
(6 rows)
On a partitioned table, with no partition pruning.
explain analyze select count(*) from tstesting.account where
account_row_inst = 101 ;
Aggregate (cost=29.77..29.78 rows=1 width=0) (actual time=0.032..0.032
rows=1 loops=1)
-> Append (cost=0.00..29.76 rows=5 width=0) (actual time=0.029..0.029
rows=0 loops=1)
-> Seq Scan on account (cost=0.00..0.00 rows=1 width=0) (actual
time=0.000..0.000 rows=0 loops=1)
Filter: (account_row_inst = 101)
-> Index Only Scan using account_part1_pkey on account_part1
(cost=0.42..4.44 rows=1 width=0) (actual time=0.008..0.008 rows=0
loops=1)
Index Cond: (account_row_inst = 101)
Heap Fetches: 0
-> Index Only Scan using account_part2_pkey on account_part2
(cost=0.42..8.44 rows=1 width=0) (actual time=0.007..0.007 rows=0
loops=1)
Index Cond: (account_row_inst = 101)
Heap Fetches: 0
-> Index Only Scan using account_part3_pkey on account_part3
(cost=0.42..8.44 rows=1 width=0) (actual time=0.007..0.007 rows=0
loops=1)
Index Cond: (account_row_inst = 101)
Heap Fetches: 0
-> Index Only Scan using account_part4_pkey on account_part4
(cost=0.42..8.44 rows=1 width=0) (actual time=0.006..0.006 rows=0
loops=1)
Index Cond: (account_row_inst = 101)
Heap Fetches: 0
Planning time: 0.635 ms
Execution time: 0.137 ms
(18 rows)
Have you set up constraints on the partitions? The planner needs to know what is in the child tables so it can avoid scanning them.
Jan
Diese Nachricht wurde von meinem Android-Mobiltelefon mit K-9 Mail gesendet.
Generally, when you partition, data should only be in child tables, and the parent table should be empty, otherwise you defeat the purpose of parttioning.`
On Thu, May 28, 2015 at 12:25 PM, Ravi Krishna <sravikrishna3@gmail.com> wrote:
> Have you set up constraints on the partitions? The planner needs to know
> what is in the child tables so it can avoid scanning them.
Yes. each child table is defined as follows
CREATE TABLE TSTESTING.ACCOUNT_PART1
( CHECK (ACCOUNT_ROW_INST BETWEEN 1001 and 271660))
INHERITS (TSTESTING.ACCOUNT);
ALTER TABLE TSTESTING.ACCOUNT_PART1 ADD CONSTRAINT ACCOUNT_PART1_PKEY
PRIMARY KEY (ACCOUNT_ROW_INST);
Perhaps I was not clear. The planner is excluding partitions which can
not contain the rows looked up in the WHERE clause. However it is
still scanning the parent table.--
Aggregate (cost=8.45..8.46 rows=1 width=0)
-> Append (cost=0.00..8.44 rows=2 width=0)
-> Seq Scan on account (cost=0.00..0.00 rows=1 width=0)
Filter: (account_row_inst = 101)
-> Index Only Scan using account_part1_pkey on account_part1
(cost=0.42..8.44 rows=1 width=0)
Index Cond: (account_row_inst = 101)
(6 rows)
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
Am 28. Mai 2015 18:25:42 MESZ, schrieb Ravi Krishna <sravikrishna3@gmail.com>:
Have you set up constraints on the partitions? The planner needs to know
what is in the child tables so it can avoid scanning them.
Yes. each child table is defined as follows
CREATE TABLE TSTESTING.ACCOUNT_PART1
( CHECK (ACCOUNT_ROW_INST BETWEEN 1001 and 271660))
INHERITS (TSTESTING.ACCOUNT);
ALTER TABLE TSTESTING.ACCOUNT_PART1 ADD CONSTRAINT ACCOUNT_PART1_PKEY
PRIMARY KEY (ACCOUNT_ROW_INST);
Perhaps I was not clear. The planner is excluding partitions which can
not contain the rows looked up in the WHERE clause. However it is
still scanning the parent table.
Aggregate (cost=8.45..8.46 rows=1 width=0)
-> Append (cost=0.00..8.44 rows=2 width=0)
-> Seq Scan on account (cost=0.00..0.00 rows=1 width=0)
Filter: (account_row_inst = 101)
-> Index Only Scan using account_part1_pkey on account_part1
(cost=0.42..8.44 rows=1 width=0)
Index Cond: (account_row_inst = 101)
(6 rows)
You can have a look at pg_partman. It makes setting up partitioning quite easy and provides a tool to easily move existing data from parent to child tables.
Jan
Ravi Krishna <sravikrishna3@gmail.com> writes: > Perhaps I was not clear. The planner is excluding partitions which can > not contain the rows looked up in the WHERE clause. However it is > still scanning the parent table. Sure, because you don't have a constraint forbidding the parent from having a matching row, no? In older versions of PG there wasn't any way around this, but recent versions allow you to mark a constraint as NO INHERIT, which would let you attach such a constraint to the parent only. By and large, though, this doesn't really matter, since an empty parent table won't cost anything much to scan. If it's significant relative to the child table access time then you probably didn't need partitioning in the first place. regards, tom lane
On Thu, May 28, 2015 at 12:42 PM, Melvin Davidson <melvin6925@gmail.com> wrote: > > Generally, when you partition, data should only be in child tables, and the parent table should be empty, otherwise youdefeat the purpose of parttioning.` yes of course the parent table is empty. The trigger on insert is redirecting it to the proper child table. select count(*) from only tstesting.account ; count ------- 0 (1 row) select count(*) from only tstesting.account_part1 ; count ------- 83659 (1 row) select count(*) from only tstesting.account_part5 ; count ------- 83659 (1 row)
> By and large, though, this doesn't really matter, since an empty > parent table won't cost anything much to scan. If it's significant > relative to the child table access time then you probably didn't > need partitioning in the first place. Is there a rule of thumb as to at what size does the partitioning start performing better than non partitioned table.
On Thu, May 28, 2015 at 12:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Sure, because you don't have a constraint forbidding the parent from > having a matching row, no? As suggested by you, I included a bogus condition in the parent table which will prevent any row addition in the parent table and made the constraint NO INHERIT. i run this SET constraint_exclusion = on; explain select * from tstesting.account where account_row_inst = 1001 ; Append (cost=0.14..8.16 rows=1 width=832) -> Index Scan using account_part1_pkey on account_part1 (cost=0.14..8.16 rows=1 width=832) Index Cond: (account_row_inst = 1001) (3 rows) The planner shows this for the non partitioned table Index Scan using account_pkey on account (cost=0.14..8.16 rows=1 width=832) Index Cond: (account_row_inst = 1001) (2 rows) So cost wise they both look same, still when i run the sql in a loop in large numbers, it takes rougly 1.8 to 2 times more than non partitioned table.
Ravi Krishna <sravikrishna3@gmail.com> writes: > Is there a rule of thumb as to at what size does the partitioning > start performing better than non partitioned table. Personally I'd not worry about partitioning until I had a table approaching maybe a billion (1e9) rows. You could argue that an order of magnitude either way, but it's just not worth the extra complexity for data volumes very much less than that. regards, tom lane
Ravi Krishna <sravikrishna3@gmail.com> writes: > So cost wise they both look same, still when i run the sql in a loop > in large numbers, it takes rougly 1.8 to 2 times more than non > partitioned table. If you're testing cases that only involve fetching a single row, the discrepancy could well be down to extra planning time. Proving that the other partitions don't need to be scanned is far from free. It's also worth realizing that for queries that fetch just one or a few rows, it's very unlikely that partitioning can beat an unpartitioned table, period. Basically, partitioning replaces a runtime search of the top levels of a large index with a plan-time proof that other partitions need not be visited. That is not going to be cheaper and could well be a lot more expensive. The situations where partitioning is useful boil down to: 1. You have repetitive, stylized requirements such as "every month, delete all data older than X months" that can be mapped to "drop the oldest partition" instead of doing an expensive table scan. 2. You can arrange things so that certain partitions are accessed far more often than others, thus directing most disk traffic to specific child tables that will remain in RAM cache most of the time. (In principle, you could get similar cache-friendly behavior from a clustered unpartitioned table, but it's usually too hard to ensure that such a table stays clustered.) It does not sound like your test case is exercising either of those win scenarios, and all you're measuring is the overhead of partitioning, which as I said is substantial. regards, tom lane