Thread: pg12 partitions show bad performance vs pg96
Hey,
I upgraded from 9.6 to 12 in our test env, and for queries that join a partitioned table with other tables I'm seeing a degradation in performance compared to pg96.
My machine: 8 CPU, 16GB RAM, regular HD, Linux RedHat 6
pg settings :
max_wal_size = 2GB
min_wal_size = 1GB
wal_buffers = 16MB
checkpoint_completion_target = 0.9
checkpoint_timeout = 30min
log_checkpoints = on
log_lock_waits = on
log_temp_files = 1024
log_min_duration_statement = 10000
log_autovacuum_min_duration = 10000
standard_conforming_strings = off
max_locks_per_transaction = 5000
max_connections = 500
log_line_prefix = '%t %d %p '
random_page_cost = 4
deadlock_timeout = 5s
shared_preload_libraries = 'pg_stat_statements'
track_activity_query_size = 32764
log_directory = 'pg_log'
enable_partitionwise_join = on # for pg12
enable_partitionwise_aggregate = on # for pg12
listen_addresses = '*'
ssl = on
maintenance_work_mem = 333MB
work_mem = 16MB
shared_buffers = 4020MB
effective_cache_size = 8040MB
PostgreSQL 12.2
I used this table as the joined table for both cases :
create table iot_device(id serial primary key,name text);
insert into iot_device(name) select generate_series(1,100)||'a';
In pg96 I created the following regular table :
create table iot_data(id serial primary key,data text,metadata bigint,device bigint references iot_device(id));
inserted the data :
insert into iot_data select generate_series(1,10000000),random()*10,random()*254,random()*99+1;
In pg12 I created a table with 3 hash partitions:
create table iot_data(id serial ,data text,metadata bigint,device bigint references iot_device(id),primary key(id,device)) partition by hash(device);
create table iot_data_0 partition of iot_data for values with (MODULUS 3, remainder 0);
create table iot_data_1 partition of iot_data for values with (MODULUS 3, remainder 1);
create table iot_data_2 partition of iot_data for values with (MODULUS 3, remainder 2);
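(A quick way to sanity-check how hash partitioning distributed the rows after loading - a sketch; `tableoid::regclass` reports the partition each row landed in:)

```sql
-- row distribution across the hash partitions
select tableoid::regclass as partition, count(*)
from iot_data
group by 1
order by 1;
```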
I generated a dump of the data in the pg96 machine and inserted it into the pg12 db :
pg_dump -d postgres -U postgres -a -t iot_data > iot_data.dump
psql -d postgres -U postgres -h pg12_machine -f /tmp/iot_data.dump
postgres=# select count(*) from iot_data_0;
count
---------
3028682
(1 row)
postgres=# select count(*) from iot_data_1;
count
---------
3234335
(1 row)
postgres=# select count(*) from iot_data_2;
count
---------
3736983
(1 row)
create index on iot_data(metadata,lower(data));
vacuum analyze iot_data;
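(In pg12, an index created on the partitioned parent cascades to every partition; this catalog query - a sketch - lists what was actually built:)

```sql
-- list the per-partition indexes created by the parent-level CREATE INDEX
select tablename, indexname
from pg_indexes
where tablename like 'iot_data%'
order by tablename;
```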
and now for the performance:
query : explain analyze select * from iot_data da,iot_device de where de.name in ('50a') and de.id = da.device and da.metadata=50 and lower(da.data) like '50';
PG12 :
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=5.16..773.61 rows=2 width=43) (actual time=2.858..2.858 rows=0 loops=1)
-> Seq Scan on iot_device de (cost=0.00..2.25 rows=1 width=7) (actual time=0.014..0.020 rows=1 loops=1)
Filter: (name = '50a'::text)
Rows Removed by Filter: 99
-> Append (cost=5.16..771.30 rows=6 width=36) (actual time=2.835..2.835 rows=0 loops=1)
-> Bitmap Heap Scan on iot_data_0 da (cost=5.16..233.78 rows=2 width=36) (actual time=2.829..2.829 rows=0 loops=1)
Recheck Cond: (metadata = 50)
Filter: ((de.id = device) AND (lower(data) ~~ '50'::text))
-> Bitmap Index Scan on iot_data_0_metadata_lower_idx (cost=0.00..5.14 rows=59 width=0) (actual time=2.827..2.827 rows=0 loops=1)
Index Cond: ((metadata = 50) AND (lower(data) = '50'::text))
-> Bitmap Heap Scan on iot_data_1 da_1 (cost=5.20..249.32 rows=2 width=37) (never executed)
Recheck Cond: (metadata = 50)
Filter: ((de.id = device) AND (lower(data) ~~ '50'::text))
-> Bitmap Index Scan on iot_data_1_metadata_lower_idx (cost=0.00..5.18 rows=63 width=0) (never executed)
Index Cond: ((metadata = 50) AND (lower(data) = '50'::text))
-> Bitmap Heap Scan on iot_data_2 da_2 (cost=5.30..288.16 rows=2 width=36) (never executed)
Recheck Cond: (metadata = 50)
Filter: ((de.id = device) AND (lower(data) ~~ '50'::text))
-> Bitmap Index Scan on iot_data_2_metadata_lower_idx (cost=0.00..5.29 rows=73 width=0) (never executed)
Index Cond: ((metadata = 50) AND (lower(data) = '50'::text))
Planning Time: 8.157 ms
Execution Time: 2.920 ms
(22 rows)
PG96 :
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=6.57..397.19 rows=2 width=44) (actual time=0.121..0.121 rows=0 loops=1)
Join Filter: (da.device = de.id)
-> Seq Scan on iot_device de (cost=0.00..2.25 rows=1 width=7) (actual time=0.016..0.022 rows=1 loops=1)
Filter: (name = '50a'::text)
Rows Removed by Filter: 99
-> Bitmap Heap Scan on iot_data da (cost=6.57..392.49 rows=196 width=37) (actual time=0.097..0.097 rows=0 loops=1)
Recheck Cond: (metadata = 50)
Filter: (lower(data) ~~ '50'::text)
-> Bitmap Index Scan on iot_data_metadata_lower_idx (cost=0.00..6.52 rows=196 width=0) (actual time=0.095..0.095 rows=0 loops=1)
Index Cond: ((metadata = 50) AND (lower(data) = '50'::text))
Planning time: 0.815 ms
Execution time: 0.158 ms
(12 rows)
As you can see, the results are better in pg96. This example only shows the results for a small data set; with bigger data sets I get a bigger diff...
I tried changing many of the postgresql.conf parameters that were added (max_parallel_workers_per_gather, enable_partitionwise_join and so on).
I don't understand why in pg12 it scans all the partitions instead of only the relevant one..
I added all the commands needed to recreate the test; please feel free to share any useful notes.
I realized that the planner does go to the right partition, because "(never executed)" is mentioned next to the scans of the other partitions. However, I'm still not sure why performance is better in pg96.
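(For contrast with the join case, when the partition key is compared to a constant, pg12 can prune at plan time rather than at executor startup; a minimal check, assuming the same schema - only one partition should appear in the plan at all:)

```sql
-- literal partition-key value: plan-time pruning, so the plan should
-- reference a single partition instead of an Append over all three
explain select * from iot_data where device = 51;
```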
On Sun, Mar 8, 2020 at 18:05 Mariel Cherkassky <mariel.cherkassky@gmail.com> wrote:
On Mon, 9 Mar 2020 at 05:05, Mariel Cherkassky <mariel.cherkassky@gmail.com> wrote:
> PG12 :
> Planning Time: 8.157 ms
> Execution Time: 2.920 ms
> (22 rows)
>
> PG96 :
> Planning time: 0.815 ms
> Execution time: 0.158 ms
> (12 rows)

8 ms seems pretty slow to plan that query. Does the planning time drop if you execute this multiple times in the same session? Does the time change if you try again without any foreign keys?

The planning time for the partitioned case is certainly going to take longer. Partitioned tables are never going to improve query planning times; it's only possible that they'll improve performance during query execution.

For such small, fast queries as the ones you've shown, it's important to remember that more complex query plans (ones with more nodes) do lead to longer times for executor startup and shutdown. EXPLAIN (without ANALYZE) will perform query planning and executor startup/shutdown. If you enable \timing in psql and test the EXPLAIN performance of these queries in each version, then you might get an idea of where the overheads are.

Additionally, you're unlikely to see performance improvements with table partitioning unless you're accessing many rows and partitioning allows the data locality of the rows that you are accessing to improve, i.e. it accesses fewer buffers and/or improves cache hit ratios. In PG12, if the partition pruning can be done during query planning, then the planning and executor startup overhead is much lower, since there are fewer relations to generate access paths for and fewer nodes in the final plan. This also improves the situation during execution, as it means fewer locks to take and fewer nodes to start up and shut down.

> As you can see, the results are better in pg96. This example only shows
> the results for a small data set. In bigger datasets I get a bigger diff...

Can you share the results of that?

> I tried changing many postgresql.conf parameters that were added
> (max_workers_per_gather, enable_partitionwise_join and so on..).

The former only does anything for parallel queries. None of the plans you've shown are parallel ones. The latter also does not count in this case. It only counts when joining two identically partitioned tables.

> I dont understand why in pg12 it scans all the partitions instead of the
> relevant one..

If you'd specified a specific "device" in the query SQL, then the query planner would know which partition to scan for that particular device. However, since you're looking up the device in another table and performing a join, the device is only known during query execution. The plan nodes for the non-matching partitions do go through executor startup, but they're not scanned during execution, as you've seen with the "(never executed)" appearing in the EXPLAIN ANALYZE output. Since executor startup occurs before execution, the device you mean is still unknown during executor startup, so the executor must start up the nodes for all partitions that are in the plan. Starting up a plan node is not free, but not really very expensive either. However, the overhead of it might be quite large proportionally in your case, since the executor is doing so little work.

The most basic guideline for table partitioning is: don't partition your tables unless it's a net win. If partitioning was always faster, we'd just have designed Postgres to implicitly partition all of your tables for you. There are some other guidelines in [1].

[1] https://www.postgresql.org/docs/12/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-BEST-PRACTICES

David
On Sun, Mar 08, 2020 at 06:05:26PM +0200, Mariel Cherkassky wrote:
> In pg12 I created a table with 3 hash partitions :
> create table iot_data(id serial ,data text,metadata bigint,device bigint
> references iot_device(id),primary key(id,device)) partition by hash(device);
> and now for the performance:
> query : explain analyze select * from iot_data da,iot_device de where
> de.name in ('50a') and de.id = da.device and da.metadata=50 and
> lower(da.data) like '50';
> I dont understand why in pg12 it scans all the partitions instead of the
> relevant one..

As you noticed, it doesn't actually scan them. I believe the v11 "partition elimination during query execution" feature is coming into play here. There's no option to disable that, but as a quick test, you could possibly try under PG10 (note, that doesn't support inherited indexes). Or you could try to profile under PG12 (and consider comparing with pg13dev).

You partitioned on hash(iot_data.device), but your query doesn't specify device, except that da.device = de.id AND de.name IN ('50a'). If that's a typical query, maybe it'd work better to partition on metadata or lower(data) (or possibly both).

On Sun, Mar 08, 2020 at 06:05:26PM +0200, Mariel Cherkassky wrote:
> PG12 :
> Nested Loop (cost=5.16..773.61 rows=2 width=43) (actual time=2.858..2.858 rows=0 loops=1)
> ...
>   -> Bitmap Heap Scan on iot_data_1 da_1 (cost=5.20..249.32 rows=2 width=37) (NEVER EXECUTED)
> ...
> Planning Time: 8.157 ms
> Execution Time: 2.920 ms
>
> PG96 :
> Nested Loop (cost=6.57..397.19 rows=2 width=44) (actual time=0.121..0.121 rows=0 loops=1)
> ...
> Planning time: 0.815 ms
> Execution time: 0.158 ms

-- Justin
> 8 ms seems pretty slow to plan that query. Does the planning time
> drop if you execute this multiple times in the same session? Does the
> time change if you try again without any foreign keys?
No one is using the system besides me, so after running the query once, most of the data is in the cache. If I run it multiple times, the times drop:
Planning Time: 0.361 ms
Execution Time: 0.110 ms
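(David's suggestion of timing bare EXPLAIN can be tried in psql like this - a sketch; EXPLAIN without ANALYZE exercises planning plus executor startup/shutdown only, which isolates the per-partition overhead:)

```sql
\timing on
-- planning + executor startup/shutdown, no actual execution
explain select * from iot_data da, iot_device de
 where de.name in ('50a') and de.id = da.device
   and da.metadata = 50 and lower(da.data) like '50';
```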
> Can you share the results of that?
Sure. I did the same procedure, but this time I inserted 100M records instead of 10M. This time the results were far worse in pg12:
PG12 :
postgres=# explain analyze select * from iot_data da,iot_device de where de.name in ('50a') and de.id = da.device and da.metadata=50 and lower(da.data) like '50';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1002.26..1563512.35 rows=10 width=44) (actual time=95161.056..95218.764 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Hash Join (cost=2.26..1562511.35 rows=4 width=44) (actual time=94765.860..94765.861 rows=0 loops=3)
Hash Cond: (da_2.device = de.id)
-> Parallel Append (cost=0.00..1562506.90 rows=814 width=37) (actual time=94765.120..94765.120 rows=0 loops=3)
-> Parallel Seq Scan on iot_data_2 da_2 (cost=0.00..584064.14 rows=305 width=37) (actual time=36638.829..36638.829 rows=0 loops=3)
Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
Rows Removed by Filter: 12460009
-> Parallel Seq Scan on iot_data_1 da_1 (cost=0.00..504948.69 rows=262 width=36) (actual time=43990.427..43990.427 rows=0 loops=2)
Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
Rows Removed by Filter: 16158316
-> Parallel Seq Scan on iot_data_0 da (cost=0.00..473490.00 rows=247 width=37) (actual time=86396.665..86396.665 rows=0 loops=1)
Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
Rows Removed by Filter: 30303339
-> Hash (cost=2.25..2.25 rows=1 width=7) (never executed)
-> Seq Scan on iot_device de (cost=0.00..2.25 rows=1 width=7) (never executed)
Filter: (name = '50a'::text)
Planning Time: 45.724 ms
Execution Time: 95252.712 ms
(20 rows)
PG96 :
postgres=# explain analyze select * from iot_data da,iot_device de where de.name in ('50a') and de.id = da.device and da.metadata=50 and lower(da.data) like '50';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..2583361.51 rows=20 width=44) (actual time=18345.229..18345.229 rows=0 loops=1)
Join Filter: (da.device = de.id)
-> Seq Scan on iot_device de (cost=0.00..2.25 rows=1 width=7) (actual time=0.022..0.037 rows=1 loops=1)
Filter: (name = '50a'::text)
Rows Removed by Filter: 99
-> Seq Scan on iot_data da (cost=0.00..2583334.84 rows=1954 width=37) (actual time=18345.187..18345.187 rows=0 loops=1)
Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
Rows Removed by Filter: 100000000
Planning time: 35.450 ms
Execution time: 18345.301 ms
(10 rows)
> The most basic guideline for table partitioning is: don't partition
> your tables unless it's a net win. If partitioning was always
> faster, we'd just have designed Postgres to implicitly partition all
> of your tables for you. There are some other guidelines in [1].
Shouldn't creating partitions decrease the execution time? I mean, instead of scanning a table with 10M records, I can scan a partition with 3M records. Doesn't less data mean better performance for simple queries like the one I used?
I read the best practices in the docs, and I think that I met most of them - I chose the right partition key (in this case it was device).
Regarding the number of partitions - I chose 3 just to test the results. I didn't create a lot of partitions, and my logic tells me that querying a table with 3M records should be faster than one with 10M records. Am I missing something?
I tried something even simpler - running the query with only the partition column in the where clause - and the results weren't good for pg12:
PG12 :
postgres=# explain analyze select * from iot_data where device=51;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..514086.40 rows=1027284 width=37) (actual time=6.777..61558.272 rows=1010315 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on iot_data_0 (cost=0.00..410358.00 rows=428035 width=37) (actual time=1.152..61414.483 rows=336772 loops=3)
Filter: (device = 51)
Rows Removed by Filter: 9764341
Planning Time: 15.720 ms
Execution Time: 61617.851 ms
(8 rows)
PG9.6
postgres=# explain analyze select * from iot_data where device=51;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Seq Scan on iot_data (cost=0.00..2083334.60 rows=976667 width=37) (actual time=21.922..16753.492 rows=1010315 loops=1)
Filter: (device = 51)
Rows Removed by Filter: 98989685
Planning time: 0.119 ms
Execution time: 16810.787 ms
(5 rows)
Besides hardware, is anything else worth checking? The machines are identical in terms of resources.
On Mon, Mar 09, 2020 at 12:31:15PM +0200, Mariel Cherkassky wrote:
> I tried to do even something simpler, run the query with only the
> partition column in the where clause and the results werent good for pg12 :
>
> PG12 :
> postgres=# explain analyze select * from iot_data where device=51;
>  Gather (cost=1000.00..514086.40 rows=1027284 width=37) (actual time=6.777..61558.272 rows=1010315 loops=1)
>    Workers Planned: 2
>    Workers Launched: 2
>    ->  Parallel Seq Scan on iot_data_0 (cost=0.00..410358.00 rows=428035 width=37) (actual time=1.152..61414.483 rows=336772 loops=3)

For whatever reason, your storage/OS seem to be handling parallel reads poorly. I would SET max_parallel_workers_per_gather=0 and retest (but also look into improving the storage).

Also, it's not required, but I think a typical partitioning schema would have an index on the column being partitioned. I see you have an index on iot_data(metadata,lower(data)), so I still wonder whether you'd have better results partitioned on metadata, or otherwise maybe adding an index on "device". But I don't know what your typical queries are.

> PG9.6
> postgres=# explain analyze select * from iot_data where device=51;
>  Seq Scan on iot_data (cost=0.00..2083334.60 rows=976667 width=37) (actual time=21.922..16753.492 rows=1010315 loops=1)

-- Justin
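(Justin's two suggestions above can be sketched as follows - assuming the same session and schema; the generated index name is whatever PostgreSQL assigns:)

```sql
-- retest without parallelism to isolate the parallel-read issue
set max_parallel_workers_per_gather = 0;

-- index the partition-key column; cascades to each partition in pg12
create index on iot_data (device);
vacuum analyze iot_data;

explain analyze select * from iot_data where device = 51;
```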
OK, so I found the problem, but another problem appeared.
I found out that the pg12 machine had some limits in its VM settings on CPU and memory. Now both machines are exactly the same in terms of hardware and don't have any limits:
CPU - 8
RAM - 32GB
I tested it with cold cache :
service postgresql stop;
echo 1 > /proc/sys/vm/drop_caches;
service postgresql start;
psql -d postgres -U postgres;
I used two simple queries: one that compares the partition key with a constant value, and another that joins another table on the partition column (this second query is where the problem is).
The first query - results are better with pg12:
explain analyze select * from iot_data where device=51;
PG96
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Seq Scan on iot_data (cost=0.00..2083334.60 rows=976667 width=37) (actual time=1.560..67144.164 rows=1010315 loops=1)
Filter: (device = 51)
Rows Removed by Filter: 98989685
Planning time: 9.219 ms
 Execution time: 67228.431 ms
(5 rows)
PG12 - 3 PARTITIONS
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..514086.40 rows=1027284 width=37) (actual time=3.871..15022.118 rows=1010315 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on iot_data_0 (cost=0.00..410358.00 rows=428035 width=37) (actual time=1.670..14815.480 rows=336772 loops=3)
Filter: (device = 51)
Rows Removed by Filter: 9764341
Planning Time: 27.292 ms
Execution Time: 15085.317 ms
(8 rows)
The second query (this is where the problem appears):
explain analyze select * from iot_data da, iot_device de where de.name in ('50a') and de.id = da.device and da.metadata=50 and lower(da.data) like '50';
PG96
postgres=# explain analyze select * from iot_data da,iot_device de where de.name in ('50a') and de.id = da.device and da.metadata=50 and lower(da.data) like '50';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..2583361.51 rows=20 width=44) (actual time=44894.312..44894.312 rows=0 loops=1)
Join Filter: (da.device = de.id)
-> Seq Scan on iot_device de (cost=0.00..2.25 rows=1 width=7) (actual time=0.018..0.028 rows=1 loops=1)
Filter: (name = '50a'::text)
Rows Removed by Filter: 99
-> Seq Scan on iot_data da (cost=0.00..2583334.84 rows=1954 width=37) (actual time=44894.279..44894.279 rows=0 loops=1)
Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
Rows Removed by Filter: 100000000
Planning time: 11.313 ms
Execution time: 44894.357 ms
(10 rows)
PG12 - 3 PARTITIONS
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1002.26..1563512.35 rows=10 width=44) (actual time=22306.091..22309.209 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Hash Join (cost=2.26..1562511.35 rows=4 width=44) (actual time=22299.412..22299.413 rows=0 loops=3)
Hash Cond: (da_2.device = de.id)
-> Parallel Append (cost=0.00..1562506.90 rows=814 width=37) (actual time=22299.411..22299.411 rows=0 loops=3)
-> Parallel Seq Scan on iot_data_2 da_2 (cost=0.00..584064.14 rows=305 width=37) (actual time=9076.535..9076.535 rows=0 loops=3)
Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
Rows Removed by Filter: 12460009
-> Parallel Seq Scan on iot_data_1 da_1 (cost=0.00..504948.69 rows=262 width=36) (actual time=10296.751..10296.751 rows=0 loops=2)
Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
Rows Removed by Filter: 16158316
-> Parallel Seq Scan on iot_data_0 da (cost=0.00..473490.00 rows=247 width=37) (actual time=19075.081..19075.081 rows=0 loops=1)
Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
Rows Removed by Filter: 30303339
-> Hash (cost=2.25..2.25 rows=1 width=7) (never executed)
-> Seq Scan on iot_device de (cost=0.00..2.25 rows=1 width=7) (never executed)
Filter: (name = '50a'::text)
Planning Time: 30.429 ms
Execution Time: 22309.364 ms
(20 rows)
I tried setting max_parallel_workers_per_gather to 0, but that only decreased the db's performance.
Now, regarding the main issue here: as you can see, in the second query I didn't mention the partition column with a specific value, but I joined another table on it (where de.name in ('50a') and de.id = da.device).
This condition should be enough for the optimizer to understand that it needs to scan only a specific partition, but it scans all the partitions. The "never executed" tag isn't added to the partition scans, although it is added to the joined table.
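As a sketch of the difference (assuming the thread's schema; the literal 50 below is hypothetical): plan-time pruning in PG12 requires a constant comparison on the partition key, whereas a key constrained only through a join can at best be pruned at run time, e.g. when the partitioned side is the parameterized inner side of a nested loop, which does not happen under a hash join over a Parallel Append:

```sql
-- Prunes at plan time: constant on the partition key.
explain select * from iot_data da
where da.device = 50 and da.metadata = 50 and lower(da.data) like '50';

-- Does not prune at plan time: device is only constrained via the join,
-- so every partition is planned for scanning.
explain select *
from iot_data da join iot_device de on de.id = da.device
where de.name = '50a' and da.metadata = 50 and lower(da.data) like '50';
```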
Justin - regarding adding an index on the partition column: I don't understand why. The value in that column is the same for all rows in the partition, so when exactly would the index be used?
Also, it's not required, but I think a typical partitioning schema would have an index on the column being partitioned. I see you have an index on iot_data(metadata, lower(data)), so I still wonder whether you'd have better results partitioned on metadata, or otherwise maybe adding an index on "device". But I don't know what your typical queries are.
I understand now why you suggested an index on the partition column - it depends on how many distinct values of the partition column there are in that partition.
Thanks for the suggestion, good idea!
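To make that concrete (a hypothetical check, not from the original thread): with hash partitioning at modulus 3 over roughly 100 device ids, each partition still holds around a third of the distinct device values, so an index on the partition column remains selective within a partition:

```sql
-- Count distinct partition-key values per partition (assumes the thread's schema):
select tableoid::regclass as partition, count(distinct device) as distinct_devices
from iot_data
group by 1;

-- In PG12 an index created on the partitioned parent cascades to all partitions:
create index on iot_data (device);
```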
On Tue, 10 Mar 2020 at 02:08, Mariel Cherkassky <mariel.cherkassky@gmail.com> wrote:
> PG12 - 3 PARTITIONS
>
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------
> Gather (cost=1002.26..1563512.35 rows=10 width=44) (actual time=22306.091..22309.209 rows=0 loops=1)
> Workers Planned: 2
> Workers Launched: 2
> -> Hash Join (cost=2.26..1562511.35 rows=4 width=44) (actual time=22299.412..22299.413 rows=0 loops=3)
> Hash Cond: (da_2.device = de.id)
> -> Parallel Append (cost=0.00..1562506.90 rows=814 width=37) (actual time=22299.411..22299.411 rows=0 loops=3)
> -> Parallel Seq Scan on iot_data_2 da_2 (cost=0.00..584064.14 rows=305 width=37) (actual time=9076.535..9076.535 rows=0 loops=3)
> Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
> Rows Removed by Filter: 12460009
> -> Parallel Seq Scan on iot_data_1 da_1 (cost=0.00..504948.69 rows=262 width=36) (actual time=10296.751..10296.751 rows=0 loops=2)
> Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
> Rows Removed by Filter: 16158316
> -> Parallel Seq Scan on iot_data_0 da (cost=0.00..473490.00 rows=247 width=37) (actual time=19075.081..19075.081 rows=0 loops=1)
> Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
> Rows Removed by Filter: 30303339
> -> Hash (cost=2.25..2.25 rows=1 width=7) (never executed)
> -> Seq Scan on iot_device de (cost=0.00..2.25 rows=1 width=7) (never executed)
> Filter: (name = '50a'::text)
> Planning Time: 30.429 ms
> Execution Time: 22309.364 ms
> (20 rows)

From what I can work out, the DDL you used here is:

-- you didn't seem to specify the DDL for iot_device, so I used:
create table iot_device (
    id bigint primary key,
    name text not null
);

insert into iot_device select x, x::text || 'a' from generate_series(1,100) x;

create table iot_data(id serial, data text, metadata bigint, device bigint references iot_device(id), primary key(id,device)) partition by hash(device);

create table iot_data_0 partition of iot_data for values with (modulus 3, remainder 0);
create table iot_data_1 partition of iot_data for values with (modulus 3, remainder 1);
create table iot_data_2 partition of iot_data for values with (modulus 3, remainder 2);

insert into iot_data select generate_series(1,10000000), random()*10, random()*254, random()*99+1;

create index on iot_data(metadata, lower(data));
vacuum analyze iot_data;

In which case, you're getting a pretty different plan than I am. (I admit that I only tried on current master and not PG12.2; however, I see no reason that PG12.2 shouldn't produce the same plan.) I get:

# explain analyze select * from iot_data da,iot_device de where de.name in ('50a') and de.id = da.device and da.metadata=50 and lower(da.data) like '50';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.56..28.04 rows=1 width=49) (actual time=0.058..0.058 rows=0 loops=1)
Join Filter: (da.device = de.id)
-> Seq Scan on iot_device de (cost=0.00..2.25 rows=1 width=11) (actual time=0.013..0.016 rows=1 loops=1)
Filter: (name = '50a'::text)
Rows Removed by Filter: 99
-> Append (cost=0.56..25.76 rows=3 width=38) (actual time=0.040..0.040 rows=0 loops=1)
-> Index Scan using iot_data_0_metadata_lower_idx on iot_data_0 da_1 (cost=0.56..8.58 rows=1 width=38) (actual time=0.020..0.020 rows=0 loops=1)
Index Cond: ((metadata = 50) AND (lower(data) = '50'::text))
Filter: (lower(data) ~~ '50'::text)
-> Index Scan using iot_data_1_metadata_lower_idx on iot_data_1 da_2 (cost=0.56..8.58 rows=1 width=38) (actual time=0.010..0.010 rows=0 loops=1)
Index Cond: ((metadata = 50) AND (lower(data) = '50'::text))
Filter: (lower(data) ~~ '50'::text)
-> Index Scan using iot_data_2_metadata_lower_idx on iot_data_2 da_3 (cost=0.56..8.58 rows=1 width=38) (actual time=0.009..0.009 rows=0 loops=1)
Index Cond: ((metadata = 50) AND (lower(data) = '50'::text))
Filter: (lower(data) ~~ '50'::text)
Planning Time: 0.280 ms
Execution Time: 0.094 ms
(17 rows)

Are you certain that you added an index on iot_data (metadata, lower(data)); ?
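One way to answer that question (a generic catalog query, assuming the partition names used in this thread) is to list the indexes on the parent table and on each partition:

```sql
-- Lists every index whose table name starts with iot_data,
-- so missing per-partition indexes are easy to spot.
select tablename, indexname, indexdef
from pg_indexes
where tablename like 'iot_data%'
order by tablename;
```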