Re: pg12 partitions show bad performance vs pg96 - Mailing list pgsql-performance

From Mariel Cherkassky
Subject Re: pg12 partitions show bad performance vs pg96
Date
Msg-id CA+t6e1mx_KiR735zbPFmkG-HBmLOOGRX3-ZMWNok7F-ieTdB3A@mail.gmail.com
In response to Re: pg12 partitions show bad performance vs pg96  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: pg12 partitions show bad performance vs pg96  (Mariel Cherkassky <mariel.cherkassky@gmail.com>)
List pgsql-performance

8 ms seems pretty slow to plan that query. Does the planning time
drop if you execute this multiple times in the same session? Does the
time change if you try again without any foreign keys?

No one is using the system besides me, so after running the query once
most of the data is in the cache. If I run it multiple times in the same session, the planning and execution times drop:
 Planning Time: 0.361 ms
 Execution Time: 0.110 ms
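(As a side note, one way to amortize repeated planning cost is a prepared statement, which PostgreSQL may switch to a cached generic plan after a few executions in the same session. A sketch only, reusing the tables from the query above; the statement name and parameter are made up:

```sql
-- Prepared statement: after roughly five executions the planner may
-- cache a generic plan, so per-execution planning cost drops.
PREPARE q(int) AS
  SELECT *
  FROM iot_data da
  JOIN iot_device de ON de.id = da.device
  WHERE de.name = '50a'
    AND da.metadata = $1
    AND lower(da.data) LIKE '50';

EXPLAIN (ANALYZE) EXECUTE q(50);  -- repeat in the same session
```

With plan_cache_mode left at its default of auto, the switch to a generic plan is up to the planner's costing.)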

Can you share the results of that?
Sure. I did the same procedure, but this time I inserted 100m records instead of 10m. This time the results were far worse in PG12:

PG12:
postgres=# explain analyze select * from iot_data da,iot_device de where de.name in ('50a') and de.id = da.device and da.metadata=50 and lower(da.data) like '50';

                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1002.26..1563512.35 rows=10 width=44) (actual time=95161.056..95218.764 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Hash Join  (cost=2.26..1562511.35 rows=4 width=44) (actual time=94765.860..94765.861 rows=0 loops=3)
         Hash Cond: (da_2.device = de.id)
         ->  Parallel Append  (cost=0.00..1562506.90 rows=814 width=37) (actual time=94765.120..94765.120 rows=0 loops=3)
               ->  Parallel Seq Scan on iot_data_2 da_2  (cost=0.00..584064.14 rows=305 width=37) (actual time=36638.829..36638.829 rows=0 loops=3)
                     Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
                     Rows Removed by Filter: 12460009
               ->  Parallel Seq Scan on iot_data_1 da_1  (cost=0.00..504948.69 rows=262 width=36) (actual time=43990.427..43990.427 rows=0 loops=2)
                     Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
                     Rows Removed by Filter: 16158316
               ->  Parallel Seq Scan on iot_data_0 da  (cost=0.00..473490.00 rows=247 width=37) (actual time=86396.665..86396.665 rows=0 loops=1)
                     Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
                     Rows Removed by Filter: 30303339
         ->  Hash  (cost=2.25..2.25 rows=1 width=7) (never executed)
               ->  Seq Scan on iot_device de  (cost=0.00..2.25 rows=1 width=7) (never executed)
                     Filter: (name = '50a'::text)
 Planning Time: 45.724 ms
 Execution Time: 95252.712 ms
(20 rows)
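(Worth noting: the plan scans all three partitions because the partition key, device, is only constrained through the join to iot_device, so plan-time pruning cannot apply. With a literal on the partition key, PG12 can prune at plan time and touch a single partition. A sketch; the device id 123 is hypothetical:

```sql
-- Plan-time pruning: only the partition containing device = 123
-- appears in the plan, the other two are skipped entirely.
EXPLAIN
SELECT *
FROM iot_data
WHERE device = 123
  AND metadata = 50
  AND lower(data) LIKE '50';
```

This is the difference between a query that benefits from partitioning and one that merely pays its overhead.)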

PG96:
postgres=# explain analyze select * from iot_data da,iot_device de where de.name in ('50a') and de.id = da.device and da.metadata=50 and lower(da.data) like '50';
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2583361.51 rows=20 width=44) (actual time=18345.229..18345.229 rows=0 loops=1)
   Join Filter: (da.device = de.id)
   ->  Seq Scan on iot_device de  (cost=0.00..2.25 rows=1 width=7) (actual time=0.022..0.037 rows=1 loops=1)
         Filter: (name = '50a'::text)
         Rows Removed by Filter: 99
   ->  Seq Scan on iot_data da  (cost=0.00..2583334.84 rows=1954 width=37) (actual time=18345.187..18345.187 rows=0 loops=1)
         Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
         Rows Removed by Filter: 100000000
 Planning time: 35.450 ms
 Execution time: 18345.301 ms
(10 rows)


The most basic guidelines for table partitioning are, don't partition
your tables unless it's a net win.   If partitioning was always
faster, we'd just have designed Postgres to implicitly partition all
of your tables for you. There are some other guidelines in [1].


Shouldn't creating partitions reduce the execution time? I mean, instead of scanning a table with 10m records, I can scan a partition with 3m records. Doesn't less data mean better performance for simple queries like the one I used?
I read the best practices in the docs, and I think I met most of them: I chose the right partition key (in this case, device).
Regarding the number of partitions, I chose 3 just to test the results. I didn't create a lot of partitions, and my logic tells me that querying a table with 3m records should be faster than querying one with 10m. Am I missing something?
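(For what it's worth, in both plans every row is rejected by the filter (metadata = 50) AND (lower(data) ~~ '50'), so the sequential scans dominate regardless of partitioning. An expression index matching that filter might avoid the full scans altogether. A sketch only, assuming the filter values are representative; in PG12, creating the index on the partitioned parent cascades it to each partition:

```sql
-- Composite expression index matching the query's filter.
-- LIKE '50' has no wildcard, so it is an equality test on lower(data)
-- and can use a plain btree index.
CREATE INDEX ON iot_data (metadata, lower(data));
```

Whether the planner picks it depends on the selectivity of the two conditions.)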


