Re: pg12 partitions show bad performance vs pg96 - Mailing list pgsql-performance

From David Rowley
Subject Re: pg12 partitions show bad performance vs pg96
Date
Msg-id CAApHDvpCenS3t_dV99P==XOvO6EJi9575hdBz43JqwdCWuz8Mw@mail.gmail.com
In response to Re: pg12 partitions show bad performance vs pg96  (Mariel Cherkassky <mariel.cherkassky@gmail.com>)
List pgsql-performance
On Tue, 10 Mar 2020 at 02:08, Mariel Cherkassky
<mariel.cherkassky@gmail.com> wrote:

> PG12 - 3 PARTITIONS
>
>                                                                      QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>  Gather  (cost=1002.26..1563512.35 rows=10 width=44) (actual time=22306.091..22309.209 rows=0 loops=1)
>    Workers Planned: 2
>    Workers Launched: 2
>    ->  Hash Join  (cost=2.26..1562511.35 rows=4 width=44) (actual time=22299.412..22299.413 rows=0 loops=3)
>          Hash Cond: (da_2.device = de.id)
>          ->  Parallel Append  (cost=0.00..1562506.90 rows=814 width=37) (actual time=22299.411..22299.411 rows=0 loops=3)
>                ->  Parallel Seq Scan on iot_data_2 da_2  (cost=0.00..584064.14 rows=305 width=37) (actual time=9076.535..9076.535 rows=0 loops=3)
>                      Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
>                      Rows Removed by Filter: 12460009
>                ->  Parallel Seq Scan on iot_data_1 da_1  (cost=0.00..504948.69 rows=262 width=36) (actual time=10296.751..10296.751 rows=0 loops=2)
>                      Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
>                      Rows Removed by Filter: 16158316
>                ->  Parallel Seq Scan on iot_data_0 da  (cost=0.00..473490.00 rows=247 width=37) (actual time=19075.081..19075.081 rows=0 loops=1)
>                      Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
>                      Rows Removed by Filter: 30303339
>          ->  Hash  (cost=2.25..2.25 rows=1 width=7) (never executed)
>                ->  Seq Scan on iot_device de  (cost=0.00..2.25 rows=1 width=7) (never executed)
>                      Filter: (name = '50a'::text)
>  Planning Time: 30.429 ms
>  Execution Time: 22309.364 ms
> (20 rows)

From what I can work out, the DDL you used here is:

-- you didn't seem to specify the DDL for iot_device, so I used:
create table iot_device (
id bigint primary key,
name text not null
);

insert into iot_device select x,x::Text || 'a' from generate_Series(1,100) x;

create table iot_data (id serial, data text, metadata bigint, device bigint references iot_device(id), primary key (id, device)) partition by hash (device);
create table iot_data_0 partition of iot_data for values with (MODULUS 3, remainder 0);
create table iot_data_1 partition of iot_data for values with (MODULUS 3, remainder 1);
create table iot_data_2 partition of iot_data for values with (MODULUS 3, remainder 2);

insert into iot_data select generate_series(1,10000000), random()*10, random()*254, random()*99+1;
create index on iot_data (metadata, lower(data));
vacuum analyze iot_data;
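
Assuming that DDL, the create index on the partitioned parent should cascade down and create a matching index on each partition (that's how partitioned indexes have worked since PG11). Something like the following should list the parent index plus one index per partition:

select tablename, indexname, indexdef
from pg_indexes
where tablename like 'iot_data%'
order by 1, 2;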

In which case, you're getting a pretty different plan from the one I
get. (I admit that I only tried this on current master rather than
PG12.2; however, I see no reason PG12.2 shouldn't produce the same plan.)

I get:

# explain analyze select * from iot_data da,iot_device de where
de.name in ('50a') and de.id = da.device and da.metadata=50 and
lower(da.data) like '50';

                                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.56..28.04 rows=1 width=49) (actual time=0.058..0.058 rows=0 loops=1)
   Join Filter: (da.device = de.id)
   ->  Seq Scan on iot_device de  (cost=0.00..2.25 rows=1 width=11) (actual time=0.013..0.016 rows=1 loops=1)
         Filter: (name = '50a'::text)
         Rows Removed by Filter: 99
   ->  Append  (cost=0.56..25.76 rows=3 width=38) (actual time=0.040..0.040 rows=0 loops=1)
         ->  Index Scan using iot_data_0_metadata_lower_idx on iot_data_0 da_1  (cost=0.56..8.58 rows=1 width=38) (actual time=0.020..0.020 rows=0 loops=1)
               Index Cond: ((metadata = 50) AND (lower(data) = '50'::text))
               Filter: (lower(data) ~~ '50'::text)
         ->  Index Scan using iot_data_1_metadata_lower_idx on iot_data_1 da_2  (cost=0.56..8.58 rows=1 width=38) (actual time=0.010..0.010 rows=0 loops=1)
               Index Cond: ((metadata = 50) AND (lower(data) = '50'::text))
               Filter: (lower(data) ~~ '50'::text)
         ->  Index Scan using iot_data_2_metadata_lower_idx on iot_data_2 da_3  (cost=0.56..8.58 rows=1 width=38) (actual time=0.009..0.009 rows=0 loops=1)
               Index Cond: ((metadata = 50) AND (lower(data) = '50'::text))
               Filter: (lower(data) ~~ '50'::text)
 Planning Time: 0.280 ms
 Execution Time: 0.094 ms
(17 rows)

Are you certain that you added an index on iot_data (metadata, lower(data))?
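
If you're not sure, a quick way to check (assuming psql) is to describe one of the partitions directly, e.g.:

# \d iot_data_0

and confirm that an index on (metadata, lower(data)) shows up in its Indexes list. If the partitions don't have it, parallel seq scans like the ones in your plan are about what I'd expect.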


