Re: Increasing work_mem slows down query, why? - Mailing list pgsql-performance

From Pavel Stehule
Subject Re: Increasing work_mem slows down query, why?
Date
Msg-id CAFj8pRADkvF1BwpX7Dv3cYOkcEFAHNNcRX1rM3eqbkGe7rbTbg@mail.gmail.com
Whole thread Raw
In response to Re: Increasing work_mem slows down query, why?  (Silvio Moioli <moio@suse.de>)
Responses Re: Increasing work_mem slows down query, why?  (Silvio Moioli <moio@suse.de>)
Re: Increasing work_mem slows down query, why?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance


po 30. 3. 2020 v 10:12 odesílatel Silvio Moioli <moio@suse.de> napsal:
On 3/30/20 8:56 AM, Pavel Stehule wrote:
> please, can you send explain in text form?

Sure. With work_mem = 80MB:

                                                                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=608228.26..608228.27 rows=2 width=36) (actual time=13360.241..13360.454 rows=1100 loops=1)
   Output: wanted_capability.ordering, inserted_capability.id
   Buffers: shared hit=14448
   CTE wanted_capability
     ->  Values Scan on "*VALUES*"  (cost=0.00..552.75 rows=1100 width=68) (actual time=0.001..0.246 rows=1100 loops=1)
           Output: "*VALUES*".column1, "*VALUES*".column2, "*VALUES*".column3
   CTE missing_capability
     ->  Merge Left Join  (cost=300263.57..303282.17 rows=1 width=68) (actual time=6686.320..6686.320 rows=0 loops=1)
           Output: wanted_capability_2.ordering, wanted_capability_2.name, wanted_capability_2.version
           Merge Cond: (wanted_capability_2.name = (rhnpackagecapability_1.name)::text)
           Join Filter: (NOT (wanted_capability_2.version IS DISTINCT FROM (rhnpackagecapability_1.version)::text))
           Filter: (rhnpackagecapability_1.id IS NULL)
           Rows Removed by Filter: 1100
           Buffers: shared hit=7222
           ->  Sort  (cost=1155.57..1158.32 rows=1100 width=68) (actual time=10.011..10.053 rows=1100 loops=1)
                 Output: wanted_capability_2.ordering, wanted_capability_2.name, wanted_capability_2.version
                 Sort Key: wanted_capability_2.name
                 Sort Method: quicksort  Memory: 203kB
                 Buffers: shared hit=5
                 ->  CTE Scan on wanted_capability wanted_capability_2  (cost=0.00..1100.00 rows=1100 width=68) (actual time=0.000..0.516 rows=1100 loops=1)
                       Output: wanted_capability_2.ordering, wanted_capability_2.name, wanted_capability_2.version
           ->  Sort  (cost=299108.00..300335.41 rows=490964 width=79) (actual time=6475.147..6494.111 rows=462600 loops=1)
                 Output: rhnpackagecapability_1.name, rhnpackagecapability_1.version, rhnpackagecapability_1.id
                 Sort Key: rhnpackagecapability_1.name
                 Sort Method: quicksort  Memory: 79862kB
                 Buffers: shared hit=7217
                 ->  Seq Scan on public.rhnpackagecapability rhnpackagecapability_1  (cost=0.00..252699.00 rows=490964 width=79) (actual time=0.016..59.976 rows=490964 loops=1)
                       Output: rhnpackagecapability_1.name, rhnpackagecapability_1.version, rhnpackagecapability_1.id
                       Buffers: shared hit=7217
   CTE inserted_capability
     ->  Insert on public.rhnpackagecapability rhnpackagecapability_2  (cost=0.00..1.51 rows=1 width=1080) (actual time=6686.322..6686.322 rows=0 loops=1)
           Output: rhnpackagecapability_2.id, rhnpackagecapability_2.name, rhnpackagecapability_2.version
           Conflict Resolution: NOTHING
           Tuples Inserted: 0
           Conflicting Tuples: 0
           Buffers: shared hit=7222
           ->  Subquery Scan on "*SELECT*"  (cost=0.00..1.51 rows=1 width=1080) (actual time=6686.321..6686.321 rows=0 loops=1)
                 Output: "*SELECT*".nextval, "*SELECT*".name, "*SELECT*".version, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP
                 Buffers: shared hit=7222
                 ->  CTE Scan on missing_capability  (cost=0.00..1.00 rows=1 width=72) (actual time=6686.320..6686.320 rows=0 loops=1)
                       Output: nextval('rhn_pkg_capability_id_seq'::regclass), missing_capability.name, missing_capability.version
                       Buffers: shared hit=7222
   ->  Sort  (cost=304391.82..304391.83 rows=2 width=36) (actual time=13360.240..13360.283 rows=1100 loops=1)
         Output: wanted_capability.ordering, inserted_capability.id
         Sort Key: wanted_capability.ordering, inserted_capability.id
         Sort Method: quicksort  Memory: 100kB
         Buffers: shared hit=14448
         ->  Append  (cost=1.50..304391.81 rows=2 width=36) (actual time=13357.167..13360.051 rows=1100 loops=1)
               Buffers: shared hit=14442
               ->  Hash Join  (cost=1.50..1108.64 rows=1 width=36) (actual time=6686.340..6686.340 rows=0 loops=1)
                     Output: wanted_capability.ordering, inserted_capability.id
                     Hash Cond: (wanted_capability.name = (inserted_capability.name)::text)
                     Join Filter: (NOT (wanted_capability.version IS DISTINCT FROM (inserted_capability.version)::text))
                     Buffers: shared hit=7225
                     ->  CTE Scan on wanted_capability  (cost=0.00..1100.00 rows=1100 width=68) (actual time=0.002..0.002 rows=1 loops=1)
                           Output: wanted_capability.ordering, wanted_capability.name, wanted_capability.version
                     ->  Hash  (cost=1.00..1.00 rows=1 width=1064) (actual time=6686.323..6686.323 rows=0 loops=1)
                           Output: inserted_capability.id, inserted_capability.name, inserted_capability.version
                           Buckets: 1024  Batches: 1  Memory Usage: 8kB
                           Buffers: shared hit=7222
                           ->  CTE Scan on inserted_capability  (cost=0.00..1.00 rows=1 width=1064) (actual time=6686.322..6686.322 rows=0 loops=1)
                                 Output: inserted_capability.id, inserted_capability.name, inserted_capability.version
                                 Buffers: shared hit=7222
               ->  Merge Join  (cost=300263.57..303282.17 rows=1 width=10) (actual time=6670.825..6673.642 rows=1100 loops=1)
                     Output: wanted_capability_1.ordering, rhnpackagecapability.id
                     Merge Cond: (wanted_capability_1.name = (rhnpackagecapability.name)::text)
                     Join Filter: (NOT (wanted_capability_1.version IS DISTINCT FROM (rhnpackagecapability.version)::text))
                     Buffers: shared hit=7217
                     ->  Sort  (cost=1155.57..1158.32 rows=1100 width=68) (actual time=9.430..9.474 rows=1100 loops=1)
                           Output: wanted_capability_1.ordering, wanted_capability_1.name, wanted_capability_1.version
                           Sort Key: wanted_capability_1.name
                           Sort Method: quicksort  Memory: 203kB
                           ->  CTE Scan on wanted_capability wanted_capability_1  (cost=0.00..1100.00 rows=1100 width=68) (actual time=0.001..0.066 rows=1100 loops=1)
                                 Output: wanted_capability_1.ordering, wanted_capability_1.name, wanted_capability_1.version
                     ->  Sort  (cost=299108.00..300335.41 rows=490964 width=79) (actual time=6458.988..6477.151 rows=462600 loops=1)
                           Output: rhnpackagecapability.id, rhnpackagecapability.name, rhnpackagecapability.version
                           Sort Key: rhnpackagecapability.name
                           Sort Method: quicksort  Memory: 79862kB
                           Buffers: shared hit=7217
                           ->  Seq Scan on public.rhnpackagecapability  (cost=0.00..252699.00 rows=490964 width=79) (actual time=0.012..50.467 rows=490964 loops=1)
                                 Output: rhnpackagecapability.id, rhnpackagecapability.name, rhnpackagecapability.version
                                 Buffers: shared hit=7217
 Planning time: 2.110 ms
 Execution time: 13362.965 ms



With work_mem = 5MB:

                                                                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=648953.89..648953.91 rows=2 width=36) (actual time=221.127..221.337 rows=1100 loops=1)
   Output: wanted_capability.ordering, inserted_capability.id
   Buffers: shared hit=7226 read=7217
   CTE wanted_capability
     ->  Values Scan on "*VALUES*"  (cost=0.00..552.75 rows=1100 width=68) (actual time=0.001..0.266 rows=1100 loops=1)
           Output: "*VALUES*".column1, "*VALUES*".column2, "*VALUES*".column3
   CTE missing_capability
     ->  Hash Right Join  (cost=1652.75..323644.99 rows=1 width=68) (actual time=137.544..137.544 rows=0 loops=1)
           Output: wanted_capability_2.ordering, wanted_capability_2.name, wanted_capability_2.version
           Hash Cond: ((rhnpackagecapability_1.name)::text = wanted_capability_2.name)
           Join Filter: (NOT (wanted_capability_2.version IS DISTINCT FROM (rhnpackagecapability_1.version)::text))
           Filter: (rhnpackagecapability_1.id IS NULL)
           Rows Removed by Filter: 1100
           Buffers: shared read=7217
           ->  Seq Scan on public.rhnpackagecapability rhnpackagecapability_1  (cost=0.00..252699.00 rows=490964 width=79) (actual time=0.016..77.305 rows=490964 loops=1)
                 Output: rhnpackagecapability_1.id, rhnpackagecapability_1.name, rhnpackagecapability_1.version, rhnpackagecapability_1.created, rhnpackagecapability_1.modified
                 Buffers: shared read=7217
           ->  Hash  (cost=1100.00..1100.00 rows=1100 width=68) (actual time=0.812..0.812 rows=1100 loops=1)
                 Output: wanted_capability_2.ordering, wanted_capability_2.name, wanted_capability_2.version
                 Buckets: 2048  Batches: 1  Memory Usage: 134kB
                 ->  CTE Scan on wanted_capability wanted_capability_2  (cost=0.00..1100.00 rows=1100 width=68) (actual time=0.000..0.574 rows=1100 loops=1)
                       Output: wanted_capability_2.ordering, wanted_capability_2.name, wanted_capability_2.version
   CTE inserted_capability
     ->  Insert on public.rhnpackagecapability rhnpackagecapability_2  (cost=0.00..1.51 rows=1 width=1080) (actual time=137.546..137.546 rows=0 loops=1)
           Output: rhnpackagecapability_2.id, rhnpackagecapability_2.name, rhnpackagecapability_2.version
           Conflict Resolution: NOTHING
           Tuples Inserted: 0
           Conflicting Tuples: 0
           Buffers: shared read=7217
           ->  Subquery Scan on "*SELECT*"  (cost=0.00..1.51 rows=1 width=1080) (actual time=137.545..137.545 rows=0 loops=1)
                 Output: "*SELECT*".nextval, "*SELECT*".name, "*SELECT*".version, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP
                 Buffers: shared read=7217
                 ->  CTE Scan on missing_capability  (cost=0.00..1.00 rows=1 width=72) (actual time=137.544..137.545 rows=0 loops=1)
                       Output: nextval('rhn_pkg_capability_id_seq'::regclass), missing_capability.name, missing_capability.version
                       Buffers: shared read=7217
   ->  Sort  (cost=324754.64..324754.65 rows=2 width=36) (actual time=221.126..221.165 rows=1100 loops=1)
         Output: wanted_capability.ordering, inserted_capability.id
         Sort Key: wanted_capability.ordering, inserted_capability.id
         Sort Method: quicksort  Memory: 100kB
         Buffers: shared hit=7226 read=7217
         ->  Append  (cost=1.50..324754.63 rows=2 width=36) (actual time=169.421..220.870 rows=1100 loops=1)
               Buffers: shared hit=7220 read=7217
               ->  Hash Join  (cost=1.50..1108.64 rows=1 width=36) (actual time=137.573..137.573 rows=0 loops=1)
                     Output: wanted_capability.ordering, inserted_capability.id
                     Hash Cond: (wanted_capability.name = (inserted_capability.name)::text)
                     Join Filter: (NOT (wanted_capability.version IS DISTINCT FROM (inserted_capability.version)::text))
                     Buffers: shared hit=3 read=7217
                     ->  CTE Scan on wanted_capability  (cost=0.00..1100.00 rows=1100 width=68) (actual time=0.002..0.002 rows=1 loops=1)
                           Output: wanted_capability.ordering, wanted_capability.name, wanted_capability.version
                     ->  Hash  (cost=1.00..1.00 rows=1 width=1064) (actual time=137.547..137.547 rows=0 loops=1)
                           Output: inserted_capability.id, inserted_capability.name, inserted_capability.version
                           Buckets: 1024  Batches: 1  Memory Usage: 8kB
                           Buffers: shared read=7217
                           ->  CTE Scan on inserted_capability  (cost=0.00..1.00 rows=1 width=1064) (actual time=137.547..137.547 rows=0 loops=1)
                                 Output: inserted_capability.id, inserted_capability.name, inserted_capability.version
                                 Buffers: shared read=7217
               ->  Hash Join  (cost=1652.75..323644.99 rows=1 width=10) (actual time=31.846..83.234 rows=1100 loops=1)
                     Output: wanted_capability_1.ordering, rhnpackagecapability.id
                     Hash Cond: ((rhnpackagecapability.name)::text = wanted_capability_1.name)
                     Join Filter: (NOT (wanted_capability_1.version IS DISTINCT FROM (rhnpackagecapability.version)::text))
                     Buffers: shared hit=7217
                     ->  Seq Scan on public.rhnpackagecapability  (cost=0.00..252699.00 rows=490964 width=79) (actual time=0.007..29.702 rows=490964 loops=1)
                           Output: rhnpackagecapability.id, rhnpackagecapability.name, rhnpackagecapability.version, rhnpackagecapability.created, rhnpackagecapability.modified
                           Buffers: shared hit=7217
                     ->  Hash  (cost=1100.00..1100.00 rows=1100 width=68) (actual time=0.257..0.257 rows=1100 loops=1)
                           Output: wanted_capability_1.ordering, wanted_capability_1.name, wanted_capability_1.version
                           Buckets: 2048  Batches: 1  Memory Usage: 134kB
                           ->  CTE Scan on wanted_capability wanted_capability_1  (cost=0.00..1100.00 rows=1100 width=68) (actual time=0.001..0.067 rows=1100 loops=1)
                                 Output: wanted_capability_1.ordering, wanted_capability_1.name, wanted_capability_1.version
 Planning time: 3.232 ms
 Execution time: 221.668 ms

> Probably, there is a problem in wrong estimation.

Yes, that's what I would also assume.

> What can be expected because CTE is optimization fence in this version
I am aware of that, but would not expect it to really be a problem in this specific case. Fact that CTE is an optimization fence is true regardless of work_mem, so ATM I cannot see why it would lead to slow down the query in high work_mem case.

I am sure I am still missing something...

This parts looks strange

                     ->  Sort  (cost=299108.00..300335.41 rows=490964 width=79) (actual time=6458.988..6477.151 rows=462600 loops=1)
                           Output: rhnpackagecapability.id, rhnpackagecapability.name, rhnpackagecapability.version
                           Sort Key: rhnpackagecapability.name
                           Sort Method: quicksort  Memory: 79862kB
                           Buffers: shared hit=7217
                           ->  Seq Scan on public.rhnpackagecapability  (cost=0.00..252699.00 rows=490964 width=79) (actual time=0.012..50.467 rows=490964 loops=1)
                                 Output: rhnpackagecapability.id, rhnpackagecapability.name, rhnpackagecapability.version
                                 Buffers: shared hit=7217

I did some test case


postgres=# explain (analyze, buffers) select * from foo2 join foo3 on foo2.name = foo3.name;
                                                     QUERY PLAN                                                    
--------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=188.62..33869.93 rows=866330 width=78) (actual time=6.247..369.081 rows=934000 loops=1)
   Hash Cond: (foo2.name = foo3.name)
   Buffers: shared hit=2224 read=4092
   ->  Seq Scan on foo2  (cost=0.00..12518.00 rows=625000 width=48) (actual time=0.095..70.174 rows=625000 loops=1)
         Buffers: shared hit=2176 read=4092
   ->  Hash  (cost=110.50..110.50 rows=6250 width=30) (actual time=6.116..6.116 rows=6250 loops=1)
         Buckets: 8192  Batches: 1  Memory Usage: 447kB
         Buffers: shared hit=48
         ->  Seq Scan on foo3  (cost=0.00..110.50 rows=6250 width=30) (actual time=0.014..1.801 rows=6250 loops=1)
               Buffers: shared hit=48
 Planning Time: 1.190 ms
 Execution Time: 414.264 ms
(12 rows)

postgres=# explain (analyze, buffers) select * from foo2 join foo3 on foo2.name = foo3.name;
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=73189.73..86215.92 rows=866330 width=78) (actual time=1499.805..1835.262 rows=934000 loops=1)
   Merge Cond: (foo3.name = foo2.name)
   Buffers: shared hit=2261 read=4060, temp read=13104 written=10023
   ->  Sort  (cost=504.55..520.18 rows=6250 width=30) (actual time=21.313..21.895 rows=6250 loops=1)
         Sort Key: foo3.name
         Sort Method: quicksort  Memory: 763kB
         Buffers: shared hit=53
         ->  Seq Scan on foo3  (cost=0.00..110.50 rows=6250 width=30) (actual time=0.017..1.802 rows=6250 loops=1)
               Buffers: shared hit=48
   ->  Sort  (cost=72685.18..74247.68 rows=625000 width=48) (actual time=1478.480..1602.358 rows=933999 loops=1)
         Sort Key: foo2.name
         Sort Method: external sort  Disk: 40088kB
         Buffers: shared hit=2208 read=4060, temp read=12196 written=10023
         ->  Seq Scan on foo2  (cost=0.00..12518.00 rows=625000 width=48) (actual time=0.039..63.340 rows=625000 loops=1)
               Buffers: shared hit=2208 read=4060
 Planning Time: 1.116 ms
 Execution Time: 1884.985 ms
(17 rows)

And looks little bit strange the cost on seq scan on foo2 12K against cost of your public.rhnpackagecapability - 252K.

Do you have some planner variables changed - like seq_page_cost?

I did some tests and it looks so a penalization for sort long keys is not too high. In your case it is reason why sort is very slow (probably due slow locales). Then the cost of hash join and sort is similar, although in reality it is not true.

On your plan is strange the cost of seq scan. It is surprisingly high.






Regards

Pavel





Thanks!

Regards,
--
Silvio Moioli
SUSE Manager Development Team


pgsql-performance by date:

Previous
From: Silvio Moioli
Date:
Subject: Re: Increasing work_mem slows down query, why?
Next
From: Silvio Moioli
Date:
Subject: Re: Increasing work_mem slows down query, why?