Thread: plan with result cache is very slow when work_mem is not enough

plan with result cache is very slow when work_mem is not enough

From
Pavel Stehule
Date:
Hi

I am testing new features of Postgres 14, and now I am trying to check the result cache. Unfortunately, on my test data the result is not very good;
the behaviour is very non-linear. Is it expected?

create table t1(a int, t2_id int);
insert into t1 select random() * 100000, random() * 100000 from generate_series(1,1000000);
create table t2(b int, id int);
insert into t2 select random() * 100000, random() * 100000 from generate_series(1,1000000);
create index on t2(id);

vacuum analyze t1, t2;

when work_mem is 40MB
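The 40MB plan below was presumably produced with the same query used in the 10MB run further down:

set work_mem to '40MB';
explain analyze select * from t1, lateral(select count(*) from t2 where t1.t2_id = t2.id) s ;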

                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.65..472639.79 rows=1000000 width=16) (actual time=0.041..1078.882 rows=1000000 loops=1)
   ->  Seq Scan on t1  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.010..60.212 rows=1000000 loops=1)
   ->  Result Cache  (cost=4.65..4.67 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=1000000)
         Cache Key: t1.t2_id
         Hits: 900006  Misses: 99994  Evictions: 0  Overflows: 0  Memory Usage: 10547kB
         ->  Aggregate  (cost=4.64..4.65 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=99994)
               ->  Index Only Scan using t2_id_idx on t2  (cost=0.42..4.62 rows=11 width=0) (actual time=0.002..0.003 rows=10 loops=99994)
                     Index Cond: (id = t1.t2_id)
                     Heap Fetches: 0
 Planning Time: 0.091 ms
 Execution Time: 1120.177 ms

when work_mem is 10MB


postgres=# set work_mem to '10MB'; -- 11MB is ok
SET
postgres=# explain analyze select * from t1, lateral(select count(*) from t2 where t1.t2_id = t2.id) s ;
                                                                 QUERY PLAN                                                                
--------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.65..472639.79 rows=1000000 width=16) (actual time=0.040..56576.187 rows=1000000 loops=1)
   ->  Seq Scan on t1  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.010..76.753 rows=1000000 loops=1)
   ->  Result Cache  (cost=4.65..4.67 rows=1 width=8) (actual time=0.056..0.056 rows=1 loops=1000000)
         Cache Key: t1.t2_id
         Hits: 884158  Misses: 115842  Evictions: 18752  Overflows: 0  Memory Usage: 10241kB
         ->  Aggregate  (cost=4.64..4.65 rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=115842)
               ->  Index Only Scan using t2_id_idx on t2  (cost=0.42..4.62 rows=11 width=0) (actual time=0.003..0.004 rows=10 loops=115842)
                     Index Cond: (id = t1.t2_id)
                     Heap Fetches: 0
 Planning Time: 0.087 ms
 Execution Time: 56621.421 ms
(11 rows)

The query without result cache

postgres=# explain analyze select * from t1, lateral(select count(*) from t2 where t1.t2_id = t2.id) s ;
                                                              QUERY PLAN                                                              
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.64..4689425.00 rows=1000000 width=16) (actual time=0.031..3260.858 rows=1000000 loops=1)
   ->  Seq Scan on t1  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.008..71.792 rows=1000000 loops=1)
   ->  Aggregate  (cost=4.64..4.65 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1000000)
         ->  Index Only Scan using t2_id_idx on t2  (cost=0.42..4.62 rows=11 width=0) (actual time=0.002..0.002 rows=10 loops=1000000)
               Index Cond: (id = t1.t2_id)
               Heap Fetches: 0
 Planning Time: 0.081 ms
 Execution Time: 3293.543 ms
(8 rows)



Samples: 119K of event 'cycles', 4000 Hz, Event count (approx.):
Overhead  Shared Object             Symbol                      
  79.20%  postgres                  [.] cache_reduce_memory    
   1.94%  [kernel]                  [k] native_write_msr_safe  
   1.63%  [kernel]                  [k] update_cfs_shares      
   1.00%  [kernel]                  [k] trigger_load_balance    
   0.97%  [kernel]                  [k] timerqueue_add          
   0.51%  [kernel]                  [k] task_tick_fair          
   0.51%  [kernel]                  [k] task_cputime            
   0.50%  [kernel]                  [k] perf_event_task_tick    
   0.50%  [kernel]                  [k] update_curr            
   0.49%  [kernel]                  [k] hrtimer_active         

Regards

Pavel

Re: plan with result cache is very slow when work_mem is not enough

From
Pavel Stehule
Date:


On Fri, 7 May 2021 at 20:24, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hi
>
> I am testing new features of Postgres 14, and now I am trying to check
> the result cache. Unfortunately, on my test data the result is not very
> good; the behaviour is very non-linear. Is it expected?
>
> [...]
>
> when work_mem is 10MB
>
> [...]
>          Hits: 884158  Misses: 115842  Evictions: 18752  Overflows: 0  Memory Usage: 10241kB
> [...]


Could it be possible to disable caching when the number of evictions crosses some limit?

Could some average cache-hit ratio against evictions be calculated, so that when this ratio gets too bad, the cache is bypassed?
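A minimal sketch of such a heuristic (hypothetical; the function name and the threshold are made up for illustration, nothing like this exists in the patch):

#include <stdbool.h>
#include <stdint.h>

/* Keep using the cache only while hits sufficiently outnumber evictions. */
static bool
cache_worth_using(uint64_t hits, uint64_t evictions)
{
    if (evictions == 0)
        return true;    /* everything still fits; keep caching */
    return ((double) hits / (double) evictions) > 10.0;
}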


 
> [...]

Regards

Pavel

Re: plan with result cache is very slow when work_mem is not enough

From
Yura Sokolov
Date:
Pavel Stehule wrote on 2021-05-07 21:45:
>> 
>> Samples: 119K of event 'cycles', 4000 Hz, Event count (approx.):
>> Overhead  Shared Object             Symbol
>> 79.20%  postgres                  [.] cache_reduce_memory
>> 1.94%  [kernel]                  [k] native_write_msr_safe
>> 1.63%  [kernel]                  [k] update_cfs_shares
>> 1.00%  [kernel]                  [k] trigger_load_balance
>> 0.97%  [kernel]                  [k] timerqueue_add
>> 0.51%  [kernel]                  [k] task_tick_fair
>> 0.51%  [kernel]                  [k] task_cputime
>> 0.50%  [kernel]                  [k] perf_event_task_tick
>> 0.50%  [kernel]                  [k] update_curr
>> 0.49%  [kernel]                  [k] hrtimer_active
>> 
>> Regards
>> 
>> Pavel

It is strange to see cache_reduce_memory itself consume a lot of CPU.
It doesn't contain CPU-hungry code.
It calls prepare_probe_slot, which does some tuple forming. Then
it calls resultcache_lookup, which may call ResultCacheHash_hash
and ResultCacheHash_equal. And finally it calls remove_cache_entry.
I suppose remove_cache_entry should consume most of the CPU time, since
it does deallocations.
And if you compile with --enable-cassert, then remove_cache_entry
iterates through the whole cache hashtable, so it easily reaches
quadratic complexity (or, more correctly, O(M*N), where M is the size
of the table and N is the eviction count).
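As a standalone sketch of that complexity trap (a simplified model, not the actual nodeResultCache.c code): if every eviction re-walks the whole table to validate the memory accounting, then N evictions against a table of M entries cost O(M*N) in total.

#include <assert.h>
#include <stddef.h>

typedef struct CacheEntry { size_t mem; } CacheEntry;

/* O(M) walk: recompute what mem_used should be. */
static size_t
recompute_mem_used(const CacheEntry *entries, size_t nentries)
{
    size_t total = 0;
    for (size_t i = 0; i < nentries; i++)
        total += entries[i].mem;
    return total;
}

static void
evict_last(CacheEntry *entries, size_t *nentries, size_t *mem_used)
{
    *mem_used -= entries[--(*nentries)].mem;
#ifdef USE_ASSERT_CHECKING
    /* Running the O(M) walk on every eviction is what turns
     * N evictions into O(M*N) total work. */
    assert(recompute_mem_used(entries, *nentries) == *mem_used);
#endif
}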

regards,
Yura Sokolov



Re: plan with result cache is very slow when work_mem is not enough

From
Tomas Vondra
Date:

On 5/7/21 9:06 PM, Yura Sokolov wrote:
> Pavel Stehule wrote on 2021-05-07 21:45:
>>>
>>> Samples: 119K of event 'cycles', 4000 Hz, Event count (approx.):
>>> Overhead  Shared Object             Symbol
>>> 79.20%  postgres                  [.] cache_reduce_memory
>>> 1.94%  [kernel]                  [k] native_write_msr_safe
>>> 1.63%  [kernel]                  [k] update_cfs_shares
>>> 1.00%  [kernel]                  [k] trigger_load_balance
>>> 0.97%  [kernel]                  [k] timerqueue_add
>>> 0.51%  [kernel]                  [k] task_tick_fair
>>> 0.51%  [kernel]                  [k] task_cputime
>>> 0.50%  [kernel]                  [k] perf_event_task_tick
>>> 0.50%  [kernel]                  [k] update_curr
>>> 0.49%  [kernel]                  [k] hrtimer_active
>>>
>>> Regards
>>>
>>> Pavel
> 
> It is strange to see cache_reduce_memory itself consume a lot of CPU.
> It doesn't contain CPU-hungry code.
> It calls prepare_probe_slot, which does some tuple forming. Then
> it calls resultcache_lookup, which may call ResultCacheHash_hash
> and ResultCacheHash_equal. And finally it calls remove_cache_entry.
> I suppose remove_cache_entry should consume most of the CPU time, since
> it does deallocations.
> And if you compile with --enable-cassert, then remove_cache_entry
> iterates through the whole cache hashtable, so it easily reaches
> quadratic complexity (or, more correctly, O(M*N), where M is the size
> of the table and N is the eviction count).
> 

Yeah. I tried reproducing the issue, but without success ...

Not sure what's wrong, but --enable-cassert is one option. Or maybe 
there's some funny behavior due to collecting timing info?

FWIW the timings on my laptop look like this:

work_mem=40MB    5065ms
work_mem=10MB    5104ms
resultcache=off 13453ms

So a very different behavior from what Pavel reported. But if I rebuild 
with casserts, I get the same massive slowdown, so I guess that's it.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: plan with result cache is very slow when work_mem is not enough

From
Pavel Stehule
Date:


On Fri, 7 May 2021 at 21:06, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> It is strange to see cache_reduce_memory itself consume a lot of CPU.
> It doesn't contain CPU-hungry code.
>
> [...]
>
> And if you compile with --enable-cassert, then remove_cache_entry
> iterates through the whole cache hashtable, so it easily reaches
> quadratic complexity (or, more correctly, O(M*N), where M is the size
> of the table and N is the eviction count).

Yes, the slowdown is related to debug assertions.

Pavel



Re: plan with result cache is very slow when work_mem is not enough

From
David Rowley
Date:
On Sat, 8 May 2021 at 07:18, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Yes, the slowdown is related to debug assertions.

With USE_ASSERT_CHECKING builds, I did add some code that verifies the
memory tracking is set correctly when evicting from the cache. This
code is pretty expensive as it loops over the entire cache to check
the memory accounting every time we evict something from the cache.
Originally, I had this code only run when some other constant was
defined, but I ended up changing it to compile that code in for all
assert enabled builds.

I considered that it might be too expensive as you can see from the
comment in [1].  I just wanted to get a few machines other than my own
to verify that the memory accounting code was working as expected.
There have been no complaints of any Assert failures yet, so maybe
it's safe to consider either removing the code entirely or just having
it run when some other, more purpose-specific constant is defined.
If we did the latter, then I'd have concerns that nothing would ever
run the code to check the memory accounting, that's why I ended up
changing it to run with USE_ASSERT_CHECKING builds.

David

[1] https://github.com/postgres/postgres/blob/master/src/backend/executor/nodeResultCache.c#L305



Re: plan with result cache is very slow when work_mem is not enough

From
Pavel Stehule
Date:


On Fri, 7 May 2021 at 21:56, David Rowley <dgrowleyml@gmail.com> wrote:
> On Sat, 8 May 2021 at 07:18, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > Yes, the slowdown is related to debug assertions.
>
> With USE_ASSERT_CHECKING builds, I did add some code that verifies the
> memory tracking is set correctly when evicting from the cache. This
> code is pretty expensive as it loops over the entire cache to check
> the memory accounting every time we evict something from the cache.
>
> [...]

I understand. I think this is too slow for generic assertions, because the overhead is about 50x.

But I understand that it may be necessary to keep this code active for some time.

Regards

Pavel



Re: plan with result cache is very slow when work_mem is not enough

From
David Rowley
Date:
On Sat, 8 May 2021 at 08:18, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> On Fri, 7 May 2021 at 21:56, David Rowley <dgrowleyml@gmail.com> wrote:
>> With USE_ASSERT_CHECKING builds, I did add some code that verifies the
>> memory tracking is set correctly when evicting from the cache. This
>> code is pretty expensive as it loops over the entire cache to check
>> the memory accounting every time we evict something from the cache.
>>
>> [...]
>
>
> I understand. I think this is too slow for generic assertions, because the overhead is about 50x.

I didn't expect it would show up quite that much.  If you scaled the
test up a bit more and increased work_mem further, then it would be
even more than 50x.

At one point when I was developing the patch, I had two high water
marks for cache memory. When we reached the upper of the two marks,
I'd reduce the memory down to the lower of two marks.  The lower of
the two marks was set to 98% of the higher mark.  In the end, I got
rid of that as I didn't really see what extra overhead there was from
just running the eviction code every time we require another byte.
However, if we did have that again, then the memory checking could
just be done when we run the eviction code. We'd then need to consume
that 2% more memory before it would run again.
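As a rough sketch of that shape (the struct and function names here are assumptions for illustration, not the actual patch):

#include <stddef.h>

typedef struct Cache
{
    size_t mem_used;   /* bytes currently accounted for */
    size_t mem_limit;  /* high-water mark, derived from work_mem */
    size_t mem_lower;  /* low-water mark, e.g. 98% of mem_limit */
} Cache;

void evict_one(Cache *cache);            /* assumed to exist elsewhere */
void validate_accounting(Cache *cache);  /* the expensive assert-only walk */

void
cache_check_mem(Cache *cache)
{
    if (cache->mem_used <= cache->mem_limit)
        return;
    /* Evict in a batch, down to the lower mark... */
    while (cache->mem_used > cache->mem_lower)
        evict_one(cache);
#ifdef USE_ASSERT_CHECKING
    /* ...so the expensive validation runs once per batch instead of
     * every time another byte is required. */
    validate_accounting(cache);
#endif
}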

My current thinking is that I don't really want to add that complexity
just for some Assert code. I'd only want to do it if there was another
valid reason to.

Another thought I have is that maybe it would be ok just to move
memory accounting debug code so it only runs once in
ExecEndResultCache.  I'm struggling to imagine that if the memory
tracking did go out of whack, that the problem would have accidentally
fixed itself by the time we got to ExecEndResultCache().  I guess even
if the accounting was counting far too much memory and we'd evicted
everything from the cache to try and get the memory usage down, we'd
still find the problem during ExecEndResultCache(), even if the cache
had become completely empty as a result.

David



Re: plan with result cache is very slow when work_mem is not enough

From
Tomas Vondra
Date:
On 5/7/21 11:04 PM, David Rowley wrote:
> On Sat, 8 May 2021 at 08:18, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>
>> On Fri, 7 May 2021 at 21:56, David Rowley <dgrowleyml@gmail.com> wrote:
>>> With USE_ASSERT_CHECKING builds, I did add some code that verifies the
>>> memory tracking is set correctly when evicting from the cache. This
>>> code is pretty expensive as it loops over the entire cache to check
>>> the memory accounting every time we evict something from the cache.
>>>
>>> [...]
>>
>>
>> I understand. I think this is too slow for generic assertions, because the overhead is about 50x.
> 
> I didn't expect it would show up quite that much.  If you scaled the
> test up a bit more and increased work_mem further, then it would be
> even more than 50x.
> 
> At one point when I was developing the patch, I had two high water
> marks for cache memory. When we reached the upper of the two marks,
> I'd reduce the memory down to the lower of two marks.  The lower of
> the two marks was set to 98% of the higher mark.  In the end, I got
> rid of that as I didn't really see what extra overhead there was from
> just running the eviction code every time we require another byte.
> However, if we did have that again, then the memory checking could
> just be done when we run the eviction code. We'd then need to consume
> that 2% more memory before it would run again.
> 
> My current thinking is that I don't really want to add that complexity
> just for some Assert code. I'd only want to do it if there was another
> valid reason to.
> 

Agreed. I think this approach to eviction (i.e. evicting more than you 
need) would be useful if the actual eviction code was expensive, and 
doing it in a "batch" would make it significantly cheaper. But I don't 
think "asserts are expensive" is a good reason for it.

> Another thought I have is that maybe it would be ok just to move
> memory accounting debug code so it only runs once in
> > ExecEndResultCache.  I'm struggling to imagine that if the memory
> tracking did go out of whack, that the problem would have accidentally
> fixed itself by the time we got to ExecEndResultCache().  I guess even
> if the accounting was counting far too much memory and we'd evicted
> everything from the cache to try and get the memory usage down, we'd
> still find the problem during ExecEndResultCache(), even if the cache
> had become completely empty as a result.
> 

I don't think postponing the debug code until much later is a great 
idea. When something goes wrong it's good to know ASAP, otherwise it's 
much more difficult to identify the issue.

Not sure we need to do something here - for regression tests this is not 
an issue, because those generally work with small data sets. And if you 
run with asserts on large amounts of data, I think this is acceptable.

I had the same dilemma with the new BRIN index opclasses, which also 
have some extensive and expensive assert checks - for the regression 
tests that's fine, and it proved very useful during development.

I have considered enabling those extra checks only on request somehow, 
but I'd bet no one would do that and I'd forget it exists pretty soon.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: plan with result cache is very slow when work_mem is not enough

From
David Rowley
Date:
On Sat, 8 May 2021 at 09:16, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> On 5/7/21 11:04 PM, David Rowley wrote:
> > Another thought I have is that maybe it would be ok just to move
> > memory accounting debug code so it only runs once in
> > ExecEndResultCache.  I'm struggling to imagine that if the memory
> > tracking did go out of whack, that the problem would have accidentally
> > fixed itself by the time we got to ExecEndResultCache().  I guess even
> > if the accounting was counting far too much memory and we'd evicted
> > everything from the cache to try and get the memory usage down, we'd
> > still find the problem during ExecEndResultCache(), even if the cache
> > had become completely empty as a result.
> >
>
> I don't think postponing the debug code until much later is a great
> idea. When something goes wrong it's good to know ASAP, otherwise it's
> much more difficult to identify the issue.

I thought about this a bit and I was about to agree, but then I changed my mind.

The biggest concern I have on this topic is that we end up with zero
validation done for the memory accounting. If we can find a cheaper
place to put the Asserts that will at least bring our attention to the
fact that some problem exists, then more investigation can ensue. I
don't personally expect that every assert failure will show us the
exact location of the bug.

Additionally, I don't really think there is a huge amount of room for
bugs creeping in here as there's not all that many places that the
'mem_used' field gets updated, so there are not many places to forget
to do it.

Another way to look at this is that, where the Asserts are today,
there are zero memory accounting checks done in all cases that don't
evict any tuples. I feel by moving the tests to ExecEndResultCache()
we'll do memory validation for all plans using a Result Cache in
Assert builds.  Yes, we might just need to do a bit more work to find
out exactly where the problem is, but some investigation would need to
happen anyway. I think if anyone changes anything which breaks the
memory accounting then they'll be aware of it quite quickly and they
can just look at what they did wrong.

David

Attachment

Re: plan with result cache is very slow when work_mem is not enough

From
Justin Pryzby
Date:
On Sat, May 08, 2021 at 02:26:44PM +1200, David Rowley wrote:
> On Sat, 8 May 2021 at 09:16, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> > On 5/7/21 11:04 PM, David Rowley wrote:
> > > Another thought I have is that maybe it would be ok just to move
> > > memory accounting debug code so it only runs once in
> > > ExecEndResultCache.  [...]
> >
> > I don't think postponing the debug code until much later is a great
> > idea. When something goes wrong it's good to know ASAP, otherwise it's
> > much more difficult to identify the issue.
> 
> I thought about this a bit and I was about to agree, but then I changed my mind.

> Yes, we might just need to do a bit more work to find
> out exactly where the problem is, but some investigation would need to
> happen anyway. I think if anyone changes anything which breaks the
> memory accounting then they'll be aware of it quite quickly and they
> can just look at what they did wrong.

You could put this into a separate function called by ExecEndResultCache().
Then anyone that breaks the memory accounting can also call the function in the
places they changed to help figure out what they broke.

-        * Validate the memory accounting code is correct in assert builds. XXX is
-        * this too expensive for USE_ASSERT_CHECKING?
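A minimal shape for that suggestion (all names below are made up for illustration, not the committed code):

#include <assert.h>
#include <stddef.h>

struct cache_entry { size_t mem; struct cache_entry *next; };
struct cache { struct cache_entry *head; size_t mem_used; };

#ifdef USE_ASSERT_CHECKING
/* Reusable validator: call it from the executor shutdown path in assert
 * builds, and call it ad hoc from any code being changed while debugging. */
static void
validate_cache_mem_accounting(const struct cache *c)
{
    size_t total = 0;
    for (const struct cache_entry *e = c->head; e != NULL; e = e->next)
        total += e->mem;
    assert(total == c->mem_used);
}
#endif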

-- 
Justin



Re: plan with result cache is very slow when work_mem is not enough

From
David Rowley
Date:
On Sat, 8 May 2021 at 14:43, Justin Pryzby <pryzby@telsasoft.com> wrote:
> You could put this into a separate function called by ExecEndResultCache().
> Then anyone that breaks the memory accounting can also call the function in the
> places they changed to help figure out what they broke.

I almost did it that way and left a call to it in remove_cache_entry()
#ifdef'd out.  However, as mentioned, I'm more concerned about the
accounting being broken and left broken than I am with making it take
a little less time to find the exact place to fix the breakage.  If
the breakage was to occur when adding a new entry to the cache then it
might not narrow it down much if we just give users an easy way to
check the memory accounting only during evictions. The only way to
highlight the problem as soon as it occurs would be to validate the
memory tracking every time the mem_used field is changed. I think that
would be overkill.

I also find it hard to imagine what other reasons we'll have in the
future to adjust 'mem_used'.  At the moment there are 4 places. Two
that add bytes and two that subtract bytes. They're all hidden inside
reusable functions that are in charge of adding and removing entries
from the cache.

David



Re: plan with result cache is very slow when work_mem is not enough

From
Yura Sokolov
Date:

On 8 May 2021 at 00:16:54 GMT+03:00, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>On 5/7/21 11:04 PM, David Rowley wrote:
>> [...]
>
>[...]
>
>I have considered enabling those extra checks only on request somehow,
>but I'd bet no one would do that and I'd forget it exists pretty soon.
>
>regards

Perhaps there is a need for a "heavy asserts" flag, or an option "--enable-cassert=heavy". Then USE_ASSERT_CHECKING could be
defined to integer 1 or 2, depending on the "heaviness" of the enabled checks.
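As a sketch, a tiered assertion macro might look like this (treating USE_ASSERT_CHECKING as an integer level is hypothetical, not how PostgreSQL defines it today; Assert is PostgreSQL's existing macro):

/* Compiled in only at the "heavy" assertion level. */
#if defined(USE_ASSERT_CHECKING) && USE_ASSERT_CHECKING >= 2
#define AssertHeavy(condition) Assert(condition)
#else
#define AssertHeavy(condition) ((void) 0)
#endif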

regards
Yura Sokolov



Re: plan with result cache is very slow when work_mem is not enough

From
David Rowley
Date:
On Sat, 8 May 2021 at 19:03, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> Perhaps there is a need for a "heavy asserts" flag, or an option
> "--enable-cassert=heavy". Then USE_ASSERT_CHECKING could be defined to
> integer 1 or 2, depending on the "heaviness" of the enabled checks.

I'd rather explore all other options before we went and did something
like that.  The reason I feel this way is; a couple of jobs ago, I
worked on a project that had quite a large test suite.  In order to
run the tests on a typical dev machine took about 16 hours. When
writing new tests, nobody really ever gave much thought to how long
tests would take to run. The time it took had already grown to
something much longer than was practical to run before a commit.
Instead, we had a test farm. We'd commit to master (yuck) and the test
farm would run the tests.  Unfortunately, the farm was not big enough
to run every test on every commit, so if it encountered a test failure
(as it commonly did) it tried to bisect and find the first breaking
commit for that test then alert the person (probably the next day).
Bisecting works out pretty badly when someone commits a build-breaking
commit.

Now that I work on PostgreSQL, I very much like the fact that I can
run make check-world -j and see very quickly that no tests fail.  I'd
rather not have something like heavy asserts as I believe it would
just become a dumping ground for badly thought through Asserts that
most likely could be done in some better way that didn't require so
much effort. (I'll put mine in that category)  It might not take long
before most committers opted to run make check-world without the heavy
asserts because it became just too slow.  When that happens, it means
the buildfarm is left to catch these errors and most likely the
feedback will take much longer to get back to the committer due to the
animals running these flags also having to churn through the slow
tests.

So, I'm -1 for "heavy asserts".

David



Re: plan with result cache is very slow when work_mem is not enough

From
Tomas Vondra
Date:

On 5/8/21 9:39 AM, David Rowley wrote:
> On Sat, 8 May 2021 at 19:03, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>> Perhaps there is a need for a "heavy asserts" flag, or an option
>> "--enable-cassert=heavy". Then USE_ASSERT_CHECKING could be defined to
>> integer 1 or 2, depending on the "heaviness" of the enabled checks.
> 
> I'd rather explore all other options before we went and did something
> like that.  [...]
> 
> So, I'm -1 for "heavy asserts".
> 

Yeah, -1 from me too.

But I think we should differentiate between asserts that (a) 
significantly affect "our" regression tests with limited amounts of 
data, and (b) asserts that are an issue for large data sets. The asserts 
discussed in this thread are in (b), but not (a).

I don't know if there's a better way to do these tests, but if there's 
not I'd not worry about it too much for now.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: plan with result cache is very slow when work_mem is not enough

From
David Rowley
Date:
On Sat, 8 May 2021 at 22:33, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> I don't know if there's a better way to do these tests, but if there's
> not I'd not worry about it too much for now.

So you're -1 on the proposed patch?

David



Re: plan with result cache is very slow when work_mem is not enough

From
Tomas Vondra
Date:
On 5/8/21 1:27 PM, David Rowley wrote:
> On Sat, 8 May 2021 at 22:33, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>> I don't know if there's a better way to do these tests, but if there's
>> not I'd not worry about it too much for now.
> 
> So you're -1 on the proposed patch?
>

Oh! I had not noticed there was a patch. No, I'm not against the patch, 
if it improves the testing (and you're right it's better to do some 
testing even without evictions).

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: plan with result cache is very slow when work_mem is not enough

From
Pavel Stehule
Date:


On Sat, 8 May 2021 at 9:39, David Rowley <dgrowleyml@gmail.com> wrote:
> On Sat, 8 May 2021 at 19:03, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> > Perhaps there is a need for a "heavy asserts" flag [...]
>
> [...]
>
> So, I'm -1 for "heavy asserts".

Personally, I have no problem with slow assertions, although it is not very practical. The main problem is the initial shock, and the feeling that something is wrong. I spent an hour working out whether it was a bug or not.

Could this situation be made identifiable?

Maybe use some specific naming for such routines, like

assert_only_check_xxxx

Then I would see the warning in perf, and I wouldn't need to do other, deeper checks.

Regards

Pavel



Re: plan with result cache is very slow when work_mem is not enough

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> On Sat, 8 May 2021 at 9:39, David Rowley <dgrowleyml@gmail.com> wrote:
>> So, I'm -1 for "heavy asserts".

> Personally, I have not problem with too slow assertions, although it is not
> too practical.

I'm very much on David's side here.  I'm currently trying to figure out
why CLOBBER_CACHE_ALWAYS tests have gotten something like 10% slower
in the past six weeks or so --- if you look at hyrax for example,
it finished "make check" in 12:52:44 in its last fully-green run
back in mid-March, while the latest report shows 14:08:18.  I think
something needs to be done about that.  It's bad enough that no
developers will run cache-clobber tests by hand; but if it takes
a week to get buildfarm feedback, you've got exactly the same
problems David describes.

            regards, tom lane



Re: plan with result cache is very slow when work_mem is not enough

From
David Rowley
Date:
On Sun, 9 May 2021 at 00:07, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
>
> On 5/8/21 1:27 PM, David Rowley wrote:
> > On Sat, 8 May 2021 at 22:33, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> >> I don't know if there's a better way to do these tests, but if there's
> >> not I'd not worry about it too much for now.
> >
> > So you're -1 on the proposed patch?
> >
>
> Oh! I had not noticed there was a patch. No, I'm not against the patch,
> if it improves the testing (and you're right it's better to do some
> testing even without evictions).

There seem to be plenty of other ideas floating around, but since
nobody seems to be objecting to reducing the Assert overheads, I've
pushed the proposed patch.

The more I thought about this, the more I think I should have done it
this way in the first place.

David



Re: plan with result cache is very slow when work_mem is not enough

From
David Rowley
Date:
On Sun, 9 May 2021 at 03:29, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Personally, I have no problem with slow assertions, although it is not
> very practical. The main problem is the initial shock, and the feeling
> that something is wrong. I spent an hour working out whether it was a
> bug or not.

Thanks for spending the time figuring out where the slowness came from.

> Could this situation be made identifiable?
>
> Maybe use some specific naming for such routines, like
>
> assert_only_check_xxxx
>
> Then I would see the warning in perf, and I wouldn't need to do other, deeper checks.

I don't think we need to go around leaving clues for people who run
perf on cassert builds. I think anyone doing that should just never
expect any meaningful results.

David



Re: plan with result cache is very slow when work_mem is not enough

From
Pavel Stehule
Date:


On Sun, 9 May 2021 at 3:01, David Rowley <dgrowleyml@gmail.com> wrote:
> On Sun, 9 May 2021 at 03:29, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > Personally, I have no problem with slow assertions [...]
>
> [...]
>
> I don't think we need to go around leaving clues for people who run
> perf on cassert builds. I think anyone doing that should just never
> expect any meaningful results.

ok

Pavel


Re: plan with result cache is very slow when work_mem is not enough

From
Yura Sokolov
Date:
David Rowley wrote on 2021-05-09 04:01:
> On Sun, 9 May 2021 at 03:29, Pavel Stehule <pavel.stehule@gmail.com> 
> wrote:
>> Personally, I have no problem with slow assertions, although it is
>> not very practical. The main problem is the initial shock, and the
>> feeling that something is wrong. I spent an hour working out whether
>> it was a bug or not.
> 
> Thanks for spending the time figuring out where the slowness came from.
> 
>> Could this situation be made identifiable?
>> 
>> Maybe use some specific naming for such routines, like
>> 
>> assert_only_check_xxxx
>> 
>> Then I would see the warning in perf, and I wouldn't need to do other,
>> deeper checks.
> 
> I don't think we need to go around leaving clues for people who run
> perf on cassert builds. I think anyone doing that should just never
> expect any meaningful results.

Occasionally there is a need to run cassert builds in production to
catch an issue. It is usually ok if a cassert build is O(1) slower than
an optimized build (i.e. it is slower by some constant factor C). But
if a cassert build is quadratically slower, it will be unusable.

regards,
Yura



Re: plan with result cache is very slow when work_mem is not enough

From
Andres Freund
Date:
Hi,

On 2021-05-09 15:57:22 +0300, Yura Sokolov wrote:
> Occasionally there is a need to run cassert builds in production to
> catch an issue. It is usually ok if a cassert build is O(1) slower than
> an optimized build (i.e. it is slower by some constant factor C). But
> if a cassert build is quadratically slower, it will be unusable.

The memory context assertion overhead is more than O(1) expensive. I
think there are plenty of other cases like it. We removed some (e.g. it
used to be that we scanned O(#shared_buffers) entries in the local pin
table at the end of the transaction). I don't think we want to limit
ourselves to O(1) checks. That's not to say we should have O(n^2)
checks or such, unless we have confidence that n will rarely be big.

Greetings,

Andres Freund