Thread: strange slow query - lost lot of time somewhere

strange slow query - lost lot of time somewhere

From
Pavel Stehule
Date:
Hi

I found a query that is significantly slower with more memory



plan 1 - fast https://explain.depesz.com/s/XM1f

plan 2 - slow https://explain.depesz.com/s/2rBw

Strange - the time of the last row is roughly the same, but the execution time is 10x worse

It looks like slow cleanup of the execution environment


Re: strange slow query - lost lot of time somewhere

From
Matthias van de Meent
Date:
On Mon, 2 May 2022 at 11:00, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> Hi
>
> I found a query that is significantly slower with more memory

Which PostgreSQL version did you use? Did you enable assert checking?
Do you have an example database setup to work with?

> plan 2
>  QUERY PLAN
> ----------------
>  Nested Loop Anti Join  (cost=46.53..2914.58 rows=1 width=16) (actual time=18.306..23.065 rows=32 loops=1)
> ...
>  Execution Time: 451.161 ms

Truly strange; especially the 418ms difference between execution time
and the root node's "actual time". I haven't really seen such
differences happen, except when concurrent locks were held at the
table / index level.

> plan 1 - fast https://explain.depesz.com/s/XM1f
>
> plan 2 - slow https://explain.depesz.com/s/2rBw
>
> Strange - the time of the last row is roughly the same, but the execution time is 10x worse

The only difference between the two plans that I see is that plan 1
doesn't use memoization, whereas plan 2 does use 2 memoize plan nodes
(one of 66 misses; one of 342 misses). The only "expensive" operation
that I see in memoize nodes is the check for memory size in
assert-enabled builds; and that should have very low overhead
considering that the size of the memoized data is only 8kB and 25kB
respectively.
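
(For reference, those miss counts and cache sizes come from the per-node statistics that EXPLAIN ANALYZE prints for Memoize nodes in PG 14. A self-contained scratch example - throwaway tables made up here, nothing from the plans in this thread - that shows where those numbers appear:

    -- force a parameterized nested loop so the planner can consider Memoize
    CREATE TEMP TABLE outer_t AS
        SELECT g % 100 AS k FROM generate_series(1, 10000) g;
    CREATE TEMP TABLE inner_t AS
        SELECT g AS k, g AS payload FROM generate_series(1, 100) g;
    CREATE INDEX ON inner_t (k);
    ANALYZE outer_t, inner_t;
    SET enable_hashjoin = off;
    SET enable_mergejoin = off;
    EXPLAIN (ANALYZE, COSTS OFF)
    SELECT * FROM outer_t o JOIN inner_t i ON i.k = o.k;

If the planner picks Memoize, its node in the output carries a line of the form
"Hits: n  Misses: n  Evictions: n  Overflows: n  Memory Usage: nkB", which is
where the 8kB / 25kB figures above were read from.)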



Re: strange slow query - lost lot of time somewhere

From
Pavel Stehule
Date:


On Mon, 2 May 2022 at 15:28, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
On Mon, 2 May 2022 at 11:00, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> Hi
>
> I found a query that is significantly slower with more memory

Which PostgreSQL version did you use? Did you enable assert checking?
Do you have an example database setup to work with?

> plan 2
>  QUERY PLAN
> ----------------
>  Nested Loop Anti Join  (cost=46.53..2914.58 rows=1 width=16) (actual time=18.306..23.065 rows=32 loops=1)
> ...
>  Execution Time: 451.161 ms

Truly strange; especially the 418ms difference between execution time
and the root node's "actual time". I haven't really seen such
differences happen, except when concurrent locks were held at the
table / index level.

> plan 1 - fast https://explain.depesz.com/s/XM1f
>
> plan 2 - slow https://explain.depesz.com/s/2rBw
>
> Strange - the time of the last row is roughly the same, but the execution time is 10x worse

The only difference between the two plans that I see is that plan 1
doesn't use memoization, whereas plan 2 does use 2 memoize plan nodes
(one of 66 misses; one of 342 misses). The only "expensive" operation
that I see in memoize nodes is the check for memory size in
assert-enabled builds; and that should have very low overhead
considering that the size of the memoized data is only 8kB and 25kB
respectively.

This is PostgreSQL 14 used in a production environment

 (2022-05-02 15:37:29) prd_aukro=# show debug_assertions ;
┌──────────────────┐
│ debug_assertions │
├──────────────────┤
│ off              │
└──────────────────┘
(1 row)

Time: 0,295 ms
(2022-05-02 15:37:35) prd_aukro=# select version();
┌────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                version                                                 │
├────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ PostgreSQL 14.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit │
└────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(1 row)
Time: 0,629 ms

Only shared_buffers was changed to 32GB and work_mem to 70MB. Unfortunately, it is a production environment with customer data, so I cannot experiment too much

This is perf of slow

  25,94%  postmaster  [kernel.kallsyms]  [k] clear_page_erms
  11,06%  postmaster  [kernel.kallsyms]  [k] page_fault
   5,51%  postmaster  [kernel.kallsyms]  [k] prepare_exit_to_usermode
   5,18%  postmaster  [kernel.kallsyms]  [k] __list_del_entry_valid
   5,15%  postmaster  libc-2.28.so       [.] __memset_avx2_erms
   3,99%  postmaster  [kernel.kallsyms]  [k] unmap_page_range
   3,07%  postmaster  postgres           [.] hash_search_with_hash_value
   2,73%  postmaster  [kernel.kallsyms]  [k] cgroup_throttle_swaprate
   2,49%  postmaster  postgres           [.] heap_page_prune_opt
   1,92%  postmaster  [kernel.kallsyms]  [k] try_charge
   1,85%  postmaster  [kernel.kallsyms]  [k] swapgs_restore_regs_and_return_to_usermode
   1,82%  postmaster  [kernel.kallsyms]  [k] error_entry
   1,73%  postmaster  postgres           [.] _bt_checkkeys
   1,48%  postmaster  [kernel.kallsyms]  [k] free_pcppages_bulk
   1,35%  postmaster  [kernel.kallsyms]  [k] get_page_from_freelist
   1,20%  postmaster  [kernel.kallsyms]  [k] __pagevec_lru_add_fn
   1,08%  postmaster  [kernel.kallsyms]  [k] percpu_ref_put_many.constprop.84
   1,08%  postmaster  postgres           [.] 0x00000000003c1be6
   1,06%  postmaster  [kernel.kallsyms]  [k] get_mem_cgroup_from_mm.part.49
   0,86%  postmaster  [kernel.kallsyms]  [k] __handle_mm_fault
   0,79%  postmaster  [kernel.kallsyms]  [k] mem_cgroup_charge
   0,70%  postmaster  [kernel.kallsyms]  [k] release_pages
   0,61%  postmaster  postgres           [.] _bt_checkpage
   0,61%  postmaster  [kernel.kallsyms]  [k] free_pages_and_swap_cache
   0,60%  postmaster  [kernel.kallsyms]  [k] handle_mm_fault
   0,57%  postmaster  postgres           [.] tbm_iterate
   0,56%  postmaster  [kernel.kallsyms]  [k] __count_memcg_events.part.70
   0,55%  postmaster  [kernel.kallsyms]  [k] __mod_memcg_lruvec_state
   0,52%  postmaster  postgres           [.] 0x000000000015f6e5
   0,50%  postmaster  [kernel.kallsyms]  [k] prep_new_page
   0,49%  postmaster  [kernel.kallsyms]  [k] __do_page_fault
   0,46%  postmaster  [kernel.kallsyms]  [k] _raw_spin_lock
   0,44%  postmaster  [kernel.kallsyms]  [k] do_anonymous_page

This is fast

  21,13%  postmaster  postgres           [.] hash_search_with_hash_value
  15,33%  postmaster  postgres           [.] heap_page_prune_opt
  10,22%  postmaster  libc-2.28.so       [.] __memset_avx2_erms
  10,00%  postmaster  postgres           [.] _bt_checkkeys
   6,23%  postmaster  postgres           [.] 0x00000000003c1be6
   4,94%  postmaster  postgres           [.] _bt_checkpage
   2,85%  postmaster  postgres           [.] tbm_iterate
   2,31%  postmaster  postgres           [.] nocache_index_getattr
   2,13%  postmaster  postgres           [.] pg_qsort
   1,58%  postmaster  postgres           [.] heap_hot_search_buffer
   1,58%  postmaster  postgres           [.] FunctionCall2Coll
   1,58%  postmaster  postgres           [.] 0x000000000015f6e5
   1,17%  postmaster  postgres           [.] LWLockRelease
   0,85%  postmaster  libc-2.28.so       [.] __memcmp_avx2_movbe
   0,64%  postmaster  postgres           [.] 0x00000000003e4233
   0,54%  postmaster  postgres           [.] hash_bytes
   0,53%  postmaster  postgres           [.] 0x0000000000306fbb
   0,53%  postmaster  postgres           [.] LWLockAcquire
   0,42%  postmaster  postgres           [.] 0x00000000003c1c6f
   0,42%  postmaster  postgres           [.] _bt_compare




Regards

Pavel

Re: strange slow query - lost lot of time somewhere

From
Pavel Stehule
Date:


On Mon, 2 May 2022 at 16:02, Pavel Stehule <pavel.stehule@gmail.com> wrote:


On Mon, 2 May 2022 at 15:28, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
On Mon, 2 May 2022 at 11:00, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> Hi
>
> I found a query that is significantly slower with more memory

Which PostgreSQL version did you use? Did you enable assert checking?
Do you have an example database setup to work with?

> plan 2
>  QUERY PLAN
> ----------------
>  Nested Loop Anti Join  (cost=46.53..2914.58 rows=1 width=16) (actual time=18.306..23.065 rows=32 loops=1)
> ...
>  Execution Time: 451.161 ms

Truly strange; especially the 418ms difference between execution time
and the root node's "actual time". I haven't really seen such
differences happen, except when concurrent locks were held at the
table / index level.

> plan 1 - fast https://explain.depesz.com/s/XM1f
>
> plan 2 - slow https://explain.depesz.com/s/2rBw
>
> Strange - the time of the last row is roughly the same, but the execution time is 10x worse

The only difference between the two plans that I see is that plan 1
doesn't use memoization, whereas plan 2 does use 2 memoize plan nodes
(one of 66 misses; one of 342 misses). The only "expensive" operation
that I see in memoize nodes is the check for memory size in
assert-enabled builds; and that should have very low overhead
considering that the size of the memoized data is only 8kB and 25kB
respectively.

This is PostgreSQL 14 used in a production environment

 (2022-05-02 15:37:29) prd_aukro=# show debug_assertions ;
┌──────────────────┐
│ debug_assertions │
├──────────────────┤
│ off              │
└──────────────────┘
(1 row)

Time: 0,295 ms
(2022-05-02 15:37:35) prd_aukro=# select version();
┌────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                version                                                 │
├────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ PostgreSQL 14.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit │
└────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(1 row)
Time: 0,629 ms

Only shared_buffers was changed to 32GB and work_mem to 70MB. Unfortunately, it is a production environment with customer data, so I cannot experiment too much

This is perf of slow

  25,94%  postmaster  [kernel.kallsyms]  [k] clear_page_erms
  11,06%  postmaster  [kernel.kallsyms]  [k] page_fault
   5,51%  postmaster  [kernel.kallsyms]  [k] prepare_exit_to_usermode
   5,18%  postmaster  [kernel.kallsyms]  [k] __list_del_entry_valid
   5,15%  postmaster  libc-2.28.so       [.] __memset_avx2_erms
   3,99%  postmaster  [kernel.kallsyms]  [k] unmap_page_range
   3,07%  postmaster  postgres           [.] hash_search_with_hash_value
   2,73%  postmaster  [kernel.kallsyms]  [k] cgroup_throttle_swaprate
   2,49%  postmaster  postgres           [.] heap_page_prune_opt
   1,92%  postmaster  [kernel.kallsyms]  [k] try_charge
   1,85%  postmaster  [kernel.kallsyms]  [k] swapgs_restore_regs_and_return_to_usermode
   1,82%  postmaster  [kernel.kallsyms]  [k] error_entry
   1,73%  postmaster  postgres           [.] _bt_checkkeys
   1,48%  postmaster  [kernel.kallsyms]  [k] free_pcppages_bulk
   1,35%  postmaster  [kernel.kallsyms]  [k] get_page_from_freelist
   1,20%  postmaster  [kernel.kallsyms]  [k] __pagevec_lru_add_fn
   1,08%  postmaster  [kernel.kallsyms]  [k] percpu_ref_put_many.constprop.84
   1,08%  postmaster  postgres           [.] 0x00000000003c1be6
   1,06%  postmaster  [kernel.kallsyms]  [k] get_mem_cgroup_from_mm.part.49
   0,86%  postmaster  [kernel.kallsyms]  [k] __handle_mm_fault
   0,79%  postmaster  [kernel.kallsyms]  [k] mem_cgroup_charge
   0,70%  postmaster  [kernel.kallsyms]  [k] release_pages
   0,61%  postmaster  postgres           [.] _bt_checkpage
   0,61%  postmaster  [kernel.kallsyms]  [k] free_pages_and_swap_cache
   0,60%  postmaster  [kernel.kallsyms]  [k] handle_mm_fault
   0,57%  postmaster  postgres           [.] tbm_iterate
   0,56%  postmaster  [kernel.kallsyms]  [k] __count_memcg_events.part.70
   0,55%  postmaster  [kernel.kallsyms]  [k] __mod_memcg_lruvec_state
   0,52%  postmaster  postgres           [.] 0x000000000015f6e5
   0,50%  postmaster  [kernel.kallsyms]  [k] prep_new_page
   0,49%  postmaster  [kernel.kallsyms]  [k] __do_page_fault
   0,46%  postmaster  [kernel.kallsyms]  [k] _raw_spin_lock
   0,44%  postmaster  [kernel.kallsyms]  [k] do_anonymous_page

This is fast

  21,13%  postmaster  postgres           [.] hash_search_with_hash_value
  15,33%  postmaster  postgres           [.] heap_page_prune_opt
  10,22%  postmaster  libc-2.28.so       [.] __memset_avx2_erms
  10,00%  postmaster  postgres           [.] _bt_checkkeys
   6,23%  postmaster  postgres           [.] 0x00000000003c1be6
   4,94%  postmaster  postgres           [.] _bt_checkpage
   2,85%  postmaster  postgres           [.] tbm_iterate
   2,31%  postmaster  postgres           [.] nocache_index_getattr
   2,13%  postmaster  postgres           [.] pg_qsort
   1,58%  postmaster  postgres           [.] heap_hot_search_buffer
   1,58%  postmaster  postgres           [.] FunctionCall2Coll
   1,58%  postmaster  postgres           [.] 0x000000000015f6e5
   1,17%  postmaster  postgres           [.] LWLockRelease
   0,85%  postmaster  libc-2.28.so       [.] __memcmp_avx2_movbe
   0,64%  postmaster  postgres           [.] 0x00000000003e4233
   0,54%  postmaster  postgres           [.] hash_bytes
   0,53%  postmaster  postgres           [.] 0x0000000000306fbb
   0,53%  postmaster  postgres           [.] LWLockAcquire
   0,42%  postmaster  postgres           [.] 0x00000000003c1c6f
   0,42%  postmaster  postgres           [.] _bt_compare


It looks like memoization allocates a lot of memory - maybe there is some temporary memory leak

 Performance counter stats for process id '4004464':

             84,26 msec task-clock                #    0,012 CPUs utilized          
                 3      context-switches          #    0,036 K/sec                  
                 0      cpu-migrations            #    0,000 K/sec                  
                19      page-faults               #    0,225 K/sec                  
                 0      cycles                    #    0,000 GHz                    
       106 873 995      instructions                                                
        20 225 431      branches                  #  240,026 M/sec                  
           348 834      branch-misses             #    1,72% of all branches        

       7,106142051 seconds time elapsed

 Performance counter stats for process id '4004464':

          1 116,97 msec task-clock                #    0,214 CPUs utilized          
                 4      context-switches          #    0,004 K/sec                  
                 0      cpu-migrations            #    0,000 K/sec                  
            99 349      page-faults               #    0,089 M/sec                  
                 0      cycles                    #    0,000 GHz                    
       478 842 411      instructions                                                
        89 495 015      branches                  #   80,123 M/sec                  
         1 014 763      branch-misses             #    1,13% of all branches        

       5,221116331 seconds time elapsed

Regards

Pavel
 



Regards

Pavel

Re: strange slow query - lost lot of time somewhere

From
Matthias van de Meent
Date:
On Mon, 2 May 2022 at 16:09, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>
>
> On Mon, 2 May 2022 at 16:02, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Only shared_buffers was changed to 32GB and work_mem to 70MB. Unfortunately, it is a production environment with customer data, so I cannot experiment too much
>>
>> This is perf of slow
>>
>>   25,94%  postmaster  [kernel.kallsyms]  [k] clear_page_erms
>>   11,06%  postmaster  [kernel.kallsyms]  [k] page_fault
>>    5,51%  postmaster  [kernel.kallsyms]  [k] prepare_exit_to_usermode
>>    5,18%  postmaster  [kernel.kallsyms]  [k] __list_del_entry_valid
>>    5,15%  postmaster  libc-2.28.so       [.] __memset_avx2_erms
>>    3,99%  postmaster  [kernel.kallsyms]  [k] unmap_page_range
>>    3,07%  postmaster  postgres           [.] hash_search_with_hash_value
>>    2,73%  postmaster  [kernel.kallsyms]  [k] cgroup_throttle_swaprate
>>    2,49%  postmaster  postgres           [.] heap_page_prune_opt
>>    1,92%  postmaster  [kernel.kallsyms]  [k] try_charge
>>    1,85%  postmaster  [kernel.kallsyms]  [k] swapgs_restore_regs_and_return_to_usermode
>>    1,82%  postmaster  [kernel.kallsyms]  [k] error_entry
>>    1,73%  postmaster  postgres           [.] _bt_checkkeys
>>    1,48%  postmaster  [kernel.kallsyms]  [k] free_pcppages_bulk
>>    1,35%  postmaster  [kernel.kallsyms]  [k] get_page_from_freelist
>>    1,20%  postmaster  [kernel.kallsyms]  [k] __pagevec_lru_add_fn
>>    1,08%  postmaster  [kernel.kallsyms]  [k] percpu_ref_put_many.constprop.84
>>    1,08%  postmaster  postgres           [.] 0x00000000003c1be6
>>    1,06%  postmaster  [kernel.kallsyms]  [k] get_mem_cgroup_from_mm.part.49
>>    0,86%  postmaster  [kernel.kallsyms]  [k] __handle_mm_fault
>>    0,79%  postmaster  [kernel.kallsyms]  [k] mem_cgroup_charge
>>    0,70%  postmaster  [kernel.kallsyms]  [k] release_pages
>>    0,61%  postmaster  postgres           [.] _bt_checkpage
>>    0,61%  postmaster  [kernel.kallsyms]  [k] free_pages_and_swap_cache
>>    0,60%  postmaster  [kernel.kallsyms]  [k] handle_mm_fault
>>    0,57%  postmaster  postgres           [.] tbm_iterate
>>    0,56%  postmaster  [kernel.kallsyms]  [k] __count_memcg_events.part.70
>>    0,55%  postmaster  [kernel.kallsyms]  [k] __mod_memcg_lruvec_state
>>    0,52%  postmaster  postgres           [.] 0x000000000015f6e5
>>    0,50%  postmaster  [kernel.kallsyms]  [k] prep_new_page
>>    0,49%  postmaster  [kernel.kallsyms]  [k] __do_page_fault
>>    0,46%  postmaster  [kernel.kallsyms]  [k] _raw_spin_lock
>>    0,44%  postmaster  [kernel.kallsyms]  [k] do_anonymous_page
>>
>> This is fast
>>
>>   21,13%  postmaster  postgres           [.] hash_search_with_hash_value
>>   15,33%  postmaster  postgres           [.] heap_page_prune_opt
>>   10,22%  postmaster  libc-2.28.so       [.] __memset_avx2_erms
>>   10,00%  postmaster  postgres           [.] _bt_checkkeys
>>    6,23%  postmaster  postgres           [.] 0x00000000003c1be6
>>    4,94%  postmaster  postgres           [.] _bt_checkpage
>>    2,85%  postmaster  postgres           [.] tbm_iterate
>>    2,31%  postmaster  postgres           [.] nocache_index_getattr
>>    2,13%  postmaster  postgres           [.] pg_qsort
>>    1,58%  postmaster  postgres           [.] heap_hot_search_buffer
>>    1,58%  postmaster  postgres           [.] FunctionCall2Coll
>>    1,58%  postmaster  postgres           [.] 0x000000000015f6e5
>>    1,17%  postmaster  postgres           [.] LWLockRelease
>>    0,85%  postmaster  libc-2.28.so       [.] __memcmp_avx2_movbe
>>    0,64%  postmaster  postgres           [.] 0x00000000003e4233
>>    0,54%  postmaster  postgres           [.] hash_bytes
>>    0,53%  postmaster  postgres           [.] 0x0000000000306fbb
>>    0,53%  postmaster  postgres           [.] LWLockAcquire
>>    0,42%  postmaster  postgres           [.] 0x00000000003c1c6f
>>    0,42%  postmaster  postgres           [.] _bt_compare
>>
>
> It looks like memoization allocates a lot of memory - maybe there is some temporary memory leak

Memoization doesn't leak memory any more than hash tables do; so I
doubt that that is the issue.

>  Performance counter stats for process id '4004464':
>
>              84,26 msec task-clock                #    0,012 CPUs utilized
>                  3      context-switches          #    0,036 K/sec
>                  0      cpu-migrations            #    0,000 K/sec
>                 19      page-faults               #    0,225 K/sec
>                  0      cycles                    #    0,000 GHz
>        106 873 995      instructions
>         20 225 431      branches                  #  240,026 M/sec
>            348 834      branch-misses             #    1,72% of all branches
>
>        7,106142051 seconds time elapsed
>

Assuming the above was for the fast query

>  Performance counter stats for process id '4004464':
>
>           1 116,97 msec task-clock                #    0,214 CPUs utilized
>                  4      context-switches          #    0,004 K/sec
>                  0      cpu-migrations            #    0,000 K/sec
>             99 349      page-faults               #    0,089 M/sec
>                  0      cycles                    #    0,000 GHz
>        478 842 411      instructions
>         89 495 015      branches                  #   80,123 M/sec
>          1 014 763      branch-misses             #    1,13% of all branches
>
>        5,221116331 seconds time elapsed

... and this for the slow one:

It seems like this system is actively swapping memory; which has a
negative impact on your system. This seems to be indicated by the high
amount of page faults and the high amount of time spent in the kernel
(as per the perf report one mail earlier). Maybe too much (work)memory
was assigned and the machine you're running on doesn't have that
amount of memory left?
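
(If memory pressure is the suspicion, the server-side knobs worth re-checking are at least the following - note that, as far as I know, the Memoize cache limit in PG 14 is derived from work_mem scaled by hash_mem_multiplier, so both settings matter here:

    SHOW shared_buffers;
    SHOW work_mem;
    SHOW hash_mem_multiplier;  -- Memoize may use up to work_mem * hash_mem_multiplier per node
    SHOW max_connections;      -- many backends each using work_mem adds up

plus whatever the OS reports about swap activity.)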

Either way, seeing that so much time is spent in the kernel I don't
think that PostgreSQL is the main/only source of the slow query here,
so I don't think pgsql-hackers is the right place to continue with
this conversation.

Regards,

Matthias


PS. Maybe next time start off in
https://www.postgresql.org/list/pgsql-performance/ if you have
performance issues with unknown origin.
The wiki also has some nice tips to debug performance issues:
https://wiki.postgresql.org/wiki/Slow_Query_Questions



Re: strange slow query - lost lot of time somewhere

From
Pavel Stehule
Date:


On Mon, 2 May 2022 at 16:44, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
On Mon, 2 May 2022 at 16:09, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>
>
> On Mon, 2 May 2022 at 16:02, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Only shared_buffers was changed to 32GB and work_mem to 70MB. Unfortunately, it is a production environment with customer data, so I cannot experiment too much
>>
>> This is perf of slow
>>
>>   25,94%  postmaster  [kernel.kallsyms]  [k] clear_page_erms
>>   11,06%  postmaster  [kernel.kallsyms]  [k] page_fault
>>    5,51%  postmaster  [kernel.kallsyms]  [k] prepare_exit_to_usermode
>>    5,18%  postmaster  [kernel.kallsyms]  [k] __list_del_entry_valid
>>    5,15%  postmaster  libc-2.28.so       [.] __memset_avx2_erms
>>    3,99%  postmaster  [kernel.kallsyms]  [k] unmap_page_range
>>    3,07%  postmaster  postgres           [.] hash_search_with_hash_value
>>    2,73%  postmaster  [kernel.kallsyms]  [k] cgroup_throttle_swaprate
>>    2,49%  postmaster  postgres           [.] heap_page_prune_opt
>>    1,92%  postmaster  [kernel.kallsyms]  [k] try_charge
>>    1,85%  postmaster  [kernel.kallsyms]  [k] swapgs_restore_regs_and_return_to_usermode
>>    1,82%  postmaster  [kernel.kallsyms]  [k] error_entry
>>    1,73%  postmaster  postgres           [.] _bt_checkkeys
>>    1,48%  postmaster  [kernel.kallsyms]  [k] free_pcppages_bulk
>>    1,35%  postmaster  [kernel.kallsyms]  [k] get_page_from_freelist
>>    1,20%  postmaster  [kernel.kallsyms]  [k] __pagevec_lru_add_fn
>>    1,08%  postmaster  [kernel.kallsyms]  [k] percpu_ref_put_many.constprop.84
>>    1,08%  postmaster  postgres           [.] 0x00000000003c1be6
>>    1,06%  postmaster  [kernel.kallsyms]  [k] get_mem_cgroup_from_mm.part.49
>>    0,86%  postmaster  [kernel.kallsyms]  [k] __handle_mm_fault
>>    0,79%  postmaster  [kernel.kallsyms]  [k] mem_cgroup_charge
>>    0,70%  postmaster  [kernel.kallsyms]  [k] release_pages
>>    0,61%  postmaster  postgres           [.] _bt_checkpage
>>    0,61%  postmaster  [kernel.kallsyms]  [k] free_pages_and_swap_cache
>>    0,60%  postmaster  [kernel.kallsyms]  [k] handle_mm_fault
>>    0,57%  postmaster  postgres           [.] tbm_iterate
>>    0,56%  postmaster  [kernel.kallsyms]  [k] __count_memcg_events.part.70
>>    0,55%  postmaster  [kernel.kallsyms]  [k] __mod_memcg_lruvec_state
>>    0,52%  postmaster  postgres           [.] 0x000000000015f6e5
>>    0,50%  postmaster  [kernel.kallsyms]  [k] prep_new_page
>>    0,49%  postmaster  [kernel.kallsyms]  [k] __do_page_fault
>>    0,46%  postmaster  [kernel.kallsyms]  [k] _raw_spin_lock
>>    0,44%  postmaster  [kernel.kallsyms]  [k] do_anonymous_page
>>
>> This is fast
>>
>>   21,13%  postmaster  postgres           [.] hash_search_with_hash_value
>>   15,33%  postmaster  postgres           [.] heap_page_prune_opt
>>   10,22%  postmaster  libc-2.28.so       [.] __memset_avx2_erms
>>   10,00%  postmaster  postgres           [.] _bt_checkkeys
>>    6,23%  postmaster  postgres           [.] 0x00000000003c1be6
>>    4,94%  postmaster  postgres           [.] _bt_checkpage
>>    2,85%  postmaster  postgres           [.] tbm_iterate
>>    2,31%  postmaster  postgres           [.] nocache_index_getattr
>>    2,13%  postmaster  postgres           [.] pg_qsort
>>    1,58%  postmaster  postgres           [.] heap_hot_search_buffer
>>    1,58%  postmaster  postgres           [.] FunctionCall2Coll
>>    1,58%  postmaster  postgres           [.] 0x000000000015f6e5
>>    1,17%  postmaster  postgres           [.] LWLockRelease
>>    0,85%  postmaster  libc-2.28.so       [.] __memcmp_avx2_movbe
>>    0,64%  postmaster  postgres           [.] 0x00000000003e4233
>>    0,54%  postmaster  postgres           [.] hash_bytes
>>    0,53%  postmaster  postgres           [.] 0x0000000000306fbb
>>    0,53%  postmaster  postgres           [.] LWLockAcquire
>>    0,42%  postmaster  postgres           [.] 0x00000000003c1c6f
>>    0,42%  postmaster  postgres           [.] _bt_compare
>>
>
> It looks like memoization allocates a lot of memory - maybe there is some temporary memory leak

Memoization doesn't leak memory any more than hash tables do; so I
doubt that that is the issue.

>  Performance counter stats for process id '4004464':
>
>              84,26 msec task-clock                #    0,012 CPUs utilized
>                  3      context-switches          #    0,036 K/sec
>                  0      cpu-migrations            #    0,000 K/sec
>                 19      page-faults               #    0,225 K/sec
>                  0      cycles                    #    0,000 GHz
>        106 873 995      instructions
>         20 225 431      branches                  #  240,026 M/sec
>            348 834      branch-misses             #    1,72% of all branches
>
>        7,106142051 seconds time elapsed
>

Assuming the above was for the fast query

>  Performance counter stats for process id '4004464':
>
>           1 116,97 msec task-clock                #    0,214 CPUs utilized
>                  4      context-switches          #    0,004 K/sec
>                  0      cpu-migrations            #    0,000 K/sec
>             99 349      page-faults               #    0,089 M/sec
>                  0      cycles                    #    0,000 GHz
>        478 842 411      instructions
>         89 495 015      branches                  #   80,123 M/sec
>          1 014 763      branch-misses             #    1,13% of all branches
>
>        5,221116331 seconds time elapsed

... and this for the slow one:

It seems like this system is actively swapping memory; which has a
negative impact on your system. This seems to be indicated by the high
amount of page faults and the high amount of time spent in the kernel
(as per the perf report one mail earlier). Maybe too much (work)memory
was assigned and the machine you're running on doesn't have that
amount of memory left?

This computer has 354GB RAM, and there is 63GB RAM free (unused memory)



Either way, seeing that so much time is spent in the kernel I don't
think that PostgreSQL is the main/only source of the slow query here,
so I don't think pgsql-hackers is the right place to continue with
this conversation.

I can see this issue only when Memoize is enabled, so it looks like a Postgres issue. 400MB of work mem is not too much.

Regards,

Matthias


PS. Maybe next time start off in
https://www.postgresql.org/list/pgsql-performance/ if you have
performance issues with unknown origin.
The wiki also has some nice tips to debug performance issues:
https://wiki.postgresql.org/wiki/Slow_Query_Questions

Re: strange slow query - lost lot of time somewhere

From
David Rowley
Date:
On Mon, 2 May 2022 at 21:00, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I found a query that is significantly slower with more memory

Can you clarify what you mean here?  More memory was installed on the
machine? or work_mem was increased? or?

> plan 1 - fast https://explain.depesz.com/s/XM1f
>
> plan 2 - slow https://explain.depesz.com/s/2rBw

If it was work_mem you increased, it seems strange that the plan would
switch over to using a Nested Loop / Memoize plan.  Only 91 rows are
estimated on the outer side of the join. It's hard to imagine that
work_mem was so low that the Memoize costing code thought there would
ever be cache evictions.

> Strange - the time of the last row is roughly the same, but the execution time is 10x worse
>
> It looks like slow cleanup of the execution environment

Can you also show EXPLAIN for the Memoize plan without ANALYZE?

Does the slowness present every time that plan is executed?

Can you show the EXPLAIN ANALYZE of the nested loop plan with
enable_memoize = off?  You may need to disable hash and merge join.
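
Something along these lines, run in one session, should cover both requests
(the <query> placeholder stands for the query from your original report):

    -- 1) plain EXPLAIN of the Memoize plan, without executing it
    EXPLAIN <query>;

    -- 2) the nested loop plan with Memoize disabled; hash and merge joins may
    --    also need to be disabled to keep the planner on the nested loop
    SET enable_memoize = off;
    SET enable_hashjoin = off;
    SET enable_mergejoin = off;
    EXPLAIN (ANALYZE, BUFFERS) <query>;
    RESET enable_memoize;
    RESET enable_hashjoin;
    RESET enable_mergejoin;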

David



Re: strange slow query - lost lot of time somewhere

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> On Mon, 2 May 2022 at 21:00, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> I found a query that is significantly slower with more memory

> If it was work_mem you increased, it seems strange that the plan would
> switch over to using a Nested Loop / Memoize plan.

Yeah, there's something unexplained there.

I think that the most probable explanation for the symptoms is that
cost_memoize_rescan is computing some insane value for est_entries,
causing ExecInitMemoize to allocate-and-zero a huge hash table,
which ExecEndMemoize then frees again.  Neither of those steps
gets counted into any plan node's runtime, but EXPLAIN's total
execution time will include them.  An insane value for est_entries
could perhaps go along with a cost misestimate that convinces the
planner to include the memoize even though it seems pointless.

I spent some time studying cost_memoize_rescan, and the only
conclusions I arrived at were that the variable names are poorly
chosen and the comments are unhelpful.  For instance, one would
think that est_entry_bytes is the expected size of one cache entry,
but it seems to actually be the total space that would be occupied
if the whole input relation were loaded into the cache.  And
the est_cache_entries computation seems nonsensical; if it does
make sense, the comment sure doesn't illuminate why.  So I am
quite prepared to buy into the idea that cost_memoize_rescan is
producing bogus answers, but it's hard to tell what it's coming out
with in this example.  Too bad EXPLAIN doesn't print est_entries.

            regards, tom lane



Re: strange slow query - lost lot of time somewhere

From
"David G. Johnston"
Date:
On Mon, May 2, 2022 at 4:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
> On Mon, 2 May 2022 at 21:00, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> I found a query that is significantly slower with more memory

> If it was work_mem you increased, it seems strange that the plan would
> switch over to using a Nested Loop / Memoize plan.

Yeah, there's something unexplained there.

I spent some time studying cost_memoize_rescan, and the only
conclusions I arrived at were that the variable names are poorly
chosen and the comments are unhelpful.  For instance, one would
think that est_entry_bytes is the expected size of one cache entry,
but it seems to actually be the total space that would be occupied
if the whole input relation were loaded into the cache.
  And
the est_cache_entries computation seems nonsensical; if it does
make sense, the comment sure doesn't illuminate why.

My take on this is that a cache entry is keyed by a parameterization and any given entry can have, at most, every tuple saved into it (hence the computation of tuples*per-tuple-size).  So the maximum number of hash keys is the total available memory divided by input relation size.  This upper bound is stored in est_cache_entries.  If the number of unique parameterizations expected (at worst one-per-call) is less than this we use that value and never evict.  If it is more we use the est_cache_entries and plan to evict.

What I'm expecting to find but don't see is that by definition each unique parameterization must positively match a unique subset of the input relation tuples.  If we remember only those tuples that matched then at no point should the total memory for the hash table exceed the size of the input relation.

Now, I'm not completely confident the cache only holds matched tuples...but if so:

From that the mpath->est_entries should be "min(hash_mem_bytes / est_entry_bytes, 1.0) * ndistinct"
i.e., all groups or a fractional subset based upon available hash memory

Then:

ndistinct = estimate_num_groups() || calls
retention_ratio = min(hash_mem_bytes / est_entry_bytes, 1.0)
est_entries = retention_ratio * ndistinct
evict_ratio = 1.0 - retention_ratio

hit_ratio = (est_entries / ndistinct) - (ndistinct / calls) || clamp to 0.0
I don't understand the adjustment factor ndistinct/calls

evictions total cost adjustment also assumes we are evicting all tuples as opposed to tuples/est_entries

This is a "rescan" so aside from cache management isn't the cost of originally populating the cache already accounted for elsewhere?

David J.

Re: strange slow query - lost lot of time somewhere

From
David Rowley
Date:
On Tue, 3 May 2022 at 11:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > On Mon, 2 May 2022 at 21:00, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> >> I found a query that is significantly slower with more memory
>
> > If it was work_mem you increased, it seems strange that the plan would
> > switch over to using a Nested Loop / Memoize plan.
>
> Yeah, there's something unexplained there.
>
> I think that the most probable explanation for the symptoms is that
> cost_memoize_rescan is computing some insane value for est_entries,
> causing ExecInitMemoize to allocate-and-zero a huge hash table,
> which ExecEndMemoize then frees again.  Neither of those steps
> gets counted into any plan node's runtime, but EXPLAIN's total
> execution time will include them.  An insane value for est_entries
> could perhaps go along with a cost misestimate that convinces the
> planner to include the memoize even though it seems pointless.

That seems pretty unlikely to me. est_entries is based on the minimum
value of the expected number of total cache entries and the ndistinct
value. ndistinct cannot be insane here as ndistinct is never going to
be higher than the number of calls, which is the row estimate from the
outer side of the join. That's 91 in both cases here.   As far as I
can see, that's just going to make a table of 128 buckets.

See estimate_num_groups_incremental() at:

/*
* We don't ever want to return an estimate of zero groups, as that tends
* to lead to division-by-zero and other unpleasantness.  The input_rows
* estimate is usually already at least 1, but clamp it just in case it
* isn't.
*/
input_rows = clamp_row_est(input_rows);


> I spent some time studying cost_memoize_rescan, and the only
> conclusions I arrived at were that the variable names are poorly
> chosen and the comments are unhelpful.  For instance, one would
> think that est_entry_bytes is the expected size of one cache entry,
> but it seems to actually be the total space that would be occupied
> if the whole input relation were loaded into the cache.

I think you've misunderstood. It *is* the estimated size of a single
entry. I think you might be going wrong in assuming "tuples" is the
expected tuples from all rescans of the inner side of the join. It's
actually from a single scan.  I can add a comment there to help make
that clear.

> And
> the est_cache_entries computation seems nonsensical; if it does
> make sense, the comment sure doesn't illuminate why.  So I am
> quite prepared to buy into the idea that cost_memoize_rescan is
> producing bogus answers, but it's hard to tell what it's coming out
> with in this example.  Too bad EXPLAIN doesn't print est_entries.

I'm wishing I put the initial hash table size and the final hash table
size in EXPLAIN + EXPLAIN ANALYZE now. Perhaps it's not too late for
v15 to do that so that it might help us figure things out in the
future.

I'm open to making improvements to the comments in that area. I do
remember spending quite a bit of time trying to make things as clear
as possible as it is fairly complex what's going on there.

David



Re: strange slow query - lost lot of time somewhere

From
David Rowley
Date:
On Tue, 3 May 2022 at 13:43, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> hit_ratio = (est_entries / ndistinct) - (ndistinct / calls) || clamp to 0.0
> I don't understand the adjustment factor ndistinct/calls

I've attached a spreadsheet showing you the impact of subtracting
(ndistinct / calls).  What this is correcting for is the fact that the
first scan from each unique value is a cache miss.  The more calls we
have, the more hits we'll get.  If there was only 1 call per distinct
value then there'd never be any hits. Without subtracting (ndistinct /
calls) and assuming there's space in the cache for each ndistinct
value, we'd assume 100% cache hit ratio if calls == ndistinct.  What
we should assume in that case is a 0% hit ratio as the first scan for
each distinct parameter must always be a miss as we've never had a
chance to cache any tuples for it yet.
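
As a throwaway sanity check (made-up numbers, nothing from the plans in this
thread), the current formula does degrade to a 0% hit ratio in exactly that
calls == ndistinct case:

    SELECT calls, ndistinct, est_entries,
           (est_entries / ndistinct) - (ndistinct / calls) AS est_hit_ratio
    FROM (VALUES (91.0,    91.0,   91.0),    -- every call is a first-time miss -> 0.0
                 (10000.0, 2000.0, 2000.0)   -- lots of repeated lookups        -> 0.8
         ) AS t(calls, ndistinct, est_entries);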

> This is a "rescan" so aside from cache management isn't the cost of originally populating the cache already accounted
forelsewhere?
 

The cost of the first scan is calculated in create_memoize_path().
Since the first scan will always be a cache miss, the code there just
adds some cache management surcharges. Namely:

/*
* Add a small additional charge for caching the first entry.  All the
* harder calculations for rescans are performed in cost_memoize_rescan().
*/
pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;

David

Attachment

Re: strange slow query - lost lot of time somewhere

From
"David G. Johnston"
Date:
On Mon, May 2, 2022 at 7:13 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 3 May 2022 at 11:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > On Mon, 2 May 2022 at 21:00, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> >> I found a query that is significantly slower with more memory
>
> > If it was work_mem you increased, it seems strange that the plan would
> > switch over to using a Nested Loop / Memoize plan.
>
> Yeah, there's something unexplained there.
>
> I think that the most probable explanation for the symptoms is that
> cost_memoize_rescan is computing some insane value for est_entries,
> causing ExecInitMemoize to allocate-and-zero a huge hash table,
> which ExecEndMemoize then frees again.  Neither of those steps
> gets counted into any plan node's runtime, but EXPLAIN's total
> execution time will include them.  An insane value for est_entries
> could perhaps go along with a cost misestimate that convinces the
> planner to include the memoize even though it seems pointless.

That seems pretty unlikely to me. est_entries is based on the minimum
value of the expected number of total cache entries and the ndistinct
value. ndistinct cannot be insane here as ndistinct is never going to
be higher than the number of calls, which is the row estimate from the
outer side of the join. That's 91 in both cases here.   As far as I
can see, that's just going to make a table of 128 buckets.

If est_entries goes to zero due to hash_mem_bytes/est_entry_bytes < 1 (hence floor takes it to zero) the executor will use a size value of 1024 instead in build_hash_table.

That seems unlikely but there is no data to support or refute it.


I'm open to making improvements to the comments in that area. I do
remember spending quite a bit of time trying to make things as clear
as possible as it is fairly complex what's going on there.


A few more intermediate calculation variables, along with descriptions, would help.

e.g., min(est_cache_entries, ndistinct) is repeated twice after its initial definition.

retention_ratio per my other reply

The (ndistinct/calls) part of hit_ratio being described specifically.

David J.

Re: strange slow query - lost lot of time somewhere

From
"David G. Johnston"
Date:
On Mon, May 2, 2022 at 7:30 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 3 May 2022 at 13:43, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> hit_ratio = (est_entries / ndistinct) - (ndistinct / calls) || clamp to 0.0
> I don't understand the adjustment factor ndistinct/calls

I've attached a spreadsheet showing you the impact of subtracting
(ndistinct / calls).  What this is correcting for is the fact that the
first scan from each unique value is a cache miss.  The more calls we
have, the more hits we'll get.  If there was only 1 call per distinct
value then there'd never be any hits. Without subtracting (ndistinct /
calls) and assuming there's space in the cache for each ndistinct
value, we'd assume 100% cache hit ratio if calls == ndistinct.  What
we should assume in that case is a 0% hit ratio as the first scan for
each distinct parameter must always be a miss as we've never had a
chance to cache any tuples for it yet.


Thank you.  I understand the theory and agree with it - but the math doesn't seem to be working out.

Plugging in:
n = 2,000
e = 500
c = 10,000

proper = 5%
incorrect = 25%

But of the 10,000 calls we will receive, the first 2,000 will be misses while 2,000 of the remaining 8,000 will be hits, due to sharing 2,000 distinct groups among the available inventory of 500 (25% of 8,000 is 2,000).  2,000 hits in 10,000 calls yields 20%.

I believe the correct formula to be:

((calls - ndistinct) / calls) * (est_entries / ndistinct) = hit_ratio
.80 * .25 = .20

First we recognize that our hit ratio can be at most (c - n)/c, since n misses are guaranteed.  We then scale that ratio by our cache efficiency, since among the remaining potential hits only that fraction will actually find their entry cached; the rest turn into misses because the relevant entry is not in memory.
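
For the record, the same comparison as a quick SQL scratch calculation,
plugging in the numbers above:

    SELECT (est_entries / ndistinct) - (ndistinct / calls)           AS current_formula,   -- 0.05
           ((calls - ndistinct) / calls) * (est_entries / ndistinct) AS proposed_formula   -- 0.20
    FROM (VALUES (2000.0, 500.0, 10000.0)) AS t(ndistinct, est_entries, calls);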

David J.

Re: strange slow query - lost lot of time somewhere

From
David Rowley
Date:
On Tue, 3 May 2022 at 15:22, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> Plugging in:
> n = 2,000
> e = 500
> c = 10,000
>
> proper = 5%
> incorrect = 25%
>
> But of the 10,000 calls we will receive, the first 2,000 will be misses while 2,000 of the remaining 8,000 will be hits, due to sharing 2,000 distinct groups among the available inventory of 500 (25% of 8,000 is 2,000).  2,000 hits in 10,000 calls yields 20%.
>
> I believe the correct formula to be:
>
> ((calls - ndistinct) / calls) * (est_entries / ndistinct) = hit_ratio
> .80 * .25 = .20

I think you're correct here.  The formula should be that.  However,
two things: 1) this being incorrect is not the cause of the original
problem reported on this thread, and 2) there's just no way we could
consider fixing this in v15, let alone back-patch it to v14.

Maybe we should open a new thread about this and put an entry in the
first CF for v16 under bugs and come back to it after we branch.
Thinking the cache hit ratio is lower than it actually is going to be
will reduce the chances of the planner switching to a Nested Loop /
Memoize plan vs a Hash or Merge Join plan.

I was already fairly concerned that Memoize could cause performance
regressions when the ndistinct value or expected cache entry size is
underestimated or the outer side rows are overestimated.  What I've
got to calculate the cache hit ratio does seem incorrect given what
you're showing, however it does add an element of pessimism and
reduces the chances of a bad plan being picked when work_mem is too
low to cache all entries.  Swapping it out for your formula seems like
it would increase the chances of a Memoize plan being picked when the
row, ndistinct and cache entry size estimates are correct, however, it
could also increase the chance of a bad plan being picked in
cases where the estimates are incorrect.

My problem with changing this now would be that we already often
perform Nested Loop joins when a Hash or Merge join would be a better
option. I'd hate to take us in a direction where we make that problem
even worse.

David



Re: strange slow query - lost lot of time somewhere

From
Pavel Stehule
Date:


On Mon, 2 May 2022 at 23:48, David Rowley <dgrowleyml@gmail.com> wrote:
On Mon, 2 May 2022 at 21:00, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I found a query that is significantly slower with more memory

Can you clarify what you mean here?  More memory was installed on the
machine? or work_mem was increased? or?

> plan 1 - fast https://explain.depesz.com/s/XM1f
>
> plan 2 - slow https://explain.depesz.com/s/2rBw

If it was work_mem you increased, it seems strange that the plan would
switch over to using a Nested Loop / Memoize plan.  Only 91 rows are
estimated on the outer side of the join. It's hard to imagine that
work_mem was so low that the Memoize costing code thought there would
ever be cache evictions.

> Strange - the time of the last row is roughly the same, but the execution time is 10x worse
>
> It looks like slow cleanup of the execution environment

Can you also show EXPLAIN for the Memoize plan without ANALYZE?

Yes - it is strange - it is slow even without execution

 ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                                  QUERY PLAN                                                                                                  │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop Anti Join  (cost=59.62..3168.15 rows=1 width=16)                                                                                                                                                 │
│   ->  Nested Loop Anti Join  (cost=59.34..3165.24 rows=1 width=16)                                                                                                                                           │
│         ->  Nested Loop Semi Join  (cost=58.48..3133.09 rows=1 width=16)                                                                                                                                     │
│               ->  Bitmap Heap Scan on item itembo0_  (cost=57.34..2708.22 rows=11 width=16)                                                                                                                  │
│                     Recheck Cond: ((ending_time IS NULL) OR ((status_id = ANY ('{1,4,5}'::bigint[])) AND (CURRENT_TIMESTAMP < ending_time) AND (starting_time <= CURRENT_TIMESTAMP)))                        │
│                     Filter: ((to_expose_date IS NULL) AND (status_id = ANY ('{1,4,5}'::bigint[])) AND (starting_time <= CURRENT_TIMESTAMP) AND ((ending_time IS NULL) OR (CURRENT_TIMESTAMP < ending_time))) │
│                     ->  BitmapOr  (cost=57.34..57.34 rows=1751 width=0)                                                                                                                                      │
│                           ->  Bitmap Index Scan on stehule354  (cost=0.00..2.08 rows=1 width=0)                                                                                                              │
│                                 Index Cond: (ending_time IS NULL)                                                                                                                                            │
│                           ->  Bitmap Index Scan on stehule1010  (cost=0.00..55.26 rows=1751 width=0)                                                                                                         │
│                                 Index Cond: ((status_id = ANY ('{1,4,5}'::bigint[])) AND (ending_time > CURRENT_TIMESTAMP) AND (starting_time <= CURRENT_TIMESTAMP))                                         │
│               ->  Nested Loop  (cost=1.14..37.71 rows=91 width=8)                                                                                                                                            │
│                     ->  Index Only Scan using uq_isi_itemid_itemimageid on item_share_image itemsharei2__1  (cost=0.57..3.80 rows=91 width=16)                                                               │
│                           Index Cond: (item_id = itembo0_.id)                                                                                                                                                │
│                     ->  Memoize  (cost=0.57..2.09 rows=1 width=8)                                                                                                                                            │
│                           Cache Key: itemsharei2__1.item_image_id                                                                                                                                            │
│                           Cache Mode: logical                                                                                                                                                                │
│                           ->  Index Only Scan using pk_item_image on item_image itemimageb3__1  (cost=0.56..2.08 rows=1 width=8)                                                                             │
│                                 Index Cond: (id = itemsharei2__1.item_image_id)                                                                                                                              │
│         ->  Nested Loop  (cost=0.85..32.14 rows=1 width=8)                                                                                                                                                   │
│               ->  Index Only Scan using uq_isi_itemid_itemimageid on item_share_image itemsharei2_  (cost=0.57..3.80 rows=91 width=16)                                                                       │
│                     Index Cond: (item_id = itembo0_.id)                                                                                                                                                      │
│               ->  Memoize  (cost=0.29..1.72 rows=1 width=8)                                                                                                                                                  │
│                     Cache Key: itemsharei2_.item_image_id                                                                                                                                                    │
│                     Cache Mode: logical                                                                                                                                                                      │
│                     ->  Index Only Scan using stehule3001 on item_image itemimageb3_  (cost=0.28..1.71 rows=1 width=8)                                                                                       │
│                           Index Cond: (id = itemsharei2_.item_image_id)                                                                                                                                      │
│   ->  Index Only Scan using ixfk_ima_itemid on item_missing_attribute itemmissin1_  (cost=0.28..1.66 rows=1 width=8)                                                                                         │
│         Index Cond: (item_id = itembo0_.id)                                                                                                                                                                  │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(29 rows)

Time: 420,392 ms



Does the slowness present every time that plan is executed?

It seems so, yes.

Can you show the EXPLAIN ANALYZE of the nested loop plan with
enable_memoize = off?  You may need to disable hash and merge join.

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                           QUERY PLAN                                                                                            │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop Anti Join  (cost=1093.22..4488.89 rows=1 width=16) (actual time=5.723..60.470 rows=13 loops=1)                                                                                      │
│   ->  Nested Loop Anti Join  (cost=1092.94..4485.97 rows=1 width=16) (actual time=5.165..60.368 rows=41 loops=1)                                                                                │
│         ->  Gather  (cost=1001.70..4391.26 rows=1 width=16) (actual time=1.909..56.913 rows=41 loops=1)                                                                                         │
│               Workers Planned: 2                                                                                                                                                                │
│               Workers Launched: 2                                                                                                                                                               │
│               ->  Nested Loop Semi Join  (cost=1.70..3391.16 rows=1 width=16) (actual time=22.032..39.253 rows=14 loops=3)                                                                      │
│                     ->  Parallel Index Only Scan using stehule1010 on item itembo0_  (cost=0.57..2422.83 rows=5 width=16) (actual time=21.785..38.851 rows=14 loops=3)                          │
│                           Index Cond: ((status_id = ANY ('{1,4,5}'::bigint[])) AND (starting_time <= CURRENT_TIMESTAMP))                                                                        │
│                           Filter: ((to_expose_date IS NULL) AND ((ending_time IS NULL) OR (CURRENT_TIMESTAMP < ending_time)))                                                                   │
│                           Rows Removed by Filter: 1589                                                                                                                                          │
│                           Heap Fetches: 21                                                                                                                                                      │
│                     ->  Nested Loop  (cost=1.13..192.76 rows=91 width=8) (actual time=0.029..0.029 rows=1 loops=41)                                                                             │
│                           ->  Index Only Scan using uq_isi_itemid_itemimageid on item_share_image itemsharei2__1  (cost=0.57..3.80 rows=91 width=16) (actual time=0.015..0.015 rows=1 loops=41) │
│                                 Index Cond: (item_id = itembo0_.id)                                                                                                                             │
│                                 Heap Fetches: 2                                                                                                                                                 │
│                           ->  Index Only Scan using pk_item_image on item_image itemimageb3__1  (cost=0.56..2.08 rows=1 width=8) (actual time=0.013..0.013 rows=1 loops=41)                     │
│                                 Index Cond: (id = itemsharei2__1.item_image_id)                                                                                                                 │
│                                 Heap Fetches: 2                                                                                                                                                 │
│         ->  Hash Join  (cost=91.24..94.71 rows=1 width=8) (actual time=0.084..0.084 rows=0 loops=41)                                                                                            │
│               Hash Cond: (itemsharei2_.item_image_id = itemimageb3_.id)                                                                                                                         │
│               ->  Index Only Scan using uq_isi_itemid_itemimageid on item_share_image itemsharei2_  (cost=0.57..3.80 rows=91 width=16) (actual time=0.003..0.004 rows=6 loops=41)               │
│                     Index Cond: (item_id = itembo0_.id)                                                                                                                                         │
│                     Heap Fetches: 2                                                                                                                                                             │
│               ->  Hash  (cost=67.41..67.41 rows=1861 width=8) (actual time=3.213..3.214 rows=3950 loops=1)                                                                                      │
│                     Buckets: 4096 (originally 2048)  Batches: 1 (originally 1)  Memory Usage: 187kB                                                                                             │
│                     ->  Index Only Scan using stehule3001 on item_image itemimageb3_  (cost=0.28..67.41 rows=1861 width=8) (actual time=0.029..2.479 rows=3950 loops=1)                         │
│                           Heap Fetches: 2203                                                                                                                                                    │
│   ->  Index Only Scan using ixfk_ima_itemid on item_missing_attribute itemmissin1_  (cost=0.28..1.66 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=41)                                 │
│         Index Cond: (item_id = itembo0_.id)                                                                                                                                                     │
│         Heap Fetches: 0                                                                                                                                                                         │
│ Planning Time: 1.471 ms                                                                                                                                                                         │
│ Execution Time: 60.570 ms                                                                                                                                                                       │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(32 řádek)

Čas: 62,982 ms
 

David

Re: strange slow query - lost lot of time somewhere

From
Pavel Stehule
Date:


út 3. 5. 2022 v 6:09 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:


po 2. 5. 2022 v 23:48 odesílatel David Rowley <dgrowleyml@gmail.com> napsal:
On Mon, 2 May 2022 at 21:00, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I found a query that is significantly slower with more memory

Can you clarify what you mean here?  More memory was installed on the
machine? or work_mem was increased? or?

> plan 1 - fast https://explain.depesz.com/s/XM1f
>
> plan 2 - slow https://explain.depesz.com/s/2rBw

If it was work_mem you increased, it seems strange that the plan would
switch over to using a Nested Loop / Memoize plan.  Only 91 rows are
estimated on the outer side of the join. It's hard to imagine that
work_mem was so low that the Memoize costing code thought there would
ever be cache evictions.

> Strange - the time of last row is +/- same, but execution time is 10x worse
>
> It looks like slow environment cleaning

Can you also show EXPLAIN for the Memoize plan without ANALYZE?

yes - it is strange - it is slow even without executing the query

 ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                                  QUERY PLAN                                                                                                  │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop Anti Join  (cost=59.62..3168.15 rows=1 width=16)                                                                                                                                                 │
│   ->  Nested Loop Anti Join  (cost=59.34..3165.24 rows=1 width=16)                                                                                                                                           │
│         ->  Nested Loop Semi Join  (cost=58.48..3133.09 rows=1 width=16)                                                                                                                                     │
│               ->  Bitmap Heap Scan on item itembo0_  (cost=57.34..2708.22 rows=11 width=16)                                                                                                                  │
│                     Recheck Cond: ((ending_time IS NULL) OR ((status_id = ANY ('{1,4,5}'::bigint[])) AND (CURRENT_TIMESTAMP < ending_time) AND (starting_time <= CURRENT_TIMESTAMP)))                        │
│                     Filter: ((to_expose_date IS NULL) AND (status_id = ANY ('{1,4,5}'::bigint[])) AND (starting_time <= CURRENT_TIMESTAMP) AND ((ending_time IS NULL) OR (CURRENT_TIMESTAMP < ending_time))) │
│                     ->  BitmapOr  (cost=57.34..57.34 rows=1751 width=0)                                                                                                                                      │
│                           ->  Bitmap Index Scan on stehule354  (cost=0.00..2.08 rows=1 width=0)                                                                                                              │
│                                 Index Cond: (ending_time IS NULL)                                                                                                                                            │
│                           ->  Bitmap Index Scan on stehule1010  (cost=0.00..55.26 rows=1751 width=0)                                                                                                         │
│                                 Index Cond: ((status_id = ANY ('{1,4,5}'::bigint[])) AND (ending_time > CURRENT_TIMESTAMP) AND (starting_time <= CURRENT_TIMESTAMP))                                         │
│               ->  Nested Loop  (cost=1.14..37.71 rows=91 width=8)                                                                                                                                            │
│                     ->  Index Only Scan using uq_isi_itemid_itemimageid on item_share_image itemsharei2__1  (cost=0.57..3.80 rows=91 width=16)                                                               │
│                           Index Cond: (item_id = itembo0_.id)                                                                                                                                                │
│                     ->  Memoize  (cost=0.57..2.09 rows=1 width=8)                                                                                                                                            │
│                           Cache Key: itemsharei2__1.item_image_id                                                                                                                                            │
│                           Cache Mode: logical                                                                                                                                                                │
│                           ->  Index Only Scan using pk_item_image on item_image itemimageb3__1  (cost=0.56..2.08 rows=1 width=8)                                                                             │
│                                 Index Cond: (id = itemsharei2__1.item_image_id)                                                                                                                              │
│         ->  Nested Loop  (cost=0.85..32.14 rows=1 width=8)                                                                                                                                                   │
│               ->  Index Only Scan using uq_isi_itemid_itemimageid on item_share_image itemsharei2_  (cost=0.57..3.80 rows=91 width=16)                                                                       │
│                     Index Cond: (item_id = itembo0_.id)                                                                                                                                                      │
│               ->  Memoize  (cost=0.29..1.72 rows=1 width=8)                                                                                                                                                  │
│                     Cache Key: itemsharei2_.item_image_id                                                                                                                                                    │
│                     Cache Mode: logical                                                                                                                                                                      │
│                     ->  Index Only Scan using stehule3001 on item_image itemimageb3_  (cost=0.28..1.71 rows=1 width=8)                                                                                       │
│                           Index Cond: (id = itemsharei2_.item_image_id)                                                                                                                                      │
│   ->  Index Only Scan using ixfk_ima_itemid on item_missing_attribute itemmissin1_  (cost=0.28..1.66 rows=1 width=8)                                                                                         │
│         Index Cond: (item_id = itembo0_.id)                                                                                                                                                                  │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(29 řádek)

Čas: 420,392 ms

there is really something strange (see attached file). It looks like this issue is much more related to planning time than to execution time

 



Does the slowness present every time that plan is executed?

looks yes
 

Can you show the EXPLAIN ANALYZE of the nested loop plan with
enable_memoize = off?  You may need to disable hash and merge join.

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                           QUERY PLAN                                                                                            │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop Anti Join  (cost=1093.22..4488.89 rows=1 width=16) (actual time=5.723..60.470 rows=13 loops=1)                                                                                      │
│   ->  Nested Loop Anti Join  (cost=1092.94..4485.97 rows=1 width=16) (actual time=5.165..60.368 rows=41 loops=1)                                                                                │
│         ->  Gather  (cost=1001.70..4391.26 rows=1 width=16) (actual time=1.909..56.913 rows=41 loops=1)                                                                                         │
│               Workers Planned: 2                                                                                                                                                                │
│               Workers Launched: 2                                                                                                                                                               │
│               ->  Nested Loop Semi Join  (cost=1.70..3391.16 rows=1 width=16) (actual time=22.032..39.253 rows=14 loops=3)                                                                      │
│                     ->  Parallel Index Only Scan using stehule1010 on item itembo0_  (cost=0.57..2422.83 rows=5 width=16) (actual time=21.785..38.851 rows=14 loops=3)                          │
│                           Index Cond: ((status_id = ANY ('{1,4,5}'::bigint[])) AND (starting_time <= CURRENT_TIMESTAMP))                                                                        │
│                           Filter: ((to_expose_date IS NULL) AND ((ending_time IS NULL) OR (CURRENT_TIMESTAMP < ending_time)))                                                                   │
│                           Rows Removed by Filter: 1589                                                                                                                                          │
│                           Heap Fetches: 21                                                                                                                                                      │
│                     ->  Nested Loop  (cost=1.13..192.76 rows=91 width=8) (actual time=0.029..0.029 rows=1 loops=41)                                                                             │
│                           ->  Index Only Scan using uq_isi_itemid_itemimageid on item_share_image itemsharei2__1  (cost=0.57..3.80 rows=91 width=16) (actual time=0.015..0.015 rows=1 loops=41) │
│                                 Index Cond: (item_id = itembo0_.id)                                                                                                                             │
│                                 Heap Fetches: 2                                                                                                                                                 │
│                           ->  Index Only Scan using pk_item_image on item_image itemimageb3__1  (cost=0.56..2.08 rows=1 width=8) (actual time=0.013..0.013 rows=1 loops=41)                     │
│                                 Index Cond: (id = itemsharei2__1.item_image_id)                                                                                                                 │
│                                 Heap Fetches: 2                                                                                                                                                 │
│         ->  Hash Join  (cost=91.24..94.71 rows=1 width=8) (actual time=0.084..0.084 rows=0 loops=41)                                                                                            │
│               Hash Cond: (itemsharei2_.item_image_id = itemimageb3_.id)                                                                                                                         │
│               ->  Index Only Scan using uq_isi_itemid_itemimageid on item_share_image itemsharei2_  (cost=0.57..3.80 rows=91 width=16) (actual time=0.003..0.004 rows=6 loops=41)               │
│                     Index Cond: (item_id = itembo0_.id)                                                                                                                                         │
│                     Heap Fetches: 2                                                                                                                                                             │
│               ->  Hash  (cost=67.41..67.41 rows=1861 width=8) (actual time=3.213..3.214 rows=3950 loops=1)                                                                                      │
│                     Buckets: 4096 (originally 2048)  Batches: 1 (originally 1)  Memory Usage: 187kB                                                                                             │
│                     ->  Index Only Scan using stehule3001 on item_image itemimageb3_  (cost=0.28..67.41 rows=1861 width=8) (actual time=0.029..2.479 rows=3950 loops=1)                         │
│                           Heap Fetches: 2203                                                                                                                                                    │
│   ->  Index Only Scan using ixfk_ima_itemid on item_missing_attribute itemmissin1_  (cost=0.28..1.66 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=41)                                 │
│         Index Cond: (item_id = itembo0_.id)                                                                                                                                                     │
│         Heap Fetches: 0                                                                                                                                                                         │
│ Planning Time: 1.471 ms                                                                                                                                                                         │
│ Execution Time: 60.570 ms                                                                                                                                                                       │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(32 řádek)

Čas: 62,982 ms
 

David
Attachment

Re: strange slow query - lost lot of time somewhere

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> there is really something strange (see attached file). Looks so this issue
> is much more related to planning time than execution time

You sure there's not something taking an exclusive lock on one of these
tables every so often?

            regards, tom lane



Re: strange slow query - lost lot of time somewhere

From
Pavel Stehule
Date:


út 3. 5. 2022 v 6:57 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> there is really something strange (see attached file). Looks so this issue
> is much more related to planning time than execution time

You sure there's not something taking an exclusive lock on one of these
tables every so often?

I am almost sure, I can see this issue only every time when I set a higher work mem. I don't see this issue in other cases.

Regards

Pavel

 

                        regards, tom lane

Re: strange slow query - lost lot of time somewhere

From
David Rowley
Date:
On Tue, 3 May 2022 at 17:02, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> út 3. 5. 2022 v 6:57 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>> You sure there's not something taking an exclusive lock on one of these
>> tables every so often?
>
> I am almost sure, I can see this issue only every time when I set a higher work mem. I don't see this issue in other
cases.

hmm, I don't think the query being blocked on a table lock would cause
this behaviour. As far as I know, all table locks should be obtained
before the timer starts for the "Execution Time" timer in EXPLAIN
ANALYZE.  However, locks are obtained on indexes at executor startup,
so if there was some delay in obtaining a lock on the index it would
show up this way.  I just don't know of anything that obtains a
conflicting lock on an index without the same conflicting lock on the
table that the index belongs to.

I do agree that the perf report does indicate that the extra time is
taken due to some large amount of memory being allocated. I just can't
quite see how that would happen in Memoize given that
estimate_num_groups() clamps the distinct estimate as the number of
input rows, which is 91 in both cases in your problem query.
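
As a toy illustration of that clamping (this is not PostgreSQL source; the function name and the ~96-byte
per-entry estimate are assumptions): the planned cache size should be the smaller of the distinct-key estimate
and what fits in hash_mem, so 91 expected keys should win no matter how big work_mem is.

#include <stdint.h>
#include <stdio.h>

/* Toy model only; the real logic lives in the Memoize costing code. */
static uint32_t
planned_entries(double ndistinct, double hash_mem_bytes, double entry_bytes)
{
	double		cap = hash_mem_bytes / entry_bytes;		/* what fits in memory */

	return (uint32_t) (ndistinct < cap ? ndistinct : cap);	/* clamp to the smaller */
}

int
main(void)
{
	/* 91 distinct keys, work_mem = 400MB, hash_mem_multiplier = 1 */
	printf("%u\n", planned_entries(91, 400.0 * 1024 * 1024, 96));	/* prints 91 */
	return 0;
}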

Are you able to run the Memoize query in psql with \watch 0.1 for a
few seconds while you do:

perf record --call-graph dwarf --pid <pid> sleep 2

then send along the perf report.

I locally hacked build_hash_table() in nodeMemoize.c to make the
hashtable 100 million elements and I see my perf report for a trivial
Memoize query come up as:

+   99.98%     0.00%  postgres  postgres           [.] _start
+   99.98%     0.00%  postgres  libc.so.6          [.] __libc_start_main_alias_2 (inlined)
+   99.98%     0.00%  postgres  libc.so.6          [.] __libc_start_call_main
+   99.98%     0.00%  postgres  postgres           [.] main
+   99.98%     0.00%  postgres  postgres           [.] PostmasterMain
+   99.98%     0.00%  postgres  postgres           [.] ServerLoop
+   99.98%     0.00%  postgres  postgres           [.] BackendStartup (inlined)
+   99.98%     0.00%  postgres  postgres           [.] BackendRun (inlined)
+   99.98%     0.00%  postgres  postgres           [.] PostgresMain
+   99.98%     0.00%  postgres  postgres           [.] exec_simple_query
+   99.98%     0.00%  postgres  postgres           [.] PortalRun
+   99.98%     0.00%  postgres  postgres           [.] FillPortalStore
+   99.98%     0.00%  postgres  postgres           [.] PortalRunUtility
+   99.98%     0.00%  postgres  postgres           [.] standard_ProcessUtility
+   99.98%     0.00%  postgres  postgres           [.] ExplainQuery
+   99.98%     0.00%  postgres  postgres           [.] ExplainOneQuery
+   99.95%     0.00%  postgres  postgres           [.] ExplainOnePlan
+   87.87%     0.00%  postgres  postgres           [.] standard_ExecutorStart
+   87.87%     0.00%  postgres  postgres           [.] InitPlan (inlined)
+   87.87%     0.00%  postgres  postgres           [.] ExecInitNode
+   87.87%     0.00%  postgres  postgres           [.] ExecInitNestLoop
+   87.87%     0.00%  postgres  postgres           [.] ExecInitMemoize
+   87.87%     0.00%  postgres  postgres           [.] build_hash_table (inlined) <----
+   87.87%     0.00%  postgres  postgres           [.] memoize_create (inlined)
+   87.87%     0.00%  postgres  postgres           [.] memoize_allocate (inlined)
+   87.87%     0.00%  postgres  postgres           [.] MemoryContextAllocExtended
+   87.87%     0.00%  postgres  postgres           [.] memset (inlined)

Failing that, are you able to pg_dump these tables and load them into
a PostgreSQL instance that you can play around with and patch?
Provided you can actually recreate the problem on that instance.

David



RE: strange slow query - lost lot of time somewhere

From
Jakub Wartak
Date:
> I do agree that the perf report does indicate that the extra time is taken due to
> some large amount of memory being allocated. I just can't quite see how that
> would happen in Memoize given that
> estimate_num_groups() clamps the distinct estimate as the number of input
> rows, which is 91 in both cases in your problem query.
> 
> Are you able to run the Memoize query in psql with \watch 0.1 for a few seconds
> while you do:
> 
> perf record --call-graph dwarf --pid <pid> sleep 2
> 
> then send along the perf report.
> 
> I locally hacked build_hash_table() in nodeMemoize.c to make the hashtable 100
> million elements and I see my perf report for a trivial Memoize query come up
> as:
> 
[..]
> 
> Failing that, are you able to pg_dump these tables and load them into a
> PostgreSQL instance that you can play around with and patch?
> Provided you can actually recreate the problem on that instance.
> 

+1 to what David says, we need a reproducer. In [1] Pavel wrote that he's seeing a lot of clear_page_erms(), so
maybe this will be a little help: I recall a similar issue with a lot of minor page faults and high %sys when
raising work_mem. For me it was a different issue some time ago, but it was something like build_hash_table()
being used by UNION recursive calls -> BuildTupleHashTable() -> .. malloc() -> mmap64(). When mmap() is issued
with MAP_ANONYMOUS, the kernel will zero out the memory (more memory -> potentially bigger CPU waste visible as
minor page faults; "erms" stands for "Enhanced REP MOVSB/STOSB", and this is on the kernel side). The culprit
was the planner allocating something that wouldn't be used later.
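
A tiny standalone program shows that kernel-side cost (an illustration only; the 400MB size is just picked to be
comparable to the work_mem here). Touching a fresh MAP_ANONYMOUS mapping forces the kernel to hand out zeroed
pages, one minor fault per page, which is where clear_page_erms time goes:

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int
main(void)
{
	size_t		sz = 400UL * 1024 * 1024;	/* comparable to work_mem = 400MB */
	char	   *p = mmap(NULL, sz, PROT_READ | PROT_WRITE,
						 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;

	/* first touch of each page => minor fault + kernel page clearing */
	for (size_t off = 0; off < sz; off += 4096)
		p[off] = 1;

	munmap(p, sz);
	return 0;
}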
 

Additional three ways to figure that one (all are IMHO production safe):
a) already mentioned perf with --call-graph dwarf -p PID
b) strace -p PID -e 'mmap' # verify if mmap(NULL, ...) is issued with the MAP_ANONYMOUS flag; the size of the
mmap() request will somehow match the work_mem sizing
 
c) gdb -p PID and then breakpoint for mmap and verify each mmap() # check MAP_ANONYMOUS as above

[1] - https://www.postgresql.org/message-id/CAFj8pRAo5CrF8mpPxMvnBYFSqu4HYDqRsQnLqGphckNHkHosFg%40mail.gmail.com

-J.

Re: strange slow query - lost lot of time somewhere

From
Pavel Stehule
Date:


st 4. 5. 2022 v 2:15 odesílatel David Rowley <dgrowleyml@gmail.com> napsal:
On Tue, 3 May 2022 at 17:02, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> út 3. 5. 2022 v 6:57 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>> You sure there's not something taking an exclusive lock on one of these
>> tables every so often?
>
> I am almost sure, I can see this issue only every time when I set a higher work mem. I don't see this issue in other cases.

hmm, I don't think the query being blocked on a table lock would cause
this behaviour. As far as I know, all table locks should be obtained
before the timer starts for the "Execution Time" timer in EXPLAIN
ANALYZE.  However, locks are obtained on indexes at executor startup,
so if there was some delay in obtaining a lock on the index it would
show up this way.  I just don't know of anything that obtains a
conflicting lock on an index without the same conflicting lock on the
table that the index belongs to.

I do agree that the perf report does indicate that the extra time is
taken due to some large amount of memory being allocated. I just can't
quite see how that would happen in Memoize given that
estimate_num_groups() clamps the distinct estimate as the number of
input rows, which is 91 in both cases in your problem query.

Are you able to run the Memoize query in psql with \watch 0.1 for a
few seconds while you do:

perf record --call-graph dwarf --pid <pid> sleep 2

then send along the perf report.

I locally hacked build_hash_table() in nodeMemoize.c to make the
hashtable 100 million elements and I see my perf report for a trivial
Memoize query come up as:

+   99.98%     0.00%  postgres  postgres           [.] _start
+   99.98%     0.00%  postgres  libc.so.6          [.] __libc_start_main_alias_2 (inlined)
+   99.98%     0.00%  postgres  libc.so.6          [.] __libc_start_call_main
+   99.98%     0.00%  postgres  postgres           [.] main
+   99.98%     0.00%  postgres  postgres           [.] PostmasterMain
+   99.98%     0.00%  postgres  postgres           [.] ServerLoop
+   99.98%     0.00%  postgres  postgres           [.] BackendStartup (inlined)
+   99.98%     0.00%  postgres  postgres           [.] BackendRun (inlined)
+   99.98%     0.00%  postgres  postgres           [.] PostgresMain
+   99.98%     0.00%  postgres  postgres           [.] exec_simple_query
+   99.98%     0.00%  postgres  postgres           [.] PortalRun
+   99.98%     0.00%  postgres  postgres           [.] FillPortalStore
+   99.98%     0.00%  postgres  postgres           [.] PortalRunUtility
+   99.98%     0.00%  postgres  postgres           [.] standard_ProcessUtility
+   99.98%     0.00%  postgres  postgres           [.] ExplainQuery
+   99.98%     0.00%  postgres  postgres           [.] ExplainOneQuery
+   99.95%     0.00%  postgres  postgres           [.] ExplainOnePlan
+   87.87%     0.00%  postgres  postgres           [.] standard_ExecutorStart
+   87.87%     0.00%  postgres  postgres           [.] InitPlan (inlined)
+   87.87%     0.00%  postgres  postgres           [.] ExecInitNode
+   87.87%     0.00%  postgres  postgres           [.] ExecInitNestLoop
+   87.87%     0.00%  postgres  postgres           [.] ExecInitMemoize
+   87.87%     0.00%  postgres  postgres           [.] build_hash_table (inlined) <----
+   87.87%     0.00%  postgres  postgres           [.] memoize_create (inlined)
+   87.87%     0.00%  postgres  postgres           [.] memoize_allocate (inlined)
+   87.87%     0.00%  postgres  postgres           [.] MemoryContextAllocExtended
+   87.87%     0.00%  postgres  postgres           [.] memset (inlined)

Failing that, are you able to pg_dump these tables and load them into
a PostgreSQL instance that you can play around with and patch?
Provided you can actually recreate the problem on that instance.

+   71,98%    14,36%  postmaster  [kernel.kallsyms]      [k] page_fault                                                              ▒
+   70,19%     6,59%  postmaster  libc-2.28.so           [.] __memset_avx2_erms                                                      ▒
+   68,20%     0,00%  postmaster  postgres               [.] ExecInitNode                                                            ▒
+   68,20%     0,00%  postmaster  postgres               [.] ExecInitNestLoop                                                        ▒
+   68,13%     0,00%  postmaster  postgres               [.] ExecInitMemoize                                                         ▒
+   68,13%     0,00%  postmaster  postgres               [.] MemoryContextAllocExtended                                              ▒
+   63,20%     0,00%  postmaster  postgres               [.] 0x0000000000776b89                                                      ▒
+   63,20%     0,00%  postmaster  postgres               [.] PostgresMain                                                            ◆
+   63,03%     0,00%  postmaster  postgres               [.] 0x00000000007f48ca                                                      ▒
+   63,03%     0,00%  postmaster  postgres               [.] PortalRun                                                               ▒
+   63,03%     0,00%  postmaster  postgres               [.] 0x00000000007f83ae                                                      ▒
+   63,03%     0,00%  postmaster  postgres               [.] 0x00000000007f7fee                                                      ▒
+   63,03%     0,00%  postmaster  pg_stat_statements.so  [.] 0x00007f5579b599c6                                                      ▒
+   63,03%     0,00%  postmaster  postgres               [.] standard_ProcessUtility                                                 ▒
+   63,03%     0,00%  postmaster  postgres               [.] ExplainQuery                                                            ▒
+   62,83%     0,00%  postmaster  postgres               [.] 0x000000000062e83c                                                      ▒
+   62,83%     0,00%  postmaster  postgres               [.] ExplainOnePlan                                                          ▒
+   57,47%     0,14%  postmaster  [kernel.kallsyms]      [k] do_page_fault                                                           ▒
+   57,23%     0,51%  postmaster  [kernel.kallsyms]      [k] __do_page_fault                                                         ▒
+   55,61%     0,71%  postmaster  [kernel.kallsyms]      [k] handle_mm_fault                                                         ▒
+   55,19%     0,00%  postmaster  pg_stat_statements.so  [.] 0x00007f5579b5ad2c                                                      ▒
+   55,19%     0,00%  postmaster  postgres               [.] standard_ExecutorStart                                                  ▒
+   54,78%     0,87%  postmaster  [kernel.kallsyms]      [k] __handle_mm_fault                                                       ▒
+   53,54%     0,37%  postmaster  [kernel.kallsyms]      [k] do_anonymous_page                                                       ▒
+   36,36%     0,21%  postmaster  [kernel.kallsyms]      [k] alloc_pages_vma                                                         ▒
+   35,99%     0,31%  postmaster  [kernel.kallsyms]      [k] __alloc_pages_nodemask                                                  ▒
+   35,40%     1,06%  postmaster  [kernel.kallsyms]      [k] get_page_from_freelist                                                  ▒
+   27,71%     0,62%  postmaster  [kernel.kallsyms]      [k] prep_new_page                                                           ▒
+   27,09%    26,99%  postmaster  [kernel.kallsyms]      [k] clear_page_erms                                                         ▒
+   11,24%     2,29%  postmaster  [kernel.kallsyms]      [k] swapgs_restore_regs_and_return_to_usermode                              ▒
+    8,95%     6,87%  postmaster  [kernel.kallsyms]      [k] prepare_exit_to_usermode                                                ▒
+    7,83%     1,01%  postmaster  [kernel.kallsyms]      [k] mem_cgroup_charge            

 

David

Re: strange slow query - lost lot of time somewhere

From
Pavel Stehule
Date:


st 4. 5. 2022 v 16:08 odesílatel Jakub Wartak <Jakub.Wartak@tomtom.com> napsal:

Additional three ways to figure that one (all are IMHO production safe):
a) already mentioned perf with --call-graph dwarf -p PID
b) strace -p PID -e 'mmap' # verify if mmap() NULL is not having MAP_ANONYMOUS flag, size of mmap() request will somehow match work_mem sizing
c) gdb -p PID and then breakpoint for mmap and verify each mmap() # check MAP_ANONYMOUS as above


I don't have debug symbols, so I can't give more details now

Breakpoint 1 at 0x7f557f0c16c0
(gdb) c
Continuing.

Breakpoint 1, 0x00007f557f0c16c0 in mmap64 () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f557f0c16c0 in mmap64 () from /lib64/libc.so.6
#1  0x00007f557f04dd91 in sysmalloc () from /lib64/libc.so.6
#2  0x00007f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
#3  0x00007f557f04fb1e in malloc () from /lib64/libc.so.6
#4  0x0000000000932134 in AllocSetAlloc ()
#5  0x00000000009376cf in MemoryContextAllocExtended ()
#6  0x00000000006ad915 in ExecInitMemoize ()
#7  0x000000000068dc02 in ExecInitNode ()
#8  0x00000000006b37ff in ExecInitNestLoop ()
#9  0x000000000068dc56 in ExecInitNode ()
#10 0x00000000006b37ff in ExecInitNestLoop ()
#11 0x000000000068dc56 in ExecInitNode ()
#12 0x00000000006b37de in ExecInitNestLoop ()
#13 0x000000000068dc56 in ExecInitNode ()
#14 0x00000000006b37de in ExecInitNestLoop ()
#15 0x000000000068dc56 in ExecInitNode ()
#16 0x0000000000687e4d in standard_ExecutorStart ()
#17 0x00007f5579b5ad2d in pgss_ExecutorStart () from /usr/pgsql-14/lib/pg_stat_statements.so
#18 0x000000000062e643 in ExplainOnePlan ()
#19 0x000000000062e83d in ExplainOneQuery ()
#20 0x000000000062ee6f in ExplainQuery ()
#21 0x00000000007f9b15 in standard_ProcessUtility ()
#22 0x00007f5579b599c7 in pgss_ProcessUtility () from /usr/pgsql-14/lib/pg_stat_statements.so
#23 0x00000000007f7fef in PortalRunUtility ()
#24 0x00000000007f83af in FillPortalStore ()
#25 0x00000000007f86dd in PortalRun ()
#26 0x00000000007f48cb in exec_simple_query ()
#27 0x00000000007f610e in PostgresMain ()
#28 0x0000000000776b8a in ServerLoop ()
#29 0x0000000000777a03 in PostmasterMain ()
#30 0x00000000004fe413 in main ()
(gdb) p
The history is empty.
(gdb) c
Continuing.

Breakpoint 1, 0x00007f557f0c16c0 in mmap64 () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f557f0c16c0 in mmap64 () from /lib64/libc.so.6
#1  0x00007f557f04dd91 in sysmalloc () from /lib64/libc.so.6
#2  0x00007f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
#3  0x00007f557f04fb1e in malloc () from /lib64/libc.so.6
#4  0x0000000000932134 in AllocSetAlloc ()
#5  0x00000000009376cf in MemoryContextAllocExtended ()
#6  0x00000000006ad915 in ExecInitMemoize ()
#7  0x000000000068dc02 in ExecInitNode ()
#8  0x00000000006b37ff in ExecInitNestLoop ()
#9  0x000000000068dc56 in ExecInitNode ()
#10 0x00000000006b37ff in ExecInitNestLoop ()
#11 0x000000000068dc56 in ExecInitNode ()
#12 0x00000000006b37de in ExecInitNestLoop ()
#13 0x000000000068dc56 in ExecInitNode ()
#14 0x0000000000687e4d in standard_ExecutorStart ()
#15 0x00007f5579b5ad2d in pgss_ExecutorStart () from /usr/pgsql-14/lib/pg_stat_statements.so
#16 0x000000000062e643 in ExplainOnePlan ()
#17 0x000000000062e83d in ExplainOneQuery ()
#18 0x000000000062ee6f in ExplainQuery ()
#19 0x00000000007f9b15 in standard_ProcessUtility ()
#20 0x00007f5579b599c7 in pgss_ProcessUtility () from /usr/pgsql-14/lib/pg_stat_statements.so
#21 0x00000000007f7fef in PortalRunUtility ()
#22 0x00000000007f83af in FillPortalStore ()
#23 0x00000000007f86dd in PortalRun ()
#24 0x00000000007f48cb in exec_simple_query ()
#25 0x00000000007f610e in PostgresMain ()
#26 0x0000000000776b8a in ServerLoop ()
#27 0x0000000000777a03 in PostmasterMain ()
#28 0x00000000004fe413 in main ()
(gdb) c
Continuing.

there were 2 hits of mmap

Regards

Pavel


 
[1] - https://www.postgresql.org/message-id/CAFj8pRAo5CrF8mpPxMvnBYFSqu4HYDqRsQnLqGphckNHkHosFg%40mail.gmail.com

-J.

RE: strange slow query - lost lot of time somewhere

From
Jakub Wartak
Date:
Hi Pavel,

> I have not debug symbols, so I have not more details now
> Breakpoint 1 at 0x7f557f0c16c0
> (gdb) c
> Continuing.

> Breakpoint 1, 0x00007f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00007f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> #1  0x00007f557f04dd91 in sysmalloc () from /lib64/libc.so.6
> #2  0x00007f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
> #3  0x00007f557f04fb1e in malloc () from /lib64/libc.so.6
> #4  0x0000000000932134 in AllocSetAlloc ()
> #5  0x00000000009376cf in MemoryContextAllocExtended ()
> #6  0x00000000006ad915 in ExecInitMemoize ()

Well the PGDG repo has the debuginfos (e.g. postgresql14-debuginfo) rpms / dpkgs(?), so I hope you are basically one
command away from being able to debug further what happens in ExecInitMemoize().
Those packages seem to be safe as they modify only /usr/lib/debug so should not have any impact on production workload.

-J.





Re: strange slow query - lost lot of time somewhere

From
Pavel Stehule
Date:


čt 5. 5. 2022 v 8:51 odesílatel Jakub Wartak <Jakub.Wartak@tomtom.com> napsal:
Hi Pavel,

> I have not debug symbols, so I have not more details now
> Breakpoint 1 at 0x7f557f0c16c0
> (gdb) c
> Continuing.

> Breakpoint 1, 0x00007f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00007f557f0c16c0 in mmap64 () from /lib64/libc.so.6
> #1  0x00007f557f04dd91 in sysmalloc () from /lib64/libc.so.6
> #2  0x00007f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
> #3  0x00007f557f04fb1e in malloc () from /lib64/libc.so.6
> #4  0x0000000000932134 in AllocSetAlloc ()
> #5  0x00000000009376cf in MemoryContextAllocExtended ()
> #6  0x00000000006ad915 in ExecInitMemoize ()

Well the PGDG repo have the debuginfos (e.g. postgresql14-debuginfo) rpms / dpkgs(?) so I hope you are basically 1 command away of being able to debug it further what happens in ExecInitMemoize()
Those packages seem to be safe as they modify only /usr/lib/debug so should not have any impact on production workload.

I just have to wait for admin action - I have no root rights for the server.

 

-J.


Re: strange slow query - lost lot of time somewhere

From
David Rowley
Date:
On Thu, 5 May 2022 at 19:26, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> čt 5. 5. 2022 v 8:51 odesílatel Jakub Wartak <Jakub.Wartak@tomtom.com> napsal:
>> > Breakpoint 1, 0x00007f557f0c16c0 in mmap64 () from /lib64/libc.so.6
>> > (gdb) bt
>> > #0  0x00007f557f0c16c0 in mmap64 () from /lib64/libc.so.6
>> > #1  0x00007f557f04dd91 in sysmalloc () from /lib64/libc.so.6
>> > #2  0x00007f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
>> > #3  0x00007f557f04fb1e in malloc () from /lib64/libc.so.6
>> > #4  0x0000000000932134 in AllocSetAlloc ()
>> > #5  0x00000000009376cf in MemoryContextAllocExtended ()
>> > #6  0x00000000006ad915 in ExecInitMemoize ()
>>
>> Well the PGDG repo have the debuginfos (e.g. postgresql14-debuginfo) rpms / dpkgs(?) so I hope you are basically 1
command away of being able to debug it further what happens in ExecInitMemoize()
>> Those packages seem to be safe as they modify only /usr/lib/debug so should not have any impact on production
workload.
>
> I just have to wait for admin action - I have no root rights for the server.

Looking at ExecInitMemoize() it's hard to see what could take such a
long time other than the build_hash_table().  Tom did mention this,
but I can't quite see how the size given to that function could be
larger than 91 in your case.

If you get the debug symbols installed, can you use gdb to

break nodeMemoize.c:268
p size

maybe there's something I'm missing following the code and maybe there
is some way that est_entries is not set to what I thought it was.

It would also be good to see the same perf report again after the
debug symbols are installed in order to resolve those unresolved
function names.

David



Re: strange slow query - lost lot of time somewhere

From
"David G. Johnston"
Date:
On Mon, May 2, 2022 at 10:02 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:


út 3. 5. 2022 v 6:57 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> there is really something strange (see attached file). Looks so this issue
> is much more related to planning time than execution time

You sure there's not something taking an exclusive lock on one of these
tables every so often?

I am almost sure, I can see this issue only every time when I set a higher work mem. I don't see this issue in other cases.


What are the values of work_mem and hash_mem_multiplier for the two cases?

David J.

Re: strange slow query - lost lot of time somewhere

From
Pavel Stehule
Date:


pá 6. 5. 2022 v 1:28 odesílatel David G. Johnston <david.g.johnston@gmail.com> napsal:
On Mon, May 2, 2022 at 10:02 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:


út 3. 5. 2022 v 6:57 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> there is really something strange (see attached file). Looks so this issue
> is much more related to planning time than execution time

You sure there's not something taking an exclusive lock on one of these
tables every so often?

I am almost sure, I can see this issue only every time when I set a higher work mem. I don't see this issue in other cases.


What are the values of work_mem and hash_mem_multiplier for the two cases?

 (2022-05-06 07:35:21) prd_aukro=# show work_mem ;
┌──────────┐
│ work_mem │
├──────────┤
│ 400MB    │
└──────────┘
(1 řádka)

Čas: 0,331 ms
(2022-05-06 07:35:32) prd_aukro=# show hash_mem_multiplier ;
┌─────────────────────┐
│ hash_mem_multiplier │
├─────────────────────┤
│ 1                   │
└─────────────────────┘
(1 řádka)


David J.

Re: strange slow query - lost lot of time somewhere

From
Pavel Stehule
Date:


pá 6. 5. 2022 v 1:19 odesílatel David Rowley <dgrowleyml@gmail.com> napsal:
On Thu, 5 May 2022 at 19:26, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> čt 5. 5. 2022 v 8:51 odesílatel Jakub Wartak <Jakub.Wartak@tomtom.com> napsal:
>> > Breakpoint 1, 0x00007f557f0c16c0 in mmap64 () from /lib64/libc.so.6
>> > (gdb) bt
>> > #0  0x00007f557f0c16c0 in mmap64 () from /lib64/libc.so.6
>> > #1  0x00007f557f04dd91 in sysmalloc () from /lib64/libc.so.6
>> > #2  0x00007f557f04eaa9 in _int_malloc () from /lib64/libc.so.6
>> > #3  0x00007f557f04fb1e in malloc () from /lib64/libc.so.6
>> > #4  0x0000000000932134 in AllocSetAlloc ()
>> > #5  0x00000000009376cf in MemoryContextAllocExtended ()
>> > #6  0x00000000006ad915 in ExecInitMemoize ()
>>
>> Well the PGDG repo have the debuginfos (e.g. postgresql14-debuginfo) rpms / dpkgs(?) so I hope you are basically 1 command away of being able to debug it further what happens in ExecInitMemoize()
>> Those packages seem to be safe as they modify only /usr/lib/debug so should not have any impact on production workload.
>
> I just have to wait for admin action - I have no root rights for the server.

Looking at ExecInitMemoize() it's hard to see what could take such a
long time other than the build_hash_table().  Tom did mention this,
but I can't quite see how the size given to that function could be
larger than 91 in your case.

If you get the debug symbols installed, can you use gdb to

break nodeMemoize.c:268
p size

maybe there's something I'm missing following the code and maybe there
is some way that est_entries is not set to what I thought it was.

It would also be good to see the same perf report again after the
debug symbols are installed in order to resolve those unresolved
function names.

Breakpoint 1, build_hash_table (size=4369066, mstate=0xfc7f08) at nodeMemoize.c:268
268 if (size == 0)
(gdb) p size
$1 = 4369066

This is work_mem size
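
That matches a back-of-envelope check (my own arithmetic, and the ~96-byte per-entry estimate is an assumption):
the size is the hash_mem-based cap, not anything close to the 91 expected keys.

#include <stdio.h>

int
main(void)
{
	long		work_mem_bytes = 400L * 1024 * 1024;	/* work_mem = 400MB, hash_mem_multiplier = 1 */
	long		entry_bytes = 96;						/* assumed per-entry estimate */

	printf("%ld\n", work_mem_bytes / entry_bytes);		/* prints 4369066 */
	return 0;
}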

+   99,92%     0,00%  postmaster  postgres               [.] ServerLoop                                                                  ▒
+   99,92%     0,00%  postmaster  postgres               [.] PostgresMain                                                                ▒
+   99,92%     0,00%  postmaster  postgres               [.] exec_simple_query                                                           ▒
+   99,70%     0,00%  postmaster  postgres               [.] PortalRun                                                                   ▒
+   99,70%     0,00%  postmaster  postgres               [.] FillPortalStore                                                             ▒
+   99,70%     0,02%  postmaster  postgres               [.] PortalRunUtility                                                            ▒
+   99,68%     0,00%  postmaster  pg_stat_statements.so  [.] 0x00007f5579b599c6                                                          ▒
+   99,68%     0,00%  postmaster  postgres               [.] standard_ProcessUtility                                                     ▒
+   99,68%     0,00%  postmaster  postgres               [.] ExplainQuery                                                                ◆
+   99,63%     0,00%  postmaster  postgres               [.] ExplainOneQuery                                                             ▒
+   99,16%     0,00%  postmaster  postgres               [.] ExplainOnePlan                                                              ▒
+   99,06%     0,00%  postmaster  pg_stat_statements.so  [.] 0x00007f5579b5ad2c                                                          ▒
+   99,06%     0,00%  postmaster  postgres               [.] standard_ExecutorStart                                                      ▒
+   99,06%     0,00%  postmaster  postgres               [.] InitPlan (inlined)                                                          ▒
+   99,06%     0,00%  postmaster  postgres               [.] ExecInitNode                                                                ▒
+   99,06%     0,00%  postmaster  postgres               [.] ExecInitNestLoop                                                            ▒
+   99,00%     0,02%  postmaster  postgres               [.] ExecInitMemoize                                                             ▒
+   98,87%    26,80%  postmaster  libc-2.28.so           [.] __memset_avx2_erms                                                          ▒
+   98,87%     0,00%  postmaster  postgres               [.] build_hash_table (inlined)                                                  ▒
+   98,87%     0,00%  postmaster  postgres               [.] memoize_create (inlined)                                                    ▒
+   98,87%     0,00%  postmaster  postgres               [.] memoize_allocate (inlined)                                                  ▒
+   98,87%     0,00%  postmaster  postgres               [.] MemoryContextAllocExtended                                                  ▒
+   72,08%    72,08%  postmaster  [unknown]              [k] 0xffffffffbaa010e0
     0,47%     0,00%  postmaster  postgres               [.] pg_plan_query
     0,47%     0,00%  postmaster  pg_stat_statements.so  [.] 0x00007f5579b59ba4
     0,47%     0,00%  postmaster  postgres               [.] standard_planner
     0,47%     0,00%  postmaster  postgres               [.] subquery_planner
     0,47%     0,00%  postmaster  postgres               [.] grouping_planner
     0,47%     0,00%  postmaster  postgres               [.] query_planner
 

David

Re: strange slow query - lost lot of time somewhere

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> Breakpoint 1, build_hash_table (size=4369066, mstate=0xfc7f08) at
> nodeMemoize.c:268
> 268 if (size == 0)
> (gdb) p size
> $1 = 4369066

Uh-huh ....

            regards, tom lane



Re: strange slow query - lost lot of time somewhere

From
David Rowley
Date:
On Fri, 6 May 2022 at 17:52, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Breakpoint 1, build_hash_table (size=4369066, mstate=0xfc7f08) at nodeMemoize.c:268
> 268 if (size == 0)
> (gdb) p size
> $1 = 4369066

Thanks for the report.  I think I now see the problem.  Looking at
[1], it seems that's a bushy plan. That's fine, but less common than a
left deep plan.

I think the problem is down to some incorrect code in
get_memoize_path() where we pass the wrong value of "calls" to
create_memoize_path(). I think it should be outer_path->rows
instead of outer_path->parent->rows.
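
A sketch of that one-line change in get_memoize_path() (based only on the description above, not the actual
committed patch; the middle arguments are elided, and "calls" is the final argument):

 	/* joinpath.c, get_memoize_path() -- sketch only */
-	create_memoize_path(root, innerrel, /* ... */, outer_path->parent->rows);
+	create_memoize_path(root, innerrel, /* ... */, outer_path->rows);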

If you look closely at the plan, you'll see that the outer side of the
inner-most Nested Loop is parameterized by some higher-level nested
loop.

->  Nested Loop  (cost=1.14..79.20 rows=91 width=8) (actual time=0.024..0.024 rows=1 loops=66)
                     ->  Index Only Scan using uq_isi_itemid_itemimageid on item_share_image itemsharei2__1  (cost=0.57..3.85 rows=91 width=16) (actual time=0.010..0.010 rows=1 loops=66)
                           Index Cond: (item_id = itembo0_.id)
                           Heap Fetches: 21
                     ->  Memoize  (cost=0.57..2.07 rows=1 width=8) (actual time=0.013..0.013 rows=1 loops=66)

So it's not passing 91 to create_memoize_path() as I thought. Since
I can't see any WHERE clause items filtering rows from the
itemsharei2__1 relation, the outer_path->parent->rows value should
be whatever pg_class.reltuples says.

Are you able to send the results of:

explain select item_id from item_share_image group by item_id; -- I'm
interested in the estimated number of groups in the plan's top node.

select reltuples from pg_class where oid = 'item_share_image'::regclass;

I'm expecting the estimated number of rows in the top node of the
group by plan to be about 4369066.

David

[1] https://explain.depesz.com/s/2rBw#source



Re: strange slow query - lost lot of time somewhere

From
David Rowley
Date:
On Fri, 6 May 2022 at 20:04, David Rowley <dgrowleyml@gmail.com> wrote:
> Thanks for the report.  I think I now see the problem.  Looking at
> [1], it seems that's a bushy plan. That's fine, but less common than a
> left deep plan.

On second thoughts, it does not need to be a bushy plan for the outer
side of the nested loop to be parameterized by some higher-level
nested loop. There's an example of a plan like this in the regression
tests.

regression=# explain (analyze, costs off, summary off)
regression-# select * from tenk1 t1 left join
regression-# (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
regression-#  on t1.hundred = t2.hundred and t1.ten = t3.ten
regression-# where t1.unique1 = 1;
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=0.258..0.487 rows=20 loops=1)
   ->  Index Scan using tenk1_unique1 on tenk1 t1 (actual time=0.049..0.049 rows=1 loops=1)
         Index Cond: (unique1 = 1)
   ->  Nested Loop (actual time=0.204..0.419 rows=20 loops=1)
         Join Filter: (t1.ten = t3.ten)
         Rows Removed by Join Filter: 80
         ->  Bitmap Heap Scan on tenk1 t2 (actual time=0.064..0.194 rows=100 loops=1)
               Recheck Cond: (t1.hundred = hundred)
               Heap Blocks: exact=86
               ->  Bitmap Index Scan on tenk1_hundred (actual time=0.036..0.036 rows=100 loops=1)
                     Index Cond: (hundred = t1.hundred)
         ->  Memoize (actual time=0.001..0.001 rows=1 loops=100)
               Cache Key: t2.thousand
               Cache Mode: logical
               Hits: 90  Misses: 10  Evictions: 0  Overflows: 0  Memory Usage: 4kB
               ->  Index Scan using tenk1_unique2 on tenk1 t3 (actual time=0.009..0.009 rows=1 loops=10)
                     Index Cond: (unique2 = t2.thousand)
(17 rows)

debugging this I see that the Memoize plan won because it was passing
10000 as the number of calls. It should have been passing 100.  The
Memoize node's number of loops agrees with that. Fixing the calls to
correctly pass 100 gets rid of the Memoize node.

I've attached a patch to fix.  I'll look at it in more detail after the weekend.

I'm very tempted to change the EXPLAIN output in at least master to
display the initial and final (maximum) hash table sizes. Wondering if
anyone would object to that?

David

Attachment

Re: strange slow query - lost lot of time somewhere

From
Pavel Stehule
Date:


pá 6. 5. 2022 v 10:05 odesílatel David Rowley <dgrowleyml@gmail.com> napsal:
On Fri, 6 May 2022 at 17:52, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Breakpoint 1, build_hash_table (size=4369066, mstate=0xfc7f08) at nodeMemoize.c:268
> 268 if (size == 0)
> (gdb) p size
> $1 = 4369066

Thanks for the report.  I think I now see the problem.  Looking at
[1], it seems that's a bushy plan. That's fine, but less common than a
left deep plan.

I think the problem is down to some incorrect code in
get_memoize_path() where we pass the wrong value of "calls" to
create_memoize_path(). I think instead of outer_path->parent->rows it
instead should be outer_path->rows.

If you look closely at the plan, you'll see that the outer side of the
inner-most Nested Loop is parameterized by some higher-level nested
loop.

->  Nested Loop  (cost=1.14..79.20 rows=91 width=8) (actual
time=0.024..0.024 rows=1 loops=66)
                     ->  Index Only Scan using
uq_isi_itemid_itemimageid on item_share_image itemsharei2__1
(cost=0.57..3.85 rows=91 width=16) (actual time=0.010..0.010 rows=1
loops=66)
                           Index Cond: (item_id = itembo0_.id)
                           Heap Fetches: 21
                     ->  Memoize  (cost=0.57..2.07 rows=1 width=8)
(actual time=0.013..0.013 rows=1 loops=66)

so instead of passing 91 to create_memoize_path() as I thought. Since
I can't see any WHERE clause items filtering rows from the
itemsharei2__1 relation, then the outer_path->parent->rows is should
be whatever pg_class.reltuples says.

Are you able to send the results of:

explain select item_id from item_share_image group by item_id; -- I'm
interested in the estimated number of groups in the plan's top node.

 

select reltuples from pg_class where oid = 'item_share_image'::regclass;

I'm expecting the estimated number of rows in the top node of the
group by plan to be about 4369066.
 
(2022-05-06 12:30:23) prd_aukro=# explain select item_id from item_share_image group by item_id;
                                                            QUERY PLAN                                                              
────────────────────────────────────────────────────────────────────────────
Finalize HashAggregate  (cost=1543418.63..1554179.08 rows=1076045 width=8)
  Group Key: item_id
  ->  Gather  (cost=1000.57..1532658.18 rows=4304180 width=8)
        Workers Planned: 4
        ->  Group  (cost=0.57..1101240.18 rows=1076045 width=8)
              Group Key: item_id
              ->  Parallel Index Only Scan using ixfk_isi_itemid on item_share_image  (cost=0.57..1039823.86 rows=24566530 width=8)
(7 řádek)

Čas: 1,808 ms
(2022-05-06 12:30:26) prd_aukro=# select reltuples from pg_class where oid = 'item_share_image'::regclass;
 reltuples  
────────────
9.826612e+07
(1 řádka)

Čas: 0,887 ms
 
Regards

Pavel


David

[1] https://explain.depesz.com/s/2rBw#source

Re: strange slow query - lost lot of time somewhere

From
Justin Pryzby
Date:
On Tue, May 03, 2022 at 02:13:18PM +1200, David Rowley wrote:
> I'm wishing I put the initial hash table size and the final hash table
> size in EXPLAIN + EXPLAIN ANALYZE now. Perhaps it's not too late for
> v15 to do that so that it might help us figure things out in the
> future.

On Fri, May 06, 2022 at 09:27:57PM +1200, David Rowley wrote:
> I'm very tempted to change the EXPLAIN output in at least master to
> display the initial and final (maximum) hash table sizes. Wondering if
> anyone would object to that?

No objection to adding it to v15.

I'll point out that "Cache Mode" was added to EXPLAIN between 14.1 and 14.2
without controversy, so this could conceivably be backpatched to v14, too.

commit 6c32c0977783fae217b5eaa1d22d26c96e5b0085
Author: David Rowley <drowley@postgresql.org>
Date:   Wed Nov 24 10:07:38 2021 +1300

    Allow Memoize to operate in binary comparison mode



Re: strange slow query - lost lot of time somewhere

From
David Rowley
Date:
On Tue, 10 May 2022 at 14:22, Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Fri, May 06, 2022 at 09:27:57PM +1200, David Rowley wrote:
> > I'm very tempted to change the EXPLAIN output in at least master to
> > display the initial and final (maximum) hash table sizes. Wondering if
> > anyone would object to that?
>
> No objection to adding it to v15.
>
> I'll point out that "Cache Mode" was added to EXPLAIN between 14.1 and 14.2
> without controversy, so this could conceivably be backpatched to v14, too.
>
> commit 6c32c0977783fae217b5eaa1d22d26c96e5b0085

This is seemingly a good point, but I don't really think it's a case
of just keeping the EXPLAIN output stable in minor versions; it's more
about adding new fields to structs.

I just went and wrote the patch, and the fundamental difference seems
to be that what I did in 6c32c0977 managed to add the new field in the
empty padding between two existing fields. That meant no field in the
struct was pushed to a higher address offset. The idea is not to break
any already-compiled extension that references some field located after
the new one.
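
As a stand-alone illustration of the padding point (the structs below
are invented for the example rather than taken from PostgreSQL; on a
typical 64-bit ABI the new bool slots into the 4 bytes of padding after
the int, so the later field keeps its offset):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct OldState
{
    int     nentries;       /* bytes 0-3, then 4 bytes of padding */
    double  total_cost;     /* 8-byte aligned, so offset 8 */
} OldState;

typedef struct NewState
{
    int     nentries;
    bool    binary_mode;    /* new field lands in the old padding */
    double  total_cost;     /* still at offset 8, so old offsets survive */
} NewState;

int
main(void)
{
    /* On common 64-bit platforms this prints "old: 8  new: 8". */
    printf("old: %zu  new: %zu\n",
           offsetof(OldState, total_cost),
           offsetof(NewState, total_cost));
    return 0;
}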

In the patch I've just written, I've had to add some fields that cause
sizeof(MemoizeState) to go up, resulting in the offsets of some later
fields changing.

One thing I'll say about this patch is that I found it annoying that I
had to add code to cache_lookup() for the case where we fail to find an
entry. That's probably not the end of the world, as it only affects
cache misses.  Ideally, I'd just look at the size of the hash table at
the end of execution; however, naturally, we must show the EXPLAIN
output before we shut down the executor.
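
The extra code on a miss amounts to high-water-mark bookkeeping,
something like the following (illustrative names only, not the actual
MemoizeState fields):

#include <stdint.h>

typedef struct CacheSizeStats
{
    uint64_t    nbuckets;       /* current hash table size */
    uint64_t    max_nbuckets;   /* high-water mark to report in EXPLAIN */
} CacheSizeStats;

/*
 * Update the stats on each cache miss, after the new entry has been
 * inserted and the table has possibly grown, since EXPLAIN runs before
 * the executor shuts down and cannot simply read the final size then.
 */
static void
record_table_size(CacheSizeStats *stats, uint64_t nbuckets_now)
{
    stats->nbuckets = nbuckets_now;
    if (nbuckets_now > stats->max_nbuckets)
        stats->max_nbuckets = nbuckets_now;
}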

I just copied the Hash Join output. It looks like:

# alter table tt alter column a set (n_distinct=4);
ALTER TABLE
# analyze tt;
# explain (analyze, costs off, timing off) select * from tt inner join
t2 on tt.a=t2.a;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Nested Loop (actual rows=1000000 loops=1)
   ->  Seq Scan on tt (actual rows=1000000 loops=1)
   ->  Memoize (actual rows=1 loops=1000000)
         Cache Key: tt.a
         Cache Mode: logical
         Hits: 999990  Misses: 10  Evictions: 0  Overflows: 0  Memory Usage: 2kB
         Hash Buckets: 16 (originally 4)
         ->  Index Only Scan using t2_pkey on t2 (actual rows=1 loops=10)
               Index Cond: (a = tt.a)
               Heap Fetches: 0
 Planning Time: 0.483 ms
 Execution Time: 862.860 ms
(12 rows)

Does anyone have any views about the attached patch going into v15?

David

Attachment

Re: strange slow query - lost lot of time somewhere

From
David Rowley
Date:
On Fri, 6 May 2022 at 21:27, David Rowley <dgrowleyml@gmail.com> wrote:
> I've attached a patch to fix.  I'll look at it in more detail after the weekend.

I've now pushed this fix to master and backpatched to 14.

David



Re: strange slow query - lost lot of time somewhere

From
Pavel Stehule
Date:


On Mon, 16 May 2022 at 6:11, David Rowley <dgrowleyml@gmail.com> wrote:
On Fri, 6 May 2022 at 21:27, David Rowley <dgrowleyml@gmail.com> wrote:
> I've attached a patch to fix.  I'll look at it in more detail after the weekend.

I've now pushed this fix to master and backpatched to 14.

Thank you

Pavel
 

David