Re: using memoize in in paralel query decreases performance - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: using memoize in in paralel query decreases performance
Date
Msg-id CAFj8pRBnvSZBQrnyfp5ZPV6f5GrpwdTC+ACy7i_TKPa0fYV_wQ@mail.gmail.com
Whole thread Raw
In response to Re: using memoize in in paralel query decreases performance  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: using memoize in in paralel query decreases performance
List pgsql-hackers


út 7. 3. 2023 v 9:58 odesílatel David Rowley <dgrowleyml@gmail.com> napsal:
 /On Tue, 7 Mar 2023 at 21:09, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> po 6. 3. 2023 v 22:52 odesílatel David Rowley <dgrowleyml@gmail.com> napsal:
>> I wonder if the additional work_mem required for Memoize is just doing
>> something like causing kernel page cache evictions and leading to
>> fewer buffers for ixfk_ite_itemcategoryid and the item table being
>> cached in the kernel page cache.
>>
>> Maybe you could get an idea of that if you SET track_io_timing = on;
>> and EXPLAIN (ANALYZE, BUFFERS) both queries.
>
>
> https://explain.depesz.com/s/vhk0

This is the enable_memoize=on one. The I/O looks like:

Buffers: shared hit=105661309 read=15274264 dirtied=15707 written=34863
I/O Timings: shared/local read=2671836.341 write=1286.869

2671836.341 / 15274264 = ~0.175 ms per read.

> https://explain.depesz.com/s/R5ju

This is the faster enable_memoize = off one. The I/O looks like:

Buffers: shared hit=44542473 read=18541899 dirtied=11988 written=18625
I/O Timings: shared/local read=1554838.583 write=821.477

1554838.583 / 18541899 = ~0.084 ms per read.

That indicates that the enable_memoize=off version is just finding
more pages in the kernel's page cache than the slower query.  The
slower query just appears to be under more memory pressure causing the
kernel to have less free memory to cache useful pages. I don't see
anything here that indicates any problems with Memoize.  Sure the
statistics could be better as, ideally, the Memoize wouldn't have
happened for the i_2 relation. I don't see anything that indicates any
bugs with this, however. It's pretty well known that Memoize puts
quite a bit of faith into ndistinct estimates. If it causes too many
issues the enable_memoize switch can be turned to off.

You might want to consider experimenting with smaller values of
work_mem and/or hash_mem_multiplier for this query or just disabling
memoize altogether.

I can live with it. This is an analytical query and the performance is not too important for us. I was surprised that the performance was about 25% worse, and so the hit ratio was almost zero. I am thinking, but I am not sure if the estimation of the effectiveness of memoization can depend (or should depend) on the number of workers? In this case the number of workers is high.



David

pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Add pg_walinspect function with block info columns
Next
From: Daniel Gustafsson
Date:
Subject: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)