Re: using memoize in in paralel query decreases performance - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: using memoize in in paralel query decreases performance
Date	March 7, 2023 09:46:35
Msg-id	CAApHDvpehwetcdZaC3c=kn3bzpCOeU-o0xukK5cRCr5URONmSw@mail.gmail.com
In response to	Re: using memoize in in paralel query decreases performance (Pavel Stehule <pavel.stehule@gmail.com>)
Responses	Re: using memoize in in paralel query decreases performance
List	pgsql-hackers

On Tue, 7 Mar 2023 at 22:09, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I can live with it. This is an analytical query and the performance is not too important for us. I was surprised that
> the performance was about 25% worse, and so the hit ratio was almost zero. I am thinking, but I am not sure if the
> estimation of the effectiveness of memoization can depend (or should depend) on the number of workers? In this case the
> number of workers is high.

The costing for Memoize takes the number of workers into account by
way of the change in expected input rows.  The number of estimated
input rows is effectively just divided by the number of parallel
workers, so if we expect 1 million rows from the outer side of the
join and 4 workers, then we'll assume the Memoize node will deal with
250,000 rows per worker.  If the n_distinct estimate for the cache key
is 500,000, then Memoize is not going to look very attractive there.
In reality, estimate_num_groups() won't say the number of groups is
higher than the number of input rows, but with all the other overheads
factored into the costs, Memoize would never look favourable if the
planner thought there were never going to be any repeated values.
The expected cache hit ratio there would be zero.
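
To make the arithmetic above concrete, here is a rough Python sketch of how
the expected hit ratio falls to zero in that example.  It is only loosely
modelled on the planner's Memoize costing and is not the actual code; the
function name, its parameters, and the assumption that the cache can hold
every distinct key are all made up for illustration.

```python
def estimated_hit_ratio(outer_rows, workers, n_distinct, cache_entries=None):
    """Simplified, hypothetical model of the expected Memoize cache hit ratio.

    Assumptions (not taken from the PostgreSQL source):
      - per-worker input rows are just outer_rows / workers
      - the distinct-key estimate is clamped to the per-worker row count
      - the cache is large enough for every key unless cache_entries is given
    """
    if outer_rows <= 0 or workers <= 0 or n_distinct <= 0:
        raise ValueError("all inputs must be positive")

    # Rows each worker expects to feed through its own Memoize node.
    calls = outer_rows / workers

    # A group estimate is never higher than the number of input rows.
    ndistinct = min(n_distinct, calls)

    # Number of distinct keys we expect to be able to keep cached.
    if cache_entries is None:
        cache_entries = ndistinct
    est_entries = min(ndistinct, cache_entries)

    # Fraction of lookups that repeat a key, scaled by how many of the
    # distinct keys actually fit in the cache.
    hit_ratio = ((calls - ndistinct) / calls) * (est_entries / ndistinct)
    return max(hit_ratio, 0.0)


# 1 million outer rows, 4 workers, n_distinct = 500,000:
# 250,000 rows per worker, clamped to 250,000 distinct keys, no repeats.
print(estimated_hit_ratio(1_000_000, 4, 500_000))  # 0.0

# The same estimates without parallelism leave room for repeated keys.
print(estimated_hit_ratio(1_000_000, 1, 500_000))  # 0.5
```

Under these assumptions, the same n_distinct estimate that gives a 50% hit
ratio in a serial plan gives 0% once the input rows are split four ways.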

David
