Re: strange slow query - lost lot of time somewhere - Mailing list pgsql-hackers

From David Rowley
Subject Re: strange slow query - lost lot of time somewhere
Date
Msg-id CAApHDvqk+MDmSHhVwfhnsXYDvHoED-f0xO8kqfMRrg0n__EtaQ@mail.gmail.com
In response to Re: strange slow query - lost lot of time somewhere  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-hackers
On Tue, 3 May 2022 at 15:22, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> Plugging in:
> n = 2,000
> e = 500
> c = 10,000
>
> proper = 5%
> incorrect = 25%
>
> But of the 10,000 calls we will receive, the first 2,000 will be
> misses while 2,000 of the remaining 8,000 will be hits, due to sharing
> 2,000 distinct groups among the available inventory of 500 (25% of
> 8,000 is 2,000).  2,000 hits in 10,000 calls yields 20%.
>
> I believe the correct formula to be:
>
> ((calls - ndistinct) / calls) * (est_entries / ndistinct) = hit_ratio
> .80 * .25 = .20

I think you're correct here.  The formula should be what you propose.
However, two things: 1) this being incorrect is not the cause of the
original problem reported on this thread, and 2) there's just no way we
could consider fixing this in v15, let alone back-patch it to v14.
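
For anyone who wants to reproduce the arithmetic, here's a tiny
standalone sketch that plugs the quoted numbers into that formula.
Plain C, just for illustration; this is not the actual
cost_memoize_rescan() code:

#include <stdio.h>

int
main(void)
{
	double		ndistinct = 2000.0;		/* n: distinct cache keys */
	double		est_entries = 500.0;	/* e: entries that fit in cache */
	double		calls = 10000.0;		/* c: expected lookups */

	/*
	 * Formula from the quoted mail: each distinct key misses once, and
	 * the remaining calls hit with probability est_entries / ndistinct.
	 */
	double		hit_ratio = ((calls - ndistinct) / calls) *
							(est_entries / ndistinct);

	printf("hit_ratio = %.2f\n", hit_ratio);	/* 0.80 * 0.25 = 0.20 */

	return 0;
}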

Maybe we should open a new thread about this, put an entry in the
first CF for v16 under bugs, and come back to it after we branch.
Believing the cache hit ratio is lower than it actually will be reduces
the chances of the planner choosing a Nested Loop / Memoize plan over a
Hash or Merge Join plan.
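
To see why that matters for plan choice, here's a deliberately
simplified model of rescan costing.  The per-call costs are made-up
placeholders, and the real logic in cost_memoize_rescan() is more
involved, but it shows the direction of the effect:

#include <stdio.h>

/*
 * Simplified rescan costing, only to illustrate the effect of the hit
 * ratio: a hit is nearly free, a miss pays to rescan the inner side.
 */
static double
memoize_total_cost(double hit_ratio, double calls)
{
	double		hit_cost = 0.01;	/* hypothetical cost of a cache hit */
	double		miss_cost = 5.0;	/* hypothetical cost of rescanning
									 * the inner side on a miss */

	return calls * (hit_ratio * hit_cost + (1.0 - hit_ratio) * miss_cost);
}

int
main(void)
{
	double		calls = 10000.0;

	printf("cost at 5%% hits:  %.0f\n", memoize_total_cost(0.05, calls));
	printf("cost at 20%% hits: %.0f\n", memoize_total_cost(0.20, calls));

	return 0;
}

With those placeholder costs, moving from a 5% to a 20% hit ratio makes
the Memoize total roughly 16% cheaper (47505 vs 40020), which is the
sort of difference that can tip the planner's choice.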

I was already fairly concerned that Memoize could cause performance
regressions when the ndistinct value or the expected cache entry size
is underestimated, or when the outer side's row count is overestimated.
What I've got to calculate the cache hit ratio does seem incorrect
given what you're showing; however, it adds an element of pessimism
that reduces the chances of a bad plan being picked when work_mem is
too low to cache all entries.  Swapping it out for your formula would
increase the chances of a Memoize plan being picked when the row,
ndistinct and cache entry size estimates are correct, but it could also
increase the chance of a bad plan being picked in cases where those
estimates are incorrect.
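
To make that risk concrete, here's the same formula evaluated under a
hypothetical misestimate: the planner thinks there are 2,000 distinct
keys, but the executor actually sees 8,000 (using the same hit-ratio
model for the "actual" figure):

#include <stdio.h>

/* hit ratio formula from upthread */
static double
hit_ratio(double ndistinct, double est_entries, double calls)
{
	return ((calls - ndistinct) / calls) * (est_entries / ndistinct);
}

int
main(void)
{
	/*
	 * Hypothetical misestimate: planned ndistinct is 2,000, run-time
	 * ndistinct is 8,000, and the cache still only fits 500 entries.
	 */
	printf("planned hit_ratio = %.2f\n", hit_ratio(2000.0, 500.0, 10000.0));
	printf("actual hit_ratio  = %.2f\n", hit_ratio(8000.0, 500.0, 10000.0));

	return 0;
}

A plan chosen on the assumption of a 20% hit ratio would see barely 1%
of lookups hit the cache at run time.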

My problem with changing this now would be that we already often
perform Nested Loop joins when a Hash or Merge Join would be a better
option.  I'd hate to take us in a direction where we make that problem
even worse.

David


