Re: increasing effective_cache_size slows down join queries by a factor of 4000x - Mailing list pgsql-general

From Tomas Vondra
Subject Re: increasing effective_cache_size slows down join queries by a factor of 4000x
Date
Msg-id ee5300c3-7f49-ed52-0e7f-33563f240ba8@enterprisedb.com
Whole thread Raw
In response to Re: increasing effective_cache_size slows down join queries by a factor of 4000x  (A Shaposhnikov <artyom@gmail.com>)
List pgsql-general

On 2/15/22 01:06, A Shaposhnikov wrote:
> 
> Interestingly I have a second PG 14.2 database, with identical table 
> definitions, but about 10% smaller row counts, and the exact same query 
> works fast there without the 2nd condition:
> 

Are you sure about the 10%? Because in the plans from the first machine 
I see this:

 >
 > -> Index Scan using team_pkey on team t (cost=0.57..11382381.88
 > rows=78693167 width=175) (actual time=0.016..0.695 rows=854 loops=1)
 >

while the second machine does this:

> 
> -> Index Scan using team_pkey on team t (cost=0.57..2366113.83 
> rows=2807531 width=160) (actual time=0.031..0.801 rows=888 loops=1)
> 

That's 2.8M vs. 78M, quite far from "10% difference". Not sure about 
team_aliases table, that's imposible to say from the plans.

This may matter a lot, because we use effective cache size to calculate 
cache hit ratio for the query, with relation sizes as an input. So 
smaller relations (or larger effective_cache_size) means cheaper random 
I/O, hence preference for nested loop join.

The other thing is data distribution - that may matter too.


IMO it's pointless to investigate this further - we know what's causing 
the issue. The optimizer is oblivious that merge join will have to skip 
large part of the second input, due to the implicit condition. Notice 
that adding the condition changes the cost from:

  Limit (cost=81.33..331.82 rows=1000 width=183) ...

to

  Limit (cost=81.33..720.48 rows=1000 width=183) ...

So it seems *more* expensive than the first plan. Taken to the extreme 
the planner could theoretically have chosen to use the first plan (and 
delay the condition until after the join).

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-general by date:

Previous
From: Marc Millas
Date:
Subject: Re: Moving the master to a new server
Next
From: Glen Eustace
Date:
Subject: Re: Moving the master to a new server