Re: Should we update the random_page_cost default value? - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Should we update the random_page_cost default value?
Date
Msg-id 5qum3e5rzblbuw4m5cb7g6l5d6y2gip4ltz6pouutqkljc645r@ibvg5dygvqgl
Whole thread Raw
In response to Re: Should we update the random_page_cost default value?  (Greg Sabino Mullane <htamfids@gmail.com>)
Responses Re: Should we update the random_page_cost default value?
List pgsql-hackers
Hi,

On 2025-10-07 15:20:37 -0400, Greg Sabino Mullane wrote:
> On Tue, Oct 7, 2025 at 3:15 PM Greg Sabino Mullane <htamfids@gmail.com>
> wrote:
> 
> > One of the take-away lessons from this thread for me is that the TPC-*
> >> benchmarks are far removed from real world queries. (Maybe if we ask an LLM
> >> to use an ORM to implement TPC-H? Ha ha ha!)
> >
> >
> To be clear, I'm saying that TPC queries are written by sane adults that
> know what they are doing, but perhaps lowering rpc tends to help more when
> the queries are not well-written SQL (which many consultants would argue is
> the majority of production queries).

I think this discrepancy is largely due to the fact that Tomas' is testing
with a cold cache (he has numbers for both), whereas most production workloads
have very high cache hit ratios.  Also most production postgres workloads are
not heavily on the analytics side, in semi-transactional workloads switching
to a sequential scan "too early" is *way way* worse than staying with a index
scan for a bit longer than makes sense. The switch to a seqscan will often
make the query dramatically more expensive, whereas staying with the index
scan increases costs incrementally.  Add to that the bane of fast-start plans
that really can't be disabled other than making seq scans relatively more
expensive...

I rather doubt we'll find a particularly satisfying answer to this without a
better answer to take into account how many blocks will already be cached at
the start of query execution - we only really model keeping blocks cached
across repeated accesses within one query execution.  Just lowering
seq_page_cost & random_page_cost to account for IO being cheap (due to being
cached) is a pretty bad answer, as it doesn't really allow for accurate
costing of some queries having high hit ratio and others a poor one.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Sami Imseih
Date:
Subject: Re: Add mode column to pg_stat_progress_vacuum
Next
From: Paul Ramsey
Date:
Subject: Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options