Re: Should we update the random_page_cost default value? - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Should we update the random_page_cost default value?
Date
Msg-id ke6nnme7eiwf3m2whrxwj6s2zbzkoex2tvwohd6bcshs4mkdva@e5irn76rhcrj
Whole thread Raw
In response to Re: Should we update the random_page_cost default value?  (Laurenz Albe <laurenz.albe@cybertec.at>)
List pgsql-hackers
Hi,

On 2025-10-06 07:34:53 +0200, Laurenz Albe wrote:
> However, my practical experience is that PostgreSQL tends to favor
> sequential scans too much.  Often, that happens together with a parallel
> plan, and I find PostgreSQL with the default configuration prefer a plan
> with two parallel workers performing a sequential scan with a ridiculously
> selective (correctly estimated!) filter condition like 500 rows out of a
> million over an index scan that is demonstrably faster.
>
> I have no artificial reproducer for that, and I admit that I didn't hunt
> down the reason why the planner might prefer such a plan.  I just tell
> people to lower random_page_cost, and the problem goes away.  So I am
> clearly fighting symptoms.  Often, an alternative solution is to set
> max_parallel_workers_per_gather to 0, which seems to suggest that perhaps
> the suggestion from [1] is more interesting than I thought.

I've seen this quite often too. IIRC in the cases I've actually analyzed in
any depth it came down to a few root causes:

1) Fast start plans (where the planner though that a sequential scan will find
   a matching tuple quickly, but doesn't, leading to scanning most of the
   table). I frankly think we should just disable these, they're a very low
   confidence bet with high costs in the case of a loss.

2) Not taking the likelihood of data already being cached into account leads
   to preferring sequential scans due to seq_page_cost, even though the index
   scan would not have required any IO

3) Our costing for the cost of predicate evaluation is extremely poor. Among
   the reasons are

   - There is no difference in cost between common operators, despite
     significant real evaluation cost. E.g. int and text operators are not
     close in evaluation cost.

   - IIRC we disregard the cost of potentially needing to detoast completely,
     despite that very easily becoming the determining factor

   - Tuple deforming cost. It's a lot more CPU intensive to deform column 105
     than column 5, often the index might avoid needing to do the more
     epensive deforming, but we don't take that into account.

   This often leads to under-estimating the CPU cost of seqscans.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: split func.sgml to separated individual sgml files
Next
From: Bruce Momjian
Date:
Subject: Re: Should we update the random_page_cost default value?