Re: Dynamic gathering the values for seq_page_cost/xxx_cost - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Dynamic gathering the values for seq_page_cost/xxx_cost
Msg-id 20191127164821.lspxyrf3c5r6zu5n@development
In response to Dynamic gathering the values for seq_page_cost/xxx_cost  (Andy Fan <zhihui.fan1213@gmail.com>)
Responses Re: Dynamic gathering the values for seq_page_cost/xxx_cost  (Andy Fan <zhihui.fan1213@gmail.com>)
List pgsql-hackers
On Tue, Nov 26, 2019 at 08:59:22AM +0800, Andy Fan wrote:
>The optimizer cost model usually needs two inputs: one represents the data
>distribution, and the other represents the capacity of the hardware, such
>as CPU/IO; let's call the latter "system stats".
>
>In Oracle Database, system stats can be gathered on the running hardware
>with dbms_stats.gather_system_stats [1]. In PostgreSQL, the values are set
>based on experience (users can change them as well, but it is hard to
>decide which values to use). The PostgreSQL way is not perfect in theory
>(in practice it may or may not be good enough). For example, HDDs and SSDs
>have different characteristics with regard to seq_page_cost/random_page_cost,
>and CPU cost may also differ across hardware.
>
>I ran into a paper [2] that researched dynamically gathering the values
>for xxx_cost, and it looks interesting. However, it doesn't provide the
>code for others to do further research. Before I dive into this, it would
>be great to hear some suggestions from experts.
>
>So what do you think about this method? Have we had any discussion about
>this before, and if so, what was the result?
>

IMHO it would be great to have a tool that helps with tuning those
parameters, particularly random_page_cost. I'm not sure how feasible it
is, but if you're willing to do some initial experiments and research, I
think it's worth looking into.

It's going to be challenging, though, because even random_page_cost=4
does not match the "raw" characteristics of any existing hardware. On old
drives the sequential/random difference is way worse; on SSDs it's about
right. But then again, we know random_page_cost=1.5 or so works mostly
fine on SSDs, and that's much lower than the raw numbers would suggest.
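
To make the effect of these values concrete, here is a deliberately simplified model (an assumption for illustration only, not the planner's actual index-scan costing, which also accounts for caching, correlation and CPU costs): fetching k of a table's P pages in random order costs k * random_page_cost, while a full sequential scan costs P * seq_page_cost. The break-even fraction k/P is then seq_page_cost / random_page_cost, so the chosen random_page_cost directly moves the point at which a sequential scan wins.

```python
def breakeven_fraction(seq_page_cost: float, random_page_cost: float) -> float:
    """Fraction of a table's pages above which a full sequential scan is
    cheaper than fetching those pages randomly, under the simplified model
    k * random_page_cost vs. P * seq_page_cost (an illustrative assumption)."""
    return seq_page_cost / random_page_cost


if __name__ == "__main__":
    # Compare the default (4.0) with values typically used on SSDs.
    for rpc in (4.0, 1.5, 1.1):
        frac = breakeven_fraction(1.0, rpc)
        print(f"random_page_cost={rpc}: seq scan cheaper above {frac:.0%} of pages")
```

With the default of 4.0 the model switches to a sequential scan once about a quarter of the pages are needed; at 1.5 it tolerates roughly two thirds, which is one way to see why lowering the value on SSDs changes plans.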

So it's clearly one thing to measure HW capabilities, and it's another
thing to conclude what the parameters should be ...
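
Measuring the raw hardware side could start with something like the hypothetical read_ratio() helper below (it is not an existing PostgreSQL tool). It times 8 kB reads at sequential versus random offsets of a file and returns their ratio. It assumes a POSIX system (os.pread), and as written it mostly measures the OS page cache; a meaningful number would need a file much larger than RAM or a cold cache. Even then it only yields the raw ratio, which is not the same thing as a good random_page_cost.

```python
import os
import random
import tempfile
import time

BLOCK = 8192  # PostgreSQL's default block size (BLCKSZ)


def read_ratio(path: str, nblocks: int, samples: int = 256) -> float:
    """Hypothetical helper: mean random 8 kB read time divided by mean
    sequential 8 kB read time over the first nblocks blocks of path.
    POSIX-only (os.pread); dominated by the page cache unless the file
    is much larger than RAM or the cache is cold."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Sequential pass: consecutive block offsets.
        start = time.perf_counter()
        for i in range(samples):
            os.pread(fd, BLOCK, (i % nblocks) * BLOCK)
        seq = time.perf_counter() - start

        # Random pass: same number of reads, offsets drawn uniformly.
        offsets = [random.randrange(nblocks) * BLOCK for _ in range(samples)]
        start = time.perf_counter()
        for off in offsets:
            os.pread(fd, BLOCK, off)
        rnd = time.perf_counter() - start
    finally:
        os.close(fd)
    # Equal sample counts, so the ratio of totals equals the ratio of means;
    # guard against a zero timer delta on very fast (cached) reads.
    return rnd / max(seq, 1e-9)


if __name__ == "__main__":
    nblocks = 2048  # 16 MB: far too small to defeat the OS cache
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(nblocks * BLOCK))
        path = f.name
    try:
        print(f"random/sequential read-time ratio: {read_ratio(path, nblocks):.2f}")
    finally:
        os.unlink(path)
```

Running it on a freshly written file will typically report a ratio near 1, which illustrates the gap between "measuring HW capabilities" and choosing the parameter.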


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
