Re: Dynamic gathering the values for seq_page_cost/xxx_cost - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Dynamic gathering the values for seq_page_cost/xxx_cost
Date
Msg-id 20200918135033.6nwd3gtmlpy4atvb@development
Whole thread Raw
In response to Re: Dynamic gathering the values for seq_page_cost/xxx_cost  (Andy Fan <zhihui.fan1213@gmail.com>)
Responses Re: Dynamic gathering the values for seq_page_cost/xxx_cost
List pgsql-hackers
On Fri, Sep 18, 2020 at 09:28:10PM +0800, Andy Fan wrote:
>On Thu, Nov 28, 2019 at 12:48 AM Tomas Vondra <tomas.vondra@2ndquadrant.com>
>wrote:
>
>> On Tue, Nov 26, 2019 at 08:59:22AM +0800, Andy Fan wrote:
>> >The optimizer cost model usually needs 2 inputs,  one is used to represent
>> >data distribution and the other one is used to represent the capacity of
>> >the hardware, like cpu/io let's call this one as system stats.
>> >
>> >In Oracle database, the system stats can be gathered with
>> >dbms_stats.gather_system_stats [1] on the running hardware,  In
>> >postgresql,  the value is set on based on experience (user can change the
>> >value as well, but is should be hard to decide which values they should
>> >use).  The pg way is not perfect in theory(In practice, it may be good
>> >enough or not).  for example,  HDD & SSD have different capacity regards
>> to
>> >seq_scan_cost/random_page_cost,   cpu cost may also different on different
>> >hardware as well.
>> >
>> >I run into a paper [2] which did some research on dynamic gathering the
>> >values for xxx_cost, looks it is interesting.  However it doesn't provide
>> >the code for others to do more research.  before I dive into this,  It
>> >would be great to hear some suggestion from experts.
>> >
>> >so what do you think about this method and have we have some discussion
>> >about this before and the result?
>> >
>>
>> IMHO it would be great to have a tool that helps with tuning those
>> parameters, particularly random_page_cost. I'm not sure how feasible it
>> is, though, but if you're willing to do some initial experiments and
>> research, I think it's worth looking into.
>>
>> It's going to be challenging, though, because even random_page_cost=4
>> mismatches the "raw" characteristics on any existing hardware. On old
>> drives the sequential/random difference is way worse, on SSDs it's about
>> right. But then again, we know random_page_cost=1.5 or so works mostly
>> fine on SSDs, and that's much lower than just raw numbers.
>>
>> So it's clearly one thing to measure HW capabilities, and it's another
>> thing to conclude what the parameters should be ...
>>
>>
>> regards
>>
>> --
>> Tomas Vondra                  http://www.2ndQuadrant.com
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>
>I recently tried something in this direction and the result looks
>promising based on my limited test.
>
>Since the unit of a xxx_cost is "seq_page_cost", then how to detect
>seq_page_cost is important. In the cost model, the IO cost of a seqscan is
>rel->pages * seq_page_cost, it doesn't consider any cache (file system
>cache or
>shared buffer cache).  However, it assumes the OS will prefetch the IO. So
>to
>detect the seq_page_cost, I enabled the prefetch but avoided the file system
>cache. I tested this with 1). drop the cache on the file system. 2). Open
>the test
>file without O_DIRECT so that the prefetch can work.
>
>To detect the random page read, I read it with pread with a random offset.
>Since the random offsets may be the same as each other during the test,
>so even dropping the file system cache at the beginning doesn't work. so
>I open it with the O_DIRECT option.
>
>I also measure the cost of reading a page from a file system cache, during
>my test, it is about 10% of a seq scan read.
>
>After I get the basic numbers about the hardware capability, I let the user
>provide a cache hit ratio (This is a place where we can further improve if
>this
>is a right direction).
>
>Here is the test result on my hardware.
>
>fs_cache_lat = 0.832025us, seq_read_lat = 8.570290us, random_page_lat =
>73.987732us
>
>cache hit ratio: 1.000000 random_page_cost 1.000000
>cache hit ratio: 0.900000 random_page_cost 5.073692
>cache hit ratio: 0.500000 random_page_cost 7.957589
>cache hit ratio: 0.100000 random_page_cost 8.551591
>cache hit ratio: 0.000000 random_page_cost 8.633049
>
>
>Then I tested the suggested value with the 10GB TPCH
>workload. I compared the plans with 2 different settings random_page_cost =
>1). 4 is the default value) 2). 8.6  the cache hint ratio = 0 one.  Then 11
>out of the 22
>queries generated a different plan.  At last I drop the cache (including
>both
>file system cache and shared_buffer) before run each query and run the 11
>queries
>under the 2 different settings. The execution time is below.
>
>
>|     | random_page_cost=4 | random_page_cost=8.6 |
>|-----+--------------------+----------------------|
>| Q1  |           1425.964 |             1121.928 |
>| Q2  |           2553.072 |             2567.450 |
>| Q5  |           4397.514 |             1475.343 |
>| Q6  |          12576.985 |             4622.503 |
>| Q7  |           3459.777 |             2987.241 |
>| Q8  |           8360.995 |             8415.311 |
>| Q9  |           4661.842 |             2930.370 |
>| Q11 |           4885.289 |             2348.541 |
>| Q13 |           2610.937 |             1497.776 |
>| Q20 |          13218.122 |            10985.738 |
>| Q21 |            264.639 |              262.350 |
>
>
>The attached main.c is the program I used to detect the
>random_page_cost. result.tar.gz is the test result, you can run a git log
>first
>to see the difference on plan or execution stat.
>
>Any feedback is welcome. Thanks!
>

That seems pretty neat. What kind of hardware have you done these tests
on? It's probably worth testing on various other storage systems to see
how that applies to those.

Have you tried existing I/O testing tools, e.g. fio? If your idea is to
propose some built-in tool (similar to pg_test_fsync) then we probably
should not rely on external tools, but I wonder if we're getting the
same numbers.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)
Next
From: Masahiko Sawada
Date:
Subject: Re: Transactions involving multiple postgres foreign servers, take 2