Re: Dynamic gathering the values for seq_page_cost/xxx_cost - Mailing list pgsql-hackers

From Andy Fan
Subject Re: Dynamic gathering the values for seq_page_cost/xxx_cost
Date
Msg-id CAKU4AWrz9PT9LPythqAj4SJwNAAHH-taNGHRO8B7Z9fqFCRvHQ@mail.gmail.com
Whole thread Raw
In response to Re: Dynamic gathering the values for seq_page_cost/xxx_cost  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses Re: Dynamic gathering the values for seq_page_cost/xxx_cost
List pgsql-hackers
Thanks Ashutosh for coming:)

On Mon, Sep 21, 2020 at 9:03 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Mon, Sep 21, 2020 at 9:11 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> Here are some changes for my detection program.
>
> |                                                 | seq_read_lat (us) | random_read_lat (us) |
> | FIO                                             |                12 |                  148 |
> | Previous main.c                                 |               8.5 |                   74 |
> | invalidate_device_cache before each testing     |                 9 |                  150 |
> | prepare the test data file with O_DIRECT option |                15 |                  150 |
>
> In invalidate_device_cache, I just create another 1GB data file and read
> it. (see invalidate_device_cache function) this is similar as the previous fio setup.
>
> prepare test data file with O_DIRECT option means in the past, I prepare the test
> file with buffer IO. and before testing, I do invalidate device cache, file
> system cache. but the buffered prepared file still get better performance, I
> have no idea of it. Since I don't want any cache.  I use O_DIRECT
> option at last. The seq_read_lat changed from 9us to 15us.
> I still can't find out the 25% difference with the FIO result. (12 us vs 9 us).
>
> At last, the random_page_cost happens to not change very much.
>
> /u/y/g/fdirect> sudo ./main
> fs_cache_lat = 0.569031us, seq_read_lat = 18.901749us, random_page_lat = 148.650589us
>
> cache hit ratio: 1.000000 random_page_cost 1.000000
> cache hit ratio: 0.900000 random_page_cost 6.401019
> cache hit ratio: 0.500000 random_page_cost 7.663772
> cache hit ratio: 0.100000 random_page_cost 7.841498
> cache hit ratio: 0.000000 random_page_cost 7.864383
>
> This result looks much different from "we should use 1.1 ~ 1.5 for SSD".
>

Very interesting. Thanks for working on this. In an earlier email you
mentioned that TPCH plans changed to efficient ones when you changed
random_page_cost = =8.6 from 4 and seq_page_cost was set to 1. IIUC,
setting random_page_cost to seq_page_cost to the same ratio as that
between the corresponding latencies improved the plans.

Yes. 

How about
trying this with that ratio set to the one obtained from the latencies
provided by FIO? Do we see any better plans?

My tools set the random_page_cost to 8.6, but based on the fio data, it should be
set to 12.3 on the same hardware.  and I do see the better plan as well with 12.3.
Looks too smooth to believe it is true..

The attached result_fio_mytool.tar.gz is my test result.   You can use git show HEAD^^
is the original plan with 8.6.  git show HEAD^  show the plan changes after we changed
the random_page_cost. git show HEAD shows the run time statistics changes for these queries. 
I also uploaded the test tool [1] for this for your double check. 


|     |      8.6 |     12.3 |

|-----+----------+----------|

| Q2  | 2557.064 | 2444.995 |

| Q4  | 3544.606 | 3148.884 |

| Q7  | 2965.820 | 2240.185 |

| Q14 | 4988.747 | 4931.375 |

 

page cost is one thing, but there are CPU costs also involved in costs
of expression evaluation. Should those be changed accordingly to the
CPU latency?

Yes, we need that as well.  At the beginning of this thread, I treat all of them equally. 
In the first reply of Tomas,  he mentioned random_page_cost specially. After
~10 months, I tested TPCH on a hardware and then found random_page_cost
is set too incorrectly, after fixing it,  the result looks much better.  So I'd like to work
on this special thing first. 

 
--
Best Regards
Andy Fan
Attachment

pgsql-hackers by date:

Previous
From: "Hou, Zhijie"
Date:
Subject: Use appendStringInfoString and appendPQExpBufferStr where possible
Next
From: Andy Fan
Date:
Subject: Re: Dynamic gathering the values for seq_page_cost/xxx_cost