Re: Dynamic gathering the values for seq_page_cost/xxx_cost - Mailing list pgsql-hackers

From: Andy Fan
Subject: Re: Dynamic gathering the values for seq_page_cost/xxx_cost
Msg-id: CAKU4AWqx_6k9D+2Px-7kRv4SnX-OFCRo3Dyxy+CVJiejsJKgcA@mail.gmail.com
In response to: Re: Dynamic gathering the values for seq_page_cost/xxx_cost (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
List: pgsql-hackers
On Fri, Sep 25, 2020 at 5:15 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Tue, Sep 22, 2020 at 10:57 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
>
> My tool set random_page_cost to 8.6, but based on the fio data, it should be
> set to 12.3 on the same hardware, and I do see a better plan with 12.3.
> Looks too smooth to believe it is true..
>
> The attached result_fio_mytool.tar.gz is my test result. git show HEAD^^
> shows the original plan with 8.6, git show HEAD^ shows the plan changes after we
> changed random_page_cost, and git show HEAD shows the run-time statistics changes
> for these queries. I also uploaded the test tool [1] for your double check.
The scripts seem to start and stop the server, drop caches for every
query. That's where you are seeing that setting random_page_cost to
fio based ratio provides better plans. But in practice, these costs
need to be set on a server where the queries are run concurrently and
repeatedly. That's where the caching behaviour plays an important
role. Can we write a tool which can recommend costs for that scenario?
I totally agree with this. Actually, the first thing I did was to define a
proper IO workload. At the very beginning I used DIRECT_IO for both sequential
and random reads on my SSD, and found the result was pretty bad per testing
(random_page_cost = ~1.6). Then I realized PostgreSQL relies on the OS read-ahead,
which is disabled by DIRECT_IO. After I fixed this, I tested again with the above
scenario (cache hit ratio = 0) to verify my IO model. Per testing, it looks good.
I am also wondering whether random_page_cost = 1.1 fails to give a good result
on my SSD because it ignores the prefetching of sequential reads.
After I was OK with my IO model, I tested in the way you see above, but
I also measured the latency of a file system cache hit, which is handled by
get_fs_cache_latency_us in my code (I ignored shared buffer hits for now),
and let the user provide a cache_hit_ratio. The final random_page_cost
= real_random_lat / real_seq_lat, where
real_xxx_lat = cache_hit_ratio * fs_cache_lat + (1 - cache_hit_ratio) * xxx_lat.
See functions cal_real_lat and cal_random_page_cost.
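For clarity, here is a minimal sketch of what those two formulas compute. This
is my own illustration, not the actual code from the tool; the function names
mirror cal_real_lat and cal_random_page_cost, but the signatures and units
(microseconds) are assumptions:

```python
def cal_real_lat(cache_hit_ratio, fs_cache_lat, device_lat):
    # Effective latency when a fraction of reads is served
    # from the file system cache instead of the device.
    return cache_hit_ratio * fs_cache_lat + (1 - cache_hit_ratio) * device_lat

def cal_random_page_cost(cache_hit_ratio, fs_cache_lat, random_lat, seq_lat):
    # random_page_cost is the ratio of the effective random-read
    # latency to the effective sequential-read latency.
    real_random_lat = cal_real_lat(cache_hit_ratio, fs_cache_lat, random_lat)
    real_seq_lat = cal_real_lat(cache_hit_ratio, fs_cache_lat, seq_lat)
    return real_random_lat / real_seq_lat

# With cache_hit_ratio = 0 the formula degenerates to the raw
# device latency ratio, e.g. 123us random / 10us sequential = 12.3:
print(cal_random_page_cost(0.0, 1.0, 123.0, 10.0))
```

Note that as cache_hit_ratio approaches 1, the ratio approaches 1.0, which
matches the intuition that caching makes random and sequential reads
indistinguishable in cost.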
As for testing with caching considered, I found it is hard to estimate the cache
hit ratio, and hard to control a given hit ratio for a test. Recently I have been
thinking about a method: we can get page_reads and shared_buffer_hit from the
PostgreSQL side, and the real IO (without file system cache hits) at the OS level
(just as iotop/pidstat do). Then we know the shared buffer hit ratio and the file
system cache hit ratio (assuming they are stable after a long run), and can run a
test on that basis. However this would be another branch of manual work, and I
still have not got it done until now.
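The arithmetic behind that method could look like the following sketch. The
function name and inputs are hypothetical; it only assumes that blocks
PostgreSQL read from the OS but that never reached the device must have been
served by the file system cache:

```python
def estimate_hit_ratios(blks_hit, blks_read, os_read_blks):
    # blks_hit / blks_read: shared-buffer hits vs. reads requested
    # from the OS (as reported by PostgreSQL's statistics counters).
    # os_read_blks: blocks that actually hit the device, as observed
    # at the OS level (e.g. via iotop/pidstat), in the same block size.
    total = blks_hit + blks_read
    shared_buffer_ratio = blks_hit / total
    # Reads the OS satisfied without touching the device came
    # from the file system cache.
    fs_cache_ratio = (blks_read - os_read_blks) / blks_read
    return shared_buffer_ratio, fs_cache_ratio

# E.g. 900 shared-buffer hits, 100 OS reads, of which only 40
# reached the device:
sb, fs = estimate_hit_ratios(blks_hit=900, blks_read=100, os_read_blks=40)
print(sb, fs)  # 0.9 0.6
```

This only holds once the workload has run long enough for both ratios to be
stable, as noted above.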
I'd not like to share too many details, but "luckily" many of my cases have no
file system cache, which makes things a bit easier. What I am doing right now is
calculating random_page_cost with the above algorithm with only the shared buffer
considered, and testing the real benefit with a real workload to see how it works.
If that works well, I think the only thing left is to handle the file system cache.
The testing is time consuming since I have to cooperate with many site engineers,
so any improvement on the design would be very helpful.
> How do the fio based costs perform when the queries are run repeatedly?

That probably would not be good, since I have 280G+ of file system cache and
would have to prepare much more than 280G of data for such a test.
Best Regards
Andy Fan