Re: Estimating seq_page_fetch and random_page_fetch - Mailing list pgsql-hackers

From Umar Farooq Minhas
Subject Re: Estimating seq_page_fetch and random_page_fetch
Date
Msg-id BAY104-DAV469114EFDF241D00A8179CD780@phx.gbl
In response to Re: Estimating seq_page_fetch and random_page_fetch  ("Luke Lonergan" <LLonergan@greenplum.com>)
Responses Re: Estimating seq_page_fetch and random_page_fetch
Re: Estimating seq_page_fetch and random_page_fetch
List pgsql-hackers
Thanks a lot for your replies. The suggestions have proved very useful.
Ayush, I'm curious to see your C program; thanks.
 
Here is a related but different issue. I started looking at the Postgres optimizer/planner code a month back to modify it for experiments that I need to conduct. The EXPLAIN command prints the total cost, i.e. CPU + I/O; however, for my purposes I need these two costs separated, i.e. instead of one combined cost, I want the CPU cost and the I/O cost displayed separately when I run EXPLAIN on a particular query. So far I haven't been able to figure out a 'clean' way of doing this. Can anyone tell me how much time I should expect to spend making such a change, and where I should start? costsize.c?
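 
To make the question concrete, here is a simplified standalone sketch of the two components I would like EXPLAIN to report separately for a plain seqscan. This is my own reading of the default cost parameters and formulas, not the actual costsize.c code:

#include <stdio.h>

/* Default planner cost parameters (the defaults in postgresql.conf) */
#define SEQ_PAGE_COST     1.0     /* cost of a sequentially fetched page */
#define CPU_TUPLE_COST    0.01    /* cost of processing one tuple */
#define CPU_OPERATOR_COST 0.0025  /* cost of evaluating one operator/qual */

int main(void)
{
    /* Hypothetical relation: 10,000 pages, 1,000,000 tuples, 1 qual clause */
    double pages  = 10000.0;
    double tuples = 1000000.0;
    double quals  = 1.0;

    /* I/O component: one sequential fetch per heap page */
    double io_cost = pages * SEQ_PAGE_COST;

    /* CPU component: per-tuple processing plus qual evaluation */
    double cpu_cost = tuples * (CPU_TUPLE_COST + quals * CPU_OPERATOR_COST);

    printf("io_cost  = %.2f\n", io_cost);
    printf("cpu_cost = %.2f\n", cpu_cost);
    printf("total    = %.2f  (the single number EXPLAIN shows today)\n",
           io_cost + cpu_cost);
    return 0;
}

In the real code both components end up summed into startup_cost and total_cost, so I assume the change is to carry them in separate fields (or instrument costsize.c) and then teach EXPLAIN to print both.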
 
I have another question. Looking at the optimizer code, it appears largely insensitive to memory. The only parameters being used are "effective_cache_size" (only in estimating index cost) and "work_mem" (for sorts, aggregation, grouping, and hash/merge joins). Are these the only memory-related parameters that DIRECTLY affect the cost estimates of the planner/optimizer?
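 
For context, my (possibly imperfect) understanding is that effective_cache_size enters only through the Mackert-Lohman style estimate of page fetches for index scans. A rough standalone sketch of that formula as I read it (a simplification of index_pages_fetched(), not the actual source; build with cc -O2 -o ml ml.c -lm):

#include <math.h>
#include <stdio.h>

/*
 * Given T table pages, an assumed cache of b pages (derived from
 * effective_cache_size), and N tuple fetches, estimate how many page
 * fetches are needed, counting re-fetches of pages pushed out of cache.
 */
static double pages_fetched(double N, double T, double b)
{
    if (T <= b)
    {
        /* whole table fits in cache: no page is fetched twice */
        double p = (2.0 * T * N) / (2.0 * T + N);
        return p >= T ? T : ceil(p);
    }
    else
    {
        /* limited cache: beyond 'lim' fetches, evicted pages are re-read */
        double lim = (2.0 * T * b) / (2.0 * T - b);
        if (N <= lim)
            return ceil((2.0 * T * N) / (2.0 * T + N));
        return ceil(b + (N - lim) * (T - b) / T);
    }
}

int main(void)
{
    /* 10,000-page table, cache share of 2,000 pages, 50,000 tuple fetches */
    printf("pages fetched = %.0f\n", pages_fetched(50000.0, 10000.0, 2000.0));
    return 0;
}

If available memory feeds into the estimates anywhere else, I'd appreciate a pointer.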
 
Again, your help is appreciated.
 
-Umar
----- Original Message -----
Sent: Thursday, March 08, 2007 2:16 PM
Subject: Re: [HACKERS] Estimating seq_page_fetch and random_page_fetch

Adding to this:

Ayush recently wrote a C program that emulates PG IO to do this analysis, and we came out with (predictably) a ratio of sequential/random of 20-50 (for a single user).  This is predictable because the random component is fixed at the access time of a single hard drive no matter how many disks are in an array, while the sequential scales nearly linearly with the number of drives in the array.

So, you can estimate random using 8-12 ms per random access, and sequential as 1/(number of disks x 60-130 MB/s).
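
For reference, a minimal sketch of that kind of measurement (my own illustration, not Ayush's actual program) is to time scattered 8 KB reads against one sequential pass over a file larger than RAM:

/*
 * iotest.c - time sequential vs. random 8 KB reads over a large file.
 * Create the file first, e.g.: dd if=/dev/zero of=bigfile bs=1M count=16384
 * Compile: cc -O2 -o iotest iotest.c
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

#define BLKSZ   8192            /* PostgreSQL page size */
#define NRANDOM 10000           /* number of random page reads to time */

static double now_sec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(int argc, char **argv)
{
    char        buf[BLKSZ];
    struct stat st;
    int         fd;
    long        npages, i;
    double      t0, seq_s, rand_s;

    if (argc != 2) { fprintf(stderr, "usage: %s bigfile\n", argv[0]); return 1; }
    if ((fd = open(argv[1], O_RDONLY)) < 0 || fstat(fd, &st) < 0)
    { perror(argv[1]); return 1; }
    npages = st.st_size / BLKSZ;

    /* Sequential pass over the whole file */
    t0 = now_sec();
    for (i = 0; i < npages; i++)
        if (pread(fd, buf, BLKSZ, (off_t) i * BLKSZ) != BLKSZ) return 1;
    seq_s = now_sec() - t0;

    /* Random 8 KB reads scattered over the file */
    srandom(12345);
    t0 = now_sec();
    for (i = 0; i < NRANDOM; i++)
    {
        off_t page = random() % npages;
        if (pread(fd, buf, BLKSZ, page * BLKSZ) != BLKSZ) return 1;
    }
    rand_s = now_sec() - t0;

    printf("sequential: %.1f us/page\n", seq_s * 1e6 / npages);
    printf("random:     %.1f us/page\n", rand_s * 1e6 / NRANDOM);
    printf("ratio:      %.1f\n", (rand_s / NRANDOM) / (seq_s / npages));
    return 0;
}

To keep the OS cache out of the picture the file needs to be larger than RAM, and ideally the cache is dropped (or the filesystem unmounted and remounted, as Greg suggests below) between the sequential and random passes.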

Ayush, can you forward your C program?

- Luke

Msg is shrt cuz m on ma treo

 -----Original Message-----
From:   Gregory Stark [mailto:stark@enterprisedb.com]
Sent:   Thursday, March 08, 2007 12:37 PM Eastern Standard Time
To:     Tom Lane
Cc:     Umar Farooq Minhas; pgsql-hackers@postgresql.org
Subject:        Re: [HACKERS] Estimating seq_page_fetch and random_page_fetch


"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> "Umar Farooq Minhas" <umarfm13@hotmail.com> writes:
>> How can we accurately estimate the "seq_page_fetch" and
>> "random_page_fetch" costs from outside Postgres using, for example, a
>> C routine?
>
> Use a test case larger than memory.  Repeat many times to average out
> noise.  IIRC, when I did the experiments that led to the current
> random_page_cost of 4.0, it took about a week before I had numbers I
> trusted.

When I was running tests I did it on a filesystem where nothing else was
running. Between tests I unmounted and remounted it. As I understand it Linux
associates the cache with the filesystem and not the block device and discards
all pages from cache when the filesystem is unmounted.

That doesn't contradict anything Tom said; it might be useful as an additional
tool, though.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

