Thread: Estimating seq_page_fetch and random_page_fetch
"Umar Farooq Minhas" <umarfm13@hotmail.com> writes: > How can we accrately estimate the "seq_page_fetch" and = > "random_page_fetch" costs from outside the postgres using for example a = > C routine. Use a test case larger than memory. Repeat many times to average out noise. IIRC, when I did the experiments that led to the current random_page_cost of 4.0, it took about a week before I had numbers I trusted. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > "Umar Farooq Minhas" <umarfm13@hotmail.com> writes: >> How can we accrately estimate the "seq_page_fetch" and = >> "random_page_fetch" costs from outside the postgres using for example a = >> C routine. > > Use a test case larger than memory. Repeat many times to average out > noise. IIRC, when I did the experiments that led to the current > random_page_cost of 4.0, it took about a week before I had numbers I > trusted. When I was running tests I did it on a filesystem where nothing else was running. Between tests I unmounted and remounted it. As I understand it Linux associates the cache with the filesystem and not the block device and discards all pages from cache when the filesystem is unmounted. That doesn't contradict anything Tom said, it might be useful as an additional tool though. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
<p><font size="2">Adding to this:<br /><br /> Ayush recently wrote a C program that emulates PG IO to do this analysis, andwe came out with (predictably) a ratio of sequential/random of 20-50 (for a single user). This is predictable becausethe random component is fixed at the access time of a single hard drive no matter how many disks are in an array,while the sequential scales nearly linearly with the number of drives in the array.<br /><br /> So, you can estimaterandom using 8-12ms per random access, and sequential as 1/(number of disks X 60-130MB/s).<br /><br /> Ayush, canyou forward your C program?<br /><br /> - Luke<br /><br /> Msg is shrt cuz m on ma treo<br /><br /> -----Original Message-----<br/> From: Gregory Stark [<a href="mailto:stark@enterprisedb.com">mailto:stark@enterprisedb.com</a>]<br />Sent: Thursday, March 08, 2007 12:37 PM Eastern Standard Time<br /> To: Tom Lane<br /> Cc: Umar Farooq Minhas;pgsql-hackers@postgresql.org<br /> Subject: Re: [HACKERS] Estimating seq_page_fetch and random_page_fetch<br/><br /><br /> "Tom Lane" <tgl@sss.pgh.pa.us> writes:<br /><br /> > "Umar Farooq Minhas" <umarfm13@hotmail.com>writes:<br /> >> How can we accrately estimate the "seq_page_fetch" and =<br /> >>"random_page_fetch" costs from outside the postgres using for example a =<br /> >> C routine.<br /> ><br/> > Use a test case larger than memory. Repeat many times to average out<br /> > noise. IIRC, when I didthe experiments that led to the current<br /> > random_page_cost of 4.0, it took about a week before I had numbersI<br /> > trusted.<br /><br /> When I was running tests I did it on a filesystem where nothing else was<br /> running.Between tests I unmounted and remounted it. As I understand it Linux<br /> associates the cache with the filesystemand not the block device and discards<br /> all pages from cache when the filesystem is unmounted.<br /><br />That doesn't contradict anything Tom said, it might be useful as an additional<br /> tool though.<br /><br /> --<br /> Gregory Stark<br /> EnterpriseDB <a href="http://www.enterprisedb.com">http://www.enterprisedb.com</a><br/><br /> ---------------------------(end of broadcast)---------------------------<br/> TIP 3: Have you checked our extensive FAQ?<br /><br /> <a href="http://www.postgresql.org/docs/faq">http://www.postgresql.org/docs/faq</a><br/><br /></font>
On Thu, Mar 08, 2007 at 05:35:03PM +0000, Gregory Stark wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
> > "Umar Farooq Minhas" <umarfm13@hotmail.com> writes:
> >> How can we accurately estimate the "seq_page_fetch" and
> >> "random_page_fetch" costs from outside of postgres, using for example a
> >> C routine?
> >
> > Use a test case larger than memory. Repeat many times to average out
> > noise. IIRC, when I did the experiments that led to the current
> > random_page_cost of 4.0, it took about a week before I had numbers I
> > trusted.
>
> When I was running tests I did it on a filesystem where nothing else was
> running. Between tests I unmounted and remounted it. As I understand it Linux
> associates the cache with the filesystem and not the block device and discards
> all pages from cache when the filesystem is unmounted.
>
> That doesn't contradict anything Tom said, it might be useful as an additional
> tool though.

Another trick I've used in the past is to just run the machine out of memory,
using the following:

/*
 * $Id: clearmem.c,v 1.1 2003/06/29 20:41:33 decibel Exp $
 *
 * Utility to allocate a chunk of memory and zero it. Useful for flushing
 * disk buffers.
 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    /* allocate argv[1] megabytes of zeroed memory */
    if (argc < 2 || !calloc(atoi(argv[1]), 1024 * 1024))
    {
        printf("Error allocating memory.\n");
        return 1;
    }
    return 0;
}

I'll monitor top while that's running to ensure that some stuff gets swapped
out to disk. I believe this might still leave some cached data in other areas
of the kernel, but it's probably not enough to worry about.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
On Thu, 2007-03-08 at 17:35 +0000, Gregory Stark wrote:
> When I was running tests I did it on a filesystem where nothing else was
> running. Between tests I unmounted and remounted it. As I understand it Linux
> associates the cache with the filesystem and not the block device and discards
> all pages from cache when the filesystem is unmounted.

On recent Linux kernels, /proc/sys/vm/drop_caches can also be useful:

http://linux.inet.hr/proc_sys_vm_drop_caches.html

You could also use posix_fadvise() to achieve a similar effect on a
per-file basis.

-Neil
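A minimal sketch of the per-file approach Neil mentions (the argument handling and error reporting are only illustrative):

    /*
     * Drop a file's pages from the OS page cache before a timing run.
     * Note that posix_fadvise() does not flush dirty pages, so anything
     * written recently should be fsync'ed (or the filesystem synced) first.
     */
    #define _XOPEN_SOURCE 600
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        int fd, err;

        if (argc < 2)
        {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }

        fd = open(argv[1], O_RDONLY);
        if (fd < 0)
        {
            perror("open");
            return 1;
        }

        /* offset 0 and length 0 cover the whole file */
        err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        if (err != 0)
            fprintf(stderr, "posix_fadvise: %s\n", strerror(err));

        close(fd);
        return err != 0;
    }

The drop_caches route is system-wide rather than per-file: as root, sync first and then write 1 to /proc/sys/vm/drop_caches to discard clean page-cache pages.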
----- Original Message -----
From: Luke Lonergan
Sent: Thursday, March 08, 2007 2:16 PM
Subject: Re: [HACKERS] Estimating seq_page_fetch and random_page_fetch

Adding to this:
Ayush recently wrote a C program that emulates PG I/O to do this analysis, and we came out with (predictably) a sequential/random ratio of 20-50 (for a single user). This is predictable because the random component is fixed at the access time of a single hard drive no matter how many disks are in the array, while sequential throughput scales nearly linearly with the number of drives in the array.

So, you can estimate random access as 8-12 ms per fetch, and sequential as 1 / (number of disks × 60-130 MB/s).
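As a back-of-the-envelope illustration of those rules of thumb (the constants below, 10 ms per random fetch and 80 MB/s per drive, are just example values picked from the quoted ranges, and the 10 GB table is hypothetical):

    #include <stdio.h>

    /* Estimated seconds to read `bytes` sequentially from an array of `disks` drives. */
    static double seq_seconds(double bytes, int disks)
    {
        const double per_disk_bytes_per_sec = 80e6;     /* 60-130 MB/s per drive */

        return bytes / (disks * per_disk_bytes_per_sec);
    }

    /* Estimated seconds for `fetches` random page fetches. */
    static double random_seconds(long fetches)
    {
        const double seconds_per_fetch = 0.010;         /* 8-12 ms per random access */

        return fetches * seconds_per_fetch;
    }

    int main(void)
    {
        double table_bytes = 10e9;                      /* example: a 10 GB table */
        int    disks;

        /* sequential time drops as drives are added; random time does not */
        for (disks = 1; disks <= 8; disks *= 2)
            printf("%d disk(s): full scan ~%.0f s, 100000 random fetches ~%.0f s\n",
                   disks, seq_seconds(table_bytes, disks), random_seconds(100000));
        return 0;
    }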
Ayush, can you forward your C program?
- Luke
Msg is shrt cuz m on ma treo
-----Original Message-----
From: Gregory Stark [mailto:stark@enterprisedb.com]
Sent: Thursday, March 08, 2007 12:37 PM Eastern Standard Time
To: Tom Lane
Cc: Umar Farooq Minhas; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Estimating seq_page_fetch and random_page_fetch
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> "Umar Farooq Minhas" <umarfm13@hotmail.com> writes:
>> How can we accurately estimate the "seq_page_fetch" and
>> "random_page_fetch" costs from outside of postgres, using for example a
>> C routine?
>
> Use a test case larger than memory. Repeat many times to average out
> noise. IIRC, when I did the experiments that led to the current
> random_page_cost of 4.0, it took about a week before I had numbers I
> trusted.
When I was running tests I did it on a filesystem where nothing else was
running. Between tests I unmounted and remounted it. As I understand it Linux
associates the cache with the filesystem and not the block device and discards
all pages from cache when the filesystem is unmounted.
That doesn't contradict anything Tom said, it might be useful as an additional
tool though.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
On Thu, Mar 08, 2007 at 07:01:17PM -0500, Umar Farooq Minhas wrote:
> displayed, I want cpu cost and io cost displayed separately when I run
> EXPLAIN on a particular query. Till now I haven't been able to figure
> out a 'clean' way of doing this. Can anyone tell me how much time
> should I expect to spend making such a change? And from where should
> I start? costsize.c?

That's going to be a lot of work. You need to duplicate the variable and
every usage of that variable. And I can't imagine why you'd be interested
anyway...

> I have another question. Looking at the optimizer code, it pretty
> much looks insensitive to the memory factor. The only parameters
> being utilized are "effective_cache_size" (in estimating index
> cost only) and "work_mem" (for sorts, aggregation, groups, and
> hash/merge joins). Are these the only memory factors that DIRECTLY
> affect the cost estimates of the planner/optimizer?

Sure, what other factors were you considering?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
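To illustrate the scale of the change Martijn describes (the names below are invented for the example and are not PostgreSQL's actual planner definitions): every place the planner accumulates a single running cost would have to carry, and combine, two components instead.

    /*
     * Illustrative only -- not PostgreSQL source.  Splitting one
     * accumulated cost into cpu and io parts means every variable,
     * parameter, and arithmetic step that touches a cost today has to be
     * duplicated along these lines.
     */
    typedef double Cost;

    typedef struct SplitCost
    {
        Cost cpu_cost;   /* per-tuple / per-operator cpu terms      */
        Cost io_cost;    /* sequential and random page-fetch terms  */
    } SplitCost;

    /* combining two partial costs now has to combine both components */
    static SplitCost
    add_split_cost(SplitCost a, SplitCost b)
    {
        SplitCost result;

        result.cpu_cost = a.cpu_cost + b.cpu_cost;
        result.io_cost = a.io_cost + b.io_cost;
        return result;
    }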
It would be interesting to see some code here. Maybe this would be a great
opportunity to start some sort of 'auto-tune' (as an option) in this area.

--
GJ
C/C++/SQL freelance to hire.