Re: A costing analysis tool - Mailing list pgsql-hackers

From Martijn van Oosterhout
Subject Re: A costing analysis tool
Date
Msg-id 20051013194110.GD6080@svana.org
Whole thread Raw
In response to Re: A costing analysis tool  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-hackers
On Thu, Oct 13, 2005 at 01:52:10PM -0500, Kevin Grittner wrote:
> Thanks, Josh, for the feedback.
>
> It sounds as though you are more focused on picking up costing
> problems which happen during production -- which is clearly
> valuable, but addresses a somewhat different set of needs than
> I was looking at.  That said, it seems like there is potential to share
> signifcant code between the two techniques.  We'll have to see if
> we can work that out.

Firstly, I really hope you get further with this than I did a while ago
when I attempted. It's certainly a worthly goal.

Secondly, while checking for problems in productions systems is good,
it's not going to help with fixing the cost model. For that you need
raw data.

My basic plan was to setup tables of different sizes and attempt to run
queries such as:

- Index Scan on each table with different types of keys and coverage.
- Seq Scan
- Nested loop, etc...

I did reach the point where I was wishing I could just give PostgreSQL
the plan and tell it to execute it. :) The point of the exercise is to
be able to derive correlations so you could from the plan calcuate the
actual costs. For example, run a nested loop with an inner index scan
once, twice, three times etc so we can actually *see* what the cache
effects are.

I got stuck on working out how to force the optimiser to produce the
plan I want. I didn't try too hard though. The enable_xxx options
should be enough, hopefully. Ofcourse you want to run it with different
numbers of shared buffers to see how they affect the results.

And then you ideally want the results for several different machines,
different disk subsystems, memory types, etc and placed on a nice web
page so other people can run correlations on the data themselves.

This is essentially what you already came up with. Note that for these
purposes the actual estimates by PostgreSQL are irrelevent. However, I
strongly suggest finding a way of collating the results publically from
lots of people because digging for correlations is something lots of
people can hammer on and is really hard to program.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

pgsql-hackers by date:

Previous
From: Thomas Hallgren
Date:
Subject: Re: pg_config --pgxs on Win32
Next
From: Tom Lane
Date:
Subject: Re: pg_config --pgxs on Win32