Re: Super Optimizing Postgres - Mailing list pgsql-hackers

From mlw
Subject Re: Super Optimizing Postgres
Date
Msg-id 3BF5A2EB.801E1711@mohawksoft.com
In response to Re: Super Optimizing Postgres  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
Bruce Momjian wrote:
> 
> > Does sort memory come out of shared? I don't think so (would it
> > need too?), but "Cache Size and Sort Size " seems to imply that
> > it does.
> 
> Sort comes from per-backend memory, not shared.  Of course, both
> per-backend and shared memory come from the same pool of RAM, if that's
> what you mean.  Could it be made clearer?

Actually, in most cases RAM is RAM, and shared memory is just carved out of the
same physical memory as process memory. On some cluster environments, however,
shared memory is a different memory pool than process memory.

In your section "Cache Size and Sort Size" you talk about both cache and sort
memory, but make no distinction about which uses what. I would suggest an
explicit sentence saying that the cache comes from the shared memory pool and
sorts come from per-backend process memory.
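
To make the distinction concrete, a sketch of the relevant postgresql.conf
settings (parameter names as in contemporary releases; the values here are
purely illustrative, not recommendations):

```
# Allocated once at startup from the SHARED memory pool,
# sized in 8 KB disk buffers
shared_buffers = 1024

# Allocated per backend from ordinary PROCESS memory, in KB;
# each sorting backend can use up to this much
sort_mem = 512
```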


> 
> > Also, you don't go into the COST variables. If what is documented
> > about them is correct, they are woefully incorrect with a modern
> > machine.
> 
> You mean:
> 
>         #random_page_cost = 4
>         #cpu_tuple_cost = 0.01
>         #cpu_index_tuple_cost = 0.001
>         #cpu_operator_cost = 0.0025
> 
> Those are relative, of course.  We are always looking for better numbers.
> 
> > Would a 1.3 ghz Athlon really have a cpu_operator_cost of 0.0025?
> > That would imply that that computer could process 2500 conditionals
> > in the time it would take to make a sequential read. If Postgres
> > is run on a 10K RPM disk vs a 5.4K RPM disk on two different
> > machines with the same processor and speed, these numbers can't
> > hope to be right, one should be about twice as high as the other.
> 
> Again, they are correct relative to each other.

They can't possibly be correct. Suppose we have two identical machines where
the only difference is the disk subsystem: one has a 10K RPM SCSI disk, the
other a 5.4K RPM IDE disk. There is no way the same settings can be accurate
for both.

> 
> > That said, do these numbers really affect the planner all that
> > much?
> 
> They sure do affect the planner.  That is how index scan vs sequential and
> join type are determined.

OK, then it should be fairly straightforward to write a profiler for Postgres
that sets these parameters.

The sequential and random read tests are a no-brainer.
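
A minimal sketch of what that read test could look like (everything here is my
own guess at an approach, not an existing tool; it assumes the 8 KB Postgres
page size and, importantly, ignores OS caching, which a real profiler would
have to defeat with a file much larger than RAM):

```python
import os
import random
import time

PAGE = 8192  # Postgres disk page size


def time_reads(path, offsets):
    """Time reading one page at each byte offset, in the order given."""
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for off in offsets:
            os.lseek(fd, off, os.SEEK_SET)
            os.read(fd, PAGE)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def estimate_random_page_cost(path):
    """Ratio of random to sequential page-read time over a test file.

    Caveat: on a file that fits in the OS cache, the second pass is
    served from memory and the ratio will be close to 1.0.
    """
    npages = os.path.getsize(path) // PAGE
    offsets = [i * PAGE for i in range(npages)]
    seq_time = time_reads(path, offsets)       # pages in on-disk order
    random.shuffle(offsets)
    rnd_time = time_reads(path, offsets)       # same pages, random order
    return rnd_time / seq_time
```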

The CPU costs are not so easy. I don't have a very good idea of what they
"really" mean; I have a guess, but not enough to write a benchmark routine.

If someone who REALLY knows could detail a test routine for each of the CPU
cost types, I could write a program that will spit out what the numbers should
be.

I envision:

pgprofile /u01/postgres/test.file

And that would output something like:

random_page_cost = 2
cpu_tuple_cost = 0.00344
cpu_index_tuple_cost = 0.00234
cpu_operator_cost = 0.00082
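
For anyone following along, a much-simplified sketch of how the planner
actually consumes these numbers (the real formulas live in
src/backend/optimizer/path/costsize.c and are considerably more involved; the
unit of cost is one sequential page fetch, defined as 1.0):

```python
def seq_scan_cost(pages, tuples, cpu_tuple_cost=0.01):
    # Every page read sequentially (cost 1.0 each) plus per-tuple CPU work.
    return pages * 1.0 + tuples * cpu_tuple_cost


def index_scan_cost(pages_fetched, tuples_fetched,
                    random_page_cost=4.0,
                    cpu_index_tuple_cost=0.001,
                    cpu_tuple_cost=0.01):
    # Heap pages arrive in index order, i.e. randomly, plus per-tuple CPU
    # work for both the index entries and the fetched heap tuples.
    return (pages_fetched * random_page_cost
            + tuples_fetched * (cpu_index_tuple_cost + cpu_tuple_cost))


# A 1000-page, 100000-tuple table: full scan vs fetching 1% via an index.
print(seq_scan_cost(1000, 100000))   # 2000.0
print(index_scan_cost(1000, 1000))   # ~4011 -- seq scan wins here
```

This is why the ratios matter so much: lower random_page_cost and the index
scan above suddenly beats the sequential scan.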

