Re: Super Optimizing Postgres - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: Super Optimizing Postgres
Date
Msg-id 3BF7578F.7000108@sid.tm.ee
In response to Re: Super Optimizing Postgres  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers

Justin Clift wrote:

>>In my example, two computers with exactly the same hardware, except one has a
>>5400 RPM IDE drive and the other has a 10,000 RPM IDE drive. These machines
>>should not use the same settings; it is obvious that a sequential scan block
>>read on one will be faster than on the other.
>>
>
>If we're going to do this bit properly, then we'll have to take into
>consideration that many database objects will need their own individual
>statistics.  For example, let's say we have a database with a bunch of
>10k rpm SCSI drives which the tables are on, and the system also has one
>or more 15k rpm SCSI drives (say, Seagate Cheetah II drives) on which
>the indices have been placed.  With the 10k rpm drives, the tables
>needing the fastest throughput or having the highest usage are put on
>the outer edges of the disk media, and the rest of the tables are placed
>in the available space.
>
>On this theoretical system, we would be better off measuring the
>performance of each table and index in turn, then generating and storing
>costs for each one which are as "accurate as possible at this point in
>time".
>
That would mean that these statistic values must be stored in pg_class
and not be SET variables at all. This would probably have the added
benefit that some caching effects of small/big tables are accounted for
automatically, so you don't have to do that in the optimizer.
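
For context, a minimal sketch of the per-relation statistics pg_class
already holds; the proposal would amount to adding measured cost factors
(the column names in the comments are hypothetical) next to them:

  -- What pg_class already stores per relation (runnable as-is); the
  -- idea above would add measured read-cost factors, e.g. hypothetical
  -- relseqcost / relrandcost columns, alongside these:
  SELECT relname, relkind, relpages, reltuples
  FROM pg_class
  WHERE relkind IN ('r', 'i');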

>  A model like this would probably have these costs re-calculated
>each time the ANALYZE command is run to ensure their accuracy through
>database growth and changes.
>
Then ANALYZE should be run on both tables and indexes. AFAIK we
currently analyze only the real table data, not the indexes.
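
For reference, ANALYZE is invoked per table (the table name below is
hypothetical), and pg_class carries relpages/reltuples for both the heap
and its indexes, so the planner's current estimates are at least visible:

  -- Minimal sketch; the table name "orders" is hypothetical.
  ANALYZE orders;

  -- relpages/reltuples exist for the heap ('r') and its indexes ('i'):
  SELECT c.relname, c.relkind, c.relpages, c.reltuples
  FROM pg_class c
  WHERE c.oid = 'orders'::regclass
     OR c.oid IN (SELECT indexrelid FROM pg_index
                  WHERE indrelid = 'orders'::regclass);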

>
>I think this would be decently accurate, and RAID systems would be
>accurately analysed.  Don't know how to take into account large cache
>sizes though.  :)
>
Maybe keep some volatile statistics on how much of a table "may be"
cached in the disk/filesystem caches, assuming that we currently know
how much of each is in shared memory.
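
As one rough, after-the-fact proxy, the block-level statistics views can
show the shared-buffer hit ratio per table (a sketch; it assumes
block-level statistics collection is enabled and says nothing about the
OS/filesystem cache):

  SELECT relname, heap_blks_read, heap_blks_hit,
         round(heap_blks_hit::numeric
               / nullif(heap_blks_hit + heap_blks_read, 0), 2) AS hit_ratio
  FROM pg_statio_user_tables
  ORDER BY hit_ratio;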

-----------
Hannu