Re: statistics - Mailing list pgsql-performance

From Paul Thomas
Subject Re: statistics
Date
Msg-id 20040408004835.A8476@bacon
Whole thread Raw
In response to statistics  ("Jaime Casanova" <el_vigia_ec@hotmail.com>)
List pgsql-performance
On 07/04/2004 22:05 Jaime Casanova wrote:
> What the statistics are? Where can i view it? where can i find info
> about its field and why are they valuable information to performance?
>
> thanx in advance,
>
> Jaime Casanova


OK. An idiot's guide to statistics by a full-time idiot...

Let's start with a simple premise. I'm a RDBMS (forget that I'm actually
an idiot for a moment...) and I've been asked for

select * from foo where bar = 7;

How do I go about fulfilling the reequest in the most efficient manner?
(i.e., ASAP!)

One way might be to read through the whole table and return only those
rows which match the where criteron - a sequential scan on the table.

But wait a minute, there is an index on column bar. Could I use this
instead? Well, of course, I could use it but I have to keep sight of the
goal of returning the data ASAP and I know that the act of reading
index/reading table/... will have a performance penalty due to a lot more
head movement on the disk. So how do I make chose between a sequential
scan and an index scan? Let's lokk at a couple of extreme scenarios:

1) let's look at the condition where all or virtually all of the bar
columns are populated wityh the value 7. In this case it would be more
efficient to read sequentially through the table.

2) the opposite of (1) - very few of the bar columns have the value 7. In
this case using the index could be a winner.

So generalising, I need to be able to estimate whether doing a sequential
scan is more efficient that an index scan and this comes down to 2 factors:

a) the cost of moving the disk heads all over the place (random page cost)
b) the spread of values in the selecting column(s)

(a) is specfified in postgresql.conf (see archives for much discusion
about what the value should be..)
(b) is determined by the dastardly trick of actually sampling the data in
the table!!! That's what analyze does. It samples your table(s) and uses
the result to feeede into it's descision about when to flip between
sequential and index scans.

Hope this makes some kind of sense...

--
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for
Business             |
| Computer Consultants         |
http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+

pgsql-performance by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: good pc but bad performance,why?
Next
From: Tom Lane
Date:
Subject: Re: good pc but bad performance,why?