Re: Indexes and statistics - Mailing list pgsql-sql

From David Witham
Subject Re: Indexes and statistics
Date
Msg-id CFA248776934FD43847E740E43C346D199DC18@ozimelb03.ozicom.com
Whole thread Raw
In response to Indexes and statistics  ("David Witham" <davidw@unidial.com.au>)
Responses Re: Indexes and statistics  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-sql
Tom,

I'm running PostgreSQL 7.3.2 on Red Hat Linux 7.3 with 512Mb RAM.

The table definition is:
                   Table "public.cdr"       Column         |         Type          | Modifiers
-----------------------+-----------------------+-----------carrier_id            | integer               | not
nullfile_id              | integer               | not nullservice_num           | character varying(10) | not nullday
                | date                  | not nulltime                  | integer               | not nulldestination
       | character varying(20) | not nullduration              | integer               | not nullcharge_wholesale
|numeric(8,2)          | not nullcharge_band_id        | integer               | charge_retail         | numeric(8,2)
      | not nullrate_plan_id          | integer               | not nullitem_code             | integer               |
notnullcust_id               | integer               | not nullbill_id               | integer               | prefix
            | character varying(12) | charge_wholesale_calc | numeric(8,2)          |  
Indexes: cdr_ix1 btree ("day"),        cdr_ix2 btree (service_num),        cdr_ix3 btree (cust_id),        cdr_ix4
btree(bill_id),        cdr_ix5 btree (carrier_id),        cdr_ix6 btree (file_id) 

Does this make it a "wide" table?

The data arrives ordered by service_num, day, time. This customer has one primary service_num that most of the calls
aremade from. Therefore each day a clump of CDRs will be loaded for that customer, interspersed with CDRs from all the
othercustomers. Therefore the distribution of records for a service_num is clumpy but evenly distributed throughout the
table.For a customer with a single primary number, this result applies to the customer as a whole. For a customer with
manyservice_num's the result is a little more doubtful depending on whether their service_num's arrive sequentially or
not.This would not necessarily be the case. 

I hope this makes sense. Does it help any?

Thanks,
David

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, 18 February 2004 16:10
To: David Witham
Cc: pgsql-sql@postgresql.org
Subject: Re: [SQL] Indexes and statistics


"David Witham" <davidw@unidial.com.au> writes:
> One of the customers is quite large (8.3% of the records):

Hmm.  Unless your rows are quite wide, a random sampling of 8.3% of the
table would be expected to visit every page of the table, probably
several times.  So the planner's cost estimates do not seem out of line
to me; an indexscan *should* be slow.  The first question to ask is why
the deviation from reality.  Are the rows for that customer ID likely to
be physically concentrated into a limited number of physical pages?
Do you have so much RAM that the whole table got swapped in, eliminating
the extra I/O that the planner is expecting?
        regards, tom lane


pgsql-sql by date:

Previous
From: sad
Date:
Subject: Re: bytea or blobs?
Next
From: Dana Hudes
Date:
Subject: Re: bytea or blobs?