Re: Indexes and statistics - Mailing list pgsql-sql

From Tom Lane
Subject Re: Indexes and statistics
Date
Msg-id 23562.1077117691@sss.pgh.pa.us
Whole thread Raw
In response to Re: Indexes and statistics  ("David Witham" <davidw@unidial.com.au>)
List pgsql-sql
"David Witham" <davidw@unidial.com.au> writes:
> Does this make it a "wide" table?

Nope.  A rough estimate is that your rows will be about 160 bytes wide,
which means you can fit about 50 per 8K page.  So a query that needs to
select 8% of the table will *on average* need to hit about 4 rows per
page.  In the absence of any data clumping this would certainly mean
that the scan would need to touch every page anyway, and thus that using
the index could provide no I/O savings.  However,

> The data arrives ordered by service_num, day, time. This customer has
> one primary service_num that most of the calls are made from.

So you do have very strong clumping, which the planner is evidently
failing to account for properly.  Could we see the pg_stats rows for
service_num and cust_id?  I'm curious whether the ANALYZE stats picked
up the effect at all.

As far as actually solving the problem is concerned, you have a few
options.  I wouldn't recommend turning off enable_seqscan globally,
but you could perhaps turn it off locally (just do a SET command) just
for this query.  Another possibility, if you care a lot about the speed
of this particular type of query, is to make a partial index tuned to
the query:create index my_idx on cdr (cust_id) WHERE bill_id IS NULL;
I gather from your previously shown results that "bill_id IS NULL"
covers only a small fraction of the table, so this index would be pretty
small and should look quite attractive to the planner.
        regards, tom lane


pgsql-sql by date:

Previous
From: Dana Hudes
Date:
Subject: Re: bytea or blobs?
Next
From: Tom Lane
Date:
Subject: Re: CHAR(n) always trims trailing spaces in 7.4