Home > mailing lists

Re: again on index usage - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: again on index usage
Date	January 9, 2002 14:54:51
Msg-id	14552.1010605708@sss.pgh.pa.us Whole thread Raw
In response to	Re: again on index usage (Daniel Kalchev <daniel@digsys.bg>)
Responses	Re: again on index usage
List	pgsql-hackers

Tree view

Daniel Kalchev <daniel@digsys.bg> writes:
> I understand the clustering might help with sequential scans, but why 
> would it help with index scans?

No, the other way around: it makes no difference for seq scans, but can
speed up index scans quite a lot.  With a clustered table, successive
index-driven fetches tend to hit the same pages rather than hitting
random pages throughout the table.  That saves I/O.

Given the numbers you were quoting, if the table were in perfectly
random order by ipdate then there would probably have been about three
rows per page that the indexscan would've had to fetch.  This would mean
touching each page three times in some random order.  Unless the table
is small enough to fit in Postgres' shared buffer cache, that's going to
represent a lot of extra I/O --- a lot more than reading each page only
once, as a seqscan would do.  At the other extreme, if the table is
perfectly ordered by ipdate then the indexscan need only hit a small
number of pages (all the rows we want are in a narrow range) and we
touch each page many times before moving on to the next.  Very few I/O
requests in that case.

7.1 does not have any statistics about table order, so it uses the
conservative assumption that the ordering is random.  7.2 has more
statistical data and perhaps will make better estimates about the
cost of indexscans.
        regards, tom lane

pgsql-hackers by date:

From: Tom Lane
Date: 09 January 2002, 14:24:57
Subject: Re: Some architectures need "signed char" declarations

From: Daniel Kalchev
Date: 09 January 2002, 15:15:25
Subject: Re: again on index usage

Re: again on index usage - Mailing list pgsql-hackers

Previous

Next