Re: again on index usage - Mailing list pgsql-hackers

From Tom Lane
Subject Re: again on index usage
Date
Msg-id 14755.1010607659@sss.pgh.pa.us
In response to Re: again on index usage  (Daniel Kalchev <daniel@digsys.bg>)
Responses Re: again on index usage  (Daniel Kalchev <daniel@digsys.bg>)
List pgsql-hackers
Daniel Kalchev <daniel@digsys.bg> writes:
> In any case, if we need to hit 50k pages (assuming the indexed data is 
> randomly scattered in the file), and having to read these three times each, it 
> will be less I/O than having to read 1.7 million records.

How do you arrive at that?  Assuming 100 records per page (probably the
right order of magnitude), the seqscan alternative is 17k page reads.
Yes, you examine more tuples, but CPUs are lots faster than disks.
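The arithmetic can be sketched in a few lines; a minimal illustration using only the round numbers from this thread (1.7 million rows, ~100 rows per page, 50k index-hit pages each read about three times), not measurements:

```python
# Back-of-the-envelope page-read counts for the two plans discussed above.
rows = 1_700_000
rows_per_page = 100               # order-of-magnitude estimate from the message

seqscan_reads = rows // rows_per_page    # every heap page, read once, in order
indexscan_reads = 50_000 * 3             # 50k pages, each fetched ~3 times

print(seqscan_reads)    # 17000 sequential page reads
print(indexscan_reads)  # 150000 random page reads
```

So even before accounting for seek costs, the indexscan does roughly nine times as many page reads as the seqscan in this scenario.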

That doesn't even address the fact that Unix systems reward sequential
reads and penalize random access.  In a seqscan, we can expect that the
kernel will schedule the next page read before we ask for it, so that
our CPU time to examine a page is overlapped with I/O for the next page.
In an indexscan that advantage goes away, because neither we nor the
kernel know which page will be touched next.  On top of the loss of
read-ahead, the filesystem is probably laid out in a way that rewards
sequential access with fewer and shorter seeks.
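As an aside, the kernel read-ahead behavior described here is exactly what POSIX lets an application hint at via posix_fadvise(); a minimal sketch (Unix-only, using a throwaway temp file — this is an illustration, not PostgreSQL's actual code) of the two access-pattern hints:

```python
import os
import tempfile

# Illustrative only: tell the kernel how a file will be accessed.
# POSIX_FADV_SEQUENTIAL encourages aggressive read-ahead (seqscan-like);
# POSIX_FADV_RANDOM tells it read-ahead would be wasted (indexscan-like).
with tempfile.NamedTemporaryFile() as f:
    f.write(b"x" * 8192)
    f.flush()
    fd = f.fileno()
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)  # expect in-order reads
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)      # expect scattered reads
```

Without such a hint (and a plain read() loop gives none), the kernel can only infer sequential access after the fact, which is why an indexscan's scattered fetches lose the read-ahead overlap entirely.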

The tests I've made suggest that the penalty involved is about a factor
of four -- ie, a seqscan can scan four pages in the same amount of time
that it takes to bring in one randomly-accessed page.
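Folding that factor of four into the page counts above shows how lopsided the comparison gets; a sketch assuming the same round numbers (note this 4x figure is also what later became PostgreSQL's default random_page_cost):

```python
# Express both plans in "sequential-read equivalents", using the
# measured ~4x penalty for a randomly accessed page.
random_penalty = 4.0                        # factor reported above

seqscan_cost = 17_000 * 1.0                 # 17k sequential page reads
indexscan_cost = 150_000 * random_penalty   # 150k random page reads

print(indexscan_cost / seqscan_cost)        # ~35x more I/O time for the indexscan
```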
        regards, tom lane

