Re: again on index usage - Mailing list pgsql-hackers

From Tom Lane
Subject Re: again on index usage
Date
Msg-id 14755.1010607659@sss.pgh.pa.us
In response to Re: again on index usage  (Daniel Kalchev <daniel@digsys.bg>)
Responses Re: again on index usage  (Daniel Kalchev <daniel@digsys.bg>)
List pgsql-hackers
Daniel Kalchev <daniel@digsys.bg> writes:
> In any case, if we need to hit 50k pages (assuming the indexed data is 
> randomly scattered in the file), and having to read these three times each, it 
> will be less I/O than having to read 1.7 million records.

How do you arrive at that?  Assuming 100 records per page (probably the
right order of magnitude), the seqscan alternative is 17k page reads.
Yes, you examine more tuples, but CPUs are lots faster than disks.
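The arithmetic can be sketched in a few lines; a minimal illustration using only the round numbers from this thread (1.7 million rows, ~100 rows per page, 50k index-hit pages each read about three times), not measurements:

```python
# Back-of-the-envelope page-read counts for the two plans discussed above.
rows = 1_700_000
rows_per_page = 100               # order-of-magnitude estimate from the message

seqscan_reads = rows // rows_per_page    # every heap page, read once, in order
indexscan_reads = 50_000 * 3             # 50k pages, each fetched ~3 times

print(seqscan_reads)    # 17000 sequential page reads
print(indexscan_reads)  # 150000 random page reads
```

So even before accounting for seek costs, the indexscan does roughly nine times as many page reads as the seqscan in this scenario.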

That doesn't even address the fact that Unix systems reward sequential
reads and penalize random access.  In a seqscan, we can expect that the
kernel will schedule the next page read before we ask for it, so that
our CPU time to examine a page is overlapped with I/O for the next page.
In an indexscan that advantage goes away, because neither we nor the
kernel know which page will be touched next.  On top of the loss of
read-ahead, the filesystem is probably laid out in a way that rewards
sequential access with fewer and shorter seeks.
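As an aside, the kernel read-ahead behavior described here is exactly what POSIX lets an application hint at via posix_fadvise(); a minimal sketch (Unix-only, using a throwaway temp file — this is an illustration, not PostgreSQL's actual code) of the two access-pattern hints:

```python
import os
import tempfile

# Illustrative only: tell the kernel how a file will be accessed.
# POSIX_FADV_SEQUENTIAL encourages aggressive read-ahead (seqscan-like);
# POSIX_FADV_RANDOM tells it read-ahead would be wasted (indexscan-like).
with tempfile.NamedTemporaryFile() as f:
    f.write(b"x" * 8192)
    f.flush()
    fd = f.fileno()
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)  # expect in-order reads
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)      # expect scattered reads
```

Without such a hint (and a plain read() loop gives none), the kernel can only infer sequential access after the fact, which is why an indexscan's scattered fetches lose the read-ahead overlap entirely.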

The tests I've made suggest that the penalty involved is about a factor
of four -- ie, a seqscan can scan four pages in the same amount of time
that it takes to bring in one randomly-accessed page.
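Folding that factor of four into the page counts above shows how lopsided the comparison gets; a sketch assuming the same round numbers (note this 4x figure is also what later became PostgreSQL's default random_page_cost):

```python
# Express both plans in "sequential-read equivalents", using the
# measured ~4x penalty for a randomly accessed page.
random_penalty = 4.0                        # factor reported above

seqscan_cost = 17_000 * 1.0                 # 17k sequential page reads
indexscan_cost = 150_000 * random_penalty   # 150k random page reads

print(indexscan_cost / seqscan_cost)        # ~35x more I/O time for the indexscan
```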
        regards, tom lane

