>>> Tom Lane said:
> Daniel Kalchev <daniel@digsys.bg> writes:
> > I understand the clustering might help with sequential scans, but why
> > would it help with index scans?
>
> No, the other way around: it makes no difference for seq scans, but can
> speed up index scans quite a lot. With a clustered table, successive
> index-driven fetches tend to hit the same pages rather than hitting
> random pages throughout the table. That saves I/O.
Ok, time to go home :-), but...
> Given the numbers you were quoting, if the table were in perfectly
> random order by ipdate then there would probably have been about three
> rows per page that the indexscan would've had to fetch. This would mean
> touching each page three times in some random order. Unless the table
> is small enough to fit in Postgres' shared buffer cache, that's going to
> represent a lot of extra I/O --- a lot more than reading each page only
> once, as a seqscan would do. At the other extreme, if the table is
> perfectly ordered by ipdate then the indexscan need only hit a small
> number of pages (all the rows we want are in a narrow range) and we
> touch each page many times before moving on to the next. Very few I/O
> requests in that case.
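Tom's contrast between a clustered and a randomly ordered table can be sketched numerically. The figures below are assumptions chosen to roughly match the thread (about 1.7 million rows, ~50k pages, and a hypothetical range query matching 150k rows); they are not measurements from the actual table:

```python
import random

# Assumed numbers, loosely matching the thread's figures.
ROWS = 1_700_000
ROWS_PER_PAGE = 34                      # assumed tuple density
PAGES = ROWS // ROWS_PER_PAGE           # ~50,000 pages
MATCHES = 150_000                       # hypothetical rows in the ipdate range

# Clustered by ipdate: matching rows are packed into consecutive pages,
# so the index scan touches only MATCHES / ROWS_PER_PAGE pages.
clustered_pages = MATCHES // ROWS_PER_PAGE

# Random order: each matching row lands on an arbitrary page, so the
# scan's MATCHES fetches are scattered over nearly every page in the table.
random.seed(1)
touched = {random.randrange(PAGES) for _ in range(MATCHES)}

print(f"pages touched, clustered: {clustered_pages}")
print(f"distinct pages touched, random order: {len(touched)}")
print(f"total page fetches either way: {MATCHES}")
```

With these numbers the random-order scan ends up visiting almost all ~50k pages (each about three times, matching Tom's estimate), while the clustered scan touches only a few thousand pages once each.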
In any case, if we need to hit 50k pages (assuming the indexed data is
randomly scattered through the file) and read each of them three times,
that should still be less I/O than reading all 1.7 million records. The
table will never be laid out sequentially on disk, at least not in this
case (rows are appended day after day, which is also why the data is
almost ordered by ipdate).
What I am really asking about is the scaling: are 50k random reads worse
than 1.7 million sequential reads, especially once tuple size, disk block
size, etc. are taken into account?
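The question can be put on a back-of-envelope basis. The device numbers below (seek time, sequential throughput) are assumptions for illustration, not measurements from the thread; only the page count and the three-touches-per-page figure come from the discussion above:

```python
# Back-of-envelope I/O comparison; device numbers are assumptions.
PAGES = 50_000               # table size in 8 KB pages (from the thread)
PAGE_KB = 8
RANDOM_FETCHES = PAGES * 3   # each page touched ~3 times, per Tom's estimate
SEEK_MS = 8.0                # assumed average random-access time
SEQ_MB_S = 20.0              # assumed sequential read throughput

random_secs = RANDOM_FETCHES * SEEK_MS / 1000.0
seq_secs = PAGES * PAGE_KB / 1024.0 / SEQ_MB_S

print(f"index scan, random order: ~{random_secs:.0f} s")   # ~1200 s
print(f"sequential scan:          ~{seq_secs:.0f} s")      # ~20 s
```

Under these assumptions the seq scan wins by a wide margin, which is Tom's point; the caveat, also his, is that if the table fits in the shared buffer cache, the second and third touches of each page cost nothing and the random-order index scan looks much better.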
I will wait patiently for 4.2 to be released and see how this same table performs.
:-)
Daniel