Home > mailing lists

Re: again on index usage - Mailing list pgsql-hackers

From	Daniel Kalchev
Subject	Re: again on index usage
Date	January 9, 2002 18:15:25
Msg-id	200201092006.WAA22792@dcave.digsys.bg Whole thread Raw
In response to	Re: again on index usage (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: again on index usage
List	pgsql-hackers

Tree view

>>>Tom Lane said:> Daniel Kalchev <daniel@digsys.bg> writes:> > I understand the clustering might help with sequential
scans,but why > > would it help with index scans?> > No, the other way around: it makes no difference for seq scans,
butcan> speed up index scans quite a lot.  With a clustered table, successive> index-driven fetches tend to hit the
samepages rather than hitting> random pages throughout the table.  That saves I/O.

Ok, time to go home :-), but...
> Given the numbers you were quoting, if the table were in perfectly> random order by ipdate then there would probably
havebeen about three> rows per page that the indexscan would've had to fetch.  This would mean> touching each page
threetimes in some random order.  Unless the table> is small enough to fit in Postgres' shared buffer cache, that's
goingto> represent a lot of extra I/O --- a lot more than reading each page only> once, as a seqscan would do.  At the
otherextreme, if the table is> perfectly ordered by ipdate then the indexscan need only hit a small> number of pages
(allthe rows we want are in a narrow range) and we> touch each page many times before moving on to the next.  Very few
I/O>requests in that case.

In any case, if we need to hit 50k pages (assuming the indexed data is 
randomly scattered in the file), and having to read these three times each, it 
will be less I/O than having to read 1.7 million records. The table will never 
be laid sequentially on the disk, at least not in this case (which adds to the 
table day after day - and this is why data is almost ordered by ipdate).

What I am arguing about is the scaling - is 50k random reads worse than 1.7 
million sequential reads? Eventually considering the tuple size, disk block 
size etc.

I will wait patiently for 4.2 to release and see how this same table performs. 
:-)

Daniel

pgsql-hackers by date:

From: "Marc G. Fournier"
Date: 09 January 2002, 17:54:51
Subject: Re: RC1 time?

From: Tom Lane
Date: 09 January 2002, 18:25:05
Subject: Re: RC1 time?

Re: again on index usage - Mailing list pgsql-hackers

Previous

Next