Thread: Sequential Scans
Hi,

I have a table with about 3.2 million records.

There is one process that I run that needs to process batches of records 1000 at a time out of a set of approximately 220,000 records. So my query looks like this:

    SELECT a.*, b.url
      FROM listings a, urls b
     WHERE a.urlindex = b.index
       AND a.haslid = 1
     ORDER BY a.index
     LIMIT 1000 OFFSET 0;

Doing the above query with an offset of up to 5000 (the 5th batch) shows (with EXPLAIN) that index scans are being used. Exceeding an OFFSET of 5000 produces sequential scans, and the whole process goes horribly slow at that point. Any suggestions as to what may be causing this?

My environment:
* RH 7.3 // PostgreSQL 7.3.x // latest kernel from RH
* RAM: 6 GB
* Dual Xeon CPUs
* Shared RAM: 250 MB
* VACUUM / ANALYZE run twice per day
* The tables in question are mostly static (0.05% updates, 1% inserts daily)

-- Ericson Smith <eric@did-it.com>
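A minimal way to see the changeover Ericson describes, re-running EXPLAIN with his own query at offsets that bracket the point where the plan flips (the exact offsets here are illustrative):

    -- Per the post, the plan switches just above OFFSET 5000:
    EXPLAIN SELECT a.*, b.url
      FROM listings a, urls b
     WHERE a.urlindex = b.index AND a.haslid = 1
     ORDER BY a.index LIMIT 1000 OFFSET 5000;   -- index scan

    EXPLAIN SELECT a.*, b.url
      FROM listings a, urls b
     WHERE a.urlindex = b.index AND a.haslid = 1
     ORDER BY a.index LIMIT 1000 OFFSET 6000;   -- sequential scan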
On Thursday 06 Mar 2003 9:15 pm, Ericson Smith wrote:
> Hi,
>
> I have a table with about 3.2 million records.
>
> There is one process that I run that needs to process batches of records
> 1000 at a time out of a set of approximately 220,000 records.
>
> So my query looks like this:
> SELECT a.*, b.url FROM listings a, urls b WHERE a.urlindex=b.index AND
> a.haslid=1 ORDER BY a.index LIMIT 1000 OFFSET 0;
>
> Doing the above query with an offset of up to 5000 (the 5th batch) shows
> (with EXPLAIN) that index scans are being used.
>
> Exceeding an OFFSET of 5000 produces sequential scans. The whole process
> goes horribly slow at that point.

To get to an offset of 5000 it has to find the first 5000 rows and then ignore them. There comes a point at which the planner decides the cost of grabbing that many index items exceeds the cost of just reading the table.

It might be that in your case the planner is estimating the wrong changeover point - read up on the runtime environment settings; I think CPU_INDEX_TUPLE_COST may be your friend (see the archives for discussion).

In your case, though, you might want to look at using a cursor and then fetching blocks of 1000 rows at a time.

-- Richard Huxton
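What Richard is suggesting might look like this (a sketch only; the cursor name is made up, and note that a cursor in PostgreSQL normally has to live inside a transaction):

    BEGIN;

    DECLARE listing_cur CURSOR FOR
        SELECT a.*, b.url
          FROM listings a, urls b
         WHERE a.urlindex = b.index
           AND a.haslid = 1
         ORDER BY a.index;

    FETCH 1000 FROM listing_cur;   -- first batch of 1000
    FETCH 1000 FROM listing_cur;   -- next batch; continues where the
                                   -- previous FETCH stopped

    CLOSE listing_cur;
    COMMIT;

Because each FETCH resumes from where the last one left off, the per-batch cost of OFFSET (reading and discarding all of the earlier rows every time) disappears.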
Richard Huxton wrote:
> I think CPU_INDEX_TUPLE_COST may be your
> friend (see archives for discussion).

Might be worth looking at RANDOM_PAGE_COST as well (going down to 1 or even fractional values).

best wishes

Mark
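Both settings can be changed per session before re-running EXPLAIN, so experimenting is cheap. A sketch, with illustrative values rather than recommendations:

    -- Lower values make index scans look cheaper to the planner:
    SET random_page_cost = 1.5;         -- default is 4
    SET cpu_index_tuple_cost = 0.0005;

    EXPLAIN SELECT a.*, b.url
      FROM listings a, urls b
     WHERE a.urlindex = b.index AND a.haslid = 1
     ORDER BY a.index LIMIT 1000 OFFSET 6000;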
Well, actually... I shifted to using cursors instead of those LIMIT/OFFSET queries. Things are lightning fast now. In fact, in many batch-processing operations we are going to be using those cursors in the future.

-- Ericson Smith <eric@did-it.com>

On Mon, 2003-03-10 at 05:45, Mark Kirkwood wrote:
> Might be worth looking at RANDOM_PAGE_COST as well (going down to 1 or
> even fractional values)