Thread: Sequential Scans

Sequential Scans

From: Ericson Smith
Date: Thursday 06 Mar 2003 9:15 pm
Hi,

I have a table with about 3.2 Million records.

There is one process that I run that needs to process batches of records
1000 at a time out of a set of approximately 220,000 records.

So my query looks like this:
SELECT a.*, b.url FROM listings a, urls b WHERE a.urlindex=b.index AND
a.haslid=1 ORDER BY a.index LIMIT 1000 OFFSET 0;

Doing the above query with an offset of up to 5000 (the 5th batch) shows
(with EXPLAIN) that index scans are being used.

Exceeding an OFFSET of 5000 produces sequential scans. The whole process
goes horribly slow at that point.
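
For reference, the flip is visible by running EXPLAIN on either side of the
changeover (same query as above; which plan appears at which offset is as I
describe it, the actual plan output isn't reproduced here):

  EXPLAIN SELECT a.*, b.url FROM listings a, urls b
   WHERE a.urlindex=b.index AND a.haslid=1
   ORDER BY a.index LIMIT 1000 OFFSET 4000;  -- index scan

  EXPLAIN SELECT a.*, b.url FROM listings a, urls b
   WHERE a.urlindex=b.index AND a.haslid=1
   ORDER BY a.index LIMIT 1000 OFFSET 6000;  -- sequential scan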

Any suggestions as to what may be causing this?

My environment:
* RH 7.3 // PostgreSQL 7.3.x // latest kernel from RH
* RAM: 6 Gigs
* Dual Xeon CPUs
* Shared RAM: 250MB
* VACUUM / ANALYZE run twice per day
* The tables in question are mostly static (0.05% updates, 1% inserts daily)

- Ericson Smith
eric@did-it.com

Re: Sequential Scans

From: Richard Huxton
Date:
On Thursday 06 Mar 2003 9:15 pm, Ericson Smith wrote:
> Hi,
>
> I have a table with about 3.2 Million records.
>
> There is one process that I run that needs to process batches of records
> 1000 at a time out of a set of approximately 220,000 records.
>
> So my query looks like this:
> SELECT a.*, b.url FROM listings a, urls b WHERE a.urlindex=b.index AND
> a.haslid=1 ORDER BY a.index LIMIT 1000 OFFSET 0;
>
> Doing the above query with an offset of up to 5000 (the 5th batch) shows
> (with EXPLAIN) that index scans are being used.
>
> Exceeding an OFFSET of 5000 produces sequential scans. The whole process
> goes horribly slow at that point.

To get to an offset of 5000 it has to find the first 5000 rows and then ignore
them. There comes a point at which the planner decides the cost of grabbing
many index items exceeds the cost of just reading the table. It may be that in
your case the planner is estimating the wrong changeover point - read up on
the run-time configuration settings; I think CPU_INDEX_TUPLE_COST may be your
friend (see archives for discussion).
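
That setting is cpu_index_tuple_cost. A minimal sketch of experimenting with
it (the 0.001 below is purely illustrative, not a value from this thread):

  SHOW cpu_index_tuple_cost;         -- check the current value first
  SET cpu_index_tuple_cost = 0.001;  -- session-local; makes index scans look cheaper
  -- then re-run EXPLAIN on the query to see whether it flips back to an index scan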

In your case though, you might want to look at using a cursor and then
fetching blocks of 1000 rows at a time.
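
Something along these lines (a sketch using the query from your post; "batch"
is just an illustrative cursor name, and the cursor must live inside a
transaction):

  BEGIN;
  DECLARE batch CURSOR FOR
      SELECT a.*, b.url FROM listings a, urls b
      WHERE a.urlindex=b.index AND a.haslid=1
      ORDER BY a.index;
  FETCH 1000 FROM batch;  -- first batch
  FETCH 1000 FROM batch;  -- second batch; continues where the last FETCH stopped
  CLOSE batch;
  COMMIT;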

--
  Richard Huxton

Re: Sequential Scans

From: Mark Kirkwood
Date: Mon, 2003-03-10 05:45
Richard Huxton wrote:

> I think CPU_INDEX_TUPLE_COST may be your
> friend (see archives for discussion).

Might be worth looking at RANDOM_PAGE_COST as well (going down to 1 or
even fractional values).
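
For example (the 1.5 below is illustrative only; check what you are running
first):

  SHOW random_page_cost;       -- 4.0 is the usual default
  SET random_page_cost = 1.5;  -- session-local; tells the planner random page
                               -- reads are cheaper, which favours index scans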

best wishes

Mark


Re: Sequential Scans

From: Ericson Smith
Date:
Well, actually... I shifted to using cursors instead of those LIMIT/OFFSET
queries. Things are lightning fast now. In fact, we are going to be using
cursors for many batch-processing operations in the future.
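
For the record, the pattern is roughly this (a sketch, not our exact code;
the cursor name is made up):

  BEGIN;
  DECLARE listing_batch CURSOR FOR
      SELECT a.*, b.url FROM listings a, urls b
      WHERE a.urlindex=b.index AND a.haslid=1
      ORDER BY a.index;
  FETCH 1000 FROM listing_batch;  -- repeated until it returns no rows
  CLOSE listing_batch;
  COMMIT;

Unlike LIMIT/OFFSET, each FETCH picks up where the previous one stopped, so a
late batch costs the same as the first.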

- Ericson Smith
eric@did-it.com

On Mon, 2003-03-10 at 05:45, Mark Kirkwood wrote:
> Richard Huxton wrote:
>
> > I think CPU_INDEX_TUPLE_COST may be your
> > friend (see archives for discussion).
>
> Might be worth looking at RANDOM_PAGE_COST as well (going down to 1 or
> even fractional values).
>
> best wishes
>
> Mark