Re: About b-tree usage - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: About b-tree usage
Date
Msg-id 1110360219.4089.483.camel@jeff
Whole thread Raw
In response to Re: About b-tree usage  (Ioannis Theoharis <theohari@ics.forth.gr>)
List pgsql-hackers
On Tue, 2005-03-08 at 15:30 +0200, Ioannis Theoharis wrote:
> 
> let me, i have turned enable_seqscan to off, in order to discourage
> optimizer to choose seq_scan whenever an idex_scan can be used.
> 
> But in this case, why optimizer don't chooses seq_scan (discourage is
> different than prevent) ?
> 

As Michael Paesold pointed out, enable_seqscan=off effectively means
that it will never choose a seqscan when there exists an alternative.

> At many cases i need only a small fragment of raws to be retrieved. But
> this extreme case is a real-scenario (not the most frequent but real).
> 

Returning all the rows in a large table is a real scenario? I would
expect you to at least use a cursor. Keep in mind that libpq has to
store all those rows in local client memory, so a cursor is a good idea
if that is the case.

And regardless, if you are returning all the rows in a table, the
absolute fastest way possible is a seq scan. An index is much slower
because it does way more work, and the disk accesses are random rather
than sequential.

> I try to find a way to achieve good performence even for the extreme
> case. Is there any way?
> 

The extreme case you provided is a SELECT without a WHERE. We already
know that PostgreSQL is executing that as efficiently as possible when
enable_seqscan=on. Why not send me a simple script that generates some
data similar to what you want to access, and then the query you'd like
to perform on that data. We might find that PostgreSQL is already
executing the query as efficiently as possible; in fact I suspect that's
the case (although perhaps some tuning would help).

In short, we shouldn't try to use indexes for all situations; they are
inherently worse for some situations. That's why PostgreSQL has both
seqscans and indexes, and an intelligent planner.

> ps. In bibliografy, there is a different alternative for indices. except
> th simple approach of <attr_val, rid> is the alternative <attr_val, set
> of rids>. The second means the attaches to each discrete attr_val the set
> o rid's of all raws with same attr_val. Is this alternative taken into
> account in postgres?
> 

I don't entirely understand what you're asking. It seems like you're
talking about a relatively minor implementation issue, and if it does
make a difference, it would probably not be very much. Perhaps rephrase
your question?

> 
> On Mon, 7 Mar 2005, Jeff Davis wrote:
> 
> >
> > In that case, sequential scan is faster, but perhaps the planner doesn't
> > know that ahead of time. Try turning on more statistics if you haven't
> > already, and then run ANALYZE again. If the planner sees a range,
> > perhaps it assumes that it is a highly selective range, when in fact, it
> > consists of all of the tuples. Also, make sure enable_seqscan is true
> > (in case you turned it off for testing or something and forgot).
> >
> > A seqscan is usually faster when a large proportion of the tuples are
> > returned because:
> > (1) It uses sequential I/O; whereas an index might access tuples in a
> > random order.
> > (2) It doesn't have to read the index's disk pages at all.
> >
> > I suspect you don't need to return all the tuples in the table. If you
> > include the details of a real scenario perhaps the people on the list
> > could be more helpful.
> >
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly



pgsql-hackers by date:

Previous
From: tzirechnoy@hotpop.com
Date:
Subject: Cost of XLogInsert CRC calculations
Next
From: Simon Riggs
Date:
Subject: Re: fool-toleranced optimizer