AW: [HACKERS] TODO list updated - Mailing list pgsql-hackers

From Zeugswetter Andreas SB
Subject AW: [HACKERS] TODO list updated
Date
Msg-id 219F68D65015D011A8E000006F8590C603FDC203@sdexcsrv1.f000.d0188.sd.spardat.at
Whole thread Raw
List pgsql-hackers
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > We currently do not use indexes to handle ORDER BY 
> because it is slower,
> > 
> > Er, actually, we *do* use indexes for ORDER BY currently:
> > 
> > regression=# explain select * from tenk1 order by unique1;
> > NOTICE:  QUERY PLAN:
> > Index Scan using tenk1_unique1 on tenk1  (cost=760.00 
> rows=10000 width=148)
> > 
> > If you start psql with PGOPTIONS="-fi" you can see that the 
> optimizer
> > believes an explicit sort would be much slower:
> > 
> > regression=# explain select * from tenk1 order by unique1;
> > NOTICE:  QUERY PLAN:
> > Sort  (cost=3233.91 rows=10000 width=148)
> >   ->  Seq Scan on tenk1  (cost=563.00 rows=10000 width=148)
> > 
> > but (at least on my machine) the explicit sort is marginally faster.
> > Evidently, the cost estimate for an explicit sort is *way* too high.

Doing the sort, or the index access is allways a tradeoff.
For interactive access the index is faster,
for batch mode the sort is faster.

I would try to avoid a sort, that would need more than a few
100 Mb of sort disk space, even if I would eventually get my last
row faster. 
The tradeoff is, that you wait an hour before you get the first row,
and block all those resources until you finish.

The index access gives the first rows fast, and does not block 
resources.

In mathematical terms I would give the sort an exponential cost
curve regarding sort size
(probably also dependent on ~16 * available sort memory), 
and the index access a linear cost curve.

Andreas


pgsql-hackers by date:

Previous
From: "Kevin Lam"
Date:
Subject: Re: question about MS Access connect to Postgresql 6.5.2-1
Next
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] libpq+MB/putenv(), getenv() clean up