RE: [HACKERS] Sorting costs (was Caution: tonight's commits force initdb) - Mailing list pgsql-hackers

From: Hiroshi Inoue
Subject: RE: [HACKERS] Sorting costs (was Caution: tonight's commits force initdb)
Date:
Msg-id: 000a01beee8f$286cb720$2801007e@cadzone.tpf.co.jp
In response to: Re: [HACKERS] Sorting costs (was Caution: tonight's commits force initdb) (Hannu Krosing <hannu@trust.ee>)
List: pgsql-hackers
> -----Original Message-----
> From: hannu@kodu.home.ee [mailto:hannu@kodu.home.ee] On Behalf Of Hannu
> Krosing
> Sent: Wednesday, August 25, 1999 4:11 AM
> To: Tom Lane
> Cc: Hiroshi Inoue; pgsql-hackers
> Subject: Re: [HACKERS] Sorting costs (was Caution: tonight's commits
> force initdb)
>
>
> Tom Lane wrote:
> >
> > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > > Hmm, index scan is chosen to select all rows.
> > > AFAIK, sequential scan + sort is much faster than index scan in
> > > most cases.
> > >       cost of index scan < cost of sequential scan + cost of sort
> > > I have felt that the current cost estimation of index scan is too
> > > small, though I have no alternative.
>
> can the optimizer make use of LIMIT, or some other hint that reaction
> time is preferred over speed of full query ?
>
> In web apps the index scan may often be faster than seq scan + sort, as
> one may not actually need all the tuples but only a small fraction from
> near the beginning.
>
> Getting the beginning fast also gives better responsiveness for other
> interactive uses.
>

I know there are many cases where a fast response for the first rows is
needed; that is why I provided a patch for the descending ORDER BY case.
I think LIMIT is the natural hint to tell the optimizer that such a
response is wanted. If LIMIT were taken into account, we would even be
able to use "LIMIT ALL" purely to give that hint.
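
For illustration, a minimal sketch of the two cases, assuming a
hypothetical table "weblog" with an index on its "ts" column (the names
are mine, not from the discussion above):

    -- Only the first rows are wanted: an index scan that returns tuples
    -- already in "ts" order, with no full sort, gives the best response
    -- time here.
    SELECT * FROM weblog ORDER BY ts DESC LIMIT 20;

    -- If the optimizer took LIMIT into account, "LIMIT ALL" could serve
    -- purely as the hint that fast first rows are preferred, while
    -- still returning every row.
    SELECT * FROM weblog ORDER BY ts DESC LIMIT ALL;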

> > I am also suspicious that indexscan costs are underestimated.  The
> > cost of reading the index is probably not too far off, but the cost
> > of accessing the main table is bogus.  Worst case, for a table whose
> > tuples are thoroughly scattered, you would have a main-table page fetch
> > for *each* returned tuple.  In practice it's probably not anywhere near
> > that bad, since you may have some clustering of tuples (so that the same
> > page is hit several times in a row), and also the Postgres buffers and
> > the underlying Unix system's disk cache will save trips to disk if there
> > is any locality of reference at all.  I have no idea how to estimate
> > that effect --- anyone?  But cost_index is clearly producing a
> > ridiculously optimistic estimate at the moment.
>
> One way to find out would be actual benchmarking - if the current
> optimizer prefers index scans, it is possible to do a query using an
> index scan, drop the index, somehow flush the disk cache, and then do
> the same query using seqscan+sort.
>
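
A rough sketch of that benchmark, assuming a hypothetical table "bigtab"
with an index "bigtab_k_idx" on column "k" (how to flush the OS disk
cache between runs is left aside):

    -- 1. With the index in place the planner currently picks an index
    --    scan; check the plan and time the query.
    EXPLAIN SELECT * FROM bigtab ORDER BY k;
    SELECT * FROM bigtab ORDER BY k;

    -- 2. Drop the index (and flush the disk cache by external means) so
    --    the same query must run as seqscan + sort; time it again.
    DROP INDEX bigtab_k_idx;
    EXPLAIN SELECT * FROM bigtab ORDER BY k;
    SELECT * FROM bigtab ORDER BY k;

    -- 3. Compare the two timings and recreate the index afterwards.
    CREATE INDEX bigtab_k_idx ON bigtab (k);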

This case has probably been benchmarked by someone already (Vadim?).

In my experience, an index scan is generally 5~10 times slower than a
sequential scan when selecting all rows.
I know of a case where the index scan was more than 500 times slower than
the sequential scan; after clustering, the performance improved
dramatically. Unfortunately, I don't know how to estimate such a case.
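
For reference, a minimal sketch of such a clustering step, using the same
hypothetical names as above (the exact CLUSTER syntax varies between
PostgreSQL versions):

    -- Physically rewrite "bigtab" in the order of its index on "k", so
    -- that tuples returned by an index scan sit on adjacent heap pages.
    CLUSTER bigtab_k_idx ON bigtab;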

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp



