Re: Why is indexonlyscan so darned slow? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Why is indexonlyscan so darned slow?
Date
Msg-id 7600.1337541890@sss.pgh.pa.us
Whole thread Raw
In response to Re: Why is indexonlyscan so darned slow?  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Why is indexonlyscan so darned slow?
Re: Why is indexonlyscan so darned slow?
List pgsql-hackers
Jeff Janes <jeff.janes@gmail.com> writes:
> On Thu, May 17, 2012 at 11:35 AM, Joshua Berkus <josh@agliodbs.com> wrote:
>> That's in-RAM speed ... I ran the query twice to make sure the index was cached, and it didn't get any better.  And
Imeant 5X per byte rather than 5X per tuple.
 

> Ah, OK that makes more sense.  I played around with this, specifically
> count(*), quite a bit when IOS first came out, and I attributed a
> large part of the time to the code that forms a tuple out of raw
> bytes, and the code that advances the aggregate.  The first one is
> probably more a per-tuple cost than per byte, and the second
> definitely is per tuple cost.
> I can't find my detailed notes from this work, so this is just from memory.

I did a quick investigation of this example with oprofile, and found
that there's not going to be any easy large improvement available.
It's basically all per-tuple CPU costs, breaking down like this:

samples  %        symbol name
15513    13.4664  IndexOnlyNext
10886     9.4498  index_getnext_tid
7526      6.5331  visibilitymap_test
7116      6.1772  ExecClearTuple
7054      6.1234  _bt_checkkeys
6804      5.9064  _bt_next
6344      5.5070  ExecProject
6033      5.2371  advance_aggregates
5619      4.8777  ExecScan
5331      4.6277  advance_transition_function
5202      4.5157  btgettuple
4928      4.2779  _bt_saveitem
4653      4.0391  ExecProcNode
4473      3.8829  int8inc
3404      2.9549  MemoryContextReset
3125      2.7127  _bt_readpage
2768      2.4028  FunctionCall2Coll
2278      1.9775  ExecAgg
1502      1.3038  ExecStoreVirtualTuple
1198      1.0399  BufferGetBlockNumber
1105      0.9592  ExecIndexOnlyScan
946       0.8212  hash_search_with_hash_value

A fair chunk of what's being attributed to IndexOnlyNext is actually the
inlined code for StoreIndexTuple, and that in turn is mostly the inlined
code for index_getattr.  We might possibly save a bit here if we noticed
that the query doesn't actually need us to fetch the indexed columns,
but it looks like that would only buy a couple percent overall --- and
testing for that would add useless cycles to every case where we *do*
need the indexed value.  So I'm afraid that it might amount to optimizing
SELECT COUNT(*) at the expense of everything else, which I'm not for.

Another possibility is to try to reduce the costs of index_getnext_tid
and FunctionCall2Coll, which are basically just trampolines to reach
btgettuple.  It's not immediately obvious how to make that much better
though.

Anyway, on my machine it seems that the per-tuple CPU costs for SELECT
COUNT(*) with an index-only scan are something like 10% higher than the
per-tuple costs with a heap scan.  We might get that down to roughly par
with some hacking, but it's never going to be vastly better.  The
argument in favor of index-only scans is mainly about reducing I/O costs
anyway.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: read() returns ERANGE in Mac OS X
Next
From: David Fetter
Date:
Subject: Re: Remove readline notice from psql --version?