Greg Smith wrote:
> Matthew Wakeling wrote:
>> This sort of thing has been fairly well researched at an academic
>> level, but has not been implemented in that many real world
>> situations. I would encourage its use in Postgres.
>
> I guess, but don't forget that work on PostgreSQL is driven by what
> problems people are actually running into. There's a long list of
> performance improvements sitting in the TODO list waiting for people
> to find time to work on them, ones that we're quite certain are
> useful. That anyone is going to chase after any of these speculative
> ideas from academic research instead of one of those is unlikely.
> Your characterization of the potential speed up here is "Using a
> proper tree inside the index page would improve the CPU usage of the
> index lookups", which seems quite reasonable. Regardless, when I
> consider "is that something I have any reason to suspect is a
> bottleneck on common workloads?", I don't think of any, and return to
> working on one of things I already know is instead.
>
There are two different things concerning gist indexes:
1) with larger block sizes and hence, larger # entries per gist page,
results in more generic keys of those pages. This in turn results in a
greater number of hits, when the index is queried, so a larger part of
the index is scanned. NB this has nothing to do with caching / cache
sizes; it holds for every IO model. Tests performed by me showed
performance improvements of over 200%. Since then implementing a speedup
has been on my 'want to do list'.
2) there are several approaches to get the # entries per page down. Two
have been suggested in the thread referred to by Matthew (virtual pages
(but how to order these?) and tree within a page). It is interesting to
see if ideas from Prokop's cache oblivous algorithms match with this
problem to find a suitable virtual page format.
regards,
Yeb Havinga