Re: B-Heaps - Mailing list pgsql-performance

From Yeb Havinga
Subject Re: B-Heaps
Date
Msg-id 4C1BBB39.10308@gmail.com
In response to Re: B-Heaps  (Greg Smith <greg@2ndquadrant.com>)
Responses Re: B-Heaps  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-performance
Greg Smith wrote:
> Matthew Wakeling wrote:
>> This sort of thing has been fairly well researched at an academic
>> level, but has not been implemented in that many real world
>> situations. I would encourage its use in Postgres.
>
> I guess, but don't forget that work on PostgreSQL is driven by what
> problems people are actually running into.  There's a long list of
> performance improvements sitting in the TODO list waiting for people
> to find time to work on them, ones that we're quite certain are
> useful.  That anyone is going to chase after any of these speculative
> ideas from academic research instead of one of those is unlikely.
> Your characterization of the potential speed up here is "Using a
> proper tree inside the index page would improve the CPU usage of the
> index lookups", which seems quite reasonable.  Regardless, when I
> consider "is that something I have any reason to suspect is a
> bottleneck on common workloads?", I don't think of any, and return to
> working on one of the things I already know is instead.
>
There are two different things to note concerning GiST indexes:

1) Larger block sizes, and hence more entries per GiST page, result in
more generic keys for those pages. This in turn results in a greater
number of hits when the index is queried, so a larger part of the index
is scanned. NB this has nothing to do with caching / cache sizes; it
holds for every IO model. Tests I performed showed performance
improvements of over 200%. Since then, implementing a speedup has been
on my 'want to do' list.

2) There are several approaches to getting the number of entries per
page down. Two have been suggested in the thread referred to by Matthew:
virtual pages (but how to order these?) and a tree within a page. It
would be interesting to see whether ideas from Prokop's cache-oblivious
algorithms apply to this problem, to find a suitable virtual page
format.
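
To make point 1 concrete, here is a toy sketch in Python. It is not
GiST code: it assumes one-dimensional integer keys, a single leaf level,
and a page key that is simply the bounding interval of the page's
entries. The names build_pages and scan are made up for illustration.
It only shows that a coarser page key forces more entries to be
examined for the same result set.

```python
def build_pages(values, entries_per_page):
    """Group sorted values into pages keyed by their bounding interval."""
    ordered = sorted(values)
    pages = []
    for start in range(0, len(ordered), entries_per_page):
        chunk = ordered[start:start + entries_per_page]
        # The page key covers everything on the page, so more entries
        # per page means a wider (more generic) key.
        pages.append(((chunk[0], chunk[-1]), chunk))
    return pages


def scan(pages, lo, hi):
    """Return (pages_hit, entries_examined, matches) for query [lo, hi]."""
    pages_hit = entries_examined = matches = 0
    for (page_lo, page_hi), chunk in pages:
        if page_lo <= hi and lo <= page_hi:  # page key overlaps the query
            pages_hit += 1
            entries_examined += len(chunk)
            matches += sum(lo <= v <= hi for v in chunk)
    return pages_hit, entries_examined, matches


if __name__ == "__main__":
    for entries_per_page in (10, 100):
        pages = build_pages(range(1000), entries_per_page)
        print(entries_per_page, scan(pages, 500, 504))
```

With 10 entries per page the query examines 10 entries to find its 5
matches; with 100 entries per page it examines 100 for the same 5. A
real GiST has overlapping multi-dimensional keys and inner levels, so
the numbers there will differ.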

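For point 2, one idea from the cache-oblivious literature is the van
Emde Boas layout for a static search tree: the tree is split at half
its height, the top subtree is stored first, and each bottom subtree
follows contiguously, recursively. Whether that fits a GiST page is
exactly the open question above; the Python sketch below only shows the
ordering for a complete binary tree, and veb_order is a hypothetical
name, not anything in PostgreSQL.

```python
def veb_order(root, height):
    """List the nodes of a complete binary tree in van Emde Boas order.

    Nodes use heap numbering: the root is 1 and node i has children
    2*i and 2*i + 1. `height` counts levels, so height 1 is one node.
    """
    if height == 1:
        return [root]
    top_height = height // 2              # levels kept in the top subtree
    bottom_height = height - top_height   # levels in each bottom subtree
    order = veb_order(root, top_height)
    # Descendants of `root` that sit top_height levels below it are the
    # roots of the bottom subtrees; lay each one out contiguously.
    first_bottom_root = root << top_height
    for offset in range(1 << top_height):
        order.extend(veb_order(first_bottom_root + offset, bottom_height))
    return order


if __name__ == "__main__":
    print(veb_order(1, 4))
    # -> [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]
```

Each root-to-leaf walk then touches a few contiguous runs of nodes
rather than one scattered node per level, which is the property that
might make it a candidate for a virtual page format.
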
regards,
Yeb Havinga

