Thread: Hashtable entry recycling algorithm in pg_stat_statements
The pending contrib/pg_stat_statements patch has an interesting method for dealing with the limited size of its shared-memory hash table (which has got one entry per unique statement text, userid, and database id, so it's not exactly likely to be small).  When the hash table fills up, it enumerates all the hash entries, sorts them by "usage" order, decrements the usage counts, and then removes the ones with the least usage values.  It does all this while holding exclusive lock on the hash table.

This is going to mean that every so often, your entire database just freezes up for N milliseconds while this is going on --- no query will be able to get through ExecutorEnd until that cleanup work is done.  I don't think this is a property that will go over well with the sort of DBA who would have a use for this module.  Unpredictable response times are exactly what he's hoping to avoid.

A couple of other possibilities that seem a bit saner:

1. Use a self-organizing list: any time an entry is referenced, move it
to the front, and when you need a new entry take the oldest one off the
back.  I don't see a way to do that without a global lock that protects
the list links, but there could be a spinlock that's held only long
enough to manipulate the list links.

2. Use a clock sweep algorithm similar to bufmgr's.

Either of these trades off accuracy of deciding which existing cache entries are "least interesting" in order to reduce the maintenance overhead --- but it doesn't appear to me that the code implements usage counts in a way that would justify treating them as sacrosanct indicators of relative usefulness anyhow.

The first option seems attractively simple and predictable in performance --- all operations are O(1).

Comments?

			regards, tom lane
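For concreteness, here is a minimal sketch of option 1.  Every name in it is illustrative rather than taken from the pending patch, and pthread spinlocks stand in for the backend's slock_t / SpinLockAcquire primitives so that the sketch compiles standalone:

/* Hypothetical self-organizing (move-to-front) list for option 1.
 * In the real module the entries and the lock would live in shared
 * memory, not process-local statics. */
#include <pthread.h>
#include <stddef.h>

typedef struct Entry
{
    struct Entry *prev;         /* list links, protected by lru_lock */
    struct Entry *next;
    /* ... statement text, usage counters, etc. ... */
} Entry;

static Entry   *lru_head;       /* most recently used */
static Entry   *lru_tail;       /* least recently used: the next victim */
static pthread_spinlock_t lru_lock;

/* Call once at startup. */
static void
lru_init(void)
{
    pthread_spin_init(&lru_lock, PTHREAD_PROCESS_PRIVATE);
}

/* Detach e from wherever it is; caller holds lru_lock. */
static void
unlink_entry(Entry *e)
{
    if (e->prev)
        e->prev->next = e->next;
    else
        lru_head = e->next;
    if (e->next)
        e->next->prev = e->prev;
    else
        lru_tail = e->prev;
}

/* Make e the list head; caller holds lru_lock, e is not yet linked. */
static void
push_front(Entry *e)
{
    e->prev = NULL;
    e->next = lru_head;
    if (lru_head)
        lru_head->prev = e;
    lru_head = e;
    if (lru_tail == NULL)
        lru_tail = e;
}

/* On every hit: move an already-linked entry to the front.  O(1). */
static void
touch_entry(Entry *e)
{
    pthread_spin_lock(&lru_lock);
    if (e != lru_head)
    {
        unlink_entry(e);
        push_front(e);
    }
    pthread_spin_unlock(&lru_lock);
}

/* When the table is full: take the victim off the back.  Also O(1). */
static Entry *
pop_victim(void)
{
    Entry      *victim;

    pthread_spin_lock(&lru_lock);
    victim = lru_tail;
    if (victim)
        unlink_entry(victim);
    pthread_spin_unlock(&lru_lock);
    return victim;
}

Both the hit path and the eviction path touch only a constant number of pointers while the spinlock is held, which is what makes the response time predictable.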
On Fri, Jan 2, 2009 at 17:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> A couple of other possibilities that seem a bit saner:
>
> 1. Use a self-organizing list: any time an entry is referenced, move it
> to the front, and when you need a new entry take the oldest one off the
> back.  I don't see a way to do that without a global lock that protects
> the list links, but there could be a spinlock that's held only long
> enough to manipulate the list links.
>
> 2. Use a clock sweep algorithm similar to bufmgr's.
>
> Either of these trades off accuracy of deciding which existing cache
> entries are "least interesting" in order to reduce the maintenance
> overhead --- but it doesn't appear to me that the code implements usage
> counts in a way that would justify treating them as sacrosanct
> indicators of relative usefulness anyhow.
>
> The first option seems attractively simple and predictable in
> performance --- all operations are O(1).

It seems to me a linear list would make the "common" case --- where the query is already in the list but we need to update the stats --- slow.  Or am I just thinking too abstractly, and the list is not a pg_list.h list but just a C array indexed by a simple hash?  (Or I guess we could hash and then use list_nth_cell()... but that *seems* slow.)
"Alex Hunsaker" <badalex@gmail.com> writes: > Its seems to me a linear list would make the "common" case where the > query is already in the list but we need to update the stats slow. No, the hashtable is still there for lookups. The list would be a means of determining which hashtable entry to release when we're out of space. regards, tom lane
On Fri, Jan 2, 2009 at 18:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Alex Hunsaker" <badalex@gmail.com> writes:
>> It seems to me a linear list would make the "common" case --- where the
>> query is already in the list but we need to update the stats --- slow.
>
> No, the hashtable is still there for lookups.  The list would be a means
> of determining which hashtable entry to release when we're out of space.

Ahh, OK.  Well, #1 seems easier to do, and it seems like it would even be faster than a clock sweep...