Thread: Hashtable entry recycling algorithm in pg_stat_statements
The pending contrib/pg_stat_statements patch has an interesting method for dealing with the limited size of its shared-memory hash table (which has got one entry per unique statement text, userid, and database id, so it's not exactly likely to be small).  When the hash table fills up, it enumerates all the hash entries, sorts them by "usage" order, decrements the usage counts, and then removes the ones with the least usage values.  It does all this while holding exclusive lock on the hash table.

This is going to mean that every so often, your entire database just freezes up for N milliseconds while this is going on --- no query will be able to get through ExecutorEnd until that cleanup work is done.  I don't think this is a property that will go over well with the sort of DBA who would have a use for this module.  Unpredictable response times are exactly what he's hoping to avoid.

A couple of other possibilities that seem a bit saner:

1. Use a self-organizing list: any time an entry is referenced, move it
to the front, and when you need a new entry take the oldest one off the
back.  I don't see a way to do that without a global lock that protects
the list links, but there could be a spinlock that's held only long
enough to manipulate the list links.

2. Use a clock sweep algorithm similar to bufmgr's.

Either of these trades off accuracy of deciding which existing cache entries are "least interesting" in order to reduce the maintenance overhead --- but it doesn't appear to me that the code implements usage counts in a way that would justify treating them as sacrosanct indicators of relative usefulness anyhow.

The first option seems attractively simple and predictable in performance --- all operations are O(1).

Comments?

			regards, tom lane
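For concreteness, here is a minimal sketch of option 1.  Every name in it is illustrative rather than taken from the pending patch, and pthread spinlocks stand in for the backend's slock_t / SpinLockAcquire primitives so that the sketch compiles standalone:

/* Hypothetical self-organizing (move-to-front) list for option 1.
 * In the real module the entries and the lock would live in shared
 * memory, not process-local statics. */
#include <pthread.h>
#include <stddef.h>

typedef struct Entry
{
    struct Entry *prev;         /* list links, protected by lru_lock */
    struct Entry *next;
    /* ... statement text, usage counters, etc. ... */
} Entry;

static Entry   *lru_head;       /* most recently used */
static Entry   *lru_tail;       /* least recently used: the next victim */
static pthread_spinlock_t lru_lock;

/* Call once at startup. */
static void
lru_init(void)
{
    pthread_spin_init(&lru_lock, PTHREAD_PROCESS_PRIVATE);
}

/* Detach e from wherever it is; caller holds lru_lock. */
static void
unlink_entry(Entry *e)
{
    if (e->prev)
        e->prev->next = e->next;
    else
        lru_head = e->next;
    if (e->next)
        e->next->prev = e->prev;
    else
        lru_tail = e->prev;
}

/* Make e the list head; caller holds lru_lock, e is not yet linked. */
static void
push_front(Entry *e)
{
    e->prev = NULL;
    e->next = lru_head;
    if (lru_head)
        lru_head->prev = e;
    lru_head = e;
    if (lru_tail == NULL)
        lru_tail = e;
}

/* On every hit: move an already-linked entry to the front.  O(1). */
static void
touch_entry(Entry *e)
{
    pthread_spin_lock(&lru_lock);
    if (e != lru_head)
    {
        unlink_entry(e);
        push_front(e);
    }
    pthread_spin_unlock(&lru_lock);
}

/* When the table is full: take the victim off the back.  Also O(1). */
static Entry *
pop_victim(void)
{
    Entry      *victim;

    pthread_spin_lock(&lru_lock);
    victim = lru_tail;
    if (victim)
        unlink_entry(victim);
    pthread_spin_unlock(&lru_lock);
    return victim;
}

Both the hit path and the eviction path touch only a constant number of pointers while the spinlock is held, which is what makes the response time predictable.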
On Fri, Jan 2, 2009 at 17:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> A couple of other possibilities that seem a bit saner:
>
> 1. Use a self-organizing list: any time an entry is referenced, move it
> to the front, and when you need a new entry take the oldest one off the
> back.  I don't see a way to do that without a global lock that protects
> the list links, but there could be a spinlock that's held only long
> enough to manipulate the list links.
>
> 2. Use a clock sweep algorithm similar to bufmgr's.
>
> Either of these trades off accuracy of deciding which existing cache
> entries are "least interesting" in order to reduce the maintenance
> overhead --- but it doesn't appear to me that the code implements usage
> counts in a way that would justify treating them as sacrosanct
> indicators of relative usefulness anyhow.
>
> The first option seems attractively simple and predictable in
> performance --- all operations are O(1).

It seems to me a linear list would make the "common" case --- where the query is already in the list but we need to update the stats --- slow.  Or am I just thinking too abstractly, and the list is not a pg_list.h list but just a C array indexed by a simple hash?  (Or I guess we could hash and then use list_nth_cell()... but that *seems* slow.)
"Alex Hunsaker" <badalex@gmail.com> writes: > Its seems to me a linear list would make the "common" case where the > query is already in the list but we need to update the stats slow. No, the hashtable is still there for lookups. The list would be a means of determining which hashtable entry to release when we're out of space. regards, tom lane
On Fri, Jan 2, 2009 at 18:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Alex Hunsaker" <badalex@gmail.com> writes:
>> It seems to me a linear list would make the "common" case --- where the
>> query is already in the list but we need to update the stats --- slow.
>
> No, the hashtable is still there for lookups.  The list would be a means
> of determining which hashtable entry to release when we're out of space.

Ahh, OK.  Well, #1 seems easier to do, and it seems like it would even be faster than a clock sweep...