Re: review - pg_stat_statements - Mailing list pgsql-hackers
From | Peter Geoghegan
Subject | Re: review - pg_stat_statements
Date |
Msg-id | CAM3SWZTaZGZgAXNjDfwewdTKHagC-GDjSmpNxyyEOd17_qzyNQ@mail.gmail.com
In response to | review - pg_stat_statements (Pavel Stehule <pavel.stehule@gmail.com>)
Responses | Re: review - pg_stat_statements
List | pgsql-hackers
On Sun, Nov 24, 2013 at 1:18 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I got a compilation warning:

I'll look into it.

> * I tried do some basic benchmark, and I didn't see any negative on
> performance related to implemented feature

You're never going to see any performance impact with something like a regular pgbench workload. That's because query texts are only ever written out while a shared lock is held. With the extreme exception of a garbage collection, the exclusive-lock paths will never be exercised by what you're doing; you will only block whole new entries from being added - existing stat counters are just protected by a shared lock + spinlock.

You might consider rigging pg_stat_statements to create a new hash value randomly (a random integer standing in for the queryid "hash") maybe 1% - 10% of the time. That would simulate the cache filling up quickly, I guess, where the shared lock conflicts with the exclusive lock, potentially showing where what I've done can hurt. In the interest of fairness, I recommend not letting it go so far as to put the cache under continual pressure.

Now, this isn't that important a case, because having a larger hash table makes exclusive locking for new entries less necessary, and this work enables people to have larger hash tables. But it does matter to some degree.

> * We would to this patch - a idea of proposal is right - a shared memory can
> be used better than storage of possibly extra long queries

Right. Plus the amount of storage used is pretty modest, even compared to the previous shared memory use. Each entry doesn't need track_activity_query_size bytes of storage, only what it actually needs (though garbage can accrue, which is a concern).

> Peter does some warning about performance in feature proposal
> http://www.postgresql.org/message-id/CAM3SWZRYYnfwXi3r3eDAKWBOYtaf1_PwGJxTAyddNSbjAfDzjA@mail.gmail.com

I'm mostly talking about the cost of the shared lock for *reading* here, when pg_stat_statements() is called.
If that was happening at the same time as I/O for reading the query texts from file, that could be a concern. Not enough of a concern to worry about humans doing this, I think, but maybe for scripts.

Maybe the trick would be to copy the stats out of shared memory, and only then read the texts from disk (with the shared lock released). We'd have to be concerned about a garbage collection occurring in the meantime, so we'd need to re-acquire a shared lock (plus a spinlock) to check that that didn't happen (which is generally very unlikely). Only when we know it didn't happen could we display costs to the user, and that might mean keeping all the query texts in memory, which could matter in extreme situations. The reason I haven't done all this already is that it's far from clear that it's worth the trouble.

> Peter mentioned a race conditions under high load. It is fixed?

Yes, the version you mentioned has the fix.

--
Peter Geoghegan