Re: review - pg_stat_statements - Mailing list pgsql-hackers
From: Pavel Stehule
Subject: Re: review - pg_stat_statements
Msg-id: CAFj8pRD=RfrzwsbgSr0oqY4K-3CougZywg9JKFOd9qkdHxxMBw@mail.gmail.com
In response to: Re: review - pg_stat_statements (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
2013/11/24 Peter Geoghegan <pg@heroku.com>
On Sun, Nov 24, 2013 at 1:18 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I got a compilation warning:

I'll look into it.

> * I tried to do some basic benchmarking, and I didn't see any negative
> effect on performance related to the implemented feature

You're never going to see any performance impact with something like a
regular pgbench workload. That's because you'll only ever write query
texts out while a shared lock is held. With only extreme exceptions (a
garbage collection), exclusive locks will never be taken by what you're
doing; you will only block whole new entries from being added -
existing stat counters are just protected by a shared lock plus a
per-entry spinlock.
You might consider rigging pg_stat_statements to create a new hash
value randomly (consisting of a random integer for a queryid "hash")
maybe 1% - 10% of the time. That would simulate the cache being filled
quickly, I guess, where that shared lock will conflict with the
exclusive lock, potentially showing where what I've done can hurt. I
recommend in the interest of fairness not letting it get so far as to
put the cache under continual pressure.
Now, this isn't that important a case, because having a larger hash
table makes exclusive locking/new entries less necessary, and this
work enables people to have larger hash tables. But it does matter to
some degree.
How is the size of the hash table related to exclusive locks in pgss_store? I am not worried about the performance of pg_stat_statements(); I am worried about the performance of creating a new entry and appending it to the file.

I didn't expect any slowdown - the benchmark was "only" a verification that there is no hidden cost. It is not difficult to write a synthetic benchmark that will find some differences, but the result of such a benchmark would probably be useless because of its minimal relation to reality.
> * We want this patch - the idea of the proposal is right - shared memory
> can be used better than for storage of possibly extra-long queries

Right. Plus the amount of storage used is pretty modest, even compared
to previous shared memory use. Each entry doesn't need to have
track_activity_query_size bytes of storage, only what it needs (though
garbage can accrue, which is a concern).

> Peter gave some warnings about performance in the feature proposal
> http://www.postgresql.org/message-id/CAM3SWZRYYnfwXi3r3eDAKWBOYtaf1_PwGJxTAyddNSbjAfDzjA@mail.gmail.com

I'm mostly talking about the cost of the shared lock for *reading*
here, when pg_stat_statements() is called. If that was happening at
the same time as I/O for reading the query texts from file, that could
be a concern. Not enough of a concern to worry about humans doing
this, I think, but maybe for scripts.
Maybe the trick would be to copy the stats into shared memory, and
only then read texts from disk (with the shared lock released). We'd
have to be concerned about a garbage collection occurring, so we'd
need to re-acquire a shared lock again (plus a spinlock) to check that
didn't happen (which is generally very unlikely). Only when we know it
didn't happen could we display costs to the user, and that might mean
keeping all the query texts in memory, and that might matter in
extreme situations. The reason I haven't done all this already is
because it's far from clear that it's worth the trouble.
I don't expect pg_stat_statements to be on an application's critical path, so I mostly prefer a simple design.

> Peter mentioned race conditions under high load. Are they fixed?

Yes, the version you mentioned had the fix.
ok
Regards
Pavel
--
Peter Geoghegan