Re: review - pg_stat_statements - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: review - pg_stat_statements
Date
Msg-id CAFj8pRD=RfrzwsbgSr0oqY4K-3CougZywg9JKFOd9qkdHxxMBw@mail.gmail.com
In response to Re: review - pg_stat_statements  (Peter Geoghegan <pg@heroku.com>)
Responses Re: review - pg_stat_statements
List pgsql-hackers



2013/11/24 Peter Geoghegan <pg@heroku.com>
On Sun, Nov 24, 2013 at 1:18 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I got a compilation warning:

I'll look into it.

> * I tried to do some basic benchmarking, and I didn't see any negative
> performance impact related to the implemented feature

You're never going to see any performance impact with something like a
regular pgbench workload. That's because you'll only ever write query
texts out while a shared lock is held. With only extreme exceptions (a
garbage collection), exclusive locking will never be exercised by what
you're doing; you will only block whole new entries from being added.
Existing stat counters are protected just by a shared lock plus a spinlock.

You might consider rigging pg_stat_statements to create a new hash
value randomly (consisting of a random integer for a queryid "hash")
maybe 1% - 10% of the time. That would simulate the cache being filled
quickly, I guess, where that shared lock will conflict with the
exclusive lock, potentially showing where what I've done can hurt. I
recommend in the interest of fairness not letting it get so far as to
put the cache under continual pressure.
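The rigging Peter suggests might look something like the following. The helper name and the probability mechanism are entirely hypothetical (the suggestion is to hack the extension for benchmarking only); the point is just that substituting a random queryid for a fraction of executions makes those executions look like brand-new statements, forcing the exclusive-lock path:

```c
/* Hypothetical benchmarking hack, not real pg_stat_statements code:
 * with probability p, replace the real queryid with a random one so the
 * execution misses the hash table and takes the new-entry path. */
#include <stdint.h>
#include <stdlib.h>

static uint64_t maybe_randomize_queryid(uint64_t real_queryid, double p)
{
    if ((double) rand() / RAND_MAX < p)
        return ((uint64_t) rand() << 32) | (uint64_t) rand();
    return real_queryid;
}
```

Calling this with p between 0.01 and 0.10 in the hashing path would simulate the continual stream of cache misses described above.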

Now, this isn't that important a case, because having a larger hash
table makes exclusive locking/new entries less necessary, and this
work enables people to have larger hash tables. But it does matter to
some degree.


How is the size of the hash table related to exclusive locks in pgss_store? I'm not worried about the performance of pg_stat_statements(); I'm worried about the performance of creating a new entry and appending to the file.

I didn't expect any slowdown; the benchmark was "only" a verification against some hidden cost. It is not difficult to write a synthetic benchmark that will find some differences, but the result of such a benchmark would probably be useless, due to its minimal relation to reality.
 
> * We would like this patch - the idea of the proposal is right - shared
> memory can be used for something better than storing possibly extra-long queries

Right. Plus the amount of storage used is pretty modest, even compared
to previous shared memory use. Each entry doesn't need to have
track_activity_query_size bytes of storage, only what it needs (though
garbage can accrue, which is a concern).
I'm mostly talking about the cost of the shared lock for *reading*
here, when pg_stat_statements() is called. If that was happening at
the same time as I/O for reading the query texts from file, that could
be a concern. Not enough of a concern to worry about humans doing
this, I think, but maybe for scripts.

Maybe the trick would be to copy the stats into shared memory, and
only then read texts from disk (with the shared lock released). We'd
have to be concerned about a garbage collection occurring, so we'd
need to re-acquire a shared lock again (plus a spinlock) to check that
didn't happen (which is generally very unlikely). Only when we know it
didn't happen could we display costs to the user, and that might mean
keeping all the query texts in memory, and that might matter in
extreme situations. The reason I haven't done all this already is
because it's far from clear that it's worth the trouble.
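The optimistic scheme sketched in this paragraph could be structured roughly as follows. This is an assumed design, not code from the patch; the `gc_count` generation counter and the flag that simulates a concurrent garbage collection are invented here purely to make the retry logic concrete:

```c
/* Sketch of the optimistic-read scheme discussed above (assumed design,
 * not actual pg_stat_statements code): snapshot counters under the shared
 * lock, release it while reading query texts from disk, then check a
 * garbage-collection generation counter to detect whether the text file
 * was rewritten in the meantime. */
static int gc_count = 0;    /* bumped by every garbage collection */

static int read_stats_optimistically(int gc_happens_during_read)
{
    /* 1. Shared lock held: copy stat counters into local memory and
     *    remember the current gc generation. */
    int observed_gc = gc_count;

    /* 2. Shared lock released: read query texts from the external file.
     *    The flag simulates a garbage collection sneaking in here. */
    if (gc_happens_during_read)
        gc_count++;

    /* 3. Shared lock (plus spinlock) re-acquired: if a gc happened, the
     *    texts we read may not match the counters, so the caller must
     *    retry rather than display them. */
    if (gc_count != observed_gc)
        return 0;
    return 1;
}
```

Since a garbage collection during any given read is very unlikely, the retry path would almost never be taken; the cost is mainly the bookkeeping, which is part of why it may not be worth the trouble.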


I don't expect pg_stat_statements to be on an application's critical path, so I mostly prefer a simple design.
 
> * Peter mentioned a race condition under high load. Is it fixed?

Yes, the version you mentioned had the fix.


ok

Regards

Pavel
 
--
Peter Geoghegan
