Re: review - pg_stat_statements - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: review - pg_stat_statements
Msg-id CAM3SWZTaZGZgAXNjDfwewdTKHagC-GDjSmpNxyyEOd17_qzyNQ@mail.gmail.com
In response to review - pg_stat_statements  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: review - pg_stat_statements
List pgsql-hackers
On Sun, Nov 24, 2013 at 1:18 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I got a compilation warning:

I'll look into it.

> * I tried to do some basic benchmarking, and I didn't see any negative
> performance impact related to the implemented feature

You're never going to see any performance impact with something like a
regular pgbench workload. That's because query texts are only written
out while a shared lock is held. Apart from extreme exceptions (a
garbage collection), the exclusive lock will never be exercised by what
you're doing; it would only block whole new entries from being added -
existing stat counters are protected by just a shared lock + a
per-entry spinlock.

You might consider rigging pg_stat_statements to generate a random
queryid "hash" (just a random integer) maybe 1% - 10% of the time. That
would simulate the cache being filled quickly, I guess, so that the
shared lock conflicts with the exclusive lock, potentially showing
where what I've done can hurt. In the interest of fairness, I'd
recommend not letting it get so far as to put the cache under continual
pressure.

Now, this isn't that important a case, because having a larger hash
table makes exclusive locking/new entries less necessary, and this
work enables people to have larger hash tables. But it does matter to
some degree.

> * We want this patch - the idea of the proposal is right - shared memory
> can be used better than as storage for possibly extra-long queries

Right. Plus the amount of storage used is pretty modest, even compared
to previous shared memory use. Each entry doesn't need to have
track_activity_query_size bytes of storage, only what it needs (though
garbage can accrue, which is a concern).

> Peter gave some warnings about performance in the feature proposal
> http://www.postgresql.org/message-id/CAM3SWZRYYnfwXi3r3eDAKWBOYtaf1_PwGJxTAyddNSbjAfDzjA@mail.gmail.com

I'm mostly talking about the cost of the shared lock for *reading*
here, when pg_stat_statements() is called. If that was happening at
the same time as I/O for reading the query texts from file, that could
be a concern. Not enough of a concern to worry about humans doing
this, I think, but maybe for scripts.

Maybe the trick would be to copy the stats into local memory, and only
then read the texts from disk (with the shared lock released). We'd
have to be concerned about a garbage collection occurring in the
meantime, so we'd need to re-acquire the shared lock (plus a spinlock)
to check that that didn't happen (which is generally very unlikely).
Only once we know it didn't happen could we display costs to the user,
and that might mean keeping all the query texts in memory, which could
matter in extreme situations. The reason I haven't done all this
already is that it's far from clear it's worth the trouble.

> Peter mentioned a race condition under high load. Has it been fixed?

Yes, the version you mentioned had the fix.

-- 
Peter Geoghegan


