Re: Scaling shared buffer eviction - Mailing list pgsql-hackers

From Gregory Smith
Subject Re: Scaling shared buffer eviction
Msg-id 54224242.5010906@gmail.com
In response to Re: Scaling shared buffer eviction  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 9/23/14, 7:13 PM, Robert Haas wrote:
> I think we expose far too little information in our system views. Just 
> to take one example, we expose no useful information about lwlock 
> acquire or release, but a lot of real-world performance problems are 
> caused by lwlock contention.
I sent over a proposal for what I was calling Performance Events about a 
year ago.  The idea was to provide a place to save data about lock 
contention, weird checkpoint sync events, that sort of thing.  Replacing 
log parsing to get at log_lock_waits data was my top priority.  Once 
that's there, lwlocks were an obvious next target.  Presumably we just 
needed collection to be low enough overhead, and then we could go down 
to whatever shorter locks we wanted; the lower the overhead, the faster 
the events we can measure.

The database will never be able to instrument some of its 
fastest events without blowing away the event itself.  We'll still have 
perf / dtrace / systemtap / etc. for those jobs.  But those are not the 
problems of the average Postgres DBA's typical day.

The data people need to solve this sort of thing in production can't 
always show up in counters.  You'll get evidence the problem is there, 
but you need more details to actually find the culprit.  Some info about 
the type of lock, tables and processes involved, maybe the query that's 
running, that sort of thing.  You can kind of half-ass the job if you 
make per-table counters for everything, but we really need more, both to 
serve our users and to compare well against the tooling other databases 
provide.  That's why I was trying to get the infrastructure to capture 
all that lock detail, without going through the existing logging system 
first.
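
To make that concrete, here's a rough sketch of the kind of record I was 
imagining.  Every name in it is hypothetical, not something lifted from 
the old patch:

typedef struct PerfEvent
{
    TimestampTz event_time;     /* when the wait began */
    int         pid;            /* backend that stalled */
    uint16      event_class;    /* heavyweight lock, lwlock, sync, ... */
    uint16      lock_mode;      /* lock mode requested, if applicable */
    Oid         database_id;    /* database involved */
    Oid         relation_id;    /* table involved, if any */
    int64       wait_usecs;     /* how long the wait lasted */
    /* plus a query identifier, the way pg_stat_statements keeps one */
} PerfEvent;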

Actually building Performance Events fell apart on the storage side:  
figuring out where to put it all without waiting for a log file to hit 
disk.  I wanted in-memory storage so clients don't wait for anything, 
then a potentially lossy persistence writer.  I thought I could get away 
with a fixed-size buffer like pg_stat_statements uses.  That was 
optimistic.  Trying to do better got me lost in memory management land 
without making much progress.
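
The shape I keep circling back to is a plain ring in shared memory that 
new events can overwrite when the persistence writer falls behind.  A 
minimal sketch, reusing the hypothetical PerfEvent from above and 
hand-waving past the atomics a real version needs:

typedef struct PerfEventRing
{
    uint64      write_pos;      /* next slot to fill; only ever advances */
    uint32      num_slots;      /* fixed at startup */
    PerfEvent   events[FLEXIBLE_ARRAY_MEMBER];
} PerfEventRing;

/*
 * Backend side: drop the event in and keep going.  If the persistence
 * writer has fallen behind, the oldest entries simply get overwritten;
 * lossy by design, so clients never wait on anything.
 */
static void
perf_event_record(PerfEventRing *ring, const PerfEvent *ev)
{
    uint64      pos = ring->write_pos++;    /* would be an atomic fetch-add */

    ring->events[pos % ring->num_slots] = *ev;
}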

I think the work you've now done on dynamic shared memory provides the 
right sort of infrastructure for me to pull this off now.  I even have 
funding to work on it again, and it's actually the #2 thing I'd like to 
take on as I get energy for new feature development.  (#1 is the simple 
but time-consuming job of adding block write counters, the lack of which 
is just killing me on some fast-growing installs.)
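
To be specific about the dsm part: the calls that are already there look 
like enough to carve out that ring once and hand it to every backend.  
Roughly this, using the 9.4 dsm API and the hypothetical PerfEventRing 
from above:

/* In the coordinating process: create the segment, lay out the ring. */
int         num_slots = 65536;  /* made-up sizing */
Size        size = offsetof(PerfEventRing, events) +
                   (Size) num_slots * sizeof(PerfEvent);
dsm_segment *seg = dsm_create(size);
PerfEventRing *ring = (PerfEventRing *) dsm_segment_address(seg);

ring->write_pos = 0;
ring->num_slots = num_slots;

/* Backends then dsm_attach(dsm_segment_handle(seg)) to reach the same
 * ring, and the persistence writer drains it at its leisure. */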

I have a lot of unread messages on this list to sort through right now.  
I know I saw someone try to revive the idea of saving new sorts of 
performance log data recently; I can't seem to find it again right now.  
That didn't seem to go any farther than thinking about the 
specifications, though.  Last time I jumped right over that stage and 
instead hit a wall on the one hard part of the implementation: 
low-overhead memory management for saving everything.

-- 
Greg Smith greg.smith@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/


