Re: Limit of bgwriter_lru_maxpages of max. 1000? - Mailing list pgsql-general
From | Gerhard Wiesinger |
---|---|
Subject | Re: Limit of bgwriter_lru_maxpages of max. 1000? |
Date | |
Msg-id | alpine.LFD.2.00.0910051928460.15050@bbs.intern Whole thread Raw |
In response to | Re: Limit of bgwriter_lru_maxpages of max. 1000? (Greg Smith <gsmith@gregsmith.com>) |
Responses |
Re: Limit of bgwriter_lru_maxpages of max. 1000?
|
List | pgsql-general |
On Mon, 5 Oct 2009, Greg Smith wrote: > On Sun, 4 Oct 2009, Gerhard Wiesinger wrote: > >>> On Fri, 2 Oct 2009, Scott Marlowe wrote: >>> >>>> I found that lowering checkpoint completion target was what helped. >>>> Does that seem counter-intuitive to you? >>> >> >> I set it to 0.0 now. > > If you set that to 0.0, the whole checkpoing spreading logic doesn't apply > like it's supposed to. I'm not sure what the results you posted mean now. If > you had it set to 0 and saw a bad spike (which is how I read your message), > I'd say "yes, that's what happens when you do reduce that parameter, so don't > do that". If you meant something else please clarify. I think the problem is, that it is done on checkpoint time (whether spread or not). I should have been already be done by bgwriter. > > Thanks for the dtrace example, I suggested we add those checkpoint probes in > there and someone did, but I hadn't seen anybody use them for anything yet. > I think more probes (e.g. on different writing conditions like writing from bgwriter or on a checkpoint) would be interesting here. >> Bug1: usage_count is IHMO not consistent > > It's a bit hack-ish, but the changes made to support multiple buffer use > strategies introduced by the "Make large sequential scans and VACUUMs work in > a limited-size ring" commit are reasonable even if they're not as consistent > as we'd like. Those changes were supported by benchmarks proving their > utility, which always trump theoretical "that shouldn't work better!" claims > when profiling performance. > > Also, they make sense to me, but I've spent a lot of time staring at > pg_buffercache output to get a feel for what shows up in there under various > circumstances. That's where I'd suggest you go if this doesn't seem right to > you; run some real database tests and use pg_buffercache to see what's inside > the cache when you're done. What's in there and what I expected to be in > there weren't always the same thing, and it's interesting to note how that > changes as shared_buffers increases. I consider some time studying that a > pre-requisite to analyzing performance of this code. > I have analyzed pg_buffercache (query every second, see below) in parallel to see what happens but I didn't see expected results in some ways with the usage_counts. Therefore I analyzed the code and found IHMO the problem with the usage_count and buffer reallocation. Since the code change is also new (I think it way 05/2009) it might be that you tested before ... BTW: Is it possible to get everything in pg_class over all databases as admin? >> Bug2: Double iteration of buffers >> As you can seen in the calling tree below there is double iteration with >> buffers involved. This might be a major performance bottleneck. > > Hmmm, this might be a real bug causing scans through the buffer cache to go > twice as fast as intended. That's not twice O(2*n)=O(n) that's a factor n*n (outer and inner loop iteration) which means overall is O(n^2) which is IHMO too much. > Since the part you suggest is doubled isn't very > intensive or called all that often, there's no way it can be a major issue > though. It is a major issue since it is O(n^2) and not O(n). E.g. with 2GB share buffer we have 262144 blocks and 68719476736 calls which is far too much. > That's based on knowing what the code does and how much it was > called, as well as some confidence that if it were really a *major* problem, > it would have shown up on the extensive benchmarks done on all the code paths > you're investigating. > The problem might be hidden for the following reasons: 1.) Buffers values are too low that even n^2 is low for today's machines 2.) Code is not often called in that way 3.) backend writes out pages so that the code is never executed 4.) ... >> BTW: Are there some tests available how fast a buffer cache hit is and a >> disk cache hit is (not in the buffer cache but in the disk cache)? I'll >> asked, because a lot of locking is involved in the code. > > I did some once but didn't find anything particularly interesting about the > results. Since you seem to be on a research tear here, it would be helpful > to have a script to test that out available, I wasn't able to release mine > and something dtrace based would probably be better than the approach I used > (I threw a bunch of gettimeofdata calls into the logs and post-processed them > with a script). > Do you have an where one should set tracepoints inside and outside PostgreSQL? >> BTW2: Oracle buffercache and background writer strategy is also >> interesting. > > As a rule, we don't post links to other database implementation details here, > as those can have patented design details we'd prefer not to intentionally > re-implement. Much of Oracle's design here doesn't apply here anyway, as it > was done in the era when all of their writes were synchronous. That required > them to worry about doing a good job on some things in their background > writer that we shrug off and let os writes combined with fsync handle > instead. > Ok, no problem. Ciao, Gerhard -- http://www.wiesinger.com/ SELECT CASE WHEN datname IS NULL THEN pg_buffercache.reldatabase::text ELSE datname END AS database, CASE WHEN spcname IS NULL THEN pg_buffercache.reltablespace::text ELSE spcname END AS tablespace, CASE WHEN relname IS NULL THEN pg_buffercache.relfilenode::text ELSE relname END AS relation, CASE WHEN relkind IS NULL THEN pg_buffercache.relfilenode::text ELSE relkind END AS relkind, usagecount > 0 AS usagecount_gt_0, isdirty, MIN(relblocknumber) AS min_blocknumber, MAX(relblocknumber) AS max_blocknumber, ROUND(AVG(relblocknumber),2) AS avg_blocknumber, ROUND(STDDEV(relblocknumber),2) AS stddev_blocknumber, COUNT(*) AS count FROM pg_buffercache LEFT JOIN pg_class ON pg_buffercache.relfilenode = pg_class.oid LEFT JOIN pg_tablespace ON pg_buffercache.reltablespace = pg_tablespace.oid LEFT JOIN pg_database ON pg_buffercache.reldatabase = pg_database.oid WHERE isdirty = true GROUP BY CASE WHEN datname IS NULL THEN pg_buffercache.reldatabase::text ELSE datname END, CASE WHEN spcname IS NULL THEN pg_buffercache.reltablespace::text ELSE spcname END, CASE WHEN relkind IS NULL THEN pg_buffercache.relfilenode::text ELSE relkind END, CASE WHEN relname IS NULL THEN pg_buffercache.relfilenode::text ELSE relname END, usagecount > 0, isdirty --HAVING relkind = 'r' ORDER BY database, tablespace, relkind DESC, relation, usagecount > 0, isdirty ;
pgsql-general by date: