Re: Advice: Where could I be of help? - Mailing list pgsql-hackers

From Curtis Faith
Subject Re: Advice: Where could I be of help?
Msg-id DMEEJMCDOJAKPPFACMPMIECECEAA.curtis@galtair.com
In response to Re: Advice: Where could I be of help?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I wrote:

> > My modification was to use access counts to increase the durability of
> > the more accessed blocks.
>

Tom Lane replies:
> You could do it that way too, but I'm unsure whether the extra
> complexity will buy anything.  Ultimately, I think an LRU-anything
> algorithm is equivalent to a clock sweep for those pages that only get
> touched once per some-long-interval: the single-touch guys get recycled
> in order of last use, which seems just like a clock sweep around the
> cache.  The guys with some amount of preference get excluded from the
> once-around sweep.  To determine whether LRU-2 is better or worse than
> some other preference algorithm requires a finer grain of analysis than
> this.  I'm not a fan of "more complex must be better", so I'd want to see
> why it's better before buying into it ...

I'm definitely not a fan of "more complex must be better" either. In fact,
it's surprising how often the real performance problems are simple and easy
to fix, while many person-years are spent solving the issue everyone
"knows" must be causing the performance problems, only to find little gain.

The key here is empirical testing. If the cache hit ratio for LRU-2 is
much better, then there may be no need for my change. OTOH, it took
fewer than 30 lines or so of code to do what I described, so I don't consider
it too, too "more complex" :=} We should run a test that runs against
indexes (or is indices the PostgreSQL convention?) that are three
or more times the size of the cache to see how well LRU-2 works. Is there
any cache performance reporting built into pgsql?
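
To make the idea concrete, here is a minimal sketch of an LRU list in which access counts buy a block extra time in the cache, along with hit-ratio counters for the kind of test proposed above. It is not the 30 lines mentioned in this message: the class name `CountedLRU`, the halving rule in `_evict`, and the counters are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class CountedLRU:
    """LRU cache of block numbers where access counts delay eviction.

    Assumption: the eviction rule below is illustrative only; it is not
    the modification described in the message.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block number -> access count, oldest first
        self.hits = 0
        self.misses = 0

    def access(self, block):
        """Touch a block; return True on a cache hit, False on a miss."""
        if block in self.blocks:
            self.hits += 1
            self.blocks[block] += 1
            self.blocks.move_to_end(block)  # most recently used goes last
            return True
        self.misses += 1
        if len(self.blocks) >= self.capacity:
            self._evict()
        self.blocks[block] = 1
        return False

    def _evict(self):
        # Walk from the least recently used end. A block touched more than
        # once has its count halved and gets another trip through the list;
        # the first block with a count of 1 is evicted.
        while True:
            victim, count = next(iter(self.blocks.items()))
            if count <= 1:
                del self.blocks[victim]
                return
            self.blocks[victim] = count // 2
            self.blocks.move_to_end(victim)

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = CountedLRU(capacity=2)
for block in (1, 1, 2, 3):
    cache.access(block)
# Block 1 was touched twice, so it outlives block 2 when block 3 arrives.
```

Feeding the same access trace to this class and to a plain LRU (or an LRU-2) and comparing `hit_ratio()` is one way to run the empirical comparison, using a trace that cycles through an index several times larger than `capacity`.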

Tom Lane wrote:
> Shouldn't the OS be responsible for scheduling those writes
> appropriately?  Ye good olde elevator algorithm ought to handle this;
> and it's at least one layer closer to the actual disk layout than we
> are, thus more likely to issue the writes in a good order.  It's worth
> experimenting with, perhaps, but I'm pretty dubious about it.

I wasn't proposing anything other than changing the order of the writes,
not actually ensuring that they get written that way at the level you
describe above. This will help a lot on brain-dead file systems that
can't do this ordering and probably also in cases where the number
of blocks in the cache is very large.
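
As a rough sketch of what "changing the order of the writes" could look like, the snippet below sorts a set of dirty buffers by file and block number before they would be handed to the OS. The `DirtyBuffer` record and its field names are hypothetical and do not correspond to PostgreSQL's buffer manager structures.

```python
from collections import namedtuple

# Hypothetical record for a dirty buffer; the field names are assumptions.
DirtyBuffer = namedtuple("DirtyBuffer", ["file_id", "block_no", "data"])


def order_for_write(dirty):
    """Return dirty buffers sorted by file, then by block number."""
    return sorted(dirty, key=lambda buf: (buf.file_id, buf.block_no))


dirty = [
    DirtyBuffer(2, 17, b"c"),
    DirtyBuffer(1, 40, b"b"),
    DirtyBuffer(1, 3, b"a"),
]
ordered = order_for_write(dirty)
# Writes would now be issued file by file, in ascending block order.
```

This only fixes the order in which writes are issued; whether they reach the disk in that order is still up to the OS and the file system.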

On a related note, while looking at the code, it seems to me that we
are writing out the buffer cache synchronously, so there won't be
any possibility of the file system reordering anyway. This appears to be
a huge performance problem. I've read claims in the archives that
the buffers are written asynchronously, but my read of the
code says otherwise. Can someone point out my error?

I only see calls that ultimately call FileWrite or write(2), which will
block without an O_NONBLOCK open. I thought one of the main reasons
for having a WAL is so that you can write out the buffers asynchronously.

What am I missing?

I wrote:
> > Then during execution if the planner turned out to be VERY wrong about
> > certain assumptions the execution system could update the stats that led
> > to those wrong assumptions. That way the system would seek the correct
> > values automatically.

Tom Lane replied:
> That has been suggested before, but I'm unsure how to make it work.
> There are a lot of parameters involved in any planning decision and it's
> not obvious which ones to tweak, or in which direction, if the plan
> turns out to be bad.  But if you can come up with some ideas, go to
> it!

I'll have to look at the current planner before I can suggest
anything concrete.

- Curtis


