Re: Load distributed checkpoint - Mailing list pgsql-hackers

From Takayuki Tsunakawa
Subject Re: Load distributed checkpoint
Date
Msg-id 01eb01c724d7$ffd47230$19527c0a@OPERAO
Whole thread Raw
In response to Re: Load distributed checkpoint  (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
Responses Re: Load distributed checkpoint
Re: Load distributed checkpoint
List pgsql-hackers
From: "ITAGAKI Takahiro" <itagaki.takahiro@oss.ntt.co.jp>
> You were running the test on a very memory-dependent machine.
>> shared_buffers = 4GB / The scaling factor is 50, 800MB of data.
> That would be why the patch did not work. I tested it with DBT-2,
> 10GB of data and 2GB of memory. Storage is always the main part of
> performance there, even outside of checkpoints.

Yes, I used half the size of RAM as the shared buffers, which is
reasonable.  And all the data was cached.  That makes the effect of
fsync() an even heavier offence, doesn't it?  System administrators
would say "I have enough memory.  The data hasn't exhausted the DB
cache yet.  But the users complain to me about the response time.
Why?  What should I do?  What?  Checkpoint??  Why doesn't PostgreSQL
take care of frontend users?"

BTW, is DBT-2 an OLTP benchmark which randomly accesses some parts of
the data, or a batch application which accesses all of it?  I'm not
familiar with it.  I know that IPA makes it available to the public.

> If you use Linux, it has very unpleasant behavior in fsync(): it
> locks all metadata of the file being fsync-ed. We have to wait for
> the completion of fsync when we do read(), write(), and even lseek().
> Almost all of your data is in the accounts table, and it is stored
> in a single file. All transactions must wait for the fsync to that
> single largest file, so you saw the bottleneck in fsync.

Oh, really, what an evil fsync is!  Yes, I sometimes saw a backend
waiting for lseek() to complete when it committed.  But why does the
backend which is syncing WAL/pg_control have to wait for the data
file to be synced?  They are, needless to say, different files, and
the WAL and data files are stored on separate disks.
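For reference, a minimal sketch (not PostgreSQL source; file names are illustrative) of the syscalls involved at commit versus checkpoint time.  At the interface level, the WAL and data files are independent descriptors, so an fdatasync() of the WAL should not have to wait for an fsync() of a data file; the stall described above comes from kernel-internal per-inode locking on the Linux kernels of the day, which is invisible from user space:

```python
# Stand-ins for a WAL segment and a heap data file (names are made up).
import os

wal_fd = os.open("wal.tmp", os.O_CREAT | os.O_WRONLY | os.O_TRUNC)
data_fd = os.open("data.tmp", os.O_CREAT | os.O_WRONLY | os.O_TRUNC)

os.write(wal_fd, b"commit record")
os.fdatasync(wal_fd)   # commit path: flush only the WAL

os.write(data_fd, b"heap page image")
os.fsync(data_fd)      # checkpoint path: flush the data file; the
                       # complaint above is that on Linux this call can
                       # also block concurrent read()/write()/lseek()
                       # on the same file

os.close(wal_fd)
os.close(data_fd)
os.remove("wal.tmp")
os.remove("data.tmp")
```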


>> [Conclusion]
>> I believe that the problem cannot be solved in a real sense by
>> avoiding fsync/fdatasync().
>
> I think so, too. However, I assume we can resolve a part of the
> checkpoint spikes with smoothing of write() alone.

First, what's the goal (numerically, if possible)?  Have you explained
to the community members why the patch would help many people?  At
least, I haven't heard anyone argue that fsync() can be seriously bad
and that we should close our eyes to what fsync() does.
By the way, what good results did you get with DBT-2?  If you don't
mind, can you show them to us?


> BTW, can we use the same way for fsync? We call fsync() on all
> modified files without rest in mdsync(), but it's not difficult at
> all to insert sleeps between the fsync()s. Do you think it would
> help us? One of the issues is that we have to sleep in file units,
> which is maybe too rough a granularity.

No, it definitely won't help us.  There is no reason why it would.
It might help in some limited environments, but then how can we
characterize those environments?  Can we say "our approach helps in
our environment, but it won't help you.  The kernel VM settings may
help you.  Good luck!"?
We have to take this seriously.  I think it's time to face the
problem, and we should follow the approaches of experts like Jim Gray
and the DBMS vendors, unless we have a clever new idea like theirs.
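For concreteness, the suggestion quoted above could be sketched roughly as follows (function name, file list, and pause interval are all illustrative, not mdsync() itself): fsync each modified file, pausing between files to smooth the I/O burst.

```python
import os
import time

def smoothed_fsync(paths, pause_s=0.2):
    """fsync each file, sleeping between files to spread the I/O burst."""
    for i, path in enumerate(paths):
        fd = os.open(path, os.O_RDONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
        if i < len(paths) - 1:
            time.sleep(pause_s)   # let foreground I/O make progress
```

As the quote itself concedes, the granularity is per file: when almost all dirty data sits in one large file (the accounts table in this test), the whole burst still lands in a single fsync(), which is exactly the objection raised here.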



