Re: Load distributed checkpoint - Mailing list pgsql-hackers
From:           Takayuki Tsunakawa
Subject:        Re: Load distributed checkpoint
Date:
Msg-id:         01eb01c724d7$ffd47230$19527c0a@OPERAO
In response to: Re: Load distributed checkpoint (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
Responses:      Re: Load distributed checkpoint
                Re: Load distributed checkpoint
List:           pgsql-hackers
From: "ITAGAKI Takahiro" <itagaki.takahiro@oss.ntt.co.jp>
> You were running the test on a very memory-dependent machine.
>> shared_buffers = 4GB / The scaling factor is 50, 800MB of data.
> That would be why the patch did not work. I tested it with DBT-2, 10GB of
> data and 2GB of memory. Storage is always the main part of performance here,
> even outside of checkpoints.

Yes, I used half the size of RAM as the shared buffers, which is reasonable, and I cached all the data. Doesn't that make the effect of fsync() an even heavier offence? System administrators would say: "I have enough memory. The data hasn't exhausted the DB cache yet. But the users complain to me about the response times. Why? What should I do? What? Checkpoint?? Why doesn't PostgreSQL take care of frontend users?"

BTW, is DBT-2 an OLTP benchmark that randomly accesses parts of the data, or a batch application that accesses all of it? I'm not familiar with it. I know that IPA opens it to the public.

> If you use Linux, it has very unpleasant behavior in fsync(): it locks all
> metadata of the file being fsync-ed. We have to wait for the completion of
> fsync when we do read(), write(), and even lseek().
> Almost all of your data is in the accounts table, and it is stored in a single
> file. All transactions must wait for fsync on that single largest file,
> so you saw the bottleneck in fsync.

Oh really, what an evil thing fsync is! Yes, I sometimes saw a backend waiting for lseek() to complete when it committed. But why does the backend that is syncing WAL/pg_control have to wait for the data file to be synced? They are, needless to say, different files, and the WAL and data files are stored on separate disks.

>> [Conclusion]
>> I believe that the problem cannot be solved in a real sense by
>> avoiding fsync/fdatasync().
>
> I think so, too. However, I assume we can resolve a part of the
> checkpoint spikes with smoothing of write() alone.

First, what is the goal, numerically if possible?
Have you explained to community members why the patch would help many people? At least, I haven't heard a convincing case that fsync() can be this seriously bad and that we should close our eyes to what fsync() does. By the way, what good results did you get with DBT-2? If you don't mind, can you show us?

> BTW, can we use the same approach for fsync? We call fsync() on all modified
> files without rest in mdsync(), but it's not difficult at all to insert
> sleeps between fsync()s. Do you think that would help us? One of the issues
> is that we would have to sleep per file, which may be too rough a granularity.

No, it definitely won't help us; there is no reason why it would. It might help in some limited environments, but how can we characterize those environments? Can we say, "Our approach helps in our environments, but it won't help you. The kernel VM settings may help you. Good luck!"? We have to consider this seriously. I think it's time to face the problem, and we should follow the approaches of experts like Jim Gray and the DBMS vendors, unless we have a clever new idea like theirs.