Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance - Mailing list pgsql-hackers
From | Robert Haas
Subject | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date |
Msg-id | CA+TgmoZgL=ppfXoJbC9bQHnpJU9UrWs3Nk0DeGJAu7tsvwYYrw@mail.gmail.com
In response to | Re: Linux kernel impact on PostgreSQL performance (Claudio Freire <klaussfreire@gmail.com>)
Responses | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
 | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
List | pgsql-hackers
On Thu, Jan 16, 2014 at 7:31 PM, Dave Chinner <david@fromorbit.com> wrote:
> But there's something here that I'm not getting - you're talking
> about a data set that you want to keep cache resident that is at
> least an order of magnitude larger than the cyclic 5-15 minute WAL
> dataset that ongoing operations need to manage to avoid IO storms.
> Where do these temporary files fit into this picture, how fast do
> they grow and why do they need to be so large in comparison to
> the ongoing modifications being made to the database?

I'm not sure you've got that quite right. WAL is fsync'd very frequently - on every commit, at the very least, and multiple times per second even when there are no commits going on, just to make sure we get it all down to the platter as fast as possible. The thing that causes the I/O storm is the data file writes, which are performed either when we need to free up space in PostgreSQL's internal buffer pool (aka shared_buffers) or once per checkpoint interval (5-60 minutes) in any event. The point of this system is that if we crash, we're going to need to replay all of the WAL to recover the data files to the proper state; but we don't want to keep WAL around forever, so we checkpoint periodically. By writing all the data back to the underlying data files, checkpoints render older WAL segments irrelevant, at which point we can recycle those files before the disk fills up.

Temp files are something else again. If PostgreSQL needs to sort a small amount of data, like a kilobyte, it'll use quicksort. But if it needs to sort a large amount of data, like a terabyte, it'll use a merge sort.[1] The reason is of course that quicksort requires random access to work well; if parts of quicksort's working memory get paged out during the sort, your life sucks. Merge sort (or at least our implementation of it) is slower overall, but it only accesses the data sequentially.
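Roughly, the spill-to-disk behavior works like this - a toy Python sketch, not PostgreSQL's actual C implementation (which lives in tuplesort.c and uses Knuth's polyphase merge); here `max_in_memory` stands in for the configurable threshold, and `heapq.merge` stands in for the final tape merge:

```python
import heapq
import tempfile

def _write_run(sorted_lines):
    # One sorted run spilled to a temp file - this plays the role of a "tape".
    f = tempfile.TemporaryFile(mode='w+')
    f.writelines(sorted_lines)
    f.seek(0)
    return f

def external_sort(lines, max_in_memory=4):
    """Sort newline-terminated strings under a memory budget: plain
    in-memory sort for small inputs, spill-to-disk merge sort otherwise."""
    runs = []   # temp files, each holding one sorted run
    buf = []
    for line in lines:
        buf.append(line)
        if len(buf) >= max_in_memory:        # budget exceeded: spill a run
            runs.append(_write_run(sorted(buf)))
            buf = []
    if not runs:                             # everything fit in memory
        return sorted(buf)
    if buf:
        runs.append(_write_run(sorted(buf)))
    # Merge the runs; each temp file is read strictly sequentially, so the
    # OS can page out the parts we're not touching without a random-I/O storm.
    return list(heapq.merge(*(iter(r.readline, '') for r in runs)))
```

The point the sketch makes is the access pattern: once the runs are written, the merge phase only ever reads each file front to back.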
When we do a merge sort, we use files to simulate the tapes that Knuth had in mind when he wrote down the algorithm. If the OS runs short of memory - because the sort is really big, or just because of other memory pressure - it can page out the parts of the file we're not actively using without totally destroying performance. It'll be slow, of course, because disks always are, but not like quicksort would be if it started swapping.

I haven't actually experienced (or heard mentioned) the problem Jeff Janes is mentioning where temp files get written out to disk too aggressively; as mentioned before, the problems I've seen are usually the other way - stuff not getting written out aggressively enough. But it sounds plausible. The OS only lets you set one policy, and if you make that policy right for permanent data files that get checkpointed, it could well be wrong for temp files that get thrown out. Just stuffing the data on RAMFS will work for some installations, but might not be good if you actually do want to perform sorts whose size exceeds RAM.

BTW, I haven't heard anyone on pgsql-hackers say they'd be interested in attending Collab on behalf of the PostgreSQL community. Although the prospect of a cross-country flight is a somewhat depressing thought, it does sound pretty cool, so I'm potentially interested. I have no idea what the procedure is here for moving forward, though, especially since it sounds like there might be only one seat available and I don't know who else may wish to sit in it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] The threshold where we switch from quicksort to merge sort is a configurable parameter.