Re: Linux kernel impact on PostgreSQL performance - Mailing list pgsql-hackers
From: Jim Nasby
Subject: Re: Linux kernel impact on PostgreSQL performance
Date:
Msg-id: 52D601B2.70806@nasby.net
In response to: Re: Linux kernel impact on PostgreSQL performance (Mel Gorman <mgorman@suse.de>)
Responses: Re: Linux kernel impact on PostgreSQL performance
List: pgsql-hackers
On 1/14/14, 4:21 AM, Mel Gorman wrote:
> There is an interesting side-line here. If all IO is initiated by one
> process in postgres then the memory locality will be sub-optimal.
> The consumer of the data may or may not be running on the same
> node as the process that read the data from disk. It is possible to
> migrate this from user space but the interface is clumsy and assumes the
> data is mapped.

That's really not the case in Postgres. There are essentially 3 main areas for IO requests to come from:

- Individual "backends". These are processes forked off of our startup process (postmaster) for the purpose of serving user connections. This is always "foreground" IO and should be avoided as much as possible (but is still a large percentage).

- autovacuum. This is a set of "clean-up" processes, meant to be low impact and background only. Similar to garbage collection in GC'd languages.

- bgwriter. This process is meant to greatly reduce the need for user backends to write data out.

Generally speaking, read requests are most likely to come from user backends. autovacuum can issue them too, but it's got a throttling mechanism so generally shouldn't be that much of the workload.

Ideally most write traffic would come from bgwriter (and autovacuum, though again we don't care too much about it). In reality though, that's going to depend very highly on a user's actual workload. To start, backends normally must write all write-ahead-log (WAL) traffic before they finalize (COMMIT) a transaction for the user. COMMIT is sort of similar in idea to fsync... "When this returns I guarantee I've permanently stored your data."

The amount of WAL data generated for a transaction will vary enormously, even as a percentage of raw page data written. In some cases a very small (10s-100s of bytes) amount of WAL data will cover 1 or more base data pages (8k by default, up to 64k).
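To make the COMMIT-vs-fsync analogy concrete, here's a minimal toy sketch of the contract: append the commit record to the WAL file, then fsync it before acknowledging the commit. The file name, txid, and record layout are made up for illustration; this is not Postgres's actual WAL format or code.

```python
import os

def commit_record(wal_fd: int, rec: bytes) -> None:
    """Append a record to the WAL file, then force it to stable storage.
    Only once fsync() returns can the commit be acknowledged -- the
    "when this returns I guarantee I've permanently stored your data"
    contract described above."""
    os.write(wal_fd, rec)
    os.fsync(wal_fd)  # durability point: record is on stable storage

# Illustrative file name and record; not Postgres's WAL format.
fd = os.open("wal.log", os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
commit_record(fd, b"COMMIT txid=42\n")
os.close(fd)
```

The expensive part is of course the fsync(), which is why the point below about one backend flushing other backends' WAL matters so much for throughput.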
But to protect against torn page writes, by default we write a complete copy of a data page to WAL the first time the page is dirtied after a checkpoint. So the opposite scenario is that we actually write slightly MORE data to WAL than we do to the data pages.

What makes WAL even trickier is that bgwriter tries to write WAL data out before backends need to. In a system with a fairly low transaction rate that can work... but with a higher rate most WAL data will be written by a backend trying to issue a COMMIT. Note however that COMMIT needs to write ALL WAL data up to a given point, so one backend that only needs to write 100 bytes can easily end up flushing (and fsync'ing) megabytes of data written by some other backend.

Further complicating things is temporary storage, either in the form of user-defined temporary tables, or temporary storage needed by the database itself. It's hard to characterize these workloads other than to say that typically reading and writing to them will want to move a relatively large amount of data at once.

BTW, because Postgres doesn't have terribly sophisticated memory management, it's very common to create temporary file data that will never, ever, ever actually NEED to hit disk. Where I work, being able to tell the kernel to avoid flushing those files unless the kernel thinks it's got better things to do with that memory would be EXTREMELY valuable, because it's all temp data anyway: if the database or server crashes it's just going to get thrown away. It might be a good idea for Postgres to look at simply putting this data into plain memory and relying on the OS to swap it as needed. That'd be more problematic for temp tables, but in that case mmap might work very well, because that data is currently never shared by other processes, though if we start doing parallel query execution that will change.

--
Jim C. Nasby, Data Architect   jim@nasby.net
512.569.9461 (cell)            http://jim.nasby.net