Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra |
Date | |
Msg-id | 21594.1507409215@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
|
List | pgsql-hackers |
I wrote: > Current status is that I've filed a bug report with Apple and am waiting > to see their response before deciding what to do next. If they fix the > issue promptly then there's little need for us to do anything. Not having heard a peep from Apple yet, I decided to do a bit more experimenting. I found that indeed, issuing fewer and larger mmap/msync requests helps enormously. If you're willing to go as far as issuing only one per file, the speed seems on par with non-fsync. But that requires being able to mmap 1GB-sized files, so it's probably not something we want to do. What I did instead was to adjust the logic in copy_file() so that the unit of flush requests can be a multiple of the unit of read/write requests. (My original thought of just raising the buffer size seems like not as good an idea; it's less cache-friendly for one thing.) I find that on both Linux and macOS-with-HFS, requesting a flush only every 1MB seems to be a win compared to flushing every 64KB as we currently do. Actually it seems that on macOS, every increment of increase in the flush distance helps, but with HFS you hit diminishing returns after 1MB or so. With APFS you need a flush distance of 32MB or more to have credible performance. Accordingly I propose the attached patch. If anyone's interested in experimenting on other platforms, we might be able to refine/complicate the FLUSH_DISTANCE selection further, but I think this is probably good enough for a first cut. regards, tom lane diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c index a5e074e..eae9f5a 100644 --- a/src/backend/storage/file/copydir.c +++ b/src/backend/storage/file/copydir.c @@ -139,10 +139,24 @@ copy_file(char *fromfile, char *tofile) int dstfd; int nbytes; off_t offset; + off_t flush_offset; - /* Use palloc to ensure we get a maxaligned buffer */ + /* Size of copy buffer (read and write requests) */ #define COPY_BUF_SIZE (8 * BLCKSZ) + /* + * Size of data flush requests. It seems beneficial on most platforms to + * do this every 1MB or so. But macOS, at least with early releases of + * APFS, is really unfriendly to small mmap/msync requests, so there do it + * only every 32MB. + */ +#if defined(__darwin__) +#define FLUSH_DISTANCE (32 * 1024 * 1024) +#else +#define FLUSH_DISTANCE (1024 * 1024) +#endif + + /* Use palloc to ensure we get a maxaligned buffer */ buffer = palloc(COPY_BUF_SIZE); /* @@ -163,11 +177,23 @@ copy_file(char *fromfile, char *tofile) /* * Do the data copying. */ + flush_offset = 0; for (offset = 0;; offset += nbytes) { /* If we got a cancel signal during the copy of the file, quit */ CHECK_FOR_INTERRUPTS(); + /* + * We fsync the files later, but during the copy, flush them every so + * often to avoid spamming the cache and hopefully get the kernel to + * start writing them out before the fsync comes. + */ + if (offset - flush_offset >= FLUSH_DISTANCE) + { + pg_flush_data(dstfd, flush_offset, offset - flush_offset); + flush_offset = offset; + } + pgstat_report_wait_start(WAIT_EVENT_COPY_FILE_READ); nbytes = read(srcfd, buffer, COPY_BUF_SIZE); pgstat_report_wait_end(); @@ -190,15 +216,11 @@ copy_file(char *fromfile, char *tofile) errmsg("could not write to file \"%s\": %m", tofile))); } pgstat_report_wait_end(); - - /* - * We fsync the files later but first flush them to avoid spamming the - * cache and hopefully get the kernel to start writing them out before - * the fsync comes. - */ - pg_flush_data(dstfd, offset, nbytes); } + if (offset > flush_offset) + pg_flush_data(dstfd, flush_offset, offset - flush_offset); + if (CloseTransientFile(dstfd)) ereport(ERROR, (errcode_for_file_access(), -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
pgsql-hackers by date: