Thread: New 9.6 external sort guidance around temp_tablespaces and maintenance_work_mem
New 9.6 external sort guidance around temp_tablespaces and maintenance_work_mem
From
Peter Geoghegan
Date:
Postgres 9.6 saw the performance characteristics of sorting significantly altered. While almost every external sort will see a benefit without any change in configuration, there is a new scope for DBAs to tune the system to better take advantage of the improved implementation of external sorting. They should get guidance from us on this to make it more likely that this actually happens, though. I'd like to discuss where this guidance should be added, but first, a little background. Firstly, DBAs may get a significant, appreciable benefit from making sure that temp_tablespaces puts temp files on a fast filesystem; this would probably have been far less helpful prior to 9.6. It seems reasonable to suppose that explicitly setting temp_tablespaces does not happen all that often on production installations today, since external sorts were stalled on memory access most of the time with the old replacement selection approach. While memory stalls remain a big cost in 9.6, the situation is much improved. External sorts may be significantly I/O bound far more frequently in 9.6, especially during merging (which is also optimized in 9.6), and especially during merging for CREATE INDEX in particular. In the latter case, it's likely that the system writes index tuples and WAL out as it reads runs in, with everything on the same filesystem. Modern filesystems are really good at read-ahead and write-behind. It really is not at all unexpected for sequential I/O with fast *disks* to be faster than random *memory* access [1], and so I really doubt that there is any question that I/O will now matter more. I expect that blocking on sequential I/O will become much more of a problem when parallel sort is added, but it's still going to be a problem with serial sorts on busy production systems servicing many clients. (With only a single client, this is much less true, but that's seldom the case.) Secondly, with 9.6 it's roughly true that the more memory you can spare for maintenance_work_mem and work_mem, the better. There are some caveats that I won't rehash right now; the caveats are rather limited, and DBAs can probably just pretend that the common, simple intuition "more memory is better, until you OOM" is at last true. I don't imagine more information (the caveats) is of much practical use. A duality ========= Adding faster disks to get more sequential write bandwidth is a practical measure available to many DBAs, which is more important for sorting now. While not every user can afford to do so, it's nice that some have the *option* of upgrading to get better performance. It need not be expensive; we get plenty of benefit from cheaper SATA HDDs, since, with care, only sequential I/O performance really matters. This was not really the case in prior releases. At the same time, if you can afford the memory, you should probably just increase work_mem/maintenance_work_mem to get another performance benefit. These two benefits will often be complementary. There may a feedback loop that looks a bit like this: Quicksort is cache oblivious, which implies that we may effectively use more working memory, which implies longer runs, which implies fewer merge passes, which implies that sequential I/O performance is mostly where I/O costs are paid, which implies that you can manage I/O costs by adding (consumer grade?) SATA HDDs configured in RAID0 for temp files, which implies that sorts finish sooner, which implies that in aggregate memory use is lower, which implies that you can afford to set work_mem/maintenance_work_mem higher still, which implies ... . This may seem facile, but consider the huge difference in costs I'm focussing on. There is a huge difference between the cost of random I/O and sequential I/O (when FS cache is not able to "amortize" the random I/O), just as there is a huge difference between the cost of an L1 cache access, and main memory access (a cache miss). So, if I can be forgiven for using such a hand-wavey term, there is an interesting duality here. We should avoid "the expensive variety" of I/O (random I/O) wherever possible, and avoid "the expensive variety" of memory access (involving a last-level CPU cache miss) as much as possible. Certainly, the economics of modern hardware strongly support increasing locality of access at every level. We're talking about differences of perhaps several orders of magnitude. I/O bound sorts in earlier releases ----------------------------------- Perhaps someone has personally seen a DBA adding a new temp_tablespace on a separate, faster filesystem. Perhaps this clearly improved performance. I don't think that undermines my argument, though. This *does* sometimes happen on prior versions of Postgres, but my guess is that this is almost accidental: * Increasing work_mem/maintenance_work_mem perversely makes external sorts *slower* before 9.6. Iterative tuning of these settings on busy production systems will therefore tend to guide the DBA to decrease work_mem/maintenance_work_mem settings (maybe external sorts got so slow that OOMs happened). Whether or not the DBA realizes it, the counter-intuitive slowdown occurs because replacement selection is sensitive to CPU cache size. * As those memory settings are pushed down, runs become smaller, and more merge steps are required for any larger external sorts performed. Merge steps increase the amount of I/O, particularly when a few passes are necessary; *random* I/O suddenly spikes. Of course, this could have been avoided if work_mem/maintenance_work_mem were higher, but that's already been ruled out. Documentation ============= I think we can expand "21.6. Tablespaces" to describe the implications of these new performance characteristics. I'd like to hear opinions on how to approach that before proposing a patch, though. The basic guidance should, IMV, be: * A temp tablespace with cheaper disks that have good sequential I/O performance can speed up external sorts quite a lot. Probably not a great idea to have many temp tablespaces. Use RAID0 instead, because that performs better, and because it doesn't matter that temp files are not recoverable if a disk is faulty. * More memory for sorting and hashing is often better in PostgreSQL 9.6. Notably, the performance of hash joins that spill will tend to degrade less predictably than the performance of sorts that spill as less memory is made available. (Perhaps mention the number of external sort passes?) * Increasing work_mem/maintenance_work_mem may fail to improve performance only because sorts then become more I/O bound. When in doubt, testing is advised. A balance may need to be found, if only to avoid wasting memory. Thoughts? [1] https://queue.acm.org/detail.cfm?id=1563874 -- Peter Geoghegan
Re: New 9.6 external sort guidance around temp_tablespaces and maintenance_work_mem
From
Bruce Momjian
Date:
On Thu, Apr 21, 2016 at 08:37:54PM -0700, Peter Geoghegan wrote: > Documentation > ============= > > I think we can expand "21.6. Tablespaces" to describe the implications > of these new performance characteristics. I'd like to hear opinions on > how to approach that before proposing a patch, though. The basic > guidance should, IMV, be: > > * A temp tablespace with cheaper disks that have good sequential I/O > performance can speed up external sorts quite a lot. Probably not a > great idea to have many temp tablespaces. Use RAID0 instead, because > that performs better, and because it doesn't matter that temp files > are not recoverable if a disk is faulty. > > * More memory for sorting and hashing is often better in PostgreSQL > 9.6. Notably, the performance of hash joins that spill will tend to > degrade less predictably than the performance of sorts that spill as > less memory is made available. (Perhaps mention the number of external > sort passes?) > > * Increasing work_mem/maintenance_work_mem may fail to improve > performance only because sorts then become more I/O bound. When in > doubt, testing is advised. A balance may need to be found, if only to > avoid wasting memory. This seems very detailed. I think we need much broader coverage of how the existing GUC variables affect performance before we could cover this. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: New 9.6 external sort guidance around temp_tablespaces and maintenance_work_mem
From
Peter Geoghegan
Date:
On Sat, Apr 30, 2016 at 9:30 AM, Bruce Momjian <bruce@momjian.us> wrote: >> I think we can expand "21.6. Tablespaces" to describe the implications >> of these new performance characteristics. I'd like to hear opinions on >> how to approach that before proposing a patch, though. > This seems very detailed. I think we need much broader coverage of how > the existing GUC variables affect performance before we could cover > this. While my justification is very detailed, I think that the practical guidance itself is fairly simple. In fact, making sure that people don't miss that things have become *simpler* across the board is the primary point that I'd like to make to users. As of 9.6, external sorting finally benefits from very large amounts of memory, certainly in the multiple gigabyte range, where before performance steadily declined as work_mem increased past a fairly small amount (less than 100MB in my tests). So, DBAs had competing, even contradictory considerations: keep work_mem high enough to make most sorts internal. But if a sort must be external, make sure that the work_mem it has available is quite low. Say, less than 100MB. This set of characteristics is almost impossible to tune for. Other operations that use temporary memory bound in size by work_mem have always benefited from increasing work_mem settings in a more or less predictable, linear way, so the *general* picture about what to do becomes far clearer (we don't really *need* to talk about sorting at all). Presumably due to the old issues with tuplesort, the closest the docs get to recommending higher work_mem or maintenance_work_mem settings is: "Larger [maintenance_work_mem] settings might improve performance for vacuuming and for restoring database dumps". That's it! Since the performance characteristics of external sorting are now roughly in line with everything else, why continue to make such a weak statement in 9.6? It's not hard to understand why we originally equivocated here, but things have changed. I hardly realized that the docs are only lukewarm on the idea that increasing work_mem will ever help *anything* at all. It's easy to fail to notice that when you're an expert. We provide *zero* guidance on how to tune work_mem or maintenance_work_mem. Surely we can do better than that. A secondary point I'd like to make is that if and when no further benefit can be observed from increasing work_mem, it's well worth considering that more I/O bandwidth for temp files will help. Obviously, this assumes the DBA avoids swapping when using all that memory, for example by setting vm.swappiness appropriately on Linux, while also avoiding OOMs. At the point that no further benefit can be obtained by increasing work_mem, sequential I/O should be strongly considered as a possible bottleneck to target. That's a nice problem to have, because you can buy fairly inexpensive HDDs for temp files that will increase the point at which higher work_mem settings will no longer help. So, we can talk about this stuff without necessarily even mentioning external sorting. I didn't mention it before now, but as it happens the 2009 ACM article I linked to already (https://queue.acm.org/detail.cfm?id=1563874) mentions that they found performance fell sharply past a certain point when using Postgres for a large aggregate SQL query: """ Invoking the DBMS’s built-in EXPLAIN facility revealed the problem: while the query planner chose a reasonable hash table-based aggregation strategy for small tables, on larger tables it switched to sorting by grouping columns—a viable, if suboptimal strategy given a few million rows, but a very poor one when facing a billion. """ I think that this must be describing tuplesort's previous use of replacement selection to sort runs -- it might have been the case that an internal sort for the GroupAggregate was "viable, if suboptimal" then, but an external sort was just unacceptable. 9.6 really blurs the distinction between internal and external sorts a lot of the time. It was common to see a big external sort with ~5 runs taking no longer than 110% of the time of an equivalent internal sort when testing the 9.6 sorting stuff. I would like to help DBAs keep up the benefits when they have hundreds of gigabytes or even terabytes of data to aggregate through. If you look at the graph in that article, it isn't that hard to imagine that a well-tuned implementation could avoid that superlinear growth in query runtime, because *superlinear* growth isn't predicted by any theoretical model. The article is incorrect to state "There is no pathology here". Although, that doesn't undermine the author's argument too much; I find it rather convincing overall. The fact that a single node will eventually fall short doesn't mean it isn't important to push your single node further, to maintain an unsharded Postgres instance as the right choice for longer (by getting the full benefit of the hardware). Getting this right could allow unsharded Postgres to go a lot further (parallelism is the final part of this), which could make all the difference for some users. -- Peter Geoghegan
Re: New 9.6 external sort guidance around temp_tablespaces and maintenance_work_mem
From
Bruce Momjian
Date:
On Sat, Apr 30, 2016 at 12:19:02PM -0700, Peter Geoghegan wrote: > Presumably due to the old issues with tuplesort, the closest the docs > get to recommending higher work_mem or maintenance_work_mem settings > is: "Larger [maintenance_work_mem] settings might improve performance > for vacuuming and for restoring database dumps". That's it! Since the > performance characteristics of external sorting are now roughly in > line with everything else, why continue to make such a weak statement > in 9.6? It's not hard to understand why we originally equivocated > here, but things have changed. Yes, this needs updating. My point is that there is a whole lot of things we don't talk about in this area, and should, but I would like it to be of a consistent level of detail for all areas of performancce. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: New 9.6 external sort guidance around temp_tablespaces and maintenance_work_mem
From
Peter Geoghegan
Date:
On Sat, Apr 30, 2016 at 3:23 PM, Bruce Momjian <bruce@momjian.us> wrote: > Yes, this needs updating. My point is that there is a whole lot of > things we don't talk about in this area, and should, but I would like it > to be of a consistent level of detail for all areas of performancce. I think that we need to do better generally too, but the existing handling of performance, such as it is, is not consistent in the level of detail it goes into. For example, we give far more advice about setting the value of commit_delay than setting the value of work_mem, even though that's clearly a niche topic in comparison. You can say the same thing about effective_io_concurrency. 95%+ of all users don't use either setting, making that documentation irrelevant to them. I think that this is simply because it was hard to make a good recommendation about work_mem, but that's now less true overall. We don't like equivocating, so we said only the absolute minimum. ISTM that the area that needs the most attention is planner stuff, and query workspace memory stuff (e.g. work_mem, temp files). work_mem and maintenance_work_mem seem like good places to start adding more practical advise, particularly given we can avoid mentioning sorting or hashing, and still add value. Maybe there is a place to emphasize this change in the release notes. I don't really want to make it about the external sort feature, though, because enabling higher work_mem settings by making sure that does not disadvantage external sorts is as much about enabling HashAggregates as it is about enabling internal sorts. -- Peter Geoghegan
Re: New 9.6 external sort guidance around temp_tablespaces and maintenance_work_mem
From
Bruce Momjian
Date:
On Sat, Apr 30, 2016 at 04:23:00PM -0700, Peter Geoghegan wrote: > Maybe there is a place to emphasize this change in the release notes. > I don't really want to make it about the external sort feature, > though, because enabling higher work_mem settings by making sure that > does not disadvantage external sorts is as much about enabling > HashAggregates as it is about enabling internal sorts. We do often mention in the release notes areas that will need retuning. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: New 9.6 external sort guidance around temp_tablespaces and maintenance_work_mem
From
Peter Geoghegan
Date:
On Sat, Apr 30, 2016 at 4:26 PM, Bruce Momjian <bruce@momjian.us> wrote: > On Sat, Apr 30, 2016 at 04:23:00PM -0700, Peter Geoghegan wrote: >> Maybe there is a place to emphasize this change in the release notes. >> I don't really want to make it about the external sort feature, >> though, because enabling higher work_mem settings by making sure that >> does not disadvantage external sorts is as much about enabling >> HashAggregates as it is about enabling internal sorts. > > We do often mention in the release notes areas that will need retuning. How do you feel about it in this instance? As you may have gathered, my perspective is that the external sorting patches were more about fixing a problem with tuplesort than about improving its performance. The performance characteristics of the old approach to sorting runs were all over the place, which made increasing work_mem like taking one step forward, then two steps backwards. -- Peter Geoghegan
Re: New 9.6 external sort guidance around temp_tablespaces and maintenance_work_mem
From
Bruce Momjian
Date:
On Sat, Apr 30, 2016 at 04:39:22PM -0700, Peter Geoghegan wrote: > On Sat, Apr 30, 2016 at 4:26 PM, Bruce Momjian <bruce@momjian.us> wrote: > > On Sat, Apr 30, 2016 at 04:23:00PM -0700, Peter Geoghegan wrote: > >> Maybe there is a place to emphasize this change in the release notes. > >> I don't really want to make it about the external sort feature, > >> though, because enabling higher work_mem settings by making sure that > >> does not disadvantage external sorts is as much about enabling > >> HashAggregates as it is about enabling internal sorts. > > > > We do often mention in the release notes areas that will need retuning. > > How do you feel about it in this instance? > > As you may have gathered, my perspective is that the external sorting > patches were more about fixing a problem with tuplesort than about > improving its performance. The performance characteristics of the old > approach to sorting runs were all over the place, which made > increasing work_mem like taking one step forward, then two steps > backwards. Yes, we should mention something. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +