Re: maintenance_work_mem and CREATE INDEX time - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: maintenance_work_mem and CREATE INDEX time |
Date | |
Msg-id | CA+HiwqE_iksCRKZ=udU-0R1MiDpb_S2bgM-R9tz+d8rjy=FHVA@mail.gmail.com Whole thread Raw |
In response to | Re: maintenance_work_mem and CREATE INDEX time (Jeff Janes <jeff.janes@gmail.com>) |
Responses |
Re: maintenance_work_mem and CREATE INDEX time
|
List | pgsql-hackers |
On Wed, Jul 24, 2013 at 6:02 AM, Jeff Janes <jeff.janes@gmail.com> wrote: > On Tue, Jul 23, 2013 at 1:23 AM, Amit Langote <amitlangote09@gmail.com> wrote: >> On Tue, Jul 23, 2013 at 1:11 PM, Amit Langote <amitlangote09@gmail.com> wrote: >>> Hello, >>> >>> While understanding the effect of maintenance_work_mem on time taken >>> by CREATE INDEX, I observed that for the values of >>> maintenance_work_mem less than the value for which an internal sort is >>> performed, the time taken by CREATE INDEX increases as >>> maintenance_work_increases. >>> >>> My guess is that for all those values an external sort is chosen at >>> some point and larger the value of maintenance_work_mem, later the >>> switch to external sort would be made causing CREATE INDEX to take >>> longer. That is a smaller value of maintenance_work_mem would be >>> preferred for when external sort is performed anyway. Does that make >>> sense? >>> >> >> Upon further investigation, it is found that the delay to switch to >> external sort caused by a larger value of maintenance_work_mem is >> small compared to the total time of CREATE INDEX. > > If you are using trace_sort to report that, it reports the switch as > happening as soon as it runs out of memory. > > At point, all we have been doing is reading tuples into memory. The > time it takes to do that will depend on maintenance_work_mem, because > that affects how many tuples fit in memory. But all the rest of the > tuples need to be read sooner or later anyway, so pushing more of them > to later doesn't improve things overall it just shifts timing around. > > After it reports the switch, it still needs to heapify the existing > in-memory tuples before the tapesort proper can begin. This is where > the true lost opportunities start to arise, as the large heap starts > driving cache misses which would not happen at all under different > settings. > > Once the existing tuples are heapified, it then continues to use the > heap to pop tuples from it to write out to "tape", and to push newly > read tuples onto it. This also suffers lost opportunities. > > Once all the tuples are written out and it starts merging, then the > large maintenance_work_mem is no longer a penalty as the new heap is > limited by the number of tapes, which is almost always much smaller. > In fact this stage will actually be faster, but not by enough to make > up for the earlier slow down. > > So it is not surprising that the time before the switch is reported is > a small part of the overall time difference. > So, is it the actual sorting (before merging) that suffers with larger maintenance_work_mem? I am sorry but I can not grasp the complexity of external sort code at this point, so all I can say is that during an external sort a smaller value of maintenance_work_mem is beneficial (based on my observations in tests). But how that follows from what is going on in the implementation of external sort is still something I am working on understanding. -- Amit Langote
pgsql-hackers by date: