Eliminating CREATE INDEX comparator TID tie-breaker overhead - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Eliminating CREATE INDEX comparator TID tie-breaker overhead |
Date | |
Msg-id | CAM3SWZQpZygWdVUC6H_rctP5_FS7g7_EKz7fqeOa6hYiULU6pA@mail.gmail.com Whole thread Raw |
Responses |
Re: Eliminating CREATE INDEX comparator TID tie-breaker overhead
|
List | pgsql-hackers |
I have one more idea for accelerating sorting-related operations that is both fairly effective and relatively easy to implement. This just about clears my backlog of those, though. There is an open item on the TODO list entitled "Consider whether duplicate keys should be sorted by block/offset" [1]. Currently, comparetup_index_btree() and comparetup_index_hash() do a tie-breaker on ItemPointer (heap TID). This started as insurance against bad qsort() implementations that do badly with many equal keys, but it was also thought that there is some value in having equal keys be in physical (heap) order. Clearly the first concern has been obsolete since 2006, when a high quality qsort() was added to Postgres, but the second concern is probably still valid. Overhead ------------- Tom said in 2008 that the CREATE INDEX overhead of doing this may be about 7% [2]. It seems even worse now, presumably due to SortSupport reducing costs elsewhere. Several years ago, I suggested ripping out the tie-breaker too, but didn't get very far with that idea due to the potential downsides for index scans. I'm not about to revisit the discussion from 2011 about whether or not it should be torn out. I'd rather just (mostly) fix the real problem without changing the behavior of comparetup_index_btree() (or comparetup_index_hash()). The real problem is that we do pointer chasing to get to the IndexTuple ("tuple proper"), where we might get away with only examining the SortTuple, as would happen in comparetup_heap() in the event of an equivalent heap tuplesort, for example. The added memory latency hurts lower cardinality attributes (when only one attribute appears in the Index definition and the IndexTuple isn't cache resident), and wastes memory bandwidth on the system, which is something that there is virtually never enough of. Patch ===== Attached patch adds a new tie-breaker which is used only when the tuples being sorted still fit in memory. Currently, the field SortTuple.tupindex has a different purpose depending on whether we're building initial runs or merging runs. However, SortTuple.tupindex currently has no purpose when a sort can complete entirely in memory, so that case is available to support a third meaning: SortTuple.tupindex can be used to hold a simple ordinal number. When our status is TSS_INITIAL (when we qsort()), this is used as a proxy for what the TID tie-breaker would ordinarily return. This reliably preserves the existing behavior because index tuplesort clients invariably scan heap tuples in physical order, and so nothing is lost. Performance ----------------- The patch can make CREATE INDEX with an unlogged B-Tree index with a single int4 attribute as much as 15% faster (although I think under 10% is much more likely). That seems like an improvement that makes the patch worthwhile. Design considerations and consequences -------------------------------------------------------- I don't think that there is much point in getting rid of SortTuple.tupindex for in-memory sorts, which this patch closes off the possibility of. I tested the performance of a SortTuple struct with only a datum1 and "tuple proper" for in-memory sorting at one time, and it was disappointing, and in any case unworkably invasive. I also don't think it's worth worrying about the fact that tupindex is assigned an ordinal number even in cases where it won't be used inside the comparator. The overhead has to be indistinguishable from zero anyway, and the right place to put that assignment happens to be where there is generic TSS_INITIAL copying within puttuple_common(). I'm not concerned about synchronized scans breaking my assumption of a physical ordering of heaptuples being fed to tuplesort.c. I think that it is unlikely to ever be worth seriously considering this case. I have a hard time imagining anything (beyond synchronous scans) breaking my assumption that index tuplesorts receive tuples in heap physical order. If anything was to break that in the future (e.g. parallelizing the heap scan for index builds), then IMV the onus should be on that new case to take appropriate precautions against breaking my assumption. [1] https://wiki.postgresql.org/wiki/Todo#Sorting [2] http://www.postgresql.org/message-id/23321.1205726381@sss.pgh.pa.us -- Peter Geoghegan
Attachment
pgsql-hackers by date: