Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array |
Date | |
Msg-id | CAD21AoD7=PU34+FHLQ6Y4Zj=_icCxk36r_-E6BmCz_t70NPsVQ@mail.gmail.com Whole thread Raw |
In response to | Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array (Xuneng Zhou <xunengzhou@gmail.com>) |
Responses |
Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array
|
List | pgsql-hackers |
On Mon, Oct 20, 2025 at 11:13 PM Xuneng Zhou <xunengzhou@gmail.com> wrote: > > Hi, > > Thanks for looking into this. > > On Tue, Oct 21, 2025 at 1:05 PM Kirill Reshke <reshkekirill@gmail.com> wrote: > > > > On Tue, 21 Oct 2025 at 04:31, Michael Paquier <michael@paquier.xyz> wrote: > > > > > > On Sat, Oct 18, 2025 at 01:59:40PM +0500, Kirill Reshke wrote: > > > > Indeed, these changes look correct. > > > > I wonder why b89e151054a0 did this place this way, hope we do not miss > > > > anything here. > > > > > > Perhaps a lack of time back in 2014? It feels like an item where we > > > would need to research a bit some of the past threads, and see if this > > > has been discussed, or if there were other potential alternatives > > > discussed. This is not saying that what you are doing in this > > > proposal is actually bad, but it's a bit hard to say what an > > > "algorithm" should look like in this specific code path with XID > > > manipulations. Perhaps since 2014, we may have other places in the > > > tree that share similar characteristics as what's done here. > > > > > > So it feels like this needs a bit more historical investigation first, > > > rather than saying that your proposal is the best choice on the table. > > > -- > > > Michael > > > > Sure > > > > Commit b89e151054a0 comes from [0] > > Comment of SnapBuildPurgeCommittedTxn tracks to [1] (it was in form > > "XXX: Neater algorithm?") > > > > Between these two messages, it was not disucccesseed... > > > > I will also study other related threads like [2], but i don't think > > they will give more insight here. > > > > [0] https://www.postgresql.org/message-id/20140303162652.GB16654%40awork2.anarazel.de > > [1] https://www.postgresql.org/message-id/20140115002223.GA17204%40awork2.anarazel.de > > [2] https://www.postgresql.org/message-id/flat/20130914204913.GA4071%40awork2.anarazel.de#:~:text=20130914204913.GA4071%40awork2.anarazel.de > > Introducing logical decoding was a major feature, and I assume there > were more critical design decisions and trade-offs to consider at that > time. > > I proposed this refactor not because it is the most significant > optimization, but because it seems to be a low-hanging fruit with > clear benefits. By using an in-place two-pointer compaction, we can > eliminate the extra memory allocation and copy-back step without > introducing risks to this well-tested code path. I agree the proposed in-pace update is better than the current copy-and-iterating approach. While the benefit might not be visible as it's not a known hot-path, I find that the proposed patch makes sense to improve the current codes. It could help some workloads where there are many DDLs (e.g., creating temporary tables in many transactions). > > A comparable optimization exists in KnownAssignedXidsCompress() which > uses the same algorithm to remove stale XIDs without workspace > allocation. That implementation also adds a lazy compaction heuristic > that delays compaction until a threshold of removed entries is > reached, amortizing the O(N) cost across multiple operations. > > The comment above the data structure mentions the trade-off of keeping > the committed.xip array sorted versus unsorted. If the array were > sorted, we could use a binary search combined with memmove to compact > it efficiently, achieving O(log n + n) complexity for purging. > However, that design would increase the complexity of > SnapBuildAddCommittedTxn from O(1) to O(n) and "more complicated wrt > wraparound". > > /* > * Array of committed transactions that have modified the catalog. > * > * As this array is frequently modified we do *not* keep it in > * xidComparator order. Instead we sort the array when building & > * distributing a snapshot. > * > * TODO: It's unclear whether that reasoning has much merit. Every > * time we add something here after becoming consistent will also > * require distributing a snapshot. Storing them sorted would > * potentially also make it easier to purge (but more complicated wrt > * wraparound?). Should be improved if sorting while building the > * snapshot shows up in profiles. > */ > TransactionId *xip; > } committed; > > /* > * Keep track of a new catalog changing transaction that has committed. > */ > static void > SnapBuildAddCommittedTxn(SnapBuild *builder, TransactionId xid) > { > Assert(TransactionIdIsValid(xid)); > > if (builder->committed.xcnt == builder->committed.xcnt_space) > { > builder->committed.xcnt_space = builder->committed.xcnt_space * 2 + 1; > > elog(DEBUG1, "increasing space for committed transactions to %u", > (uint32) builder->committed.xcnt_space); > > builder->committed.xip = repalloc(builder->committed.xip, > builder->committed.xcnt_space * sizeof(TransactionId)); > } > > /* > * TODO: It might make sense to keep the array sorted here instead of > * doing it every time we build a new snapshot. On the other hand this > * gets called repeatedly when a transaction with subtransactions commits. > */ > builder->committed.xip[builder->committed.xcnt++] = xid; > } > > It might be worth profiling this function to evaluate whether > maintaining a sorted array could bring potential benefits, although > accurately measuring its end-to-end impact could be difficult if it > isn’t a known hotspot. I also did a brief search on the mailing list > and found no reports of performance concerns or related proposals to > optimize this part of the code. It might also be worth researching what kind of workloads would need a better algorithm in terms of storing/updating xip and subxip arrays since it would be the primary motivation. Also, otherwise we cannot measure the real-world impact of a new algorithm. Having said that, I find that it would be discussed and developed separately from the proposed patch on this thread. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: