Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array - Mailing list pgsql-hackers
| From | Xuneng Zhou |
|---|---|
| Subject | Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array |
| Date | |
| Msg-id | CABPTF7X3w9zQbkakb9BmhVU1j0RNs4BoOr1sw9h0d=PU+WLm1Q@mail.gmail.com Whole thread Raw |
| In response to | Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array (Masahiko Sawada <sawada.mshk@gmail.com>) |
| List | pgsql-hackers |
Hi Sawada-san, Michael, Thanks for your comments on this patch. On Thu, Oct 23, 2025 at 8:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Oct 20, 2025 at 11:13 PM Xuneng Zhou <xunengzhou@gmail.com> wrote: > > > > Hi, > > > > Thanks for looking into this. > > > > On Tue, Oct 21, 2025 at 1:05 PM Kirill Reshke <reshkekirill@gmail.com> wrote: > > > > > > On Tue, 21 Oct 2025 at 04:31, Michael Paquier <michael@paquier.xyz> wrote: > > > > > > > > On Sat, Oct 18, 2025 at 01:59:40PM +0500, Kirill Reshke wrote: > > > > > Indeed, these changes look correct. > > > > > I wonder why b89e151054a0 did this place this way, hope we do not miss > > > > > anything here. > > > > > > > > Perhaps a lack of time back in 2014? It feels like an item where we > > > > would need to research a bit some of the past threads, and see if this > > > > has been discussed, or if there were other potential alternatives > > > > discussed. This is not saying that what you are doing in this > > > > proposal is actually bad, but it's a bit hard to say what an > > > > "algorithm" should look like in this specific code path with XID > > > > manipulations. Perhaps since 2014, we may have other places in the > > > > tree that share similar characteristics as what's done here. > > > > > > > > So it feels like this needs a bit more historical investigation first, > > > > rather than saying that your proposal is the best choice on the table. Following these suggestions, I carefully searched the mailing list archives and found no reports of performance issues directly related to this code path. I also examined other parts of the codebase for similar patterns. Components like integerset might share some characteristics with SnapBuildPurgeOlderTxn, but they have constraints that make them not directly applicable here. I am not very familiar with the whole tree, so the investigation might not be exhaustive. > > > > > > Commit b89e151054a0 comes from [0] > > > Comment of SnapBuildPurgeCommittedTxn tracks to [1] (it was in form > > > "XXX: Neater algorithm?") > > > > > > Between these two messages, it was not disucccesseed... > > > > > > I will also study other related threads like [2], but i don't think > > > they will give more insight here. > > > > > > [0] https://www.postgresql.org/message-id/20140303162652.GB16654%40awork2.anarazel.de > > > [1] https://www.postgresql.org/message-id/20140115002223.GA17204%40awork2.anarazel.de > > > [2] https://www.postgresql.org/message-id/flat/20130914204913.GA4071%40awork2.anarazel.de#:~:text=20130914204913.GA4071%40awork2.anarazel.de > > > > Introducing logical decoding was a major feature, and I assume there > > were more critical design decisions and trade-offs to consider at that > > time. > > > > I proposed this refactor not because it is the most significant > > optimization, but because it seems to be a low-hanging fruit with > > clear benefits. By using an in-place two-pointer compaction, we can > > eliminate the extra memory allocation and copy-back step without > > introducing risks to this well-tested code path. > > I agree the proposed in-pace update is better than the current > copy-and-iterating approach. While the benefit might not be visible as > it's not a known hot-path, I find that the proposed patch makes sense > to improve the current codes. It could help some workloads where there > are many DDLs (e.g., creating temporary tables in many transactions). > I also agree this approach offers better efficiency within the scope of current SnapBuildPurgeOlderTxn. > > > > A comparable optimization exists in KnownAssignedXidsCompress() which > > uses the same algorithm to remove stale XIDs without workspace > > allocation. That implementation also adds a lazy compaction heuristic > > that delays compaction until a threshold of removed entries is > > reached, amortizing the O(N) cost across multiple operations. > > > > The comment above the data structure mentions the trade-off of keeping > > the committed.xip array sorted versus unsorted. If the array were > > sorted, we could use a binary search combined with memmove to compact > > it efficiently, achieving O(log n + n) complexity for purging. > > However, that design would increase the complexity of > > SnapBuildAddCommittedTxn from O(1) to O(n) and "more complicated wrt > > wraparound". > > > > /* > > * Array of committed transactions that have modified the catalog. > > * > > * As this array is frequently modified we do *not* keep it in > > * xidComparator order. Instead we sort the array when building & > > * distributing a snapshot. > > * > > * TODO: It's unclear whether that reasoning has much merit. Every > > * time we add something here after becoming consistent will also > > * require distributing a snapshot. Storing them sorted would > > * potentially also make it easier to purge (but more complicated wrt > > * wraparound?). Should be improved if sorting while building the > > * snapshot shows up in profiles. > > */ > > TransactionId *xip; > > } committed; > > > > /* > > * Keep track of a new catalog changing transaction that has committed. > > */ > > static void > > SnapBuildAddCommittedTxn(SnapBuild *builder, TransactionId xid) > > { > > Assert(TransactionIdIsValid(xid)); > > > > if (builder->committed.xcnt == builder->committed.xcnt_space) > > { > > builder->committed.xcnt_space = builder->committed.xcnt_space * 2 + 1; > > > > elog(DEBUG1, "increasing space for committed transactions to %u", > > (uint32) builder->committed.xcnt_space); > > > > builder->committed.xip = repalloc(builder->committed.xip, > > builder->committed.xcnt_space * sizeof(TransactionId)); > > } > > > > /* > > * TODO: It might make sense to keep the array sorted here instead of > > * doing it every time we build a new snapshot. On the other hand this > > * gets called repeatedly when a transaction with subtransactions commits. > > */ > > builder->committed.xip[builder->committed.xcnt++] = xid; > > } > > > > It might be worth profiling this function to evaluate whether > > maintaining a sorted array could bring potential benefits, although > > accurately measuring its end-to-end impact could be difficult if it > > isn’t a known hotspot. I also did a brief search on the mailing list > > and found no reports of performance concerns or related proposals to > > optimize this part of the code. > > It might also be worth researching what kind of workloads would need a > better algorithm in terms of storing/updating xip and subxip arrays > since it would be the primary motivation. Also, otherwise we cannot > measure the real-world impact of a new algorithm. Having said that, > find that it would be discussed and developed separately from the > proposed patch on this thread. +1 for researching the workloads that might need a sorted array and more efficient algorithm. This exploration isn’t limited to the scope of SnapBuildPurgeOlderTxn but relates more broadly to the overall snapbuild process, which might be worth discussing in a separate thread as suggested. * TODO: It's unclear whether that reasoning has much merit. Every * time we add something here after becoming consistent will also * require distributing a snapshot. Storing them sorted would * potentially also make it easier to purge (but more complicated wrt * wraparound?). Should be improved if sorting while building the * snapshot shows up in profiles. I also constructed an artificial workload to try to surface the qsort call in SnapBuildBuildSnapshot, though such a scenario seems very unlikely to occur in production. for ((c=1; c<=DDL_CLIENTS; c++)); do ( local seq=1 while (( $(date +%s) < tB_end )); do local tbl="hp_ddl_${c}_$seq" "$psql" -h 127.0.0.1 -p "$PORT" -d postgres -c " BEGIN; CREATE TABLE ${tbl} (id int, data text); CREATE INDEX idx_${tbl} ON ${tbl} (id); INSERT INTO ${tbl} VALUES ($seq, 'd'); DROP TABLE ${tbl}; COMMIT;" >/dev/null 2>&1 || true seq=$((seq+1)) done ) 97.02% 0.00% postgres postgres [.] postmaster_child_launch | ---postmaster_child_launch | |--94.93%--BackendMain | PostgresMain | exec_replication_command | StartLogicalReplication | | | --94.92%--WalSndLoop | | | |--92.24%--XLogSendLogical | | | | | --91.63%--LogicalDecodingProcessRecord | | | | | |--89.55%--xact_decode | | | | | | | --89.23%--DecodeCommit | | | | | | | |--64.64%--SnapBuildCommitTxn | | | | | | | | | |--62.60%--SnapBuildBuildSnapshot | | | | | | | | | | | --62.33%--pg_qsort | | | | | | | | | | | |--56.84%--pg_qsort Best, Xuneng
pgsql-hackers by date: