Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits |
Date | |
Msg-id | 20201030004252.d63rwclys4bniluu@development Whole thread Raw |
In response to | Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits (Tatsuo Ishii <ishii@sraoss.co.jp>) |
Responses |
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
|
List | pgsql-hackers |
Hi, I might be somewhat late to the party, but I've done a bit of benchmarking too ;-) I used TPC-H data from a 100GB test, and tried different combinations of COPY [FREEZE] and VACUUM [FREEZE], both on current master and with the patch. The results look like this (the columns say what combination of COPY and VACUUM was used - e.g. -/FREEZE means plain COPY and VACUUM FREEZE) master: - / - FREEZE / - - / FREEZE FREEZE / FREEZE ---------------------------------------------------------------- COPY 2471 2477 2486 2484 VACUUM 228 209 453 206 patched: - / - FREEZE / - - / FREEZE FREEZE / FREEZE ---------------------------------------------------------------- COPY 2459 2445 2458 2446 VACUUM 227 0 467 0 So I don't really observe any measurable slowdowns in the COPY part (in fact there seems to be a tiny speedup, but it might be just noise). In the VACUUM part, there's clear speedup when the data was already frozen by COPY (Yes, those are zeroes, because it took less than 1 second.) So that looks pretty awesome, I guess. For the record, these tests were run on a server with NVMe SSD, so hopefully reliable and representative numbers. A couple minor comments about the code: 1) Maybe add a comment before the block setting xlrec->flags in heap_multi_insert. It's not very complex, but it used to be a bit simpler, and all the other pieces around have comments, so it won't hurt. 2) I find the "if (all_frozen_set)" block a bit strange. It's a matter of personal preference, but I'd just use a single level of nesting, i.e. something like this: /* if everything frozen, the whole page has to be visible */ Assert(!(all_frozen_set && !PageIsAllVisible(page))); /* * If we've frozen everything on the page, and if we're already * holding pin on the vmbuffer, record that in the visibilitymap. * If we're not holding the pin, it's OK to skip this - it's just * an optimization. * * It's fine to use InvalidTransactionId here - this is only used * when HEAP_INSERT_FROZEN is specified, which intentionally * violates visibility rules. */ if (all_frozen_set && visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer)) visibilitymap_set(...); IMHO it's easier to read, but YMMV. I've also reworded the comment a bit to say what we're doing etc. And I've moved the comment from inside the if block into the main comment - that was making it harder to read too. 3) I see RelationGetBufferForTuple does this: /* * This is for COPY FREEZE needs. If page is empty, * pin vmbuffer to set all_frozen bit */ if ((options & HEAP_INSERT_FROZEN) && (PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0)) visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer); so is it actually possible to get into the (all_frozen_set) without holding a pin on the visibilitymap? I haven't investigated this so maybe there are other ways to get into that code block. But the new additions to hio.c get the pin too. 4) In heap_xlog_multi_insert we now have this: if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED) PageClearAllVisible(page); if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) PageSetAllVisible(page); IIUC it makes no sense to have both flags at the same time, right? So what about adding Assert(!(XLH_INSERT_ALL_FROZEN_SET && XLH_INSERT_ALL_VISIBLE_CLEARED)); to check that? 5) Not sure we need to explicitly say this is for COPY FREE in all the blocks added to hio.c. IMO it's sufficient to use HEAP_INSERT_FROZEN in the condition, at this level of abstraction. I wonder what to do about the heap_insert - I know there are concerns it would negatively impact regular insert, but is it really true? I suppose this optimization would be valuable even for cases where multi-insert is not possible. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: