Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits - Mailing list pgsql-hackers
From: Andres Freund <andres@anarazel.de>
Subject: Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Msg-id: 20190408010427.4l63qr7h2fjcyp77@alap3.anarazel.de
In response to: Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits (Andres Freund <andres@anarazel.de>)
Responses: Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
List: pgsql-hackers
Hi,

On 2019-04-04 21:04:49 -0700, Andres Freund wrote:
> On 2019-04-04 23:57:58 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > I think the right approach would be to do all of this in heap_insert
> > > and heap_multi_insert. Whenever starting to work on a page, if
> > > INSERT_FROZEN is specified, remember whether it is either currently
> > > empty, or is already marked as all-visible. If previously empty, mark
> > > it as all visible at the end. If already all visible, there's no need
> > > to change that. If not yet all-visible, no need to do anything, since
> > > it can't have been inserted with COPY FREEZE. Do you see any problem
> > > doing it that way?
> >
> > Do we want to add overhead to these hot-spot routines for this purpose?
>
> For heap_multi_insert I can't see it being a problem - it's only used
> from copy.c, and the cost should be "smeared" over many tuples. I'd
> assume that compared with locking a page, WAL logging, etc., it'd not
> even meaningfully show up for heap_insert. Especially because we already
> have codepaths for options & HEAP_INSERT_FROZEN in
> heap_prepare_insert(), and I'd assume those could be combined.
>
> I think we should measure it, but I don't think that one or two
> additional, very well predicted, branches are going to be measurable
> in those routines.
>
> The patch, as implemented, has modifications in
> RelationGetBufferForTuple() that seem like they'd be more expensive.

Here's a *prototype* patch for this. It only implements what I described
for heap_multi_insert, not for plain inserts. I wanted to see what others
think before investing additional time.

I don't think it's possible to see the overhead of the changed code in
heap_multi_insert(), and - with less confidence - I expect it to also be
OK for heap_insert(). But we gotta measure that.
This avoids an extra WAL record for setting empty pages to all visible, by
adding XLH_INSERT_ALL_VISIBLE_SET & XLH_INSERT_ALL_FROZEN_SET, and setting
those when appropriate in heap_multi_insert.

Unfortunately, visibilitymap_set() currently doesn't really allow doing
this properly, as it has heap-specific WAL logging embedded in it. I think
we should remove the WAL logging from visibilitymap_set(), and move it to
a separate, heap-specific, function. Right now other tableams would e.g.
have to reimplement visibilitymap_set(), so that's a second reason to
separate out that functionality. Let me try to come up with a proposal.

The patch currently does a visibilitymap_pin() while holding an exclusive
lwlock on the page. That's something we normally try to avoid - but I
think it's probably OK here, because INSERT_FROZEN can only be used to
insert into a new relfilenode (which thus no other session can see). I
think it's preferable to keep this logic specific to the INSERT_FROZEN
path, rather than adding nontrivial complications to
RelationGetBufferForTuple().

I noticed that, before this patch, we do a

    if (vmbuffer != InvalidBuffer)
        ReleaseBuffer(vmbuffer);

after every filled page - that doesn't strike me as particularly smart:
it's pretty likely that the next heap page to be filled is on the same VM
page as the previous iteration.

I noticed one small oddity that I think is common to all the approaches
presented in this thread so far.
After

    BEGIN;
    TRUNCATE foo;
    COPY foo(i) FROM '/tmp/foo' WITH FREEZE;
    COPY foo(i) FROM '/tmp/foo' WITH FREEZE;
    COPY foo(i) FROM '/tmp/foo' WITH FREEZE;
    COMMIT;

we currently end up with pages like:

┌───────┬───────────┬──────────┬───────┬───────┬───────┬─────────┬──────────┬─────────┬───────────┐
│ blkno │    lsn    │ checksum │ flags │ lower │ upper │ special │ pagesize │ version │ prune_xid │
├───────┼───────────┼──────────┼───────┼───────┼───────┼─────────┼──────────┼─────────┼───────────┤
│     0 │ 0/50B5488 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     1 │ 0/50B6360 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     2 │ 0/50B71B8 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     3 │ 0/50B8028 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     4 │ 0/50B8660 │        0 │     4 │   408 │  5120 │    8192 │     8192 │       4 │         0 │
│     5 │ 0/50B94B8 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     6 │ 0/50BA328 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     7 │ 0/50BB180 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     8 │ 0/50BBFD8 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     9 │ 0/50BCF88 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│    10 │ 0/50BDDE0 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│    11 │ 0/50BEC50 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│    12 │ 0/50BFAA8 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│    13 │ 0/50C06F8 │        0 │     4 │   792 │  2048 │    8192 │     8192 │       4 │         0 │
└───────┴───────────┴──────────┴───────┴───────┴───────┴─────────┴──────────┴─────────┴───────────┘
(14 rows)

Note how block 4 has more space available. That's because the
visibilitymap_pin() called in the first COPY has to vm_extend(), which in
turn does:

    /*
     * Send a shared-inval message to force other backends to close any smgr
     * references they may have for this rel, which we are about to change.
     * This is a useful optimization because it means that backends don't have
     * to keep checking for creation or extension of the file, which happens
     * infrequently.
     */
    CacheInvalidateSmgr(rel->rd_smgr->smgr_rnode);

which invalidates ->rd_smgr->smgr_targblock *after* the first COPY,
because that's when the pending smgr invalidations are sent out. That's
far from great, but it doesn't seem to be this patch's fault.

It seems to me we need a separate invalidation that doesn't close the
whole smgr relation, but just invalidates the VM-specific fields.

Greetings,

Andres Freund