Re: avoiding tuple copying in btree index builds - Mailing list pgsql-hackers

From Robert Haas
Subject Re: avoiding tuple copying in btree index builds
Date
Msg-id CA+TgmoZE-qGLioTijmBH2KUVn4pwHAm3Likty9BpRdyTzJfn0A@mail.gmail.com
Whole thread Raw
In response to Re: avoiding tuple copying in btree index builds  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: avoiding tuple copying in btree index builds  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Jun 16, 2014 at 8:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On further review, this is definitely the way to go: it's a
>> straight-up win.  The isnull array is never more than one element in
>> length, so testing the single element is quite trivial.   The
>> attached, revised patch provides a modest but useful speedup for both
>> hash and btree index builds.
>
>> Anyone see any reason NOT to do this?  So far it looks like a
>> slam-dunk from here.
>
> If index_form_tuple leaks any memory in the sort context, this would be
> not so much a slam-dunk win as a complete disaster.  I'm not sure that
> no-leak is a safe assumption, especially in cases where we do toast
> compression of the index datums.  (In fact, I'm pretty sure that the
> reason it was coded like this originally was exactly to avoid that
> assumption.)

Yes, that would be bad.  The reason I think it's probably OK is that
index_form_tuple seems to have had, from ancient times (cf.
f67e79045), code of non-trivial to free any memory that's allocated as
a result of de-TOASTing, which we presumably would not bother to do if
we were counting on memory context resets to clean things up.  It
calls toast_compress_datum(), which is similarly careful.  The most
complicated code path appears to be the VARATT_IS_EXTERNAL() case,
where we call heap_tuple_fetch_attr(), which calls
toast_fetch_datum().  I'm kind of wondering whether that can really
happen, though, because I didn't think indexes could use TOAST
storage.  Even if it does happen I don't see any obvious reason why it
should leak, but it'd be harder to be certain that it doesn't.

> Assuming that you inspect the code and determine it's safe, the patch
> had better decorate index_form_tuple with dire warnings about not leaking
> memory; because even if it's safe today, somebody might break it tomorrow.
>
> On a micro-optimization level, it might be worth passing the TID as
> ItemPointer not ItemPointerData (ie, pass a pointer until we get to
> the point of actually inserting the TID into the index tuple).
> I'm not sure that copying odd-size structs should be assumed to be
> efficient.

Yeah, true.  Checking existing precedent, it looks like we usually
pass ItemPointer rather than ItemPointerData, so it's probably a good
idea to do this that way too for reasons of style if nothing else.  I
kind of wonder whether it's really more efficient to pass an 8-byte
pointer to a 6-byte structure than to just pass the structure itself,
but it might be.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: UPDATE SET (a,b,c) = (SELECT ...) versus rules
Next
From: Tom Lane
Date:
Subject: Re: UPDATE SET (a,b,c) = (SELECT ...) versus rules