Thread: tuplestore_putvalues()

tuplestore_putvalues()

From
Neil Conway
Date:
Attached is a patch that allows an array of Datums + nulls to be
inserted into a tuplestore without first creating a HeapTuple, per
recent suggestion on -hackers. This avoids making an unnecessary copy.
There isn't a really analogous optimization to be applied to tuplesort:
it takes either a TTS, an IndexTuple or a basic Datum, none of which
involve an extra copy.

BTW, I notice that almost all of the callers of the various
tuplestore_put methods switch into the tuplestore's context first. We
could simplify their lives a bit by having the tuplestore remember the
context in which it is allocated and do the switch itself. Perhaps it's
not worth bothering with at this point, though.

-Neil


Attachment

Re: tuplestore_putvalues()

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Attached is a patch that allows an array of Datums + nulls to be
> inserted into a tuplestore without first creating a HeapTuple, per
> recent suggestion on -hackers. This avoids making an unnecessary copy.

A small thought here: we were jousting recently over a point that came
down to whether or not tuplestore kept track of the tupdesc for the
tuples it was storing.  I can hardly imagine a use-case for a tuplestore
in which the tuples don't all have the same tupdesc.  I think I dropped
tupdesc from tuplestore's original API on the grounds that it wasn't
doing anything much with the tupdesc.  But now this patch adds back a
tuplestore API call that needs the tupdesc.  Would it be saner to supply
the tupdesc to tuplestore_begin_heap instead, as tuplesort does?

I haven't looked at all into what the implications of this would be,
either from a performance or number-of-places-to-change standpoint.
But it seems worth a bit of investigation while we're touching the
code.

Other than that issue, the patch seems OK in a quick once-over.

            regards, tom lane

Re: tuplestore_putvalues()

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Attached is a patch that allows an array of Datums + nulls to be
> inserted into a tuplestore without first creating a HeapTuple, per
> recent suggestion on -hackers. This avoids making an unnecessary copy.
> There isn't a really analogous optimization to be applied to tuplesort:
> it takes either a TTS, an IndexTuple or a basic Datum, none of which
> involve an extra copy.

After a quick read, looks sane except for one stylistic gripe:
in exec_stmt_return_next, you added an initialization of tuple = NULL
in order to remove a couple of lines like

            tuple = NULL;    /* keep compiler quiet */

I think this is not good coding style.  The point of not having the
initialization is so that the compiler will warn us if there are
any paths through the function in which we fail to set "tuple".
You've just disabled that bit of early-warning error detection.
It's better if each switch arm has to set tuple for itself.
(If only a minority of them needed to do it, I might think
differently, but in this case I'd vote for sticking with the
way it's being done.)

            regards, tom lane

Re: tuplestore_putvalues()

From
Neil Conway
Date:
On Sat, 2008-03-22 at 21:35 -0400, Tom Lane wrote:
> After a quick read, looks sane except for one stylistic gripe:
> in exec_stmt_return_next, you added an initialization of tuple = NULL
> in order to remove a couple of lines like
>
>             tuple = NULL;    /* keep compiler quiet */
>
> I think this is not good coding style.  The point of not having the
> initialization is so that the compiler will warn us if there are
> any paths through the function in which we fail to set "tuple".
> You've just disabled that bit of early-warning error detection.
> It's better if each switch arm has to set tuple for itself.
> (If only a minority of them needed to do it, I might think
> differently, but in this case I'd vote for sticking with the
> way it's being done.)

I agree with your general reasoning, but in this case, it seemed cleaner
to initialize "tuple" when it is declared: only 2 of the 6 branches in
the function actually initialize tuple to a non-NULL value.

I think the warn-about-uninitialized-value argument doesn't hold much
water in this particular case, either: each branch of the function
either makes use of "tuple", or does not. When "tuple" is utilized, we
want to push it into the tuplestore; otherwise, we want to do nothing.
Since "tuple" is only used in two cases, it seems cleaner to me to have
the default be "do nothing", and only deviate from it where needed.

-Neil



Re: tuplestore_putvalues()

From
Neil Conway
Date:
On Thu, 2008-02-28 at 16:37 -0800, Neil Conway wrote:
> Attached is a patch that allows an array of Datums + nulls to be
> inserted into a tuplestore without first creating a HeapTuple, per
> recent suggestion on -hackers. This avoids making an unnecessary copy.

Applied to HEAD.

-Neil