On Wed, Jun 27, 2018 at 10:39 AM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> (sorry if I CCed the wrong folks, the code has changed a fair bit)
>
> I noticed that several places in the partitioning code look like:
>
> /*
> * If the tuple has been routed, it's been converted to the
> * partition's rowtype, which might differ from the root
> * table's. We must convert it back to the root table's
> * rowtype so that val_desc shown error message matches the
> * input tuple.
> */
> if (resultRelInfo->ri_PartitionRoot)
> {
> TableTuple tuple = ExecFetchSlotTuple(slot);
> TupleConversionMap *map;
>
> rel = resultRelInfo->ri_PartitionRoot;
> tupdesc = RelationGetDescr(rel);
> /* a reverse map */
> map = convert_tuples_by_name(orig_tupdesc, tupdesc,
> gettext_noop("could not convert row type"));
> if (map != NULL)
> {
> tuple = do_convert_tuple(tuple, map);
> ExecSetSlotDescriptor(slot, tupdesc);
> ExecStoreTuple(tuple, slot, InvalidBuffer, false);
> }
> }
>
>
> Unfortunately calling ExecSetSlotDescriptor() is far from cheap, it has
> to reallocate the values/isnull arrays (and potentially do desc pinning
> etc).
I bumped into this code yesterday while looking at ExecFetchSlotTuple
and ExecMaterializeSlot usages. I was wondering the same.
ExecSetSlotDescriptor() always frees tts_values and tts_isnull array
even if the tuple descriptor being set has same number of attributes
as previous one. Most of the times that will be the case. I think we
should optimize that case. If you agree, I will submit a patch for
that.
Amit Langote has clarified that this doesn't happen often since
partitioned tables will not require tuple conversion usually. But I
agree that in pathological case this is a performance eater and should
be fixed.
> Nor is it always cheap to call ExecFetchSlotTuple(slot) - it might
> be a virtual slot. Calling heap_deform_tuple(), as done in
> do_convert_tuple(), might be superfluous work, because the input tuple
> might already be entirely deformed in the input slot.
>
> I think it'd be good to rewrite the code so there's an input and an
> output slot that each will keep their slot descriptors set.
+1 for that.
But I am worried that the code will have to create thousand slots if
there are thousand partitions. I think we will need to see how much
effect that has.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company