Re: Gather Merge - Mailing list pgsql-hackers

From Rushabh Lathia
Subject Re: Gather Merge
Msg-id CAGPqQf3-_iFsd6MD9cy16JaAYuHPqWjbt6+zsrD418ngXSZCeA@mail.gmail.com
In response to Re: Gather Merge  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Gather Merge
List pgsql-hackers
Thanks Amit for reviewing this patch.

On Mon, Oct 17, 2016 at 2:26 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Oct 5, 2016 at 11:35 AM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
> Hi hackers,
>
> Attached is the patch to implement Gather Merge.
>

Couple of review comments:

1.
ExecGatherMerge()
{
..
+	/* No workers? Then never mind. */
+	if (!got_any_worker || node->nreaders < 2)
+	{
+		ExecShutdownGatherMergeWorkers(node);
+		node->nreaders = 0;
+	}

Are you planning to restrict the use of gather merge based on the number
of workers?  If there is a valid reason, then I think the comments should
be updated to explain it.


Thanks for catching this. This is left over from an earlier version of the patch.
With the current design we don't have any limitation on the number of workers. I did
performance testing with max_parallel_workers_per_gather set to 1
and didn't notice any performance degradation, so I removed this restriction
in the attached patch.
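
With that change, the fallback check reduces to roughly the following
(a sketch of the simplification, not a verbatim quote of the patch):

	/* Fall back to local execution only if no worker could be launched. */
	if (!got_any_worker)
	{
		ExecShutdownGatherMergeWorkers(node);
		node->nreaders = 0;
	}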

2.
+ExecGatherMerge(GatherMergeState * node)
{
..
+	if (!node->initialized)
+	{
+		EState	   *estate = node->ps.state;
+		GatherMerge *gm = (GatherMerge *) node->ps.plan;
+
+		/*
+		 * Sometimes we might have to run without parallelism; but if parallel
+		 * mode is active then we can try to fire up some workers.
+		 */
+		if (gm->num_workers > 0 && IsInParallelMode())
+		{
+			ParallelContext *pcxt;
+			bool		got_any_worker = false;
+
+			/* Initialize the workers required to execute Gather node. */
+			if (!node->pei)
+				node->pei = ExecInitParallelPlan(node->ps.lefttree,
+												 estate,
+												 gm->num_workers);
..
}

There is a lot of common code between ExecGatherMerge and ExecGather.
Do you think it makes sense to have a common function to avoid the
duplication?

I also see small discrepancies between the two; for example, I don't see
any use of the single_copy flag, which is present in the Gather node.


Yes, I too thought about centralizing some parts of ExecGather and ExecGatherMerge,
but that code is not really settled yet, and I expect it to change, particularly
for Gather Merge. And as explained by Robert, single_copy doesn't make sense
for Gather Merge. I will still look into whether something can be
centralized.
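
If I do end up centralizing it, I imagine something along these lines
(the helper name and signature are invented here just to illustrate the
idea, and single_copy handling is left out):

	/*
	 * Invented helper, for illustration only: the worker startup that
	 * ExecGather and ExecGatherMerge both perform today.
	 */
	static void
	ExecLaunchParallelWorkers(PlanState *outerNode, ParallelExecutorInfo **pei,
							  EState *estate, int num_workers)
	{
		if (*pei == NULL)
			*pei = ExecInitParallelPlan(outerNode, estate, num_workers);
		LaunchParallelWorkers((*pei)->pcxt);
	}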
 
3.
+gather_merge_readnext(GatherMergeState * gm_state, int reader, bool force)
{
..
+	tup = gm_readnext_tuple(gm_state, reader, force, NULL);
+
+	/*
+	 * try to read more tuple into nowait mode and store it into the tuple
+	 * array.
+	 */
+	if (HeapTupleIsValid(tup))
+		fill_tuple_array(gm_state, reader);

How is the above read tuple stored in the array?  In any case, the above
interface seems slightly awkward to me.  Basically, I think what you
are trying to do here is that after reading the first tuple in wait mode,
you are trying to fill the array by reading more tuples.  So, can't we
push the reading of this first tuple into that function and name it
form_tuple_array()?


Yes, you are right. First it tries to read a tuple in wait mode, and once it
finds a tuple it tries to fill the tuple array (which reads tuples in
nowait mode). The reason I kept them separate is that while initializing
gather merge, if we are unable to read a tuple from every worker, then on the
re-read we again try to fill the tuple array only for the workers that have
already produced at least one tuple (see gather_merge_init() for details). Also,
I thought that filling the tuple array (reading tuples in nowait mode) and
reading a tuple (in wait mode) are two separate tasks, and the code looks
clearer with them in separate functions. If you have a suggestion for a better
name than fill_tuple_array, I am open to changing it.
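
If we went the way you suggest, I imagine it would look roughly like this
(store_in_tuple_array() and tuple_array_has_room() are invented names,
just to illustrate the shape, not functions in the patch):

	/*
	 * Hypothetical combined function: fold the initial wait-mode read
	 * into the array-filling logic.
	 */
	static void
	form_tuple_array(GatherMergeState *gm_state, int reader, bool force)
	{
		HeapTuple	tup;

		/* The first read may block, if the caller asked for that. */
		tup = gm_readnext_tuple(gm_state, reader, force, NULL);
		if (!HeapTupleIsValid(tup))
			return;
		store_in_tuple_array(gm_state, reader, tup);

		/* Then opportunistically buffer more tuples in nowait mode. */
		while (tuple_array_has_room(gm_state, reader))
		{
			tup = gm_readnext_tuple(gm_state, reader, false, NULL);
			if (!HeapTupleIsValid(tup))
				break;
			store_in_tuple_array(gm_state, reader, tup);
		}
	}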

 
4.
+create_gather_merge_path(..)
{
..
+  0 /* FIXME: pathnode->limit_tuples*/);

What exactly do you want to fix in the above code?


Fixed.
 
5.
 +/* Tuple array size */
+#define MAX_TUPLE_STORE 10

Have you tried other values of MAX_TUPLE_STORE?  If yes, then
what were the results?  I think it is better to add a comment explaining
why this array size is best for performance.
 

Actually, I thought about that, but I didn't want to add the numbers because
they are just performance results from my machine. Anyway, I have added a comment.
 

6.
+/* INTERFACE ROUTINES
+ *		ExecInitGatherMerge		- initialize the MergeAppend node
+ *		ExecGatherMerge			- retrieve the next tuple from the node
+ *		ExecEndGatherMerge		- shut down the MergeAppend node
+ *		ExecReScanGatherMerge	- rescan the MergeAppend node

typo.  /MergeAppend/GatherMerge


Fixed.
 
7.
+static TupleTableSlot *gather_merge_getnext(GatherMergeState * gm_state);
+static HeapTuple gm_readnext_tuple(GatherMergeState * gm_state, int nreader,
+				   bool force, bool *done);

The formatting near GatherMergeState doesn't seem to be appropriate.  I
think you need to add GatherMergeState to typedefs.list and then run
pgindent again.


Fixed.
 
8.
+	/*
+	 * Initialize funnel slot to same tuple descriptor as outer plan.
+	 */
+	if (!ExecContextForcesOids(&gm_state->ps, &hasoid))

I think in the above comment you mean the GatherMerge slot.


No, it has to be the funnel slot; it's just a placeholder. For Gather Merge, we
initialize one slot per worker, and that is done in gather_merge_init(). I will
look into this point to see if I can get rid of the funnel slot completely.
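
For clarity, the per-worker slot setup I mean is roughly like this
(a simplified sketch of the idea, not the exact patch code; the gm_slots
field name is illustrative):

	/* Give each worker its own slot with the outer plan's descriptor. */
	for (i = 0; i < gm_state->nreaders; i++)
	{
		gm_state->gm_slots[i] = ExecInitExtraTupleSlot(estate);
		ExecSetSlotDescriptor(gm_state->gm_slots[i], tupDesc);
	}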

9.
+ /* Does tuple array have any avaiable tuples? */
/avaiable/available


Fixed.
 
>
> Open Issue:
>
> - Commit af33039317ddc4a0e38a02e2255c2bf453115fd2 fixed the leak in
> tqueue.c by calling gather_readnext() in a per-tuple context.  Now for
> gather merge that is not possible, as we are storing the tuple in the
> tuple array and we want the tuple to be freed only once it has passed
> through the merge sort algorithm.  One idea is that we can also call
> gm_readnext_tuple() under a per-tuple context (which will fix the leak
> in tqueue.c) and then store a copy of the tuple in the tuple array.
>

Won't an extra copy per tuple impact performance?  Was the fix in the
mentioned commit for record or composite types?  If so, does
GatherMerge support such types, and if it does, does it provide any
benefit over Gather?


I don't think it was specifically for record or composite types, but I might be
wrong. As per my understanding, the commit fixed a leak in tqueue.c; the fix was
to establish the convention of calling TupleQueueReaderNext() in a shorter-lived
memory context, so that tqueue.c doesn't leak memory.

My idea for fixing this is to call TupleQueueReaderNext() in a per-tuple context,
then copy the tuple and store the copy in the tuple array; the next run of
ExecStoreTuple() will later free the earlier tuple. I will work on that.
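
In outline, something like this (a sketch of the idea, not the final code;
the function name is invented):

	/*
	 * Read from the shared queue in per-tuple memory so anything tqueue.c
	 * allocates is reclaimed, and keep only a copy made in the caller's
	 * longer-lived context.
	 */
	static HeapTuple
	gm_read_and_copy(GatherMergeState *gm_state, TupleQueueReader *reader,
					 bool nowait, bool *done)
	{
		ExprContext *econtext = gm_state->ps.ps_ExprContext;
		MemoryContext oldContext;
		HeapTuple	tup;
		HeapTuple	copy = NULL;

		oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
		tup = TupleQueueReaderNext(reader, nowait, done);
		MemoryContextSwitchTo(oldContext);

		if (HeapTupleIsValid(tup))
			copy = heap_copytuple(tup); /* survives the per-tuple reset */

		ResetExprContext(econtext);		/* frees tqueue.c's allocations */
		return copy;
	}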



--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



--
Rushabh Lathia
