> 1.
> +#define BATCH_SORT_MAX_BATCHES 512
>
> Did you decide this number based on some experiment or is there some
> analysis behind selecting this number?
When there are too few batches, if a certain process works too slowly, it will cause unbalanced load.
When there are too many batches, FD will open and close files frequently.
> 2.
> +BatchSortState* ExecInitBatchSort(BatchSort *node, EState *estate, int eflags)
> +{
> + BatchSortState *state;
> + TypeCacheEntry *typentry;
> ....
> + for (i=0;i<node->numGroupCols;++i)
> + {
> ...
> + InitFunctionCallInfoData(*fcinfo, flinfo, 1, attr->attcollation, NULL, NULL);
> + fcinfo->args[0].isnull = false;
> + state->groupFuns = lappend(state->groupFuns, fcinfo);
> + }
>
> From the variable naming, it appeared like the batch sort is dependent
> upon the grouping node. I think instead of using the name
> numGroupCols and groupFuns we need to use names that are more relevant
> to the batch sort something like numSortKey.
Not all data types support both sorting and hashing calculations, such as user-defined data types.
We do not need all columns to support hash calculation when we batch, so I used two variables.
> 3.
> + if (eflags & (EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK))
> + {
> + /* for now, we only using in group aggregate */
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("not support execute flag(s) %d for group sort", eflags)));
> + }
>
> Instead of ereport, you should just put an Assert for the unsupported
> flag or elog.
In fact, this is an unfinished feature, BatchSort should also support these features, welcome to supplement.
> 4.
> + state = makeNode(BatchSortState);
> + state->ps.plan = (Plan*) node;
> + state->ps.state = estate;
> + state->ps.ExecProcNode = ExecBatchSortPrepare;
>
> I think the main executor entry function should be named ExecBatchSort
> instead of ExecBatchSortPrepare, it will look more consistent with the
> other executor machinery.