Home > mailing lists

Re: Re: parallel distinct union and aggregate support patch - Mailing list pgsql-hackers

From	bucoo@sohu.com
Subject	Re: Re: parallel distinct union and aggregate support patch
Date	November 30, 2020 20:27:11
Msg-id	202011101538306278671@sohu.com Whole thread Raw
In response to	parallel distinct union and aggregate support patch ("bucoo@sohu.com" <bucoo@sohu.com>)
List	pgsql-hackers

Tree view

> 1.

> +#define BATCH_SORT_MAX_BATCHES 512

> Did you decide this number based on some experiment or is there some

> analysis behind selecting this number?

When there are too few batches, if a certain process works too slowly, it will cause unbalanced load.

When there are too many batches, FD will open and close files frequently.

> 2.

> +BatchSortState* ExecInitBatchSort(BatchSort *node, EState *estate, int eflags)

> +{

> + BatchSortState *state;

> + TypeCacheEntry *typentry;

> ....

> + for (i=0;i<node->numGroupCols;++i)

> + {

> ...

> + InitFunctionCallInfoData(*fcinfo, flinfo, 1, attr->attcollation, NULL, NULL);

> + fcinfo->args[0].isnull = false;

> + state->groupFuns = lappend(state->groupFuns, fcinfo);

> + }

> From the variable naming, it appeared like the batch sort is dependent

> upon the grouping node. I think instead of using the name

> numGroupCols and groupFuns we need to use names that are more relevant

> to the batch sort something like numSortKey.

Not all data types support both sorting and hashing calculations, such as user-defined data types.

We do not need all columns to support hash calculation when we batch, so I used two variables.

> 3.

> + if (eflags & (EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK))

> + {

> + /* for now, we only using in group aggregate */

> + ereport(ERROR,

> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),

> + errmsg("not support execute flag(s) %d for group sort", eflags)));

> + }

> Instead of ereport, you should just put an Assert for the unsupported

> flag or elog.

In fact, this is an unfinished feature, BatchSort should also support these features, welcome to supplement.

> 4.

> + state = makeNode(BatchSortState);

> + state->ps.plan = (Plan*) node;

> + state->ps.state = estate;

> + state->ps.ExecProcNode = ExecBatchSortPrepare;

> I think the main executor entry function should be named ExecBatchSort

> instead of ExecBatchSortPrepare, it will look more consistent with the

> other executor machinery.

The job of the ExecBatchSortPrepare function is to preprocess the data (batch and pre-sort),

and when its work ends, it will call "ExecSetExecProcNode(pstate, ExecBatchSort)" to return the data to the ExecBatchSort function.

There is another advantage of dividing into two functions,

It is not necessary to judge whether tuplesort is now available every time the function is processed to improve the subtle performance.

And I think this code is clearer.

pgsql-hackers by date:

From: "Bossart, Nathan"
Date: 30 November 2020, 20:17:41
Subject: Re: A few new options for CHECKPOINT

From: Alexey Kondratov
Date: 30 November 2020, 20:34:21
Subject: Re: Notes on physical replica failover with logical publisher or subscriber

Re: Re: parallel distinct union and aggregate support patch - Mailing list pgsql-hackers

Previous

Next