Optimize common expressions in projection evaluation - Mailing list pgsql-hackers

From Peifeng Qiu
Subject Optimize common expressions in projection evaluation
Date
Msg-id CAPH51beXmxKzjS2mF_ifrBT1UY=JcWbU9RHZWwpnfhwiaCeGZQ@mail.gmail.com
Whole thread Raw
Responses Re: Optimize common expressions in projection evaluation
List pgsql-hackers
Hi hackers.

When a star(*) expands into multiple fields, our current
implementation is to generate multiple copies of the expression
and do FieldSelects. This is very inefficient because the same
expression get evaluated multiple times but actually we only need
to do it once. This is stated in ExpandRowReference().

For example:
CREATE TABLE tbl(c1 int, c2 int, c3 int);
CREATE TABLE src(v text);
CREATE FUNCTION expensive_func(input text) RETURNS t;
INSERT INTO tbl SELECT (expensive_func(v)).* FROM src;

This is effectively the same as:
INSERT INTO tbl SELECT (expensive_func(v)).c1,
(expensive_func(v)).c2, (expensive_func(v)).c3 FROM src;

In this form, expensive_func will be evaluated for every column in
tbl. If tbl has hundreds of columns we are in trouble. To partially
solve this issue, when doing projection in ExecBuildProjectionInfo,
instead of generating normal steps for FieldSelects one by one, we
can group them by the expression(arg of FieldSelect node). Then
evaluate the epxression once to get a HeapTuple, deform it into
fields, and then assign needed fields in one step. I've attached
patch that introduce EEOP_FIELD_MULTI_SELECT_ASSIGN for this.

With this patch, the following query should generate only one
NOTICE, instead of 3.

CREATE TYPE proj_type AS (a int, b int, c text);
CREATE OR REPLACE FUNCTION proj_type_func1(input text)
RETURNS proj_type AS $$
BEGIN
    RAISE NOTICE 'proj_type_func called';
    RETURN ROW(1, 2, input);
END
$$ IMMUTABLE LANGUAGE PLPGSQL;
CREATE TEMP TABLE stage_table(a text);
INSERT INTO stage_table VALUES('aaaa');
SELECT (proj_type_func1(a)).* FROM stage_table;

This patch is just proof of concept. Some unsolved questions I
can think of right now:
- Carry some information in FieldSelect from ExpandRowReference
to assist grouping?
- This can only handle FuncExpr as the root node of FieldSelect
arg. What about a more general expression?
- How to determine whether a common expression is safe to be
optimized this way? Any unexpcted side-effects?

Any thoughts on this approach?

Best regards,
Peifeng Qiu

Attachment

pgsql-hackers by date:

Previous
From: "Drouvot, Bertrand"
Date:
Subject: Re: Introduce a new view for checkpointer related stats
Next
From: Amit Langote
Date:
Subject: Re: ExecRTCheckPerms() and many prunable partitions