Thread: UPDATE using sub selects

UPDATE using sub selects

From

NikhilS

Date:

15 March 2007, 03:06:49

Hi, <br /><br />I have coded up a patch which solves the following TODO. I will submit a patch for this soon:<br /><br
/><ul><li>UPDATE<ul><li>Allow UPDATE tab SET ROW (col, ...) = (SELECT...) <p><a
href="http://archives.postgresql.org/pgsql-hackers/2006-07/msg01306.php">http://archives.postgresql.org/pgsql-hackers/2006-07/msg01306.php</a></ul></ul><br
/>Thequestion is that since this enhances the UPDATE syntax, what changes and where all they need to be made with
respectto the documentation?<br clear="all" /><br />Regards,<br />Nikhils<br />-- <br />EnterpriseDB               <a
href="http://www.enterprisedb.com">http://www.enterprisedb.com</a>

Re: UPDATE using sub selects

From

Tom Lane

Date:

15 March 2007, 03:33:06

NikhilS <nikkhils@gmail.com> writes:
> I have coded up a patch which solves the following TODO. I will submit a
> patch for this soon:
>       http://archives.postgresql.org/pgsql-hackers/2006-07/msg01306.php

Cool...

> The question is that since this enhances the UPDATE syntax, what changes and
> where all they need to be made with respect to the documentation?

Documentation is the very least of your worries.  What exactly is your
implementation plan?  If this were a simple or narrow fix it would
have been done already.
        regards, tom lane

Re: UPDATE using sub selects

From

NikhilS

Date:

15 March 2007, 03:52:32

Hi,

> The question is that since this enhances the UPDATE syntax, what changes and
> where all they need to be made with respect to the documentation?

Documentation is the very least of your worries. What exactly is your
implementation plan? If this were a simple or narrow fix it would
have been done already.

The implementation that I have planned is pretty similar to the way "INSERT INTO ... SELECT" has been implemented.

Along with the grammar changes in gram.y, the changes are localized in the transformUpdateStmt code path. The SELECT clause ends up becoming a subquery to the update query with the target column expressions transformed properly to include the subquery expressions. Does this sound ok?

I have tried some update-subselect variations and they seem to work. For example the case in the src/test/regress/sql/update.sql, which used to fail till now, seems to work:

UPDATE update_test SET (a,b) = (select a,b FROM update_test where c = 'foo')
WHERE a = 10;

Will try testing out some other variations too.

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com

Re: UPDATE using sub selects

From

Gaetano Mendola

Date:

15 March 2007, 11:48:11

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

NikhilS wrote:

> I have tried some update-subselect variations and they seem to work. For
> example the case in the src/test/regress/sql/update.sql, which used to
> fail till now, seems to work:
> 
> UPDATE update_test SET (a,b) = (select a,b FROM update_test where c =
> 'foo')
>   WHERE a = 10;

What's the expected result if the tuple from subselect is more than 1?
I expect no update at all in case of void result set, is this the case ?



Regards
Gaetano Mendola
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFF+Vx47UpzwH2SGd4RAvyVAKCGK5pC9B/lmrNjAFPGXhm5ialwSwCglM2n
DxrxWyvJASX5WSF9B8cAMas=
=AoVF
-----END PGP SIGNATURE-----

Re: UPDATE using sub selects

From

Tom Lane

Date:

15 March 2007, 23:36:52

Gaetano Mendola <mendola@bigfoot.com> writes:
> NikhilS wrote:
>> I have tried some update-subselect variations and they seem to work. For
>> example the case in the src/test/regress/sql/update.sql, which used to
>> fail till now, seems to work:
>> 
>> UPDATE update_test SET (a,b) = (select a,b FROM update_test where c =
>> 'foo')
>> WHERE a = 10;

> What's the expected result if the tuple from subselect is more than 1?

Error, per SQL99 section 7.14:
        1) If the cardinality of a <row subquery> is greater than 1 (one),           then an exception condition is
raised:cardinality violation.
 

> I expect no update at all in case of void result set, is this the case ?

No, you get nulls; it's a subquery not a join.  Per SQL99 7.1:
           c) If the <row value constructor> is a <row subquery>, then:
             i) Let R be the result of the <row subquery> and let D be the                degree of R.
            ii) If the cardinality of R is 0 (zero), then the result of the                <row value constructor> is D
nullvalues.
 
           iii) If the cardinality of R is 1 (one), then the result of the                <row value constructor> is
R.
        regards, tom lane

Re: UPDATE using sub selects

From

NikhilS

Date:

16 March 2007, 04:44:31

Hi,

> What's the expected result if the tuple from subselect is more than 1?

Error, per SQL99 section 7.14:

1) If the cardinality of a <row subquery> is greater than 1 (one),
then an exception condition is raised: cardinality violation.

> I expect no update at all in case of void result set, is this the case ?

No, you get nulls; it's a subquery not a join.  Per SQL99 7.1:

            c) If the <row value constructor> is a <row subquery>, then:

              i) Let R be the result of the <row subquery> and let D be the
                 degree of R.

             ii) If the cardinality of R is 0 (zero), then the result of the
                 <row value constructor> is D null values.

            iii) If the cardinality of R is 1 (one), then the result of the
                 <row value constructor> is R.

                        regards, tom lane

To allow both of the above to hold, I think the subselect will have to be treated like a EXPR_SUBLINK subquery. I was wondering if we have a similar mechanism for plain selects/subselects to check and restrict their output to a single row.

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com

Re: UPDATE using sub selects

From

Tom Lane

Date:

16 March 2007, 11:37:46

NikhilS <nikkhils@gmail.com> writes:
> To allow both of the above to hold, I think the subselect will have to be
> treated like a EXPR_SUBLINK subquery. I was wondering if we have a similar
> mechanism for plain selects/subselects to check and restrict their output to
> a single row.

No.  Offhand I think you'd either need to relax EXPR_SUBLINK to allow
multiple output columns, or invent a ROW_SUBLINK SubLinkType that is
just like EXPR_SUBLINK except for allowing multiple output columns.
The latter would probably be less likely to break other things...
        regards, tom lane

Re: UPDATE using sub selects

From

NikhilS

Date:

16 March 2007, 11:55:43

Hi,

On 3/16/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

NikhilS <nikkhils@gmail.com> writes:
> To allow both of the above to hold, I think the subselect will have to be
> treated like a EXPR_SUBLINK subquery. I was wondering if we have a similar
> mechanism for plain selects/subselects to check and restrict their output to
> a single row.

No. Offhand I think you'd either need to relax EXPR_SUBLINK to allow
multiple output columns, or invent a ROW_SUBLINK SubLinkType that is
just like EXPR_SUBLINK except for allowing multiple output columns.
The latter would probably be less likely to break other things...

Yeah, was looking at EXPR_SUBLINK and its single column use case and drove to the same conclusion that inventing a new sublink type would be better too. It is indeed becoming a "not so simple and narrow fix" as you had mentioned earlier in your first response :)

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com

Re: UPDATE using sub selects

From

NikhilS

Date:

30 March 2007, 08:26:38

Hi,

No. Offhand I think you'd either need to relax EXPR_SUBLINK to allow
multiple output columns, or invent a ROW_SUBLINK SubLinkType that is
just like EXPR_SUBLINK except for allowing multiple output columns.
The latter would probably be less likely to break other things...

Yeah, was looking at EXPR_SUBLINK and its single column use case and drove to the same conclusion that inventing a new sublink type would be better too. It is indeed becoming a "not so simple and narrow fix" as you had mentioned earlier in your first response :)

I have invented a ROWEXPR_SUBLINK type that handles multiple output columns. The trouble is that since eventually columns of the parents have to be part of the query's targetList, I am sending the entire subquery as one of the entries in that list and the targetList gets populated with entries dependent on the subquery much later via make_subplan.

This breaks code in rewriteTargetList (which expects every list entry to be of type TargetEntry), and expand_targetlist (which expects the targets to be present in attrno order, the entries added because of the subquery will not be in order as compared to normal "SET colname = expr" targets).

Is there a simpler way of doing things? Should I try generating a resjunk TargetEntry in transformUpdateStmt and have its expr point to the subquery and see if that works?

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com

Re: UPDATE using sub selects

From

Tom Lane

Date:

30 March 2007, 17:46:23

NikhilS <nikkhils@gmail.com> writes:
> I have invented a ROWEXPR_SUBLINK type that handles multiple output columns.
> The trouble is that since eventually columns of the parents have to be part
> of the query's targetList, I am sending the entire subquery as one of the
> entries in that list and the targetList gets populated with entries
> dependent on the subquery much later via make_subplan.

> This breaks code in rewriteTargetList (which expects every list entry to be
> of type TargetEntry), and expand_targetlist (which expects the targets to be
> present in attrno order, the entries added because of the subquery will not
> be in order as compared to normal "SET colname = expr" targets).

Hmm.  That sounds like it would be a horrid mess.  You need to decouple
the execution of the subplan from the use of its outputs, apparently.
There is some precedent for this in the way that InitPlans are handled:
the result of the subplan is stored into a ParamList array entry that's
later referenced by a Param node in the parent's expression tree.  That
would generalize easily enough to setting more than one Param, but I'm
not clear on where you'd want to stick the subplan itself in the plan
tree, nor on what controls how often it needs to get evaluated.

An alternative approach is to put the subplan into the rangetable and
use Vars to reference its outputs.  Again it's not quite clear what
drives re-execution of the subplan.  It strikes me though that an
approach like this might also serve for SQL2003's LATERAL construct,
which'd be a nice thing to support.
        regards, tom lane

Re: UPDATE using sub selects

From

NikhilS

Date:

31 March 2007, 02:15:03

Hi,

On 3/31/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

NikhilS <nikkhils@gmail.com> writes:
> I have invented a ROWEXPR_SUBLINK type that handles multiple output columns.
> The trouble is that since eventually columns of the parents have to be part
> of the query's targetList, I am sending the entire subquery as one of the
> entries in that list and the targetList gets populated with entries
> dependent on the subquery much later via make_subplan.

> This breaks code in rewriteTargetList (which expects every list entry to be
> of type TargetEntry), and expand_targetlist (which expects the targets to be
> present in attrno order, the entries added because of the subquery will not
> be in order as compared to normal "SET colname = expr" targets).

Hmm. That sounds like it would be a horrid mess. You need to decouple
the execution of the subplan from the use of its outputs, apparently.
There is some precedent for this in the way that InitPlans are handled:
the result of the subplan is stored into a ParamList array entry that's
later referenced by a Param node in the parent's expression tree. That
would generalize easily enough to setting more than one Param, but I'm
not clear on where you'd want to stick the subplan itself in the plan
tree, nor on what controls how often it needs to get evaluated.

Yes, I have tried this already. As you suspect, it seems that the subplan does not get evaluated if its not part of the targetList at all.

An alternative approach is to put the subplan into the rangetable and
use Vars to reference its outputs. Again it's not quite clear what
drives re-execution of the subplan. It strikes me though that an
approach like this might also serve for SQL2003's LATERAL construct,
which'd be a nice thing to support.

Ok, I will try this out.

Regards,

Nikhils

--
EnterpriseDB http://www.enterprisedb.com

Re: UPDATE using sub selects

From

NikhilS

Date:

11 April 2007, 06:04:52

Hi,

Hmm. That sounds like it would be a horrid mess. You need to decouple
the execution of the subplan from the use of its outputs, apparently.
There is some precedent for this in the way that InitPlans are handled:
the result of the subplan is stored into a ParamList array entry that's
later referenced by a Param node in the parent's expression tree. That
would generalize easily enough to setting more than one Param, but I'm
not clear on where you'd want to stick the subplan itself in the plan
tree, nor on what controls how often it needs to get evaluated.

Ended up using something similar to the above suggestion. I have posted the patch to -patches based on this.

An important concern was where to stick the evaluation of the subqueries so that they end up becoming subplans which are used in the execution. For this I have added a new field in the "Query" structure. This entry gets preprocessed similar to other fields of the Query from within subquery_planner.

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com