Thread: UPDATE using sub selects
Hi, <br /><br />I have coded up a patch which solves the following TODO. I will submit a patch for this soon:<br /><br /><ul><li>UPDATE<ul><li>Allow UPDATE tab SET ROW (col, ...) = (SELECT...) <p><a href="http://archives.postgresql.org/pgsql-hackers/2006-07/msg01306.php">http://archives.postgresql.org/pgsql-hackers/2006-07/msg01306.php</a></ul></ul><br />Thequestion is that since this enhances the UPDATE syntax, what changes and where all they need to be made with respectto the documentation?<br clear="all" /><br />Regards,<br />Nikhils<br />-- <br />EnterpriseDB <a href="http://www.enterprisedb.com">http://www.enterprisedb.com</a>
NikhilS <nikkhils@gmail.com> writes: > I have coded up a patch which solves the following TODO. I will submit a > patch for this soon: > http://archives.postgresql.org/pgsql-hackers/2006-07/msg01306.php Cool... > The question is that since this enhances the UPDATE syntax, what changes and > where all they need to be made with respect to the documentation? Documentation is the very least of your worries. What exactly is your implementation plan? If this were a simple or narrow fix it would have been done already. regards, tom lane
Hi,
The implementation that I have planned is pretty similar to the way "INSERT INTO ... SELECT" has been implemented.
Along with the grammar changes in gram.y, the changes are localized in the transformUpdateStmt code path. The SELECT clause ends up becoming a subquery to the update query with the target column expressions transformed properly to include the subquery expressions. Does this sound ok?
I have tried some update-subselect variations and they seem to work. For example the case in the src/test/regress/sql/update.sql, which used to fail till now, seems to work:
UPDATE update_test SET (a,b) = (select a,b FROM update_test where c = 'foo')
WHERE a = 10;
Will try testing out some other variations too.
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
> The question is that since this enhances the UPDATE syntax, what changes and
> where all they need to be made with respect to the documentation?
Documentation is the very least of your worries. What exactly is your
implementation plan? If this were a simple or narrow fix it would
have been done already.
The implementation that I have planned is pretty similar to the way "INSERT INTO ... SELECT" has been implemented.
Along with the grammar changes in gram.y, the changes are localized in the transformUpdateStmt code path. The SELECT clause ends up becoming a subquery to the update query with the target column expressions transformed properly to include the subquery expressions. Does this sound ok?
I have tried some update-subselect variations and they seem to work. For example the case in the src/test/regress/sql/update.sql, which used to fail till now, seems to work:
UPDATE update_test SET (a,b) = (select a,b FROM update_test where c = 'foo')
WHERE a = 10;
Will try testing out some other variations too.
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 NikhilS wrote: > I have tried some update-subselect variations and they seem to work. For > example the case in the src/test/regress/sql/update.sql, which used to > fail till now, seems to work: > > UPDATE update_test SET (a,b) = (select a,b FROM update_test where c = > 'foo') > WHERE a = 10; What's the expected result if the tuple from subselect is more than 1? I expect no update at all in case of void result set, is this the case ? Regards Gaetano Mendola -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFF+Vx47UpzwH2SGd4RAvyVAKCGK5pC9B/lmrNjAFPGXhm5ialwSwCglM2n DxrxWyvJASX5WSF9B8cAMas= =AoVF -----END PGP SIGNATURE-----
Gaetano Mendola <mendola@bigfoot.com> writes: > NikhilS wrote: >> I have tried some update-subselect variations and they seem to work. For >> example the case in the src/test/regress/sql/update.sql, which used to >> fail till now, seems to work: >> >> UPDATE update_test SET (a,b) = (select a,b FROM update_test where c = >> 'foo') >> WHERE a = 10; > What's the expected result if the tuple from subselect is more than 1? Error, per SQL99 section 7.14: 1) If the cardinality of a <row subquery> is greater than 1 (one), then an exception condition is raised:cardinality violation. > I expect no update at all in case of void result set, is this the case ? No, you get nulls; it's a subquery not a join. Per SQL99 7.1: c) If the <row value constructor> is a <row subquery>, then: i) Let R be the result of the <row subquery> and let D be the degree of R. ii) If the cardinality of R is 0 (zero), then the result of the <row value constructor> is D nullvalues. iii) If the cardinality of R is 1 (one), then the result of the <row value constructor> is R. regards, tom lane
Hi,
To allow both of the above to hold, I think the subselect will have to be treated like a EXPR_SUBLINK subquery. I was wondering if we have a similar mechanism for plain selects/subselects to check and restrict their output to a single row.
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
> What's the expected result if the tuple from subselect is more than 1?
Error, per SQL99 section 7.14:
1) If the cardinality of a <row subquery> is greater than 1 (one),
then an exception condition is raised: cardinality violation.
> I expect no update at all in case of void result set, is this the case ?
No, you get nulls; it's a subquery not a join. Per SQL99 7.1:
c) If the <row value constructor> is a <row subquery>, then:
i) Let R be the result of the <row subquery> and let D be the
degree of R.
ii) If the cardinality of R is 0 (zero), then the result of the
<row value constructor> is D null values.
iii) If the cardinality of R is 1 (one), then the result of the
<row value constructor> is R.
regards, tom lane
To allow both of the above to hold, I think the subselect will have to be treated like a EXPR_SUBLINK subquery. I was wondering if we have a similar mechanism for plain selects/subselects to check and restrict their output to a single row.
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
NikhilS <nikkhils@gmail.com> writes: > To allow both of the above to hold, I think the subselect will have to be > treated like a EXPR_SUBLINK subquery. I was wondering if we have a similar > mechanism for plain selects/subselects to check and restrict their output to > a single row. No. Offhand I think you'd either need to relax EXPR_SUBLINK to allow multiple output columns, or invent a ROW_SUBLINK SubLinkType that is just like EXPR_SUBLINK except for allowing multiple output columns. The latter would probably be less likely to break other things... regards, tom lane
Hi,
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
On 3/16/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Yeah, was looking at EXPR_SUBLINK and its single column use case and drove to the same conclusion that inventing a new sublink type would be better too. It is indeed becoming a "not so simple and narrow fix" as you had mentioned earlier in your first response :) NikhilS <nikkhils@gmail.com> writes:
> To allow both of the above to hold, I think the subselect will have to be
> treated like a EXPR_SUBLINK subquery. I was wondering if we have a similar
> mechanism for plain selects/subselects to check and restrict their output to
> a single row.
No. Offhand I think you'd either need to relax EXPR_SUBLINK to allow
multiple output columns, or invent a ROW_SUBLINK SubLinkType that is
just like EXPR_SUBLINK except for allowing multiple output columns.
The latter would probably be less likely to break other things...
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
Hi,
I have invented a ROWEXPR_SUBLINK type that handles multiple output columns. The trouble is that since eventually columns of the parents have to be part of the query's targetList, I am sending the entire subquery as one of the entries in that list and the targetList gets populated with entries dependent on the subquery much later via make_subplan.
This breaks code in rewriteTargetList (which expects every list entry to be of type TargetEntry), and expand_targetlist (which expects the targets to be present in attrno order, the entries added because of the subquery will not be in order as compared to normal "SET colname = expr" targets).
Is there a simpler way of doing things? Should I try generating a resjunk TargetEntry in transformUpdateStmt and have its expr point to the subquery and see if that works?
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
No. Offhand I think you'd either need to relax EXPR_SUBLINK to allow
multiple output columns, or invent a ROW_SUBLINK SubLinkType that is
just like EXPR_SUBLINK except for allowing multiple output columns.
The latter would probably be less likely to break other things...
Yeah, was looking at EXPR_SUBLINK and its single column use case and drove to the same conclusion that inventing a new sublink type would be better too. It is indeed becoming a "not so simple and narrow fix" as you had mentioned earlier in your first response :)
I have invented a ROWEXPR_SUBLINK type that handles multiple output columns. The trouble is that since eventually columns of the parents have to be part of the query's targetList, I am sending the entire subquery as one of the entries in that list and the targetList gets populated with entries dependent on the subquery much later via make_subplan.
This breaks code in rewriteTargetList (which expects every list entry to be of type TargetEntry), and expand_targetlist (which expects the targets to be present in attrno order, the entries added because of the subquery will not be in order as compared to normal "SET colname = expr" targets).
Is there a simpler way of doing things? Should I try generating a resjunk TargetEntry in transformUpdateStmt and have its expr point to the subquery and see if that works?
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
NikhilS <nikkhils@gmail.com> writes: > I have invented a ROWEXPR_SUBLINK type that handles multiple output columns. > The trouble is that since eventually columns of the parents have to be part > of the query's targetList, I am sending the entire subquery as one of the > entries in that list and the targetList gets populated with entries > dependent on the subquery much later via make_subplan. > This breaks code in rewriteTargetList (which expects every list entry to be > of type TargetEntry), and expand_targetlist (which expects the targets to be > present in attrno order, the entries added because of the subquery will not > be in order as compared to normal "SET colname = expr" targets). Hmm. That sounds like it would be a horrid mess. You need to decouple the execution of the subplan from the use of its outputs, apparently. There is some precedent for this in the way that InitPlans are handled: the result of the subplan is stored into a ParamList array entry that's later referenced by a Param node in the parent's expression tree. That would generalize easily enough to setting more than one Param, but I'm not clear on where you'd want to stick the subplan itself in the plan tree, nor on what controls how often it needs to get evaluated. An alternative approach is to put the subplan into the rangetable and use Vars to reference its outputs. Again it's not quite clear what drives re-execution of the subplan. It strikes me though that an approach like this might also serve for SQL2003's LATERAL construct, which'd be a nice thing to support. regards, tom lane
Hi,
--
EnterpriseDB http://www.enterprisedb.com
On 3/31/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
NikhilS <nikkhils@gmail.com> writes:
> I have invented a ROWEXPR_SUBLINK type that handles multiple output columns.
> The trouble is that since eventually columns of the parents have to be part
> of the query's targetList, I am sending the entire subquery as one of the
> entries in that list and the targetList gets populated with entries
> dependent on the subquery much later via make_subplan.
> This breaks code in rewriteTargetList (which expects every list entry to be
> of type TargetEntry), and expand_targetlist (which expects the targets to be
> present in attrno order, the entries added because of the subquery will not
> be in order as compared to normal "SET colname = expr" targets).
Hmm. That sounds like it would be a horrid mess. You need to decouple
the execution of the subplan from the use of its outputs, apparently.
There is some precedent for this in the way that InitPlans are handled:
the result of the subplan is stored into a ParamList array entry that's
later referenced by a Param node in the parent's expression tree. That
would generalize easily enough to setting more than one Param, but I'm
not clear on where you'd want to stick the subplan itself in the plan
tree, nor on what controls how often it needs to get evaluated.
Yes, I have tried this already. As you suspect, it seems that the subplan does not get evaluated if its not part of the targetList at all.
An alternative approach is to put the subplan into the rangetable and
use Vars to reference its outputs. Again it's not quite clear what
drives re-execution of the subplan. It strikes me though that an
approach like this might also serve for SQL2003's LATERAL construct,
which'd be a nice thing to support.
Ok, I will try this out.
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
Hi,
Ended up using something similar to the above suggestion. I have posted the patch to -patches based on this.
An important concern was where to stick the evaluation of the subqueries so that they end up becoming subplans which are used in the execution. For this I have added a new field in the "Query" structure. This entry gets preprocessed similar to other fields of the Query from within subquery_planner.
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
Hmm. That sounds like it would be a horrid mess. You need to decouple
the execution of the subplan from the use of its outputs, apparently.
There is some precedent for this in the way that InitPlans are handled:
the result of the subplan is stored into a ParamList array entry that's
later referenced by a Param node in the parent's expression tree. That
would generalize easily enough to setting more than one Param, but I'm
not clear on where you'd want to stick the subplan itself in the plan
tree, nor on what controls how often it needs to get evaluated.
Ended up using something similar to the above suggestion. I have posted the patch to -patches based on this.
An important concern was where to stick the evaluation of the subqueries so that they end up becoming subplans which are used in the execution. For this I have added a new field in the "Query" structure. This entry gets preprocessed similar to other fields of the Query from within subquery_planner.
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com