Thread: MERGE and parsing with prepared statements
We've used INSERT ON CONFLICT for a few years (with partitions as the target). That's also combined with prepared statements, for bulk loading. I was looking to see if we should use MERGE (probably not, but looking anyway). And came across this behavior. I'm not sure if it's any issue. CREATE TABLE CustomerAccount (CustomerId int, Balance float); PREPARE p AS MERGE INTO CustomerAccount CA USING (SELECT $1 AS CustomerId, $2 AS TransactionValue) AS T ON CA.CustomerId = T.CustomerId WHEN NOT MATCHED THEN INSERT (CustomerId, Balance) VALUES (T.CustomerId, T.TransactionValue) WHEN MATCHED THEN UPDATE SET Balance = Balance + TransactionValue; ERROR: operator does not exist: integer = text LINE 3: ON CA.CustomerId = T.CustomerId postgres: pryzbyj postgres [local] PREPARE(+0x2337be) [0x56108322e7be] postgres: pryzbyj postgres [local] PREPARE(oper+0x198) [0x56108322f1fb] postgres: pryzbyj postgres [local] PREPARE(make_op+0x7e) [0x56108322f55a] postgres: pryzbyj postgres [local] PREPARE(+0x228f2b) [0x561083223f2b] postgres: pryzbyj postgres [local] PREPARE(+0x227aa9) [0x561083222aa9] postgres: pryzbyj postgres [local] PREPARE(transformExpr+0x1c) [0x5610832227f9] postgres: pryzbyj postgres [local] PREPARE(transformMergeStmt+0x339) [0x56108322d988] postgres: pryzbyj postgres [local] PREPARE(transformStmt+0x70) [0x5610831f4071] postgres: pryzbyj postgres [local] PREPARE(+0x1fa350) [0x5610831f5350] postgres: pryzbyj postgres [local] PREPARE(transformTopLevelStmt+0x11) [0x5610831f5385] postgres: pryzbyj postgres [local] PREPARE(parse_analyze_varparams+0x5b) [0x5610831f54f4] postgres: pryzbyj postgres [local] PREPARE(pg_analyze_and_rewrite_varparams+0x38) [0x5610834bcdfe] postgres: pryzbyj postgres [local] PREPARE(PrepareQuery+0xcc) [0x561083292155] postgres: pryzbyj postgres [local] PREPARE(standard_ProcessUtility+0x4ea) [0x5610834c31a0] Why is $1 construed to be of type text ?
On Thu, 14 Jul 2022, 18:26 Justin Pryzby, <pryzby@telsasoft.com> wrote: > > We've used INSERT ON CONFLICT for a few years (with partitions as the target). > That's also combined with prepared statements, for bulk loading. > > I was looking to see if we should use MERGE (probably not, but looking anyway). > And came across this behavior. I'm not sure if it's any issue. > > CREATE TABLE CustomerAccount (CustomerId int, Balance float); > > PREPARE p AS > MERGE INTO CustomerAccount CA > USING (SELECT $1 AS CustomerId, $2 AS TransactionValue) AS T > ON CA.CustomerId = T.CustomerId > WHEN NOT MATCHED THEN > INSERT (CustomerId, Balance) > VALUES (T.CustomerId, T.TransactionValue) > WHEN MATCHED THEN > UPDATE SET Balance = Balance + TransactionValue; > > ERROR: operator does not exist: integer = text > LINE 3: ON CA.CustomerId = T.CustomerId > > Why is $1 construed to be of type text ? The select statement that generates the row type of T `(select $1 CID, $2 TxV) AS T` does not put type bounds on the input parameters, so it remains `unknown` for the scope of that subselect. Once stored into the row type, the type of that column defaults to text, as a row type should not have 'unknown'-typed columns. You'll see the same issue with other subselects that select input parameters without casts, such as `select a from (select $1 a) A where A.a = 1;`. It's a pre-existing issue that has popped up earlier, and I think it's not something we've planned to fix in backbranches. Kind regards, Matthias van de Meent
On Fri, Jul 15, 2022 at 11:25:35AM +0200, Matthias van de Meent wrote: > On Thu, 14 Jul 2022, 18:26 Justin Pryzby, <pryzby@telsasoft.com> wrote: > > > > Why is $1 construed to be of type text ? > > The select statement that generates the row type of T `(select $1 CID, > $2 TxV) AS T` does not put type bounds on the input parameters, so it > remains `unknown` for the scope of that subselect. Once stored into > the row type, the type of that column defaults to text, as a row type > should not have 'unknown'-typed columns. Thanks for looking into it. I see now that the same thing can happen with "ON CONFLICT" if used with a subselect. PREPARE p AS INSERT INTO t SELECT a FROM (SELECT $1 AS a)a ON CONFLICT (i) DO UPDATE SET i=excluded.i; ERROR: column "i" is of type integer but expression is of type text It seems a bit odd that it's impossible to use merge with prepared statements without specifically casting the source types (which I did now to continue my experiment). -- Justin
On 2022-Jul-15, Justin Pryzby wrote: > On Fri, Jul 15, 2022 at 11:25:35AM +0200, Matthias van de Meent wrote: > Thanks for looking into it. Definitely! Thanks, Matthias. > I see now that the same thing can happen with "ON CONFLICT" if used with a > subselect. > > PREPARE p AS INSERT INTO t SELECT a FROM (SELECT $1 AS a)a > ON CONFLICT (i) DO UPDATE SET i=excluded.i; > ERROR: column "i" is of type integer but expression is of type text Right, I didn't think that MERGE was doing anything peculiar in this respect. > It seems a bit odd that it's impossible to use merge with prepared statements > without specifically casting the source types (which I did now to continue my > experiment). I have no comments on this. Maybe it can be improved, but I don't know how. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2022-Jul-15, Justin Pryzby wrote: >> I see now that the same thing can happen with "ON CONFLICT" if used with a >> subselect. >> >> PREPARE p AS INSERT INTO t SELECT a FROM (SELECT $1 AS a)a >> ON CONFLICT (i) DO UPDATE SET i=excluded.i; >> ERROR: column "i" is of type integer but expression is of type text > Right, I didn't think that MERGE was doing anything peculiar in this > respect. Yeah. The current theory about this is that if we haven't assigned a type to an unknown-type parameter (or literal) that is an output column of a sub-SELECT, we will as a rule force it to text. That MERGE USING clause is a sub-SELECT, so that rule applies. There is a hoary old exception to that rule, which is that if you write INSERT INTO tab SELECT ..., $1, ... we will figure out the type of the column of "tab" that $1 is going into, and force $1 to that type instead of text. It looks like this also works in INSERT ... VALUES. You could make a case that MERGE should be equally smart, but it's not clear to me that the info is available sufficiently close by to make it reasonable to do that. It looks like the MERGE syntax has a couple of levels of indirection, which'd probably be enough to put the kibosh on that idea -- in particular, AFAICS there might not be a unique target column corresponding to a given data_source column. regards, tom lane
On Fri, Jul 15, 2022 at 11:40 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Jul-15, Justin Pryzby wrote:
> It seems a bit odd that it's impossible to use merge with prepared statements
> without specifically casting the source types (which I did now to continue my
> experiment).
I have no comments on this. Maybe it can be improved, but I don't know
how.
Not tested, but the example prepare command fails to make use of the optional data types specification. Using that should remove the need to cast the parameter placeholder within the query itself.
That said, in theory the INSERT specification of the MERGE could be used to either resolve unknowns or even forcibly convert the data types of the relation produced by the USING clause to match the actual types required for the INSERT (since that will happen at insert time anyway). This would make the UPDATE behavior slightly different than a top-level UPDATE command though, since that does not have the same context information.
David J.
On Fri, Jul 15, 2022 at 12:17:51PM -0700, David G. Johnston wrote: > On Fri, Jul 15, 2022 at 11:40 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2022-Jul-15, Justin Pryzby wrote: > > > > > It seems a bit odd that it's impossible to use merge with prepared statements > > > without specifically casting the source types (which I did now to continue my > > > experiment). > > > > I have no comments on this. Maybe it can be improved, but I don't know > > how. > > Not tested, but the example prepare command fails to make use of the > optional data types specification. Using that should remove the need to > cast the parameter placeholder within the query itself. What optional data type specification ? > That said, in theory the INSERT specification of the MERGE could be used to > either resolve unknowns or even forcibly convert the data types of the > relation produced by the USING clause to match the actual types required > for the INSERT (since that will happen at insert time anyway). Yeah. I hadn't looked before, but just noticed this: https://www.postgresql.org/docs/devel/sql-merge.html | If the expression for any column is not of the correct data type, automatic type conversion will be attempted. That appears to be copied from the INSERT page. What does that mean, if not that data types will be resolved as needed ? Note that if I add casts to the "ON" condition, MERGE complains about the INSERT VALUES. PREPARE p AS MERGE INTO CustomerAccount CA USING (SELECT $1 AS CustomerId, $2 AS TransactionValue) AS T ON CA.CustomerId = T.CustomerId::int WHEN NOT MATCHED THEN INSERT (CustomerId, Balance) VALUES (T.CustomerId, T.TransactionValue) WHEN MATCHED THEN UPDATE SET Balance = Balance + TransactionValue; ERROR: column "customerid" is of type integer but expression is of type text LINE 7: VALUES (T.CustomerId, T.TransactionValue) postgres: pryzbyj postgres [local] PREPARE(transformAssignedExpr+0x3b6) [0x5605f9699e8e] postgres: pryzbyj postgres [local] PREPARE(transformInsertRow+0x2ce) [0x5605f9653a47] postgres: pryzbyj postgres [local] PREPARE(transformMergeStmt+0x7ec) [0x5605f968fe3b] postgres: pryzbyj postgres [local] PREPARE(transformStmt+0x70) [0x5605f9656071] postgres: pryzbyj postgres [local] PREPARE(+0x1fa350) [0x5605f9657350] postgres: pryzbyj postgres [local] PREPARE(transformTopLevelStmt+0x11) [0x5605f9657385] postgres: pryzbyj postgres [local] PREPARE(parse_analyze_varparams+0x5b) [0x5605f96574f4] postgres: pryzbyj postgres [local] PREPARE(pg_analyze_and_rewrite_varparams+0x38) [0x5605f991edfe] postgres: pryzbyj postgres [local] PREPARE(PrepareQuery+0xcc) [0x5605f96f4155] postgres: pryzbyj postgres [local] PREPARE(standard_ProcessUtility+0x4ea) [0x5605f99251a0] postgres: pryzbyj postgres [local] PREPARE(ProcessUtility+0xdb) [0x5605f992587e] -- Justin
On Fri, Jul 15, 2022 at 12:40 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Fri, Jul 15, 2022 at 12:17:51PM -0700, David G. Johnston wrote:
> On Fri, Jul 15, 2022 at 11:40 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > On 2022-Jul-15, Justin Pryzby wrote:
> >
> > > It seems a bit odd that it's impossible to use merge with prepared statements
> > > without specifically casting the source types (which I did now to continue my
> > > experiment).
> >
> > I have no comments on this. Maybe it can be improved, but I don't know
> > how.
>
> Not tested, but the example prepare command fails to make use of the
> optional data types specification. Using that should remove the need to
> cast the parameter placeholder within the query itself.
What optional data type specification ?
The one documented here:
PREPARE name [ ( data_type [, ...] ) ] AS statement
David J.
On Fri, Jul 15, 2022 at 12:40 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
That appears to be copied from the INSERT page.
What does that mean, if not that data types will be resolved as needed ?
Yep, and the system needs to resolve the type at a point where there is no contextual information and so it chooses text.
Note that if I add casts to the "ON" condition, MERGE complains about the
INSERT VALUES.
PREPARE p AS
MERGE INTO CustomerAccount CA
USING (SELECT $1 AS CustomerId, $2 AS TransactionValue) AS T
ON CA.CustomerId = T.CustomerId::int
WHEN NOT MATCHED THEN
INSERT (CustomerId, Balance)
VALUES (T.CustomerId, T.TransactionValue)
WHEN MATCHED THEN
UPDATE SET Balance = Balance + TransactionValue;
ERROR: column "customerid" is of type integer but expression is of type text
LINE 7: VALUES (T.CustomerId, T.TransactionValue)
Noted. Not surprised. That error was always present, it's just that the join happens first. Since your fix narrowly targeted the join this error remained to be discovered.
David J.
On Fri, Jul 15, 2022 at 12:59:34PM -0700, David G. Johnston wrote: > On Fri, Jul 15, 2022 at 12:40 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > | If the expression for any column is not of the correct data type, automatic type conversion will be attempted. > > That appears to be copied from the INSERT page. > > What does that mean, if not that data types will be resolved as needed ? > > Yep, and the system needs to resolve the type at a point where there is no > contextual information and so it chooses text. I don't know if that's a viable interpretation. The parser "resolved the type" by assuming a default, which caused an error before finishing parsing. In the case of INSERT, "conversion will be attempted" means it looks for a cast from the source type to the target type, and its "automatic type conversion" will fail if (for example) you try to insert a timestamp into an int. In the case of MERGE, that same sentence evidently means that it assumes a default source type (rather than looking for a cast) and then fails if it doesn't exactly match the target type. Should that sentence be removed from MERGE ? -- Justin
On Fri, Jul 15, 2022 at 03:43:41PM -0500, Justin Pryzby wrote: > Should that sentence be removed from MERGE ? Also, I think these examples should be more similar. doc/src/sgml/ref/merge.sgml > <programlisting> > MERGE INTO CustomerAccount CA > USING RecentTransactions T > ON T.CustomerId = CA.CustomerId > WHEN MATCHED THEN > UPDATE SET Balance = Balance + TransactionValue > WHEN NOT MATCHED THEN > INSERT (CustomerId, Balance) > VALUES (T.CustomerId, T.TransactionValue); > </programlisting> > </para> > > <para> > Notice that this would be exactly equivalent to the following > statement because the <literal>MATCHED</literal> result does not change > during execution. > > <programlisting> > MERGE INTO CustomerAccount CA > USING (Select CustomerId, TransactionValue From RecentTransactions) AS T > ON CA.CustomerId = T.CustomerId > WHEN NOT MATCHED THEN > INSERT (CustomerId, Balance) > VALUES (T.CustomerId, T.TransactionValue) > WHEN MATCHED THEN > UPDATE SET Balance = Balance + TransactionValue; > </programlisting> > </para> The "ON" lines can be the same. The "MATCHED" can be in the same order. -- Justin
On Fri, 12 Aug 2022 at 12:20, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > On 2022-Jul-18, Justin Pryzby wrote: > > > On Fri, Jul 15, 2022 at 03:43:41PM -0500, Justin Pryzby wrote: > > > Should that sentence be removed from MERGE ? > > > > Also, I think these examples should be more similar. > > Agreed, done. Sorry, but I disagree with this chunk in the latest commit, specifically, changing the MATCHED from after to before the NOT MATCHED clause. The whole point of the second example was to demonstrate that the order of the MATCHED/NOT MATCHED clauses made no difference. By changing the examples so they are the same, the sentence at line 573 now makes no sense. -- Simon Riggs http://www.EnterpriseDB.com/
On 2022-Aug-12, Simon Riggs wrote: > Sorry, but I disagree with this chunk in the latest commit, > specifically, changing the MATCHED from after to before the NOT > MATCHED clause. > > The whole point of the second example was to demonstrate that the > order of the MATCHED/NOT MATCHED clauses made no difference. > > By changing the examples so they are the same, the sentence at line > 573 now makes no sense. Hmm, I thought the point of the example was to show that you can replace the table in the USING clause with a query that retrieves the column; but you're right, we lost the thing there. Maybe it was too subtle to the point that I failed to understand it. Perhaps we can put it back the way it was and explain these two differences (change of data source *and* clause ordering) more explicitly. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "La virtud es el justo medio entre dos defectos" (Aristóteles)
On Fri, Aug 12, 2022 at 01:53:25PM +0200, Alvaro Herrera wrote: > On 2022-Aug-12, Simon Riggs wrote: > > > Sorry, but I disagree with this chunk in the latest commit, > > specifically, changing the MATCHED from after to before the NOT > > MATCHED clause. 3d895bc84 also moved a semicolon into the middle of the sql statement. > > The whole point of the second example was to demonstrate that the > > order of the MATCHED/NOT MATCHED clauses made no difference. > > > > By changing the examples so they are the same, the sentence at line > > 573 now makes no sense. > > Hmm, I thought the point of the example was to show that you can replace > the table in the USING clause with a query that retrieves the column; > but you're right, we lost the thing there. Maybe it was too subtle to > the point that I failed to understand it. Perhaps we can put it back > the way it was and explain these two differences (change of data source > *and* clause ordering) more explicitly. Evidently I misunderstood it, too. -- Justin
On 2022-Aug-12, Simon Riggs wrote: > Sorry, but I disagree with this chunk in the latest commit, > specifically, changing the MATCHED from after to before the NOT > MATCHED clause. > > The whole point of the second example was to demonstrate that the > order of the MATCHED/NOT MATCHED clauses made no difference. > > By changing the examples so they are the same, the sentence at line > 573 now makes no sense. Not sure how to fix this. We could add put the WHEN clauses in the order they were, and add another phrase to that paragraph to explain more explicitly that the order is not relevant (while at the same time explaining that if you have multiple WHEN MATCHED clauses, the relative order of those *is* relevant). -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "At least to kernel hackers, who really are human, despite occasional rumors to the contrary" (LWN.net)