Thread: Alias collision in `refresh materialized view concurrently`
Hello, we had a Customer-Report in which `refresh materialized view CONCURRENTLY` failed with: `ERROR: column reference "mv" is ambiguous` They're using `mv` as an alias for one column and this is causing a collision with an internal alias. They also made it reproducible like this: ``` create materialized view testmv as select 'asdas' mv; --ok create unique index on testmv (mv); --ok refresh materialized view testmv; --ok refresh materialized view CONCURRENTLY testmv; ---BAM! ``` ``` ERROR: column reference "mv" is ambiguous LINE 1: ...alog.=) mv.mv AND newdata OPERATOR(pg_catalog.*=) mv) WHERE ... ^ QUERY: CREATE TEMP TABLE pg_temp_4.pg_temp_218322_2 AS SELECT mv.ctid AS tid, newdata FROM public.testmv mv FULL JOIN pg_temp_4.pg_temp_218322 newdata ON (newdata.mv OPERATOR(pg_catalog.=) mv.mv AND newdata OPERATOR(pg_catalog.*=) mv) WHERE newdata IS NULL OR mv IS NULL ORDER BY tid ``` The corresponding Code is in `matview.c` in function `refresh_by_match_merge`. With adding a prefix like `_pg_internal_` we could make collisions pretty unlikely, without intrusive changes. The appended patch does this change for the aliases `mv`, `newdata` and `newdata2`. Kind regards, Mathis
Attachment
On Wed, May 19, 2021 at 5:33 PM Mathis Rudolf <mathis.rudolf@credativ.de> wrote: > > Hello, > > we had a Customer-Report in which `refresh materialized view > CONCURRENTLY` failed with: `ERROR: column reference "mv" is ambiguous` > > They're using `mv` as an alias for one column and this is causing a > collision with an internal alias. They also made it reproducible like this: > ``` > create materialized view testmv as select 'asdas' mv; --ok > create unique index on testmv (mv); --ok > refresh materialized view testmv; --ok > refresh materialized view CONCURRENTLY testmv; ---BAM! > ``` > > ``` > ERROR: column reference "mv" is ambiguous > LINE 1: ...alog.=) mv.mv AND newdata OPERATOR(pg_catalog.*=) mv) WHERE ... > ^ > QUERY: CREATE TEMP TABLE pg_temp_4.pg_temp_218322_2 AS SELECT mv.ctid > AS tid, newdata FROM public.testmv mv FULL JOIN pg_temp_4.pg_temp_218322 > newdata ON (newdata.mv OPERATOR(pg_catalog.=) mv.mv AND newdata > OPERATOR(pg_catalog.*=) mv) WHERE newdata IS NULL OR mv IS NULL ORDER BY tid > ``` > > The corresponding Code is in `matview.c` in function > `refresh_by_match_merge`. With adding a prefix like `_pg_internal_` we > could make collisions pretty unlikely, without intrusive changes. > > The appended patch does this change for the aliases `mv`, `newdata` and > `newdata2`. I think it's better to have some random name, see below. We could either use "OIDNewHeap" or "MyBackendId" to make those column names unique and almost unguessable. So, something like "pg_temp1_XXXX", "pg_temp2_XXXX" or "pg_temp3_XXXX" and so on would be better IMO. snprintf(NewHeapName, sizeof(NewHeapName), "pg_temp_%u", OIDOldHeap); snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d", MyBackendId); With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Am Mittwoch, dem 19.05.2021 um 18:06 +0530 schrieb Bharath Rupireddy: > > The corresponding Code is in `matview.c` in function > > `refresh_by_match_merge`. With adding a prefix like `_pg_internal_` > > we > > could make collisions pretty unlikely, without intrusive changes. > > > > The appended patch does this change for the aliases `mv`, `newdata` > > and > > `newdata2`. > > I think it's better to have some random name, see below. We could > either use "OIDNewHeap" or "MyBackendId" to make those column names > unique and almost unguessable. So, something like "pg_temp1_XXXX", > "pg_temp2_XXXX" or "pg_temp3_XXXX" and so on would be better IMO. > > snprintf(NewHeapName, sizeof(NewHeapName), "pg_temp_%u", > OIDOldHeap); > snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d", > MyBackendId); Hmm, it's an idea, but this can also lead to pretty random failures if you have an unlucky user who had the same idea in its generating query tool than the backend, no? Not sure if that's really better. With the current implementation of REFRESH MATERIALIZED VIEW CONCURRENTLY we always have the problem of possible collisions here, you'd never get out of this area without analyzing the whole query for such collisions. "mv" looks like a very common alias (i use it all over the time when testing or playing around with materialized views, so i'm wondering why i didn't see this issue already myself). So the risk here for such a collision looks very high. We can try to lower this risk by choosing an alias name, which is not so common. With a static alias however you get a static error condition, not something that fails here and then. -- Thanks, Bernd
On Thu, May 20, 2021 at 7:52 PM Bernd Helmle <mailings@oopsware.de> wrote: > > Am Mittwoch, dem 19.05.2021 um 18:06 +0530 schrieb Bharath Rupireddy: > > > The corresponding Code is in `matview.c` in function > > > `refresh_by_match_merge`. With adding a prefix like `_pg_internal_` > > > we > > > could make collisions pretty unlikely, without intrusive changes. > > > > > > The appended patch does this change for the aliases `mv`, `newdata` > > > and > > > `newdata2`. > > > > I think it's better to have some random name, see below. We could > > either use "OIDNewHeap" or "MyBackendId" to make those column names > > unique and almost unguessable. So, something like "pg_temp1_XXXX", > > "pg_temp2_XXXX" or "pg_temp3_XXXX" and so on would be better IMO. > > > > snprintf(NewHeapName, sizeof(NewHeapName), "pg_temp_%u", > > OIDOldHeap); > > snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d", > > MyBackendId); > > Hmm, it's an idea, but this can also lead to pretty random failures if > you have an unlucky user who had the same idea in its generating query > tool than the backend, no? Not sure if that's really better. > > With the current implementation of REFRESH MATERIALIZED VIEW > CONCURRENTLY we always have the problem of possible collisions here, > you'd never get out of this area without analyzing the whole query for > such collisions. > > "mv" looks like a very common alias (i use it all over the time when > testing or playing around with materialized views, so i'm wondering why > i didn't see this issue already myself). So the risk here for such a > collision looks very high. We can try to lower this risk by choosing an > alias name, which is not so common. With a static alias however you get > a static error condition, not something that fails here and then. Another idea is to use random() function to generate required number of uint32 random values(refresh_by_match_merge might need 3 values to replace newdata, newdata2 and mv) and use the names like pg_temp_rmv_<<rand_no1>>, pg_temp_rmv_<<rand_no2>> and so on. This would make the name unguessable. Note that we use this in choose_dsm_implementation, dsm_impl_posix. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
On Thu, May 20, 2021 at 09:14:45PM +0530, Bharath Rupireddy wrote: > On Thu, May 20, 2021 at 7:52 PM Bernd Helmle <mailings@oopsware.de> wrote: >> "mv" looks like a very common alias (i use it all over the time when >> testing or playing around with materialized views, so i'm wondering why >> i didn't see this issue already myself). So the risk here for such a >> collision looks very high. We can try to lower this risk by choosing an >> alias name, which is not so common. With a static alias however you get >> a static error condition, not something that fails here and then. > > Another idea is to use random() function to generate required number > of uint32 random values(refresh_by_match_merge might need 3 values to > replace newdata, newdata2 and mv) and use the names like > pg_temp_rmv_<<rand_no1>>, pg_temp_rmv_<<rand_no2>> and so on. This > would make the name unguessable. Note that we use this in > choose_dsm_implementation, dsm_impl_posix. I am not sure that I see the point of using a random() number here while the backend ID, or just the PID, would easily provide enough entropy for this internal alias. I agree that "mv" is a bad choice for this alias name. One thing that comes in mind here is to use an alias similar to what we do for dropped attributes, say ........pg.matview.%d........ where %d is the PID. This will very unlikely cause conflicts. -- Michael
Attachment
On Fri, May 21, 2021 at 6:08 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, May 20, 2021 at 09:14:45PM +0530, Bharath Rupireddy wrote: > > On Thu, May 20, 2021 at 7:52 PM Bernd Helmle <mailings@oopsware.de> wrote: > >> "mv" looks like a very common alias (i use it all over the time when > >> testing or playing around with materialized views, so i'm wondering why > >> i didn't see this issue already myself). So the risk here for such a > >> collision looks very high. We can try to lower this risk by choosing an > >> alias name, which is not so common. With a static alias however you get > >> a static error condition, not something that fails here and then. > > > > Another idea is to use random() function to generate required number > > of uint32 random values(refresh_by_match_merge might need 3 values to > > replace newdata, newdata2 and mv) and use the names like > > pg_temp_rmv_<<rand_no1>>, pg_temp_rmv_<<rand_no2>> and so on. This > > would make the name unguessable. Note that we use this in > > choose_dsm_implementation, dsm_impl_posix. > > I am not sure that I see the point of using a random() number here > while the backend ID, or just the PID, would easily provide enough > entropy for this internal alias. I agree that "mv" is a bad choice > for this alias name. One thing that comes in mind here is to use an > alias similar to what we do for dropped attributes, say > ........pg.matview.%d........ where %d is the PID. This will very > unlikely cause conflicts. I agree that backend ID and/or PID is enough. I'm not fully convinced with using random(). To make it more concrete, how about something like pg.matview.%d.%d (MyBackendId, MyProcPid)? If the user still sees some collisions, then IMHO, it's better to ensure that this kind of table/alias names are not generated outside of the server. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
On Fri, May 21, 2021 at 03:56:31PM +0530, Bharath Rupireddy wrote: > On Fri, May 21, 2021 at 6:08 AM Michael Paquier <michael@paquier.xyz> wrote: >> I am not sure that I see the point of using a random() number here >> while the backend ID, or just the PID, would easily provide enough >> entropy for this internal alias. I agree that "mv" is a bad choice >> for this alias name. One thing that comes in mind here is to use an >> alias similar to what we do for dropped attributes, say >> ........pg.matview.%d........ where %d is the PID. This will very >> unlikely cause conflicts. > > I agree that backend ID and/or PID is enough. I'm not fully convinced > with using random(). To make it more concrete, how about something > like pg.matview.%d.%d (MyBackendId, MyProcPid)? If the user still sees > some collisions, then IMHO, it's better to ensure that this kind of > table/alias names are not generated outside of the server. There is no need to have the PID if MyBackendId is enough, so after considering it I would just choose something like what I quoted above. Don't we need also to worry about the queries using newdata, newdata2 and diff as aliases? Would you like to implement a patch doing something like that? -- Michael
Attachment
On Tue, Jun 1, 2021 at 7:11 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, May 21, 2021 at 03:56:31PM +0530, Bharath Rupireddy wrote: > > On Fri, May 21, 2021 at 6:08 AM Michael Paquier <michael@paquier.xyz> wrote: > >> I am not sure that I see the point of using a random() number here > >> while the backend ID, or just the PID, would easily provide enough > >> entropy for this internal alias. I agree that "mv" is a bad choice > >> for this alias name. One thing that comes in mind here is to use an > >> alias similar to what we do for dropped attributes, say > >> ........pg.matview.%d........ where %d is the PID. This will very > >> unlikely cause conflicts. > > > > I agree that backend ID and/or PID is enough. I'm not fully convinced > > with using random(). To make it more concrete, how about something > > like pg.matview.%d.%d (MyBackendId, MyProcPid)? If the user still sees > > some collisions, then IMHO, it's better to ensure that this kind of > > table/alias names are not generated outside of the server. > > There is no need to have the PID if MyBackendId is enough, so after > considering it I would just choose something like what I quoted above. > Don't we need also to worry about the queries using newdata, newdata2 > and diff as aliases? Would you like to implement a patch doing > something like that? Sure. PSA v2 patch. We can't have "." as separator in the alias names, so I used "_" instead. I used MyProcPid which seems more random than MyBackendId (which is just a number like 1,2,3...). Even with this, someone could argue that they can look at the backend PID, use it in the materialized view names just to trick the server. I'm not sure if anyone would want to do this. I used the existing function make_temptable_name_n to prepare the alias names. The advantage of this is that the code looks cleaner, but it leaks memory, 1KB string for each call of the function. This is also true with the existing usage of the function. Now, we will have 5 make_temptable_name_n function calls leaking 5KB memory. And we also have quote_qualified_identifier leaking memory, 2 function calls, 2KB. So, in total, these two functions will leak 7KB of memory (with the patch). Shall I pfree the memory for all the strings returned by the functions make_temptable_name_n and quote_qualified_identifier? The problem is that pfree isn't cheaper. Or shall we leave it as is so that the memory will be freed up by the context? Note I have not added tests for this, as the code is covered by the existing tests. With Regards, Bharath Rupireddy.
Attachment
Am Dienstag, dem 01.06.2021 um 13:13 +0530 schrieb Bharath Rupireddy: > I used MyProcPid which seems more random than MyBackendId (which is > just a number like 1,2,3...). Even with this, someone could argue > that > they can look at the backend PID, use it in the materialized view > names just to trick the server. I'm not sure if anyone would want to > do this. > > A generated query likely uses just an incremented value derived from somewhere and in my opinion 1,2,3 makes it more likely that you get a chance for collisions if you managed to get the same alias prefix somehow. So +1 with the MyProcPid... > I used the existing function make_temptable_name_n to prepare the > alias names. The advantage of this is that the code looks cleaner, > but > it leaks memory, 1KB string for each call of the function. This is > also true with the existing usage of the function. Now, we will have > 5 > make_temptable_name_n function calls leaking 5KB memory. And we also > have quote_qualified_identifier leaking memory, 2 function calls, > 2KB. > So, in total, these two functions will leak 7KB of memory (with the > patch). > > Shall I pfree the memory for all the strings returned by the > functions > make_temptable_name_n and quote_qualified_identifier? The problem is > that pfree isn't cheaper. > Or shall we leave it as is so that the memory will be freed up by the > context? > afaics the memory context is deleted after execution immediately, so i'd assume it's okay. -- Thanks, Bernd
On Tue, Jun 1, 2021 at 5:24 PM Bernd Helmle <mailings@oopsware.de> wrote: > > Am Dienstag, dem 01.06.2021 um 13:13 +0530 schrieb Bharath Rupireddy: > > I used MyProcPid which seems more random than MyBackendId (which is > > just a number like 1,2,3...). Even with this, someone could argue > > that > > they can look at the backend PID, use it in the materialized view > > names just to trick the server. I'm not sure if anyone would want to > > do this. > > > > A generated query likely uses just an incremented value derived from > somewhere and in my opinion 1,2,3 makes it more likely that you get a > chance for collisions if you managed to get the same alias prefix > somehow. So +1 with the MyProcPid... Thanks. > > I used the existing function make_temptable_name_n to prepare the > > alias names. The advantage of this is that the code looks cleaner, > > but > > it leaks memory, 1KB string for each call of the function. This is > > also true with the existing usage of the function. Now, we will have > > 5 > > make_temptable_name_n function calls leaking 5KB memory. And we also > > have quote_qualified_identifier leaking memory, 2 function calls, > > 2KB. > > So, in total, these two functions will leak 7KB of memory (with the > > patch). > > > > Shall I pfree the memory for all the strings returned by the > > functions > > make_temptable_name_n and quote_qualified_identifier? The problem is > > that pfree isn't cheaper. > > Or shall we leave it as is so that the memory will be freed up by the > > context? > > > > afaics the memory context is deleted after execution immediately, so > i'd assume it's okay. Yes, the refresh operation happens in the "PortalContext", which gets destroyed at the end of the query in PortalDrop. PSA v3 patch. I added a commit message and made some cosmetic adjustments. With Regards, Bharath Rupireddy.
Attachment
On Wed, Jun 2, 2021 at 2:02 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > PSA v3 patch. I added a commit message and made some cosmetic adjustments. Reminds me of this fun topic in Lisp: https://en.wikipedia.org/wiki/Hygienic_macro#Strategies_used_in_languages_that_lack_hygienic_macros I wondered if we could find a way to make identifiers that regular queries are prohibited from using, at least by documentation. You could take advantage of the various constraints on unquoted identifiers in the standard (for example, something involving $), but it does seem a shame to remove the ability for users to put absolutely anything except NUL in quoted identifiers. I do wonder if at least using something like _$mv would be slightly more principled than pg_mv_1234, since nothing says pg_XXX is reserved (except in some very specific places like schema names), and the number on the end seems a bit cargo-cultish.
On Wed, Jun 02, 2021 at 12:30:55PM +1200, Thomas Munro wrote: > I wondered if we could find a way to make identifiers that regular > queries are prohibited from using, at least by documentation. You > could take advantage of the various constraints on unquoted > identifiers in the standard (for example, something involving $), but > it does seem a shame to remove the ability for users to put absolutely > anything except NUL in quoted identifiers. I do wonder if at least > using something like _$mv would be slightly more principled than > pg_mv_1234, since nothing says pg_XXX is reserved (except in some very > specific places like schema names), and the number on the end seems a > bit cargo-cultish. Yeah, using an underscore at the beginning of the name would have the advantage to mark the relation as an internal thing. + "(SELECT %s.tid FROM %s %s " + "WHERE %s.tid IS NOT NULL " + "AND %s.%s IS NULL)", Anyway, I have another problem with the patch: readability. It becomes really hard for one to guess to which object or alias portions of the internal queries refer to, especially with a total of five temporary names lying around. I think that you should drop the business with make_temptable_name_n(), and just append those extra underscores and uses of MyProcPid directly in the query string. The surroundings of quote_qualified_identifier() require two extra printf calls, but that does not sound bad to me compared to the long-term maintenance of those queries. -- Michael
Attachment
On Wed, Jun 2, 2021 at 6:33 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Jun 02, 2021 at 12:30:55PM +1200, Thomas Munro wrote: > > I wondered if we could find a way to make identifiers that regular > > queries are prohibited from using, at least by documentation. You > > could take advantage of the various constraints on unquoted > > identifiers in the standard (for example, something involving $), but > > it does seem a shame to remove the ability for users to put absolutely > > anything except NUL in quoted identifiers. I do wonder if at least > > using something like _$mv would be slightly more principled than > > pg_mv_1234, since nothing says pg_XXX is reserved (except in some very > > specific places like schema names), and the number on the end seems a > > bit cargo-cultish. > > Yeah, using an underscore at the beginning of the name would have the > advantage to mark the relation as an internal thing. > > + "(SELECT %s.tid FROM %s %s " > + "WHERE %s.tid IS NOT NULL " > + "AND %s.%s IS NULL)", > Anyway, I have another problem with the patch: readability. It > becomes really hard for one to guess to which object or alias portions > of the internal queries refer to, especially with a total of five > temporary names lying around. I think that you should drop the > business with make_temptable_name_n(), and just append those extra > underscores and uses of MyProcPid directly in the query string. The > surroundings of quote_qualified_identifier() require two extra printf > calls, but that does not sound bad to me compared to the long-term > maintenance of those queries. Thanks. PSA v4. With Regards, Bharath Rupireddy.
Attachment
On Wed, Jun 02, 2021 at 10:53:22AM +0530, Bharath Rupireddy wrote: > Thanks. PSA v4. Thanks for the new version. + MyProcPid, tempname, MyProcPid, MyProcPid, + tempname, MyProcPid, MyProcPid, MyProcPid, + MyProcPid, MyProcPid, MyProcPid); This style is still a bit heavy-ish. Perhaps we should just come back to Thomas's suggestion and just use a prefix with _$ for all that. -- Michael
Attachment
On Wed, Jun 2, 2021 at 1:27 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Jun 02, 2021 at 10:53:22AM +0530, Bharath Rupireddy wrote: > > Thanks. PSA v4. > > Thanks for the new version. > > + MyProcPid, tempname, MyProcPid, MyProcPid, > + tempname, MyProcPid, MyProcPid, MyProcPid, > + MyProcPid, MyProcPid, MyProcPid); > This style is still a bit heavy-ish. Perhaps we should just come back > to Thomas's suggestion and just use a prefix with _$ for all that. Thanks.The changes with that approach are very minimal. PSA v5 and let me know if any more changes are needed. With Regards, Bharath Rupireddy.
Attachment
On Wed, Jun 02, 2021 at 03:44:45PM +0530, Bharath Rupireddy wrote: > Thanks.The changes with that approach are very minimal. PSA v5 and let > me know if any more changes are needed. Simple enough, so applied and back-patched. It took 8 years for somebody to complain about the current aliases, so that should be enough to get us close to zero conflicts now. I have looked a bit to see if anybody would use this naming convention, but could not find a trace, FWIW. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Wed, Jun 02, 2021 at 03:44:45PM +0530, Bharath Rupireddy wrote: >> Thanks.The changes with that approach are very minimal. PSA v5 and let >> me know if any more changes are needed. > Simple enough, so applied and back-patched. I just came across this issue while preparing the release notes. ISTM that people have expended a great deal of effort on a fundamentally unreliable solution, when a reliable one is easily available. The originally complained-of issue was that a user-chosen column name could collide with the query-chosen table name: ERROR: column reference "mv" is ambiguous LINE 1: ...alog.=) mv.mv AND newdata OPERATOR(pg_catalog.*=) mv) WHERE ... This is true, but it's self-inflicted damage, because all you have to do is write the query so that mv is clearly a table name: ... mv.mv AND newdata.* OPERATOR(pg_catalog.*=) mv.*) WHERE ... AFAICT that works and generates the identical parse tree to the original coding. The only place touched by the patch where it's a bit difficult to make the syntax unambiguous this way is "CREATE TEMP TABLE %s AS " "SELECT _$mv.ctid AS tid, _$newdata " "FROM %s _$mv FULL JOIN %s _$newdata ON (", because newdata.* would ordinarily get expanded to multiple columns if it's at the top level of a SELECT list, and that's not what we want. However, that's easily fixed using the same hack as in ruleutils.c's get_variable: add a no-op cast to the table's rowtype. So this would look like appendStringInfo(&querybuf, "CREATE TEMP TABLE %s AS " "SELECT mv.ctid AS tid, newdata.*::%s " "FROM %s mv FULL JOIN %s newdata ON (", diffname, matviewname, matviewname, tempname); Given that it took this long to notice the problem at all, maybe this is not a fix to cram in on the weekend before the release wrap. But I don't see why we need to settle for "mostly works" when "always works" is barely any harder. regards, tom lane
I wrote: > I just came across this issue while preparing the release notes. > ISTM that people have expended a great deal of effort on a fundamentally > unreliable solution, when a reliable one is easily available. Concretely, I propose the attached. regards, tom lane diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c index 9493b227b4..512b00bc65 100644 --- a/src/backend/commands/matview.c +++ b/src/backend/commands/matview.c @@ -537,9 +537,12 @@ transientrel_destroy(DestReceiver *self) /* * Given a qualified temporary table name, append an underscore followed by * the given integer, to make a new table name based on the old one. + * The result is a palloc'd string. * - * This leaks memory through palloc(), which won't be cleaned up until the - * current memory context is freed. + * As coded, this would fail to make a valid SQL name if the given name were, + * say, "FOO"."BAR". Currently, the table name portion of the input will + * never be double-quoted because it's of the form "pg_temp_NNN", cf + * make_new_heap(). But we might have to work harder someday. */ static char * make_temptable_name_n(char *tempname, int n) @@ -627,16 +630,20 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner, * that in a way that allows showing the first duplicated row found. Even * after we pass this test, a unique index on the materialized view may * find a duplicate key problem. + * + * Note: here and below, we use "tablename.*::tablerowtype" as a hack to + * keep ".*" from being expanded into multiple columns in a SELECT list. + * Compare ruleutils.c's get_variable(). */ resetStringInfo(&querybuf); appendStringInfo(&querybuf, - "SELECT _$newdata FROM %s _$newdata " - "WHERE _$newdata IS NOT NULL AND EXISTS " - "(SELECT 1 FROM %s _$newdata2 WHERE _$newdata2 IS NOT NULL " - "AND _$newdata2 OPERATOR(pg_catalog.*=) _$newdata " - "AND _$newdata2.ctid OPERATOR(pg_catalog.<>) " - "_$newdata.ctid)", - tempname, tempname); + "SELECT newdata.*::%s FROM %s newdata " + "WHERE newdata.* IS NOT NULL AND EXISTS " + "(SELECT 1 FROM %s newdata2 WHERE newdata2.* IS NOT NULL " + "AND newdata2.* OPERATOR(pg_catalog.*=) newdata.* " + "AND newdata2.ctid OPERATOR(pg_catalog.<>) " + "newdata.ctid)", + tempname, tempname, tempname); if (SPI_execute(querybuf.data, false, 1) != SPI_OK_SELECT) elog(ERROR, "SPI_exec failed: %s", querybuf.data); if (SPI_processed > 0) @@ -663,9 +670,9 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner, resetStringInfo(&querybuf); appendStringInfo(&querybuf, "CREATE TEMP TABLE %s AS " - "SELECT _$mv.ctid AS tid, _$newdata " - "FROM %s _$mv FULL JOIN %s _$newdata ON (", - diffname, matviewname, tempname); + "SELECT mv.ctid AS tid, newdata.*::%s AS newdata " + "FROM %s mv FULL JOIN %s newdata ON (", + diffname, tempname, matviewname, tempname); /* * Get the list of index OIDs for the table from the relcache, and look up @@ -757,9 +764,9 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner, if (foundUniqueIndex) appendStringInfoString(&querybuf, " AND "); - leftop = quote_qualified_identifier("_$newdata", + leftop = quote_qualified_identifier("newdata", NameStr(attr->attname)); - rightop = quote_qualified_identifier("_$mv", + rightop = quote_qualified_identifier("mv", NameStr(attr->attname)); generate_operator_clause(&querybuf, @@ -787,8 +794,8 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner, Assert(foundUniqueIndex); appendStringInfoString(&querybuf, - " AND _$newdata OPERATOR(pg_catalog.*=) _$mv) " - "WHERE _$newdata IS NULL OR _$mv IS NULL " + " AND newdata.* OPERATOR(pg_catalog.*=) mv.*) " + "WHERE newdata.* IS NULL OR mv.* IS NULL " "ORDER BY tid"); /* Create the temporary "diff" table. */ @@ -814,10 +821,10 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner, /* Deletes must come before inserts; do them first. */ resetStringInfo(&querybuf); appendStringInfo(&querybuf, - "DELETE FROM %s _$mv WHERE ctid OPERATOR(pg_catalog.=) ANY " - "(SELECT _$diff.tid FROM %s _$diff " - "WHERE _$diff.tid IS NOT NULL " - "AND _$diff._$newdata IS NULL)", + "DELETE FROM %s mv WHERE ctid OPERATOR(pg_catalog.=) ANY " + "(SELECT diff.tid FROM %s diff " + "WHERE diff.tid IS NOT NULL " + "AND diff.newdata IS NULL)", matviewname, diffname); if (SPI_exec(querybuf.data, 0) != SPI_OK_DELETE) elog(ERROR, "SPI_exec failed: %s", querybuf.data); @@ -825,8 +832,8 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner, /* Inserts go last. */ resetStringInfo(&querybuf); appendStringInfo(&querybuf, - "INSERT INTO %s SELECT (_$diff._$newdata).* " - "FROM %s _$diff WHERE tid IS NULL", + "INSERT INTO %s SELECT (diff.newdata).* " + "FROM %s diff WHERE tid IS NULL", matviewname, diffname); if (SPI_exec(querybuf.data, 0) != SPI_OK_INSERT) elog(ERROR, "SPI_exec failed: %s", querybuf.data); diff --git a/src/test/regress/expected/matview.out b/src/test/regress/expected/matview.out index 4b3a2e0cb7..313c72a268 100644 --- a/src/test/regress/expected/matview.out +++ b/src/test/regress/expected/matview.out @@ -551,7 +551,15 @@ NOTICE: drop cascades to materialized view mvtest_mv_v -- make sure running as superuser works when MV owned by another role (bug #11208) CREATE ROLE regress_user_mvtest; SET ROLE regress_user_mvtest; -CREATE TABLE mvtest_foo_data AS SELECT i, md5(random()::text) +-- this test case also checks for ambiguity in the queries issued by +-- refresh_by_match_merge(), by choosing column names that intentionally +-- duplicate all the aliases used in those queries +CREATE TABLE mvtest_foo_data AS SELECT i, + i+1 AS tid, + md5(random()::text) AS mv, + md5(random()::text) AS newdata, + md5(random()::text) AS newdata2, + md5(random()::text) AS diff FROM generate_series(1, 10) i; CREATE MATERIALIZED VIEW mvtest_mv_foo AS SELECT * FROM mvtest_foo_data; CREATE MATERIALIZED VIEW mvtest_mv_foo AS SELECT * FROM mvtest_foo_data; diff --git a/src/test/regress/sql/matview.sql b/src/test/regress/sql/matview.sql index 4a4bd0d6b6..68b9ccfd45 100644 --- a/src/test/regress/sql/matview.sql +++ b/src/test/regress/sql/matview.sql @@ -211,7 +211,15 @@ DROP TABLE mvtest_v CASCADE; -- make sure running as superuser works when MV owned by another role (bug #11208) CREATE ROLE regress_user_mvtest; SET ROLE regress_user_mvtest; -CREATE TABLE mvtest_foo_data AS SELECT i, md5(random()::text) +-- this test case also checks for ambiguity in the queries issued by +-- refresh_by_match_merge(), by choosing column names that intentionally +-- duplicate all the aliases used in those queries +CREATE TABLE mvtest_foo_data AS SELECT i, + i+1 AS tid, + md5(random()::text) AS mv, + md5(random()::text) AS newdata, + md5(random()::text) AS newdata2, + md5(random()::text) AS diff FROM generate_series(1, 10) i; CREATE MATERIALIZED VIEW mvtest_mv_foo AS SELECT * FROM mvtest_foo_data; CREATE MATERIALIZED VIEW mvtest_mv_foo AS SELECT * FROM mvtest_foo_data;
On Fri, Aug 06, 2021 at 10:48:40AM -0400, Tom Lane wrote: > AFAICT that works and generates the identical parse tree to the original > coding. The only place touched by the patch where it's a bit difficult to > make the syntax unambiguous this way is > > "CREATE TEMP TABLE %s AS " > "SELECT _$mv.ctid AS tid, _$newdata " > "FROM %s _$mv FULL JOIN %s _$newdata ON (", > > because newdata.* would ordinarily get expanded to multiple columns > if it's at the top level of a SELECT list, and that's not what we want. > However, that's easily fixed using the same hack as in ruleutils.c's > get_variable: add a no-op cast to the table's rowtype. So this > would look like > > appendStringInfo(&querybuf, > "CREATE TEMP TABLE %s AS " > "SELECT mv.ctid AS tid, newdata.*::%s " > "FROM %s mv FULL JOIN %s newdata ON (", > diffname, matviewname, matviewname, tempname); Smart piece. I haven't thought of that. > Given that it took this long to notice the problem at all, maybe > this is not a fix to cram in on the weekend before the release wrap. > But I don't see why we need to settle for "mostly works" when > "always works" is barely any harder. Yes, I would vote to delay that for a couple of days. That's not worth taking a risk for. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Fri, Aug 06, 2021 at 10:48:40AM -0400, Tom Lane wrote: >> Given that it took this long to notice the problem at all, maybe >> this is not a fix to cram in on the weekend before the release wrap. >> But I don't see why we need to settle for "mostly works" when >> "always works" is barely any harder. > Yes, I would vote to delay that for a couple of days. That's not > worth taking a risk for. I went ahead and created the patch, including test case, and it seems fine. So I'm leaning towards pushing that tomorrow. Mainly because I don't want to have to document "we partially fixed this" in this release set and then "we really fixed it" three months from now. regards, tom lane