Thread: BUG #16631: postgres_fdw tries to insert into generated columns
The following bug has been logged on the website: Bug reference: 16631 Logged by: Daniel Cherniy Email address: dacherny@gmail.com PostgreSQL version: 12.4 Operating system: Debian (as "docker pull postgres:latest") Description: Postgres FDW imports generated (stored) columns from foreign table like usual columns and tries to insert values into them instead of skipping. Steps to reproduce: create extension postgres_fdw; create server host_as_foreign foreign data wrapper postgres_fdw options (dbname 'postgres'); create user mapping for current_user server host_as_foreign; create schema src; create table src.test ( id int primary key, name text not null, name_hash char(32) generated always as (md5(name)) stored ); insert into src.test (id, name) values (1, 'Hello') returning *; +--+-----+--------------------------------+ |id|name |name_hash | +--+-----+--------------------------------+ |1 |Hello|8b1a9953c4611296a827abf8c47804d7| +--+-----+--------------------------------+ -- lets import schema create schema fgn; import foreign schema src limit to (test) from server host_as_foreign into fgn; -- and check what we've got select * from fgn.test; +--+-----+--------------------------------+ |id|name |name_hash | +--+-----+--------------------------------+ |1 |Hello|8b1a9953c4611296a827abf8c47804d7| +--+-----+--------------------------------+ -- try to insert only columns what we suppose to insert into fgn.test (id, name) values (2, 'Try to insert without generated column'); > [42601] ERROR: cannot insert into column "name_hash" > Detail: Column "name_hash" is a generated column. > Where: remote SQL command: INSERT INTO src.test(id, name, name_hash) VALUES ($1, $2, $3) -- maybe now? insert into fgn.test (id, name, name_hash) values (2, 'Try to insert with "default" value', default); > [42601] ERROR: cannot insert into column "name_hash" > Detail: Column "name_hash" is a generated column. > Where: remote SQL command: INSERT INTO src.test(id, name, name_hash) VALUES ($1, $2, $3) select version(); +----------------------------------------------------------------------------------------------------------------+ |version | +----------------------------------------------------------------------------------------------------------------+ |PostgreSQL 12.4 (Debian 12.4-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit| +----------------------------------------------------------------------------------------------------------------+
I CCed Peter E. On Thu, Sep 24, 2020 at 4:10 AM PG Bug reporting form <noreply@postgresql.org> wrote: > Postgres FDW imports generated (stored) columns from foreign table like > usual columns and tries to insert values into them instead of skipping. > > Steps to reproduce: > > create extension postgres_fdw; > create server host_as_foreign foreign data wrapper postgres_fdw options > (dbname 'postgres'); > create user mapping for current_user server host_as_foreign; > > create schema src; > create table src.test ( > id int primary key, > name text not null, > name_hash char(32) generated always as (md5(name)) stored > ); > > insert into src.test (id, name) values (1, 'Hello') returning *; > +--+-----+--------------------------------+ > |id|name |name_hash | > +--+-----+--------------------------------+ > |1 |Hello|8b1a9953c4611296a827abf8c47804d7| > +--+-----+--------------------------------+ > > -- lets import schema > create schema fgn; > import foreign schema src limit to (test) from server host_as_foreign into > fgn; > > -- and check what we've got > select * from fgn.test; > +--+-----+--------------------------------+ > |id|name |name_hash | > +--+-----+--------------------------------+ > |1 |Hello|8b1a9953c4611296a827abf8c47804d7| > +--+-----+--------------------------------+ > > -- try to insert only columns what we suppose to > insert into fgn.test (id, name) values (2, 'Try to insert without generated > column'); > > [42601] ERROR: cannot insert into column "name_hash" > > Detail: Column "name_hash" is a generated column. > > Where: remote SQL command: INSERT INTO src.test(id, name, name_hash) > VALUES ($1, $2, $3) Reproduced. Thanks for the report! I studied the handling of generated columns in foreign tables, but I’m not sure it is very well designed. This is the documentation note about it in create_foreign_table.sgml: Similar considerations apply to generated columns. Stored generated columns are computed on insert or update on the local <productname>PostgreSQL</productname> server and handed to the foreign-data wrapper for writing out to the foreign data store, but it is not enforced that a query of the foreign table returns values for stored generated columns that are consistent with the generation expression. Again, this might result in incorrect query results. I’m not sure why this is similar to the constraint case. But rather than computing the generated columns locally, I’m wondering that we should compute them remotely, assuming that the corresponding generated columns are defined on the remote sides. (It would be the user’s responsibility to ensure that.) This seems to me similar to the constraint case, and if we did so, I think we could fix the reported issue by extending postgresImportForeignSchema to support generated columns. Maybe I’m missing something, though. (Column defaults are also computed locally, but as discussed in [1], that wouldn’t be ideal, and it would be good if we fixed to compute them remotely.) While looking into this, I noticed this: create schema fgn2; import foreign schema src limit to (test) from server host_as_foreign into fgn2 options (import_default 'true'); ERROR: cannot use column reference in DEFAULT expression LINE 4: ... 'name_hash') COLLATE pg_catalog."default" DEFAULT md5(name) ^ QUERY: CREATE FOREIGN TABLE test ( id integer OPTIONS (column_name 'id') NOT NULL, name text OPTIONS (column_name 'name') COLLATE pg_catalog."default" NOT NULL, name_hash character(32) OPTIONS (column_name 'name_hash') COLLATE pg_catalog."default" DEFAULT md5(name) ) SERVER host_as_foreign OPTIONS (schema_name 'src', table_name 'test'); CONTEXT: importing foreign table "test" When enabling the import_default option, postgresImportForeignSchema incorrectly imports the generation expression for generated column name_hash defined on the remote table as a default expression for it. Attached is a patch for fixing this issue. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/26654.1380145647%40sss.pgh.pa.us
Attachment
On Sat, Jul 3, 2021 at 7:44 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > But rather > than computing the generated columns locally, I’m wondering that we > should compute them remotely, assuming that the corresponding > generated columns are defined on the remote sides. (It would be the > user’s responsibility to ensure that.) This seems to me similar to > the constraint case, and if we did so, I think we could fix the > reported issue by extending postgresImportForeignSchema to support > generated columns. I updated the patch as such: * Modified nodeModifyTable.c and copyfrom.c so that they don't compute generated columns for FDWs anymore. * Modified deparse functions such as deparseInsertSql() so that the query is deparsed to set target generated columns (if any) to DEFAULT. * Added an option import_generated to postgres_fdw to support importing generated columns. * Added a bit more test cases to postgres_fdw.sql. * Updated docs. Currently, foreign tables are not allowed to be updated directly when they have stored generated columns, but I think they could be. But I left that as future work as that would be an improvement rather than a fix. Attached is an updated version of the patch. Best regards, Etsuro Fujita
Attachment
On 07.07.21 09:20, Etsuro Fujita wrote: > * Modified nodeModifyTable.c and copyfrom.c so that they don't compute > generated columns for FDWs anymore. I don't agree with that change. What is the point of declaring a generated column on a foreign table if you ignore it? Then you might as well prohibit declaring such columns in the first place.
On Wed, Jul 7, 2021 at 5:45 PM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > On 07.07.21 09:20, Etsuro Fujita wrote: > > * Modified nodeModifyTable.c and copyfrom.c so that they don't compute > > generated columns for FDWs anymore. > > I don't agree with that change. What is the point of declaring a > generated column on a foreign table if you ignore it? Then you might as > well prohibit declaring such columns in the first place. I made that change as FDWs could call ExecComputeStoredGenerated() by themselves if necessary. I think that that would be consistent with the constraint case. Best regards, Etsuro Fujita
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: > On 07.07.21 09:20, Etsuro Fujita wrote: >> * Modified nodeModifyTable.c and copyfrom.c so that they don't compute >> generated columns for FDWs anymore. > I don't agree with that change. What is the point of declaring a > generated column on a foreign table if you ignore it? Then you might as > well prohibit declaring such columns in the first place. Maybe what we need here is some documentation clarifying exactly what the point of a generated column on a foreign table actually *is*. When would you do it, and what relationship if any has it got to the generation status of the underlying remote column? regards, tom lane
On 07.07.21 17:54, Tom Lane wrote: > Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: >> On 07.07.21 09:20, Etsuro Fujita wrote: >>> * Modified nodeModifyTable.c and copyfrom.c so that they don't compute >>> generated columns for FDWs anymore. > >> I don't agree with that change. What is the point of declaring a >> generated column on a foreign table if you ignore it? Then you might as >> well prohibit declaring such columns in the first place. > > Maybe what we need here is some documentation clarifying exactly what > the point of a generated column on a foreign table actually *is*. > When would you do it, and what relationship if any has it got to the > generation status of the underlying remote column? The behavior is explained here: https://www.postgresql.org/docs/current/sql-createforeigntable.html If you have a foreign table with a column "a" and a column "b" generated as (a * 2), and say the underlying storage is a flat file, then the generated value is computed by the server and written to the file. The file behaves essentially analogous to the regular table heap file. The way I understand this bug report is, what if the underlying storage is another SQL database with its own generated columns. Then how do we manage it so that IMPORT FOREIGN SCHEMA can mark the foreign table column as "column exists to be read, but don't write to it". The proposal was to repurpose the generated column syntax for that, but that breaks the flat-file use case above. The follow-up proposal was to let the FDW implementation decide. I don't think that is always the right thing either. What if I want the foreign-table node to do the computations, because it has access to special parameters or something. There might be multiple possibilities of what is appropriate here. But that's not something we can just change around as part of a bug fix.
On Tue, Jul 13, 2021 at 5:36 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > On 07.07.21 17:54, Tom Lane wrote: > > Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: > >> On 07.07.21 09:20, Etsuro Fujita wrote: > >>> * Modified nodeModifyTable.c and copyfrom.c so that they don't compute > >>> generated columns for FDWs anymore. > > > >> I don't agree with that change. What is the point of declaring a > >> generated column on a foreign table if you ignore it? Then you might as > >> well prohibit declaring such columns in the first place. > > > > Maybe what we need here is some documentation clarifying exactly what > > the point of a generated column on a foreign table actually *is*. > > When would you do it, and what relationship if any has it got to the > > generation status of the underlying remote column? > > The behavior is explained here: > https://www.postgresql.org/docs/current/sql-createforeigntable.html > > If you have a foreign table with a column "a" and a column "b" generated > as (a * 2), and say the underlying storage is a flat file, then the > generated value is computed by the server and written to the file. The > file behaves essentially analogous to the regular table heap file. > > The way I understand this bug report is, what if the underlying storage > is another SQL database with its own generated columns. Then how do we > manage it so that IMPORT FOREIGN SCHEMA can mark the foreign table > column as "column exists to be read, but don't write to it". The > proposal was to repurpose the generated column syntax for that, but that > breaks the flat-file use case above. I think this may also depend on the FDW implementation. > The follow-up proposal was to let the FDW implementation decide. I > don't think that is always the right thing either. What if I want the > foreign-table node to do the computations, because it has access to > special parameters or something. The follow-up proposal was mainly for consistency with the handling of constraints on foreign tables; they are not enforced on the local PG server at all, as noted in the above documentation. > There might be multiple possibilities of what is appropriate here. But > that's not something we can just change around as part of a bug fix. I agree on this part. The changes I made to the core side would go beyond a fix for the reported issue. I'll remove it in the next version of the patch. Best regards, Etsuro Fujita
On Thu, Jul 15, 2021 at 6:15 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > On Tue, Jul 13, 2021 at 5:36 AM Peter Eisentraut > <peter.eisentraut@enterprisedb.com> wrote: > > The follow-up proposal was to let the FDW implementation decide. I > > don't think that is always the right thing either. What if I want the > > foreign-table node to do the computations, because it has access to > > special parameters or something. > > The follow-up proposal was mainly for consistency with the handling of > constraints on foreign tables; they are not enforced on the local PG > server at all, as noted in the above documentation. > > > There might be multiple possibilities of what is appropriate here. But > > that's not something we can just change around as part of a bug fix. > > I agree on this part. The changes I made to the core side would go > beyond a fix for the reported issue. I'll remove it in the next > version of the patch. I updated the patch as such. Attached is an update patch (postgres_fdw-generated-columns-PG14-HEAD.patch), which would apply to HEAD and v14. I'm also attaching a patch for v12 and v13 (postgres_fdw-generated-columns-PG12-PG13.patch). I'll apply these if there are no objections. Best regards, Etsuro Fujita
Attachment
On Wed, Aug 4, 2021 at 5:30 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > I'll apply these if there are no objections. Applied. Best regards, Etsuro Fujita