Thread: Query on postgres_fdw extension

Query on postgres_fdw extension

From
Swathi P
Date:
Hello EveryOne,

 Hope you are all doing well and staying safe. 

Am Swathi, have been working with postgres for last 3 years. Currently we are working on a project to build a sharding solution with the help of native postgres_fdw extension. During this process, we have encountered an issue with postgres_fdw. I tried to give as much as details below on the issue we are facing, it would be of great help if you can help us overcome this issue. 

- We have Host_A and Host_B , where Host_A is out coordinator node and Host_B is used as our data node. 

- Host_B has a table "Table_B" with a sequence id column which auto generates the series by default 
  CREATE TABLE public.table_a
 (
     id bigint NOT NULL DEFAULT nextval('table_a_id_seq'::regclass),
     topic character varying(50) NOT NULL,
     CONSTRAINT table_a_pk PRIMARY KEY (id)
 )

- on Host_A we have a foreign table created with the ddl below 
 CREATE FOREIGN TABLE public.table_a
 (
      id bigint ,
      topic character varying(50) NOT NULL,
  ) SERVER Host_A OPTIONS (schema_name 'public', table_name 'table_a');

- When we try to insert data directly on the table_a while connected to Host_B, works fine with the auto incremented values for the id column 

- But the same insert fails when run from the coordinator node with below error. 
poc=> insert into table_a(topic) values ('test');
ERROR:  null value in column "id" of relation "table_a" violates not-null constraint
DETAIL:  Failing row contains (null, test).
CONTEXT:  remote SQL command: INSERT INTO public.table_a(id, topic) VALUES ($1, $2)

- If we omit the primary key and not null constraints on the table_a on remote host (Host_B) , inserts will work fine on Host_A but a NULL value insertedd for the id column instead of sequence 

We are looking for some help on understanding the issue here and the best possible workaround for the same. 

Your help will be greatly appreciated 

Thanks 
Swathi P

Re: Query on postgres_fdw extension

From
Bharath Rupireddy
Date:
On Thu, May 13, 2021 at 4:58 PM Swathi P <swathi.bluepearl@gmail.com> wrote:
>
> Hello EveryOne,
>
>  Hope you are all doing well and staying safe.
>
> Am Swathi, have been working with postgres for last 3 years. Currently we are working on a project to build a
shardingsolution with the help of native postgres_fdw extension. During this process, we have encountered an issue with
postgres_fdw.I tried to give as much as details below on the issue we are facing, it would be of great help if you can
helpus overcome this issue. 
>
> - We have Host_A and Host_B , where Host_A is out coordinator node and Host_B is used as our data node.
>
> - Host_B has a table "Table_B" with a sequence id column which auto generates the series by default
>   CREATE TABLE public.table_a
>  (
>      id bigint NOT NULL DEFAULT nextval('table_a_id_seq'::regclass),
>      topic character varying(50) NOT NULL,
>      CONSTRAINT table_a_pk PRIMARY KEY (id)
>  )
>
> - on Host_A we have a foreign table created with the ddl below
>  CREATE FOREIGN TABLE public.table_a
>  (
>       id bigint ,
>       topic character varying(50) NOT NULL,
>   ) SERVER Host_A OPTIONS (schema_name 'public', table_name 'table_a');
>
> - When we try to insert data directly on the table_a while connected to Host_B, works fine with the auto incremented
valuesfor the id column 
>
> - But the same insert fails when run from the coordinator node with below error.
> poc=> insert into table_a(topic) values ('test');
> ERROR:  null value in column "id" of relation "table_a" violates not-null constraint
> DETAIL:  Failing row contains (null, test).
> CONTEXT:  remote SQL command: INSERT INTO public.table_a(id, topic) VALUES ($1, $2)
>
> - If we omit the primary key and not null constraints on the table_a on remote host (Host_B) , inserts will work fine
onHost_A but a NULL value insertedd for the id column instead of sequence 
>
> We are looking for some help on understanding the issue here and the best possible workaround for the same.
>
> Your help will be greatly appreciated

I think you need to declare your foreign table column id as "serial"
type instead of "bigint". Below is what I tried from my end.

On remote server:
CREATE USER foreign_user;
DROP TABLE table_a;
CREATE TABLE table_a
 (
     id serial NOT NULL,
     topic character varying(50) NOT NULL,
     CONSTRAINT table_a_pk PRIMARY KEY (id)
 );
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO foreign_user;

On local server:
DROP EXTENSION postgres_fdw CASCADE;
CREATE EXTENSION postgres_fdw;
CREATE SERVER foreign_server
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'localhost', port '5433', dbname 'postgres');
CREATE USER MAPPING FOR public
        SERVER foreign_server
        OPTIONS (user 'foreign_user', password '');
CREATE FOREIGN TABLE table_a (id serial NOT NULL,
     topic character varying(50) NOT NULL)
SERVER foreign_server
OPTIONS (schema_name 'public', table_name 'table_a');
SELECT * FROM table_a;
INSERT INTO table_a(topic) VALUES('row1');
INSERT INTO table_a(topic) VALUES('row2');
INSERT INTO table_a(topic) VALUES('row3');

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Query on postgres_fdw extension

From
Bharath Rupireddy
Date:
On Thu, May 13, 2021 at 4:58 PM Swathi P <swathi.bluepearl@gmail.com> wrote:
>
> Hello EveryOne,
>
>  Hope you are all doing well and staying safe.
>
> Am Swathi, have been working with postgres for last 3 years. Currently we are working on a project to build a
shardingsolution with the help of native postgres_fdw extension. During this process, we have encountered an issue with
postgres_fdw.I tried to give as much as details below on the issue we are facing, it would be of great help if you can
helpus overcome this issue. 
>
> - We have Host_A and Host_B , where Host_A is out coordinator node and Host_B is used as our data node.
>
> - Host_B has a table "Table_B" with a sequence id column which auto generates the series by default
>   CREATE TABLE public.table_a
>  (
>      id bigint NOT NULL DEFAULT nextval('table_a_id_seq'::regclass),
>      topic character varying(50) NOT NULL,
>      CONSTRAINT table_a_pk PRIMARY KEY (id)
>  )
>
> - on Host_A we have a foreign table created with the ddl below
>  CREATE FOREIGN TABLE public.table_a
>  (
>       id bigint ,
>       topic character varying(50) NOT NULL,
>   ) SERVER Host_A OPTIONS (schema_name 'public', table_name 'table_a');
>
> - When we try to insert data directly on the table_a while connected to Host_B, works fine with the auto incremented
valuesfor the id column 
>
> - But the same insert fails when run from the coordinator node with below error.
> poc=> insert into table_a(topic) values ('test');
> ERROR:  null value in column "id" of relation "table_a" violates not-null constraint
> DETAIL:  Failing row contains (null, test).
> CONTEXT:  remote SQL command: INSERT INTO public.table_a(id, topic) VALUES ($1, $2)
>
> - If we omit the primary key and not null constraints on the table_a on remote host (Host_B) , inserts will work fine
onHost_A but a NULL value insertedd for the id column instead of sequence 
>
> We are looking for some help on understanding the issue here and the best possible workaround for the same.
>
> Your help will be greatly appreciated

I think you need to declare your foreign table column id as "serial"
type instead of "bigint". Below is what I tried from my end.

On remote server:
CREATE USER foreign_user;
DROP TABLE table_a;
CREATE TABLE table_a
 (
     id serial NOT NULL,
     topic character varying(50) NOT NULL,
     CONSTRAINT table_a_pk PRIMARY KEY (id)
 );
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO foreign_user;

On local server:
DROP EXTENSION postgres_fdw CASCADE;
CREATE EXTENSION postgres_fdw;
CREATE SERVER foreign_server
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'localhost', port '5433', dbname 'postgres');
CREATE USER MAPPING FOR public
        SERVER foreign_server
        OPTIONS (user 'foreign_user', password '');
CREATE FOREIGN TABLE table_a (id serial NOT NULL,
     topic character varying(50) NOT NULL)
SERVER foreign_server
OPTIONS (schema_name 'public', table_name 'table_a');
SELECT * FROM table_a;
INSERT INTO table_a(topic) VALUES('row1');
INSERT INTO table_a(topic) VALUES('row2');
INSERT INTO table_a(topic) VALUES('row3');

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Query on postgres_fdw extension

From
Swathi P
Date:
Thanks Bharath for the details. 

In our sharding solution, we have multiple coodinator nodes. If we declare the table column as serial data type, we might end up having duplicate values for id column in the table_a in host_b (data node) as cconnections come from multiple coordinatoor nodes and might end up in duplicate key violations. 

Hence we decided to have the coordinator nodes as stateless and hence declared the column with no serial/sequence. Let me know if this makes sense. 

Have come across multiple articles on the same issue, i would like to know if we are doing something wrong here or we have bette workaround for this issue,


Thanks again for your time. 

- Swathi

On Thu, May 13, 2021 at 5:34 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Thu, May 13, 2021 at 4:58 PM Swathi P <swathi.bluepearl@gmail.com> wrote:
>
> Hello EveryOne,
>
>  Hope you are all doing well and staying safe.
>
> Am Swathi, have been working with postgres for last 3 years. Currently we are working on a project to build a sharding solution with the help of native postgres_fdw extension. During this process, we have encountered an issue with postgres_fdw. I tried to give as much as details below on the issue we are facing, it would be of great help if you can help us overcome this issue.
>
> - We have Host_A and Host_B , where Host_A is out coordinator node and Host_B is used as our data node.
>
> - Host_B has a table "Table_B" with a sequence id column which auto generates the series by default
>   CREATE TABLE public.table_a
>  (
>      id bigint NOT NULL DEFAULT nextval('table_a_id_seq'::regclass),
>      topic character varying(50) NOT NULL,
>      CONSTRAINT table_a_pk PRIMARY KEY (id)
>  )
>
> - on Host_A we have a foreign table created with the ddl below
>  CREATE FOREIGN TABLE public.table_a
>  (
>       id bigint ,
>       topic character varying(50) NOT NULL,
>   ) SERVER Host_A OPTIONS (schema_name 'public', table_name 'table_a');
>
> - When we try to insert data directly on the table_a while connected to Host_B, works fine with the auto incremented values for the id column
>
> - But the same insert fails when run from the coordinator node with below error.
> poc=> insert into table_a(topic) values ('test');
> ERROR:  null value in column "id" of relation "table_a" violates not-null constraint
> DETAIL:  Failing row contains (null, test).
> CONTEXT:  remote SQL command: INSERT INTO public.table_a(id, topic) VALUES ($1, $2)
>
> - If we omit the primary key and not null constraints on the table_a on remote host (Host_B) , inserts will work fine on Host_A but a NULL value insertedd for the id column instead of sequence
>
> We are looking for some help on understanding the issue here and the best possible workaround for the same.
>
> Your help will be greatly appreciated

I think you need to declare your foreign table column id as "serial"
type instead of "bigint". Below is what I tried from my end.

On remote server:
CREATE USER foreign_user;
DROP TABLE table_a;
CREATE TABLE table_a
 (
     id serial NOT NULL,
     topic character varying(50) NOT NULL,
     CONSTRAINT table_a_pk PRIMARY KEY (id)
 );
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO foreign_user;

On local server:
DROP EXTENSION postgres_fdw CASCADE;
CREATE EXTENSION postgres_fdw;
CREATE SERVER foreign_server
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'localhost', port '5433', dbname 'postgres');
CREATE USER MAPPING FOR public
        SERVER foreign_server
        OPTIONS (user 'foreign_user', password '');
CREATE FOREIGN TABLE table_a (id serial NOT NULL,
     topic character varying(50) NOT NULL)
SERVER foreign_server
OPTIONS (schema_name 'public', table_name 'table_a');
SELECT * FROM table_a;
INSERT INTO table_a(topic) VALUES('row1');
INSERT INTO table_a(topic) VALUES('row2');
INSERT INTO table_a(topic) VALUES('row3');

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Query on postgres_fdw extension

From
Swathi P
Date:
Thanks Bharath for the details. 

In our sharding solution, we have multiple coodinator nodes. If we declare the table column as serial data type, we might end up having duplicate values for id column in the table_a in host_b (data node) as cconnections come from multiple coordinatoor nodes and might end up in duplicate key violations. 

Hence we decided to have the coordinator nodes as stateless and hence declared the column with no serial/sequence. Let me know if this makes sense. 

Have come across multiple articles on the same issue, i would like to know if we are doing something wrong here or we have bette workaround for this issue,


Thanks again for your time. 

- Swathi

On Thu, May 13, 2021 at 5:34 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Thu, May 13, 2021 at 4:58 PM Swathi P <swathi.bluepearl@gmail.com> wrote:
>
> Hello EveryOne,
>
>  Hope you are all doing well and staying safe.
>
> Am Swathi, have been working with postgres for last 3 years. Currently we are working on a project to build a sharding solution with the help of native postgres_fdw extension. During this process, we have encountered an issue with postgres_fdw. I tried to give as much as details below on the issue we are facing, it would be of great help if you can help us overcome this issue.
>
> - We have Host_A and Host_B , where Host_A is out coordinator node and Host_B is used as our data node.
>
> - Host_B has a table "Table_B" with a sequence id column which auto generates the series by default
>   CREATE TABLE public.table_a
>  (
>      id bigint NOT NULL DEFAULT nextval('table_a_id_seq'::regclass),
>      topic character varying(50) NOT NULL,
>      CONSTRAINT table_a_pk PRIMARY KEY (id)
>  )
>
> - on Host_A we have a foreign table created with the ddl below
>  CREATE FOREIGN TABLE public.table_a
>  (
>       id bigint ,
>       topic character varying(50) NOT NULL,
>   ) SERVER Host_A OPTIONS (schema_name 'public', table_name 'table_a');
>
> - When we try to insert data directly on the table_a while connected to Host_B, works fine with the auto incremented values for the id column
>
> - But the same insert fails when run from the coordinator node with below error.
> poc=> insert into table_a(topic) values ('test');
> ERROR:  null value in column "id" of relation "table_a" violates not-null constraint
> DETAIL:  Failing row contains (null, test).
> CONTEXT:  remote SQL command: INSERT INTO public.table_a(id, topic) VALUES ($1, $2)
>
> - If we omit the primary key and not null constraints on the table_a on remote host (Host_B) , inserts will work fine on Host_A but a NULL value insertedd for the id column instead of sequence
>
> We are looking for some help on understanding the issue here and the best possible workaround for the same.
>
> Your help will be greatly appreciated

I think you need to declare your foreign table column id as "serial"
type instead of "bigint". Below is what I tried from my end.

On remote server:
CREATE USER foreign_user;
DROP TABLE table_a;
CREATE TABLE table_a
 (
     id serial NOT NULL,
     topic character varying(50) NOT NULL,
     CONSTRAINT table_a_pk PRIMARY KEY (id)
 );
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO foreign_user;

On local server:
DROP EXTENSION postgres_fdw CASCADE;
CREATE EXTENSION postgres_fdw;
CREATE SERVER foreign_server
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'localhost', port '5433', dbname 'postgres');
CREATE USER MAPPING FOR public
        SERVER foreign_server
        OPTIONS (user 'foreign_user', password '');
CREATE FOREIGN TABLE table_a (id serial NOT NULL,
     topic character varying(50) NOT NULL)
SERVER foreign_server
OPTIONS (schema_name 'public', table_name 'table_a');
SELECT * FROM table_a;
INSERT INTO table_a(topic) VALUES('row1');
INSERT INTO table_a(topic) VALUES('row2');
INSERT INTO table_a(topic) VALUES('row3');

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Query on postgres_fdw extension

From
Bharath Rupireddy
Date:
On Fri, May 14, 2021 at 12:37 PM Swathi P <swathi.bluepearl@gmail.com> wrote:
>
> Thanks Bharath for the details.
>
> In our sharding solution, we have multiple coodinator nodes. If we declare the table column as serial data type, we
mightend up having duplicate values for id column in the table_a in host_b (data node) as cconnections come from
multiplecoordinatoor nodes and might end up in duplicate key violations. 
>
> Hence we decided to have the coordinator nodes as stateless and hence declared the column with no serial/sequence.
Letme know if this makes sense. 

Hm.

> Have come across multiple articles on the same issue, i would like to know if we are doing something wrong here or we
havebette workaround for this issue, 
>
> https://www.postgresql.org/message-id/CAP=oouH=FccW4V2zm1VjGit=NZDCXzU2tYBoZe88v3mXrEA9Qg@mail.gmail.com
> https://stackoverflow.com/questions/66582694/how-to-get-the-generated-id-from-an-insert-to-a-postgres-foreign-table
> https://paquier.xyz/postgresql-2/global-sequences-with-postgres_fdw-and-postgres-core/

Did you try using the approach specified by Michael in [1]? Won't that help?

"As a conclusion, you can create tables using unique values across
multiple nodes by associating for example foreign_seq_nextval() with
DEFAULT for a column.
=# CREATE TABLE tab (a int DEFAULT foreign_seq_nextval());
CREATE TABLE
=# INSERT INTO tab VALUES (DEFAULT), (DEFAULT), (DEFAULT);
INSERT 0 3
=# SELECT * FROM tab;
 a
----
 9
10
11
(3 rows)
"

[1] - https://paquier.xyz/postgresql-2/global-sequences-with-postgres_fdw-and-postgres-core/

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Query on postgres_fdw extension

From
Bharath Rupireddy
Date:
On Fri, May 14, 2021 at 12:37 PM Swathi P <swathi.bluepearl@gmail.com> wrote:
>
> Thanks Bharath for the details.
>
> In our sharding solution, we have multiple coodinator nodes. If we declare the table column as serial data type, we
mightend up having duplicate values for id column in the table_a in host_b (data node) as cconnections come from
multiplecoordinatoor nodes and might end up in duplicate key violations. 
>
> Hence we decided to have the coordinator nodes as stateless and hence declared the column with no serial/sequence.
Letme know if this makes sense. 

Hm.

> Have come across multiple articles on the same issue, i would like to know if we are doing something wrong here or we
havebette workaround for this issue, 
>
> https://www.postgresql.org/message-id/CAP=oouH=FccW4V2zm1VjGit=NZDCXzU2tYBoZe88v3mXrEA9Qg@mail.gmail.com
> https://stackoverflow.com/questions/66582694/how-to-get-the-generated-id-from-an-insert-to-a-postgres-foreign-table
> https://paquier.xyz/postgresql-2/global-sequences-with-postgres_fdw-and-postgres-core/

Did you try using the approach specified by Michael in [1]? Won't that help?

"As a conclusion, you can create tables using unique values across
multiple nodes by associating for example foreign_seq_nextval() with
DEFAULT for a column.
=# CREATE TABLE tab (a int DEFAULT foreign_seq_nextval());
CREATE TABLE
=# INSERT INTO tab VALUES (DEFAULT), (DEFAULT), (DEFAULT);
INSERT 0 3
=# SELECT * FROM tab;
 a
----
 9
10
11
(3 rows)
"

[1] - https://paquier.xyz/postgresql-2/global-sequences-with-postgres_fdw-and-postgres-core/

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Query on postgres_fdw extension

From
Tom Lane
Date:
[ removing -bugs ]

Swathi P <swathi.bluepearl@gmail.com> writes:
> Hence we decided to have the coordinator nodes as stateless and hence
> declared the column with no serial/sequence. Let me know if this makes
> sense.

Attaching serial-sequence defaults on both sides would certainly not
work very well, because the sequences wouldn't stay in sync.

Unfortunately, postgres_fdw just doesn't have a good way right now
to make use of dynamically-generated defaults at the remote server.
If you leave out a column in your INSERT, it's going to compute
and send the locally-defined default (which is just null in this
case), so the remote's default expression is never used.

I remember that we spent a great deal of effort in postgres_fdw's
early days, trying to find a way that we could use the remote's
defaults in cases like this.  But everything we tried ended up
causing horrible semantic inconsistencies, so we ended up with
the always-use-the-local-default approach.  There was some feeling
that maybe this could be revisited later, but no one's done so.

One conceivable workaround is to do your insertions through a
foreign table that doesn't even have the serial column, so that
the INSERT command received by the remote server lacks that
column and the default gets applied.  Probably too messy though.

            regards, tom lane



Re: Query on postgres_fdw extension

From
Bharath Rupireddy
Date:
On Fri, May 14, 2021 at 8:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> [ removing -bugs ]
>
> Swathi P <swathi.bluepearl@gmail.com> writes:
> > Hence we decided to have the coordinator nodes as stateless and hence
> > declared the column with no serial/sequence. Let me know if this makes
> > sense.
>
> Attaching serial-sequence defaults on both sides would certainly not
> work very well, because the sequences wouldn't stay in sync.
>
> Unfortunately, postgres_fdw just doesn't have a good way right now
> to make use of dynamically-generated defaults at the remote server.
> If you leave out a column in your INSERT, it's going to compute
> and send the locally-defined default (which is just null in this
> case), so the remote's default expression is never used.
>
> I remember that we spent a great deal of effort in postgres_fdw's
> early days, trying to find a way that we could use the remote's
> defaults in cases like this.  But everything we tried ended up
> causing horrible semantic inconsistencies, so we ended up with
> the always-use-the-local-default approach.  There was some feeling
> that maybe this could be revisited later, but no one's done so.
>
> One conceivable workaround is to do your insertions through a
> foreign table that doesn't even have the serial column, so that
> the INSERT command received by the remote server lacks that
> column and the default gets applied.  Probably too messy though.

Then the serial column cannot be selected via the foreign table.
Maybe, drop the serial column from the foreign table before insertions
and add the serial column before the selects from the foreign table.
This is not elegant though.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Query on postgres_fdw extension

From
Tom Lane
Date:
Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
> On Fri, May 14, 2021 at 8:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> One conceivable workaround is to do your insertions through a
>> foreign table that doesn't even have the serial column, so that
>> the INSERT command received by the remote server lacks that
>> column and the default gets applied.  Probably too messy though.

> Then the serial column cannot be selected via the foreign table.

Yeah, you'd probably need to use different tables for inserting
and reading, which is why I called it messy.

The idea you pointed to in Michael's blog might work though,
ie make a local function that reaches over to the remote server
to get a globally-valid next sequence value, and use that as
the default expression of the foreign table.

            regards, tom lane



Re: Query on postgres_fdw extension

From
Bharath Rupireddy
Date:
On Tue, Jun 1, 2021 at 3:31 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
>
> Hi,
>
> On Fri, May 14, 2021 at 6:08 PM Swathi P <swathi.bluepearl@gmail.com> wrote:
> > In our sharding solution, we have multiple coodinator nodes. If we declare the table column as serial data type, we
mightend up having duplicate values for id column in the table_a in host_b (data node) as cconnections come from
multiplecoordinatoor nodes and might end up in duplicate key violations. 
> >
> > Hence we decided to have the coordinator nodes as stateless and hence declared the column with no serial/sequence.
Letme know if this makes sense. 
>
> It seems reasonable to me to make coodinator nodes stateless, but may
> I ask the reason you use multiple coordinator nodes?

Perhaps, as a redundant node to avoid single point of failures? It's
just a guess as I'm not the right one to answer that question though.

With Regards,
Bharath Rupireddy.



Re: Query on postgres_fdw extension

From
Swathi P
Date:
You are right, we added more than one 
coordinator nodes for high availability and to avoid single point of failure. 

Thanks
Swathi 

On Tue, Jun 1, 2021, 3:54 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Jun 1, 2021 at 3:31 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
>
> Hi,
>
> On Fri, May 14, 2021 at 6:08 PM Swathi P <swathi.bluepearl@gmail.com> wrote:
> > In our sharding solution, we have multiple coodinator nodes. If we declare the table column as serial data type, we might end up having duplicate values for id column in the table_a in host_b (data node) as cconnections come from multiple coordinatoor nodes and might end up in duplicate key violations.
> >
> > Hence we decided to have the coordinator nodes as stateless and hence declared the column with no serial/sequence. Let me know if this makes sense.
>
> It seems reasonable to me to make coodinator nodes stateless, but may
> I ask the reason you use multiple coordinator nodes?

Perhaps, as a redundant node to avoid single point of failures? It's
just a guess as I'm not the right one to answer that question though.

With Regards,
Bharath Rupireddy.

Re: Query on postgres_fdw extension

From
Duarte Carreira
Date:
Hello everyone.

I got here after encountering the same difficulty, although on a much more mundane scenario.

I'm used to fdw on a read-only basis. I was just inserting a new record on a foreign table and got blocked... and after much searching got here.

Not to rant or anything, but I am completely surprised by this limitation. As far as I can see it is impossible to use fdw to insert records on 99% of tables, since all have some kind of primary sequential key.

I'm just a user so cannot really understand the intricacies involved in this process. Tried to find past messages and up to 2013 without understanding the real problem.

The simplest workaround seems to be to quit using auto-numbering mechanisms and implement numbering trigger functions, which is really just going back to the 90s...

Another option would be a local function that would get the remote default and use it in a local insert trigger. The complexity is just orders of magnitude higher. We are talking about auto-numbering keys...

I don't know... realistically what do you guys see as a best/simple approach? 

Having 2 tables seems to me the easiest, less complex solution, but it's hard on maintenance...

And don't take this the wrong way, but is it really that hard to have a compromise: if there's a serial on the remote, then the user could change the local definition so to just send the "DEFAULT" keyword to the remote and let it figure it out? At least the user would have a chance of setting the preferred behavior without much fuss, on a per-table basis. And still use the basic functionality of serial/identity columns. PostgreSQL has such complex stuff that this seems odd to be left out.

Well I hope I didn't cross over as negative or anything. I do love pgsql and always promote it as the best thing under the sun.

Best regards,
Duarte

Laurenz Albe <laurenz.albe@cybertec.at> escreveu no dia quinta, 20/01/2022 à(s) 15:36:
On Fri, 2021-05-14 at 10:53 -0400, Tom Lane wrote:
> Swathi P <swathi.bluepearl@gmail.com> writes:
> > Hence we decided to have the coordinator nodes as stateless and hence
> > declared the column with no serial/sequence. Let me know if this makes
> > sense.
>
> Attaching serial-sequence defaults on both sides would certainly not
> work very well, because the sequences wouldn't stay in sync.
>
> Unfortunately, postgres_fdw just doesn't have a good way right now
> to make use of dynamically-generated defaults at the remote server.
> If you leave out a column in your INSERT, it's going to compute
> and send the locally-defined default (which is just null in this
> case), so the remote's default expression is never used.
>
> I remember that we spent a great deal of effort in postgres_fdw's
> early days, trying to find a way that we could use the remote's
> defaults in cases like this.  But everything we tried ended up
> causing horrible semantic inconsistencies, so we ended up with
> the always-use-the-local-default approach.  There was some feeling
> that maybe this could be revisited later, but no one's done so.
>
> One conceivable workaround is to do your insertions through a
> foreign table that doesn't even have the serial column, so that
> the INSERT command received by the remote server lacks that
> column and the default gets applied.  Probably too messy though.

One possibility might be to define a trigger on the remote table
that fetches the next sequence value if you try to insert NULL.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com





Re: Query on postgres_fdw extension

From
Vijaykumar Jain
Date:
On Thu, 20 Jan 2022 at 21:29, Duarte Carreira <dncarreira@gmail.com> wrote:
Hello everyone.

I don't know... realistically what do you guys see as a best/simple approach? 

We implemented a custom sharding (directory sharding with lookup tables) layer of 10 shards, but it was write local, read global.
the api was responsible for all rebalancing incase of hotspots.
other api sharding examples ...

 although it worked really well, when you are maintaining it on your own, it gets really painful, much beyond id generation globally.

i will not go into the details, but in short, sharded setup is not the same as local setup. there would be many more things that would not work as expected
which would otherwise work really well on a standalone setup.

writes over shard may work, but you realize it is over the network, so you can lock you table for a much longer duration and cause a much more serious outage,
if you really wanted to have distributed writes with unique keys, you can go with uuid i think or have your own seq generator globally (see below).

Move ID generation out of the database to an ID generation service outside of the database… As soon as a piece of work enters their system, an ID gets assigned to it… and that ID generated in a way that is known to be globally unique within their system

Index of /shard_manager/shard_manager-0.0.1/ (pgxn.org)  (pretty old but if you can use your coordinator server as a id_generator(), then you can generate ids which are globally unique)

imho, do not try sharding manually, unless you have enough dbas to maintain the shards, try using citus, it would make a lot of the manual stuff easier.

also, the below work arounds are bad, incase you just want to rush through

postgres=# \c localdb
You are now connected to database "localdb" as user "postgres".
localdb=#
localdb=# \dt
Did not find any relations.
localdb=# \det
     List of foreign tables
 Schema | Table |    Server
--------+-------+---------------
 public | t     | remote_server
(1 row)

localdb=# \det+ t
                                List of foreign tables
 Schema | Table |    Server     |              FDW options               | Description
--------+-------+---------------+----------------------------------------+-------------
 public | t     | remote_server | (schema_name 'public', table_name 't') |
(1 row)

localdb=# \det t
     List of foreign tables
 Schema | Table |    Server
--------+-------+---------------
 public | t     | remote_server
(1 row)

localdb=# create or replace function getnext() returns int as $_$ select id FROM dblink ('dbname = remotedb', $$ select nextval('t_id_seq') $$  ) as  newtable(id int); $_$ language sql;
CREATE FUNCTION
localdb=# \c remotedb
You are now connected to database "remotedb" as user "postgres".
remotedb=# \dt t
        List of relations
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | t    | table | postgres
(1 row)

remotedb=# \ds t_id_seq
            List of relations
 Schema |   Name   |   Type   |  Owner
--------+----------+----------+----------
 public | t_id_seq | sequence | postgres
(1 row)

remotedb=# \c localdb
You are now connected to database "localdb" as user "postgres".
localdb=# insert into t values (getnext(), 100);
INSERT 0 1
localdb=# insert into t values (getnext(), 100);
INSERT 0 1
localdb=# select * from t;
 id | col1
----+------
 11 |    4
 12 |    5
 13 |  100
 14 |  100
(4 rows)

just my opinion, ignore it not useful.


Re: Query on postgres_fdw extension

From
Duarte Carreira
Date:
Thanks for your help!

I'm not going forward with the id generating scheme... I prefer to let the bd do that work on its own. Sharding is way over my head.
For now I just created the 2 tables, one for inserting (without the id column), another for everything else. It's awkward and prone to human error but as long as nothing changes and no one deletes it thinking it's garbage...

Thanks.

Vijaykumar Jain <vijaykumarjain.github@gmail.com> escreveu no dia quinta, 20/01/2022 à(s) 17:39:
On Thu, 20 Jan 2022 at 21:29, Duarte Carreira <dncarreira@gmail.com> wrote:
Hello everyone.

I don't know... realistically what do you guys see as a best/simple approach? 

We implemented a custom sharding (directory sharding with lookup tables) layer of 10 shards, but it was write local, read global.
the api was responsible for all rebalancing incase of hotspots.
other api sharding examples ...

 although it worked really well, when you are maintaining it on your own, it gets really painful, much beyond id generation globally.

i will not go into the details, but in short, sharded setup is not the same as local setup. there would be many more things that would not work as expected
which would otherwise work really well on a standalone setup.

writes over shard may work, but you realize it is over the network, so you can lock you table for a much longer duration and cause a much more serious outage,
if you really wanted to have distributed writes with unique keys, you can go with uuid i think or have your own seq generator globally (see below).

Move ID generation out of the database to an ID generation service outside of the database… As soon as a piece of work enters their system, an ID gets assigned to it… and that ID generated in a way that is known to be globally unique within their system

Index of /shard_manager/shard_manager-0.0.1/ (pgxn.org)  (pretty old but if you can use your coordinator server as a id_generator(), then you can generate ids which are globally unique)

imho, do not try sharding manually, unless you have enough dbas to maintain the shards, try using citus, it would make a lot of the manual stuff easier.

also, the below work arounds are bad, incase you just want to rush through

postgres=# \c localdb
You are now connected to database "localdb" as user "postgres".
localdb=#
localdb=# \dt
Did not find any relations.
localdb=# \det
     List of foreign tables
 Schema | Table |    Server
--------+-------+---------------
 public | t     | remote_server
(1 row)

localdb=# \det+ t
                                List of foreign tables
 Schema | Table |    Server     |              FDW options               | Description
--------+-------+---------------+----------------------------------------+-------------
 public | t     | remote_server | (schema_name 'public', table_name 't') |
(1 row)

localdb=# \det t
     List of foreign tables
 Schema | Table |    Server
--------+-------+---------------
 public | t     | remote_server
(1 row)

localdb=# create or replace function getnext() returns int as $_$ select id FROM dblink ('dbname = remotedb', $$ select nextval('t_id_seq') $$  ) as  newtable(id int); $_$ language sql;
CREATE FUNCTION
localdb=# \c remotedb
You are now connected to database "remotedb" as user "postgres".
remotedb=# \dt t
        List of relations
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | t    | table | postgres
(1 row)

remotedb=# \ds t_id_seq
            List of relations
 Schema |   Name   |   Type   |  Owner
--------+----------+----------+----------
 public | t_id_seq | sequence | postgres
(1 row)

remotedb=# \c localdb
You are now connected to database "localdb" as user "postgres".
localdb=# insert into t values (getnext(), 100);
INSERT 0 1
localdb=# insert into t values (getnext(), 100);
INSERT 0 1
localdb=# select * from t;
 id | col1
----+------
 11 |    4
 12 |    5
 13 |  100
 14 |  100
(4 rows)

just my opinion, ignore it not useful.


Re: Query on postgres_fdw extension

From
Laurenz Albe
Date:
On Thu, 2022-01-20 at 15:59 +0000, Duarte Carreira wrote:
> I got here after encountering the same difficulty, although on a much more mundane scenario.
> 
> I'm used to fdw on a read-only basis. I was just inserting a new record on a foreign table
> and got blocked... and after much searching got here.
> 
> As far as I can see it is impossible to use fdw to insert records on 99% of tables,
> since all have some kind of primary sequential key.

Yes, this is tricky.  You could use something like this:

CREATE TABLE local (
   id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
   data text
);

CREATE FOREIGN TABLE remote (id bigint NOT NULL, data text)
   SERVER whatever OPTIONS (table_name 'local');

CREATE FOREIGN TABLE remote_noid (data text)
   SERVER whatever OPTIONS (table_name 'local');

CREATE VIEW v_remote AS SELECT * FROM remote;

CREATE FUNCTION ins_trig() RETURNS trigger LANGUAGE plpgsql AS
$$BEGIN
   INSERT INTO remote_noid (data) VALUES (NEW.data);
   RETURN NEW;
END;$$;

CREATE TRIGGER ins_trig INSTEAD OF INSERT ON v_remote
    FOR EACH ROW EXECUTE FUNCTION ins_trig();

INSERT INTO v_remote (data) VALUES ('something');

SELECT * FROM v_remote;

 id │   data    
════╪═══════════
  1 │ something

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Query on postgres_fdw extension

From
Duarte Carreira
Date:
Hmmm... I don't think a view or trigger are necessary.

If we just create the 2 foreign tables, one complete and one without id, you can simply insert into the table without id and it will work fine.
To select and show data, you use the "complete" table that has the id column.

No need for trigger and view. If I understood correctly.

I have this 2 table setup working.

It's a workaround that quickly escalates out of hand though... with little added value.

Thanks.

Laurenz Albe <laurenz.albe@cybertec.at> escreveu no dia sexta, 21/01/2022 à(s) 13:59:
On Thu, 2022-01-20 at 15:59 +0000, Duarte Carreira wrote:
> I got here after encountering the same difficulty, although on a much more mundane scenario.
>
> I'm used to fdw on a read-only basis. I was just inserting a new record on a foreign table
> and got blocked... and after much searching got here.
>
> As far as I can see it is impossible to use fdw to insert records on 99% of tables,
> since all have some kind of primary sequential key.

Yes, this is tricky.  You could use something like this:

CREATE TABLE local (
   id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
   data text
);

CREATE FOREIGN TABLE remote (id bigint NOT NULL, data text)
   SERVER whatever OPTIONS (table_name 'local');

CREATE FOREIGN TABLE remote_noid (data text)
   SERVER whatever OPTIONS (table_name 'local');

CREATE VIEW v_remote AS SELECT * FROM remote;

CREATE FUNCTION ins_trig() RETURNS trigger LANGUAGE plpgsql AS
$$BEGIN
   INSERT INTO remote_noid (data) VALUES (NEW.data);
   RETURN NEW;
END;$$;

CREATE TRIGGER ins_trig INSTEAD OF INSERT ON v_remote
    FOR EACH ROW EXECUTE FUNCTION ins_trig();

INSERT INTO v_remote (data) VALUES ('something');

SELECT * FROM v_remote;

 id │   data   
════╪═══════════
  1 │ something

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

Re: Query on postgres_fdw extension

From
Laurenz Albe
Date:
On Fri, 2022-01-21 at 14:33 +0000, Duarte Carreira wrote:
> If we just create the 2 foreign tables, one complete and one without id,
> you can simply insert into the table without id and it will work fine.
> To select and show data, you use the "complete" table that has the id column.
> 
> No need for trigger and view. If I understood correctly.
> 
> I have this 2 table setup working.
> 
> It's a workaround that quickly escalates out of hand though... with little added value.

Sure, that will work.

I just wanted to show how you can do without dealing with two different
foreign tables explicitly.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Query on postgres_fdw extension

From
Shaozhong SHI
Date:
Any functional code to be tested to confirm?

Regards,

David

On Fri, 21 Jan 2022 at 15:55, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2022-01-21 at 14:33 +0000, Duarte Carreira wrote:
> If we just create the 2 foreign tables, one complete and one without id,
> you can simply insert into the table without id and it will work fine.
> To select and show data, you use the "complete" table that has the id column.
>
> No need for trigger and view. If I understood correctly.
>
> I have this 2 table setup working.
>
> It's a workaround that quickly escalates out of hand though... with little added value.

Sure, that will work.

I just wanted to show how you can do without dealing with two different
foreign tables explicitly.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com