Thread: Keeping state in a foreign data wrapper
Hi, I am trying to implement a FDW in Postgres for accessing a web api and I would like to keep information like for examplethe total number of requests submiited. Ideally these data should be kept “per-user” and of course with the properlocking to eliminate race conditions. So the question I have is how to design such a global (but per user and database)state, using the C FDW API of Postgres. I have gone through the API and for example I see various fdw_private fieldsin the query planning structures and callbacks but these do not seem to be relevant to my requirements. Another possiblityseems to be to use shared memory (https://www.postgresql.org/docs/12/xfunc-c.html#id-1.8.3.13.14) but it is evenless clear how to do it. So is there any simple way to do implement such shared thread/process safe state? Thank you Stelios
2020年8月4日(火) 1:24 Stelios Sfakianakis <sgsfak@gmail.com>: > > Hi, > > I am trying to implement a FDW in Postgres for accessing a web api and I would like to keep information like for examplethe total number of requests submiited. Ideally these data should be kept “per-user” and of course with the properlocking to eliminate race conditions. So the question I have is how to design such a global (but per user and database)state, using the C FDW API of Postgres. I have gone through the API and for example I see various fdw_private fieldsin the query planning structures and callbacks but these do not seem to be relevant to my requirements. Another possiblityseems to be to use shared memory (https://www.postgresql.org/docs/12/xfunc-c.html#id-1.8.3.13.14) but it is evenless clear how to do it. Shared memory would probably work; you'd need to load the FDW via "shared_preload_libraries" and have the FDW handler function(s) update shared memory with whatever statistics you want to track. You could then define SQL functions to retrieve the stored values, and possibly persist them over server restarts by storing/retrieving them from a file. Look at "pg_stat_statements" for a good example of how to do that kind of thing. Regards Ian Barwick -- Ian Barwick https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Thank you Ian for the prompt reply! I will certainly have a look at pg_stat_statements
I also came across mysql_fdw (https://github.com/EnterpriseDB/mysql_fdw) that features a connection pool shared across queries. It uses a hash table with the serverid and userid as lookup key : https://github.com/EnterpriseDB/mysql_fdw/blob/REL-2_5_4/connection.c#L55 The hash table is allocated in the cache memory context but it worries me that 1) no locks are used, 2) the "ConnectionHash" variable is declared static so in the multi-process architecture of Postgres could have been the case that multiple copies of this exist when the shared library of mysql_fdw is loaded?
Best regards
Stelios
On 4 Aug 2020, at 06:25, Ian Lawrence Barwick <barwick@gmail.com> wrote:2020年8月4日(火) 1:24 Stelios Sfakianakis <sgsfak@gmail.com>:
Hi,
I am trying to implement a FDW in Postgres for accessing a web api and I would like to keep information like for example the total number of requests submiited. Ideally these data should be kept “per-user” and of course with the proper locking to eliminate race conditions. So the question I have is how to design such a global (but per user and database) state, using the C FDW API of Postgres. I have gone through the API and for example I see various fdw_private fields in the query planning structures and callbacks but these do not seem to be relevant to my requirements. Another possiblity seems to be to use shared memory (https://www.postgresql.org/docs/12/xfunc-c.html#id-1.8.3.13.14) but it is even less clear how to do it.
Shared memory would probably work; you'd need to load the FDW via
"shared_preload_libraries" and have the FDW handler function(s) update
shared memory with whatever statistics you want to track. You could
then define SQL functions to retrieve the stored values, and possibly
persist them over server restarts by storing/retrieving them from a
file.
Look at "pg_stat_statements" for a good example of how to do that kind of thing.
Regards
Ian Barwick
--
Ian Barwick https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
2020年8月4日(火) 14:54 Stelios Sfakianakis <sgsfak@gmail.com>: > On 4 Aug 2020, at 06:25, Ian Lawrence Barwick <barwick@gmail.com> wrote: > > 2020年8月4日(火) 1:24 Stelios Sfakianakis <sgsfak@gmail.com>: > > Hi, > >>> I am trying to implement a FDW in Postgres for accessing a web api and I would like to keep information like for examplethe total number of requests submiited. Ideally these data should be kept “per-user” and of course with the properlocking to eliminate race conditions. So the question I have is how to design such a global (but per user and database)state, using the C FDW API of Postgres. I have gone through the API and for example I see various fdw_private fieldsin the query planning structures and callbacks but these do not seem to be relevant to my requirements. Another possiblityseems to be to use shared memory (https://www.postgresql.org/docs/12/xfunc-c.html#id-1.8.3.13.14) but it is evenless clear how to do it. > > >> Shared memory would probably work; you'd need to load the FDW via >> "shared_preload_libraries" and have the FDW handler function(s) update >> shared memory with whatever statistics you want to track. You could >> then define SQL functions to retrieve the stored values, and possibly >> persist them over server restarts by storing/retrieving them from a >> file. > >> Look at "pg_stat_statements" for a good example of how to do that kind of thing. > > Thank you Ian for the prompt reply! I will certainly have a look at pg_stat_statements > > I also came across mysql_fdw (https://github.com/EnterpriseDB/mysql_fdw) that features a connection pool shared acrossqueries. It uses a hash table with the serverid and userid as lookup key : https://github.com/EnterpriseDB/mysql_fdw/blob/REL-2_5_4/connection.c#L55 This is essentially the same as what "postgres_fdw" and similar FDW implementations do. > The hash table is allocated in the cache memory context but it worries me that 1) no locks are used, 2) the "ConnectionHash"variable is declared static so in the multi-process architecture of Postgres could have been the case thatmultiple copies of this exist when the shared library of mysql_fdw is loaded? The hash table is specific to each running backend so will only be accessed by that process. Pre-loading a shared library just gives the library an opportunity to set up shared memory etc. You can always try adding one of the FDW libraries to "shared_preload_libraries" and see what happens (theoretically nothing). Regards Ian Barwick > > Best regards > Stelios > > > Regards > > Ian Barwick > > -- > Ian Barwick https://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services > >
Thank you again, I have another question in order to make sure I have a clear understanding: > On 4 Aug 2020, at 11:24, Ian Lawrence Barwick <barwick@gmail.com> wrote: > > The hash table is specific to each running backend so will only be > accessed by that process. > > Pre-loading a shared library just gives the library an opportunity to > set up shared memory etc. You can always try adding one of the FDW > libraries to "shared_preload_libraries" and see what happens > (theoretically nothing). > My impression was that since each client (e.g. libpq) connection results in the creation of a Postgres process in the backend(https://www.postgresql.org/developer/backend/) then this (mysql) "connection pool" hash table is not global perse and shared among the different client / users sessions. But that defeats the purpose, no? Thank you Best Stelios
On 2020/08/04 19:21, Stelios Sfakianakis wrote: > Thank you again, I have another question in order to make sure I have a clear understanding: > > >> On 4 Aug 2020, at 11:24, Ian Lawrence Barwick <barwick@gmail.com> wrote: >> >> The hash table is specific to each running backend so will only be >> accessed by that process. >> >> Pre-loading a shared library just gives the library an opportunity to >> set up shared memory etc. You can always try adding one of the FDW >> libraries to "shared_preload_libraries" and see what happens >> (theoretically nothing). >> > > My impression was that since each client (e.g. libpq) connection results in the creation of a Postgres process in thebackend (https://www.postgresql.org/developer/backend/) then this (mysql) "connection pool" hash table is not globalper se and shared among the different client / users sessions. Correct, the connections are specific to each individual backend. > But that defeats the purpose, no? The purpose is to cache connections within the session, to avoid the overhead of reconnecting to the remote server each time a query for that server is issued in that session. Regards Ian Barwick -- Ian Barwick https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2020/08/04 19:21, Stelios Sfakianakis wrote: > Thank you again, I have another question in order to make sure I have a clear understanding: > > >> On 4 Aug 2020, at 11:24, Ian Lawrence Barwick <barwick@gmail.com> wrote: >> >> The hash table is specific to each running backend so will only be >> accessed by that process. >> >> Pre-loading a shared library just gives the library an opportunity to >> set up shared memory etc. You can always try adding one of the FDW >> libraries to "shared_preload_libraries" and see what happens >> (theoretically nothing). >> > > My impression was that since each client (e.g. libpq) connection results in the creation of a Postgres process in thebackend (https://www.postgresql.org/developer/backend/) then this (mysql) "connection pool" hash table is not globalper se and shared among the different client / users sessions. Correct, the connections are specific to each individual backend. > But that defeats the purpose, no? The purpose is to cache connections within the session, to avoid the overhead of reconnecting to the remote server each time a query for that server is issued in that session. Regards Ian Barwick -- Ian Barwick https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services