Thread: Support for VACUUMing Foreign Tables
Hi, I think it will be useful to allow foreign tables to be VACUUMed if the underlying FDW supports, currently VACUUM doesn't support foreign tables, see [1]. In case of postgres_fdw, if foreign tables are specified in the local VACUUM command, a per-server remote VACUUM command can be prepared with the foreign tables that belong to the same server and sent to the foreign server. This design is similar to TRUNCATE on foreign tables committed as part of 8ff1c946. Although, this may not be much useful for FDWs that connect to remote non-MVCC databases where the concept of VACUUM may not apply, but for postgres_fdw and others it might help. I would like to hear opinions from the hackers. If it's not possible/feasible to do this, please let me know the reasons. Thanks. [1] postgres=# vacuum ft1; ---> ft1 is a foreign table WARNING: skipping "ft1" --- cannot vacuum non-tables or special system tables With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> > I think it will be useful to allow foreign tables to be VACUUMed if > the underlying FDW supports, currently VACUUM doesn't support foreign > tables, see [1]. Could you let us imagine more concretely how useful it will be? While TRUNCATE can be part of an application's data processingas alternative to DELETE, I think VACUUM is purely the data storage maintenance that's performed by the DBA andcan be done naturally locally on the server where the table resides. (The existing ANALYZE on FDW is an exception; it'suseful to also have data statistics locally.) > this may not be much useful for FDWs that connect to remote non-MVCC > databases where the concept of VACUUM may not apply, but for > postgres_fdw and others it might help. Can you show some examples of "others"? I believe we should be careful not to make the FDW interface a swamp for functionsthat are only convenient for PostgreSQL. How about adding a routine to the FDW interface that allows to execute an arbitrary command like the following? VACUUM willbe able to use this. PGresult *DoCommandPathThrough(ForeignTable *table, const char *command); Or, maybe it's more flexible to use ForeignServer instead of ForeignTable. Regards Takayuki Tsunakawa
On Fri, May 14, 2021 at 01:05:02AM +0000, tsunakawa.takay@fujitsu.com wrote: > Could you let us imagine more concretely how useful it will be? > While TRUNCATE can be part of an application's data processing as > alternative to DELETE, I think VACUUM is purely the data storage > maintenance that's performed by the DBA and can be done naturally > locally on the server where the table resides. (The existing > ANALYZE on FDW is an exception; it's useful to also have data > statistics locally.) The concept of vacuuming applies to PG because of its concepts behind MVCC. Thinking broader, in which aspect can that apply to FDWs in general? > How about adding a routine to the FDW interface that allows to > execute an arbitrary command like the following? VACUUM will be > able to use this. > > PGresult *DoCommandPathThrough(ForeignTable *table, const char *command); > > Or, maybe it's more flexible to use ForeignServer instead of ForeignTable. Being able to pass down to remote servers arbitrary command strings sounds like a recipy for security holes, IMO. -- Michael
Attachment
On Fri, May 14, 2021 at 6:35 AM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> > > I think it will be useful to allow foreign tables to be VACUUMed if > > the underlying FDW supports, currently VACUUM doesn't support foreign > > tables, see [1]. > > Could you let us imagine more concretely how useful it will be? While TRUNCATE can be part of an application's data processingas alternative to DELETE, I think VACUUM is purely the data storage maintenance that's performed by the DBA andcan be done naturally locally on the server where the table resides. (The existing ANALYZE on FDW is an exception; it'suseful to also have data statistics locally.) I agree that TRUNCATE is a user-visible command so it is good to send such a command to a remote server. But, sending ANALYZE and VACUUM to FDW can have a similar use case. I mean based on the current data changes/operation the DBA can decide whether it needs to generate the statistic or whether it needs to do garbage collection. I think ideally both these operations can be performed locally on the remote server but if we are sending ANALYZE to the remote server then IMHO there is some merit to sending VACUUM as well. Having said that, I think the purpose of ANALYZE can be generic across the databases that we want to update the statistic but VACUUM is different, it too much depends upon how the data is stored (what MVCC mechanism they are using) on the remote server and what is the vacuuming need for that database. So maybe garbage collection should be controlled locally by the DBA on that server. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Fri, May 14, 2021 at 6:35 AM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> > > I think it will be useful to allow foreign tables to be VACUUMed if > > the underlying FDW supports, currently VACUUM doesn't support foreign > > tables, see [1]. > > Could you let us imagine more concretely how useful it will be? While TRUNCATE can be part of an application's data processingas alternative to DELETE, I think VACUUM is purely the data storage maintenance that's performed by the DBA andcan be done naturally locally on the server where the table resides. (The existing ANALYZE on FDW is an exception; it'suseful to also have data statistics locally.) This can be useful in situations like where there are many remote postgres servers that are connected to a single coordinator on which foreign tables are defined for each of the remote tables. In this case, the DBA (or whoever is responsible to do that job) doesn't have to figure out which remote server should be logged onto to perform the VACUUM. They can issue VACUUM command on the foreign table from the coordinator server. > > this may not be much useful for FDWs that connect to remote non-MVCC > > databases where the concept of VACUUM may not apply, but for > > postgres_fdw and others it might help. > > Can you show some examples of "others"? I believe we should be careful not to make the FDW interface a swamp for functionsthat are only convenient for PostgreSQL. There are other databases that have MVCC implemented for which the bloat clean up might be necessary at some point. They may not have the same terminology that postgres has for cleaning up the bloat. For instance, MySQL (instead of VACUUM they have OPTIMIZE TABLE command) which can be connected to postgres_fdw using supported fdw And see [1] for the databases that have MVCC support. I'm not sure if all of them have a FDW to connect to postgres. [1] https://dbdb.io/browse?concurrency-control=multi-version-concurrency-control-mvcc > How about adding a routine to the FDW interface that allows to execute an arbitrary command like the following? VACUUMwill be able to use this. > > PGresult *DoCommandPathThrough(ForeignTable *table, const char *command); > > Or, maybe it's more flexible to use ForeignServer instead of ForeignTable. I agree with Michael Paquier's response to this point that it can be an issue from a security standpoint. But we could have had such kind of a generic API for commands like TRUNCATE, ANALYZE, VACUUM etc. void ExecRemoteCommand(ForeignServer *server, const char *command, void *input_params, void *output_params); with the API knowing all the supported commands and erroring out on unsupported commands. Now, we have a separate API for each of the supported commands which looks actually cleaner. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
On Fri, May 14, 2021 at 11:48 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, May 14, 2021 at 6:35 AM tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: > > > > From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> > > > I think it will be useful to allow foreign tables to be VACUUMed if > > > the underlying FDW supports, currently VACUUM doesn't support foreign > > > tables, see [1]. > > > > Could you let us imagine more concretely how useful it will be? While TRUNCATE can be part of an application's dataprocessing as alternative to DELETE, I think VACUUM is purely the data storage maintenance that's performed by the DBAand can be done naturally locally on the server where the table resides. (The existing ANALYZE on FDW is an exception;it's useful to also have data statistics locally.) > > I agree that TRUNCATE is a user-visible command so it is good to send > such a command to a remote server. But, sending ANALYZE and VACUUM to > FDW can have a similar use case. I mean based on the current data > changes/operation the DBA can decide whether it needs to generate the > statistic or whether it needs to do garbage collection. I think > ideally both these operations can be performed locally on the remote > server but if we are sending ANALYZE to the remote server then IMHO > there is some merit to sending VACUUM as well. Having said that, I > think the purpose of ANALYZE can be generic across the databases that > we want to update the statistic but VACUUM is different, it too much > depends upon how the data is stored (what MVCC mechanism they are > using) on the remote server and what is the vacuuming need for that > database. So maybe garbage collection should be controlled locally by > the DBA on that server. Agree. Different MVCC databases can have different commands to clean up the bloat, their implementation of the vacuum's FdwRoutine can be implemented accordingly. postgres_fdw can prepare the "VACUUM rel;" command. Having said that, I don't think all the remote databases will have the same ANALYZE rel; or TRUNCATE rel; commands either. It's better left to the implementation of the FdwRoutine for a particular remote database. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
On Thu, 2021-05-13 at 09:44 +0530, Bharath Rupireddy wrote: > I think it will be useful to allow foreign tables to be VACUUMed if > the underlying FDW supports, currently VACUUM doesn't support foreign > tables, see [1]. In case of postgres_fdw, if foreign tables are > specified in the local VACUUM command, a per-server remote VACUUM > command can be prepared with the foreign tables that belong to the > same server and sent to the foreign server. This design is similar to > TRUNCATE on foreign tables committed as part of 8ff1c946. Although, > this may not be much useful for FDWs that connect to remote non-MVCC > databases where the concept of VACUUM may not apply, but for > postgres_fdw and others it might help. > > I would like to hear opinions from the hackers. If it's not > possible/feasible to do this, please let me know the reasons. Thanks. I see no value in this. First, it wouldn't make sense for anything except postgres_fdw, so I think it should not be part of the FDW API. If anything, it should mean that knowledge about postgres_fdw gets hardwired into VACUUM. But I don't think that is a smart idea either. Each database cluster is in charge of vacuuming its own tables, so the better approach would be to tune autovacuum on the remote side so that it does the right thing. Yours, Laurenz Albe
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> > This can be useful in situations like where there are many remote > postgres servers that are connected to a single coordinator on which > foreign tables are defined for each of the remote tables. In this > case, the DBA (or whoever is responsible to do that job) doesn't have > to figure out which remote server should be logged onto to perform the > VACUUM. They can issue VACUUM command on the foreign table from the > coordinator server. I thought the FDW mechanism was initially, and probably still is, designed to access data on other data sources that areoperated independently, mostly for data integration. Are you thinking of shared-nothing clustering that consist of tightlycoupled servers, because you're referring to a coordinator server? (Is EDB (re-)starting the sharding scale-out?) > There are other databases that have MVCC implemented for which the > bloat clean up might be necessary at some point. They may not have the > same terminology that postgres has for cleaning up the bloat. For > instance, MySQL (instead of VACUUM they have OPTIMIZE TABLE command) > which can be connected to postgres_fdw using supported fdw MySQL's OPTIMIZE TABLE looks like VACUUM FULL, not plain VACUUM, although I'm not completely sure. How would the various options map to the FDW interface, such as FREEZE, VERBOSE, vacuum_truncate, index_cleanup? Also, howwould the following GUC settings propagate to the foreign server? SET vacuum_freeze_table_age = 0; SET vacuum_freeze_min_age = 0; VACUUM mytable; I think people who want to run manual VACUUM will want to control VACUUM behavior. But I'm afraid VACUUM is too specificto Postgres to be not very good to be incorporated into the FDW interface. What's our stance toward the FDW interface? I've thought 1 so far. 1) We carefully choose FDW routines so that many other data sources can provide implementations for. We want to allow accessto various data sources through the frame of Postgres. 2) We don't care other data sources. We are happy if multiple Postgres instances can interoperate with each other. Otherdata source providers can choose to implement suitable routines when they can fit their implementations into the Postgresworld. Regards Takayuki Tsunakawa