Thread: Support for VACUUMing Foreign Tables

Support for VACUUMing Foreign Tables

From
Bharath Rupireddy
Date:
Hi,

I think it will be useful to allow foreign tables to be VACUUMed if
the underlying FDW supports, currently VACUUM doesn't support foreign
tables, see [1]. In case of postgres_fdw, if foreign tables are
specified in the local VACUUM command, a per-server remote VACUUM
command can be prepared with the foreign tables that belong to the
same server and sent to the foreign server. This design is similar to
TRUNCATE on foreign tables committed as part of 8ff1c946. Although,
this may not be much useful for FDWs that connect to remote non-MVCC
databases where the concept of VACUUM may not apply, but for
postgres_fdw and others it might help.

I would like to hear opinions from the hackers. If it's not
possible/feasible to do this, please let me know the reasons. Thanks.

[1] postgres=# vacuum ft1;   ---> ft1 is a foreign table
WARNING:  skipping "ft1" --- cannot vacuum non-tables or special system tables

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



RE: Support for VACUUMing Foreign Tables

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
> I think it will be useful to allow foreign tables to be VACUUMed if
> the underlying FDW supports, currently VACUUM doesn't support foreign
> tables, see [1].

Could you let us imagine more concretely how useful it will be?  While TRUNCATE can be part of an application's data
processingas alternative to DELETE, I think VACUUM is purely the data storage maintenance that's performed by the DBA
andcan be done naturally locally on the server where the table resides.  (The existing ANALYZE on FDW is an exception;
it'suseful to also have data statistics locally.)
 


> this may not be much useful for FDWs that connect to remote non-MVCC
> databases where the concept of VACUUM may not apply, but for
> postgres_fdw and others it might help.

Can you show some examples of "others"?  I believe we should be careful not to make the FDW interface a swamp for
functionsthat are only convenient for PostgreSQL.
 

How about adding a routine to the FDW interface that allows to execute an arbitrary command like the following?  VACUUM
willbe able to use this.
 

    PGresult *DoCommandPathThrough(ForeignTable *table, const char *command);

Or, maybe it's more flexible to use ForeignServer instead of ForeignTable.


Regards
Takayuki Tsunakawa


Re: Support for VACUUMing Foreign Tables

From
Michael Paquier
Date:
On Fri, May 14, 2021 at 01:05:02AM +0000, tsunakawa.takay@fujitsu.com wrote:
> Could you let us imagine more concretely how useful it will be?
> While TRUNCATE can be part of an application's data processing as
> alternative to DELETE, I think VACUUM is purely the data storage
> maintenance that's performed by the DBA and can be done naturally
> locally on the server where the table resides.  (The existing
> ANALYZE on FDW is an exception; it's useful to also have data
> statistics locally.)

The concept of vacuuming applies to PG because of its concepts behind
MVCC.  Thinking broader, in which aspect can that apply to FDWs in
general?

> How about adding a routine to the FDW interface that allows to
> execute an arbitrary command like the following?  VACUUM will be
> able to use this.
>
>     PGresult *DoCommandPathThrough(ForeignTable *table, const char *command);
>
> Or, maybe it's more flexible to use ForeignServer instead of ForeignTable.

Being able to pass down to remote servers arbitrary command strings
sounds like a recipy for security holes, IMO.
--
Michael

Attachment

Re: Support for VACUUMing Foreign Tables

From
Dilip Kumar
Date:
On Fri, May 14, 2021 at 6:35 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
> > I think it will be useful to allow foreign tables to be VACUUMed if
> > the underlying FDW supports, currently VACUUM doesn't support foreign
> > tables, see [1].
>
> Could you let us imagine more concretely how useful it will be?  While TRUNCATE can be part of an application's data
processingas alternative to DELETE, I think VACUUM is purely the data storage maintenance that's performed by the DBA
andcan be done naturally locally on the server where the table resides.  (The existing ANALYZE on FDW is an exception;
it'suseful to also have data statistics locally.) 

I agree that TRUNCATE is a user-visible command so it is good to send
such a command to a remote server.  But, sending ANALYZE and VACUUM to
FDW can have a similar use case.  I mean based on the current data
changes/operation the DBA can decide whether it needs to generate the
statistic or whether it needs to do garbage collection.  I think
ideally both these operations can be performed locally on the remote
server but if we are sending ANALYZE to the remote server then IMHO
there is some merit to sending VACUUM as well.  Having said that, I
think the purpose of ANALYZE can be generic across the databases that
we want to update the statistic but VACUUM is different, it too much
depends upon how the data is stored (what MVCC mechanism they are
using) on the remote server and what is the vacuuming need for that
database.  So maybe garbage collection should be controlled locally by
the DBA on that server.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Support for VACUUMing Foreign Tables

From
Bharath Rupireddy
Date:
On Fri, May 14, 2021 at 6:35 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
> > I think it will be useful to allow foreign tables to be VACUUMed if
> > the underlying FDW supports, currently VACUUM doesn't support foreign
> > tables, see [1].
>
> Could you let us imagine more concretely how useful it will be?  While TRUNCATE can be part of an application's data
processingas alternative to DELETE, I think VACUUM is purely the data storage maintenance that's performed by the DBA
andcan be done naturally locally on the server where the table resides.  (The existing ANALYZE on FDW is an exception;
it'suseful to also have data statistics locally.) 

This can be useful in situations like where there are many remote
postgres servers that are connected to a single coordinator on which
foreign tables are defined for each of the remote tables. In this
case, the DBA (or whoever is responsible to do that job) doesn't have
to figure out which remote server should be logged onto to perform the
VACUUM. They can issue VACUUM command on the foreign table from the
coordinator server.

> > this may not be much useful for FDWs that connect to remote non-MVCC
> > databases where the concept of VACUUM may not apply, but for
> > postgres_fdw and others it might help.
>
> Can you show some examples of "others"?  I believe we should be careful not to make the FDW interface a swamp for
functionsthat are only convenient for PostgreSQL. 

There are other databases that have MVCC implemented for which the
bloat clean up might be necessary at some point. They may not have the
same terminology that postgres has for cleaning up the bloat. For
instance, MySQL (instead of VACUUM they have OPTIMIZE TABLE command)
which can be connected to postgres_fdw using supported fdw

And see [1] for the databases that have MVCC support. I'm not sure if
all of them have a FDW to connect to postgres.

[1] https://dbdb.io/browse?concurrency-control=multi-version-concurrency-control-mvcc

> How about adding a routine to the FDW interface that allows to execute an arbitrary command like the following?
VACUUMwill be able to use this. 
>
>     PGresult *DoCommandPathThrough(ForeignTable *table, const char *command);
>
> Or, maybe it's more flexible to use ForeignServer instead of ForeignTable.

I agree with Michael Paquier's response to this point that it can be
an issue from a security standpoint. But we could have had such kind
of a generic API for commands like TRUNCATE, ANALYZE, VACUUM etc. void
ExecRemoteCommand(ForeignServer *server, const char *command, void
*input_params, void *output_params); with the API knowing all the
supported commands and erroring out on unsupported commands. Now, we
have a separate API for each of the supported commands which looks
actually cleaner.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Support for VACUUMing Foreign Tables

From
Bharath Rupireddy
Date:
On Fri, May 14, 2021 at 11:48 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, May 14, 2021 at 6:35 AM tsunakawa.takay@fujitsu.com
> <tsunakawa.takay@fujitsu.com> wrote:
> >
> > From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
> > > I think it will be useful to allow foreign tables to be VACUUMed if
> > > the underlying FDW supports, currently VACUUM doesn't support foreign
> > > tables, see [1].
> >
> > Could you let us imagine more concretely how useful it will be?  While TRUNCATE can be part of an application's
dataprocessing as alternative to DELETE, I think VACUUM is purely the data storage maintenance that's performed by the
DBAand can be done naturally locally on the server where the table resides.  (The existing ANALYZE on FDW is an
exception;it's useful to also have data statistics locally.) 
>
> I agree that TRUNCATE is a user-visible command so it is good to send
> such a command to a remote server.  But, sending ANALYZE and VACUUM to
> FDW can have a similar use case.  I mean based on the current data
> changes/operation the DBA can decide whether it needs to generate the
> statistic or whether it needs to do garbage collection.  I think
> ideally both these operations can be performed locally on the remote
> server but if we are sending ANALYZE to the remote server then IMHO
> there is some merit to sending VACUUM as well.  Having said that, I
> think the purpose of ANALYZE can be generic across the databases that
> we want to update the statistic but VACUUM is different, it too much
> depends upon how the data is stored (what MVCC mechanism they are
> using) on the remote server and what is the vacuuming need for that
> database.  So maybe garbage collection should be controlled locally by
> the DBA on that server.

Agree. Different MVCC databases can have different commands to clean
up the bloat, their implementation of the vacuum's FdwRoutine can be
implemented accordingly. postgres_fdw can prepare the "VACUUM rel;"
command. Having said that, I don't think all the remote databases will
have the same ANALYZE rel; or TRUNCATE rel; commands either. It's
better left to the implementation of the FdwRoutine for a particular
remote database.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Support for VACUUMing Foreign Tables

From
Laurenz Albe
Date:
On Thu, 2021-05-13 at 09:44 +0530, Bharath Rupireddy wrote:
> I think it will be useful to allow foreign tables to be VACUUMed if
> the underlying FDW supports, currently VACUUM doesn't support foreign
> tables, see [1]. In case of postgres_fdw, if foreign tables are
> specified in the local VACUUM command, a per-server remote VACUUM
> command can be prepared with the foreign tables that belong to the
> same server and sent to the foreign server. This design is similar to
> TRUNCATE on foreign tables committed as part of 8ff1c946. Although,
> this may not be much useful for FDWs that connect to remote non-MVCC
> databases where the concept of VACUUM may not apply, but for
> postgres_fdw and others it might help.
> 
> I would like to hear opinions from the hackers. If it's not
> possible/feasible to do this, please let me know the reasons. Thanks.

I see no value in this.

First, it wouldn't make sense for anything except postgres_fdw, so
I think it should not be part of the FDW API.  If anything, it should
mean that knowledge about postgres_fdw gets hardwired into VACUUM.

But I don't think that is a smart idea either.  Each database cluster
is in charge of vacuuming its own tables, so the better approach would
be to tune autovacuum on the remote side so that it does the right thing.

Yours,
Laurenz Albe




RE: Support for VACUUMing Foreign Tables

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
> This can be useful in situations like where there are many remote
> postgres servers that are connected to a single coordinator on which
> foreign tables are defined for each of the remote tables. In this
> case, the DBA (or whoever is responsible to do that job) doesn't have
> to figure out which remote server should be logged onto to perform the
> VACUUM. They can issue VACUUM command on the foreign table from the
> coordinator server.

I thought the FDW mechanism was initially, and probably still is, designed to access data on other data sources that
areoperated independently, mostly for data integration.  Are you thinking of shared-nothing clustering that consist of
tightlycoupled servers, because you're referring to a coordinator server?  (Is EDB (re-)starting the sharding
scale-out?)


> There are other databases that have MVCC implemented for which the
> bloat clean up might be necessary at some point. They may not have the
> same terminology that postgres has for cleaning up the bloat. For
> instance, MySQL (instead of VACUUM they have OPTIMIZE TABLE command)
> which can be connected to postgres_fdw using supported fdw

MySQL's OPTIMIZE TABLE looks like VACUUM FULL, not plain VACUUM, although I'm not completely sure.

How would the various options map to the FDW interface, such as FREEZE, VERBOSE, vacuum_truncate, index_cleanup?  Also,
howwould the following GUC settings propagate to the foreign server?
 

SET vacuum_freeze_table_age = 0;
SET vacuum_freeze_min_age = 0;
VACUUM mytable;

I think people who want to run manual VACUUM will want to control VACUUM behavior.  But I'm afraid VACUUM is too
specificto Postgres to be not very good to be incorporated into the FDW interface.
 


What's our stance toward the FDW interface?  I've thought 1 so far.

1) We carefully choose FDW routines so that many other data sources can provide implementations for.  We want to allow
accessto various data sources through the frame of Postgres.
 

2) We don't care other data sources.  We are happy if multiple Postgres instances can interoperate with each other.
Otherdata source providers can choose to implement suitable routines when they can fit their implementations into the
Postgresworld.
 



Regards
Takayuki Tsunakawa