Re: postgres_fdw: using TABLESAMPLE to collect remote sample - Mailing list pgsql-hackers

From Tom Lane
Subject Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Date
Msg-id 782558.1658169910@sss.pgh.pa.us
In response to Re: postgres_fdw: using TABLESAMPLE to collect remote sample  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: postgres_fdw: using TABLESAMPLE to collect remote sample
List pgsql-hackers
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> Thanks. I'll switch this to "needs review" now.

OK, I looked through this, and attach some review suggestions in the
form of a delta patch.  (0001 below is your two patches merged, 0002
is my delta.)  A lot of the delta is comment-smithing, but not all.

After reflection I think that what you've got, ie use reltuples but
don't try to sample if reltuples <= 0, is just fine.  The remote
would only have reltuples <= 0 in a never-analyzed table, which
shouldn't be a situation that persists for long unless the table
is tiny.  Also, if reltuples is in error, the way to bet is that
it's too small thanks to the table having been enlarged.  But
an error in that direction doesn't hurt us: we'll overestimate
the required sample_frac and pull back more data than we need,
but we'll still end up with a valid sample of the right size.
So I doubt it's worth the complication to try to correct based
on relpages etc.  (Note that any such correction would almost
certainly end in increasing our estimate of reltuples.  But
it's safer to have an underestimate than an overestimate.)

I messed around with the sample_frac choosing logic slightly,
to make it skip pointless calculations if we decide right off
the bat to disable sampling.  That way we don't need to worry
about avoiding zero divide, nor do we have to wonder if any
of the later calculations could misbehave.

I left your logic about "disable if saving fewer than 100 rows"
alone, but I have to wonder if using an absolute threshold rather
than a relative one is well-conceived.  Sampling at a rate of
99.9 percent seems pretty pointless, but this code is perfectly
capable of accepting that if reltuples is big enough.  So
personally I'd do that more like

    if (sample_frac > 0.95)
        method = ANALYZE_SAMPLE_OFF;

which is simpler and would also eliminate the need for the previous
range-clamp step.  I'm not sure what the right cutoff is, but
your "100 tuples" constant is just as arbitrary.

I rearranged the docs patch too.  Where you had it, analyze_sampling
was between fdw_startup_cost/fdw_tuple_cost and the following para
discussing them, which didn't seem to me to flow well at all.  I ended
up putting analyze_sampling in its own separate list.  You could almost
make a case for giving it its own <sect3>, but I concluded that was
probably overkill.

One thing I'm not happy about, but did not touch here, is the expense
of the test cases you added.  On my machine, that adds a full 10% to
the already excessively long runtime of postgres_fdw.sql --- and I
do not think it's buying us anything.  It is not this module's job
to test whether bernoulli sampling works on partitioned tables.
I think you should just add enough to make sure we exercise the
relevant code paths in postgres_fdw itself.

With these issues addressed, I think this'd be committable.

            regards, tom lane

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index a9766f9734..ea2139fbfa 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2369,14 +2369,56 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
     appendStringInfo(buf, "::pg_catalog.regclass) / %d", BLCKSZ);
 }

+/*
+ * Construct SELECT statement to acquire a number of rows of a relation.
+ *
+ * Note: Maybe this should compare relpages and current relation size
+ * and adjust reltuples accordingly?
+ */
+void
+deparseAnalyzeTuplesSql(StringInfo buf, Relation rel)
+{
+    StringInfoData relname;
+
+    /* We'll need the remote relation name as a literal. */
+    initStringInfo(&relname);
+    deparseRelation(&relname, rel);
+
+    appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = ");
+    deparseStringLiteral(buf, relname.data);
+    appendStringInfoString(buf, "::pg_catalog.regclass");
+}
+
 /*
  * Construct SELECT statement to acquire sample rows of given relation.
  *
  * SELECT command is appended to buf, and list of columns retrieved
  * is returned to *retrieved_attrs.
+ *
+ * XXX We allow customizing the sampling method, but we only support methods
+ * we can decide based on server version. Allowing custom TSM modules (for
+ * example tsm_system_rows) might be useful, but it would require detecting
+ * which extensions are installed, to allow automatic fall-back. Moreover, the
+ * methods use different parameters (not sampling rate). So we don't do this
+ * for now, leaving it for future improvements.
+ *
+ * XXX Using remote random() to sample rows has advantages & disadvantages.
+ * The advantages are that this works on all PostgreSQL versions (unlike
+ * TABLESAMPLE), and that it does the sampling on the remote side (unlike
+ * the old approach, which transfers everything and then discards most data).
+ * We could also do "ORDER BY random() LIMIT x", which would always pick
+ * the expected number of rows, but it requires sorting so it's a bit more
+ * expensive.
+ *
+ * The disadvantage is that we still have to read all rows and evaluate the
+ * random(), while TABLESAMPLE skips most of the pages entirely.
+ *
+ * XXX What if we need only a subset of columns, e.g. ANALYZE t(a,b)?
  */
 void
-deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
+deparseAnalyzeSql(StringInfo buf, Relation rel,
+                  PgFdwSamplingMethod sample_method, double sample_frac,
+                  List **retrieved_attrs)
 {
     Oid            relid = RelationGetRelid(rel);
     TupleDesc    tupdesc = RelationGetDescr(rel);
@@ -2424,10 +2466,35 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
         appendStringInfoString(buf, "NULL");

     /*
-     * Construct FROM clause
+     * Construct FROM clause, and perhaps WHERE clause too, depending on the
+     * selected sampling method.
      */
     appendStringInfoString(buf, " FROM ");
     deparseRelation(buf, rel);
+
+    switch (sample_method)
+    {
+        case ANALYZE_SAMPLE_OFF:
+            /* nothing to do here */
+            break;
+
+        case ANALYZE_SAMPLE_RANDOM:
+            appendStringInfo(buf, " WHERE pg_catalog.random() < %f", sample_frac);
+            break;
+
+        case ANALYZE_SAMPLE_SYSTEM:
+            appendStringInfo(buf, " TABLESAMPLE SYSTEM(%f)", (100.0 * sample_frac));
+            break;
+
+        case ANALYZE_SAMPLE_BERNOULLI:
+            appendStringInfo(buf, " TABLESAMPLE BERNOULLI(%f)", (100.0 * sample_frac));
+            break;
+
+        case ANALYZE_SAMPLE_AUTO:
+            /* should have been resolved into actual method */
+            elog(ERROR, "unexpected sampling method");
+            break;
+    }
 }

 /*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index ebf9ea3598..a37d824d44 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9575,7 +9575,7 @@ DO $d$
     END;
 $d$;
 ERROR:  invalid option "password"
-HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, sslcrldir, sslsni, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, truncatable, fetch_size, batch_size, async_capable, parallel_commit, keep_connections
+HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, sslcrldir, sslsni, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, truncatable, fetch_size, batch_size, async_capable, parallel_commit, keep_connections, analyze_sampling
 CONTEXT:  SQL statement "ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw')"
 PL/pgSQL function inline_code_block line 3 at EXECUTE
 -- If we add a password for our user mapping instead, we should get a different
@@ -11339,3 +11339,45 @@ SELECT * FROM prem2;

 ALTER SERVER loopback OPTIONS (DROP parallel_commit);
 ALTER SERVER loopback2 OPTIONS (DROP parallel_commit);
+-- ===================================================================
+-- test for ANALYZE
+-- ===================================================================
+CREATE TABLE analyze_rtable1 (id int primary key, a text, b bigint);
+CREATE TABLE analyze_rtable2 (id int primary key, a text, b bigint);
+CREATE TABLE analyze_ptable (id int, a text, b bigint) PARTITION BY HASH(id);
+CREATE FOREIGN TABLE analyze_ftable__p1 PARTITION OF analyze_ptable
+                                    FOR VALUES WITH (MODULUS 2, REMAINDER 0)
+       SERVER loopback OPTIONS (table_name 'analyze_rtable1');
+CREATE FOREIGN TABLE analyze_ftable__p2 PARTITION OF analyze_ptable
+                                    FOR VALUES WITH (MODULUS 2, REMAINDER 1)
+       SERVER loopback OPTIONS (table_name 'analyze_rtable2');
+INSERT INTO analyze_ptable (SELECT x FROM generate_series(1,5000) x);
+-- analyze the 'local' tables to update relpages/reltuples
+ANALYZE analyze_rtable1, analyze_rtable2;
+-- now analyze the remote tables directly - this expects to scan everything,
+-- so should not do any sampling
+ANALYZE analyze_ftable__p1;
+ANALYZE analyze_ftable__p2;
+-- now analyze the parent - this should scan everything too, because 30k
+-- sample requires everything to be scanned
+ANALYZE analyze_ptable;
+-- now lower the target to 10, which requires only 3k rows sample, so about
+-- 1500 rows from each partition, so sampling will kick in, by default with
+-- the 'bernoulli' tablesample method
+SET default_statistics_target = 10;
+ANALYZE analyze_ptable;
+-- now alter the method for remote server to 'system'
+ALTER SERVER loopback OPTIONS (analyze_sampling 'system');
+ANALYZE analyze_ptable;
+-- now alter the method for remote table to 'random', to not use tablesample
+-- but the 'legacy' sampling, and disable sampling for the other partition
+ALTER FOREIGN TABLE analyze_ftable__p1 OPTIONS (ADD analyze_sampling 'random');
+ALTER FOREIGN TABLE analyze_ftable__p2 OPTIONS (ADD analyze_sampling 'off');
+ANALYZE analyze_ptable;
+-- now add more data, so that each partition exceeds the statistics target
+INSERT INTO analyze_ptable (SELECT x FROM generate_series(5001, 10000) x);
+ANALYZE analyze_rtable1, analyze_rtable2;
+ANALYZE analyze_ptable;
+-- cleanup
+DROP FOREIGN TABLE analyze_ftable__p1, analyze_ftable__p2;
+DROP TABLE analyze_ptable, analyze_rtable1, analyze_rtable2;
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index cd2ef234d6..99f5099929 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -205,6 +205,23 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
                          errmsg("sslcert and sslkey are superuser-only"),
                         errhint("User mappings with the sslcert or sslkey options set may only be created or modified by the superuser")));
         }
+        else if (strcmp(def->defname, "analyze_sampling") == 0)
+        {
+            char       *value;
+
+            value = defGetString(def);
+
+            /* we recognize off/auto/random/system/bernoulli */
+            if (strcmp(value, "off") != 0 &&
+                strcmp(value, "auto") != 0 &&
+                strcmp(value, "random") != 0 &&
+                strcmp(value, "system") != 0 &&
+                strcmp(value, "bernoulli") != 0)
+                ereport(ERROR,
+                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                         errmsg("invalid value for string option \"%s\": %s",
+                                def->defname, value)));
+        }
     }

     PG_RETURN_VOID();
@@ -252,6 +269,10 @@ InitPgFdwOptions(void)
         {"keep_connections", ForeignServerRelationId, false},
         {"password_required", UserMappingRelationId, false},

+        /* sampling is available on both server and table */
+        {"analyze_sampling", ForeignServerRelationId, false},
+        {"analyze_sampling", ForeignTableRelationId, false},
+
         /*
          * sslcert and sslkey are in fact libpq options, but we repeat them
          * here to allow them to appear in both foreign server context (when
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index f3b93954ee..e2df32830f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -4969,6 +4969,68 @@ postgresAnalyzeForeignTable(Relation relation,
     return true;
 }

+/*
+ * postgresCountTuplesForForeignTable
+ *        Count tuples in foreign table (just get pg_class.reltuples).
+ *
+ * Note: It's unclear how accurate reltuples is, maybe size that using
+ * relpages and simple assumptions (1 tuple per page, ...)? Using
+ * tsm_system_rows would make this somewhat unnecessary.
+ */
+static double
+postgresCountTuplesForForeignTable(Relation relation)
+{
+    ForeignTable *table;
+    UserMapping *user;
+    PGconn       *conn;
+    StringInfoData sql;
+    PGresult   *volatile res = NULL;
+    double        reltuples = -1;
+
+    /*
+     * Now we have to get the number of pages.  It's annoying that the ANALYZE
+     * API requires us to return that now, because it forces some duplication
+     * of effort between this routine and postgresAcquireSampleRowsFunc.  But
+     * it's probably not worth redefining that API at this point.
+     */
+
+    /*
+     * Get the connection to use.  We do the remote access as the table's
+     * owner, even if the ANALYZE was started by some other user.
+     */
+    table = GetForeignTable(RelationGetRelid(relation));
+    user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
+    conn = GetConnection(user, false, NULL);
+
+    /*
+     * Construct command to get page count for relation.
+     */
+    initStringInfo(&sql);
+    deparseAnalyzeTuplesSql(&sql, relation);
+
+    /* In what follows, do not risk leaking any PGresults. */
+    PG_TRY();
+    {
+        res = pgfdw_exec_query(conn, sql.data, NULL);
+        if (PQresultStatus(res) != PGRES_TUPLES_OK)
+            pgfdw_report_error(ERROR, res, conn, false, sql.data);
+
+        if (PQntuples(res) != 1 || PQnfields(res) != 1)
+            elog(ERROR, "unexpected result from deparseAnalyzeSizeSql query");
+        reltuples = strtod(PQgetvalue(res, 0, 0), NULL);
+    }
+    PG_FINALLY();
+    {
+        if (res)
+            PQclear(res);
+    }
+    PG_END_TRY();
+
+    ReleaseConnection(conn);
+
+    return reltuples;
+}
+
 /*
  * Acquire a random sample of rows from foreign table managed by postgres_fdw.
  *
@@ -4999,6 +5061,13 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
     unsigned int cursor_number;
     StringInfoData sql;
     PGresult   *volatile res = NULL;
+    ListCell   *lc;
+    int            server_version_num;
+
+    /* analyze sampling enabled by default, if available */
+    PgFdwSamplingMethod    method = ANALYZE_SAMPLE_AUTO;
+    double        sample_frac = -1.0;
+    double        reltuples;

     /* Initialize workspace state */
     astate.rel = relation;
@@ -5026,20 +5095,139 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
     user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
     conn = GetConnection(user, false, NULL);

+    /* We'll need server version, so fetch it now. */
+    server_version_num = PQserverVersion(conn);
+
+    /*
+     * Should we try do the sampling for analyze on the remote server?
+     */
+    foreach(lc, server->options)
+    {
+        DefElem    *def = (DefElem *) lfirst(lc);
+
+        if (strcmp(def->defname, "analyze_sampling") == 0)
+        {
+            char *value = defGetString(def);
+
+            if (strcmp(value, "off") == 0)
+                method = ANALYZE_SAMPLE_OFF;
+            else if (strcmp(value, "auto") == 0)
+                method = ANALYZE_SAMPLE_AUTO;
+            else if (strcmp(value, "random") == 0)
+                method = ANALYZE_SAMPLE_RANDOM;
+            else if (strcmp(value, "system") == 0)
+                method = ANALYZE_SAMPLE_SYSTEM;
+            else if (strcmp(value, "bernoulli") == 0)
+                method = ANALYZE_SAMPLE_BERNOULLI;
+
+            break;
+        }
+    }
+
+    foreach(lc, table->options)
+    {
+        DefElem    *def = (DefElem *) lfirst(lc);
+
+        if (strcmp(def->defname, "analyze_sampling") == 0)
+        {
+            char *value = defGetString(def);
+
+            if (strcmp(value, "off") == 0)
+                method = ANALYZE_SAMPLE_OFF;
+            else if (strcmp(value, "auto") == 0)
+                method = ANALYZE_SAMPLE_AUTO;
+            else if (strcmp(value, "random") == 0)
+                method = ANALYZE_SAMPLE_RANDOM;
+            else if (strcmp(value, "system") == 0)
+                method = ANALYZE_SAMPLE_SYSTEM;
+            else if (strcmp(value, "bernoulli") == 0)
+                method = ANALYZE_SAMPLE_BERNOULLI;
+
+            break;
+        }
+    }
+
+    /*
+     * Error out if one of the TABLESAMPLE methods was explicitly requested,
+     * but the server does not support it.
+     */
+    if ((server_version_num < 90500) &&
+        (method == ANALYZE_SAMPLE_SYSTEM ||
+         method == ANALYZE_SAMPLE_BERNOULLI))
+        ereport(ERROR,
+                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                 errmsg("remote server does not support TABLESAMPLE feature")));
+
+    /*
+     * For "auto" method, pick the one we believe is best. For servers with
+     * TABLESAMPLE support we pick BERNOULLI, for old servers we fall back to
+     * random() to at least reduce network transfer.
+     */
+    if (method == ANALYZE_SAMPLE_AUTO)
+    {
+        if (server_version_num < 90500)
+            method = ANALYZE_SAMPLE_RANDOM;
+        else
+            method = ANALYZE_SAMPLE_BERNOULLI;
+    }
+
+    /*
+     * If we've decided to do remote sampling, calculate the sampling rate. We
+     * need to get the number of tuples from the remote server, so we skip the
+     * network round-trip if not needed.
+     */
+    if (method != ANALYZE_SAMPLE_OFF)
+    {
+        reltuples = postgresCountTuplesForForeignTable(relation);
+
+        /*
+         * No rows or we expect to sample everything - disable sampling after
+         * all (and make sure we don't divide by 0 in sample_frac calculation.)
+         */
+        if ((reltuples <= 0) || (targrows >= reltuples))
+            method = ANALYZE_SAMPLE_OFF;
+
+        /* Make sure we don't divide by 0 when calculating the rate. */
+        sample_frac = targrows / Max(1.0, reltuples);
+
+        /*
+         * Let's sample a bit more (10%), we'll reduce the sample locally.
+         *
+         * XXX Not sure this is really necessary. If we don't trust the remote
+         * sampling to sample the right number of rows, we should not use it.
+         */
+        sample_frac *= 1.1;
+
+        /*
+         * Ensure the sampling rate is between 0.0 and 1.0, even after the
+         * 10% adjustment above.
+         */
+        sample_frac = Min(1.0, Max(0.0, sample_frac));
+
+        /*
+         * If we expect the sampling to reduce very few rows, just disable it
+         * and read the whole remote table. We decide based on the number of
+         * rows we expect to "eliminate" by sampling. If saving fewer than
+         * 100 rows, we disable sampling.
+         */
+        if (reltuples * (1 - sample_frac) < 100.0)
+            method = ANALYZE_SAMPLE_OFF;
+    }
+
     /*
      * Construct cursor that retrieves whole rows from remote.
      */
     cursor_number = GetCursorNumber(conn);
     initStringInfo(&sql);
     appendStringInfo(&sql, "DECLARE c%u CURSOR FOR ", cursor_number);
-    deparseAnalyzeSql(&sql, relation, &astate.retrieved_attrs);
+
+    deparseAnalyzeSql(&sql, relation, method, sample_frac, &astate.retrieved_attrs);

     /* In what follows, do not risk leaking any PGresults. */
     PG_TRY();
     {
         char        fetch_sql[64];
         int            fetch_size;
-        ListCell   *lc;

         res = pgfdw_exec_query(conn, sql.data, NULL);
         if (PQresultStatus(res) != PGRES_COMMAND_OK)
@@ -5126,8 +5314,15 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
     /* We assume that we have no dead tuple. */
     *totaldeadrows = 0.0;

-    /* We've retrieved all living tuples from foreign server. */
-    *totalrows = astate.samplerows;
+    /*
+     * Without ANALYZE sampling, we've retrieved all living tuples from foreign
+     * server, so just use that. Otherwise we have the reltuples estimate we
+     * got from the remote side.
+     */
+    if (method == ANALYZE_SAMPLE_OFF)
+        *totalrows = astate.samplerows;
+    else
+        *totalrows = reltuples;

     /*
      * Emit some interesting relation info
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 21f2b20ce8..1c2a6045a9 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -134,6 +134,18 @@ typedef struct PgFdwConnState
     AsyncRequest *pendingAreq;    /* pending async request */
 } PgFdwConnState;

+/*
+ * Method used by ANALYZE to sample remote rows.
+ */
+typedef enum PgFdwSamplingMethod
+{
+    ANALYZE_SAMPLE_OFF,            /* no remote sampling */
+    ANALYZE_SAMPLE_AUTO,        /* choose by server version */
+    ANALYZE_SAMPLE_RANDOM,        /* remote random() */
+    ANALYZE_SAMPLE_SYSTEM,        /* TABLESAMPLE system */
+    ANALYZE_SAMPLE_BERNOULLI    /* TABLESAMPLE bernoulli */
+} PgFdwSamplingMethod;
+
 /* in postgres_fdw.c */
 extern int    set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
@@ -211,7 +223,10 @@ extern void deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
                                    List *returningList,
                                    List **retrieved_attrs);
 extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
+extern void deparseAnalyzeTuplesSql(StringInfo buf, Relation rel);
 extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
+                              PgFdwSamplingMethod sample_method,
+                              double sample_frac,
                               List **retrieved_attrs);
 extern void deparseTruncateSql(StringInfo buf,
                                List *rels,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index b7817c5a41..d9bd800aa2 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3655,3 +3655,57 @@ SELECT * FROM prem2;

 ALTER SERVER loopback OPTIONS (DROP parallel_commit);
 ALTER SERVER loopback2 OPTIONS (DROP parallel_commit);
+
+
+-- ===================================================================
+-- test for ANALYZE
+-- ===================================================================
+CREATE TABLE analyze_rtable1 (id int primary key, a text, b bigint);
+CREATE TABLE analyze_rtable2 (id int primary key, a text, b bigint);
+
+CREATE TABLE analyze_ptable (id int, a text, b bigint) PARTITION BY HASH(id);
+CREATE FOREIGN TABLE analyze_ftable__p1 PARTITION OF analyze_ptable
+                                    FOR VALUES WITH (MODULUS 2, REMAINDER 0)
+       SERVER loopback OPTIONS (table_name 'analyze_rtable1');
+CREATE FOREIGN TABLE analyze_ftable__p2 PARTITION OF analyze_ptable
+                                    FOR VALUES WITH (MODULUS 2, REMAINDER 1)
+       SERVER loopback OPTIONS (table_name 'analyze_rtable2');
+INSERT INTO analyze_ptable (SELECT x FROM generate_series(1,5000) x);
+
+-- analyze the 'local' tables to update relpages/reltuples
+ANALYZE analyze_rtable1, analyze_rtable2;
+
+-- now analyze the remote tables directly - this expects to scan everything,
+-- so should not do any sampling
+ANALYZE analyze_ftable__p1;
+ANALYZE analyze_ftable__p2;
+
+-- now analyze the parent - this should scan everything too, because 30k
+-- sample requires everything to be scanned
+ANALYZE analyze_ptable;
+
+-- now lower the target to 10, which requires only 3k rows sample, so about
+-- 1500 rows from each partition, so sampling will kick in, by default with
+-- the 'bernoulli' tablesample method
+SET default_statistics_target = 10;
+ANALYZE analyze_ptable;
+
+-- now alter the method for remote server to 'system'
+ALTER SERVER loopback OPTIONS (analyze_sampling 'system');
+ANALYZE analyze_ptable;
+
+-- now alter the method for remote table to 'random', to not use tablesample
+-- but the 'legacy' sampling, and disable sampling for the other partition
+ALTER FOREIGN TABLE analyze_ftable__p1 OPTIONS (ADD analyze_sampling 'random');
+ALTER FOREIGN TABLE analyze_ftable__p2 OPTIONS (ADD analyze_sampling 'off');
+ANALYZE analyze_ptable;
+
+-- now add more data, so that each partition exceeds the statistics target
+INSERT INTO analyze_ptable (SELECT x FROM generate_series(5001, 10000) x);
+
+ANALYZE analyze_rtable1, analyze_rtable2;
+ANALYZE analyze_ptable;
+
+-- cleanup
+DROP FOREIGN TABLE analyze_ftable__p1, analyze_ftable__p2;
+DROP TABLE analyze_ptable, analyze_rtable1, analyze_rtable2;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index bfd344cdc0..d44c8cdd71 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -305,6 +305,28 @@ OPTIONS (ADD password_required 'false');
      </listitem>
     </varlistentry>

+    <varlistentry>
+     <term><literal>analyze_sampling</literal> (<type>text</type>)</term>
+     <listitem>
+      <para>
+       This option determines whether <command>ANALYZE</command> on a foreign
+       table samples the data on the remote node, or reads and transfers
+       all data and performs the sampling locally. The supported values
+       are <literal>off</literal>, <literal>random</literal>,
+       <literal>system</literal>, <literal>bernoulli</literal> and
+       <literal>auto</literal>. <literal>off</literal> disables remote
+       sampling, so all data are transferred and sampled locally.
+       <literal>random</literal> performs remote sampling using the
+       <literal>random()</literal> function, while <literal>system</literal>
+       and <literal>bernoulli</literal> rely on the built-in
+       <literal>TABLESAMPLE</literal> methods. <literal>random</literal>
+       works on all server versions,
+       while <literal>TABLESAMPLE</literal> is supported only since 9.5.
+       <literal>auto</literal> checks the server version and picks the
+       best remote sampling method automatically.
+      </para>
+     </listitem>
+    </varlistentry>
+
    </variablelist>

    <para>
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ea2139fbfa..48bce9744d 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2370,10 +2370,11 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
 }

 /*
- * Construct SELECT statement to acquire a number of rows of a relation.
+ * Construct SELECT statement to acquire the number of rows of a relation.
  *
- * Note: Maybe this should compare relpages and current relation size
- * and adjust reltuples accordingly?
+ * Note: we just return the remote server's reltuples value, which might
+ * be off a good deal, but it doesn't seem worth working harder.  See
+ * comments in postgresAcquireSampleRowsFunc.
  */
 void
 deparseAnalyzeTuplesSql(StringInfo buf, Relation rel)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index e2df32830f..8c2a0c6ca1 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -4972,10 +4972,6 @@ postgresAnalyzeForeignTable(Relation relation,
 /*
  * postgresCountTuplesForForeignTable
  *        Count tuples in foreign table (just get pg_class.reltuples).
- *
- * Note: It's unclear how accurate reltuples is, maybe size that using
- * relpages and simple assumptions (1 tuple per page, ...)? Using
- * tsm_system_rows would make this somewhat unnecessary.
  */
 static double
 postgresCountTuplesForForeignTable(Relation relation)
@@ -4985,14 +4981,7 @@ postgresCountTuplesForForeignTable(Relation relation)
     PGconn       *conn;
     StringInfoData sql;
     PGresult   *volatile res = NULL;
-    double        reltuples = -1;
-
-    /*
-     * Now we have to get the number of pages.  It's annoying that the ANALYZE
-     * API requires us to return that now, because it forces some duplication
-     * of effort between this routine and postgresAcquireSampleRowsFunc.  But
-     * it's probably not worth redefining that API at this point.
-     */
+    volatile double reltuples = -1;

     /*
      * Get the connection to use.  We do the remote access as the table's
@@ -5016,7 +5005,7 @@ postgresCountTuplesForForeignTable(Relation relation)
             pgfdw_report_error(ERROR, res, conn, false, sql.data);

         if (PQntuples(res) != 1 || PQnfields(res) != 1)
-            elog(ERROR, "unexpected result from deparseAnalyzeSizeSql query");
+            elog(ERROR, "unexpected result from deparseAnalyzeTuplesSql query");
         reltuples = strtod(PQgetvalue(res, 0, 0), NULL);
     }
     PG_FINALLY();
@@ -5034,8 +5023,6 @@ postgresCountTuplesForForeignTable(Relation relation)
 /*
  * Acquire a random sample of rows from foreign table managed by postgres_fdw.
  *
- * We fetch the whole table from the remote side and pick out some sample rows.
- *
  * Selected rows are returned in the caller-allocated array rows[],
  * which must have at least targrows entries.
  * The actual number of rows selected is returned as the function result.
@@ -5058,16 +5045,14 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
     ForeignServer *server;
     UserMapping *user;
     PGconn       *conn;
+    int            server_version_num;
+    PgFdwSamplingMethod method = ANALYZE_SAMPLE_AUTO;    /* auto is default */
+    double        sample_frac = -1.0;
+    double        reltuples;
     unsigned int cursor_number;
     StringInfoData sql;
     PGresult   *volatile res = NULL;
     ListCell   *lc;
-    int            server_version_num;
-
-    /* analyze sampling enabled by default, if available */
-    PgFdwSamplingMethod    method = ANALYZE_SAMPLE_AUTO;
-    double        sample_frac = -1.0;
-    double        reltuples;

     /* Initialize workspace state */
     astate.rel = relation;
@@ -5099,7 +5084,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
     server_version_num = PQserverVersion(conn);

     /*
-     * Should we try do the sampling for analyze on the remote server?
+     * What sampling method should we use?
      */
     foreach(lc, server->options)
     {
@@ -5107,7 +5092,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,

         if (strcmp(def->defname, "analyze_sampling") == 0)
         {
-            char *value = defGetString(def);
+            char       *value = defGetString(def);

             if (strcmp(value, "off") == 0)
                 method = ANALYZE_SAMPLE_OFF;
@@ -5130,7 +5115,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,

         if (strcmp(def->defname, "analyze_sampling") == 0)
         {
-            char *value = defGetString(def);
+            char       *value = defGetString(def);

             if (strcmp(value, "off") == 0)
                 method = ANALYZE_SAMPLE_OFF;
@@ -5173,7 +5158,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,

     /*
      * If we've decided to do remote sampling, calculate the sampling rate. We
-     * need to get the number of tuples from the remote server, so we skip the
+     * need to get the number of tuples from the remote server, but skip that
      * network round-trip if not needed.
      */
     if (method != ANALYZE_SAMPLE_OFF)
@@ -5181,37 +5166,42 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
         reltuples = postgresCountTuplesForForeignTable(relation);

         /*
-         * No rows or we expect to sample everything - disable sampling after
-         * all (and make sure we don't divide by 0 in sample_frac calculation.)
+         * Remote's reltuples could be 0 or -1 if the table has never been
+         * vacuumed/analyzed.  In that case, or if we expect to need all the
+         * rows, disable sampling after all.
          */
         if ((reltuples <= 0) || (targrows >= reltuples))
             method = ANALYZE_SAMPLE_OFF;
+        else
+        {
+            sample_frac = targrows / reltuples;

-        /* Make sure we don't divide by 0 when calculating the rate. */
-        sample_frac = targrows / Max(1.0, reltuples);
-
-        /*
-         * Let's sample a bit more (10%), we'll reduce the sample locally.
-         *
-         * XXX Not sure this is really necessary. If we don't trust the remote
-         * sampling to sample the right number of rows, we should not use it.
-         */
-        sample_frac *= 1.1;
+            /*
+             * Let's sample a bit more (10%) to compensate for any small
+             * inaccuracy in the reltuples value.  We'll reduce the sample
+             * locally if it's too large.  Of course, this won't fix a big
+             * error in reltuples --- but big errors in it are most commonly
+             * underestimates due to the table having grown since the last
+             * ANALYZE.  That will lead to overestimating the required
+             * sample_frac, which is fine.
+             */
+            sample_frac *= 1.1;

-        /*
-         * Ensure the sampling rate is between 0.0 and 1.0, even after the
-         * 10% adjustment above.
-         */
-        sample_frac = Min(1.0, Max(0.0, sample_frac));
+            /*
+             * Ensure the sampling rate is between 0.0 and 1.0, even after the
+             * 10% adjustment above.  (Clamping to 0.0 is just paranoia.)
+             */
+            sample_frac = Min(1.0, Max(0.0, sample_frac));

-        /*
-         * If we expect the sampling to reduce very few rows, just disable it
-         * and read the whole remote table. We decide based on the number of
-         * rows we expect to "eliminate" by sampling. If saving than 100 rows,
-         * we disable sampling.
-         */
-        if (reltuples * (1 - sample_frac) < 100.0)
-            method = ANALYZE_SAMPLE_OFF;
+            /*
+             * If we expect sampling to remove very few rows, just disable it
+             * and read the whole remote table; the overhead won't be worth
+             * it.  We decide based on the number of rows we expect sampling
+             * to eliminate.  If saving fewer than 100 rows, disable sampling.
+             */
+            if (reltuples * (1 - sample_frac) < 100.0)
+                method = ANALYZE_SAMPLE_OFF;
+        }
     }

     /*
@@ -5315,9 +5305,9 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
     *totaldeadrows = 0.0;

     /*
-     * Without ANALYZE sampling, we've retrieved all living tuples from foreign
-     * server, so just use that. Otherwise we have the reltuples estimate we
-     * got from the remote side.
+     * Without sampling, we've retrieved all living tuples from foreign
+     * server, so report that as totalrows.  Otherwise use the reltuples
+     * estimate we got from the remote side.
      */
     if (method == ANALYZE_SAMPLE_OFF)
         *totalrows = astate.samplerows;
@@ -5330,7 +5320,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
     ereport(elevel,
             (errmsg("\"%s\": table contains %.0f rows, %d rows in sample",
                     RelationGetRelationName(relation),
-                    astate.samplerows, astate.numrows)));
+                    *totalrows, astate.numrows)));

     return astate.numrows;
 }
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index d44c8cdd71..d681d1675e 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -305,28 +305,6 @@ OPTIONS (ADD password_required 'false');
      </listitem>
     </varlistentry>

-    <varlistentry>
-     <term><literal>analyze_sampling</literal> (<type>text</type>)</term>
-     <listitem>
-      <para>
-       This option determines if <command>ANALYZE</command> on a foreign
-       table samples the data on the remote node, or reads and transfers
-       all data and performs the sampling locally. The supported values
-       are <literal>off</literal>, <literal>random</literal>,
-       <literal>system</literal>, <literal>bernoulli</literal> and
-       <literal>auto</literal>. <literal>off</literal> disables remote
-       sampling, so all data are transferred and sampled locally.
-       <literal>random</literal> performs remote sampling using
-       <literal>random()</literal> function, while <literal>system</literal>
-       and <literal>bernoulli</literal> rely on built-in <literal>TABLESAMPLE</literal>
-       methods. <literal>random</literal> works on all server versions,
-       while <literal>TABLESAMPLE</literal> is supported only since 9.5.
-       <literal>auto</literal> checks the server version and picks the
-       best remote sampling method automatically.
-      </para>
-     </listitem>
-    </varlistentry>
-
    </variablelist>

    <para>
@@ -348,6 +326,41 @@ OPTIONS (ADD password_required 'false');
     frequently updated, the local statistics will soon be obsolete.
    </para>

+   <para>
+    The following option controls how such an <command>ANALYZE</command>
+    operation behaves:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>analyze_sampling</literal> (<type>text</type>)</term>
+     <listitem>
+      <para>
+       This option, which can be specified for a foreign table or a foreign
+       server, determines if <command>ANALYZE</command> on a foreign table
+       samples the data on the remote side, or reads and transfers all data
+       and performs the sampling locally. The supported values
+       are <literal>off</literal>, <literal>random</literal>,
+       <literal>system</literal>, <literal>bernoulli</literal>
+       and <literal>auto</literal>. <literal>off</literal> disables remote
+       sampling, so all data are transferred and sampled locally.
+       <literal>random</literal> performs remote sampling using the
+       <literal>random()</literal> function to choose returned rows,
+       while <literal>system</literal> and <literal>bernoulli</literal> rely
+       on the built-in <literal>TABLESAMPLE</literal> methods of those
+       names. <literal>random</literal> works on all remote server versions,
+       while <literal>TABLESAMPLE</literal> is supported only since 9.5.
+       <literal>auto</literal> (the default) picks the recommended sampling
+       method automatically; currently it means
+       either <literal>bernoulli</literal> or <literal>random</literal>
+       depending on the remote server version.
+      </para>
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+
   </sect3>

   <sect3>

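For readers following along, the sample_frac selection logic in the reworked hunk above can be sketched outside the patch like this (a Python paraphrase, not the actual C code; variable names mirror the patch, `None` stands in for falling back to ANALYZE_SAMPLE_OFF):

```python
def choose_sample_frac(reltuples, targrows):
    """Mirror of the patched postgresAcquireSampleRowsFunc logic."""
    # Remote reltuples can be 0 or -1 for a never-analyzed table; also
    # skip sampling if we'd need (at least) all the rows anyway.
    if reltuples <= 0 or targrows >= reltuples:
        return None  # disable sampling

    sample_frac = targrows / reltuples

    # Oversample by 10% to absorb small inaccuracy in reltuples;
    # the excess is trimmed locally.
    sample_frac *= 1.1

    # Clamp to [0.0, 1.0] (the lower bound is just paranoia).
    sample_frac = min(1.0, max(0.0, sample_frac))

    # If sampling would eliminate fewer than 100 rows, it isn't
    # worth the overhead -- read the whole remote table instead.
    if reltuples * (1 - sample_frac) < 100.0:
        return None

    return sample_frac
```

Note how the 10% bump plus the clamp means a slightly stale (underestimated) reltuples only makes us pull a bit more data than strictly needed, which is the safe direction per the discussion above.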