Re: postgres_fdw: using TABLESAMPLE to collect remote sample - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Date
Msg-id 3e8cca01-f102-d4b9-abe9-ed7b5e938323@enterprisedb.com
Whole thread Raw
In response to Re: postgres_fdw: using TABLESAMPLE to collect remote sample  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: postgres_fdw: using TABLESAMPLE to collect remote sample
List pgsql-hackers

On 1/5/23 22:05, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
>> The second patch adds the relkind check, so that we only issue
>> TABLESAMPLE on valid relation types (tables, matviews). But I'm not sure
>> we actually need it - the example presented earlier was foreign table
>> pointing to a sequence. But that actually works fine even without this
>> patch, because fore sequences we have reltuples=1, which disables
>> sampling due to the check mentioned above (because targrows >= 1).
> 
> Seems pretty accidental still.
> 
>> The other issue with this patch is that it seems wrong to check the
>> relkind value locally - instead we should check this on the remote
>> server, because that's where we'll run TABLESAMPLE. Currently there are
>> no differences between versions, but what if we implement TABLESAMPLE
>> for another relkind in the future? Then we'll either fail to use
>> sampling or (worse) we'll try using TABLESAMPLE for unsupported relkind,
>> depending on which of the servers has newer version.
> 
> We don't have a way to do that, and I don't think we need one.  The
> worst case is that we don't try to use TABLESAMPLE when the remote
> server is a newer version that would allow it.  If we do extend the
> set of supported relkinds, we'd need a version-specific test in
> postgres_fdw so that we don't wrongly use TABLESAMPLE with an older
> remote server ... but that's hardly complicated, or a new concept.
> 
> On the other hand, if we let the code stand, the worst case is that
> ANALYZE fails because we try to apply TABLESAMPLE in an unsupported
> case, presumably with some remote relkind that doesn't exist today.
> I think that's clearly worse than not exploiting TABLESAMPLE
> although we could (with some other new remote relkind).
> 

OK

> As far as the patch details go: I'd write 0001 as
> 
> +            Assert(sample_frac >= 0.0 && sample_frac <= 1.0);
> 
> The way you have it seems a bit contorted.

Yeah, that's certainly cleaner.

> As for 0002, personally
> I'd rename the affected functions to reflect their expanded duties,
> and they *definitely* require adjustments to their header comments.
> Functionally it looks fine though.
> 

No argument about the header comments, ofc - I just haven't done that in
the WIP patch. As for the renaming, any suggestions for the new names?
The patch tweaks two functions in a way that affects what they return:

1) deparseAnalyzeTuplesSql - adds relkind to the "reltuples" SQL

2) postgresCountTuplesForForeignTable - adds can_sample flag

There are no callers outside postgresAcquireSampleRowsFunc, so what
about renaming them like this?

  deparseAnalyzeTuplesSql
    -> deparseAnalyzeInfoSql

  postgresCountTuplesForForeignTable
    -> postgresGetAnalyzeInfoForForeignTable


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Cygwin cleanup
Next
From: Matthias van de Meent
Date:
Subject: Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE