Re: postgres_fdw: using TABLESAMPLE to collect remote sample - Mailing list pgsql-hackers

From Tom Lane
Subject Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Msg-id 3013665.1673024332@sss.pgh.pa.us
In response to Re: postgres_fdw: using TABLESAMPLE to collect remote sample  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: postgres_fdw: using TABLESAMPLE to collect remote sample
List pgsql-hackers

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> The one difference is that I realized the relkind check does not
> actually say we can't do sampling - it just means we can't use
> TABLESAMPLE to do it. We could still use "random()" ...

> Furthermore, I don't think we should silently disable sampling when the
> user explicitly requests TABLESAMPLE by specifying bernoulli/system for
> the table - IMHO it's less surprising to just fail in that case.

Agreed on both points.  This patch looks good to me.

> Of course, all relkinds that don't support TABLESAMPLE currently have
> reltuples value that will disable sampling anyway (e.g. views have -1).
> So we won't actually fall back to random() anyway, because we can't
> calculate the sample fraction.
> That's a bit annoying for foreign tables pointing at a view, which is a
> more likely use case than table pointing at a sequence.

Right, that's a case worth being concerned about.

> But I realized we could actually still do "random()" sampling:
>     SELECT * FROM t ORDER BY random() LIMIT $X;

Hmm, interesting idea, but it would totally bollix our correlation
estimates.  Not sure that those are worth anything for remote views,
but still...

            regards, tom lane
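
To illustrate the correlation concern raised above, here is a small toy
model in Python. It is not postgres_fdw code. It assumes, for the sake of
the sketch, that the correlation statistic is computed from each sampled
row's arrival position versus its column value, and it uses a hypothetical
single-column table whose values are stored in perfectly sorted order. A
Bernoulli-style filter keeps rows in scan order, while an
"ORDER BY random() LIMIT n" style sample delivers them in shuffled order.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def sample_correlation(sample):
    """Correlate arrival position in the sample with the sampled value.

    Assumption of this sketch: arrival position stands in for the
    physical position of the row.
    """
    return pearson(list(range(len(sample))), sample)


rng = random.Random(42)

# Hypothetical remote table: one column, stored in perfectly sorted order.
table = list(range(100_000))

# Bernoulli-style sampling: each row is kept with probability `frac`,
# and surviving rows arrive in scan (physical) order.
frac = 0.01
in_order = [v for v in table if rng.random() < frac]

# "ORDER BY random() LIMIT n" style sampling: same sample size,
# but rows arrive in random order.
shuffled = rng.sample(table, len(in_order))

c_in_order = sample_correlation(in_order)
c_shuffled = sample_correlation(shuffled)

print(f"scan-order sample correlation:   {c_in_order:.3f}")
print(f"random-order sample correlation: {c_shuffled:.3f}")
```

Under these assumptions the scan-order sample reports a correlation close
to 1, while the shuffled sample reports a value near 0 even though the
underlying table is perfectly ordered.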
