TABLESAMPLE doesn't actually satisfy the SQL spec, does it? - Mailing list pgsql-hackers

From Tom Lane
Subject TABLESAMPLE doesn't actually satisfy the SQL spec, does it?
Date
Msg-id 9871.1436716927@sss.pgh.pa.us
As best I can tell (evidence below), the SQL standard requires that if a
single query reads a table with a TABLESAMPLE clause multiple times (say,
because it's on the inside of a nestloop), then the exact same set of
sampled rows is returned each time.  The BERNOULLI code, at least, fails
to achieve this.  Suppose that a concurrent session adds some tuples to
a page in between scans.  When that page is visited the next time, the
sampler RNG will be advanced more times than it was previously, because
it's called a number of times that depends on PageGetMaxOffsetNumber().
So the RNG state will be different at the end of the page, and then
the set of tuples selected from subsequent pages will be completely
different from what it was before.  This will happen even if none of the
added rows are committed yet, let alone visible to the query's snapshot;
which makes it a failure of serializability even if you don't believe
the argument that it violates the requirements for TABLESAMPLE.
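To make the failure mode concrete, here is a small Python sketch (not the actual BERNOULLI code; the page layout and names are illustrative assumptions) in which the sampler draws one random number per line pointer, page by page.  Adding tuples to an earlier page shifts every subsequent draw, so the selections on later, untouched pages change:

```python
import random

def bernoulli_sample(pages, seed, p=0.5):
    # One sequential RNG draw per line pointer, page by page --
    # mimicking a sampler whose number of RNG calls on each page
    # depends on PageGetMaxOffsetNumber().
    rng = random.Random(seed)
    return [tid for tuples in pages for tid in tuples if rng.random() < p]

# Heap before and after a concurrent session adds tuples to page 0.
before = [[(0, 1), (0, 2)], [(1, 1), (1, 2), (1, 3)]]
after  = [[(0, 1), (0, 2), (0, 3), (0, 4)], [(1, 1), (1, 2), (1, 3)]]

pick_before = [t for t in bernoulli_sample(before, seed=42) if t[0] == 1]
pick_after  = [t for t in bernoulli_sample(after, seed=42) if t[0] == 1]
# Page 1 was never modified, yet its selected tuples differ, because
# page 0 consumed a different number of RNG draws the second time.
```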

Now, as for the language-lawyering aspect of it, I read in SQL2011
7.6 <table reference> general rule 5a:
 iv) Case:
   1) If <sample method> specifies BERNOULLI, then the result of TF is a
      table containing approximately (N*S/100) rows of RT. The probability
      of a row of RT being included in result of TF is S/100. Further,
      whether a given row of RT is included in result of TF is independent
      of whether other rows of RT are included in result of TF.

   2) Otherwise, result of TF is a table containing approximately
      (N*S/100) rows of RT. The probability of a row of RT being included
      in result of TF is S/100.

 v) If TF contains outer references, then a table with identical rows is
    generated every time TF is evaluated with a given set of values for
    outer references.
 

This seems to me to clearly require that a fixed set of samples (ie, a
"table", whether real or virtual) is selected during any query.  Note that
this requirement is the same whether or not REPEATABLE is used.

Also, I found a description of IBM DB2's implementation of this feature,
http://www.almaden.ibm.com/cs/people/peterh/idugjbig.pdf
and in that they say:
 Semantically, sampling of a table occurs before any other query
 processing, such as applying predicates or performing joins. One can
 envision the original tables referenced in a query being initially
 replaced by temporary "reduced" tables containing sampled rows, and then
 normal query processing commencing on the reduced tables. (For
 performance reasons, actual processing may not occur in exactly this
 manner.) It follows, for example, that repeated accesses of a sampled
 table within the same query, such as in a nested-loop join or a
 correlated subquery, will see the same sample of that table within a
 single execution of that query.
 

They do mention that alterations to a table may result in *successive*
queries giving different results, even when using REPEATABLE.  But I do
not think it's okay for this to happen within a single query.  That would
mean that vagaries of plan choice, eg nestloop vs some other join method,
would produce inconsistent results.

A possible way around this problem is to redefine the sampling rule so
that it is not history-dependent but depends only on the tuple TIDs.
For instance, one could hash the TID of a candidate tuple, xor that with
a hash of the seed being used for the current query, and then select the
tuple if (hash/MAXINT) < P.
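A minimal Python sketch of that rule (the particular hash function and TID encoding here are my own assumptions, not part of the proposal): the inclusion decision is a pure function of (TID, seed), so rescanning a page any number of times, in any order, yields the same choices.

```python
import hashlib

def tid_hash_select(tid, seed, p):
    # Hash the candidate tuple's TID, xor it with a hash of the query's
    # seed, scale the result into [0, 1), and select the tuple if that
    # value is below the sampling probability p.
    block, offset = tid
    h_tid  = int.from_bytes(hashlib.sha256(b"%d:%d" % (block, offset)).digest()[:8], "big")
    h_seed = int.from_bytes(hashlib.sha256(str(seed).encode()).digest()[:8], "big")
    return (h_tid ^ h_seed) / 2.0**64 < p

# History-independent: the decision never depends on which other tuples
# exist or how many RNG draws earlier pages consumed.
same = all(tid_hash_select((n, 1), seed=7, p=0.3) ==
           tid_hash_select((n, 1), seed=7, p=0.3) for n in range(100))
```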
        regards, tom lane


