TABLESAMPLE doesn't actually satisfy the SQL spec, does it? - Mailing list pgsql-hackers
From: Tom Lane
Subject: TABLESAMPLE doesn't actually satisfy the SQL spec, does it?
Date:
Msg-id: 9871.1436716927@sss.pgh.pa.us
Responses:
  Re: TABLESAMPLE doesn't actually satisfy the SQL spec, does it?
  Re: TABLESAMPLE doesn't actually satisfy the SQL spec, does it?
List: pgsql-hackers
As best I can tell (evidence below), the SQL standard requires that if a single query reads a table with a TABLESAMPLE clause multiple times (say, because it's on the inside of a nestloop), then the exact same set of sampled rows is returned each time. The BERNOULLI code, at least, fails to achieve this.

Suppose that a concurrent session adds some tuples to a page in between scans. When that page is visited the next time, the sampler RNG will be advanced more times than it was previously, because it's called a number of times that depends on PageGetMaxOffsetNumber(). So the RNG state will be different at the end of the page, and then the set of tuples selected from subsequent pages will be completely different from what it was before. This will happen even if none of the added rows are committed yet, let alone visible to the query's snapshot; which makes it a failure of serializability even if you don't believe the argument that it violates the requirements for TABLESAMPLE.

Now, as for the language-lawyering aspect of it, I read in SQL2011 7.6 <table reference> general rule 5a:

    iv) Case:
       1) If <sample method> specifies BERNOULLI, then the result of TF is a table
          containing approximately (N*S/100) rows of RT. The probability of a row
          of RT being included in result of TF is S/100. Further, whether a given
          row of RT is included in result of TF is independent of whether other
          rows of RT are included in result of TF.
       2) Otherwise, result of TF is a table containing approximately (N*S/100)
          rows of RT. The probability of a row of RT being included in result of
          TF is S/100.
    v) If TF contains outer references, then a table with identical rows is
       generated every time TF is evaluated with a given set of values for outer
       references.

This seems to me to clearly require that a fixed set of samples (ie, a "table", whether real or virtual) is selected during any query. Note that this requirement is the same whether or not REPEATABLE is used.
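The failure mode can be sketched in a few lines of Python (a toy model, not the actual heapam sampling code): a sampler that consumes one RNG draw per tuple slot uses a different number of draws when a page gains a tuple, so every decision on later pages shifts.

```python
import random

def bernoulli_scan(pages, seed, prob=0.5):
    # Toy history-dependent sampler: one RNG draw per tuple slot,
    # analogous to looping up to PageGetMaxOffsetNumber() on each page.
    rng = random.Random(seed)
    picked = []
    for pageno, ntuples in enumerate(pages):
        for off in range(ntuples):
            if rng.random() < prob:
                picked.append((pageno, off))
    return picked

def later_pages(sample):
    # Keep only the selections made after page 0.
    return [t for t in sample if t[0] > 0]

# For almost every seed, adding one tuple to page 0 (even an invisible,
# uncommitted one) shifts the RNG stream and changes the sample taken
# from pages 1 and 2, despite those pages being physically unchanged.
diverged = sum(
    later_pages(bernoulli_scan([4, 4, 4], s))
    != later_pages(bernoulli_scan([5, 4, 4], s))
    for s in range(100)
)
print(f"{diverged}/100 seeds diverge on later pages")
```

Two scans with the same seed over identical pages do agree; the inconsistency appears only once the draw count on an earlier page changes, which is exactly the concurrent-insert scenario described above.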
Also, I found a description of IBM DB2's implementation of this feature, http://www.almaden.ibm.com/cs/people/peterh/idugjbig.pdf and in that they say:

    Semantically, sampling of a table occurs before any other query processing,
    such as applying predicates or performing joins. One can envision the
    original tables referenced in a query being initially replaced by temporary
    “reduced” tables containing sampled rows, and then normal query processing
    commencing on the reduced tables. (For performance reasons, actual
    processing may not occur in exactly this manner.) It follows, for example,
    that repeated accesses of a sampled table within the same query, such as in
    a nested-loop join or a correlated subquery, will see the same sample of
    that table within a single execution of that query.

They do mention that alterations to a table may result in *successive* queries giving different results, even when using REPEATABLE. But I do not think it's okay for this to happen within a single query. That would mean that vagaries of plan choice, eg nestloop vs some other join method, would produce inconsistent results.

A possible way around this problem is to redefine the sampling rule so that it is not history-dependent but depends only on the tuple TIDs. For instance, one could hash the TID of a candidate tuple, xor that with a hash of the seed being used for the current query, and then select the tuple if (hash/MAXINT) < P.

			regards, tom lane
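A minimal sketch of such a stateless rule, in Python under assumed details (the name `tid_sample` and the use of SHA-256 to mix the seed into the TID hash are illustrative choices, a variant of the hash-then-xor scheme proposed above): each tuple's inclusion depends only on its own TID and the query seed, so repeated scans within a query cannot diverge and concurrent insertions cannot perturb the decisions for pre-existing tuples.

```python
import hashlib

def tid_sample(tids, seed, prob):
    # Stateless sampler sketch: include a tuple iff a hash of its TID,
    # mixed with the per-query seed, falls below the sampling fraction.
    # No RNG state is carried between tuples or between scans.
    picked = []
    for blkno, offnum in tids:
        h = hashlib.sha256(f"{seed}:{blkno}:{offnum}".encode()).digest()
        u = int.from_bytes(h[:8], "big") / 2**64  # uniform in [0, 1)
        if u < prob:
            picked.append((blkno, offnum))
    return picked

base = [(p, o) for p in range(3) for o in range(4)]
first = tid_sample(base, seed=42, prob=0.5)
# A concurrent insert adds a tuple at (0, 4); every pre-existing
# tuple's inclusion decision is unchanged.
second = tid_sample([(0, 4)] + base, seed=42, prob=0.5)
print([t for t in second if t in base] == first)  # True
```

Rescanning with the same seed trivially reproduces the sample, and each tuple's decision is independent of the others, matching the BERNOULLI independence wording quoted from the spec.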