Re: TABLESAMPLE patch is really in pretty sad shape - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: TABLESAMPLE patch is really in pretty sad shape |
Date | |
Msg-id | 8989.1436794757@sss.pgh.pa.us Whole thread Raw |
In response to | Re: TABLESAMPLE patch is really in pretty sad shape (Michael Paquier <michael.paquier@gmail.com>) |
Responses |
Re: TABLESAMPLE patch is really in pretty sad shape
Re: TABLESAMPLE patch is really in pretty sad shape Re: TABLESAMPLE patch is really in pretty sad shape |
List | pgsql-hackers |
Michael Paquier <michael.paquier@gmail.com> writes: > Regarding the fact that those two contrib modules can be part of a > -contrib package and could be installed, nuking those two extensions > from the tree and preventing the creating of custom tablesample > methods looks like a correct course of things to do for 9.5. TBH, I think the right thing to do at this point is to revert the entire patch and send it back for ground-up rework. I think the high-level design is wrong in many ways and I have about zero confidence in most of the code details as well. I'll send a separate message about high-level issues, but as far as code details go, I started to do some detailed code review last night and only got through contrib/tsm_system_rows/tsm_system_rows.c before deciding it was hopeless. Let's have a look at my notes: * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group Obsolete copyright date. * IDENTIFICATION* contrib/tsm_system_rows_rowlimit/tsm_system_rows.c Wrong filename. (For the moment, I'll refrain from any value judgements about the overall adequacy or quality of the comments in this patch, and just point out obvious errors that should have been caught in review.) typedef struct { SamplerRandomState randstate; uint32 seed; /* random seed */ BlockNumber nblocks; /* numberof block in relation */ int32 ntuples; /* number of tuples to return */ int32 donetuples; /* tuples already returned */ OffsetNumber lt; /* last tuple returned from current block */ BlockNumberstep; /* step size */ BlockNumber lb; /* last block visited */ BlockNumber doneblocks; /* number of already returned blocks */ } SystemSamplerData; This same typedef name is defined in three different places in the patch (tablesample/system.c, tsm_system_rows.c, tsm_system_time.c). While that might not amount to a bug, it's sure a recipe for confusion, especially since the struct definitions are all different. Datum tsm_system_rows_init(PG_FUNCTION_ARGS) { TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0); uint32 seed = PG_GETARG_UINT32(1); int32 ntuples = PG_ARGISNULL(2) ? -1 : PG_GETARG_INT32(2); This is rather curious coding. Why should this function check only one of its arguments for nullness? If it needs to defend against any of them being null, it really needs to check all. But in point of fact, this function is declared STRICT, which means it's a violation of the function call protocol if the core code ever passes a null to it, and so this test ought to be dead code. A similar pattern of ARGISNULL checks in declared-strict functions exists in all the tablesample modules, not just this one, showing that this is an overall design error not just a thinko here. My inclination would be to make the core code enforce non-nullness of all tablesample arguments so as to make it unnecessary to check strictness of the tsm*init functions, but in any case it is Not Okay to just pass nulls willy-nilly to strict C functions. Also, I find this coding pretty sloppy even without the strictness angle, because the net effect of this way of dealing with nulls is that an argument-must-not-be-null complaint is reported as "out of range", which is both confusing and the wrong ERRCODE. if (ntuples < 1) ereport(ERROR, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg("invalidsample size"), errhint("Sample size must be positive integer value."))); I don't find this to be good error message style. The secondary comment is not a "hint", it's an ironclad statement of what you did wrong, so if we wanted to phrase it like this it should be an errdetail not errhint. But the whole thing is overly cute anyway because there is no reason at all not to just say what we mean in the primary error message, eg ereport(ERROR, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg("sample size must be greater than zero"))); While we're on the subject, what's the reason for disallowing a sample size of zero? Seems like a reasonable edge case. /* All blocks have been read, we're done */ if (sampler->doneblocks > sampler->nblocks || sampler->donetuples>= sampler->ntuples) PG_RETURN_UINT32(InvalidBlockNumber); Okay, I lied, I *am* going to complain about this comment. Comments that do not accurately describe the code they're attached to are worse than useless. /** Cleanup method.*/ Datum tsm_system_rows_end(PG_FUNCTION_ARGS) { TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0); pfree(tsdesc->tsmdata); PG_RETURN_VOID(); } This cleanup method is a waste of code space. There is no need to pfree individual allocations at query execution end. limitnode = linitial(args); limitnode = estimate_expression_value(root, limitnode); if (IsA(limitnode, RelabelType)) limitnode = (Node *) ((RelabelType *) limitnode)->arg; if (IsA(limitnode, Const)) ntuples = DatumGetInt32(((Const *) limitnode)->constvalue); else { /* Defaultntuples if the estimation didn't return Const. */ ntuples = 1000; } That RelabelType check is a waste of code space (estimate_expression_value would have collapsed RelabelType-atop-Const into just a Const). On the other hand, the failure to check for constisnull is a bug, and the failure to sanity-check the value (ie clamp zero or negative or larger-than-table-size to a valid rowcount estimate) is another bug, one that could easily cause a planner failure. static uint32 random_relative_prime(uint32 n, SamplerRandomState randstate) { /* Pick random starting number, with some limits on what it can be. */ uint32 r = (uint32) sampler_random_fract(randstate)* n / 2 + n / 4, t; r is not terribly random; in fact, it's always exactly n/4, because careless parenthesization here causes the float8 result of sampler_random_fract() to be truncated to zero immediately on return. In any case, what's the rationale for not letting r cover the whole range from 1 to n-1? /* * This should only take 2 or 3 iterations as the probability of 2 numbers * being relatively prime is ~61%. */ while ((t = gcd(r, n)) > 1) { CHECK_FOR_INTERRUPTS(); r /= t; } Actually, that's an infinite loop if r is initially zero, which will always happen (given the previous bug) if n is initially 2 or 3. Also, because this coding decreases r whenever it's not immediately relatively-prime to n, I don't think that what we're getting is especially "random"; it's certainly not going to be uniformly distributed, but will have a bias towards smaller values. Perhaps a better technique would be to select an all-new random r each time the gcd test fails. In short, I'd do something more like uint32 r; /* Safety check to avoid infinite loop or zero result for small n. */ if (n <= 1) return 1; /* * This should only take 2 or 3 iterations as the probability of 2 numbers * being relatively prime is ~61%; butjust in case, we'll include a * CHECK_FOR_INTERRUPTS in the loop. */ do { CHECK_FOR_INTERRUPTS(); r = (uint32) (sampler_random_fract(randstate) * n); } while (r == 0 || gcd(r, n) > 1); Note however that this coding would result in an unpredictable number of iterations of the RNG, which might not be such a good thing if we're trying to achieve repeatability. It doesn't matter in the context of this module since the RNG is not used after initialization, but it would matter if we then went on to do Bernoulli-style sampling. (Possibly that could be dodged by reinitializing the RNG after the initialization steps.) As I said, I haven't specifically tried to read the tablesample patch other than this one contrib file. However, the bits of it that I've come across while doing other work have almost invariably made me squint and think "that needs to be rewritten". I think there is every reason to assume that all the rest of it is about as buggy as this file is. If it's to stay, it *must* get a line-by-line review from some committer-level person; and I think there are other more important things for us to be doing for 9.5. regards, tom lane
pgsql-hackers by date: