Re: TABLESAMPLE patch is really in pretty sad shape - Mailing list pgsql-hackers

From Tom Lane
Subject Re: TABLESAMPLE patch is really in pretty sad shape
Date
Msg-id 15899.1437609699@sss.pgh.pa.us
Whole thread Raw
In response to Re: TABLESAMPLE patch is really in pretty sad shape  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: TABLESAMPLE patch is really in pretty sad shape
List pgsql-hackers
I wrote:
> We could alternatively provide two scan-initialization callbacks,
> one that analyzes the parameters before we do heap_beginscan,
> and another that can do additional setup afterwards.  Actually,
> that second call would really not be meaningfully different from
> the ReScan call, so another solution would be to just automatically
> invoke ReScan after we've created the HeapScanDesc.  TSMs could work
> around not having this by complicating their NextBlock function a bit
> (so that it would do some initialization on first call) but perhaps
> it would be easier to have the additional init call.

Actually, there's another reason why this has to be redesigned: evaluating
the parameter expressions of TABLESAMPLE during ExecInitSampleScan is
not OK.  Consider this:

regression=# select * from (values (0::float4),(100)) v(pct), lateral (select count(*) from tenk1 tablesample bernoulli
(pct))ss;pct | count 
 
-----+-------  0 |     0100 |     0
(2 rows)

Surely the second execution of the subquery should've given me 10000.
It doesn't because the TABLESAMPLE parameters aren't recomputed during a
rescan.  Even if you're minded to claim that that's all right, this is
definitely not acceptable:

regression=# select * from (values (0::float4),(100)) v(pct), lateral (select * from tenk1 tablesample bernoulli (pct))
ss;
server closed the connection unexpectedly       This probably means the server terminated abnormally       before or
whileprocessing the request.
 
The connection to the server was lost. Attempting reset: Failed.

That one's crashing because "pct" is a Var that refers to an outer scan
tuple that doesn't exist yet.  I'm not real sure how come the case with
an aggregate manages not to crash, actually.

This needs to work more like LIMIT, which doesn't try to compute the
limit parameters until the first fetch.  So what we need is an Init
function that does very darn little indeed (maybe we don't even need
it at all), and then a ParamInspect function that is called at first fetch
or during a ReScan, and that one is the one that gets to look at the
evaluated parameter values.

If we wanted to let the method inspect the HeapScanDesc during setup
it would need still a third callback.  I'm not exactly sure if that's
worth the trouble.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: could not truncate directory "pg_subtrans": apparent wraparound
Next
From: Michael Paquier
Date:
Subject: Re: ALTER TABLE .. ADD PRIMARY KEY .. USING INDEX has dump-restore hazard