Re: TABLESAMPLE patch - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: TABLESAMPLE patch
Msg-id CANP8+jJTY8NV5HoOcgp_jFcw6+NtfcnYwDwcZn+4vYm0gSj8zw@mail.gmail.com
In response to Re: TABLESAMPLE patch  (Petr Jelinek <petr@2ndquadrant.com>)
List pgsql-hackers
On 17 April 2015 at 14:54, Petr Jelinek <petr@2ndquadrant.com> wrote:
 
I agree that the DDL patch is not that important to get in (and I have now made it the last patch in the series), which does not mean somebody can't write an extension with a new tablesample method.


In any case, attached is another version.

Changes:
- I addressed the comments from Michael

- I moved the interface between nodeSampleScan and the actual sampling method to its own .c file and added a TableSampleDesc struct for it. This makes the interface cleaner and will make it more straightforward to extend for subqueries in the future (nothing really changes, just some functions were renamed and moved). Amit suggested this at some point and I thought it was not needed at that time, but with the possible future extension to subquery support I changed my mind.

- renamed heap_beginscan_ss to heap_beginscan_sampling to avoid confusion with sync scan

- reworded some things and more typo fixes

- Added two sample contrib modules demonstrating row-limited and time-limited sampling. I am using linear probing for both of those, as the built-in block sampling is not well suited for row-limited or time-limited sampling. For row-limited sampling I originally thought of using Vitter's reservoir sampling, but that does not fit well with the executor, as it needs to keep the reservoir of all the output tuples in memory, which would have horrible memory requirements if the limit was high. Linear probing seems to work quite well for the use case of "give me 500 random rows from table".
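
To make the linear probing idea above concrete, here is a minimal sketch in Python. It is not the code from the patch: the function names, the list-of-lists "table", and the choice of a random step that is coprime with the block count are all assumptions made for illustration. The point it tries to show is that rows can be handed out as soon as a block is read, so nothing like a reservoir has to be held in memory.

```python
import math
import random


def linear_probe_order(nblocks, rng):
    """Yield every block number in [0, nblocks) exactly once.

    Hypothetical helper: start at a random block and advance by a fixed
    step that is coprime with nblocks, so the walk cannot return to a
    block before it has covered all of them.
    """
    if nblocks <= 0:
        return
    start = rng.randrange(nblocks)
    step = rng.randrange(1, nblocks + 1)
    while math.gcd(step, nblocks) != 1:
        step = rng.randrange(1, nblocks + 1)
    block = start
    for _ in range(nblocks):
        yield block
        block = (block + step) % nblocks


def sample_rows(blocks, limit, seed=None):
    """Yield up to `limit` rows, reading blocks in linear-probe order.

    `blocks` is an assumed stand-in for a heap: a list of blocks, each a
    list of rows. Rows are yielded as they are read; only a counter is
    kept, never a buffer of already-returned rows.
    """
    rng = random.Random(seed)
    emitted = 0
    for blkno in linear_probe_order(len(blocks), rng):
        for row in blocks[blkno]:
            if emitted >= limit:
                return
            yield row
            emitted += 1


if __name__ == "__main__":
    # 100 blocks of 10 rows each; ask for 500 rows.
    table = [[(b, i) for i in range(10)] for b in range(100)]
    rows = list(sample_rows(table, 500))
    print(len(rows), "rows from", len({b for b, _ in rows}), "blocks")
```

Note that in this sketch all rows of a visited block are returned together, so the randomness is at block level, and the scan stops as soon as the row limit is reached. A time-limited variant could presumably replace the row counter with a deadline check, but that is not shown here.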

For me, the DDL changes are something we can leave out for now, as a way to minimize the change surface.

I'm now moving to final review of patches 1-5. Michael requested patch 1 to be split out. If I commit, I will keep that split, but I am considering all of this as a single patchset. I've already spent a few days reviewing, so I don't expect this will take much longer.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
