Re: Gsoc2012 idea, tablesample - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Gsoc2012 idea, tablesample
Msg-id 20120417134949.GR1267@tamriel.snowman.net
In response to Gsoc2012 idea, tablesample  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Gsoc2012 idea, tablesample  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Gsoc2012 idea, tablesample  (Greg Stark <stark@mit.edu>)
Re: Gsoc2012 idea, tablesample  (Qi Huang <huangqiyx@hotmail.com>)
List pgsql-hackers
* Heikki Linnakangas (heikki.linnakangas@enterprisedb.com) wrote:
> 1. We probably don't want the SQL syntax to be added to the grammar.
> This should be written as an extension, using custom functions as
> the API, instead of extra SQL syntax.

Err, I missed that, and don't particularly agree with it.  Is there a
serious issue with the grammar defined in the SQL standard?  Do the other
DBs which provide this use the SQL grammar or something else?

I'm not sure that I particularly *like* the SQL grammar, but if we're
going to implement this, we should really do it 'right'.

> 2. It's not very useful if it's just a dummy replacement for "WHERE
> random() < ?". It has to be more advanced than that. Quality of the
> sample is important, as is performance. There was also an
> interesting idea of implementing monetary unit sampling.

In reviewing this, I got the impression (perhaps mistaken) that
different sampling methods are defined by the SQL standard and that it
would simply be up to us to implement them according to what the standard
requires.
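
For illustration only (this is not from Neil's patch or from the
standard's text): the standard's TABLESAMPLE clause names two methods,
BERNOULLI and SYSTEM.  BERNOULLI considers each row independently, while
SYSTEM is implementation-dependent and is commonly done at the block
level.  A rough Python sketch of that difference, assuming a table
modelled as a list of pages (the function names and the page layout are
hypothetical):

```python
import random

def bernoulli_sample(pages, fraction, rng):
    """Row-level sampling: keep each row independently with probability `fraction`."""
    return [row for page in pages for row in page if rng.random() < fraction]

def system_sample(pages, fraction, rng):
    """Block-level sampling (one possible reading of SYSTEM): keep or skip whole pages."""
    return [row for page in pages if rng.random() < fraction for row in page]

# Hypothetical table: 100 pages of 50 rows each, rows identified as (page, offset).
pages = [[(p, r) for r in range(50)] for p in range(100)]
rng = random.Random(42)

rows_bernoulli = bernoulli_sample(pages, 0.1, rng)  # must visit every row
rows_system = system_sample(pages, 0.1, rng)        # only reads the chosen pages
```

The block-level variant touches far fewer pages but returns rows in
clusters, which is the sample-quality versus performance trade-off
raised in point 2.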

> I think this would be a useful project if those two points are taken
> care of.

Doing it 'right' certainly isn't going to be simply taking what Neil did
and updating it, and I understand Tom's concerns about having this be
more than a hack on seqscan, so I'm a bit nervous that this would turn
into something bigger than a GSoC project.

> Another idea that Robert Haas suggested was to add support for doing a
> TID scan for a query like "WHERE ctid < '(501,1)'". That's not
> enough work for a GSoC project on its own, but could certainly be a
> part of it.

I don't think Robert's suggestion would be part of a 'tablesample'
patch.  Perhaps a completely different project which was geared towards
allowing hidden columns to be used in various ways in a WHERE clause.
Of course, we'd need someone to actually define that; I don't think
someone relatively new to the project is going to know what experienced
hackers want to do with system columns.
Thanks,
    Stephen
