Re: Gsoc2012 idea, tablesample - Mailing list pgsql-hackers

From Sandro Santilli
Subject Re: Gsoc2012 idea, tablesample
Date
Msg-id 20120423133742.GO26868@gnash
In response to Re: Gsoc2012 idea, tablesample  (Qi Huang <huangqiyx@hotmail.com>)
Responses Re: Gsoc2012 idea, tablesample  (Ants Aasma <ants@cybertec.at>)
List pgsql-hackers
On Sat, Apr 21, 2012 at 02:28:52PM +0800, Qi Huang wrote:
> 
> Hi, Heikki
...
> > Another idea that Robert Haas suggested was to add support doing a TID 
> > scan for a query like "WHERE ctid< '(501,1)'". That's not enough work 
> > for GSoC project on its own, but could certainly be a part of it.
> 
> the first one and the last one are still not clear. 

The last one was the TID scan on filters like ctid < '(501,1)'.
TID "scans" are the fastest access method because they directly access
explicitly referenced addresses. Starting from this observation, a sampling
function could select random pages and tuples within pages and access them
directly, grouping tuples on the same page so as to fetch them all together.

This is what the ANALYZE command already does when providing samples
for the type analyzers.
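
As a rough illustration of the grouping idea described above (this is not
ANALYZE's actual algorithm, only a sketch), the following picks random
candidate (page, offset) addresses and groups them by page so that each page
would need to be visited only once. The function names and the
max_tuples_per_page parameter are hypothetical, and a candidate address may
well point at an empty or invisible slot in a real table.

```python
import random
from collections import defaultdict


def sample_tids(n_pages, max_tuples_per_page, sample_size, seed=None):
    """Pick distinct random (page, offset) candidates, grouped by page.

    Returns a dict mapping page number -> sorted list of 1-based offsets,
    with pages in ascending order. Candidates are only addresses: a real
    sampler would still have to skip slots that hold no visible tuple.
    """
    total = n_pages * max_tuples_per_page
    if sample_size > total:
        raise ValueError("sample_size exceeds the number of candidate slots")

    rng = random.Random(seed)
    by_page = defaultdict(set)
    # Sample slot indexes without replacement, then split into page/offset.
    for idx in rng.sample(range(total), sample_size):
        page, off = divmod(idx, max_tuples_per_page)
        by_page[page].add(off + 1)  # item offsets within a page start at 1

    return {page: sorted(offs) for page, offs in sorted(by_page.items())}


def tid_literals(grouped):
    """Render grouped candidates as ctid text literals such as '(501,1)'."""
    return [f"({page},{off})" for page, offs in grouped.items() for off in offs]
```

Because the result is ordered by page, all candidates on one page come out
next to each other, which is the property that lets them be fetched together.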

Unfortunately, it looks like at the SQL level only the equality operator
triggers a TID scan, so things like "WHERE ctid < '(501,1)'" won't be as fast
as directly fetching all visible tuples in the first 501 pages could be.

I think that's what Heikki was referring to.

I'd love to see enhanced CTID operators to fetch all visible tuples in a page
using a TID scan.  Something like: WHERE ctid =~ '(501,*)' or a ctidrange.
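
Until something like that exists, one possible workaround is to spell out
every candidate address on the page and compare with equality. The sketch
below only builds the filter text. The helper name is hypothetical, and the
default of 291 offsets assumes the default 8 kB block size. Whether the
planner actually picks a TID scan for such a filter is an assumption that
should be checked with EXPLAIN.

```python
def page_tid_filter(page, max_offsets=291):
    """Build a WHERE-clause fragment listing every candidate ctid on a page.

    max_offsets is an assumed upper bound on item offsets per page
    (291 for 8 kB blocks); addresses with no tuple simply match nothing.
    """
    # Array elements contain a comma, so each one is double-quoted.
    tids = ",".join(f'"({page},{off})"' for off in range(1, max_offsets + 1))
    return f"ctid = ANY ('{{{tids}}}'::tid[])"
```

For example, page_tid_filter(501, 2) yields the text
ctid = ANY ('{"(501,1)","(501,2)"}'::tid[]), which can be appended to a query
on the table being sampled.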

--strk; 

  ,------o-. 
  |   __/  |    Delivering high quality PostGIS 2.0 !
  |  / 2.0 |    http://strk.keybit.net - http://vizzuality.com
  `-o------'
 


