Thread: ctid ranges

ctid ranges

From
Thomas Munro
Date:
Hi

In 9.1.3, this is fast, handled with a tid scan using the physical address:

SELECT ... FROM ... WHERE ctid = ...;

This is slow, handled with a seq scan (as are various rephrasing with
<, <=, etc):

SELECT ... FROM ... WHERE ctid BETWEEN ... AND ...;

Is there a way to retrieve the rows in a physical range quickly?

(I realise this is a pretty odd thing to want to do... I was
experimenting with a crackpot idea for storing some data in a known
physical order and finding the beginning of ends ranges by binary
chop, instead of using a btree.)

Thanks
Thomas Munro

Re: ctid ranges

From
Jeff Davis
Date:
On Fri, 2012-06-08 at 22:27 +0100, Thomas Munro wrote:
> This is slow, handled with a seq scan (as are various rephrasing with
> <, <=, etc):
>
> SELECT ... FROM ... WHERE ctid BETWEEN ... AND ...;
>
> Is there a way to retrieve the rows in a physical range quickly?

Interesting idea. However, as far as I know, there is no such support.

Regards,
    Jeff Davis




Re: ctid ranges

From
Merlin Moncure
Date:
On Mon, Jun 11, 2012 at 7:57 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Fri, 2012-06-08 at 22:27 +0100, Thomas Munro wrote:
>> This is slow, handled with a seq scan (as are various rephrasing with
>> <, <=, etc):
>>
>> SELECT ... FROM ... WHERE ctid BETWEEN ... AND ...;
>>
>> Is there a way to retrieve the rows in a physical range quickly?
>
> Interesting idea. However, as far as I know, there is no such support.

yeah -- and I think it's a great thing to want to be able to do.  it
could be used in parallelizing tricks for example: divide up a table
into N approximately equal parts and hand each one off to a work
thread.

merlin

Re: ctid ranges

From
Bruce Momjian
Date:
On Wed, Jun 13, 2012 at 03:15:14PM -0500, Merlin Moncure wrote:
> On Mon, Jun 11, 2012 at 7:57 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> > On Fri, 2012-06-08 at 22:27 +0100, Thomas Munro wrote:
> >> This is slow, handled with a seq scan (as are various rephrasing with
> >> <, <=, etc):
> >>
> >> SELECT ... FROM ... WHERE ctid BETWEEN ... AND ...;
> >>
> >> Is there a way to retrieve the rows in a physical range quickly?
> >
> > Interesting idea. However, as far as I know, there is no such support.
>
> yeah -- and I think it's a great thing to want to be able to do.  it
> could be used in parallelizing tricks for example: divide up a table
> into N approximately equal parts and hand each one off to a work
> thread.

Can we add this as a TODO?  It would basically be adding
less/greater-than comparisons for the 'tid' data type.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: ctid ranges

From
Merlin Moncure
Date:
On Wed, Jun 13, 2012 at 3:18 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Jun 13, 2012 at 03:15:14PM -0500, Merlin Moncure wrote:
>> yeah -- and I think it's a great thing to want to be able to do.  it
>> could be used in parallelizing tricks for example: divide up a table
>> into N approximately equal parts and hand each one off to a work
>> thread.
>
> Can we add this as a TODO?  It would basically be adding
> less/greater-than comparisons for the 'tid' data type.

IMNSHO, it's a no-brainer for the todo (but I think it's more
complicated than adding some comparisons -- which are working now):

postgres=# explain select ctid from foo where ctid >= '(3786,67)'::tid limit 1;
                            QUERY PLAN
------------------------------------------------------------------
 Limit  (cost=0.00..0.05 rows=1 width=6)
   ->  Seq Scan on foo  (cost=0.00..16422.00 rows=333333 width=6)
         Filter: (ctid >= '(3786,67)'::tid)
(3 rows)

merlin

Re: ctid ranges

From
Bruce Momjian
Date:
On Wed, Jun 13, 2012 at 03:21:17PM -0500, Merlin Moncure wrote:
> On Wed, Jun 13, 2012 at 3:18 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Wed, Jun 13, 2012 at 03:15:14PM -0500, Merlin Moncure wrote:
> >> yeah -- and I think it's a great thing to want to be able to do.  it
> >> could be used in parallelizing tricks for example: divide up a table
> >> into N approximately equal parts and hand each one off to a work
> >> thread.
> >
> > Can we add this as a TODO?  It would basically be adding
> > less/greater-than comparisons for the 'tid' data type.
>
> IMNSHO, it's a no-brainer for the todo (but I think it's more
> complicated than adding some comparisons -- which are working now):
>
> postgres=# explain select ctid from foo where ctid >= '(3786,67)'::tid limit 1;
>                             QUERY PLAN
> ------------------------------------------------------------------
>  Limit  (cost=0.00..0.05 rows=1 width=6)
>    ->  Seq Scan on foo  (cost=0.00..16422.00 rows=333333 width=6)
>          Filter: (ctid >= '(3786,67)'::tid)
> (3 rows)

I see.  Seems we have to add index smarts to those comparisons.  That
might be complicated.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: ctid ranges

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> On Wed, Jun 13, 2012 at 03:21:17PM -0500, Merlin Moncure wrote:
>> IMNSHO, it's a no-brainer for the todo (but I think it's more
>> complicated than adding some comparisons -- which are working now):

> I see.  Seems we have to add index smarts to those comparisons.  That
> might be complicated.

Uh, the whole point of a TID scan is to *not* need an index.

What would be needed is for tidpath.c to let through more kinds of TID
comparison quals than it does now, and then for nodeTidscan.c to know
what to do with them.  The latter logic might well look something like
btree indexscan qual preparation, but it wouldn't be the same code.

            regards, tom lane