Re: Gsoc2012 idea, tablesample - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Gsoc2012 idea, tablesample
Date
Msg-id CA+TgmoY1VLK=xJk7Pr++6Saw=OHpx-+ooUpEum6zjoPtzJDi2w@mail.gmail.com
Whole thread Raw
In response to Re: Gsoc2012 idea, tablesample  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: Gsoc2012 idea, tablesample
Re: Gsoc2012 idea, tablesample
List pgsql-hackers
On Thu, May 10, 2012 at 10:28 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
>> One problem I see with this approach is that its efficiency
>> depends on the average tuple length, at least with a naive
>> approach to random ctid generator. The simplest way to generate
>> those randomly without introducing bias is to generate a random
>> page index between 0 and the relation's size in pages,
>
> Right.
>
>> and then generate random tuple index between 0 and
>> MaxHeapTuplesPerPage, which is 291 on x86-64 assuming the standard
>> page size of 8k.
>
> I think we can do better than that without moving too far from "the
> simplest way".  It seems like we should be able to get a more
> accurate calculation of a minimum base tuple size based on the table
> definition, and calculate the maximum number of those which could
> fit on a page.  On the other hand, ctid uses a line pointer, doesn't
> it?  Do we need to worry about dead line pointers allowing higher
> tuple indexes than the calculated maximum number of tuples per page?

I wonder if you could do this with something akin to the Bitmap Heap
Scan machinery.  Populate a TID bitmap with a bunch of randomly chosen
TIDs, fetch them all in physical order, and if you don't get as many
rows as you need, rinse and repeat until you do.

I'm worried this project is getting so complicated that it will be
beyond the ability of a new hacker to get anything useful done.  Can
we simplify the requirements here to something that is reasonable for
a beginner?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Draft release notes complete
Next
From: Bruce Momjian
Date:
Subject: Re: Draft release notes complete