Re: [GENERAL] Large DB - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: [GENERAL] Large DB
Date	April 2, 2004 20:57:49
Msg-id	10036.1080953867@sss.pgh.pa.us
In response to	Re: [GENERAL] Large DB (Manfred Koizar <mkoi-pg@aon.at>)
Responses	Re: [GENERAL] Large DB
List	pgsql-hackers

Manfred Koizar <mkoi-pg@aon.at> writes:
>> You'd run the Vitter
>> algorithm separately to decide whether to keep or discard each live row
>> you find in the blocks you read.

> You mean once a block is sampled we inspect it in any case?  This was
> not the way I had planned to do it, but I'll keep this idea in mind.

Well, once we've gone to the trouble of reading in a block we
definitely want to count the tuples in it, for the purposes of
extrapolating the total number of tuples in the relation.  Given
that, I think the most painless route is simply to use the Vitter
algorithm with the number-of-tuples-scanned as the count variable.
You could dump the logic in acquire_sample_rows that tries to estimate
where to read the N'th tuple from.
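
A minimal sketch of that suggestion, written in Python for illustration
rather than in the C of acquire_sample_rows. The names sample_rows,
estimate_total_rows, and the (row, is_live) block layout are hypothetical,
not taken from the PostgreSQL source. It also assumes the blocks passed in
were already chosen at random, and it uses the simple one-draw-per-row
reservoir rule (often called Algorithm R) instead of the skip-computing
optimization described in the Vitter paper. The point it shows is that the
count variable is the number of live tuples scanned so far, and that the
same count feeds the row-count extrapolation.

```python
import random


def sample_rows(blocks, target_rows, rng=None):
    """Reservoir-sample live rows from blocks that were already read.

    blocks: iterable of blocks; each block is a list of (row, is_live)
            pairs (a hypothetical layout chosen for this sketch).
    Returns (sample, live_rows_seen, blocks_read).
    """
    rng = rng or random.Random()
    reservoir = []
    live_seen = 0      # count variable: live tuples scanned so far
    blocks_read = 0

    for block in blocks:
        blocks_read += 1
        for row, is_live in block:
            if not is_live:
                continue            # dead rows are neither counted nor sampled
            live_seen += 1
            if len(reservoir) < target_rows:
                # Fill phase: keep the first target_rows live rows.
                reservoir.append(row)
            else:
                # Keep this row with probability target_rows / live_seen,
                # replacing a uniformly chosen reservoir slot.
                j = rng.randrange(live_seen)
                if j < target_rows:
                    reservoir[j] = row

    return reservoir, live_seen, blocks_read


def estimate_total_rows(live_seen, blocks_read, total_blocks):
    """Extrapolate live rows per block read to the whole relation."""
    if blocks_read == 0:
        return 0.0
    return live_seen / blocks_read * total_blocks
```

Every live row in a block that was read gets counted, whether or not it
ends up in the sample, so the density estimate uses all the information
the I/O already paid for.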

If you like I can send you the Vitter paper off-list (I have a PDF of
it).  The comments in the code are not really intended to teach someone
what it's good for ...
        regards, tom lane
