Re: ANALYZE command progress checker - Mailing list pgsql-hackers

From	Amit Langote
Subject	Re: ANALYZE command progress checker
Date	April 4, 2017 14:57:54
Msg-id	f4e56064-e969-1735-d257-2218b54763c7@lab.ntt.co.jp
In response to	Re: ANALYZE command progress checker (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses	Re: ANALYZE command progress checker
List	pgsql-hackers

On 2017/04/04 15:30, Masahiko Sawada wrote:
>> We can report progress in terms of individual blocks only inside
>> acquire_sample_rows(), which seems undesirable when one thinks that we
>> will be resetting the target for every child table.  We should have a
>> global target that considers all child tables in the inheritance
>> hierarchy, which maybe is possible if we count them beforehand in
>> acquire_inheritance_sample_rows().  But why not use target sample rows,
>> which remains the same for both when we're collecting sample rows from one
>> table and from the whole inheritance hierarchy.  We can keep the count of
>> already collected rows in a struct that is used across calls for all the
>> child tables and increment upward from that count when we start collecting
>> from a new child table.
> 
> Another option I came up with is to support a new pgstat progress
> function, say pgstat_progress_incr_param, which increments the index'th
> member of st_progress_param[]. That way we just need to report a delta
> using that function.

That's an interesting idea.  It could be made to work and would not
require changing the interface of AcquireSampleRowsFunc, which seems very
desirable.
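
To make the delta-reporting idea concrete, here is a rough stand-alone sketch. It is not PostgreSQL code: ProgressStatus, progress_update_param(), progress_incr_param(), PROGRESS_ROWS_SAMPLED and the mock sampling function are all names made up for illustration; only st_progress_param[] and the idea of an increment function come from the discussion above. The point is that the per-table sampling function only ever reports +1, so the counter keeps growing across child tables without any extra argument being passed in.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PROGRESS_PARAM     10   /* hypothetical number of slots */
#define PROGRESS_ROWS_SAMPLED  0    /* hypothetical slot index */

/* Stand-in for a backend's progress status entry. */
typedef struct ProgressStatus
{
    int64_t st_progress_param[NUM_PROGRESS_PARAM];
} ProgressStatus;

ProgressStatus my_status;

/* Overwrite a slot with an absolute value. */
void
progress_update_param(int index, int64_t val)
{
    assert(index >= 0 && index < NUM_PROGRESS_PARAM);
    my_status.st_progress_param[index] = val;
}

/* Add a delta to a slot; the caller need not know the current value. */
void
progress_incr_param(int index, int64_t delta)
{
    assert(index >= 0 && index < NUM_PROGRESS_PARAM);
    my_status.st_progress_param[index] += delta;
}

/* Mock per-table sampler: reports one unit of progress per row taken. */
int
acquire_sample_rows_mock(int nrows)
{
    for (int i = 0; i < nrows; i++)
        progress_incr_param(PROGRESS_ROWS_SAMPLED, 1);
    return nrows;
}

/* Mock inheritance walk: reset once, then sample every child in turn. */
int64_t
analyze_children(const int *child_rows, int nchildren)
{
    progress_update_param(PROGRESS_ROWS_SAMPLED, 0);
    for (int i = 0; i < nchildren; i++)
        acquire_sample_rows_mock(child_rows[i]);
    return my_status.st_progress_param[PROGRESS_ROWS_SAMPLED];
}
```

The counter is reset once in the caller that walks the hierarchy, and the per-table function keeps its existing signature.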

>>>>     /*
>>>>      * The first targrows sample rows are simply copied into the
>>>>      * reservoir. Then we start replacing tuples in the sample
>>>>      * until we reach the end of the relation.  This algorithm is
>>>>      * from Jeff Vitter's paper (see full citation below). It
>>>>      * works by repeatedly computing the number of tuples to skip
>>>>      * before selecting a tuple, which replaces a randomly chosen
>>>>      * element of the reservoir (current set of tuples).  At all
>>>>      * times the reservoir is a true random sample of the tuples
>>>>      * we've passed over so far, so when we fall off the end of
>>>>      * the relation we're done.
>>>>      */
>>
>> It seems that we could use samplerows instead of numrows to count the
>> progress (if we choose to count progress in terms of sample rows collected).
>>
> 
> I guess it's hard to count progress in terms of sample rows collected
> even if we use samplerows instead, because samplerows can be
> incremented independently of the target number of sampling rows. The
> samplerows can be incremented up to the total number of rows of
> relation.

Hmm, you're right.  It could be counted with a separate variable
initialized to 0 and incremented every time we decide to add a row to the
final set of sampled rows, although different implementations of
AcquireSampleRowsFunc have different ways of deciding if a given row will
be part of the final set of sampled rows.
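
As a rough illustration of the difference between the two counters, here is a stand-alone sketch. It deliberately uses the simple "Algorithm R" form of reservoir sampling, not the skip-based Vitter algorithm that the quoted comment describes, and SampleProgress, rows_collected and reservoir_sample() are names invented for this example. samplerows grows with every row passed over, while rows_collected only counts rows placed into a new reservoir slot and so never exceeds targrows.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pair of counters kept while sampling. */
typedef struct SampleProgress
{
    long samplerows;      /* rows passed over so far */
    long rows_collected;  /* reservoir slots filled, at most targrows */
} SampleProgress;

/*
 * Sample up to targrows values from rows[0..nrows-1] into reservoir[].
 * Returns the number of values actually stored.
 */
long
reservoir_sample(const int *rows, long nrows,
                 int *reservoir, long targrows,
                 SampleProgress *p)
{
    assert(targrows > 0);
    p->samplerows = 0;
    p->rows_collected = 0;

    for (long i = 0; i < nrows; i++)
    {
        if (p->rows_collected < targrows)
        {
            /* The first targrows rows are simply copied in. */
            reservoir[p->rows_collected++] = rows[i];
        }
        else
        {
            /*
             * Pick a slot in [0, samplerows]; replace a reservoir entry
             * only if the slot falls inside the reservoir.  rand() with
             * modulo is slightly biased, which is fine for a sketch.
             */
            long k = rand() % (p->samplerows + 1);

            if (k < targrows)
                reservoir[k] = rows[i];   /* replacement, count unchanged */
        }
        p->samplerows++;
    }
    return p->rows_collected;
}
```

With 1000 input rows and targrows = 30, samplerows ends at 1000 while rows_collected stops at 30, so only the latter can be compared against the target.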

On the other hand, if we decide to count progress in terms of blocks as
you suggested, I'm afraid that FDWs won't be able to report the
progress.

Thanks,
Amit
