Re: ANALYZE command progress checker - Mailing list pgsql-hackers

From Amit Langote
Subject Re: ANALYZE command progress checker
Msg-id f4e56064-e969-1735-d257-2218b54763c7@lab.ntt.co.jp
In response to Re: ANALYZE command progress checker  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: ANALYZE command progress checker
List pgsql-hackers
On 2017/04/04 15:30, Masahiko Sawada wrote:
>> We can report progress in terms of individual blocks only inside
>> acquire_sample_rows(), which seems undesirable when one thinks that we
>> will be resetting the target for every child table.  We should have a
>> global target that considers all child tables in the inheritance
>> hierarchy, which maybe is possible if we count them beforehand in
>> acquire_inheritance_sample_rows().  But why not use the target number of
>> sample rows, which remains the same whether we're collecting sample rows
>> from one table or from the whole inheritance hierarchy?  We can keep the count of
>> already collected rows in a struct that is used across calls for all the
>> child tables and increment upward from that count when we start collecting
>> from a new child table.
> 
> Another option I came up with is to support a new pgstat progress
> function, say pgstat_progress_incr_param, which increments the index'th
> member of st_progress_param[]. That way we just need to report a delta
> using that function.

That's an interesting idea.  It could be made to work and would not
require changing the interface of AcquireSampleRowsFunc, which seems very
desirable.
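
To make the idea concrete, here is a rough sketch of how an incrementing call could sit next to an absolute-value update.  It is not the actual pgstat code: ProgressState, N_PROGRESS_PARAM, progress_update_param and progress_incr_param are hypothetical stand-ins invented for illustration, and the array size is an assumption.  The point is only that each caller reports a delta, so nothing needs to be threaded through AcquireSampleRowsFunc.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed slot count for this sketch; not taken from the real headers. */
#define N_PROGRESS_PARAM 10

/* Hypothetical stand-in for the per-backend progress array. */
typedef struct ProgressState
{
    int64_t st_progress_param[N_PROGRESS_PARAM];
} ProgressState;

/* Absolute update: overwrite the slot with a new value. */
void
progress_update_param(ProgressState *st, int index, int64_t val)
{
    assert(index >= 0 && index < N_PROGRESS_PARAM);
    st->st_progress_param[index] = val;
}

/*
 * Incremental update (the proposed style): add a delta to the slot.
 * The caller does not need to know the running total, so each child
 * table can report only the work it did itself.
 */
void
progress_incr_param(ProgressState *st, int index, int64_t delta)
{
    assert(index >= 0 && index < N_PROGRESS_PARAM);
    st->st_progress_param[index] += delta;
}
```

With this shape, sampling two child tables would just call progress_incr_param() twice on the same slot, and the slot would hold the combined count.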

>>>>     /*
>>>>      * The first targrows sample rows are simply copied into the
>>>>      * reservoir. Then we start replacing tuples in the sample
>>>>      * until we reach the end of the relation.  This algorithm is
>>>>      * from Jeff Vitter's paper (see full citation below). It
>>>>      * works by repeatedly computing the number of tuples to skip
>>>>      * before selecting a tuple, which replaces a randomly chosen
>>>>      * element of the reservoir (current set of tuples).  At all
>>>>      * times the reservoir is a true random sample of the tuples
>>>>      * we've passed over so far, so when we fall off the end of
>>>>      * the relation we're done.
>>>>      */
>>
>> It seems that we could use samplerows instead of numrows to count the
>> progress (if we choose to count progress in terms of sample rows collected).
>>
> 
> I guess it's hard to count progress in terms of sample rows collected
> even if we use samplerows instead, because samplerows can be
> incremented independently of the target number of sampling rows. The
> samplerows can be incremented up to the total number of rows of
> relation.

Hmm, you're right.  It could be counted with a separate variable
initialized to 0 and incremented every time we decide to add a row to the
final set of sampled rows, although different implementations of
AcquireSampleRowsFunc have different ways of deciding if a given row will
be part of the final set of sampled rows.
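
A minimal sketch of such a separate counter, under stated assumptions: it uses the simple "Algorithm R" form of reservoir sampling rather than the skip-based Vitter algorithm that analyze.c uses, rows are plain ints, and SampleProgress and sample_rows are hypothetical names.  It takes one possible reading of "add a row to the final set", counting a row only when the reservoir grows, so the counter can never exceed the target.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical counters shared across calls for all child tables. */
typedef struct SampleProgress
{
    int64_t rows_seen;       /* like samplerows: every row passed over */
    int64_t rows_collected;  /* rows that grew the final sample set */
} SampleProgress;

/*
 * Sample up to targrows values from rows[0..nrows-1] into reservoir[].
 * Returns the number of rows stored.  Uses Algorithm R; rand() % n has
 * a slight modulo bias, which is acceptable for an illustration.
 */
int
sample_rows(const int *rows, int nrows, int *reservoir, int targrows,
            SampleProgress *prog)
{
    int numrows = 0;

    for (int i = 0; i < nrows; i++)
    {
        prog->rows_seen++;

        if (numrows < targrows)
        {
            /* Reservoir not yet full: the row joins the sample. */
            reservoir[numrows++] = rows[i];
            prog->rows_collected++;
        }
        else
        {
            /* Reservoir full: maybe replace a random element. */
            int k = rand() % (i + 1);

            if (k < targrows)
                reservoir[k] = rows[i];
        }
    }
    return numrows;
}
```

Passing the same SampleProgress to the call for each child table lets rows_collected climb toward the overall target, while rows_seen keeps growing up to the total number of rows scanned.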

On the other hand, if we decide to count progress in terms of blocks as
you suggested, I'm afraid that FDWs won't be able to report the
progress.

Thanks,
Amit




