On Thu, 2010-09-16 at 15:47 +0900, Itagaki Takahiro wrote:
> On Tue, Aug 17, 2010 at 2:19 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> > VACUUM (lazy) (also autovacuum), table-rewriting ALTER TABLE
> We could also support VACUUM FULL, CLUSTER, CREATE INDEX and REINDEX.
Well, yeah, but those are a lot harder to do. ;-)
> > a very simple query.
> SELECT * FROM tbl;
> can report reasonable progress, but
> SELECT count(*) FROM tbl;
> cannot, because planned_tuple_count of the aggregation is 1.
> I hope for a better solution for the grouping case, because grouping is
> used in complex queries, where a progress counter is most wanted.
I think that's a problem for a later day. Once we have the interfaces
to report the progress, someone (else) can investigate how to track
progress of arbitrary queries.
> > - Are the interfaces OK?
>
> I like the new column in pg_stat_activity to "pull" the progress.
> In addition, as previously discussed, we could also have "push"
> notifications, e.g. a GUC parameter "notice_per_progress" (0.0-1.0),
> or periodic NOTIFY messages.
That's a three-line change in pgstat_report_progress() in the simplest
case. Maybe also something to consider later.
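
To illustrate what such a "push" threshold could mean, here is a minimal
sketch in Python (not PostgreSQL code). The names make_progress_notifier,
step and emit are hypothetical; step plays the role of the proposed
"notice_per_progress" setting, and emit stands in for whatever would send
the notice. A message is sent only when the reported fraction crosses the
next multiple of step.

```python
def make_progress_notifier(step, emit):
    """Return a report(fraction) function that calls emit() each time the
    progress fraction crosses a new multiple of step (0.0 < step <= 1.0)."""
    if not 0.0 < step <= 1.0:
        raise ValueError("step must be in (0.0, 1.0]")
    last_bucket = 0  # highest multiple of step already announced

    def report(fraction):
        nonlocal last_bucket
        # Small epsilon guards against float rounding at exact multiples.
        bucket = int(fraction / step + 1e-9)
        if bucket > last_bucket:
            last_bucket = bucket
            emit(f"progress: {fraction:.0%}")

    return report


# Usage: progress is reported every 10%, but notices go out only per 25%.
messages = []
report = make_progress_notifier(0.25, messages.append)
for done in range(0, 101, 10):
    report(done / 100)
```

With these inputs, notices are emitted at 30%, 50%, 80% and 100%, i.e. at
the first report after each 25% boundary.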
> > - How to handle commands that process multiple tables? For example,
> > lazy VACUUM on a single table is pretty easy to track (count the block
> > numbers), but what about a database-wide lazy VACUUM?
>
> Not only database-wide lazy VACUUM: some other maintenance commands
> also have non-linear progress. Progress of index scans in VACUUM
> is not linear. ALTER TABLE could have REINDEX after table rewrites.
>
> We might need some built-in knowledge about the non-uniform commands;
> for example, "CREATE INDEX assigns 75% of the progress to the table
> scan, and 25% to the final merging of tapes".
Maybe another approach is to forget about presenting progress
numerically. Instead, make it a string that says something like, for
example for database-wide VACUUM, 'table 1/14 block 5/32'. That way you
can cover anything you want, and you give the user the most accurate
information available, but then you can't do things like sort
pg_stat_activity by expected end time, or display a progress bar. Or
of course we could report both a number and a string, but that might be
a bit too much clutter.
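
To make the trade-off concrete, here is a rough sketch in Python (not
PostgreSQL code). The class VacuumProgress and its fields are made up for
illustration. as_text() produces the string form described above, and
as_fraction() derives a single number from the same counters under the
crude assumption that every table weighs the same, which is exactly the
kind of guess the string form avoids.

```python
from dataclasses import dataclass


@dataclass
class VacuumProgress:
    table: int          # 1-based index of the table being processed
    total_tables: int   # number of tables in the whole run
    block: int          # blocks scanned so far in the current table
    total_blocks: int   # number of blocks in the current table

    def as_text(self) -> str:
        # Exact counters, no guessing: the most accurate information.
        return (f"table {self.table}/{self.total_tables} "
                f"block {self.block}/{self.total_blocks}")

    def as_fraction(self) -> float:
        # Assumption: all tables take equal time, which is rarely true.
        finished_tables = self.table - 1
        within = self.block / self.total_blocks if self.total_blocks else 0.0
        return (finished_tables + within) / self.total_tables


p = VacuumProgress(table=1, total_tables=14, block=5, total_blocks=32)
text = p.as_text()
fraction = p.as_fraction()
```

Here text is 'table 1/14 block 5/32', while fraction is (5/32)/14, about
1%. The number can be sorted or drawn as a bar, but it is only as good as
the equal-weight assumption behind it.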