Thread: Progress report removal of temp files and temp relation files using ereport_startup_progress

Hi,

At times, there can be many temp files (under pgsql_tmp) and temp
relation files (under pg_tblspc), removal of which after a crash may
take a long time, during which users have no clue about what's going on
in the server before it comes up online.

Here's a proposal to use ereport_startup_progress to report the
progress of the file removal.

Thoughts?

Regards,
Bharath Rupireddy.

Hi Bharath,


On Sat, Apr 30, 2022 at 11:08 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> At times, there can be many temp files (under pgsql_tmp) and temp
> relation files (under pg_tblspc), removal of which after a crash may
> take a long time, during which users have no clue about what's going on
> in the server before it comes up online.
>
> Here's a proposal to use ereport_startup_progress to report the
> progress of the file removal.
>
> Thoughts?

The patch looks good to me.

With this patch, the user would at least know which directory is being
scanned and how much time has elapsed. It would be better to know how
much work is remaining. I could not find a way to estimate the number
of files in the directory so that we can extrapolate elapsed time and
estimate the remaining time. Well, we could loop the output of
opendir() twice, first to estimate and then for the actual work. This
might actually work, if the time to delete all the files is very high
compared to the time it takes to scan all the files/directories.
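
To make the two-pass idea concrete, here is a rough standalone sketch
(plain POSIX C, not taken from the patch). The function names
count_dir_entries() and remove_files_with_progress() and the
report_every parameter are made up for illustration; real code would go
through PostgreSQL's own directory and logging routines rather than
readdir()/fprintf() directly.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Pass 1: count entries other than "." and ".."; returns -1 on error. */
static long
count_dir_entries(const char *dirpath)
{
    DIR        *dir;
    struct dirent *de;
    long        n = 0;

    assert(dirpath != NULL);
    dir = opendir(dirpath);
    if (dir == NULL)
        return -1;
    while ((de = readdir(dir)) != NULL)
    {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        n++;
    }
    closedir(dir);
    return n;
}

/*
 * Pass 2: unlink regular files, reporting "done of total" every
 * report_every removals (hypothetical knob). The total from pass 1 is only
 * an estimate, since it also counts entries that are not regular files.
 * Returns the number of files removed, or -1 on error.
 */
static long
remove_files_with_progress(const char *dirpath, long report_every)
{
    long        total = count_dir_entries(dirpath);
    long        done = 0;
    DIR        *dir;
    struct dirent *de;
    struct stat st;
    char        path[4096];

    if (total < 0 || (dir = opendir(dirpath)) == NULL)
        return -1;
    while ((de = readdir(dir)) != NULL)
    {
        int         len;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        len = snprintf(path, sizeof(path), "%s/%s", dirpath, de->d_name);
        if (len < 0 || (size_t) len >= sizeof(path))
            continue;           /* path too long for the buffer, skip it */
        if (lstat(path, &st) != 0 || !S_ISREG(st.st_mode))
            continue;           /* leave subdirectories and the like alone */
        if (unlink(path) != 0)
            continue;
        done++;
        if (report_every > 0 && done % report_every == 0)
            fprintf(stderr, "removed %ld of about %ld files in \"%s\"\n",
                    done, total, dirpath);
    }
    closedir(dir);
    return done;
}
```

The extra pass only pays off under the condition stated above: scanning
has to be cheap relative to the unlink() calls.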

Another possibility is to scan the sorted output of opendir() thus
using the current file name to estimate remaining files in a very
crude and inaccurate way. That doesn't look attractive either. I can't
think of any better way to estimate the remaining time.

But at least with this patch, a user knows which files have been
deleted and can guess how far through the directory structure the
process has reached. S/he can then look at the remaining contents of
the directory to estimate how much longer to wait.

-- 
Best Wishes,
Ashutosh Bapat



On Mon, May 2, 2022 at 6:26 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> Hi Bharath,
>
>
> On Sat, Apr 30, 2022 at 11:08 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Hi,
> >
> > At times, there can be many temp files (under pgsql_tmp) and temp
> > relation files (under pg_tblspc), removal of which after a crash may
> > take a long time, during which users have no clue about what's going on
> > in the server before it comes up online.
> >
> > Here's a proposal to use ereport_startup_progress to report the
> > progress of the file removal.
> >
> > Thoughts?
>
> The patch looks good to me.
>
> With this patch, the user would at least know which directory is being
> scanned and how much time has elapsed.

There's a problem with the patch: the timeout mechanism isn't used by
the postmaster process. The postmaster doesn't call
InitializeTimeouts() and doesn't register STARTUP_PROGRESS_TIMEOUT. I
tried to make the postmaster do that (v2 patch attached), but make
check fails.

Now I'm wondering whether it's a good idea to let the postmaster use
timeouts at all.

> It would be better to know how
> much work is remaining. I could not find a way to estimate the number
> of files in the directory so that we can extrapolate elapsed time and
> estimate the remaining time. Well, we could loop the output of
> opendir() twice, first to estimate and then for the actual work. This
> might actually work, if the time to delete all the files is very high
> compared to the time it takes to scan all the files/directories.
>
> Another possibility is to scan the sorted output of opendir() thus
> using the current file name to estimate remaining files in a very
> crude and inaccurate way. That doesn't look attractive either. I can't
> think of any better way to estimate the remaining time.

I think 'how much work/how many files remaining to process' is a
generic problem, for instance, snapshot, mapping files, old WAL file
processing and so on. I don't think we can do much about it.

> But at least with this patch, a user knows which files have been
> deleted and can guess how far through the directory structure the
> process has reached. S/he can then look at the remaining contents of
> the directory to estimate how much longer to wait.

I'm not sure we will be able to use the timeout mechanism within the
postmaster. Another idea is to have a generic GUC, something like
log_file_processing_traffic = {none, medium, high} (a similar idea is
proposed for WAL file processing during replay/recovery at [1]), with
none as the default. When set to medium, a log message gets emitted
for every, say, 128 or 256 files processed (just a random number). When
set to high, a log message gets emitted for every file processed (too
verbose). I think this generic GUC log_file_processing_traffic could be
used in many other file processing areas.
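
As a rough illustration of the count-based idea (a standalone sketch,
not part of any patch), the decision could look like the following. The
enum, the should_log_file() helper, and the interval of 128 are all
placeholders chosen for this example.

```c
#include <stdbool.h>

/* Hypothetical values for the proposed log_file_processing_traffic GUC. */
typedef enum
{
    FILE_TRAFFIC_NONE,
    FILE_TRAFFIC_MEDIUM,
    FILE_TRAFFIC_HIGH
} FileProcessingTraffic;

/* Placeholder interval for "medium"; 128 is an arbitrary number. */
#define FILE_TRAFFIC_MEDIUM_INTERVAL 128

/*
 * Decide whether to log after nprocessed files have been handled
 * (nprocessed counts from 1).
 */
static bool
should_log_file(FileProcessingTraffic level, unsigned long nprocessed)
{
    switch (level)
    {
        case FILE_TRAFFIC_HIGH:
            return true;
        case FILE_TRAFFIC_MEDIUM:
            return nprocessed > 0 &&
                nprocessed % FILE_TRAFFIC_MEDIUM_INTERVAL == 0;
        case FILE_TRAFFIC_NONE:
            break;
    }
    return false;
}

/* Count how many messages a run over nfiles files would produce. */
static unsigned long
count_log_messages(FileProcessingTraffic level, unsigned long nfiles)
{
    unsigned long nlogged = 0;

    for (unsigned long i = 1; i <= nfiles; i++)
    {
        /* A real caller would emit a LOG message here instead. */
        if (should_log_file(level, i))
            nlogged++;
    }
    return nlogged;
}
```

Unlike the timeout approach, this needs no timer, but the message rate
then depends on how fast files are processed rather than on elapsed
time.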

Thoughts?

[1] https://www.postgresql.org/message-id/CALj2ACVnhbx4pLZepvdqOfeOekvZXJ2F%3DwJeConGzok%2B6kgCVA%40mail.gmail.com

Regards,
Bharath Rupireddy.

On Thu, May 5, 2022 at 12:11 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, May 2, 2022 at 6:26 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > Hi Bharath,
> >
> >
> > On Sat, Apr 30, 2022 at 11:08 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > At times, there can be many temp files (under pgsql_tmp) and temp
> > > relation files (under pg_tblspc), removal of which after a crash may
> > > take a long time, during which users have no clue about what's going on
> > > in the server before it comes up online.
> > >
> > > Here's a proposal to use ereport_startup_progress to report the
> > > progress of the file removal.
> > >
> > > Thoughts?
> >
> > The patch looks good to me.
> >
> > With this patch, the user would at least know which directory is being
> > scanned and how much time has elapsed.
>
> There's a problem with the patch: the timeout mechanism isn't used by
> the postmaster process. The postmaster doesn't call
> InitializeTimeouts() and doesn't register STARTUP_PROGRESS_TIMEOUT. I
> tried to make the postmaster do that (v2 patch attached), but make
> check fails.
>
> Now I'm wondering whether it's a good idea to let the postmaster use
> timeouts at all.

Here's the v3 patch, which adds progress reports for the removal of
temp files under the pgsql_tmp directory and temporary relation files
under the pg_tblspc directory. Regression tests pass with it.

Regards,
Bharath Rupireddy.
