Re: [PATCH] Initial progress reporting for COPY command - Mailing list pgsql-hackers

From Josef Šimánek
Subject Re: [PATCH] Initial progress reporting for COPY command
Date
Msg-id CAFp7QwqWSwhmEcCEoJqRJofURMQ2Sffu0+-Brt+LBUqU-ds-cw@mail.gmail.com
In response to Re: [PATCH] Initial progress reporting for COPY command  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [PATCH] Initial progress reporting for COPY command  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers


On Mon, Jun 22, 2020 at 14:14, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On Sun, Jun 21, 2020 at 01:40:34PM +0200, Josef Šimánek wrote:
>Thanks for all the comments. I have updated the code to support more options
>(including STDIN/STDOUT) and added some documentation.
>
>Patch is attached and can be found also at
>https://github.com/simi/postgres/pull/5.
>
>Diff version: https://github.com/simi/postgres/pull/5.diff
>Patch version: https://github.com/simi/postgres/pull/5.patch
>
>I'm also attaching a screenshot of the HTML documentation and the HTML
>documentation file.
>
>I'll do my best to get this to commitfest now.
>

I see we're not showing the total number of bytes the COPY is expected
to process, which makes it hard to estimate how far we actually are.
Clearly there are cases when we really don't know that (exports, import
from stdin/program), but why not show the file size for imports from a
file? I'd expect that to be the most common case.

For COPY FROM a file, fstat is already called and the result is available at https://github.com/postgres/postgres/blob/fe186b4c200b76a5c0f03379fe8645ed1c70a844/src/backend/commands/copy.c#L1934. It should be easy to store the file size in one of the progress params (param6, for example) and expose it in the report view. When the size is not available, this column can be NULL.
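The "size when known, NULL otherwise" idea can be illustrated outside the backend. The following is only a minimal Python sketch of the same fstat check, not the patch's C code; the helper name total_bytes_or_none is hypothetical, and None stands in for the NULL column value.

```python
import os
import stat
from typing import BinaryIO, Optional


def total_bytes_or_none(source: BinaryIO) -> Optional[int]:
    """Return the size of a regular file, or None when it cannot be known.

    Hypothetical helper: None plays the role of a NULL "total" column.
    """
    try:
        info = os.fstat(source.fileno())
    except (OSError, ValueError):
        # No usable file descriptor (for example an in-memory stream).
        return None
    # Pipes, terminals and sockets have no meaningful total size.
    if not stat.S_ISREG(info.st_mode):
        return None
    return info.st_size
```

For input coming from a pipe (the stdin/program cases mentioned above) the helper returns None, which matches the cases where the total is simply not known.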

Would that be enough?

On the other hand, anyone can check the file size manually to get the expected total and compare it to the reported bytes_processed. Alternatively, "wc -l" gives the number of lines, which can be compared to the lines_processed column. Should the patch also count the lines (using the configured separator) and populate another column with the total? AFAIK that would need a full file scan, which can be slow for huge files.
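To make the cost of the line count concrete, here is a rough client-side sketch in Python, roughly what "wc -l" does. The function names count_lines and percent_done are hypothetical. The whole file has to be read once, which is the full scan mentioned above, and a raw newline count can differ from the number of rows (for example when CSV values contain quoted newlines).

```python
from typing import Optional


def count_lines(path: str, chunk_size: int = 1 << 20) -> int:
    """Count newline bytes by reading the whole file in fixed-size chunks."""
    total = 0
    with open(path, "rb") as f:
        # Every byte of the file is read once; cost grows with file size.
        while chunk := f.read(chunk_size):
            total += chunk.count(b"\n")
    return total


def percent_done(processed: int, total: Optional[int]) -> Optional[float]:
    """Return progress in percent, or None when the total is unknown or zero."""
    if not total:
        return None
    # Clamp at 100 in case the reported value overshoots the estimate.
    return min(100.0, 100.0 * processed / total)
```

A client would feed percent_done with the reported bytes_processed (or lines_processed) value and the total it measured itself.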
 
I wonder if it would make sense to show some estimates in the other cases.
For example when exporting a query result, maybe we could show the estimated
number of rows and size? Of course, that's prone to estimation errors
and it's more of a wild idea for the future, I don't expect this patch to
implement that.

My plan here was to expose numbers that are not currently available and let clients get the rest of the info on their own.

For example:
- for "COPY (query) TO file" - EXPLAIN or a COUNT variant of the query can be run beforehand to get the expected number of rows
- for "COPY table FROM file" - the file size or the number of lines in the file can be inspected first to get the expected number of bytes or rows to be processed

I see the current system view in my patch (and all the other report views currently available) more as a scaffold for building your own tools.

For example, CLI tools can use this to display some kind of progress.
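As an illustration of such a tool, here is a small Python sketch of the display part only. It assumes the client has already read bytes_processed from the progress view and, where possible, determined the total on its own; the function name render_progress and the output format are hypothetical.

```python
from typing import Optional


def render_progress(bytes_processed: int,
                    total_bytes: Optional[int],
                    width: int = 20) -> str:
    """Format one progress line; fall back to a plain counter without a total."""
    if not total_bytes:
        # Exports and imports from stdin/program: no total to compare against.
        return f"{bytes_processed} bytes processed (total unknown)"
    fraction = min(1.0, bytes_processed / total_bytes)
    filled = int(fraction * width)
    bar = "#" * filled + "." * (width - filled)
    return f"[{bar}] {fraction * 100:5.1f}% ({bytes_processed}/{total_bytes} bytes)"
```

A CLI tool could call this in a polling loop and redraw the line each time the reported value changes.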
 
regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
