
Re: [PATCH] Initial progress reporting for COPY command - Mailing list pgsql-hackers

From	vignesh C
Subject	Re: [PATCH] Initial progress reporting for COPY command
Date	June 22, 2020 10:15:19
Msg-id	CALDaNm00xgfn3vGskzRbL5WUVcJipC4bWo5=xQWJ6axfN0CDPg@mail.gmail.com
In response to	Re: [PATCH] Initial progress reporting for COPY command (Josef Šimánek <josef.simanek@gmail.com>)
Responses	Re: [PATCH] Initial progress reporting for COPY command
List	pgsql-hackers

On Sun, Jun 21, 2020 at 5:11 PM Josef Šimánek <josef.simanek@gmail.com> wrote:
>
> Thanks for all the comments. I have updated the code to support more options (including STDIN/STDOUT) and added some documentation.
>
> Patch is attached and can be found also at https://github.com/simi/postgres/pull/5.
>
> Diff version: https://github.com/simi/postgres/pull/5.diff
> Patch version: https://github.com/simi/postgres/pull/5.patch
>
> I'm also attaching screenshot of HTML documentation and html documentation file.
>
> I'll do my best to get this to commitfest now.
>
> On Sun, Jun 14, 2020 at 14:32, Josef Šimánek <josef.simanek@gmail.com> wrote:
>>
>> Hello, as proposed by Pavel Stěhule and discussed on the local Czech PostgreSQL mailing list (https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer), I have prepared an initial patch for COPY command progress reporting.
>>
>> Few examples first:
>>
>> "COPY (SELECT * FROM test) TO '/tmp/ids';"
>>
>> yr=# SELECT * from pg_stat_progress_copy;
>>    pid   | datid | datname | relid | direction | file | program | lines_processed | file_bytes_processed
>> ---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
>>  3347126 | 16384 | yr      |     0 | TO        | t    | f       |         3529943 |             24906226
>> (1 row)
>>
>> "COPY test FROM '/tmp/ids';"
>>
>> yr=# SELECT * from pg_stat_progress_copy;
>>    pid   | datid | datname | relid | direction | file | program | lines_processed | file_bytes_processed
>> ---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
>>  3347126 | 16384 | yr      | 16385 | FROM      | t    | f       |       121591999 |            957218816
>> (1 row)
>>
>> Columns are inspired by CREATE INDEX progress report system view.
>>
>> pid - integer - PID of backend
>> datid - oid - OID of related database
>> datname - name - name of related database (this seems redundant, since the oid should be enough, but it is the same in CREATE INDEX)
>> relid - oid - OID of the table related to the COPY command; 0 when not known (for example, when copying to a file)
>> direction - text - "FROM" or "TO", depending on the command used
>> file - bool - is a file used?
>> program - bool - is a program used?
>> lines_processed - bigint - number of lines processed, works for both directions (FROM/TO)
>> file_bytes_processed - number of bytes processed when a file is used (file = t), otherwise 0; works for both directions (FROM/TO)
>>
>> Patch is attached and can be found also at https://github.com/simi/postgres/pull/5.
>>

A few comments:
@@ -713,6 +714,8 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
  break;
  }

+ CopyUpdateBytesProgress(cstate, bytesread);
+
  return bytesread;
 }

This counts the data when it is read; the actual processing happens
later, for example in CopyReadLineText. It would be better to call
CopyUpdateBytesProgress at that later point. Otherwise the view will
keep showing the same file_bytes_processed value while multiple rows
are inserted into the table: lines_processed will keep getting updated
but file_bytes_processed will not.
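
To illustrate the difference, here is a minimal standalone sketch, not
code from the patch. DemoCopy, demo_fill and demo_next_line are
hypothetical stand-ins for CopyState, the raw read and the per-line
processing; the 64-byte buffer is an arbitrary assumption. It keeps one
counter that is bumped at read time (as in the hunk above) and one that
is bumped when a line is actually consumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the COPY state; NOT the real CopyState. */
typedef struct DemoCopy
{
    const char *src;             /* whole input, standing in for the file */
    size_t      src_len;
    size_t      src_pos;         /* how far the reader has pulled from src */
    char        buf[64];         /* raw read buffer (size is an assumption) */
    size_t      buf_len;         /* valid bytes in buf */
    size_t      buf_pos;         /* next unconsumed byte in buf */
    size_t      bytes_read;      /* counted when a block is read */
    size_t      bytes_processed; /* counted when a line is consumed */
    size_t      lines_processed;
} DemoCopy;

static void
demo_init(DemoCopy *c, const char *src)
{
    assert(c != NULL && src != NULL);
    memset(c, 0, sizeof(*c));
    c->src = src;
    c->src_len = strlen(src);
}

/* Refill the raw buffer; returns 0 at end of input. */
static int
demo_fill(DemoCopy *c)
{
    size_t n = c->src_len - c->src_pos;

    if (n > sizeof(c->buf))
        n = sizeof(c->buf);
    if (n == 0)
        return 0;
    memcpy(c->buf, c->src + c->src_pos, n);
    c->src_pos += n;
    c->buf_len = n;
    c->buf_pos = 0;
    c->bytes_read += n;          /* read-time accounting */
    return 1;
}

/* Consume one line from the buffer; returns 0 when no input is left. */
static int
demo_next_line(DemoCopy *c)
{
    size_t consumed = 0;

    for (;;)
    {
        char ch;

        if (c->buf_pos == c->buf_len && !demo_fill(c))
            break;               /* end of input */
        ch = c->buf[c->buf_pos++];
        consumed++;
        if (ch == '\n')
            break;               /* end of line */
    }
    if (consumed == 0)
        return 0;
    c->bytes_processed += consumed;  /* processing-time accounting */
    c->lines_processed++;
    return 1;
}
```

With the 8-byte input "1\n2\n3\n4\n", the first call to demo_next_line
already leaves bytes_read at 8, while bytes_processed is 2 and grows by
2 with each further line. A counter updated at read time therefore
stays flat while lines_processed keeps increasing.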

+pg_stat_progress_copy| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param1
+            WHEN 0 THEN 'TO'::text
+            WHEN 1 THEN 'FROM'::text
+            ELSE NULL::text
+        END AS direction,
+    ((s.param2)::integer)::boolean AS file,
+    ((s.param3)::integer)::boolean AS program,
+    s.param4 AS lines_processed,
+    s.param5 AS file_bytes_processed

You could wrap s.param5 in pg_size_pretty, for example
pg_size_pretty(s.param5) AS bytes_processed. That would make
bytes_processed easier for users to read as the data size grows.
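
For illustration only, here is a small C sketch of what a human-readable
byte count looks like. demo_size_pretty is a hypothetical helper, not
PostgreSQL's pg_size_pretty: it truncates instead of rounding and stops
at TB, so its output can differ from what the real function returns.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Simplified, hypothetical formatter; NOT PostgreSQL's pg_size_pretty.
 * It divides by 1024 (truncating) while the value is at least 10240,
 * then prints the value with the unit reached.  Returns what snprintf
 * returns, i.e. the length of the formatted text.
 */
static int
demo_size_pretty(int64_t bytes, char *out, size_t outlen)
{
    static const char *const units[] = {"bytes", "kB", "MB", "GB", "TB"};
    size_t u = 0;

    assert(out != NULL && outlen > 0);
    while (bytes >= 10240 && u < 4)
    {
        bytes /= 1024;           /* integer division: truncates */
        u++;
    }
    return snprintf(out, outlen, "%lld %s", (long long) bytes, units[u]);
}
```

Applied to the 957218816 bytes from the COPY FROM example quoted above,
this sketch writes "912 MB" into the buffer, which is easier to scan
than the raw number.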

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
