[PATCH] Simple progress reporting for COPY command - Mailing list pgsql-hackers

From Josef Šimánek
Subject [PATCH] Simple progress reporting for COPY command
Msg-id CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg+FEKet4XaTGSW=LitKQ@mail.gmail.com
Responses Re: [PATCH] Simple progress reporting for COPY command  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Re: [PATCH] Simple progress reporting for COPY command  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List pgsql-hackers
Hello,

I finally had some time to revisit the patch and all the comments from
https://www.postgresql.org/message-id/CAFp7QwqMGEi4OyyaLEK9DR0%2BE%2BoK3UtA4bEjDVCa4bNkwUY2PQ%40mail.gmail.com
and I have prepared a simple version of progress reporting for the COPY command.

To keep the patch as small as possible, I have introduced only a minimal
set of columns. It can be extended later if needed.

The columns are inspired by the CREATE INDEX progress reporting system view
(pg_stat_progress_create_index).

pid - integer - PID of the backend
datid - oid - OID of the related database
datname - name - name of the related database (this seems redundant, since
    the OID should be enough, but the CREATE INDEX view has it as well)
relid - oid - OID of the table related to the COPY command; 0 when not
    known (for example, when copying to a file)
bytes_processed - bigint - number of bytes processed
bytes_total - bigint - file size in bytes for COPY FROM file (0 for any
    other kind of COPY)
lines_processed - bigint - number of tuples processed
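The proposed columns and their sentinel values (relid is 0 when no table is known, bytes_total is 0 unless it is a COPY FROM file) could be mirrored on the client side roughly as follows. This is a hypothetical Python sketch and is not part of the patch. CopyProgressRow and its properties are illustrative names. The example values come from the sample output in this message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyProgressRow:
    """Hypothetical client-side mirror of one pg_stat_progress_copy row
    with the columns proposed in this patch (not part of the patch)."""
    pid: int              # PID of the backend
    datid: int            # OID of the related database
    datname: str          # name of the related database
    relid: int            # OID of the table, 0 when not known
    bytes_processed: int  # bytes processed so far
    bytes_total: int      # file size for COPY FROM file, otherwise 0
    lines_processed: int  # tuples processed so far

    @property
    def has_relation(self) -> bool:
        # relid uses 0 as the "no table known" sentinel
        return self.relid != 0

    @property
    def has_total(self) -> bool:
        # bytes_total is only filled in for COPY FROM file
        return self.bytes_total > 0


# Values copied from the example output of the CSV import
row = CopyProgressRow(803148, 16384, "yr", 16394,
                      998965248, 1777777796, 56730126)
print(row.has_relation, row.has_total)  # True True
```

A client can check has_total before deriving a percentage, because bytes_total carries no size information for COPY TO or COPY FROM PROGRAM.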

Example progress output for a common use case (importing from a CSV file):

first console:
yr=# COPY test FROM '/home/retro/test.csv' (FORMAT CSV);

second console:
yr=# SELECT * FROM pg_stat_progress_copy;
  pid   | datid | datname | relid | bytes_processed | bytes_total | lines_processed
--------+-------+---------+-------+-----------------+-------------+-----------------
 803148 | 16384 | yr      | 16394 |       998965248 |  1777777796 |        56730126
(1 row)

Progress as a percentage is easy to get, for example:

yr=# SELECT (bytes_processed/bytes_total::decimal)*100 FROM pg_stat_progress_copy WHERE pid = 803148;
        ?column?
-------------------------
 50.04287948706048525800

^ about 50% of the file has already been processed
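The query above divides by bytes_total, which is 0 for anything other than COPY FROM file. PostgreSQL raises a division-by-zero error in that case. Writing the divisor as NULLIF(bytes_total, 0) makes the query return NULL instead. Below is a minimal sketch of the same guard on the client side in Python. The function name copy_percent_done is hypothetical.

```python
from typing import Optional


def copy_percent_done(bytes_processed: int, bytes_total: int) -> Optional[float]:
    """Percentage of the input file consumed so far (hypothetical helper).

    bytes_total is 0 when the COPY is not reading from a file, so no
    percentage can be computed; return None instead of dividing by zero.
    """
    if bytes_total <= 0:
        return None
    return bytes_processed / bytes_total * 100


# Row from the first example output (pid 803148)
print(round(copy_percent_done(998965248, 1777777796), 2))  # 56.19
```

Applied to the row in the first example output, this gives about 56.19%. The 50.04% result above was therefore read at a different moment of the import.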

I also did some very simple benchmarking. The difference is small: every
case is within about 4 seconds (under 5%) of master. Each command
processes 100 million tuples; times are in seconds.

           test             with progress   master (32d6287)   difference
 ------------------------- --------------- ------------------ ------------
  COPY table TO                    46.102             47.499       -1.397
  COPY query TO                    52.168             49.822        2.346
  COPY table TO PROGRAM            52.345             51.882        0.463
  COPY query TO PROGRAM            54.141             52.763        1.378
  COPY table FROM                  88.970             85.161        3.809
  COPY table FROM PROGRAM          94.393             90.346        4.047

A properly formatted version of the table (in case it does not render
well for everyone here) and the benchmark source are available at
https://github.com/simi/postgres/pull/6, together with an example output.
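The small Python sketch below puts the benchmark differences in relative terms. It recomputes the table's difference column and expresses each difference as a percentage of the master timing. The numbers are copied from the table above. The helper name overhead is hypothetical.

```python
from typing import Tuple

# (test, with progress [s], master 32d6287 [s]) copied from the table above
RESULTS = [
    ("COPY table TO", 46.102, 47.499),
    ("COPY query TO", 52.168, 49.822),
    ("COPY table TO PROGRAM", 52.345, 51.882),
    ("COPY query TO PROGRAM", 54.141, 52.763),
    ("COPY table FROM", 88.970, 85.161),
    ("COPY table FROM PROGRAM", 94.393, 90.346),
]


def overhead(with_progress: float, master: float) -> Tuple[float, float]:
    """Absolute (seconds) and relative (% of master) difference."""
    diff = with_progress - master
    return diff, diff / master * 100


for name, with_progress, master in RESULTS:
    diff, pct = overhead(with_progress, master)
    print(f"{name:<25} {diff:+7.3f} s {pct:+6.2f} %")
```

Every case stays under 5% relative to master. The largest relative slowdown is COPY query TO (about 4.7%), and the largest absolute one is COPY table FROM PROGRAM (about 4.0 s).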

I'll add this to the current commitfest as well.

Attachment
