Hello,
I finally had some time to revisit the patch and all the comments from
https://www.postgresql.org/message-id/CAFp7QwqMGEi4OyyaLEK9DR0%2BE%2BoK3UtA4bEjDVCa4bNkwUY2PQ%40mail.gmail.com
and I have prepared a simple version of COPY command progress reporting.
To keep the patch as small as possible, I have introduced only a minimal
set of columns. It could be extended later if needed.
The columns are inspired by the CREATE INDEX progress report system view.
pid - integer - PID of the backend
datid - oid - OID of the related database
datname - name - name of the related database (this seems redundant,
since the OID should be enough, but CREATE INDEX does the same)
relid - oid - OID of the table related to the COPY command; 0 when not
known (for example when copying to a file)
bytes_processed - bigint - number of bytes processed
bytes_total - bigint - file size in bytes for COPY FROM file (0 if not
COPY FROM file)
lines_processed - bigint - number of tuples processed
Example output for a common use case (import from a CSV file):
first console:
yr=# COPY test FROM '/home/retro/test.csv' (FORMAT CSV);
second console:
yr=# SELECT * FROM pg_stat_progress_copy;
  pid   | datid | datname | relid | bytes_processed | bytes_total | lines_processed
--------+-------+---------+-------+-----------------+-------------+-----------------
 803148 | 16384 | yr      | 16394 |       998965248 |  1777777796 |        56730126
(1 row)
It is simple to get the progress as a percentage, for example:
yr=# SELECT (bytes_processed/bytes_total::decimal)*100 FROM
pg_stat_progress_copy WHERE pid = 803148;
?column?
-------------------------
50.04287948706048525800
^ ~50% of file processed already
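One caveat with the query above: bytes_total is 0 for anything other
than COPY FROM file, so the division would raise a division-by-zero
error in those cases. In SQL this could be guarded with
NULLIF(bytes_total, 0). Below is a minimal client-side sketch of the same
calculation with that guard (Python; the function name is made up for
illustration and is not part of the patch):

```python
from typing import Optional


def copy_progress_percent(bytes_processed: int, bytes_total: int) -> Optional[float]:
    """Return COPY progress in percent, or None when the total is unknown.

    bytes_total is reported as 0 when the size is not known
    (anything other than COPY FROM file), so no percentage exists then.
    """
    if bytes_total <= 0:
        return None
    return 100.0 * bytes_processed / bytes_total


if __name__ == "__main__":
    # Values taken from the pg_stat_progress_copy sample row above.
    print(copy_progress_percent(998965248, 1777777796))
    # Total unknown, e.g. COPY TO: no percentage available.
    print(copy_progress_percent(12345, 0))
```

Returning None for the unknown case mirrors what NULLIF would give in
SQL (a NULL result instead of an error).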
I also did some very simple benchmarking. The difference is not huge:
at most about 4 seconds, which is under 5% of the run time. Each command
processes 100 million tuples. Times are in seconds.
test                      with progress   master (32d6287)   difference
------------------------- --------------- ------------------ ------------
COPY table TO             46.102          47.499             -1.397
COPY query TO             52.168          49.822              2.346
COPY table TO PROGRAM     52.345          51.882              0.463
COPY query TO PROGRAM     54.141          52.763              1.378
COPY table FROM           88.970          85.161              3.809
COPY table FROM PROGRAM   94.393          90.346              4.047
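For reference, the difference column is "with progress" minus "master".
It can also be restated as relative overhead. A small sketch of that
arithmetic (Python; the numbers are copied from the table above, and the
helper name is hypothetical):

```python
# (with progress, master) timings in seconds, copied from the table above.
TIMINGS = {
    "COPY table TO": (46.102, 47.499),
    "COPY query TO": (52.168, 49.822),
    "COPY table TO PROGRAM": (52.345, 51.882),
    "COPY query TO PROGRAM": (54.141, 52.763),
    "COPY table FROM": (88.970, 85.161),
    "COPY table FROM PROGRAM": (94.393, 90.346),
}


def overhead(with_progress: float, master: float) -> tuple:
    """Return (difference in seconds, difference relative to master in percent)."""
    diff = with_progress - master
    return diff, 100.0 * diff / master


if __name__ == "__main__":
    for test, (patched, master) in TIMINGS.items():
        diff, pct = overhead(patched, master)
        print(f"{test:<25} {diff:+7.3f} s  {pct:+6.2f} %")
```

Since COPY table TO comes out faster with the patch, part of these
differences is probably run-to-run noise.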
A properly formatted table (I'm not sure it renders well for everyone
here) and the benchmark source are available at
https://github.com/simi/postgres/pull/6. I have also included example
output there.
I'll add this to the current commitfest as well.