Re: [PATCH] Simple progress reporting for COPY command - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: [PATCH] Simple progress reporting for COPY command
Msg-id CAEze2Wj62YGOK_d67LvfGoL=ZobfmUhPn+WRGfEhMtGHBaM1Xg@mail.gmail.com
In response to Re: [PATCH] Simple progress reporting for COPY command  (Josef Šimánek <josef.simanek@gmail.com>)
Responses Re: [PATCH] Simple progress reporting for COPY command  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On Thu, 7 Jan 2021 at 23:00, Josef Šimánek <josef.simanek@gmail.com> wrote:
>
> On Thu, 7 Jan 2021 at 22:37, Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> >
> > I'm not particularly attached to the "lines" naming, it just seemed OK
> > to me. So if there's consensus to rename this somehow, I'm OK with it.
>
> The problem I see here is that it depends on the "way" of COPY. If
> you're copying from a CSV file to a table, those are actually lines (since
> 1 line = 1 tuple). But copying from the DB to a file is copying tuples (but
> 1 tuple = 1 file line). "Line" works better here for me personally.
>
> Once I fix the problem with triggers (and any other cases, if
> found), I think we can consider it lines. It will represent the number of
> lines processed from the file on COPY FROM and the number of lines written
> to the file on COPY TO (at least in CSV format). I'm not sure how the BINARY
> format works; I'll check.

Here is a counterexample showing that 1 tuple need not be 1 line in the
csv and binary formats:

/*
 * create a table with one tuple containing 1 text field, which consists of
 * 10 newline characters.
 * If you want windows-style lines, replace E'\x0A' (\n) with E'\x0D\x0A' (\r\n).
 */
# CREATE TABLE ttab (val) AS
  SELECT * FROM (values (
    repeat(convert_from(E'\x0A'::bytea, 'UTF8'), 10)::text
  )) as v;

# -- indeed, one unix-style line, according to $ wc -l copy.txt
# COPY ttab TO 'copy.txt' (format text);
COPY 1
# TRUNCATE ttab; COPY ttab FROM 'copy.txt' (format text);
COPY 1

# -- 11 lines
# COPY ttab TO 'copy.csv' (format csv);
COPY 1
# TRUNCATE ttab; COPY ttab FROM 'copy.csv' (format csv);
COPY 1

# -- 13 lines
# COPY ttab TO 'copy.bin' (format binary);
COPY 1
# TRUNCATE ttab; COPY ttab FROM 'copy.bin' (format binary);
COPY 1

All of the above COPY statements would report only 'lines_processed = 1'
in the progress reporting, while the csv/binary line counts are definitely
inconsistent with that number, because progress reporting counts tuples /
table rows, not the number of lines in the external file.
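
To make the 1 / 11 / 13 line counts above easier to check without a
server, here is a rough Python sketch that builds the bytes of each file
for the same single-field tuple and counts the newline bytes, as wc -l
would. It is only an approximation under stated assumptions: the helper
names (text_format, csv_format, binary_format) are made up for this
illustration, Python's csv module stands in for PostgreSQL's CSV output,
and the binary layout follows the documented COPY binary format
(signature, flags, header extension length, per-tuple field count and
field length, trailer).

```python
import csv
import io
import struct

# One tuple with a single text field of 10 newline characters,
# mirroring the ttab example above.
value = "\n" * 10


def text_format(field: str) -> bytes:
    # Text format escapes backslashes and newlines inside a field,
    # so only the row terminator is a real newline byte.
    escaped = field.replace("\\", "\\\\").replace("\n", "\\n")
    return (escaped + "\n").encode("utf-8")


def csv_format(field: str) -> bytes:
    # Assumption: Python's csv writer approximates COPY's CSV output.
    # The field is quoted and its embedded newlines are written as-is.
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow([field])
    return buf.getvalue().encode("utf-8")


def binary_format(field: str) -> bytes:
    data = field.encode("utf-8")
    # 11-byte signature, then int32 flags and int32 header extension length.
    header = b"PGCOPY\n\xff\r\n\x00" + struct.pack("!ii", 0, 0)
    # int16 field count, then int32 field length followed by the field bytes.
    # A length of 10 is itself encoded with a 0x0A byte.
    row = struct.pack("!h", 1) + struct.pack("!i", len(data)) + data
    trailer = struct.pack("!h", -1)
    return header + row + trailer


formats = {"text": text_format, "csv": csv_format, "binary": binary_format}
# Count newline bytes per file, which is what wc -l reports.
counts = {name: fn(value).count(b"\n") for name, fn in formats.items()}
print(counts)
```

In every case the file holds exactly one tuple, so a tuple-based counter
stays at 1 regardless of how many newline bytes the format happens to emit.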


