Re: [PATCH] Simple progress reporting for COPY command - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: [PATCH] Simple progress reporting for COPY command |
Date | |
Msg-id | CALj2ACXstXE5TXazXkDmJAK-40XZsEYTRG0m4pUVf8nBGfHrMA@mail.gmail.com Whole thread Raw |
In response to | Re: [PATCH] Simple progress reporting for COPY command (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
List | pgsql-hackers |
On Fri, Jan 8, 2021 at 7:00 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > On Thu, 7 Jan 2021 at 23:00, Josef Šimánek <josef.simanek@gmail.com> wrote: > > > > čt 7. 1. 2021 v 22:37 odesílatel Tomas Vondra > > <tomas.vondra@enterprisedb.com> napsal: > > > > > > I'm not particularly attached to the "lines" naming, it just seemed OK > > > to me. So if there's consensus to rename this somehow, I'm OK with it. > > > > The problem I do see here is it depends on the "way" of COPY. If > > you're copying from CSV file to table, those are actually lines (since > > 1 line = 1 tuple). But copying from DB to file is copying tuples (but > > 1 tuple = 1 file line). Line works better here for me personally. > > > > Once I'll fix the problem with triggers (and also another cases if > > found), I think we can consider it lines. It will represent amount of > > lines processed from file on COPY FROM and amount of lines written to > > file in COPY TO form (at least in CSV format). I'm not sure how BINARY > > format works, I'll check. > > Counterexample that 1 tuple need not be 1 line, in csv/binary: Yes in binary format file, 1 tuple is definitely not 1 line. > /* > * create a table with one tuple containing 1 text field, which consists of > * 10 newline characters. > * If you want windows-style lines, replace '\x0A' (\n) with '\x0D0A' (\r\n). > */ > # CREATE TABLE ttab (val) AS > SELECT * FROM (values ( > repeat(convert_from(E'\x0A'::bytea, 'UTF8'), 10)::text > )) as v; > > # -- indeed, one unix-style line, according to $ wc -l copy.txt > # COPY ttab TO 'copy.txt' (format text); > COPY 1 > # TRUNCATE ttab; COPY ttab FROM 'copy.txt' (format text); > COPY 1 > > # -- 11 lines > # COPY ttab TO 'copy.csv' (format csv); > COPY 1 > # TRUNCATE ttab; COPY ttab FROM 'copy.csv' (format csv); > COPY 1 > > # -- 13 lines > # COPY ttab TO 'copy.bin' (format binary); > COPY 1 > # TRUNCATE ttab; COPY ttab FROM 'copy.bin' (format binary); > COPY 1 > > All of the above copy statements would only report 'lines_processed = 1', > in the progress reporting, while csv/binary line counts are definatively > inconsistent with what the progress reporting shows, because progress > reporting counts tuples / table rows, not the amount of lines in the > external file. I agree with that. +1 to change the name of 'lines_processed' column to 'tuples_processed'. So, the meaning of 'tuples_processed' can be - for COPY FROM, it's the number of tuples that are written to the target table, for COPY TO, it's the number of tuples from the source table or source plan that are written out to the file/stdin. IMHO, it's good to have at least the following info: the type of command COPY TO or COPY FROM, the format CSV,BINARY,TEXT and from where the input comes from or output goes to - FILE, STDIN, STDOUT or PROGRAM. Otherwise users will have to join pg_stat_progress_copy view with other views to get the type of copy command or the entire query. That's sometimes cumbersome to figure out what's the other view name and write join queries. Another point related to documentation, I see that the pg_stat_progress_copy view details are added to monitoring.sgml, but I didn't find any mention of it in the copy.sgml. For instance, we have pg_stat_progress_basebackup documentation both in monitoring.sgml and a mention of it and link to it in pg_basebackup.sgml. Usually, users will look at copy documentation to find any kind of information related to it, so better we have it mentioned there as well. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: