Re: [PATCH] Simple progress reporting for COPY command - Mailing list pgsql-hackers

From Josef Šimánek
Subject Re: [PATCH] Simple progress reporting for COPY command
Date
Msg-id CAFp7QwrWg9vJtRLtvKm2ZRMnf=YCeZ6Js81L8wpdjxwRccaC_Q@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Simple progress reporting for COPY command  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: [PATCH] Simple progress reporting for COPY command  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List pgsql-hackers
čt 7. 1. 2021 v 22:37 odesílatel Tomas Vondra
<tomas.vondra@enterprisedb.com> napsal:
>
>
>
> On 1/7/21 7:56 PM, Josef Šimánek wrote:
> > čt 7. 1. 2021 v 19:51 odesílatel Matthias van de Meent
> > <boekewurm+postgres@gmail.com> napsal:
> >>
> >> On Wed, 6 Jan 2021 at 22:45, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> >>>
> >>> On 1/5/21 11:02 AM, Josef Šimánek wrote:
> >>>> I'm attaching the whole patch since commitfest failed to ingest the
> >>>> last incremental on CI.
> >>>>
> >>>
> >>> Yeah, the whole patch needs to be attached for the commitfest tester to
> >>> work correctly - it can't apply pieces from multiple messages, etc.
> >>>
> >>> Anyway, I pushed this last version of patch, after a couple more tweaks,
> >>> mainly to the docs - one place used pg_stat_copy_progress, the section
> >>> was not indexed properly, and so on.
> >>
> >> Thank you all, I'd love to use this in the future to keep track of
> >> (e.g.) my backup/restore progress.
> >>
> >> For my previously mentioned extension to keep track of filtered tuples
> >> that are excluded by the WHERE-clause, PFA patch 0001 that keeps track
> >> of that, in line with the current column name style of lines.
> >
> > If I understand it well, this column could be used on COPY TO to track
> > skipped lines because of BEFORE TRIGGER, right? I can include this in
> > my following patch keeping lines_processed incremented even for
> > skipped lines as well.
> >
>
> That's my understanding too.
>
> >> If so desired, I'll split this off into a different thread & CF entry.
> >>
> >>> I see Matthias proposed to change "lines" to "tuples" - I only saw the
> >>> message after pushing, but I probably wouldn't make that change anyway.
> >>> The CSV docs seem to talk about lines, newlines etc. so it seems fine.
> >>> If not, we can change that.
> >>
> >> The CSV docs, sure. But copy doesn't only process CSVs; it also has
> >> text (which does have a # lines = # tuples / rows guarantee) and
> >> binary (in which the 'line' vocabulary doesn't make sense, and in
> >> which the 'tuples' vocabulary is used). Additionally, most mentions of
> >> postgres' logical rows/tuples in the COPY documentation use the 'rows'
> >> terminology ('tuples' for FORMAT BINARY), and use 'line' only for
> >> external format's textual representation's strings delimited by
> >> newlines (which I believe is not exactly what we're counting).
> >>
> >> One common user of COPY is the pg_dump tool, and that uses binary
> >> format by default (iirc).
> >>
> >> Additionally, all comments surrounding the *LINES_PROCESSED updates
> >> only mention 'tuples', so I'd like to strongly suggest (a variant of)
> >> attached patch 0002 to keep the vocabulary consistent by using
> >> 'tuples' instead of 'lines'.
> >>
>
> I'm not particularly attached to the "lines" naming, it just seemed OK
> to me. So if there's consensus to rename this somehow, I'm OK with it.

The problem I do see here is it depends on the "way" of COPY. If
you're copying from CSV file to table, those are actually lines (since
1 line = 1 tuple). But copying from DB to file is copying tuples (but
1 tuple = 1 file line). Line works better here for me personally.

Once I'll fix the problem with triggers (and also another cases if
found), I think we can consider it lines. It will represent amount of
lines processed from file on COPY FROM and amount of lines written to
file in COPY TO form (at least in CSV format). I'm not sure how BINARY
format works, I'll check.
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Multiple hosts in connection string failed to failover in non-hot standby mode
Next
From: Peter Geoghegan
Date:
Subject: Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)