Re: [PATCH] Simple progress reporting for COPY command - Mailing list pgsql-hackers

From Josef Šimánek
Subject Re: [PATCH] Simple progress reporting for COPY command
Msg-id CAFp7QwoZ+nxp7Q1ycWtEe-3uw7Hgzb2ZY6ywTbfxNkz9hqM1eg@mail.gmail.com
In response to Re: [PATCH] Simple progress reporting for COPY command  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Thu, 7 Jan 2021 at 16:54, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
>
>
> On 1/7/21 12:06 PM, Josef Šimánek wrote:
> > On Wed, 6 Jan 2021 at 22:44, Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> >>
> >> On 1/5/21 11:02 AM, Josef Šimánek wrote:
> > >>> I'm attaching the whole patch, since the commitfest app failed to
> > >>> ingest the last incremental patch on CI.
> >>>
> >>
> >> Yeah, the whole patch needs to be attached for the commitfest tester to
> >> work correctly - it can't apply pieces from multiple messages, etc.
> >>
> >> Anyway, I pushed this last version of patch, after a couple more tweaks,
> >> mainly to the docs - one place used pg_stat_copy_progress, the section
> >> was not indexed properly, and so on.
> >>
> >> I see Matthias proposed to change "lines" to "tuples" - I only saw the
> >> message after pushing, but I probably wouldn't make that change anyway.
> >> The CSV docs seem to talk about lines, newlines etc. so it seems fine.
> >> If not, we can change that.
> >>
> >> One more question, though - I now realize the lines_processed ignores
> >> rows skipped because of BEFORE INSERT triggers. I wonder if that's the
> >> right thing to do? Imagine you know the number of lines in a file. You
> >> can't really use (lines_processed / total_lines) to measure progress,
> >> because that may ignore many "skipped" rows. So maybe this should be
> >> changed to count all rows. OTOH we still have bytes_processed.
> >
> > I think that should be fixed. It is called "lines_processed" not
> > "lines_inserted". I'll take a look.
> >
>
> So we may either rename the column to "lines_inserted", or tweak the
> code to count all processed lines. Or track both and have two columns.

First, I'll make sure lines_processed reflects the actual number of
processed lines. When reading from a file, a line skipped by a BEFORE
INSERT trigger still counts as processed in my view, even though it is
not inserted. If there's interest, I can add lines_inserted later as
well, though I'm not sure about the use case.
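To make the arithmetic behind Tomas's concern concrete, here is a small sketch with made-up numbers: a hypothetical 1,000-line file where a BEFORE INSERT trigger skips 400 rows. The helper copy_progress and all the figures are illustrative assumptions, not part of the patch or of any PostgreSQL view; the point is only that dividing an inserted-only count by the file's line count never reaches 100%.

```python
def copy_progress(processed: int, total: int) -> float:
    """Fraction of work done, clamped to [0, 1]; 0.0 when the total is unknown.

    Hypothetical helper for illustration only.
    """
    if total <= 0:
        return 0.0
    return min(processed / total, 1.0)


# Hypothetical file: 1,000 lines, a BEFORE INSERT trigger skips 400 of them.
total_lines = 1000
skipped = 400
read_so_far = 1000  # COPY has consumed the entire file

# Counting only inserted rows: progress stalls at 60% even though COPY is done.
inserted_only = copy_progress(read_so_far - skipped, total_lines)
# Counting skipped rows as processed too: progress reaches 100%.
all_processed = copy_progress(read_so_far, total_lines)

print(f"inserted-only: {inserted_only:.0%}, all processed: {all_processed:.0%}")
```

Counting every line read, whether or not it ends up inserted, keeps the ratio meaningful for anyone who knows the file's line count up front.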

Also, thanks for mentioning triggers: I think they could be used to
test COPY progress reporting (at least some variants). I'll see whether
I can add some test cases as well.

> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


