Re: [PATCH] Simple progress reporting for COPY command - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PATCH] Simple progress reporting for COPY command
Date
Msg-id d6411811-7d4f-d162-3e51-7eb022e1fa63@enterprisedb.com
Whole thread Raw
In response to Re: [PATCH] Simple progress reporting for COPY command  (Josef Šimánek <josef.simanek@gmail.com>)
Responses Re: [PATCH] Simple progress reporting for COPY command  (Josef Šimánek <josef.simanek@gmail.com>)
List pgsql-hackers

On 1/7/21 7:56 PM, Josef Šimánek wrote:
> čt 7. 1. 2021 v 19:51 odesílatel Matthias van de Meent
> <boekewurm+postgres@gmail.com> napsal:
>>
>> On Wed, 6 Jan 2021 at 22:45, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>>>
>>> On 1/5/21 11:02 AM, Josef Šimánek wrote:
>>>> I'm attaching the whole patch since commitfest failed to ingest the
>>>> last incremental on CI.
>>>>
>>>
>>> Yeah, the whole patch needs to be attached for the commitfest tester to
>>> work correctly - it can't apply pieces from multiple messages, etc.
>>>
>>> Anyway, I pushed this last version of patch, after a couple more tweaks,
>>> mainly to the docs - one place used pg_stat_copy_progress, the section
>>> was not indexed properly, and so on.
>>
>> Thank you all, I'd love to use this in the future to keep track of
>> (e.g.) my backup/restore progress.
>>
>> For my previously mentioned extension to keep track of filtered tuples
>> that are excluded by the WHERE-clause, PFA patch 0001 that keeps track
>> of that, in line with the current column name style of lines.
> 
> If I understand it well, this column could be used on COPY TO to track
> skipped lines because of BEFORE TRIGGER, right? I can include this in
> my following patch keeping lines_processed incremented even for
> skipped lines as well.
> 

That's my understanding too.

>> If so desired, I'll split this off into a different thread & CF entry.
>>
>>> I see Matthias proposed to change "lines" to "tuples" - I only saw the
>>> message after pushing, but I probably wouldn't make that change anyway.
>>> The CSV docs seem to talk about lines, newlines etc. so it seems fine.
>>> If not, we can change that.
>>
>> The CSV docs, sure. But copy doesn't only process CSVs; it also has
>> text (which does have a # lines = # tuples / rows guarantee) and
>> binary (in which the 'line' vocabulary doesn't make sense, and in
>> which the 'tuples' vocabulary is used). Additionally, most mentions of
>> postgres' logical rows/tuples in the COPY documentation use the 'rows'
>> terminology ('tuples' for FORMAT BINARY), and use 'line' only for
>> external format's textual representation's strings delimited by
>> newlines (which I believe is not exactly what we're counting).
>>
>> One common user of COPY is the pg_dump tool, and that uses binary
>> format by default (iirc).
>>
>> Additionally, all comments surrounding the *LINES_PROCESSED updates
>> only mention 'tuples', so I'd like to strongly suggest (a variant of)
>> attached patch 0002 to keep the vocabulary consistent by using
>> 'tuples' instead of 'lines'.
>>

I'm not particularly attached to the "lines" naming, it just seemed OK 
to me. So if there's consensus to rename this somehow, I'm OK with it.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Next
From: Tom Lane
Date:
Subject: Re: Multiple hosts in connection string failed to failover in non-hot standby mode