Re: [PATCH] Initial progress reporting for COPY command - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PATCH] Initial progress reporting for COPY command
Msg-id 20200623111519.dtkknc5xoxicszq7@development
In response to Re: [PATCH] Initial progress reporting for COPY command  (vignesh C <vignesh21@gmail.com>)
List pgsql-hackers
On Tue, Jun 23, 2020 at 03:40:08PM +0530, vignesh C wrote:
>On Mon, Jun 22, 2020 at 4:28 PM Josef Šimánek <josef.simanek@gmail.com> wrote:
>>
>> Thanks for the hint regarding "CopyReadLineText". I'll take a look.
>>
>> For now I have tested those cases:
>>
>> CREATE TABLE test(id int);
>> INSERT INTO test SELECT 1 FROM generate_series(1, 1000000);
>> COPY (SELECT * FROM test) TO '/tmp/ids';
>> COPY test FROM '/tmp/ids';
>>
>> psql -h /tmp yr -c 'COPY (SELECT 1 from generate_series(1,100000000)) TO STDOUT;' > /tmp/ryba.txt
>> cat /tmp/ryba.txt | psql -h /tmp yr -c 'COPY test FROM STDIN'
>>
>> It is easy to check that the line count and byte count are in sync (since 1 line is 2 bytes here: "1" and a
>> newline character).
>> I'll try to check more complex COPY commands to ensure everything is in sync.
>>
>> If you have any ideas for testing queries, feel free to suggest.
>
>For a COPY FROM statement you could attach a debugger to the session and
>put a breakpoint at CopyReadLineText. Execution will hit this breakpoint
>for every record read by COPY FROM, so you can check in parallel whether
>pg_stat_progress_copy is being updated correctly. I noticed it was
>showing the file read size instead of the actual processed bytes.
>
>>>  +pg_stat_progress_copy| SELECT s.pid,
>>> +    s.datid,
>>> +    d.datname,
>>> +    s.relid,
>>> +        CASE s.param1
>>> +            WHEN 0 THEN 'TO'::text
>>> +            WHEN 1 THEN 'FROM'::text
>>> +            ELSE NULL::text
>>> +        END AS direction,
>>> +    ((s.param2)::integer)::boolean AS file,
>>> +    ((s.param3)::integer)::boolean AS program,
>>> +    s.param4 AS lines_processed,
>>> +    s.param5 AS file_bytes_processed
>>>
>>> You could include pg_size_pretty for s.param5 like
>>> pg_size_pretty(S.param5) AS bytes_processed, it will be easier for
>>> users to understand bytes_processed when the data size increases.
>>
>>
>> I was looking at the rest of the reporting views, and to me they seem to be basic ones providing just raw data to
>> be used later in custom, human-readable views built on the client side.
>> For example "pg_stat_progress_basebackup" also reports "backup_streamed" in raw form.
>>
>> Anyway if you would like to make this view more user-friendly, I can add that. Just ping me.
>
>I felt we could add pg_size_pretty to make the view more user friendly.
>

Please no. That'd make processing of the data (say, computing progress
as processed/total) impossible. It's easy to add pg_size_pretty if you
want it, but it's impossible to undo it. I don't see a single
pg_size_pretty call in system_views.sql.
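
To illustrate the point, here is a minimal sketch of a client-side query
that keeps the raw byte count for arithmetic and applies pg_size_pretty
only for display. The column names follow the view definition quoted
above. The total of 200000000 bytes is an assumption made for this
example (roughly the size of a file with 100M two-byte lines); the view
as proposed does not report a total.

```sql
-- Sketch only: column names are taken from the quoted patch, and
-- 200000000 is an assumed total input size in bytes, not a value
-- reported by the view.
SELECT pid,
       file_bytes_processed,                        -- raw bigint, usable in arithmetic
       round(100.0 * file_bytes_processed / 200000000, 1) AS pct_done,
       pg_size_pretty(file_bytes_processed) AS pretty  -- formatted by the caller
FROM pg_stat_progress_copy;
```

If the view returned the pg_size_pretty text instead, the pct_done
expression could not be computed from it, because the formatted value is
rounded and carries a unit suffix.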


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


