Re: ToDo: allow to get a number of processed rows by COPY statement [Review of Patch] - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: ToDo: allow to get a number of processed rows by COPY statement [Review of Patch]
Date
Msg-id CAFj8pRBD8bczZfGVam_V+36YCP_iMuX4Wa=G++gcZAdDmX4DGg@mail.gmail.com
Whole thread Raw
In response to Re: ToDo: allow to get a number of processed rows by COPY statement [Review of Patch]  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: ToDo: allow to get a number of processed rows by COPY statement [Review of Patch]  (Amit Kapila <amit.kapila@huawei.com>)
Re: ToDo: allow to get a number of processed rows by COPY statement [Review of Patch]  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
Hello

I reduced this patch as just plpgsql (SPI) problem solution.

Correct generic solution needs a discussion about enhancing our libpq
interface - we can take a number of returned rows, but we should to
get number of processed rows too. A users should to parse this number
from completationTag, but this problem is not too hot and usually is
not blocker, because anybody can process completationTag usually.

So this issue is primary for PL/pgSQL, where this information is not
possible. There is precedent - CREATE AS SELECT (thanks for sample
:)), and COPY should to have same impact on ROW_COUNT like this
statement (last patch implements it).

Regards

Pavel



2012/9/21 Amit Kapila <amit.kapila@huawei.com>:
> On Friday, September 21, 2012 1:23 AM Pavel Stehule wrote:
>
>>> Basic stuff:
>>> ------------
>>> - Patch applies OK. but offset difference in line numbers.
>>> - Compiles with errors in contrib [pg_stat_statements, sepgsql] modules
>>> - Regression failed; one test-case in COPY due to incomplete test-case
>>> attached patch. – same as reported by Heikki
>>
>>fixed patch is in attachment
>
> After modifications:
> ---------------------------
>     - Patch applies OK
>     - Compiles cleanly without any errors/warnings
>     - Regression tests pass.
>
>>>
>>> What it does:
>>> --------------
>>> Modification to get the number of processed rows evaluated via SPI. The
>>> changes are to add extra parameter in ProcessUtility to get the number of
>>> rows processed by COPY command.
>>>
>>> Code Review Comments:
>>> ---------------------
>>> 1. New parameter is added to ProcessUtility_hook_type function
>>>    but the functions which get assigned to these functions like
>>>    sepgsql_utility_command, pgss_ProcessUtility, prototype & definition is
>>> not modified.
>
> Functionality is not fixed correctly for hook functions, In function pgss_ProcessUtility
> for bellow snippet of code processed parameter is passed NULL, as well as not initialized.
> because of this when "pg_stat_statements" extention is utilized COPY command is giving garbage values.
>         if (prev_ProcessUtility)
>                 prev_ProcessUtility(parsetree, queryString, params,
>                                                         dest, completionTag, context, NULL);
>         else
>                 standard_ProcessUtility(parsetree, queryString, params,
>                                                         dest, completionTag, context, NULL);
>
>     Testcase is attached.
>     In this testcase table has only 1000 records but it show garbage value.
>         postgres=# show shared_preload_libraries ;
>          shared_preload_libraries
>         --------------------------
>          pg_stat_statements
>         (1 row)
>         postgres=# CREATE TABLE tbl (a int);
>         CREATE TABLE
>         postgres=# INSERT INTO tbl VALUES(generate_series(1,1000));
>         INSERT 0 1000
>         postgres=# do $$
>         declare r int;
>         begin
>           copy tbl to '/home/kiran/copytest.csv' csv;
>           get diagnostics r = row_count;
>           raise notice 'exported % rows', r;
>           truncate tbl;
>           copy tbl from '/home/kiran/copytest.csv' csv;
>           get diagnostics r = row_count;
>           raise notice 'imported % rows', r;
>         end;
>         $$ language plpgsql;
>         postgres$#
>         NOTICE:  exported 13281616 rows
>         NOTICE:  imported 13281616 rows
>         DO
>
>>>
>>> 2. Why to add the new parameter if completionTag hold the number of
>>> processed tuple information; can be extracted
>>>
>>>    from it as follows:
>>>                     _SPI_current->processed = strtoul(completionTag + 7,
>>> NULL, 10);
>>
>>this is basic question. I prefer a natural type for counter - uint64
>>instead text. And there are no simply way to get offset (7 in this
>>case)
>
>     I agree with your point, but currently in few other places we are parsing the completion tag for getting number
oftuples processed. So may be in future we can change those places as well. For example 

yes, this step can be done in future - together with libpq enhancing.
Is not adequate change API (and break lot of extensions) for this
isolated problem. So I propose fix plpgsql issue, and add to ToDo -
"enhance libpq to return a number of processed rows as number" - but
this change breaking compatibility.

Pavel

>
> pgss_ProcessUtility
> {
> ..
>
> /* parse command tag to retrieve the number of affected rows. */
> if (completionTag &&
>         sscanf(completionTag, "COPY " UINT64_FORMAT, &rows) != 1)
>         rows = 0;
>
> }
>
> _SPI_execute_plan
> {
> ..
> ..
> if (IsA(stmt, CreateTableAsStmt))
> {
>         Assert(strncmp(completionTag, "SELECT ", 7) == 0);
>         _SPI_current->processed = strtoul(completionTag + 7,
>                                                                           NULL, 10);
>
> ..
> }
>
>
> With Regards,
> Amit Kapila.
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: ALTER command reworks
Next
From: Magnus Hagander
Date:
Subject: Re: Patch: incorrect array offset in backend replication tar header