Re: Extending copy_expert - Mailing list psycopg

From Adrian Klaver
Subject Re: Extending copy_expert
Date
Msg-id 543BD762.4070009@aklaver.com
In response to Extending copy_expert  (Andrea Riciputi <andrea.riciputi@gmail.com>)
Responses Re: Extending copy_expert  (Andrea Riciputi <andrea.riciputi@gmail.com>)
List psycopg
On 10/12/2014 02:28 PM, Andrea Riciputi wrote:
> Hi all,
> a couple of weeks ago at work we had to produce a quite big CSV data file
> which should be used as input by another piece of software.
>
> Since the file must be produced on a daily basis, is big (let's say half a
> TB), and contains data stored in our PG database, letting PG produce the
> file itself seemed the right approach. Thanks to psycopg the whole operation
> is performed in C, which makes it fast enough for our purpose.
>
> However, the target software for which the file is produced is, let's say,
> “legacy” software and can only accept CRLF as the EOL character. Calling
> COPY TO STDOUT from psycopg produces a CSV file with LF as EOL, forcing us
> to pass over the file a second time to convert the EOL, which is
> inconvenient. Plus, doing it in Python makes it a little bit too slow.
>
> My first attempt was to ask the pgsql-hackers ML to extend the COPY TO
> syntax with a “FORCE_EOL” parameter, but they kindly rejected my proposal.
> They also suggested that I take the result of PQgetCopyData() and replace
> the LF character there with whatever suits me.
>
> So I studied the psycopg codebase and spotted where and how to change it to
> allow such a use case. My intent was to add a new keyword argument to the
> copy_expert() method, let me call it “eol”, with a default of “\n”. If the
> user decides to override it with a different EOL (e.g. “\r\n” or “\r”),
> every EOL returned by PQgetCopyData() in _pq_copy_out_v3() can be converted.
>
> However, I’m a little bit concerned about this solution, and before going on
> with a pull request I’d like to have your feedback here. My main concern is
> that extending the copy_expert() method in psycopg leaves the user
> completely on their own when it comes to using this new keyword argument in
> the right way.
>
> We can easily allow only CR, LF, and CRLF as values for that argument, but
> what if the user uses the “eol” kwarg and, for example, issues a
> “COPY TO … AS BINARY” query? In that case the resulting output file can end
> up corrupted without the user even noticing. Of course psycopg can parse the
> “COPY TO” query (by means of PG’s ProcessCopyOptions()) and check whether
> the “eol” kwarg is consistent with the issued query. But frankly, this seems
> to be becoming a little bit too complex to me.
>
> So I’m asking you: what’s your take on this? Do you see any better way to
> get it done? Can anyone here who is also involved in the pgsql-hackers ML
> support my idea of extending the COPY TO syntax directly in PG?
>
> Thanks for your help, and apologies for the long email.

Alright, to follow up on my previous post about open(): in Python 2 the
newline argument is available through io.open(), so a simple example:

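import io

# con is assumed to be an already open psycopg2 connection.
# newline='\r\n' makes the text layer write every '\n' as CRLF.
f = io.open('io_newline.csv', 'w', newline='\r\n')

cur = con.cursor()

cur.copy_expert("COPY cell_per TO STDOUT WITH CSV HEADER", f)

f.close()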

aklaver@panda:~/software_projects> file io_newline.csv
io_newline.csv: ASCII text, with CRLF line terminators
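
To see the newline translation on its own, without a database, here is a minimal sketch. The helper name write_with_crlf is made up for illustration, and an in-memory io.BytesIO stands in for the file on disk; the only thing it is meant to show is what the newline='\r\n' setting does to text written through the io text layer.

```python
import io


def write_with_crlf(text):
    # Hypothetical helper, for illustration only.
    # BytesIO stands in for the file on disk so the raw bytes can be inspected.
    raw = io.BytesIO()
    # Same newline setting as in the io.open() call above.
    wrapper = io.TextIOWrapper(raw, encoding="utf-8", newline="\r\n")
    wrapper.write(text)
    wrapper.flush()  # push the encoded text down into the BytesIO buffer
    return raw.getvalue()


if __name__ == "__main__":
    print(repr(write_with_crlf(u"id,name\n1,foo\n")))
```

Note that the translation applies to every '\n' written through the text layer, so a line feed embedded inside a quoted CSV field would presumably come out as CRLF as well. Whether that matters depends on the data and on the consuming program.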

> a.
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com

