Thread: COPY command & binary format

COPY command & binary format

From
Nicolas Paris
Date:
Hello,

What is the way to build a binary format (instead of a csv) ? Is there specification for this file ?
http://www.postgresql.org/docs/9.5/static/sql-copy.html

Could I create such format from java ?

I guess this would be far faster, and maybe safer than CSVs

Thanks by advance,

Re: [SPAM] COPY command & binary format

From
Moreno Andreo
Date:
<div class="moz-cite-prefix">Il 10/05/2016 12:56, Nicolas Paris ha scritto:<br /></div><blockquote
cite="mid:CA+ssMORTVs9cnvaS6wOOuPcmmsT5MJ6sfDYs9h3peXqW4zOA5A@mail.gmail.com"type="cite"><div dir="ltr"><div
class="gmail_default"style="font-family:tahoma,sans-serif;color:rgb(0,51,51)">Hello,<br /><br /></div><div
class="gmail_default"style="font-family:tahoma,sans-serif;color:rgb(0,51,51)">What is the way to build a binary format
(insteadof a csv) ? Is there specification for this file ?<br /><a
href="http://www.postgresql.org/docs/9.5/static/sql-copy.html"
moz-do-not-send="true">http://www.postgresql.org/docs/9.5/static/sql-copy.html</a><br/></div></div></blockquote> I
alwayscreate binary files with<br /> COPY table TO 'path/to/file' WITH BINARY<br /><br /> Cheers<br /> Moreno.-<br /> 

Re: COPY command & binary format

From
Sameer Kumar
Date:


On Tue, May 10, 2016 at 4:26 PM Nicolas Paris <niparisco@gmail.com> wrote:
Hello,

What is the way to build a binary format (instead of a csv) ? Is there specification for this file ?
http://www.postgresql.org/docs/9.5/static/sql-copy.html 


Could I create such format from java ?

You can use COPY JDBC API to copy to STDOUT and then compress it before you use usual Java file operations to write it to a file. You will have to follow the reverse process while reading from this file and LOADING to a table.

But why would you want to do that?
 

I guess this would be far faster, and maybe safer than CSVs

I don't think assumption is right. COPY is not meant for backup, it is for LOAD and UN-LOAD. 

What you probably need is pg_dump with -Fc format.
 

Thanks by advance,
--
--
Best Regards
Sameer Kumar | DB Solution Architect 
ASHNIK PTE. LTD.

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com

Re: COPY command & binary format

From
Sameer Kumar
Date:


On Tue, May 10, 2016 at 4:36 PM Sameer Kumar <sameer.kumar@ashnik.com> wrote:
On Tue, May 10, 2016 at 4:26 PM Nicolas Paris <niparisco@gmail.com> wrote:
Hello,

What is the way to build a binary format (instead of a csv) ? Is there specification for this file ?
http://www.postgresql.org/docs/9.5/static/sql-copy.html 


Could I create such format from java ?

You can use COPY JDBC API to copy to STDOUT and then compress it before you use usual Java file operations to write it to a file. You will have to follow the reverse process while reading from this file and LOADING to a table.

But why would you want to do that?
 

I guess this would be far faster, and maybe safer than CSVs

I don't think assumption is right. COPY is not meant for backup, it is for LOAD and UN-LOAD. 

What you probably need is pg_dump with -Fc format.
 

Like someone else suggested upthread you can use Binary format in COPY command (default is text)
 

Thanks by advance,
--
--
Best Regards
Sameer Kumar | DB Solution Architect 
ASHNIK PTE. LTD.

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com

--
--
Best Regards
Sameer Kumar | DB Solution Architect 
ASHNIK PTE. LTD.

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com

Re: [SPAM] COPY command & binary format

From
Nicolas Paris
Date:
2016-05-10 13:04 GMT+02:00 Moreno Andreo <moreno.andreo@evolu-s.it>:
Il 10/05/2016 12:56, Nicolas Paris ha scritto:
Hello,

What is the way to build a binary format (instead of a csv) ? Is there specification for this file ?
http://www.postgresql.org/docs/9.5/static/sql-copy.html
I always create binary files with
COPY table TO 'path/to/file' WITH BINARY

 
​Fine, this works in this way :
postgresql -> binary
binary -> postgresql

The way I want is :
csv -> binary -> postgresql

And if possible, transforming csv to binary throught java​.

Use case is ETL process.

Re: COPY command & binary format

From
Pujol Mathieu
Date:



Le 10/05/2016 à 12:56, Nicolas Paris a écrit :
Hello,

What is the way to build a binary format (instead of a csv) ? Is there specification for this file ?
http://www.postgresql.org/docs/9.5/static/sql-copy.html

Could I create such format from java ?

I guess this would be far faster, and maybe safer than CSVs

Thanks by advance,
Hi
Making a driver that do what you want is not difficult. You will achieve better performances than than loading data from CSV, and you also will have better precision for floating values (there is no text conversion).
In the link you provide there is a description of the file format in section Binary Format.
Mathieu Pujol

Re: [SPAM] COPY command & binary format

From
Moreno Andreo
Date:
Il 10/05/2016 13:38, Nicolas Paris ha
      scritto:

    <blockquote
cite="mid:CA+ssMOSVC-Yu+yQXeLHL3c0+iz4g1PEtRM49cVvQxyNFN3d4sA@mail.gmail.com"
      type="cite">

        2016-05-10 13:04 GMT+02:00 Moreno
          Andreo <<a moz-do-not-send="true"
              href="mailto:moreno.andreo@evolu-s.it" target="_blank">moreno.andreo@evolu-s.it>:

            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">

                  Il 10/05/2016 12:56, Nicolas Paris ha scritto:



                      <div
                        style="font-family:tahoma,sans-serif;color:rgb(0,51,51)">Hello,


                      <div
                        style="font-family:tahoma,sans-serif;color:rgb(0,51,51)">What

                        is the way to build a binary format (instead of
                        a csv) ? Is there specification for this file ?
                        <a moz-do-not-send="true"
                          href="http://www.postgresql.org/docs/9.5/static/sql-copy.html"
                          target="_blank">http://www.postgresql.org/docs/9.5/static/sql-copy.html



                 I always create binary files with
                COPY table TO 'path/to/file' WITH BINARY



             

          <div class="gmail_default"
            style="font-family:tahoma,sans-serif;color:rgb(0,51,51)">​
            Fine, this works in this way :
            postgresql -> binary

          <div class="gmail_default"
            style="font-family:tahoma,sans-serif;color:rgb(0,51,51)">binary
            -> postgresql


          <div class="gmail_default"
            style="font-family:tahoma,sans-serif;color:rgb(0,51,51)">The
            way I want is :

          <div class="gmail_default"
            style="font-family:tahoma,sans-serif;color:rgb(0,51,51)">csv
            -> binary -> postgresql




    Is this just to be quicker or are you going to add some business
    logic while converting CSV data?
    As you mentioned ETL, I assume the second, as I don't think that
    converting CSV to binary and then loading it to PostgreSQL will be
    more convenient than loading directly from CSV... as quicker as it
    can be, you have anyway to load data from CSV.

    Binary file format is briefly described in the last part of the doc
    you linked, under "Binary format", and there's also reference to
    source files.
    <blockquote
cite="mid:CA+ssMOSVC-Yu+yQXeLHL3c0+iz4g1PEtRM49cVvQxyNFN3d4sA@mail.gmail.com"
      type="cite">


          <div class="gmail_default"
            style="font-family:tahoma,sans-serif;color:rgb(0,51,51)">

          <div class="gmail_default"
            style="font-family:tahoma,sans-serif;color:rgb(0,51,51)">And
            if possible, transforming csv to binary throught java​.




    This is beyond my knowledge, ATM. I'm just starting with Java and
    JDBC is still in the TODO list, sorry... :-)

    Cheers
    Moreno.-

Re: [SPAM] COPY command & binary format

From
Nicolas Paris
Date:
2016-05-10 14:47 GMT+02:00 Moreno Andreo <moreno.andreo@evolu-s.it>:
Il 10/05/2016 13:38, Nicolas Paris ha scritto:
2016-05-10 13:04 GMT+02:00 Moreno Andreo <moreno.andreo@evolu-s.it>:
Il 10/05/2016 12:56, Nicolas Paris ha scritto:
Hello,

What is the way to build a binary format (instead of a csv) ? Is there specification for this file ?
http://www.postgresql.org/docs/9.5/static/sql-copy.html
I always create binary files with
COPY table TO 'path/to/file' WITH BINARY

 
​ Fine, this works in this way :
postgresql -> binary
binary -> postgresql

The way I want is :
csv -> binary -> postgresql
Is this just to be quicker or are you going to add some business logic while converting CSV data?
As you mentioned ETL, I assume the second, as I don't think that converting CSV to binary and then loading it to PostgreSQL will be more convenient than loading directly from CSV... as quicker as it can be, you have anyway to load data from CSV.

Right, ETL process means huge business logic.
get the data (csv or other) -> transform it -> produce a binary -> copy from binary from stdin ​

Producing 100GO CSVs, is a waste of time.

 
Binary file format is briefly described in the last part of the doc you linked, under "Binary format", and there's also reference to source files.

And if possible, transforming csv to binary throught java​.
This is beyond my knowledge, ATM. I'm just starting with Java and JDBC is still in the TODO list, sorry... :-)

Cheers
Moreno.-

​Documentation explains a bit. Moreover, I have found a detailled answer here :
My ultimate goal is to encapsulate it in a Talend component. (talend is an open-source java based ETL software).

Thanks, I ll keep you aware.

Re: [SPAM] COPY command & binary format

From
Cat
Date:
On Tue, May 10, 2016 at 01:38:12PM +0200, Nicolas Paris wrote:
> The way I want is :
> csv -> binary -> postgresql
>
> And if possible, transforming csv to binary throught java​.
>
> Use case is ETL process.

Not sure what the point would be tbh if the data is already in CSV.
You might aswell submit the CSV to postgres and let it deal with it.
It'll probably be faster. It'll also be more portable. The BINARY
format is what Postgres uses internally (more or less). I had to
look at the source code to figure out how to insert a timestamp
(FYI: Postgres stores timestamps as epoch based off the year 2000 not
1970 amongst other fun things).


--
  "A search of his car uncovered pornography, a homemade sex aid, women's
  stockings and a Jack Russell terrier."
    - http://www.dailytelegraph.com.au/news/wacky/indeed/story-e6frev20-1111118083480


Re: [SPAM] COPY command & binary format

From
Cat
Date:
On Tue, May 10, 2016 at 03:00:55PM +0200, Nicolas Paris wrote:
> > The way I want is :
> > csv -> binary -> postgresql
> >
> > Is this just to be quicker or are you going to add some business logic
> > while converting CSV data?
> > As you mentioned ETL, I assume the second, as I don't think that
> > converting CSV to binary and then loading it to PostgreSQL will be more
> > convenient than loading directly from CSV... as quicker as it can be, you
> > have anyway to load data from CSV.
> >
> ​
> Right, ETL process means huge business logic.
> get the data (csv or other) -> transform it -> produce a binary -> copy
> from binary from stdin ​
>
> Producing 100GO CSVs, is a waste of time.

Ah. You need to fiddle with the data. Then you need to weigh the pros of
something agnostic to Postgres's internals to something that needs to be
aware of them.

You will need to delve into the source code for data types more complex
than INTEGER, TEXT and BYTEA (which was the majority of my data when I
was just looking into it).

--
  "A search of his car uncovered pornography, a homemade sex aid, women's
  stockings and a Jack Russell terrier."
    - http://www.dailytelegraph.com.au/news/wacky/indeed/story-e6frev20-1111118083480


Re: [SPAM] COPY command & binary format

From
Nicolas Paris
Date:
Well the job is done. The talend component is working (https://github.com/parisni/talend/tree/master/tPostgresqlOutputBulkAPHP). It allows creating a file (binary or csv) locally, and then use the COPY function with "FROM STDIN" that does not need to push the file on a remote database server.

I have made a little comparison test:

column1: character varying
column2: integer
column3: boolean
10 000 000 tuples

Type    | Create file time       | Bulk load time           | Total Time              |   File size
Binary | 11137 milliseconds |  21661 milliseconds | 32798 milliseconds |   250 MO
CSV     | 23226 milliseconds |  22192 milliseconds |  45418 milliseconds |  179 MO


Binary format is definitely faster and safer
- faster because writing binary is faster than text file. I guess the bulk load time bottleneck is the network, then this is equivalent for both format. It is two time faster to load a binary when the file is on the database server.
- safer thanks to the format (each value is preceded by its lenght) more robust thant CSV and separators (that can be present in the text).


Thanks,

2016-05-10 15:08 GMT+02:00 Cat <cat@zip.com.au>:
On Tue, May 10, 2016 at 03:00:55PM +0200, Nicolas Paris wrote:
> > The way I want is :
> > csv -> binary -> postgresql
> >
> > Is this just to be quicker or are you going to add some business logic
> > while converting CSV data?
> > As you mentioned ETL, I assume the second, as I don't think that
> > converting CSV to binary and then loading it to PostgreSQL will be more
> > convenient than loading directly from CSV... as quicker as it can be, you
> > have anyway to load data from CSV.
> >
> ​
> Right, ETL process means huge business logic.
> get the data (csv or other) -> transform it -> produce a binary -> copy
> from binary from stdin ​
>
> Producing 100GO CSVs, is a waste of time.

Ah. You need to fiddle with the data. Then you need to weigh the pros of
something agnostic to Postgres's internals to something that needs to be
aware of them.

You will need to delve into the source code for data types more complex
than INTEGER, TEXT and BYTEA (which was the majority of my data when I
was just looking into it).

--
  "A search of his car uncovered pornography, a homemade sex aid, women's
  stockings and a Jack Russell terrier."
    - http://www.dailytelegraph.com.au/news/wacky/indeed/story-e6frev20-1111118083480