Thread: Good Delimiter for copy command

Good Delimiter for copy command

From

SHARMILA JOTHIRAJAH

Date:

10 February 2009, 11:07:51

Hi,
What is a good delimiter to use for COPY command. Im trying to use COPY command to copy data from one table to another in 2 different databases.
Can you suggest a unique delimiter that I can use for this COPY command

Thanks
Sharmial

Re: Good Delimiter for copy command

From

Sam Mason

Date:

10 February 2009, 15:41:15

On Tue, Feb 10, 2009 at 07:07:42AM -0800, SHARMILA JOTHIRAJAH wrote:
> What is a good delimiter to use for COPY command. Im trying to use
> COPY command to copy data from one table to another in 2 different
> databases.
> Can you suggest a unique delimiter that I can use for this COPY command

If you're fixed with using COPY then what about just using CSV mode?

  COPY table TO STDOUT WITH CSV;

If not then why not use pg_dump?


  Sam

Re: Good Delimiter for copy command

From

SHARMILA JOTHIRAJAH

Date:

12 February 2009, 16:51:44

My data fields contains commas, tabs,'|' etc. So I cant use t hem as delimiters..so I need a unique may be non-character to use as a delimiter...
-Sharmila

--- On Tue, 2/10/09, Sam Mason <sam@samason.me.uk> wrote:

From: Sam Mason <sam@samason.me.uk>
Subject: Re: [GENERAL] Good Delimiter for copy command
To: pgsql-general@postgresql.org
Date: Tuesday, February 10, 2009, 2:41 PM

On Tue, Feb 10, 2009 at 07:07:42AM -0800, SHARMILA JOTHIRAJAH wrote:
> What is a good delimiter to use for COPY command. Im trying to use
> COPY command to copy data from one table to another in 2 different
> databases.
> Can you suggest a unique delimiter that I can use for this COPY command

Ifyou're fixed with using COPY then what about just using CSV mode?

  COPY table TO STDOUT WITH CSV;

If not then why not use pg_dump?


  Sam

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Good Delimiter for copy command

From

Raymond O'Donnell

Date:

12 February 2009, 17:11:54

On 12/02/2009 20:51, SHARMILA JOTHIRAJAH wrote:
> My data fields contains commas, tabs,'|'  etc. So I cant use t hem as
> delimiters..so I need a unique may be non-character to use as a
> delimiter...

Indeed you do... and a quick RTFM reveals that you can set a character
of your choice:

  http://www.postgresql.org/docs/8.3/interactive/sql-copy.html

...look for the "delimiter" parameter.

Ray.


------------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
------------------------------------------------------------------

Re: Good Delimiter for copy command

From

Andrew Gould

Date:

12 February 2009, 17:15:16

On Thu, Feb 12, 2009 at 2:51 PM, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote:

My data fields contains commas, tabs,'|' etc. So I cant use t hem as delimiters..so I need a unique may be non-character to use as a delimiter...
-Sharmila

How are you backing up your data? If your backup method deals appropriately with the characters mentioned above, then backup the data and reload them into the other tables.

If your backup method does not deal appropriately with the characters you mention, you may have larger problems on your hands.

To the list: Does pg_dump escape characters that are the same as the delimiter?

Thanks,

Andrew

Re: Good Delimiter for copy command

From

SHARMILA JOTHIRAJAH

Date:

12 February 2009, 17:32:55



--- On Thu, 2/12/09, Andrew Gould <andrewlylegould@gmail.com> wrote:

> From: Andrew Gould <andrewlylegould@gmail.com>
> Subject: Re: [GENERAL] Good Delimiter for copy command
> To: sharmi_jo@yahoo.com
> Cc: pgsql-general@postgresql.org, "Sam Mason" <sam@samason.me.uk>
> Date: Thursday, February 12, 2009, 4:15 PM
> On Thu, Feb 12, 2009 at 2:51 PM, SHARMILA JOTHIRAJAH
> <sharmi_jo@yahoo.com>wrote:
>
> > My data fields contains commas, tabs,'|'  etc.
> So I cant use t hem as
> > delimiters..so I need a unique may be non-character to
> use as a delimiter...
> > -Sharmila
> >
>
>
> How are you backing up your data?  If your backup method
> deals appropriately
> with the characters mentioned above, then backup the data
> and reload them
> into the other tables.

Im not using this for loading postgres data to postgres. Im trying this method to load my Oracle data to postgresql
tables....just trying to migrate my data from oracle to postgresql 

Thanks
Sharmila
>
> If your backup method does not deal appropriately with the
> characters you
> mention, you may have larger problems on your hands.
>
> To the list:  Does pg_dump escape characters that are the
> same as the
> delimiter?
>
> Thanks,
>
> Andrew

Re: Good Delimiter for copy command

From

Tom Lane

Date:

12 February 2009, 17:41:50

Andrew Gould <andrewlylegould@gmail.com> writes:
> To the list:  Does pg_dump escape characters that are the same as the
> delimiter?

Yes.  The OP has not actually explained why he needs to pick a
nondefault delimiter, unless maybe it is that he wants to feed the
dump to some program that is too dumb to deal with escaping.

            regards, tom lane

Re: Good Delimiter for copy command

From

Mark Roberts

Date:

12 February 2009, 18:19:41

On Thu, 2009-02-12 at 12:51 -0800, SHARMILA JOTHIRAJAH wrote:
>
> My data fields contains commas, tabs,'|'  etc. So I cant use t hem as
> delimiters..so I need a unique may be non-character to use as a
> delimiter...
> -Sharmila

Is this a theoretical problem or an actual one?  I haven't had any
problems with the default (tab).  In fact, copying from one database to
another is an exceedingly common task that I do, all done with tab.

dev=> create table foo (a text);
CREATE TABLE
Time: 385.967 ms
dev=> insert into foo values ('     '); -- literal tab
INSERT 0 1
Time: 0.536 ms
dev=> insert into foo values ('\t'); -- special character, parsed.
INSERT 0 1
Time: 0.224 ms
dev=> insert into foo values ('\\t'); -- backslash, t
INSERT 0 1
Time: 0.183 ms
dev=> copy foo to stdout;
\t
\t
\\t
Time: 0.188 ms
dev=> select * from foo;
  a
------
 \x09
 \x09
 \t
(3 rows)

Time: 0.239 ms
dev=>


-Mark

Re: Good Delimiter for copy command

From

Greg Smith

Date:

12 February 2009, 19:04:05

On Thu, 12 Feb 2009, SHARMILA JOTHIRAJAH wrote:

> Im not using this for loading postgres data to postgres. Im trying this
> method to load my Oracle data to postgresql tables.... just trying to
> migrate my data from oracle to postgresql

The point other posts here were trying to make to you is that the ideal
situation is one where the source and destination databases support
escaping the delimiter in the same fashion.  You might be able to arrange
that, depends on how you're dumping the data from Oracle.

If not, when you can't get matching delimiter escapes, there is no such
thing as a universal delimiter for moves between databases like this.
You have to look at your source data to figure out what characters aren't
used.

For example, "~" and "`" are good characters to consider because they
don't show up much in regular text.  But if your database contained
programming source code, those would be awful delimiter choices.  "@" used
to be a potential delimiter choice, nowadays too many databases have
e-mail addresses in them for that to work anymore.

Here's what you can do:  pick one delimiter.  Start with "~" say.  Dump
your data.  Now, search the result for the *other* delimiter you might use
(maybe "`"), using something like grep.  If it doesn't show up, that other
delimiter will work for you.  Otherwise, swap delimiters and try again.

The other option here that can avoid all sorts of headaches is to use
pgloader:  http://pgfoundry.org/projects/pgloader/ which will save the
rows that are rejected for some reason, which is usually what happens when
there's a delimiter issue.  You can then edit those by hand to work around
random odd delimiter problems.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Good Delimiter for copy command

From

Marco Colombo

Date:

17 February 2009, 12:50:55

Tom Lane wrote:
> Andrew Gould <andrewlylegould@gmail.com> writes:
>> To the list:  Does pg_dump escape characters that are the same as the
>> delimiter?
>
> Yes.  The OP has not actually explained why he needs to pick a
> nondefault delimiter, unless maybe it is that he wants to feed the
> dump to some program that is too dumb to deal with escaping.
>
>             regards, tom lane
>

Which makes me wonder, does copy accept UTF-8 input? Is it possibile
to use some unicode character which is unlikely to appear in the data
set as delimiter? Something like U+FFFC.

Here I'm using U+25CF for it's much more likely you can see it
(it's a black circle).

$ python -c 'print u";".join("ABCD").encode("utf8")' > 1.txt
$ cat 1.txt
A;B;C;D

$ python -c 'print u"\u25cf".join("ABCD").encode("utf8")' > 2.txt
$ cat 2.txt
A●B●C●D

$ psql -c "\d test"
    Table "public.test"
 Column | Type | Modifiers
--------+------+-----------
 a      | text |
 b      | text |
 c      | text |
 d      | text |

$ psql -c "\copy test from '1.txt' delimiter ;"
$ psql -c "\copy test from '2.txt' delimiter ●"
ERROR:  COPY delimiter must be a single character
\copy: ERROR:  COPY delimiter must be a single character


It doesn't seem to work, but I need to stress that I'm using
$ psql --version
psql (PostgreSQL) 8.2.9

.TM.

Re: Good Delimiter for copy command

From

Mark Roberts

Date:

17 February 2009, 13:47:59

On Tue, 2009-02-17 at 17:17 +0100, Marco Colombo wrote:
>
> Which makes me wonder, does copy accept UTF-8 input? Is it possibile
> to use some unicode character which is unlikely to appear in the data
> set as delimiter? Something like U+FFFC.

I'm also not able to get unicode characters to copy in like that, but I
also haven't had any problems with data containing the *tab* character.
It seems to be properly escaped (see my other email in this thread) so
it doesn't seem to matter if it appears in the data stream.  The *only*
reason I see to switch from tab is if the receiving application requires
it in a different format.

-Mark

Re: Good Delimiter for copy command

From

David Fetter

Date:

17 February 2009, 13:55:33

On Tue, Feb 17, 2009 at 05:17:40PM +0100, Marco Colombo wrote:
> Tom Lane wrote:
> > Andrew Gould <andrewlylegould@gmail.com> writes:
> >> To the list:  Does pg_dump escape characters that are the same as the
> >> delimiter?
> >
> > Yes.  The OP has not actually explained why he needs to pick a
> > nondefault delimiter, unless maybe it is that he wants to feed the
> > dump to some program that is too dumb to deal with escaping.
> >
> >             regards, tom lane
> >
>
> Which makes me wonder, does copy accept UTF-8 input?

Yes, but...

> Is it possibile to use some unicode character which is unlikely to
> appear in the data set as delimiter? Something like U+FFFC.

No.  The delimiter needs to be one byte long at the moment.  The error
message you're getting probably needs an update.  Bug?

> $ psql -c "\copy test from '2.txt' delimiter ●"
> ERROR:  COPY delimiter must be a single character
> \copy: ERROR:  COPY delimiter must be a single character

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Good Delimiter for copy command

From

Tom Lane

Date:

17 February 2009, 16:15:16

David Fetter <david@fetter.org> writes:
> On Tue, Feb 17, 2009 at 05:17:40PM +0100, Marco Colombo wrote:
>> Is it possibile to use some unicode character which is unlikely to
>> appear in the data set as delimiter? Something like U+FFFC.

> No.  The delimiter needs to be one byte long at the moment.  The error
> message you're getting probably needs an update.  Bug?

>> $ psql -c "\copy test from '2.txt' delimiter ●"
>> ERROR:  COPY delimiter must be a single character
>> \copy: ERROR:  COPY delimiter must be a single character

In 8.3 and CVS HEAD these messages are phrased as "must be a single
ASCII character" which I suppose is someone's attempt at improving the
situation; but IMHO this has replaced an imprecision with a lie.  It
works fine with non-ASCII characters, if you're working in a single-byte
encoding (eg LATIN-n).

I think it should say "must be a single one-byte character" and not try
to talk down to the reader.

            regards, tom lane

Re: Good Delimiter for copy command

From

Jasen Betts

Date:

19 February 2009, 08:58:30

On 2009-02-12, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote:
> --0-1509090113-1234471898=:21817
> Content-Type: text/plain; charset=iso-8859-1
> Content-Transfer-Encoding: quoted-printable
>
> My data fields contains commas, tabs,'|'=A0 etc. So I cant use them as del=
> imiters..so I need a unique may be non-character to use as a delimiter...

posgres will escape the delimiters (tabs come out at \t)

csv will work too (but with a different quoting convention)

Re: Good Delimiter for copy command

From

Jasen Betts

Date:

19 February 2009, 09:06:38

On 2009-02-12, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote:
>
> Im not using this for loading postgres data to postgres.
> Im trying this method to load my Oracle data to postgresql tables....
> just trying to migrate my data from oracle to postgresql

dump as CSV, properly implemented CSV isn't daunted by , " or newline in the data.
(but the latter can make the files hard for humans to parse)

else dump as named column inserts,