Thread: Good Delimiter for copy command
Hi, What is a good delimiter to use for COPY command. Im trying to use COPY command to copy data from one table to another in 2 different databases. Can you suggest a unique delimiter that I can use for this COPY command Thanks Sharmial |
On Tue, Feb 10, 2009 at 07:07:42AM -0800, SHARMILA JOTHIRAJAH wrote: > What is a good delimiter to use for COPY command. Im trying to use > COPY command to copy data from one table to another in 2 different > databases. > Can you suggest a unique delimiter that I can use for this COPY command If you're fixed with using COPY then what about just using CSV mode? COPY table TO STDOUT WITH CSV; If not then why not use pg_dump? Sam
My data fields contains commas, tabs,'|' etc. So I cant use t hem as delimiters..so I need a unique may be non-character to use as a delimiter... -Sharmila --- On Tue, 2/10/09, Sam Mason <sam@samason.me.uk> wrote: From: Sam Mason <sam@samason.me.uk> |
On 12/02/2009 20:51, SHARMILA JOTHIRAJAH wrote: > My data fields contains commas, tabs,'|' etc. So I cant use t hem as > delimiters..so I need a unique may be non-character to use as a > delimiter... Indeed you do... and a quick RTFM reveals that you can set a character of your choice: http://www.postgresql.org/docs/8.3/interactive/sql-copy.html ...look for the "delimiter" parameter. Ray. ------------------------------------------------------------------ Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland rod@iol.ie Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals ------------------------------------------------------------------
On Thu, Feb 12, 2009 at 2:51 PM, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote:
My data fields contains commas, tabs,'|' etc. So I cant use t hem as delimiters..so I need a unique may be non-character to use as a delimiter...
-Sharmila
How are you backing up your data? If your backup method deals appropriately with the characters mentioned above, then backup the data and reload them into the other tables.
If your backup method does not deal appropriately with the characters you mention, you may have larger problems on your hands.
To the list: Does pg_dump escape characters that are the same as the delimiter?
Thanks,
Andrew
--- On Thu, 2/12/09, Andrew Gould <andrewlylegould@gmail.com> wrote: > From: Andrew Gould <andrewlylegould@gmail.com> > Subject: Re: [GENERAL] Good Delimiter for copy command > To: sharmi_jo@yahoo.com > Cc: pgsql-general@postgresql.org, "Sam Mason" <sam@samason.me.uk> > Date: Thursday, February 12, 2009, 4:15 PM > On Thu, Feb 12, 2009 at 2:51 PM, SHARMILA JOTHIRAJAH > <sharmi_jo@yahoo.com>wrote: > > > My data fields contains commas, tabs,'|' etc. > So I cant use t hem as > > delimiters..so I need a unique may be non-character to > use as a delimiter... > > -Sharmila > > > > > How are you backing up your data? If your backup method > deals appropriately > with the characters mentioned above, then backup the data > and reload them > into the other tables. Im not using this for loading postgres data to postgres. Im trying this method to load my Oracle data to postgresql tables....just trying to migrate my data from oracle to postgresql Thanks Sharmila > > If your backup method does not deal appropriately with the > characters you > mention, you may have larger problems on your hands. > > To the list: Does pg_dump escape characters that are the > same as the > delimiter? > > Thanks, > > Andrew
Andrew Gould <andrewlylegould@gmail.com> writes: > To the list: Does pg_dump escape characters that are the same as the > delimiter? Yes. The OP has not actually explained why he needs to pick a nondefault delimiter, unless maybe it is that he wants to feed the dump to some program that is too dumb to deal with escaping. regards, tom lane
On Thu, 2009-02-12 at 12:51 -0800, SHARMILA JOTHIRAJAH wrote: > > My data fields contains commas, tabs,'|' etc. So I cant use t hem as > delimiters..so I need a unique may be non-character to use as a > delimiter... > -Sharmila Is this a theoretical problem or an actual one? I haven't had any problems with the default (tab). In fact, copying from one database to another is an exceedingly common task that I do, all done with tab. dev=> create table foo (a text); CREATE TABLE Time: 385.967 ms dev=> insert into foo values (' '); -- literal tab INSERT 0 1 Time: 0.536 ms dev=> insert into foo values ('\t'); -- special character, parsed. INSERT 0 1 Time: 0.224 ms dev=> insert into foo values ('\\t'); -- backslash, t INSERT 0 1 Time: 0.183 ms dev=> copy foo to stdout; \t \t \\t Time: 0.188 ms dev=> select * from foo; a ------ \x09 \x09 \t (3 rows) Time: 0.239 ms dev=> -Mark
On Thu, 12 Feb 2009, SHARMILA JOTHIRAJAH wrote: > Im not using this for loading postgres data to postgres. Im trying this > method to load my Oracle data to postgresql tables.... just trying to > migrate my data from oracle to postgresql The point other posts here were trying to make to you is that the ideal situation is one where the source and destination databases support escaping the delimiter in the same fashion. You might be able to arrange that, depends on how you're dumping the data from Oracle. If not, when you can't get matching delimiter escapes, there is no such thing as a universal delimiter for moves between databases like this. You have to look at your source data to figure out what characters aren't used. For example, "~" and "`" are good characters to consider because they don't show up much in regular text. But if your database contained programming source code, those would be awful delimiter choices. "@" used to be a potential delimiter choice, nowadays too many databases have e-mail addresses in them for that to work anymore. Here's what you can do: pick one delimiter. Start with "~" say. Dump your data. Now, search the result for the *other* delimiter you might use (maybe "`"), using something like grep. If it doesn't show up, that other delimiter will work for you. Otherwise, swap delimiters and try again. The other option here that can avoid all sorts of headaches is to use pgloader: http://pgfoundry.org/projects/pgloader/ which will save the rows that are rejected for some reason, which is usually what happens when there's a delimiter issue. You can then edit those by hand to work around random odd delimiter problems. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Tom Lane wrote: > Andrew Gould <andrewlylegould@gmail.com> writes: >> To the list: Does pg_dump escape characters that are the same as the >> delimiter? > > Yes. The OP has not actually explained why he needs to pick a > nondefault delimiter, unless maybe it is that he wants to feed the > dump to some program that is too dumb to deal with escaping. > > regards, tom lane > Which makes me wonder, does copy accept UTF-8 input? Is it possibile to use some unicode character which is unlikely to appear in the data set as delimiter? Something like U+FFFC. Here I'm using U+25CF for it's much more likely you can see it (it's a black circle). $ python -c 'print u";".join("ABCD").encode("utf8")' > 1.txt $ cat 1.txt A;B;C;D $ python -c 'print u"\u25cf".join("ABCD").encode("utf8")' > 2.txt $ cat 2.txt A●B●C●D $ psql -c "\d test" Table "public.test" Column | Type | Modifiers --------+------+----------- a | text | b | text | c | text | d | text | $ psql -c "\copy test from '1.txt' delimiter ;" $ psql -c "\copy test from '2.txt' delimiter ●" ERROR: COPY delimiter must be a single character \copy: ERROR: COPY delimiter must be a single character It doesn't seem to work, but I need to stress that I'm using $ psql --version psql (PostgreSQL) 8.2.9 .TM.
On Tue, 2009-02-17 at 17:17 +0100, Marco Colombo wrote: > > Which makes me wonder, does copy accept UTF-8 input? Is it possibile > to use some unicode character which is unlikely to appear in the data > set as delimiter? Something like U+FFFC. I'm also not able to get unicode characters to copy in like that, but I also haven't had any problems with data containing the *tab* character. It seems to be properly escaped (see my other email in this thread) so it doesn't seem to matter if it appears in the data stream. The *only* reason I see to switch from tab is if the receiving application requires it in a different format. -Mark
On Tue, Feb 17, 2009 at 05:17:40PM +0100, Marco Colombo wrote: > Tom Lane wrote: > > Andrew Gould <andrewlylegould@gmail.com> writes: > >> To the list: Does pg_dump escape characters that are the same as the > >> delimiter? > > > > Yes. The OP has not actually explained why he needs to pick a > > nondefault delimiter, unless maybe it is that he wants to feed the > > dump to some program that is too dumb to deal with escaping. > > > > regards, tom lane > > > > Which makes me wonder, does copy accept UTF-8 input? Yes, but... > Is it possibile to use some unicode character which is unlikely to > appear in the data set as delimiter? Something like U+FFFC. No. The delimiter needs to be one byte long at the moment. The error message you're getting probably needs an update. Bug? > $ psql -c "\copy test from '2.txt' delimiter ●" > ERROR: COPY delimiter must be a single character > \copy: ERROR: COPY delimiter must be a single character Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes: > On Tue, Feb 17, 2009 at 05:17:40PM +0100, Marco Colombo wrote: >> Is it possibile to use some unicode character which is unlikely to >> appear in the data set as delimiter? Something like U+FFFC. > No. The delimiter needs to be one byte long at the moment. The error > message you're getting probably needs an update. Bug? >> $ psql -c "\copy test from '2.txt' delimiter ●" >> ERROR: COPY delimiter must be a single character >> \copy: ERROR: COPY delimiter must be a single character In 8.3 and CVS HEAD these messages are phrased as "must be a single ASCII character" which I suppose is someone's attempt at improving the situation; but IMHO this has replaced an imprecision with a lie. It works fine with non-ASCII characters, if you're working in a single-byte encoding (eg LATIN-n). I think it should say "must be a single one-byte character" and not try to talk down to the reader. regards, tom lane
On 2009-02-12, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote: > --0-1509090113-1234471898=:21817 > Content-Type: text/plain; charset=iso-8859-1 > Content-Transfer-Encoding: quoted-printable > > My data fields contains commas, tabs,'|'=A0 etc. So I cant use them as del= > imiters..so I need a unique may be non-character to use as a delimiter... posgres will escape the delimiters (tabs come out at \t) csv will work too (but with a different quoting convention)
On 2009-02-12, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote: > > Im not using this for loading postgres data to postgres. > Im trying this method to load my Oracle data to postgresql tables.... > just trying to migrate my data from oracle to postgresql dump as CSV, properly implemented CSV isn't daunted by , " or newline in the data. (but the latter can make the files hard for humans to parse) else dump as named column inserts,