Thread: Apparent Problem With NULL in Restoring pg_dump

Apparent Problem With NULL in Restoring pg_dump

From
Rich Shepard
Date:
   The .sql file produced by pg_dump is properly terminated with '\.' as the
last line, yet I continue to encounter this error:

ERROR:  invalid input syntax for type real: "     "
CONTEXT:  COPY chemistry, line 47363, column quant: "     "

when trying to re-create the table.

   It appears that this error is generated when a row has a missing value in
the 'quant' column and the column contains '\N' in the text file. For
example,

\N    GW-22    2005-03-09    Depth to Water        \N    Feet    \N
\N    \N    \N

(It is a single line in emacs but wraps in alpine.)

   Is there a different way for me to represent NULL in this text file?
Replacing \N with NULL produces the same error.

   I'm not certain that this line is the real source of the problem but I'm
quite confident that it is the one throwing the error.

Rich

Re: Apparent Problem With NULL in Restoring pg_dump [UPDATE]

From
Rich Shepard
Date:
On Thu, 15 Sep 2011, Rich Shepard wrote:

>  It appears that this error is generated when a row has a missing value in
> the 'quant' column and the column contains '\N' in the text file. For
> example,
>
> \N    GW-22    2005-03-09    Depth to Water        \N    Feet    \N
> \N    \N    \N

   Something strange is going on here. I moved the above line from row 47363
(within the COPY block) to row 2. This produced this error:

  ERROR:  extra data after last expected column
CONTEXT:  COPY chemistry, line 2: "\N   GW-22   2005-03-09      Depth to
Water \N       Feet    \N      \N      \N      \N"

   I created a blank line just beneath this one and entered the same data on
this new row, ensuring that there are no extra data after column #10. The
results are the same:

ERROR:  extra data after last expected column
CONTEXT:  COPY chemistry, line 2: "\N   GW-22   2005-03-09      Depth to
Water \N       Feet    \N      \N      \N      \N"

   What could possibly be going on here? This row was not modified by me
after running pg_dump on the chemistry table, so why will it not be
restored?

Totally confused,

Rich

Re: Apparent Problem With NULL in Restoring pg_dump

From
Andy Colson
Date:
On 9/15/2011 12:31 PM, Rich Shepard wrote:
> The .sql file produced by pg_dump is properly terminated with '\.' as the
> last line, yet I continue to encounter this error:
>
> ERROR: invalid input syntax for type real: " "
> CONTEXT: COPY chemistry, line 47363, column quant: " "
>
> when trying to re-create the table.
>
> It appears that this error is generated when a row has a missing value in
> the 'quant' column and the column contains '\N' in the text file. For
> example,
>
> \N GW-22 2005-03-09 Depth to Water \N Feet \N
> \N \N \N
>
> (It is a single line in emacs but wraps in alpine.)
>
> Is there a different way for me to represent NULL in this text file?
> Replacing \N with NULL produces the same error.
>
> I'm not certain that this line is the real source of the problem but I'm
> quite confident that it is the one throwing the error.
>
> Rich
>


Can you pg_dump again, but use --inserts?  Maybe it'll offer some hints.

Also, do you have the right line endings for your os?

To restore, you are using:  psql dbname < filename
correct?

-Andy

Re: Apparent Problem With NULL in Restoring pg_dump

From
Rich Shepard
Date:
On Thu, 15 Sep 2011, Andy Colson wrote:

> Can you pg_dump again, but use --inserts?  Maybe it'll offer some hints.

Andy,

   Only if I restore /usr/local/pgsql/data/* from the backup tape of a few
days ago. I need to drop the table before trying to insert it.

> Also, do you have the right line endings for your os?

   Well, linux uses only \n, and that's the only environment in which I work.

> To restore, you are using:  psql dbname < filename
> correct?

   No, I'm running within pgsql and using the \i option. I can try it from
the command line before restoring from the backup and trying again.

Rich

Re: Apparent Problem With NULL in Restoring pg_dump

From
Andy Colson
Date:
On 9/15/2011 2:38 PM, Rich Shepard wrote:
> On Thu, 15 Sep 2011, Andy Colson wrote:
>
>> Can you pg_dump again, but use --inserts? Maybe it'll offer some hints.
>
> Andy,
>
> Only if I restore /usr/local/pgsql/data/* from the backup tape of a few
> days ago. I need to drop the table before trying to insert it.
>
>> Also, do you have the right line endings for your os?
>
> Well, linux uses only \n, and that's the only environment in which I work.
>
>> To restore, you are using: psql dbname < filename
>> correct?
>
> No, I'm running within pgsql and using the \i option. I can try it from
> the command line before restoring from the backup and trying again.
>
> Rich
>

Ah, I see there was a prior thread about this problem.  You said you'd
missed the \. and said it was resolved.  So is this a same file or a
different one?

-Andy


Re: Apparent Problem With NULL in Restoring pg_dump

From
Alban Hertroys
Date:
On 15 Sep 2011, at 19:31, Rich Shepard wrote:

>  The .sql file produced by pg_dump is properly terminated with '\.' as the
> last line, yet I continue to encounter this error:
>
> ERROR:  invalid input syntax for type real: "     "
> CONTEXT:  COPY chemistry, line 47363, column quant: "     "
>
> when trying to re-create the table.
>
>  It appears that this error is generated when a row has a missing value in
> the 'quant' column and the column contains '\N' in the text file. For
> example,
>
> \N    GW-22    2005-03-09    Depth to Water        \N    Feet    \N
> \N    \N    \N
>
> (It is a single line in emacs but wraps in alpine.)


You appear to have two tabs after "Depth to Water", which would be one too many.

Alban Hertroys

--
The scale of a problem often equals the size of an ego.



Re: Apparent Problem With NULL in Restoring pg_dump

From
Rich Shepard
Date:
On Thu, 15 Sep 2011, Andy Colson wrote:

> To restore, you are using:  psql dbname < filename correct?

Andy,

   Same error.

   BTW, what prompted this was my discovery that about 1400 rows with site_id
= GW-22 had a newline appended to that string. Using emac's
search-and-replace I took those off and new that I would probably have
duplicate records when trying to replace the table. But, I did not expect
these errors of extra characters after the last datum or something about
blanks in real columns.

   If there's a better way for me to drop the \n versions and elimiate one of
the resulting duplicates, please teach me how and I'll go that route.

Thanks,

Rich

Re: Apparent Problem With NULL in Restoring pg_dump

From
Rich Shepard
Date:
On Thu, 15 Sep 2011, Andy Colson wrote:

> Ah, I see there was a prior thread about this problem.  You said you'd
> missed the \. and said it was resolved.  So is this a same file or a
> different one?

Andy,

   Same file, unfortunately.

Rich

Re: Apparent Problem With NULL in Restoring pg_dump

From
Rich Shepard
Date:
On Thu, 15 Sep 2011, Alban Hertroys wrote:

> You appear to have two tabs after "Depth to Water", which would be one too many.

Alban,

   I thought that I had caught all the double tabs. Thanks for seeing this
one.

   Now I'm back to the tabs-in-real-columns issue:

  ERROR:  invalid input syntax for type real: "     "
CONTEXT:  COPY chemistry, line 47364, column quant: "     "

and here's the referenced line:

\N    GW-22    2004-12-17    Calcium                 9.69999981    mg/L
\N    \N    \N      \N

Removing the extra tab between 'Calcium' and 9.69999981 produces:

  ERROR:  invalid input syntax for type numeric: "N      N"
CONTEXT:  COPY chemistry, line 47364, column northing: "N      N"

   There must be a better way to clean this table, but I don't know what it
is.

Thanks again,

Rich

Re: Apparent Problem With NULL in Restoring pg_dump

From
Andy Colson
Date:
On 9/15/2011 3:10 PM, Rich Shepard wrote:
> On Thu, 15 Sep 2011, Andy Colson wrote:
>
>> To restore, you are using: psql dbname < filename correct?
>
> Andy,
>
> Same error.
>
> BTW, what prompted this was my discovery that about 1400 rows with site_id
> = GW-22 had a newline appended to that string. Using emac's
> search-and-replace I took those off and new that I would probably have
> duplicate records when trying to replace the table. But, I did not expect
> these errors of extra characters after the last datum or something about
> blanks in real columns.
>
> If there's a better way for me to drop the \n versions and elimiate one of
> the resulting duplicates, please teach me how and I'll go that route.
>
> Thanks,
>
> Rich
>

It's simpler to use sql to do this.  Can you restore the table?

First you need to trim the \n and spaces:

andy=# insert into junk values (E'GW-22');
INSERT 0 1
andy=# insert into junk values (E'GW-22  \n');
INSERT 0 1
andy=# insert into junk values (E'GW-22       \n');


Here are three records, with spaces and CR's.

Trim it up:

andy=# select '['|| rtrim(trim(trailing E'\n' from a)) || ']' from junk;
  ?column?
----------
  [GW-22]
  [GW-22]
  [GW-22]
(3 rows)


If you have a unique index you'll wanna drop it first.  Once you get
that done, we can remove the dups.

-Andy



Re: Apparent Problem With NULL in Restoring pg_dump

From
Andy Colson
Date:
On 9/15/2011 3:17 PM, Rich Shepard wrote:
> On Thu, 15 Sep 2011, Alban Hertroys wrote:
>
>> You appear to have two tabs after "Depth to Water", which would be one
>> too many.
>
> Alban,
>
> I thought that I had caught all the double tabs. Thanks for seeing this
> one.
>
> Now I'm back to the tabs-in-real-columns issue:
>
> ERROR: invalid input syntax for type real: " "
> CONTEXT: COPY chemistry, line 47364, column quant: " "
>
> and here's the referenced line:
>
> \N GW-22 2004-12-17 Calcium 9.69999981 mg/L
> \N \N \N \N
>
> Removing the extra tab between 'Calcium' and 9.69999981 produces:
>
> ERROR: invalid input syntax for type numeric: "N N"
> CONTEXT: COPY chemistry, line 47364, column northing: "N N"
>
> There must be a better way to clean this table, but I don't know what it
> is.
>
> Thanks again,
>
> Rich
>

Yeah... I think your emacs foo messed up the file.  You should have used
VIM!

just kidding... had to be done :-)

-Andy

Re: Apparent Problem With NULL in Restoring pg_dump

From
Rich Shepard
Date:
On Thu, 15 Sep 2011, Andy Colson wrote:

> It's simpler to use sql to do this.  Can you restore the table?

Andy,

   OK. I need to provide a new client with filled in paperwork so I can get
paid. I'll return to this as soon as that's done.

   Yes, I'll restore from the backup drive (yea, dirvish!) from a couple of
days ago since I started this distraction only yesterday.

   Back when I have results.

Many thanks,

Rich

Re: Apparent Problem With NULL in Restoring pg_dump

From
Adrian Klaver
Date:
On Thursday, September 15, 2011 1:17:06 pm Rich Shepard wrote:
> On Thu, 15 Sep 2011, Alban Hertroys wrote:
> > You appear to have two tabs after "Depth to Water", which would be one
> > too many.
>
> Alban,
>
>    I thought that I had caught all the double tabs. Thanks for seeing this
> one.
>
>    Now I'm back to the tabs-in-real-columns issue:
>
>   ERROR:  invalid input syntax for type real: "     "
> CONTEXT:  COPY chemistry, line 47364, column quant: "     "
>
> and here's the referenced line:
>
> \N    GW-22    2004-12-17    Calcium                 9.69999981    mg/L
> \N    \N    \N      \N
>
> Removing the extra tab between 'Calcium' and 9.69999981 produces:
>
>   ERROR:  invalid input syntax for type numeric: "N      N"
> CONTEXT:  COPY chemistry, line 47364, column northing: "N      N"
>
>    There must be a better way to clean this table, but I don't know what it
> is.

This is one of those things I find spreadsheets actually useful for. Do a COPY or
/copy, in CSV format from the table and import it into a spreadsheet. I find the
grid layout of a spreadsheet very useful in picking out misplaced fields.

>
> Thanks again,
>
> Rich

--
Adrian Klaver
adrian.klaver@gmail.com

Re: Apparent Problem With NULL in Restoring pg_dump

From
Alban Hertroys
Date:
On 16 September 2011 03:31, Adrian Klaver <adrian.klaver@gmail.com> wrote:
This is one of those things I find spreadsheets actually useful for. Do a COPY or
/copy, in CSV format from the table and import it into a spreadsheet. I find the
grid layout of a spreadsheet very useful in picking out misplaced fields.

Now that you mention it, it would probably be useful to have some kind of syntax highlighting for different white-space characters (for example, a background-color and a character; 'S' for space, 'T' for tab, 'C' for CR, 'L' for LF) for editors such as Vim and emacs.

It's not even a big step up from there to highlight odd and even columns in tab-delimited, csv, etc. data with a different background-color or some-such.

Perhaps this is what I described above: http://vim.wikia.com/wiki/Working_with_CSV_files (for Vim, not emacs)

Re: Apparent Problem With NULL in Restoring pg_dump

From
Rich Shepard
Date:
On Thu, 15 Sep 2011, Andy Colson wrote:

> First you need to trim the \n and spaces:
>
> andy=# insert into junk values (E'GW-22');
> INSERT 0 1
> andy=# insert into junk values (E'GW-22  \n');
> INSERT 0 1
> andy=# insert into junk values (E'GW-22       \n');

Andy,

   Here's what worked for me:

nevada=# \i junk.sql
CREATE TABLE
nevada=# insert into junk select * from chemistry where site_id = (E'GW-22');
INSERT 0 803
nevada=# insert into junk select * from chemistry where site_id = (E'GW-22 \n');
INSERT 0 0
nevada=# insert into junk select * from chemistry where site_id = (E'GW-22     \n');
INSERT 0 0
nevada=# insert into junk select * from chemistry where site_id = (E'GW-22\n');
INSERT 0 1409
nevada=# select '['|| rtrim(trim(trailing E'\n' from site_id)) || ']' from junk;

  ?column?
----------
  [GW-22]
  [GW-22]

and so on for 2212 rows.

> Trim it up:
>
> andy=# select '['|| rtrim(trim(trailing E'\n' from a)) || ']' from junk;

> If you have a unique index you'll wanna drop it first.  Once you get that
> done, we can remove the dups.

   No index on junk; I can remove it from chemistry prior to reinserting the
cleaned rows.

   Also, where can I read about the select syntax you use? I find nothing
about it in Rick van der Lans' 4th edition, the most comprehensive language
reference I've read.

Thanks,

Rich

Re: Apparent Problem With NULL in Restoring pg_dump

From
Rich Shepard
Date:
On Thu, 15 Sep 2011, Andy Colson wrote:

> Trim it up:
> andy=# select '['|| rtrim(trim(trailing E'\n' from a)) || ']' from junk;

Andy,

   Scrolling through the table with rows ordered by date and chemical I find
no duplicates ... so far. However, what I do find is that the above did not
work:

  GW-22   | 1998-12-16  | Zinc                    |  0.01
  GW-22   | 1998-12-16  | pH                      |  7.83
  GW-22  +| 1999-03-09  | Alkalinity, Bicarbonate |    97
          |             |                         |
  GW-22  +| 1999-03-09  | Alkalinity, Total       |    80
          |             |                         |
  GW-22  +| 1999-03-09  | Aluminum                |  0.01
          |             |                         |
  GW-22  +| 1999-03-09  | Antimony                |

   As my previous message noted, selecting rows with two spaces before the
newline returned 0 rows.

   I'll continue looking for duplicates, but I suspect there are none. I just
need to get rid of the newlines, drop all rows for site_id from the
chemistry table, then insert the cleaned rows from the junk table.

Regards,

Rich

Re: Apparent Problem With NULL in Restoring pg_dump [SOLVED]

From
Rich Shepard
Date:
On Fri, 16 Sep 2011, Rich Shepard wrote:

>  Scrolling through the table with rows ordered by date and chemical I find
> no duplicates ... so far. However, what I do find is that the above did not
> work:

   Turns out there was 1 duplicate. Reading the psql man page and making an
error in the \copy command syntax taught me the (E' ') escape syntax. Kool!

   To close this thread, here's what I did to remove the table rows with
embedded newlines:

   1)  Made a copy of the chemistry table named 'junk'.
   2)  insert into junk select * from chemistry where site_id = 'GW-22';
   3)  insert into junk select * from chemistry where site_id = (E'GW-22\n');
   4)  \copy from junk to '</full/path/to/foo.sql>'
   5)  In emacs, use global search and replace to remove unwanted '\n'.
   6)  delete from junk where site_id = 'GW-22';
   7)  delete from junk where site_id = (E'GW-22\n');
   8)  \copy junk from '<full/path/to/foo.sql'>
   9)  delete from chemistry where site_id = 'GW-22';
   10) delete from chemistry where site_id = (E'GW-22\n');
   11) insert into chemistry select * from junk;

   The latter found the one duplicate so I fixed that in emacs, then dropped
junk and repeated steps 8-11.

   This message is as much for my future reference as it is for others who
might face the same problem.

Rich

Re: Apparent Problem With NULL in Restoring pg_dump

From
Andy Colson
Date:
On 09/16/2011 04:42 PM, Rich Shepard wrote:
> On Thu, 15 Sep 2011, Andy Colson wrote:
>
>> First you need to trim the \n and spaces:
>>
>> andy=# insert into junk values (E'GW-22');
>> INSERT 0 1
>> andy=# insert into junk values (E'GW-22 \n');
>> INSERT 0 1
>> andy=# insert into junk values (E'GW-22 \n');
>
> Andy,
>
> Here's what worked for me:
>
> nevada=# \i junk.sql
> CREATE TABLE
> nevada=# insert into junk select * from chemistry where site_id = (E'GW-22');
> INSERT 0 803
> nevada=# insert into junk select * from chemistry where site_id = (E'GW-22 \n');
> INSERT 0 0
> nevada=# insert into junk select * from chemistry where site_id = (E'GW-22 \n');
> INSERT 0 0
> nevada=# insert into junk select * from chemistry where site_id = (E'GW-22\n');
> INSERT 0 1409
> nevada=# select '['|| rtrim(trim(trailing E'\n' from site_id)) || ']' from junk;
>
> ?column? ----------
> [GW-22]
> [GW-22]
>
> and so on for 2212 rows.
>
>> Trim it up:
>>
>> andy=# select '['|| rtrim(trim(trailing E'\n' from a)) || ']' from junk;
>
>> If you have a unique index you'll wanna drop it first. Once you get that done, we can remove the dups.
>
> No index on junk; I can remove it from chemistry prior to reinserting the
> cleaned rows.
>
> Also, where can I read about the select syntax you use? I find nothing
> about it in Rick van der Lans' 4th edition, the most comprehensive language
> reference I've read.
>
> Thanks,
>
> Rich
>

The fine online manual:

http://www.postgresql.org/docs/current/interactive/index.html

Especially the string ops:

http://www.postgresql.org/docs/current/interactive/functions-string.html

>> Trim it up:
>> andy=# select '['|| rtrim(trim(trailing E'\n' from a)) || ']' from junk;
>
> Andy,
>
> Scrolling through the table with rows ordered by date and chemical I find
> no duplicates ... so far. However, what I do find is that the above did not
> work:


No, it wasnt supposed to.  A select statement builds a new result set and returns it to you, it wont update a table.
Thatselect statement was meant as an example for writing an update statement. 

Like:

update chemistry set side_id = rtrim(trim(trailing E'\n' from site_id));

If there was a unique index on chemistry(site_id), the above would throw an error, so I was warning you to drop it.

Once the site_id was trimmed, you could then delete the dups, with:

delete from chemistry where site_id = 'GW-22' and ctid <> (select min(ctid) from chemistry site_id = 'GW-22');

Those 11 steps you had... I was thinking two steps.  The update and the delete above.

Sorry, I should have been a little more clear, but, at least you got things cleaned up.  PG has a huge number of data
manipulationfunctions.  If you have to export data out of a database in order to massage it, then that's a failure of a
database. PG (and sql) were meant for just this kind of job. 


-Andy

Re: Apparent Problem With NULL in Restoring pg_dump

From
Rich Shepard
Date:
On Fri, 16 Sep 2011, Andy Colson wrote:

> Sorry, I should have been a little more clear, but, at least you got
> things cleaned up. PG has a huge number of data manipulation functions.
> If you have to export data out of a database in order to massage it, then
> that's a failure of a database. PG (and sql) were meant for just this kind
> of job.

Andy,

   Your comments are appropriate and the URLs well noted. Most list members,
particularly those who offer sound advice based on long experience and great
expertise, most likely work with postgres on a daily basis. This is true of
most application-specific mail lists.

   I'm almost certainly not alone in using postgres sporatically because my
project needs require a diverse set of tools and I don't use any one of them
day in and day out. Well, perhaps except for emacs and LaTeX/LyX since I do
most of my writing with the latter. This means that if I've not used an
application for a while I need to relearn information pushed aside by more
current project needs.

   The constant variety of needs in my business (spatial modeling with GRASS,
advanced statistical analyses with R, data storage and retrieval with
PostgreSQL, critical reviews of proposed legislation concerning water
quantity and quality, fish and macroinvertebrates, etc.) is stimulating and
satisfying. I'd almost certainly fail in a position where I'm doing the same
things (or mostly the same things) every day. The downside is that I don't
know each application intimately and completely. So I am grateful for the
patient guidance and explanations you, and others like you, provide. I save
all these messages and grep them to find specific content when I again
encounter a problem I resolved in the past.

Many thanks,

Rich