Thread: pg_dump problem

pg_dump problem

From

Noel Faux

Date:

12 May 2004, 04:00:41

Hi all,

We are trying to do a dump of our database using the command line.
However, the dump falls over during a dump of a particular table.

blast=> \d psiblast_round
            Table "public.psiblast_round"
        Column        |       Type        | Modifiers
----------------------+-------------------+-----------
 id                   | integer           | not null
 region_db_comparison | integer           |
 round_number         | smallint          |
 matrix               | character varying |
 runtime              | integer           |
Indexes: psiblast_round_pkey primary key btree (id),
         psiblast_round_region_db_comp btree (region_db_comparison),
         psiblast_round_round_number btree (round_number)
Foreign Key constraints: $1 FOREIGN KEY (region_db_comparison)
REFERENCES region_db_comparison(id) ON UPDATE NO ACTION ON DELETE NO ACTION

The matrix field holds a string which is very long,  in some cases ~30K+
char long.

The erorr throw is:
pg_dump: message type 0x44 arrived from server while idle
pg_dump: dumpClasses(): SQL command failed
pg_dump: Error message from server: server closed the connection
unexpectedly
       This probably means the server terminated abnormally
       before or while processing the request.
pg_dump: The command was: FETCH 100 FROM _pg_dump_cursor

We are using version:    psql (PostgreSQL) 7.3.4-RH

Is the var char the best type to hold such a long string and is this
likely to be the source of the problem? There is another table which
hold similar info, including the matrix field and we are able to dump
that without any problems.

Any help would be much appreciated. If there is any further info
required, I'll do my best to get it :)


Cheers
Noel

ORDER BY and Unicode

From

"M. Bastin"

Date:

12 May 2004, 10:26:03

There seems to be a big problem with Unicode for
which a solution might already exist.  Somebody
had the following problem on another mailing
list.  My suggestion is at the bottom of this
message but if another solution already exists
I'd like to hear about it.

The problem is that special characters aren't
treated right under Unicode.  Here are a few
examples:

1.   "UPPER('é')" doesn't work.
(That's an accented "e" in there if it doesn't
come through in your e-mail application)

The implication of this is that

     SELECT ... WHERE UPPER(mycolumn) LIKE UPPER('my search string')

doesn't give the functionality you'd want.
UPPER and LOWER seem to work on ASCII only.  The
Greek, French, etc. are out of luck.


2.  "ORDER BY mycolumn" gives a wrong sort order.

Uppercase ASCII characters come first, then
lowercase ASCII, then accented characters...
This really isn't what a human would like to see.


I think the two examples above illustrate this
Unicode problem quite well.  Is there an existing
solution?  If not could we work together on
creating one, as suggested at the very bottom of
this message?

Thanks,

Marc

------------------------------------------------------

You can use the translate function to solve your problem.
<http://www.postgresql.org/docs/7.4/interactive/functions-string.html>

e.g. for the letter "a":

    SELECT * FROM mytable ORDER BY
translate(textcolumn, 'àáâäÀÁÂÄa', 'AAAAAAAAA');

Then you build an index like this to speed things up:

   CREATE INDEX MyIndex ON MyTable
(translate(textcolumn, 'àáâäÀÁÂÄa', 'AAAAAAAAA'));

And your select queries will also be case and
accent independent from then on e.g:

    SELECT * FROM mytable WHERE
translate(textcolumn, 'àáâäÀÁÂÄa', 'AAAAAAAAA')
LIKE translate('my search string', 'àáâäÀÁÂÄa',
'AAAAAAAAA');


Since the alphabet is very long you're better off
creating your own function that does this
translation for the whole alphabet and then you'd
get something like this:

    SELECT * FROM mytable WHERE
MySimpleABC(textcolumn) LIKE MySimpleABC('my
search string') ORDER BY MySimpleABC(textcolumn);

Your index would be like this:

   CREATE INDEX MyIndex ON MyTable (MySimpleABC(textcolumn));

Cheers,

Marc

PS:  Maybe we should work together to create the
mother of all functions that would do this for
all of Unicode?  Anybody else up to this?

Re: ORDER BY and Unicode

From

Stephan Szabo

Date:

12 May 2004, 10:51:29

On Wed, 12 May 2004, M. Bastin wrote:

> There seems to be a big problem with Unicode for
> which a solution might already exist.  Somebody
> had the following problem on another mailing
> list.  My suggestion is at the bottom of this
> message but if another solution already exists
> I'd like to hear about it.
>
> The problem is that special characters aren't
> treated right under Unicode.  Here are a few
> examples:
>
> 1.   "UPPER('�')" doesn't work.

IIRC, right now upper and lower only work correctly in
single byte encodings. I think when full sql collation
and character set behavior is done this problem will
go away.

> 2.  "ORDER BY mycolumn" gives a wrong sort order.
>
> Uppercase ASCII characters come first, then
> lowercase ASCII, then accented characters...
> This really isn't what a human would like to see.

This is driven by locale, what LC_COLLATE value
was the database created with (if you don't know then
pg_controldata should give that to you)?

It sounds like the locale is "C" locale which means
sort by byte value or perhaps the locale is one that isn't
for the correct encoding.

Re: ORDER BY and Unicode

From

"M. Bastin"

Date:

12 May 2004, 12:40:12

>  > 2.  "ORDER BY mycolumn" gives a wrong sort order.
>>
>>  Uppercase ASCII characters come first, then
>>  lowercase ASCII, then accented characters...
>>  This really isn't what a human would like to see.
>
>This is driven by locale, what LC_COLLATE value
>was the database created with (if you don't know then
>pg_controldata should give that to you)?
>
>It sounds like the locale is "C" locale which means
>sort by byte value or perhaps the locale is one that isn't
>for the correct encoding.

I've found this:
<http://www.postgresql.org/docs/7.4/interactive/charset.html#LOCALE>

"locale -a" isn't recognized on OS X.  How else can I find the
possible locales?

And how can I do an initdb so that sorting on Unicode will work for
French, Greek, Japanase, etc. users of a single database?

Thanks,

Marc

Re: ORDER BY and Unicode

From

Stephan Szabo

Date:

12 May 2004, 13:33:37

On Wed, 12 May 2004, M. Bastin wrote:

> >  > 2.  "ORDER BY mycolumn" gives a wrong sort order.
> >>
> >>  Uppercase ASCII characters come first, then
> >>  lowercase ASCII, then accented characters...
> >>  This really isn't what a human would like to see.
> >
> >This is driven by locale, what LC_COLLATE value
> >was the database created with (if you don't know then
> >pg_controldata should give that to you)?
> >
> >It sounds like the locale is "C" locale which means
> >sort by byte value or perhaps the locale is one that isn't
> >for the correct encoding.
>
> I've found this:
> <http://www.postgresql.org/docs/7.4/interactive/charset.html#LOCALE>
>
> "locale -a" isn't recognized on OS X.  How else can I find the
> possible locales?

Well, you can try looking around the system, maybe /usr/share/locale
or such would have the locale set.

> And how can I do an initdb so that sorting on Unicode will work for
> French, Greek, Japanase, etc. users of a single database?

AFAIK, you can't really at this time.  With an appropriately crafted
locale, you could probably get reasonably close, but I've never actually
tried to work with creating one so I don't know what's involved. And, if
two languages had different rules for two characters you'd not be
supporting both.

Re: ORDER BY and Unicode

From

Tom Lane

Date:

12 May 2004, 13:39:14

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> IIRC, right now upper and lower only work correctly in
> single byte encodings. I think when full sql collation
> and character set behavior is done this problem will
> go away.

That unfortunately isn't looking like something that will happen soon.
What might be more practical to do in the short run is to make use of
libc's <wctype.h> routines to do upper/lower case conversion, rather
than <ctype.h>.  On machines that have mbstowcs() and wcstombs() I think
this might be fairly straightforward, but I don't have the time or
background to code it up and test it.  Any volunteers out there?

Note that this would amount to assuming that LC_CTYPE matches the
database encoding, but the truth of the matter is that the existing
code fails if that's not true anyway.

            regards, tom lane

Re: pg_dump problem

From

Tom Lane

Date:

12 May 2004, 14:06:18

Noel Faux <noel.faux@med.monash.edu.au> writes:
> The erorr throw is:
> pg_dump: message type 0x44 arrived from server while idle
> pg_dump: dumpClasses(): SQL command failed
> pg_dump: Error message from server: server closed the connection
> unexpectedly
>        This probably means the server terminated abnormally
>        before or while processing the request.
> pg_dump: The command was: FETCH 100 FROM _pg_dump_cursor

What shows up in the postmaster log when this happens?

If you're not keeping the postmaster's log output, now would be a good
time to start.  Personally I'd suggest configuring it to log to syslog;
you'll need to enable this in postgresql.conf and possibly adjust your
syslogd configuration to determine exactly where Postgres messages go.
If that seems like too much trouble, adjust the init.d start script to
send the postmaster's stderr to some file instead of /dev/null.

            regards, tom lane

Re: ORDER BY and Unicode

From

Oliver Fromme

Date:

12 May 2004, 14:42:23

I'd like to make a small comment on this one ...

M. Bastin wrote:
 > "locale -a" isn't recognized on OS X.  How else can I find the
 > possible locales?

On most UNIX-like systems they're under /usr/share/locale
or similar.  Maybe the setlocale(3) manpage will help you
(at least on BSD it mentions the default location of the
locales).

 > And how can I do an initdb so that sorting on Unicode will work for
 > French, Greek, Japanase, etc. users of a single database?

You can't do that in general, because different languages
can use different rules for sorting and case conversion of
the same characters.

For example, in the Turkish language there is a character
"i" without a dot, and "I" with a dot (I only have an ISO-
8859-15 set right now, so I can't demonstrate them), where
upper("i") == "I" with dot, and lower("I") == "i" without
dot.  Most other languages have upper("i") == "I" and
lower("I") == "i", so it's not possible to have a locale
setting that supports both at the same time.

I guess that PostgreSQL's ORDER BY and case conversions
ignore the client's locale setting completely, and only
respect the locale of the database (when it was created).
Beware, I'm not a PostgreSQL developer, so don't take that
as an authoritative statement.  ;-)

Regards
   Oliver

--
Oliver Fromme, secnetix GmbH & Co KG, Oettingenstr. 2, 80538 München
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

"The scanf() function is a large and complex beast that often does
something almost but not quite entirely unlike what you desired."
        -- Chris Torek

Re: ORDER BY and Unicode

From

"M. Bastin"

Date:

12 May 2004, 16:39:00

>  > And how can I do an initdb so that sorting on Unicode will work for
>>  French, Greek, Japanase, etc. users of a single database?
>
>AFAIK, you can't really at this time.  With an appropriately crafted
>locale, you could probably get reasonably close, but I've never actually
>tried to work with creating one so I don't know what's involved. And, if
>two languages had different rules for two characters you'd not be
>supporting both.

Thanks Stephan!  I've found my list of locales. It's a pity only one
language can be used at a time but as you say there are conflicting
rules anyway.

The docs say there is a speed penalty on using locales.  Does anyone
have any idea on how severe this is?  I'm wondering wether I should
use the translate() function after all because of this.  It would
solve multilingual issues to a certain level and there wouldn't be a
speed penalty since the indexes would be build on the translate()
function too.

Marc

Re: ORDER BY and Unicode

From

Stephan Szabo

Date:

12 May 2004, 18:52:27

On Wed, 12 May 2004, M. Bastin wrote:

> >  > And how can I do an initdb so that sorting on Unicode will work for
> >>  French, Greek, Japanase, etc. users of a single database?
> >
> >AFAIK, you can't really at this time.  With an appropriately crafted
> >locale, you could probably get reasonably close, but I've never actually
> >tried to work with creating one so I don't know what's involved. And, if
> >two languages had different rules for two characters you'd not be
> >supporting both.
>
> Thanks Stephan!  I've found my list of locales. It's a pity only one
> language can be used at a time but as you say there are conflicting
> rules anyway.
>
> The docs say there is a speed penalty on using locales.  Does anyone
> have any idea on how severe this is?  I'm wondering wether I should

I'm not an expert really, but since you're already doing unicode I think
it's not going to be major with the one caveat that if you're doing LIKE
queries, you should look at the Operator Classes section of the
documentation about the *_pattern_ops operator classes.

> use the translate() function after all because of this.  It would
> solve multilingual issues to a certain level and there wouldn't be a
> speed penalty since the indexes would be build on the translate()
> function too.

The translate version would presumably work for cases where you want
multiple characters to sort to the same position, but if you want say an
accented A to follow a regular A I think it might be difficult to
formulate.

Re: pg_dump problem

From

Noel Faux

Date:

17 May 2004, 04:35:19

Hi Tom,

We have go postmaster writing to log now :)

Here are the errors received and the entries in the log.
pg_dump: connection not open
pg_dump: lost synchronization with server, resetting connection
pg_dump: SQL command to dump the contents of table "psiblast_round" failed: PQendcopy() failed.
pg_dump: Error message from server: FATAL: The database system is starting up
pg_dump: The command was: COPY public.psiblast_round (id, region_db_comparison, round_number, matrix, runtime) TO stdout;

//-------- LOG ENTRY ------------------------
LOG: server process (pid 7298) was terminated by signal 11
LOG: terminating any other active server processes
WARNING: Message from PostgreSQL backend:
    The Postmaster has informed me that some other backend
    died abnormally and possibly corrupted shared memory.
    I have rolled back the current transaction and am
    going to terminate your database system connection and exit.
    Please reconnect to the database system and repeat your query.
LOG: all server processes terminated; reinitializing shared memory and semaphores
LOG: connection received: host=[local]
FATAL: The database system is starting up
LOG: database system was interrupted at 2004-05-14 14:19:34 EST
LOG: checkpoint record is at 45B/7D696774
LOG: redo record is at 45B/7D696774; undo record is at 0/0; shutdown TRUE
LOG: next transaction id: 1565578013; next oid: 983864340
LOG: database system was not properly shut down; automatic recovery in progress
LOG: ReadRecord: record with zero length at 45B/7D6967B4
LOG: redo is not required
LOG: database system is ready

//-----------------------------------
The error message is different so we repeated the dump and got the same error message and the message in the log file is:

LOG: connection received: host=[local]
LOG: connection authorized: user=blast database=blast
LOG: server process (pid 14195) was terminated by signal 11
LOG: terminating any other active server processes
WARNING: Message from PostgreSQL backend:
    The Postmaster has informed me that some other backend
    died abnormally and possibly corrupted shared memory.
    I have rolled back the current transaction and am
    going to terminate your database system connection and exit.
    Please reconnect to the database system and repeat your query.
LOG: all server processes terminated; reinitializing shared memory and semaphores
LOG: database system was interrupted at 2004-05-17 15:29:40 EST
LOG: connection received: host=[local]
FATAL: The database system is starting up
LOG: checkpoint record is at 45B/7D6B9A8C
LOG: redo record is at 45B/7D6B9A8C; undo record is at 0/0; shutdown FALSE
LOG: next transaction id: 1565603031; next oid: 983872532
LOG: database system was not properly shut down; automatic recovery in progress
LOG: ReadRecord: record with zero length at 45B/7D6B9ACC
LOG: redo is not required
LOG: database system is ready

Almost identical to the previous message however the checkpoint recod, transaction and oids have changed. Has this to do with the dump process and not the entry being dumped at the time ?
Is there a way which to find out if it is a particular recod in the table which is causing this error and which one it is?
Any help would be great :)

Many thanks
Noel

Tom Lane wrote:

Noel Faux <noel.faux@med.monash.edu.au> writes:

The erorr throw is:
pg_dump: message type 0x44 arrived from server while idle
pg_dump: dumpClasses(): SQL command failed
pg_dump: Error message from server: server closed the connection 
unexpectedly      This probably means the server terminated abnormally      before or while processing the request.
pg_dump: The command was: FETCH 100 FROM _pg_dump_cursor

What shows up in the postmaster log when this happens?

If you're not keeping the postmaster's log output, now would be a good
time to start.  Personally I'd suggest configuring it to log to syslog;
you'll need to enable this in postgresql.conf and possibly adjust your
syslogd configuration to determine exactly where Postgres messages go.
If that seems like too much trouble, adjust the init.d start script to
send the postmaster's stderr to some file instead of /dev/null.
		regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
              http://archives.postgresql.org

Re: pg_dump problem

From

Noel Faux

Date:

19 May 2004, 22:21:25

Hi all,

We found the problem. In the table psiblast_round, there was one entry which the feild matrix was corrupt. If we select all for this entry the database died :(. If we selected all but the matrix the select finished. If we selected only that entry and the matrix, the server died. So we set the feild to "" and the problem was fixed. As to why that matrix caused an error??? Don't know!!

Cheers
Noel

Noel Faux wrote:

Noel Faux <noel.faux@med.monash.edu.au> writes:

The erorr throw is:
pg_dump: message type 0x44 arrived from server while idle
pg_dump: dumpClasses(): SQL command failed
pg_dump: Error message from server: server closed the connection 
unexpectedly      This probably means the server terminated abnormally      before or while processing the request.
pg_dump: The command was: FETCH 100 FROM _pg_dump_cursor

What shows up in the postmaster log when this happens?

If you're not keeping the postmaster's log output, now would be a good
time to start.  Personally I'd suggest configuring it to log to syslog;
you'll need to enable this in postgresql.conf and possibly adjust your
syslogd configuration to determine exactly where Postgres messages go.
If that seems like too much trouble, adjust the init.d start script to
send the postmaster's stderr to some file instead of /dev/null.
		regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
              http://archives.postgresql.org