Thread: ODBC query problem

ODBC query problem

From
Luis Magaña
Date:
Hi,

Yesterday I ran vacuumdb on one of our PostgreSQL databases.  As of
today, whenever I try to connect to the postmaster from Windows using
the latest ODBC driver, the postmaster crashes and restarts itself.
Here is the log output:

2003-07-16 11:26:48 [1629]   LOG:  connection authorized: user=sentinell
database=thot
2003-07-16 11:26:48 [1629]   LOG:  query: select version()
2003-07-16 11:26:48 [1629]   LOG:  duration: 0.007115 sec
2003-07-16 11:26:48 [1629]   LOG:  query: set DateStyle to 'ISO'
2003-07-16 11:26:48 [1629]   LOG:  duration: 0.001107 sec
2003-07-16 11:26:48 [1629]   LOG:  query: set geqo to 'OFF'
2003-07-16 11:26:48 [1629]   LOG:  duration: 0.000814 sec
2003-07-16 11:26:48 [1629]   LOG:  query: select oid from pg_type where
typname='lo'
2003-07-16 11:26:48 [1629]   LOG:  duration: 0.018918 sec
2003-07-16 11:26:48 [1629]   LOG:  query: select pg_client_encoding()
2003-07-16 11:26:48 [1629]   LOG:  duration: 0.002753 sec
2003-07-16 11:26:48 [1629]   LOG:  query: select relname, nspname,
relkind from pg_catalog.pg_class, pg_catalog.pg_namespace where relkind
in ('r', 'v') and nspname like 'public' and relname like
'diario_factura_embarque' and relname !~ '^pg_|^dd_' and
pg_namespace.oid = relnamespace order by nspname, relname
2003-07-16 11:26:49 [1623]   LOG:  server process (pid 1629) was
terminated by signal 11
2003-07-16 11:26:49 [1623]   LOG:  terminating any other active server
processes
2003-07-16 11:26:49 [1623]   LOG:  all server processes terminated;
reinitializing shared memory and semaphores
2003-07-16 11:26:49 [1630]   LOG:  database system was interrupted at
2003-07-16 11:26:27 CDT
2003-07-16 11:26:49 [1630]   LOG:  checkpoint record is at 0/501001E0
2003-07-16 11:26:49 [1630]   LOG:  redo record is at 0/501001E0; undo
record is at 0/0; shutdown TRUE
2003-07-16 11:26:49 [1630]   LOG:  next transaction id: 1973685; next
oid: 1644084
2003-07-16 11:26:49 [1630]   LOG:  database system was not properly shut
down; automatic recovery in progress
2003-07-16 11:26:49 [1630]   LOG:  ReadRecord: record with zero length
at 0/50100220
2003-07-16 11:26:49 [1630]   LOG:  redo is not required
2003-07-16 11:26:51 [1630]   LOG:  database system is ready

As far as I can see, the long query crashes the postmaster for some
unknown reason.  I've tried this on 7.3.2 and 7.3.3 with the same results.

There is plenty of free space on the hard disks, so I assume that is not
the problem. Note also that I can connect with no problems from PHP,
psql and some other programs running on Linux. But if I run that query
from psql, the postmaster also crashes.

Best Regards.

--
Luis Magaña.
Gnovus Networks & Software.
www.gnovus.com


Re: ODBC query problem

From
Andrew Sullivan
Date:
On Wed, Jul 16, 2003 at 11:32:41AM -0500, Luis Magaña wrote:
> 2003-07-16 11:26:48 [1629]   LOG:  query: select relname, nspname,
> relkind from pg_catalog.pg_class, pg_catalog.pg_namespace where relkind
> in ('r', 'v') and nspname like 'public' and relname like
> 'diario_factura_embarque' and relname !~ '^pg_|^dd_' and
> pg_namespace.oid = relnamespace order by nspname, relname
> 2003-07-16 11:26:49 [1623]   LOG:  server process (pid 1629) was
> terminated by signal 11

You don't say what platform you're running on, but I think for most
UNIXes sig 11 is SIGSEGV.  Since you always get it with the same
query, I'd suspect (1) a bad library, which is causing buffer
overflows; (2) a bad disk, which has put a bad block on your disk for
one of the requested tables (which then causes the crash); or (3), and
much less likely, bad RAM.  Check your hardware, and let us know
whether you have a core file somewhere whence you can get a backtrace
(actually, since it's reproducible, you ought to be able to get a
backtrace while running the query).
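
Signal numbers are defined by the platform, so the number in the log is
best checked against the system's own <signal.h>. The following is a
minimal sketch of that lookup, assuming only the standard macros; the
helper names signal_name and describe_termination are hypothetical and
are not part of PostgreSQL.

```c
#include <signal.h>
#include <stdio.h>

/* Return the symbolic name of a few common fatal signals on this platform. */
const char *
signal_name(int signo)
{
    switch (signo)
    {
        case SIGSEGV: return "SIGSEGV";
        case SIGABRT: return "SIGABRT";
        case SIGFPE:  return "SIGFPE";
        case SIGILL:  return "SIGILL";
#ifdef SIGBUS
        case SIGBUS:  return "SIGBUS";  /* POSIX, not ISO C, hence the guard */
#endif
        default:      return "unknown";
    }
}

/*
 * Hypothetical helper: format a postmaster-style termination message with
 * the signal name appended. Returns what snprintf() returns.
 */
int
describe_termination(char *buf, size_t buflen, int pid, int signo)
{
    return snprintf(buf, buflen,
                    "server process (pid %d) was terminated by signal %d (%s)",
                    pid, signo, signal_name(signo));
}
```

Because the lookup uses the macros rather than hard-coded numbers, the
same source gives the right name on whichever system it is compiled on.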

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: ODBC query problem

From
Luis Magaña
Date:
Thanks,

yes, it's running Solaris 8 on a SPARC with SCSI disk.

It will be really hard for me to turn the system off in order to check
the disk for surface errors, since it is a production system.

However, I have a core file dumped whenever this happens. Let me find out
how to do the backtrace and I'll show it to you gladly.

Thanks for the help.

--
Luis Magaña.
Invernadero Santa Rita.
www.santarita.com.mx
--
Luis Magaña.
Gnovus Networks & Software.
www.gnovus.com


Re: ODBC query problem

From
Andrew Sullivan
Date:
On Wed, Jul 16, 2003 at 12:04:25PM -0500, Luis Magaña wrote:
> Thanks,
>
> yes, it's running Solaris 8 on a SPARC with SCSI disk.

Hmm.  64 bit?  There've been plenty of bugs in the 64 bit libraries,
AFAICT.  We've even discovered some of 'em ourselves, and I got so
leery that we still use a 32-bit-compiled Postgres for production
work.  IIRC, there's a package floating about for gcc 3.2.x that
compiles 64 bit binaries by default, so you may want to check this
out.

> However, I have a core file dumped whenever this happens. Let me find out
> how to do the backtrace and I'll show it to you gladly.

If you're using the 64 bit libraries, the bad news is that gdb is
pretty flaky with them.  So you have to use adb instead.  It's not
completely transparent in its operation, but the man page actually
does tell you everything you need, sort of, if you read it all about
60 times.  I have suppressed most of my knowledge of it, so I'm sorry
not to be much help.  Anyway, if you get no symbols, you probably
need to re-compile the postmaster with debugging symbols in.  If you
use gcc, there's _supposed_ to be no cost to that.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: ODBC query problem

From
"Maksim Likharev"
Date:
This looks very much like a problem I had before.
If you have a backtrace, I can say for sure.


Re: ODBC query problem

From
Luis Magaña
Date:
I've compiled PostgreSQL in 32-bit mode, although I'm using the
--enable-integer-datetimes flag when configuring.

I've already moved the database to a different partition and got the same
results. I wonder, if it is a hardware problem, why can I access both
tables individually with no problem at all?

Any other ideas?

Thanks.

--
Luis Magaña.
Gnovus Networks & Software.
www.gnovus.com


Re: ODBC query problem

From
Andrew Sullivan
Date:
On Wed, Jul 16, 2003 at 12:36:34PM -0500, Luis Magaña wrote:
> I've already moved the database to a different partition and got the same
> results. I wonder, if it is a hardware problem, why can I access both
> tables individually with no problem at all?

Doesn't sound like hardware, then.  I think you really do need a
backtrace.  gdb works fine in 32 bit, but you'll need the debugging
symbols in place.  Also, what version of Postgres and which compiler?

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: ODBC query problem

From
"Maksim Likharev"
Date:
I forgot to ask: what is the locale for the DB cluster?


Re: ODBC query problem

From
Luis Magaña
Date:
Locale is es_MX

It seems that a hardware issue is the reason after all.  I've moved the
database to another partition and now it is working fine. I'll do more
testing.

Thanks for the help.

Regards.

--
Luis Magaña.
Gnovus Networks & Software.
www.gnovus.com


Re: ODBC query problem

From
Luis Magaña
Date:
I've moved the database to a third location on the same disk using
pg_dumpall. The new location works with no errors; its initdb was done
without localization.

The production db was initialized with the es_MX locale. Might that have
something to do with the problem?

--
Luis Magaña.
Gnovus Networks & Software.
www.gnovus.com


Re: ODBC query problem

From
"Maksim Likharev"
Date:
Yes, it is most likely the locale.
If you have a core file, on Solaris try
pstack your_core_file
It should give a "pretty distinct" backtrace.


Re: ODBC query problem

From
Tom Lane
Date:
Luis Magaña <joe666@gnovus.com> writes:
> I've moved the database to a third location on the same disk using
> pg_dumpall. The new location works with no errors; its initdb was done
> without localization.

> The production db was initialized with the es_MX locale. Might that have
> something to do with the problem?

Yeah, according to recent reports from Maksim Likharev, there are some
bugs in Solaris' locale libraries.  It appears that strxfrm() will
sometimes write more bytes than it is supposed to, thereby clobbering
nearby data structures.  The critical code is in
src/backend/utils/adt/selfuncs.c, around line 2360 in 7.3.3:

        /* Guess that transformed string is not much bigger than original */
        xfrmsize = strlen(val) + 32;    /* arbitrary pad value here... */
        xfrmstr = (char *) palloc(xfrmsize);
        xfrmlen = strxfrm(xfrmstr, val, xfrmsize);
        if (xfrmlen >= xfrmsize)
        {
            /* Oops, didn't make it */
            pfree(xfrmstr);
            xfrmstr = (char *) palloc(xfrmlen + 1);
            xfrmlen = strxfrm(xfrmstr, val, xfrmlen + 1);
        }

We've been debating what to do to work around this bug.  I'd suggest
changing
        xfrmstr = (char *) palloc(xfrmsize);
to
        xfrmstr = (char *) palloc(xfrmsize + 32);
so that there is more free space available than we tell strxfrm about.
Perhaps also change
            xfrmstr = (char *) palloc(xfrmlen + 1);
to
            xfrmstr = (char *) palloc(xfrmlen + 1 + 32);
(although in theory that one should not be needed...)

If that doesn't improve matters, try 100 extra bytes instead of 32.
Please let us know how it goes.
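
For illustration, the padding workaround described above can be sketched
as a standalone C function. This is a minimal sketch under stated
assumptions, not the PostgreSQL code: it uses malloc()/free() in place of
palloc()/pfree(), and the names xfrm_padded and XFRM_SAFETY_PAD are
hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pad: 32 as suggested above; try 100 if that is not enough. */
#define XFRM_SAFETY_PAD 32

/*
 * Transform val with strxfrm(), always allocating XFRM_SAFETY_PAD more
 * bytes than strxfrm() is told about, so that a small overrun lands in
 * spare space instead of a neighbouring allocation.
 * Returns a malloc'd string (caller frees), or NULL if malloc fails.
 */
char *
xfrm_padded(const char *val)
{
    size_t  xfrmsize;
    size_t  xfrmlen;
    char   *xfrmstr;

    assert(val != NULL);

    /* Guess that the transformed string is not much bigger than the input. */
    xfrmsize = strlen(val) + 32;
    xfrmstr = malloc(xfrmsize + XFRM_SAFETY_PAD);
    if (xfrmstr == NULL)
        return NULL;

    xfrmlen = strxfrm(xfrmstr, val, xfrmsize);
    if (xfrmlen >= xfrmsize)
    {
        /* Guess was too small: retry with the length strxfrm() reported. */
        free(xfrmstr);
        xfrmstr = malloc(xfrmlen + 1 + XFRM_SAFETY_PAD);
        if (xfrmstr == NULL)
            return NULL;
        xfrmlen = strxfrm(xfrmstr, val, xfrmlen + 1);
    }
    return xfrmstr;
}
```

The pad only hides an overrun of up to XFRM_SAFETY_PAD bytes; it does not
make a misbehaving strxfrm() correct.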

            regards, tom lane

Re: ODBC query problem

From
"Maksim Likharev"
Date:
I would suggest, if I may, the following:

replace
xfrmsize = strlen(val) + 32;
with
xfrmlen = strxfrm(NULL, val, 0) + 32

for the following reasons:
1. strxfrm(NULL, val, 0) + 1 is the correct way to determine the size of
   the transformed buffer on Solaris and glibc2; the additional 32 bytes
   are just an extra cushion.

2. The xfrmlen = strxfrm(NULL, val, 0) + 32 combination works for me: 6 days
   without a crash, QAed ;)
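
The two-pass sizing described in point 1 can be sketched as a standalone
C function. This is only an illustration under stated assumptions: it
uses malloc() rather than PostgreSQL's palloc(), it allocates exactly the
reported length plus one byte (no extra cushion), and the name xfrm_exact
is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Two-pass strxfrm(): first ask how long the transformed string will be,
 * then transform into a buffer of exactly that size plus the NUL.
 * Returns a malloc'd string (caller frees), or NULL if malloc fails.
 */
char *
xfrm_exact(const char *val)
{
    size_t  needed;
    size_t  written;
    char   *xfrmstr;

    /* With n == 0 the destination may be NULL; only the length is returned. */
    needed = strxfrm(NULL, val, 0);

    xfrmstr = malloc(needed + 1);
    if (xfrmstr == NULL)
        return NULL;

    written = strxfrm(xfrmstr, val, needed + 1);

    /* Both calls should agree on the transformed length. */
    assert(written == needed);
    (void) written;             /* silence unused warning under NDEBUG */

    return xfrmstr;
}
```

Comparing two results with strcmp() should order them the same way
strcoll() orders the original strings in the current locale, which is
the property the transformed strings are wanted for.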

Regards.



Re: ODBC query problem

From
Luis Magaña
Date:
I've moved all data to a new location using pg_dump/pg_restore.

Things seem to be working properly so far: the data seems to be where it
should be and complete. I did the initdb once again using es_MX as
locale.

This is the first time I have faced such a problem with this Solaris
installation after 14 months of production time.

If anything else comes up I'll let you know. Thanks for the help.

--
Luis Magaña.
Gnovus Networks & Software.
www.gnovus.com


Re: ODBC query problem

From
Tom Lane
Date:
"Maksim Likharev" <mlikharev@aurigin.com> writes:
> I would suggest, if I may, following:

Okay, okay, already ;-)

I've patched it per attached for 7.3.4.

            regards, tom lane

*** src/backend/utils/adt/selfuncs.c.orig    Wed Apr 16 00:38:05 2003
--- src/backend/utils/adt/selfuncs.c    Thu Jul 17 16:49:07 2003
***************
*** 2313,2321 ****
  convert_string_datum(Datum value, Oid typid)
  {
      char       *val;
-     char       *xfrmstr;
-     size_t        xfrmsize;
-     size_t        xfrmlen;

      switch (typid)
      {
--- 2313,2318 ----
***************
*** 2355,2371 ****

      if (!lc_collate_is_c())
      {
!         /* Guess that transformed string is not much bigger than original */
!         xfrmsize = strlen(val) + 32;    /* arbitrary pad value here... */
!         xfrmstr = (char *) palloc(xfrmsize);
!         xfrmlen = strxfrm(xfrmstr, val, xfrmsize);
!         if (xfrmlen >= xfrmsize)
!         {
!             /* Oops, didn't make it */
!             pfree(xfrmstr);
!             xfrmstr = (char *) palloc(xfrmlen + 1);
!             xfrmlen = strxfrm(xfrmstr, val, xfrmlen + 1);
!         }
          pfree(val);
          val = xfrmstr;
      }
--- 2352,2372 ----

      if (!lc_collate_is_c())
      {
!         char       *xfrmstr;
!         size_t        xfrmlen;
!         size_t        xfrmlen2;
!
!         /*
!          * Note: originally we guessed at a suitable output buffer size,
!          * and only needed to call strxfrm twice if our guess was too small.
!          * However, it seems that some versions of Solaris have buggy
!          * strxfrm that can write past the specified buffer length in that
!          * scenario.  So, do it the dumb way for portability.
!          */
!         xfrmlen = strxfrm(NULL, val, 0);
!         xfrmstr = (char *) palloc(xfrmlen + 1);
!         xfrmlen2 = strxfrm(xfrmstr, val, xfrmlen + 1);
!         Assert(xfrmlen2 == xfrmlen);
          pfree(val);
          val = xfrmstr;
      }



Re: ODBC query problem

From
"Maksim Likharev"
Date:
I wasn't pushy, was I?

But honestly, it is better to have it there because that simplifies
deployment: just compile (not patch and compile).
Thank you.






ODBC query problem AGAIN

From
Luis Magaña
Date:
Hi,

After having moved all of the data to a new database initialized with
es_MX as locale, the postmaster is dying and restarting every time a
program tries to read information from these tables:

pg_catalog.pg_class
pg_catalog.pg_namespace

It is important to note that if I do a simple select * from table with
either one of them, the server does not crash at all; the problem seems to
happen only when joining.

Another important point is that my problems started after I ran vacuumdb
-z on the database, or at least that's what I think.

I have tried this with psql, DbVisualizer and the ODBC driver on Windows.

One of the crashing queries is:

SELECT n.nspname, c.relname, a.attname, a.atttypid, a.attnotnull,
       a.atttypmod, a.attlen, a.attnum, def.adsrc, dsc.description
FROM pg_catalog.pg_namespace n
JOIN pg_catalog.pg_class c ON (c.relnamespace = n.oid)
JOIN pg_catalog.pg_attribute a ON (a.attrelid = c.oid)
LEFT JOIN pg_catalog.pg_attrdef def ON (a.attrelid = def.adrelid AND a.attnum = def.adnum)
LEFT JOIN pg_catalog.pg_description dsc ON (c.oid = dsc.objoid AND a.attnum = dsc.objsubid)
LEFT JOIN pg_catalog.pg_class dc ON (dc.oid = dsc.classoid AND dc.relname = 'pg_class')
LEFT JOIN pg_catalog.pg_namespace dn ON (dc.relnamespace = dn.oid AND dn.nspname = 'pg_catalog')
WHERE a.attnum > 0 AND NOT a.attisdropped
  AND n.nspname LIKE 'public'
  AND c.relname LIKE 'catalogo_empaque'
  AND a.attname LIKE '%'
ORDER BY nspname, relname, attname

The Log Output after that query is:

2003-07-18 16:34:59 [9127]   LOG:  server process (pid 9131) was
terminated by signal 10
2003-07-18 16:34:59 [9127]   LOG:  terminating any other active server
processes
2003-07-18 16:34:59 [9127]   LOG:  all server processes terminated;
reinitializing shared memory and semaphores
2003-07-18 16:34:59 [9134]   LOG:  database system was interrupted at
2003-07-18 16:32:58 CDT
2003-07-18 16:34:59 [9134]   LOG:  checkpoint record is at 0/2F4DAC30
2003-07-18 16:34:59 [9134]   LOG:  redo record is at 0/2F4DAC30; undo
record is at 0/0; shutdown TRUE
2003-07-18 16:34:59 [9134]   LOG:  next transaction id: 8808; next oid:
1833525
2003-07-18 16:34:59 [9134]   LOG:  database system was not properly shut
down; automatic recovery in progress
2003-07-18 16:34:59 [9134]   LOG:  ReadRecord: record with zero length
at 0/2F4DAC70
2003-07-18 16:34:59 [9134]   LOG:  redo is not required
2003-07-18 16:35:02 [9134]   LOG:  database system is ready

The platform is Solaris 8, running on SPARC, compiled as 32-bit only. No
core is dumped by the crashing processes.

Any new suggestions? I will try locale 'C' when moving data to a new
database.

Regards.

--
Luis Magaña.
Gnovus Networks & Software.
www.gnovus.com

