Thread: Segmentation fault while COPY in 7.3

Segmentation fault while COPY in 7.3

From
"Nicolai Tufar"
Date:
Help!
Backend crashes on any COPY command I issue:
PostgreSQL is 7.3 release, is compiled from sources.
The message I get is:

apb=# COPY maras2.mrk_yyn (bsvr_no, yyn_tur, yyn_no, yyn_syf, yyn_trh) TO
stdout;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!#


The output of strace on backend is following:

# strace -p 18883
recv(7, "QCOPY maras2.mrk_yyn (bsvr_no, y"..., 8192, 0) = 77
gettimeofday({1038713111, 629131}, NULL) = 0
_llseek(11, 0, [0], SEEK_SET)           = 0
read(11, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\37\360\37\1 b1\5\0\1"..., 8192)
= 8192
read(11, "\0\0\0\0x#\375\"\20\0\0\0008\0h\35\360\37\1 \370\235\220"...,
8192) = 8192
_llseek(12, 0, [0], SEEK_SET)           = 0
read(12, "\0\0\0\0\0#\375\"\20\0\0\0008\0@\34\0 \1 \200\237\0\1\0"..., 8192)
= 8192
_llseek(21, 40960, [40960], SEEK_SET)   = 0
read(21, "\0\0\0\0\4\251\3#\20\0\0\0004\4p\17\360\37\1 \340\237 "..., 8192)
= 8192
_llseek(22, 106496, [106496], SEEK_SET) = 0
read(22, "\0\0\0\0(\326\377\"\20\0\0\0\10\1\200\1\0 \1 \200\237\0"..., 8192)
= 8192
read(17, "\0\0\0\0\20\0\0\0\1\0\0\0\360\0\20\1\0 \1 p\237 \1\340"..., 8192)
= 8192
open("/data/pgsql/base/2015749/2015757", O_RDWR|O_LARGEFILE) = 39
_llseek(39, 0, [20037632], SEEK_END)    = 0
_llseek(39, 0, [0], SEEK_SET)           = 0
read(39, "\0\0\0\0\270\7\4#\20\0\0\0\300\1\350\1\0 \1 \270\237\216"...,
8192) = 8192
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++
#

I have recompiled postgres with debug information enabled, run gdb, attached
to backend process
and caught SIGSEGV as following:

(gdb) continue
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x081697bf in pfree (pointer=0x828a7b0) at mcxt.c:480
480             (*header->context->methods->free_p) (header->context,
pointer);
(gdb)


The output of bt is:

(gdb) bt
#0  0x081697bf in pfree (pointer=0x828a7b0) at mcxt.c:480
#1  0x080b37be in CopyTo (rel=0x402be088, attnumlist=0x828a3f8, binary=0
'\0', oids=0 '\0',
    fp=0x0, delim=0x81c2a57 "\t", null_print=0x81b2c22 "\\N") at copy.c:671
#2  0x080b32f6 in DoCopy (stmt=0x827b398) at copy.c:491
#3  0x08118a6a in pg_exec_query_string (query_string=0x827af48, dest=Remote,
    parse_context=0x8245378) at postgres.c:789
#4  0x08119b89 in PostgresMain (argc=4, argv=0xbfffd2a0, username=0x8240a81
"postgres")
    at postgres.c:2016
#5  0x08101d7c in DoBackend (port=0x8240950) at postmaster.c:2293
#6  0x081016c2 in BackendStartup (port=0x8240950) at postmaster.c:1915
#7  0x08100875 in ServerLoop () at postmaster.c:1000
#8  0x08100326 in PostmasterMain (argc=1, argv=0x82270c0) at
postmaster.c:779
#9  0x080debab in main (argc=1, argv=0xbfffdc34) at main.c:210
#10 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6
(gdb)



Thanks in advance,
Nic


Re: Segmentation fault while COPY in 7.3

From
Nicolai Tufar
Date:
While waiting for help I decided to fix my problem by
brute-forcing it. I commented out offending call to pfree()
in src/backend/commands/copy.c at line 671. I may introduced
a memory leak, but it works fine for me now.

Best regards,
Nic.




*** ./src/backend/commands/copy.c.orig    Sun Dec  1 06:02:34 2002
--- ./src/backend/commands/copy.c    Sun Dec  1 06:02:48 2002
***************
*** 668,674 ****
                                    ObjectIdGetDatum(elements[attnum - 1]),
                              Int32GetDatum(attr[attnum - 1]->atttypmod)));
                      CopyAttributeOut(fp, string, delim);
!                     pfree(string);
                  }
                  else
                  {
--- 668,674 ----
                                    ObjectIdGetDatum(elements[attnum - 1]),
                              Int32GetDatum(attr[attnum - 1]->atttypmod)));
                      CopyAttributeOut(fp, string, delim);
!                     /*pfree(string);*/
                  }
                  else
                  {



Re: Segmentation fault while COPY in 7.3

From
Tom Lane
Date:
"Nicolai Tufar" <ntufar@apb.com.tr> writes:
> (gdb) bt
> #0  0x081697bf in pfree (pointer=0x828a7b0) at mcxt.c:480
> #1  0x080b37be in CopyTo (rel=0x402be088, attnumlist=0x828a3f8, binary=0
> '\0', oids=0 '\0',
>     fp=0x0, delim=0x81c2a57 "\t", null_print=0x81b2c22 "\\N") at copy.c:671
> #2  0x080b32f6 in DoCopy (stmt=0x827b398) at copy.c:491

Hm.  Offhand it would seem that somebody's output function is returning
a non-palloc'd string.  What are the datatypes in that table, exactly?

            regards, tom lane

Re: Segmentation fault while COPY in 7.3

From
"Nicolai Tufar"
Date:
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
>
> Hm.  Offhand it would seem that somebody's output function is returning
> a non-palloc'd string.  What are the datatypes in that table, exactly?

apb=> \d maras2.mrk_yyn
           Table "maras2.mrk_yyn"
 Column  |         Type          | Modifiers
---------+-----------------------+-----------
 bsvr_no | character varying(10) |
 yyn_tur | character varying(1)  |
 yyn_no  | numeric               |
 yyn_syf | numeric               |
 yyn_trh | date                  |

apb=>

> regards, tom lane

Thanks for attention.
Regards,
Nic



Re: Segmentation fault while COPY in 7.3

From
Joe Conway
Date:
Nicolai Tufar wrote:
> From: "Tom Lane" <tgl@sss.pgh.pa.us>
>>Hm.  Offhand it would seem that somebody's output function is returning
>>a non-palloc'd string.  What are the datatypes in that table, exactly?
>
> apb=> \d maras2.mrk_yyn
>            Table "maras2.mrk_yyn"
>  Column  |         Type          | Modifiers
> ---------+-----------------------+-----------
>  bsvr_no | character varying(10) |
>  yyn_tur | character varying(1)  |
>  yyn_no  | numeric               |
>  yyn_syf | numeric               |
>  yyn_trh | date                  |

FWIW, I just tried to repeat this on cvs tip and could not. Seems to work just
fine:

regression=# create table foo(bsvr_no character varying(10),yyn_tur character
varying(1),yyn_no numeric,yyn_syf numeric,yyn_trh date);
CREATE TABLE
regression=# insert into foo values('a','a','1.0','1.0',now()::date);
INSERT 556825 1
regression=# copy foo to stdout;
a       a       1.0     1.0     2002-11-30
regression=# copy foo(bsvr_no) to stdout;
a
regression=# copy foo(bsvr_no,yyn_tur,yyn_no,yyn_syf,yyn_trh) to stdout;
a       a       1.0     1.0     2002-11-30
regression=# insert into foo values('a','a','1.0','1.0',null);
INSERT 556826 1
regression=# copy foo(bsvr_no,yyn_tur,yyn_no,yyn_syf,yyn_trh) to stdout;
a       a       1.0     1.0     2002-11-30
a       a       1.0     1.0     \N


Joe



Re: Segmentation fault while COPY in 7.3

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> FWIW, I just tried to repeat this on cvs tip and could not. Seems to work just
> fine:

I have a sneaking suspicion that it can only be reproduced with Nicolai's
locale setting.  Not sure exactly what aspect of the locale is causing
the problem though ...

            regards, tom lane

Re: Segmentation fault while COPY in 7.3

From
"Nicolai Tufar"
Date:
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>


> Joe Conway <mail@joeconway.com> writes:
> > FWIW, I just tried to repeat this on cvs tip and could not. Seems to
work just
> > fine:
>
> I have a sneaking suspicion that it can only be reproduced with Nicolai's
> locale setting.  Not sure exactly what aspect of the locale is causing
> the problem though ...

I am trying to reproduce the same on another server with no success.
Locale is what also comes to my mind. I have compiled PostgreSQL
with --with-nls option. All databases, including template0 are in
LATIN5 encoding. My postgresql.conf contains:

LC_MESSAGES = 'tr_TR'
LC_MONETARY = 'tr_TR'
LC_NUMERIC = 'tr_TR'
LC_TIME = 'tr_TR'
datestyle = 'European'


Regards,
Nic.


>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>


Re: Segmentation fault while COPY in 7.3

From
"Nicolai Tufar"
Date:
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
>
> Can you extract more info?  For example, it would be useful to see
> exactly which line of the table the COPY dies at, and what's in that
> row.

COPY dies at the first row and the data is as following:

apb=>  select * from mrk_yyn  limit 3;
 bsvr_no | yyn_tur | yyn_no | yyn_syf | yyn_trh
---------+---------+--------+---------+---------
 045030  | T       |    176 |      39 |
 045033  | T       |    176 |      39 |
 045035  | T       |    176 |      39 |
(3 rows)

apb=>

yyn_trh semm to be null. I tried to COPY other tables, the ones
without date columns, I created new tables, but it still segfaults.
And I can not reproduce it on other system.

>
> regards, tom lane
>

Nic


Re: Segmentation fault while COPY in 7.3

From
Joe Conway
Date:
Nicolai Tufar wrote:
> From: "Tom Lane" <tgl@sss.pgh.pa.us>
>>I have a sneaking suspicion that it can only be reproduced with Nicolai's
>>locale setting.  Not sure exactly what aspect of the locale is causing
>>the problem though ...
>
> I am trying to reproduce the same on another server with no success.
> Locale is what also comes to my mind. I have compiled PostgreSQL
> with --with-nls option. All databases, including template0 are in
> LATIN5 encoding. My postgresql.conf contains:
>
> LC_MESSAGES = 'tr_TR'
> LC_MONETARY = 'tr_TR'
> LC_NUMERIC = 'tr_TR'
> LC_TIME = 'tr_TR'
> datestyle = 'European'
>

Here's what I have:
$ pg_config --configure
'--with-tcl' '--enable-integer-datetimes' '--enable-debug' '--enable-cassert'
'--enable-nls' '--enable-depend'
'DOCBOOKSTYLE=/usr/share/sgml/docbook/dsssl-stylesheets-1.76'

testl5=# select * from pg_settings where name like 'lc%' or name = 'DateStyle';
     name     |            setting
-------------+-------------------------------
  DateStyle   | ISO with European conventions
  lc_messages | tr_TR
  lc_monetary | tr_TR
  lc_numeric  | tr_TR
  lc_time     | tr_TR
(5 rows)

testl5=# \l
          List of databases
     Name    |  Owner   | Encoding
------------+----------+-----------
  parts      | postgres | SQL_ASCII
  regression | postgres | SQL_ASCII
  template0  | postgres | SQL_ASCII
  template1  | postgres | SQL_ASCII
  test       | postgres | SQL_ASCII
  testjp     | postgres | EUC_JP
  testl5     | postgres | LATIN5
(7 rows)

testl5=#  copy foo(bsvr_no,yyn_tur,yyn_no,yyn_syf,yyn_trh) to stdout;
a       a       1.0     1.0     2002-11-30

Still works for me. Could it be related to --enable-integer-datetimes?

Joe



Re: Segmentation fault while COPY in 7.3

From
Tom Lane
Date:
Ohhh ...

Nicolai, are you running with a client encoding different from server
encoding?

I'm thinking that the real problem is that pg_server_to_client() can
return its input pointer as its result, meaning that CopyAttributeOut
might try to pfree() its input.

            regards, tom lane

Re: Segmentation fault while COPY in 7.3

From
"Nicolai Tufar"
Date:
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>


> Ohhh ...
>
> Nicolai, are you running with a client encoding different from server
> encoding?

Got it!
Gentlemen, thank you very much for assistance. The body of evidence was
slowly
growing, then, finaly Tom Lan's message have enlightened me.

It all started with initdb's warning that it can not set client's connection
to 'LATIN5'.
Okay, I said, maybe some system files are not installed. Then I tried to
create a database
with ENCODING='LATIN5'. I did not pay much attention either since my
template1 and
template0 are in LATIN5. Then on Tom's suggestion I tried to change client
encoding:

aaa=# \encoding
LATIN5
aaa=# \encoding SQL_ASCII
SQL_ASCII: invalid encoding name or conversion procedure not found
aaa=# \encoding LATIN5
LATIN5: invalid encoding name or conversion procedure not found
aaa=# \encoding SQL_ASCii
aaa=# \encoding
SQL_ASCII
aaa=# \encoding LATiN5
aaa=# \encoding
LATIN5
aaa=#

So it all falls back to Turkish "I" problem I mentioned earlier. To
perform COPY successfully I had to set client locale to 'LATiN5'
(note quotes and lower-case "i"). Any other combinations result
in segmentation fault.

Now, would it be right to change locale name handling to use ASCII-only
lower and upper-case conversions?

Many thanks to Tom Lane and Joe Conway.


I've got to get some sleep :-)
Regards,
Nic.




Re: Segmentation fault while COPY in 7.3

From
Tom Lane
Date:
"Nicolai Tufar" <ntufar@apb.com.tr> writes:
> From: "Tom Lane" <tgl@sss.pgh.pa.us>
>> Ohhh ...
>>
>> Nicolai, are you running with a client encoding different from server
>> encoding?

> Got it!

Okay, the problem is a double free of memory in COPY OUT.  I've applied
the following patch for 7.3.1.  (A better fix would be for COPY OUT to
run in a local memory context that it could reset every so often, like
once per row, and then we could remove these risky retail pfree's entirely.
I plan to do that instead in development sources.)

            regards, tom lane

*** src/backend/commands/copy.c.orig    Fri Oct 18 20:25:36 2002
--- src/backend/commands/copy.c    Sun Dec  1 12:28:24 2002
***************
*** 1470,1478 ****
      char       *string;
      char        c;
      char        delimc = delim[0];
-
      bool        same_encoding;
!     char       *string_start;
      int            mblen;
      int            i;

--- 1470,1477 ----
      char       *string;
      char        c;
      char        delimc = delim[0];
      bool        same_encoding;
!     char       *string_start = NULL;
      int            mblen;
      int            i;

***************
*** 1481,1492 ****
      {
          string = (char *) pg_server_to_client((unsigned char *) server_string,
                                                strlen(server_string));
!         string_start = string;
      }
      else
      {
          string = server_string;
-         string_start = NULL;
      }

      for (; (c = *string) != '\0'; string += mblen)
--- 1480,1491 ----
      {
          string = (char *) pg_server_to_client((unsigned char *) server_string,
                                                strlen(server_string));
!         if (string != server_string)
!             string_start = string;
      }
      else
      {
          string = server_string;
      }

      for (; (c = *string) != '\0'; string += mblen)

Re: Segmentation fault while COPY in 7.3

From
Medi Montaseri
Date:
Something is funny about this pfree()....here is a similar crash on a 7.2.3

(gdb)
#0  0x00420874 in heap_freetuple ()
#1  0x004a8390 in acquire_sample_rows ()
#2  0x004a75c8 in analyze_rel ()
#3  0x0049f690 in vacuum ()
#4  0x005585d8 in ProcessUtility ()
#5  0x00553c78 in pg_exec_query_string ()
#6  0x00555850 in PostgresMain ()
#7  0x00524660 in DoBackend ()
#8  0x00523d08 in BackendStartup ()
#9  0x00521c18 in ServerLoop ()
#10 0x005211c8 in PostmasterMain ()
#11 0x004df3d8 in main ()
#12 0x2ae34928 in __libc_start_main () from /lib/libc.so.6
(gdb)



Nicolai Tufar wrote:

>Help!
>Backend crashes on any COPY command I issue:
>PostgreSQL is 7.3 release, is compiled from sources.
>The message I get is:
>
>apb=# COPY maras2.mrk_yyn (bsvr_no, yyn_tur, yyn_no, yyn_syf, yyn_trh) TO
>stdout;
>server closed the connection unexpectedly
>        This probably means the server terminated abnormally
>        before or while processing the request.
>The connection to the server was lost. Attempting reset: Failed.
>!#
>
>
>The output of strace on backend is following:
>
># strace -p 18883
>recv(7, "QCOPY maras2.mrk_yyn (bsvr_no, y"..., 8192, 0) = 77
>gettimeofday({1038713111, 629131}, NULL) = 0
>_llseek(11, 0, [0], SEEK_SET)           = 0
>read(11, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\37\360\37\1 b1\5\0\1"..., 8192)
>= 8192
>read(11, "\0\0\0\0x#\375\"\20\0\0\0008\0h\35\360\37\1 \370\235\220"...,
>8192) = 8192
>_llseek(12, 0, [0], SEEK_SET)           = 0
>read(12, "\0\0\0\0\0#\375\"\20\0\0\0008\0@\34\0 \1 \200\237\0\1\0"..., 8192)
>= 8192
>_llseek(21, 40960, [40960], SEEK_SET)   = 0
>read(21, "\0\0\0\0\4\251\3#\20\0\0\0004\4p\17\360\37\1 \340\237 "..., 8192)
>= 8192
>_llseek(22, 106496, [106496], SEEK_SET) = 0
>read(22, "\0\0\0\0(\326\377\"\20\0\0\0\10\1\200\1\0 \1 \200\237\0"..., 8192)
>= 8192
>read(17, "\0\0\0\0\20\0\0\0\1\0\0\0\360\0\20\1\0 \1 p\237 \1\340"..., 8192)
>= 8192
>open("/data/pgsql/base/2015749/2015757", O_RDWR|O_LARGEFILE) = 39
>_llseek(39, 0, [20037632], SEEK_END)    = 0
>_llseek(39, 0, [0], SEEK_SET)           = 0
>read(39, "\0\0\0\0\270\7\4#\20\0\0\0\300\1\350\1\0 \1 \270\237\216"...,
>8192) = 8192
>--- SIGSEGV (Segmentation fault) ---
>+++ killed by SIGSEGV +++
>#
>
>I have recompiled postgres with debug information enabled, run gdb, attached
>to backend process
>and caught SIGSEGV as following:
>
>(gdb) continue
>Continuing.
>
>Program received signal SIGSEGV, Segmentation fault.
>0x081697bf in pfree (pointer=0x828a7b0) at mcxt.c:480
>480             (*header->context->methods->free_p) (header->context,
>pointer);
>(gdb)
>
>
>The output of bt is:
>
>(gdb) bt
>#0  0x081697bf in pfree (pointer=0x828a7b0) at mcxt.c:480
>#1  0x080b37be in CopyTo (rel=0x402be088, attnumlist=0x828a3f8, binary=0
>'\0', oids=0 '\0',
>    fp=0x0, delim=0x81c2a57 "\t", null_print=0x81b2c22 "\\N") at copy.c:671
>#2  0x080b32f6 in DoCopy (stmt=0x827b398) at copy.c:491
>#3  0x08118a6a in pg_exec_query_string (query_string=0x827af48, dest=Remote,
>    parse_context=0x8245378) at postgres.c:789
>#4  0x08119b89 in PostgresMain (argc=4, argv=0xbfffd2a0, username=0x8240a81
>"postgres")
>    at postgres.c:2016
>#5  0x08101d7c in DoBackend (port=0x8240950) at postmaster.c:2293
>#6  0x081016c2 in BackendStartup (port=0x8240950) at postmaster.c:1915
>#7  0x08100875 in ServerLoop () at postmaster.c:1000
>#8  0x08100326 in PostmasterMain (argc=1, argv=0x82270c0) at
>postmaster.c:779
>#9  0x080debab in main (argc=1, argv=0xbfffdc34) at main.c:210
>#10 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6
>(gdb)
>
>
>
>Thanks in advance,
>Nic
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
>




Re: Segmentation fault while COPY in 7.3

From
Bruce Momjian
Date:
Yes, we found a double pfree in 7.3.  There will be a fix in 7.3.1.

---------------------------------------------------------------------------

Medi Montaseri wrote:
> Something is funny about this pfree()....here is a similar crash on a 7.2.3
>
> (gdb)
> #0  0x00420874 in heap_freetuple ()
> #1  0x004a8390 in acquire_sample_rows ()
> #2  0x004a75c8 in analyze_rel ()
> #3  0x0049f690 in vacuum ()
> #4  0x005585d8 in ProcessUtility ()
> #5  0x00553c78 in pg_exec_query_string ()
> #6  0x00555850 in PostgresMain ()
> #7  0x00524660 in DoBackend ()
> #8  0x00523d08 in BackendStartup ()
> #9  0x00521c18 in ServerLoop ()
> #10 0x005211c8 in PostmasterMain ()
> #11 0x004df3d8 in main ()
> #12 0x2ae34928 in __libc_start_main () from /lib/libc.so.6
> (gdb)
>
>
>
> Nicolai Tufar wrote:
>
> >Help!
> >Backend crashes on any COPY command I issue:
> >PostgreSQL is 7.3 release, is compiled from sources.
> >The message I get is:
> >
> >apb=# COPY maras2.mrk_yyn (bsvr_no, yyn_tur, yyn_no, yyn_syf, yyn_trh) TO
> >stdout;
> >server closed the connection unexpectedly
> >        This probably means the server terminated abnormally
> >        before or while processing the request.
> >The connection to the server was lost. Attempting reset: Failed.
> >!#
> >
> >
> >The output of strace on backend is following:
> >
> ># strace -p 18883
> >recv(7, "QCOPY maras2.mrk_yyn (bsvr_no, y"..., 8192, 0) = 77
> >gettimeofday({1038713111, 629131}, NULL) = 0
> >_llseek(11, 0, [0], SEEK_SET)           = 0
> >read(11, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\37\360\37\1 b1\5\0\1"..., 8192)
> >= 8192
> >read(11, "\0\0\0\0x#\375\"\20\0\0\0008\0h\35\360\37\1 \370\235\220"...,
> >8192) = 8192
> >_llseek(12, 0, [0], SEEK_SET)           = 0
> >read(12, "\0\0\0\0\0#\375\"\20\0\0\0008\0@\34\0 \1 \200\237\0\1\0"..., 8192)
> >= 8192
> >_llseek(21, 40960, [40960], SEEK_SET)   = 0
> >read(21, "\0\0\0\0\4\251\3#\20\0\0\0004\4p\17\360\37\1 \340\237 "..., 8192)
> >= 8192
> >_llseek(22, 106496, [106496], SEEK_SET) = 0
> >read(22, "\0\0\0\0(\326\377\"\20\0\0\0\10\1\200\1\0 \1 \200\237\0"..., 8192)
> >= 8192
> >read(17, "\0\0\0\0\20\0\0\0\1\0\0\0\360\0\20\1\0 \1 p\237 \1\340"..., 8192)
> >= 8192
> >open("/data/pgsql/base/2015749/2015757", O_RDWR|O_LARGEFILE) = 39
> >_llseek(39, 0, [20037632], SEEK_END)    = 0
> >_llseek(39, 0, [0], SEEK_SET)           = 0
> >read(39, "\0\0\0\0\270\7\4#\20\0\0\0\300\1\350\1\0 \1 \270\237\216"...,
> >8192) = 8192
> >--- SIGSEGV (Segmentation fault) ---
> >+++ killed by SIGSEGV +++
> >#
> >
> >I have recompiled postgres with debug information enabled, run gdb, attached
> >to backend process
> >and caught SIGSEGV as following:
> >
> >(gdb) continue
> >Continuing.
> >
> >Program received signal SIGSEGV, Segmentation fault.
> >0x081697bf in pfree (pointer=0x828a7b0) at mcxt.c:480
> >480             (*header->context->methods->free_p) (header->context,
> >pointer);
> >(gdb)
> >
> >
> >The output of bt is:
> >
> >(gdb) bt
> >#0  0x081697bf in pfree (pointer=0x828a7b0) at mcxt.c:480
> >#1  0x080b37be in CopyTo (rel=0x402be088, attnumlist=0x828a3f8, binary=0
> >'\0', oids=0 '\0',
> >    fp=0x0, delim=0x81c2a57 "\t", null_print=0x81b2c22 "\\N") at copy.c:671
> >#2  0x080b32f6 in DoCopy (stmt=0x827b398) at copy.c:491
> >#3  0x08118a6a in pg_exec_query_string (query_string=0x827af48, dest=Remote,
> >    parse_context=0x8245378) at postgres.c:789
> >#4  0x08119b89 in PostgresMain (argc=4, argv=0xbfffd2a0, username=0x8240a81
> >"postgres")
> >    at postgres.c:2016
> >#5  0x08101d7c in DoBackend (port=0x8240950) at postmaster.c:2293
> >#6  0x081016c2 in BackendStartup (port=0x8240950) at postmaster.c:1915
> >#7  0x08100875 in ServerLoop () at postmaster.c:1000
> >#8  0x08100326 in PostmasterMain (argc=1, argv=0x82270c0) at
> >postmaster.c:779
> >#9  0x080debab in main (argc=1, argv=0xbfffdc34) at main.c:210
> >#10 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6
> >(gdb)
> >
> >
> >
> >Thanks in advance,
> >Nic
> >
> >
> >---------------------------(end of broadcast)---------------------------
> >TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> >
> >
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Segmentation fault while COPY in 7.3

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Yes, we found a double pfree in 7.3.  There will be a fix in 7.3.1.

The double pfree was in COPY, though.  This looks to be a different
issue.

>> (gdb)
>> #0  0x00420874 in heap_freetuple ()
>> #1  0x004a8390 in acquire_sample_rows ()
>> #2  0x004a75c8 in analyze_rel ()
>> #3  0x0049f690 in vacuum ()
>> #4  0x005585d8 in ProcessUtility ()
>> #5  0x00553c78 in pg_exec_query_string ()
>> #6  0x00555850 in PostgresMain ()

            regards, tom lane

Re: Segmentation fault while COPY in 7.3

From
Bruce Momjian
Date:
Ewe, yea.

---------------------------------------------------------------------------

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Yes, we found a double pfree in 7.3.  There will be a fix in 7.3.1.
>
> The double pfree was in COPY, though.  This looks to be a different
> issue.
>
> >> (gdb)
> >> #0  0x00420874 in heap_freetuple ()
> >> #1  0x004a8390 in acquire_sample_rows ()
> >> #2  0x004a75c8 in analyze_rel ()
> >> #3  0x0049f690 in vacuum ()
> >> #4  0x005585d8 in ProcessUtility ()
> >> #5  0x00553c78 in pg_exec_query_string ()
> >> #6  0x00555850 in PostgresMain ()
>
>             regards, tom lane
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Segmentation fault while COPY in 7.3

From
Medi Montaseri
Date:
Can I have a patch and would it work against 7.2.3.....

Thanks
Bruce Momjian wrote:

>Yes, we found a double pfree in 7.3.  There will be a fix in 7.3.1.
>
>---------------------------------------------------------------------------
>
>Medi Montaseri wrote:
>
>
>>Something is funny about this pfree()....here is a similar crash on a 7.2.3
>>
>>(gdb)
>>#0  0x00420874 in heap_freetuple ()
>>#1  0x004a8390 in acquire_sample_rows ()
>>#2  0x004a75c8 in analyze_rel ()
>>#3  0x0049f690 in vacuum ()
>>#4  0x005585d8 in ProcessUtility ()
>>#5  0x00553c78 in pg_exec_query_string ()
>>#6  0x00555850 in PostgresMain ()
>>#7  0x00524660 in DoBackend ()
>>#8  0x00523d08 in BackendStartup ()
>>#9  0x00521c18 in ServerLoop ()
>>#10 0x005211c8 in PostmasterMain ()
>>#11 0x004df3d8 in main ()
>>#12 0x2ae34928 in __libc_start_main () from /lib/libc.so.6
>>(gdb)
>>
>>
>>
>>Nicolai Tufar wrote:
>>
>>
>>
>>>Help!
>>>Backend crashes on any COPY command I issue:
>>>PostgreSQL is 7.3 release, is compiled from sources.
>>>The message I get is:
>>>
>>>apb=# COPY maras2.mrk_yyn (bsvr_no, yyn_tur, yyn_no, yyn_syf, yyn_trh) TO
>>>stdout;
>>>server closed the connection unexpectedly
>>>       This probably means the server terminated abnormally
>>>       before or while processing the request.
>>>The connection to the server was lost. Attempting reset: Failed.
>>>!#
>>>
>>>
>>>The output of strace on backend is following:
>>>
>>># strace -p 18883
>>>recv(7, "QCOPY maras2.mrk_yyn (bsvr_no, y"..., 8192, 0) = 77
>>>gettimeofday({1038713111, 629131}, NULL) = 0
>>>_llseek(11, 0, [0], SEEK_SET)           = 0
>>>read(11, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\37\360\37\1 b1\5\0\1"..., 8192)
>>>= 8192
>>>read(11, "\0\0\0\0x#\375\"\20\0\0\0008\0h\35\360\37\1 \370\235\220"...,
>>>8192) = 8192
>>>_llseek(12, 0, [0], SEEK_SET)           = 0
>>>read(12, "\0\0\0\0\0#\375\"\20\0\0\0008\0@\34\0 \1 \200\237\0\1\0"..., 8192)
>>>= 8192
>>>_llseek(21, 40960, [40960], SEEK_SET)   = 0
>>>read(21, "\0\0\0\0\4\251\3#\20\0\0\0004\4p\17\360\37\1 \340\237 "..., 8192)
>>>= 8192
>>>_llseek(22, 106496, [106496], SEEK_SET) = 0
>>>read(22, "\0\0\0\0(\326\377\"\20\0\0\0\10\1\200\1\0 \1 \200\237\0"..., 8192)
>>>= 8192
>>>read(17, "\0\0\0\0\20\0\0\0\1\0\0\0\360\0\20\1\0 \1 p\237 \1\340"..., 8192)
>>>= 8192
>>>open("/data/pgsql/base/2015749/2015757", O_RDWR|O_LARGEFILE) = 39
>>>_llseek(39, 0, [20037632], SEEK_END)    = 0
>>>_llseek(39, 0, [0], SEEK_SET)           = 0
>>>read(39, "\0\0\0\0\270\7\4#\20\0\0\0\300\1\350\1\0 \1 \270\237\216"...,
>>>8192) = 8192
>>>--- SIGSEGV (Segmentation fault) ---
>>>+++ killed by SIGSEGV +++
>>>#
>>>
>>>I have recompiled postgres with debug information enabled, run gdb, attached
>>>to backend process
>>>and caught SIGSEGV as following:
>>>
>>>(gdb) continue
>>>Continuing.
>>>
>>>Program received signal SIGSEGV, Segmentation fault.
>>>0x081697bf in pfree (pointer=0x828a7b0) at mcxt.c:480
>>>480             (*header->context->methods->free_p) (header->context,
>>>pointer);
>>>(gdb)
>>>
>>>
>>>The output of bt is:
>>>
>>>(gdb) bt
>>>#0  0x081697bf in pfree (pointer=0x828a7b0) at mcxt.c:480
>>>#1  0x080b37be in CopyTo (rel=0x402be088, attnumlist=0x828a3f8, binary=0
>>>'\0', oids=0 '\0',
>>>   fp=0x0, delim=0x81c2a57 "\t", null_print=0x81b2c22 "\\N") at copy.c:671
>>>#2  0x080b32f6 in DoCopy (stmt=0x827b398) at copy.c:491
>>>#3  0x08118a6a in pg_exec_query_string (query_string=0x827af48, dest=Remote,
>>>   parse_context=0x8245378) at postgres.c:789
>>>#4  0x08119b89 in PostgresMain (argc=4, argv=0xbfffd2a0, username=0x8240a81
>>>"postgres")
>>>   at postgres.c:2016
>>>#5  0x08101d7c in DoBackend (port=0x8240950) at postmaster.c:2293
>>>#6  0x081016c2 in BackendStartup (port=0x8240950) at postmaster.c:1915
>>>#7  0x08100875 in ServerLoop () at postmaster.c:1000
>>>#8  0x08100326 in PostmasterMain (argc=1, argv=0x82270c0) at
>>>postmaster.c:779
>>>#9  0x080debab in main (argc=1, argv=0xbfffdc34) at main.c:210
>>>#10 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6
>>>(gdb)
>>>
>>>
>>>
>>>Thanks in advance,
>>>Nic
>>>
>>>
>>>---------------------------(end of broadcast)---------------------------
>>>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>>>
>>>
>>>
>>>
>>
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 5: Have you checked our extensive FAQ?
>>
>>http://www.postgresql.org/users-lounge/docs/faq.html
>>
>>
>>
>
>
>




Re: Segmentation fault while COPY in 7.3

From
Medi Montaseri
Date:
FYI, I'm experiencing this with Async Queries (Async Purge and Aysnc
Vacuum).

Tom Lane wrote:

>Bruce Momjian <pgman@candle.pha.pa.us> writes:
>
>
>>Yes, we found a double pfree in 7.3.  There will be a fix in 7.3.1.
>>
>>
>
>The double pfree was in COPY, though.  This looks to be a different
>issue.
>
>
>
>>>(gdb)
>>>#0  0x00420874 in heap_freetuple ()
>>>#1  0x004a8390 in acquire_sample_rows ()
>>>#2  0x004a75c8 in analyze_rel ()
>>>#3  0x0049f690 in vacuum ()
>>>#4  0x005585d8 in ProcessUtility ()
>>>#5  0x00553c78 in pg_exec_query_string ()
>>>#6  0x00555850 in PostgresMain ()
>>>
>>>
>
>            regards, tom lane
>
>




Re: Segmentation fault while COPY in 7.3

From
Bruce Momjian
Date:
I was wrong.  We found a COPY bug in 7.3 that will be fixed in 7.3.1.
Would you compile with symbols, -g, and send a new backtrace?

---------------------------------------------------------------------------

Medi Montaseri wrote:
> FYI, I'm experiencing this with Async Queries (Async Purge and Aysnc
> Vacuum).
>
> Tom Lane wrote:
>
> >Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >
> >
> >>Yes, we found a double pfree in 7.3.  There will be a fix in 7.3.1.
> >>
> >>
> >
> >The double pfree was in COPY, though.  This looks to be a different
> >issue.
> >
> >
> >
> >>>(gdb)
> >>>#0  0x00420874 in heap_freetuple ()
> >>>#1  0x004a8390 in acquire_sample_rows ()
> >>>#2  0x004a75c8 in analyze_rel ()
> >>>#3  0x0049f690 in vacuum ()
> >>>#4  0x005585d8 in ProcessUtility ()
> >>>#5  0x00553c78 in pg_exec_query_string ()
> >>>#6  0x00555850 in PostgresMain ()
> >>>
> >>>
> >
> >            regards, tom lane
> >
> >
>
>
>
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Segmentation fault in 7.3

From
Frank van Vugt
Date:
I encountered a reproducable backend crash in the following situation:

- while running a conversion tool written in TrollTech's Qt that uses libpq

- apart from the tool, only one inactive psql-client is connected to the same
database (for extra monitoring)

- during the conversion session this tool creates a number of temporary tables

The backend crashes when at the end of the conversion a 'vacuum full analyse'
is started, more specifically it dies upon doing work on one of these
temporary tables....

Obviously, no problem is encountered when vacuuming *after* the tool has
finished.


Is this known behaviour?

Otherwise, I'd be glad to assist with a backtrace.



Regards,




Frank.


Re: Segmentation fault in 7.3

From
Tom Lane
Date:
Frank van Vugt <ftm.van.vugt@foxi.nl> writes:
> Is this known behaviour?

Nope.

> Otherwise, I'd be glad to assist with a backtrace.

Please.

            regards, tom lane

Re: Segmentation fault in 7.3 while vacuuming

From
Frank van Vugt
Date:
> Frank van Vugt <ftm.van.vugt@foxi.nl> writes:
> > Is this known behaviour?
>
TL> Nope.
>
> > Otherwise, I'd be glad to assist with a backtrace.
>
TL> Please.

** PostgreSQL v7.3 configured with:
./configure --prefix=/usr/src/postgresql_debug/pgsql --enable-debug
--with-openssl --with-perl --with-python --with-tcl --with-pgport=5433

** while using a client written in TrollTech's QT v3.1 that creates a number
of temporary tables, uses a number lot of 'copy' statements and at the end
tries to 'vacuum full analyse' the database

** the backend crashes.... ;(

** last information received by the frontend:
INFO:  --Relation pg_temp_2.t_article_details--
Warning: QSqlDatabaseManager unable to open database: FATAL:  The database
system is in recovery mode


** backend's log:
LOG:  query: update stock_pick set amount = case when pallet_amount >
amount_stock then amount_stock else pallet_amount end             from
t_stock where stock_pick.article_id = t_stock.id
LOG:  recycled transaction log file 0000000000000017
LOG:  query: update t_stock set amount_stock = case when pallet_amount >
amount_stock then 0 else amount_stock - pallet_amount end              where
exists (select 1 from stock_pick where article_id = t_stock.id)
LOG:  query: insert into stock_bulk select 5, id, amount_stock, 0, 1, false,
null from t_stock
LOG:  query: SELECT 1 FROM ONLY "public"."location" x WHERE "id" = $1 FOR
UPDATE OF x
LOG:  query: SELECT 1 FROM ONLY "public"."article" x WHERE "id" = $1 FOR
UPDATE OF x
LOG:  query: SELECT 1 FROM ONLY "public"."pallet_type" x WHERE "id" = $1 FOR
UPDATE OF x
LOG:  query: truncate table t_stock
LOG:  query: drop table aux_country, aux_language, aux_purchaser,
aux_salesperson, aux_valuta, aux_address
LOG:  query: drop index temp01, temp02, temp03
LOG:  query: vacuum full verbose analyse
LOG:  recycled transaction log file 000000000000001A
LOG:  recycled transaction log file 0000000000000018
LOG:  recycled transaction log file 0000000000000019
LOG:  server process (pid 32663) was terminated by signal 11
LOG:  terminating any other active server processes
WARNING:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
LOG:  connection received: host=192.168.0.79 port=33762
FATAL:  The database system is in recovery mode
LOG:  all server processes terminated; reinitializing shared memory and
semaphores
LOG:  database system was interrupted at 2002-12-05 12:22:50 CET
LOG:  checkpoint record is at 0/1DAEA234
LOG:  redo record is at 0/1D8D2C10; undo record is at 0/0; shutdown FALSE
LOG:  next transaction id: 5145; next oid: 451153
LOG:  database system was not properly shut down; automatic recovery in
progress
LOG:  redo starts at 0/1D8D2C10
LOG:  ReadRecord: record with zero length at 0/1DB74550
LOG:  redo done at 0/1DB7452C
LOG:  recycled transaction log file 000000000000001B
LOG:  recycled transaction log file 000000000000001C
LOG:  database system is ready


** backtrace of the coredump:
#0  _mdfd_getrelnfd (reln=0x0) at md.c:854
854             fd = RelationGetFile(reln);
(gdb) bt
#0  _mdfd_getrelnfd (reln=0x0) at md.c:854
#1  0x8123a0f in _mdfd_getseg (reln=0x0, blkno=13) at md.c:878
#2  0x812349d in mdwrite (reln=0x0, blocknum=13, buffer=0x848dde8 "") at
md.c:461
#3  0x8123e31 in smgrwrite (which=0, reln=0x0, blocknum=13, buffer=0x848dde8
"") at smgr.c:328
#4  0x811be2a in LocalBufferAlloc (reln=0x413979d0, blockNum=0,
foundPtr=0xbfffebb7 "¿Ìëÿ¿@§\021\bÐy9A") at localbuf.c:100
#5  0x811a7c2 in ReadBufferInternal (reln=0x413979d0, blockNum=0,
bufferLockHeld=0 '\000') at bufmgr.c:144
#6  0x811a740 in ReadBuffer (reln=0x413979d0, blockNum=0) at bufmgr.c:112
#7  0x80d7d19 in scan_heap (vacrelstats=0x8389430, onerel=0x413979d0,
vacuum_pages=0xbfffed50, fraged_pages=0xbfffed40) at vacuum.c:1069
#8  0x80d7a79 in full_vacuum_rel (onerel=0x413979d0, vacstmt=0x8313858) at
vacuum.c:910
#9  0x80d799e in vacuum_rel (relid=374498, vacstmt=0x8313858,
expected_relkind=114 'r') at vacuum.c:827
#10 0x80d73a3 in vacuum (vacstmt=0x8313858) at vacuum.c:290
#11 0x8126ff6 in ProcessUtility (parsetree=0x8313858, dest=Remote,
completionTag=0xbfffee30 "") at utility.c:713
#12 0x8124dcc in pg_exec_query_string (query_string=0x8313688, dest=Remote,
parse_context=0x82ddb70) at postgres.c:789
#13 0x8125df7 in PostgresMain (argc=4, argv=0xbffff058, username=0x82d4a81
"postgres") at postgres.c:2016
#14 0x810eab9 in DoBackend (port=0x82d4950) at postmaster.c:2293
#15 0x810e467 in BackendStartup (port=0x82d4950) at postmaster.c:1915
#16 0x810d7d4 in ServerLoop () at postmaster.c:1018
#17 0x810d331 in PostmasterMain (argc=4, argv=0x82bce00) at postmaster.c:779
#18 0x80ee909 in main (argc=4, argv=0xbffff974) at main.c:210
#19 0x400d4577 in __libc_start_main () from /lib/libc.so.6
(gdb)



If you need more details on machine/tables/etc. just let me know.
 -> the backend doesn't crash if the vacuum is done through psql AFTER the
clients work is done (and the temporary tables are automatically destroyed)











Regards,






Frank.

Re: Segmentation fault in 7.3 while vacuuming

From
Tom Lane
Date:
Frank van Vugt <ftm.van.vugt@foxi.nl> writes:
> ** while using a client written in TrollTech's QT v3.1 that creates a number
> of temporary tables, uses a number lot of 'copy' statements and at the end
> tries to 'vacuum full analyse' the database

Drat --- looks like the local buffer manager code has gotten out of sync
with the relcache code.  In particular I note the following in localbuf.c

        Relation    bufrel = RelationNodeCacheGetRelation(bufHdr->tag.rnode);

        /*
         * The relcache is not supposed to throw away temp rels, so this
         * should always succeed.
         */
        Assert(bufrel != NULL);

which was true at the time it was written, but is so no longer :-(

Will fix.

            regards, tom lane

Re: Segmentation fault in 7.3 while vacuuming

From
Tom Lane
Date:
I said:
> Frank van Vugt <ftm.van.vugt@foxi.nl> writes:
>> ** while using a client written in TrollTech's QT v3.1 that creates a number
>> of temporary tables, uses a number lot of 'copy' statements and at the end
>> tries to 'vacuum full analyse' the database

> Drat --- looks like the local buffer manager code has gotten out of sync
> with the relcache code.

Here is the fix if you need to patch this locally.

            regards, tom lane


*** src/backend/storage/buffer/localbuf.c.orig    Wed Sep  4 16:31:25 2002
--- src/backend/storage/buffer/localbuf.c    Thu Dec  5 17:48:10 2002
***************
*** 90,108 ****
      {
          Relation    bufrel = RelationNodeCacheGetRelation(bufHdr->tag.rnode);

-         /*
-          * The relcache is not supposed to throw away temp rels, so this
-          * should always succeed.
-          */
-         Assert(bufrel != NULL);
-
          /* flush this page */
!         smgrwrite(DEFAULT_SMGR, bufrel, bufHdr->tag.blockNum,
!                   (char *) MAKE_PTR(bufHdr->data));
!         LocalBufferFlushCount++;

!         /* drop refcount incremented by RelationNodeCacheGetRelation */
!         RelationDecrementReferenceCount(bufrel);
      }

      /*
--- 90,113 ----
      {
          Relation    bufrel = RelationNodeCacheGetRelation(bufHdr->tag.rnode);

          /* flush this page */
!         if (bufrel == (Relation) NULL)
!         {
!             smgrblindwrt(DEFAULT_SMGR,
!                          bufHdr->tag.rnode,
!                          bufHdr->tag.blockNum,
!                          (char *) MAKE_PTR(bufHdr->data));
!         }
!         else
!         {
!             smgrwrite(DEFAULT_SMGR, bufrel,
!                       bufHdr->tag.blockNum,
!                       (char *) MAKE_PTR(bufHdr->data));
!             /* drop refcount incremented by RelationNodeCacheGetRelation */
!             RelationDecrementReferenceCount(bufrel);
!         }

!         LocalBufferFlushCount++;
      }

      /*