Thread: SQL command to dump the contents of table failed: PQendcopy() failed. Error message from server: socket not open

Hi:

I run a cron job every day to dump all the tables in my 7.4 Postgres
database. For one of the tables (sample) the command returns this error:

-bash-2.05b$ /usr/local/pgsql/bin/pg_dump -Fc -t sample heos -f
/home/bu/5/sample.dump
pg_dump: socket not open
pg_dump: SQL command to dump the contents of table "sample" failed:
PQendcopy() failed.
pg_dump: Error message from server: socket not open
pg_dump: The command was: COPY public.sample (field1, field2, field3,
field4, field5) TO stdout;

This is what the log shows:

-bash-2.05b$ more /usr/local/pgsql/logfile
LOG:  database system was shut down at 2005-09-23 13:43:16 CEST
LOG:  checkpoint record is at 122/E90B52AC
LOG:  redo record is at 122/E90B52AC; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 2227508; next OID: 30689327
LOG:  database system is ready
LOG:  server process (PID 27688) was terminated by signal 11
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted at 2005-09-23 13:54:37 CEST
LOG:  checkpoint record is at 122/E90C3D38
LOG:  redo record is at 122/E90C3D38; undo record is at 0/0; shutdown FALSE
LOG:  next transaction ID: 2227617; next OID: 30697519
LOG:  database system was not properly shut down; automatic recovery in
progress
LOG:  record with zero length at 122/E90C3D78
LOG:  redo is not required
LOG:  database system is ready


This happens every day consistently. The table is quite big, about 1
million tuples and it is vacuumed.

Any help would be appreciated.

Cheers, Ruben.

On Fri, Sep 23, 2005 at 01:34:18PM +0200, ruben wrote:
>
> LOG:  server process (PID 27688) was terminated by signal 11

This suggests a bug in the backend.  There should be a core dump
somewhere under $PGDATA (unless resource limits prevent it or your
system is configured to put core dumps elsewhere) -- can you use a
debugger to get a stack trace from it?  What exact version of
PostgreSQL are you running and on what operating system?  Do you
have any non-standard extensions to PostgreSQL (custom types,
third-party modules, etc.)?

--
Michael Fuhr

Re: SQL command to dump the contents of table failed: PQendcopy()

From
ruben
Date:
Hi Michael:

The operating system is Red Hat Linux release 8.0 (Psyche) and
PostgreSQL version is 7.4.6., without non-standard extensions.

I cannot find any core dump in the PGDATA directory /usr/local/pgsql (I
don't know how to debug it to get a stack trace, I'll find out).

Thanks, Ruben.




Michael Fuhr wrote:

> On Fri, Sep 23, 2005 at 01:34:18PM +0200, ruben wrote:
>
>>LOG:  server process (PID 27688) was terminated by signal 11
>
>
> This suggests a bug in the backend.  There should be a core dump
> somewhere under $PGDATA (unless resource limits prevent it or your
> system is configured to put core dumps elsewhere) -- can you use a
> debugger to get a stack trace from it?  What exact version of
> PostgreSQL are you running and on what operating system?  Do you
> have any non-standard extensions to PostgreSQL (custom types,
> third-party modules, etc.)?
>

Re: SQL command to dump the contents of table failed: PQendcopy()

From
Tom Lane
Date:
ruben <ruben20@superguai.com> writes:
> I cannot find any core dump in the PGDATA directory /usr/local/pgsql (I
> don't know how to debug it to get a stack trace, I'll find out).

It would normally be in the per-database subdirectory
($PGDATA/base/NNN/) for the database where the problem occurs.  However,
if you don't see a core file there either, that probably means the
postmaster was started under "ulimit -c 0" to prevent core dumps.  Add
"ulimit -c unlimited" to the postmaster start script and restart it.

(For the sake of the archives, I'll mention that as of PG 8.1 core dumps
will appear directly in $PGDATA, not in its subdirectories.)

            regards, tom lane

On Fri, Sep 23, 2005 at 05:11:19PM +0200, ruben wrote:
> The operating system is Red Hat Linux release 8.0 (Psyche) and
> PostgreSQL version is 7.4.6., without non-standard extensions.
>
> I cannot find any core dump in the PGDATA directory /usr/local/pgsql (I
> don't know how to debug it to get a stack trace, I'll find out).

Did you look everywhere under $PGDATA or just in that directory?
As I recall, released versions of PostgreSQL usually dump core under
$PGDATA/base/<database oid>.  However, it's also possible that your
coredumpsize resource limit prevents core dumps; you could fix that
by putting a command like "ulimit -c unlimited" in your PostgreSQL
startup script and then stopping and restarting PostgreSQL.

Once you have a core dump, you can get a stack trace with gdb:

$ gdb /path/to/postgres /path/to/core
...
(gdb) bt

If your postgres binary was built with debugging symbols then the
stack trace should show function names, file names, and line numbers.

Can you duplicate the backend crash from psql if you issue the COPY
command that pg_dump complained about?  What about if you issue a
SELECT for all records in the table?  What does "\d tablename" show
for the table in question?

--
Michael Fuhr

Re: SQL command to dump the contents of table failed: PQendcopy()

From
ruben
Date:
Thanks Tom and Michael:

Michael Fuhr wrote:

> On Fri, Sep 23, 2005 at 05:11:19PM +0200, ruben wrote:
>
>>The operating system is Red Hat Linux release 8.0 (Psyche) and
>>PostgreSQL version is 7.4.6., without non-standard extensions.
>>
>>I cannot find any core dump in the PGDATA directory /usr/local/pgsql (I
>>don't know how to debug it to get a stack trace, I'll find out).
>
>
> Did you look everywhere under $PGDATA or just in that directory?
> As I recall, released versions of PostgreSQL usually dump core under
> $PGDATA/base/<database oid>.  However, it's also possible that your
> coredumpsize resource limit prevents core dumps; you could fix that
> by putting a command like "ulimit -c unlimited" in your PostgreSQL
> startup script and then stopping and restarting PostgreSQL.
>
> Once you have a core dump, you can get a stack trace with gdb:
>
> $ gdb /path/to/postgres /path/to/core
> ...
> (gdb) bt
>
> If your postgres binary was built with debugging symbols then the
> stack trace should show function names, file names, and line numbers.

I cannot find a core file, ulimit is set to unlimit. I guess I'm doing
something wrong:

-bash-2.05b$ ulimit
unlimited
-bash-2.05b$ find /usr/local/pgsql -name '*core*' -print
-bash-2.05b$


> Can you duplicate the backend crash from psql if you issue the COPY
> command that pg_dump complained about?

-bash-2.05b$ /usr/local/pgsql/bin/pg_dump -Fc -t llamadas heos -f
/home/buheos/5/llamadas3.dump
pg_dump: socket not open
pg_dump: SQL command to dump the contents of table "llamadas" failed:
PQendcopy() failed.
pg_dump: Error message from server: socket not open
pg_dump: The command was: COPY public.llamadas (cod_empresa,
fecha_llamada, tfno_origen, tfno_destino, duracion_llamada,
hora_llamada, cod_destino_llamada, cod_pais_destino,
cod_destino_internacional, franja_horaria, importe, cod_fuente,
precio_coste_llamada, observaciones_llamada, coment_llamada,
fecha_factura, num_factura, fecha_alta, fecha_ult_mod, fecha_sis_alta,
usuario_alta, i_a_alta, fecha_sis_ult_mod, usuario_ult_mod, i_a_ult_mod,
periodicidad_facturacion, cod_operador, franja_horaria_operador,
fichero_origen, cod_destino_internacional_operador) TO stdout;


> What about if you issue a SELECT for all records in the table?


heos=# select * from llamadas;
server closed the connection unexpectedly
         This probably means the server terminated abnormally
         before or while processing the request.
The connection to the server was lost. Attempting reset: WARNING:
terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
Failed.

In the logfile:

LOG:  server process (PID 7069) was terminated by signal 11
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted at 2005-09-26 16:49:43 CEST
LOG:  checkpoint record is at 125/858E0144
LOG:  redo record is at 125/858E0144; undo record is at 0/0; shutdown FALSE
LOG:  next transaction ID: 2270061; next OID: 30820346
LOG:  database system was not properly shut down; automatic recovery in
progress
LOG:  record with zero length at 125/858E0184
LOG:  redo is not required
LOG:  database system is ready
LOG:  unexpected EOF on client connection



 > What does "\d tablename" show for the table in question?

heos=# \d llamadas;
                                               Table "public.llamadas"
                Column               |           Type           |
               Modifiers
------------------------------------+--------------------------+----------------------------------------------------
  cod_empresa                        | smallint                 | not null
  fecha_llamada                      | date                     |
  tfno_origen                        | character(15)            |
  tfno_destino                       | character(15)            |
  duracion_llamada                   | integer                  |
  hora_llamada                       | time without time zone   |
default ('now'::text)::time(6) with time zone
  cod_destino_llamada                | character(1)             |
  cod_pais_destino                   | integer                  |
  cod_destino_internacional          | character(15)            |
  franja_horaria                     | character(1)             |
  importe                            | real                     |
  cod_fuente                         | integer                  |
  precio_coste_llamada               | real                     |
  observaciones_llamada              | character varying(100)   |
  coment_llamada                     | character varying(100)   |
  fecha_factura                      | date                     |
  num_factura                        | integer                  |
  fecha_alta                         | date                     |
  fecha_ult_mod                      | date                     |
  fecha_sis_alta                     | timestamp with time zone |
default ('now'::text)::timestamp(6) with time zone
  usuario_alta                       | character(10)            |
  i_a_alta                           | character(15)            |
  fecha_sis_ult_mod                  | timestamp with time zone |
  usuario_ult_mod                    | character(10)            |
  i_a_ult_mod                        | character(15)            |
  periodicidad_facturacion           | character(1)             |
  cod_operador                       | character(2)             |
  franja_horaria_operador            | character(1)             |
  fichero_origen                     | character varying(100)   |
  cod_destino_internacional_operador | character(15)            |
Indexes:
     "llamadas_i01" btree (cod_empresa, fecha_llamada, tfno_origen)




Re: SQL command to dump the contents of table failed: PQendcopy()

From
Tom Lane
Date:
ruben <ruben20@superguai.com> writes:
>> As I recall, released versions of PostgreSQL usually dump core under
>> $PGDATA/base/<database oid>.  However, it's also possible that your
>> coredumpsize resource limit prevents core dumps; you could fix that
>> by putting a command like "ulimit -c unlimited" in your PostgreSQL
>> startup script and then stopping and restarting PostgreSQL.

> I cannot find a core file, ulimit is set to unlimit. I guess I'm doing
> something wrong:

> -bash-2.05b$ ulimit
> unlimited
> -bash-2.05b$ find /usr/local/pgsql -name '*core*' -print

(1) The fact that it's unlimited in your user environment doesn't prove
that it's unlimited in the environment the postmaster is started in.

(2) I forget which constraint ulimit-with-no-argument prints, but it's
not core file size.  (Try "ulimit -a")

Please actually follow the advice given to you above.

            regards, tom lane

Re: SQL command to dump the contents of table failed: PQendcopy()

From
Peter Wiersig
Date:
On Mon, Sep 26, 2005 at 07:03:06PM +0200, ruben wrote:
>
> I guess I'm doing something wrong:
>
> -bash-2.05b$ ulimit
> unlimited

Please read manpages, in this case bash: ulimit -a

Peter

Re: SQL command to dump the contents of table failed: PQendcopy()

From
ruben
Date:
Hi Tom:

No way to get the core dump, this is what I did:

1) Add "ulimit" to /etc/rc.d/init.d/postgresql just before pg_ctl start
in the "start" section of the script:

    ...
    ulimit -c unlimited
    su -l postgres -s /bin/sh -c "/usr/local/pgsql/bin/pg_ctl -D $PGDATA
-p /usr/local/pgsql/bin/postmaster -l $DIRLOG/logfile start > /dev/null
2>&1" < /dev/null
    ...

I tried "man ulimit" but it didn't work, "man bash" returns the command
help.

2) Restart postmaster

    /etc/rc.d/init.d/postgresql restart

3) Run the query that produces the crash.

4) Search core file:

    -bash-2.05b$ find /usr/local/pgsql/ -name '*core*' -print
    -bash-2.05b$

    Nothing comes up.


Thanks for your help.
Ruben.



Tom Lane wrote:

> ruben <ruben20@superguai.com> writes:
>
>>>As I recall, released versions of PostgreSQL usually dump core under
>>>$PGDATA/base/<database oid>.  However, it's also possible that your
>>>coredumpsize resource limit prevents core dumps; you could fix that
>>>by putting a command like "ulimit -c unlimited" in your PostgreSQL
>>>startup script and then stopping and restarting PostgreSQL.
>
>
>>I cannot find a core file, ulimit is set to unlimit. I guess I'm doing
>>something wrong:
>
>
>>-bash-2.05b$ ulimit
>>unlimited
>>-bash-2.05b$ find /usr/local/pgsql -name '*core*' -print
>
>
> (1) The fact that it's unlimited in your user environment doesn't prove
> that it's unlimited in the environment the postmaster is started in.
>
> (2) I forget which constraint ulimit-with-no-argument prints, but it's
> not core file size.  (Try "ulimit -a")
>
> Please actually follow the advice given to you above.
>
>             regards, tom lane
>


Re: SQL command to dump the contents of table failed: PQendcopy()

From
Martijn van Oosterhout
Date:
On Wed, Sep 28, 2005 at 11:40:11AM +0200, ruben wrote:
> Hi Tom:
>
> No way to get the core dump, this is what I did:
>
> 1) Add "ulimit" to /etc/rc.d/init.d/postgresql just before pg_ctl start
> in the "start" section of the script:

<snip>

> 2) Restart postmaster
>
>    /etc/rc.d/init.d/postgresql restart

Does that run the start section? Maybe you need to stop/start. Maybe
the bash_profile/bashrc for the postgres user resets the core limit.

> 4) Search core file:
>
>    -bash-2.05b$ find /usr/local/pgsql/ -name '*core*' -print
>    -bash-2.05b$

It'll be under $PGDATA, where is that? It may be under
/var/lib/postgres, depending on how you installed...

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: SQL command to dump the contents of table failed: PQendcopy()

From
ruben
Date:
Thanks Martijn:

Martijn van Oosterhout wrote:

> On Wed, Sep 28, 2005 at 11:40:11AM +0200, ruben wrote:
>
>>Hi Tom:
>>
>>No way to get the core dump, this is what I did:
>>
>>1) Add "ulimit" to /etc/rc.d/init.d/postgresql just before pg_ctl start
>>in the "start" section of the script:
>
>
> <snip>
>
>>2) Restart postmaster
>>
>>   /etc/rc.d/init.d/postgresql restart
>
>
> Does that run the start section? Maybe you need to stop/start. Maybe
> the bash_profile/bashrc for the postgres user resets the core limit.

The server has been rebooted.

This is the postgres start script ($PGDATA is /usr/local/pgsql/):
   http://80.33.3.245/temp/postgres.txt

Regading the core limit:

-bash-2.05b$ whoami
postgres

-bash-2.05b$ cat ~/.bash_profile
PGDATA=/var/lib/pgsql/data
[ -f $PGDATA/../initdb.i18n ] &&  source $PGDATA/../initdb.i18n
export PGDATA

-bash-2.05b$ ulimit -a
core file size        (blocks, -c) unlimited
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) unlimited
max memory size       (kbytes, -m) unlimited
open files                    (-n) 1024
pipe size          (512 bytes, -p) 8
stack size            (kbytes, -s) 8192
cpu time             (seconds, -t) unlimited
max user processes            (-u) 7168
virtual memory        (kbytes, -v) unlimited

>>4) Search core file:
>>
>>   -bash-2.05b$ find /usr/local/pgsql/ -name '*core*' -print
>>   -bash-2.05b$
>
>
> It'll be under $PGDATA, where is that? It may be under
> /var/lib/postgres, depending on how you installed...

Indeed I searched the whole file structure for core files.


Regards.