Thread: postgresql 8 abort with signal 10

postgresql 8 abort with signal 10

From
Alexandre Biancalana
Date:
Hi list,

 I'm running PostgreSQL 8.0.1 on FreeBSD 4.11-STABLE; the machine is
an AMD Sempron 2.2 with 1 GB of RAM.

 I use PostgreSQL as the database for dspam, a spam classification
program. The database sees moderate use: on average 10 simultaneous
connections executing relatively big queries that use "IN" clauses.

Watching the PostgreSQL logs, I see the following messages occur many
times a day:

May  3 06:58:44 e-filter postgres[250]: [21-1] LOG:  server process
(PID 59608) was terminated by signal 10
May  3 06:58:44 e-filter postgres[250]: [22-1] LOG:  terminating any
other active server processes
May  3 06:58:44 e-filter postgres[59605]: [21-1] WARNING:  terminating
connection because of crash of another server process
May  3 06:58:44 e-filter postgres[59605]: [21-2] DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server
May  3 06:58:44 e-filter postgres[59605]: [21-3]  process exited
abnormally and possibly corrupted shared memory.
May  3 06:58:44 e-filter postgres[59605]: [21-4] HINT:  In a moment
you should be able to reconnect to the database and repeat your
command.
May  3 06:58:44 e-filter postgres[59607]: [21-1] WARNING:  terminating
connection because of crash of another server process
May  3 06:58:44 e-filter postgres[59607]: [21-2] DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server
May  3 06:58:44 e-filter postgres[59607]: [21-3]  process exited
abnormally and possibly corrupted shared memory.
May  3 06:58:44 e-filter postgres[59607]: [21-4] HINT:  In a moment
you should be able to reconnect to the database and repeat your
command.
May  3 06:58:44 e-filter postgres[59606]: [21-1] WARNING:  terminating
connection because of crash of another server process
May  3 06:58:44 e-filter postgres[59606]: [21-2] DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server
May  3 06:58:44 e-filter postgres[59606]: [21-3]  process exited
abnormally and possibly corrupted shared memory.
May  3 06:58:44 e-filter postgres[59606]: [21-4] HINT:  In a moment
you should be able to reconnect to the database and repeat your
command.
May  3 06:58:44 e-filter postgres[59626]: [21-1] WARNING:  terminating
connection because of crash of another server process
May  3 06:58:44 e-filter postgres[59626]: [21-2] DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server
May  3 06:58:44 e-filter postgres[59626]: [21-3]  process exited
abnormally and possibly corrupted shared memory.
May  3 06:58:44 e-filter postgres[59626]: [21-4] HINT:  In a moment
you should be able to reconnect to the database and repeat your
command.
May  3 06:58:44 e-filter postgres[59628]: [21-1] WARNING:  terminating
connection because of crash of another server process
May  3 06:58:44 e-filter postgres[59629]: [21-1] WARNING:  terminating
connection because of crash of another server process
May  3 06:58:44 e-filter postgres[59629]: [21-2] DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server
May  3 06:58:44 e-filter postgres[59629]: [21-3]  process exited
abnormally and possibly corrupted shared memory.
May  3 06:58:44 e-filter postgres[59629]: [21-4] HINT:  In a moment
you should be able to reconnect to the database and repeat your
command.
May  3 06:58:44 e-filter postgres[59628]: [21-2] DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server
May  3 06:58:44 e-filter postgres[59628]: [21-3]  process exited
abnormally and possibly corrupted shared memory.
May  3 06:58:44 e-filter postgres[59628]: [21-4] HINT:  In a moment
you should be able to reconnect to the database and repeat your
command.
May  3 06:58:44 e-filter postgres[59609]: [21-1] WARNING:  terminating
connection because of crash of another server process
May  3 06:58:44 e-filter postgres[59609]: [21-2] DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server
May  3 06:58:44 e-filter postgres[59609]: [21-3]  process exited
abnormally and possibly corrupted shared memory.
May  3 06:58:44 e-filter postgres[59609]: [21-4] HINT:  In a moment
you should be able to reconnect to the database and repeat your
command.
May  3 06:58:44 e-filter postgres[59627]: [21-1] WARNING:  terminating
connection because of crash of another server process
May  3 06:58:44 e-filter postgres[59627]: [21-2] DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server
May  3 06:58:44 e-filter postgres[59627]: [21-3]  process exited
abnormally and possibly corrupted shared memory.
May  3 06:58:44 e-filter postgres[59627]: [21-4] HINT:  In a moment
you should be able to reconnect to the database and repeat your
command.
May  3 06:58:45 e-filter postgres[69093]: [23-1] WARNING:  terminating
connection because of crash of another server process
May  3 06:58:45 e-filter postgres[69093]: [23-2] DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server
May  3 06:58:45 e-filter postgres[69093]: [23-3]  process exited
abnormally and possibly corrupted shared memory.
May  3 06:58:45 e-filter postgres[69093]: [23-4] HINT:  In a moment
you should be able to reconnect to the database and repeat your
command.
May  3 06:58:45 e-filter postgres[59620]: [21-1] WARNING:  terminating
connection because of crash of another server process
May  3 06:58:46 e-filter postgres[59620]: [21-2] DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server
May  3 06:58:46 e-filter postgres[59620]: [21-3]  process exited
abnormally and possibly corrupted shared memory.
May  3 06:58:46 e-filter postgres[59620]: [21-4] HINT:  In a moment
you should be able to reconnect to the database and repeat your
command.
May  3 06:58:46 e-filter postgres[59619]: [21-1] WARNING:  terminating
connection because of crash of another server process
May  3 06:58:46 e-filter postgres[59619]: [21-2] DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server
May  3 06:58:46 e-filter postgres[59619]: [21-3]  process exited
abnormally and possibly corrupted shared memory.
May  3 06:58:46 e-filter postgres[59619]: [21-4] HINT:  In a moment
you should be able to reconnect to the database and repeat your
command.
May  3 06:58:46 e-filter postgres[59624]: [21-1] WARNING:  terminating
connection because of crash of another server process
May  3 06:58:46 e-filter postgres[59624]: [21-2] DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server
May  3 06:58:46 e-filter postgres[59624]: [21-3]  process exited
abnormally and possibly corrupted shared memory.
May  3 06:58:46 e-filter postgres[59624]: [21-4] HINT:  In a moment
you should be able to reconnect to the database and repeat your
command.
May  3 06:58:46 e-filter postgres[59623]: [21-1] WARNING:  terminating
connection because of crash of another server process
May  3 06:58:46 e-filter postgres[59623]: [21-2] DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server
May  3 06:58:46 e-filter postgres[59623]: [21-3]  process exited
abnormally and possibly corrupted shared memory.
May  3 06:58:46 e-filter postgres[59623]: [21-4] HINT:  In a moment
you should be able to reconnect to the database and repeat your
command.
May  3 06:58:46 e-filter postgres[59625]: [21-1] WARNING:  terminating
connection because of crash of another server process
May  3 06:58:46 e-filter postgres[59625]: [21-2] DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server
May  3 06:58:46 e-filter postgres[59625]: [21-3]  process exited
abnormally and possibly corrupted shared memory.
May  3 06:58:46 e-filter postgres[59625]: [21-4] HINT:  In a moment
you should be able to reconnect to the database and repeat your
command.
May  3 06:58:46 e-filter postgres[59622]: [21-1] WARNING:  terminating
connection because of crash of another server process
May  3 06:58:46 e-filter postgres[59622]: [21-2] DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server
May  3 06:58:46 e-filter postgres[59622]: [21-3]  process exited
abnormally and possibly corrupted shared memory.
May  3 06:58:46 e-filter postgres[59622]: [21-4] HINT:  In a moment
you should be able to reconnect to the database and repeat your
command.
May  3 06:58:46 e-filter postgres[59621]: [22-1] WARNING:  terminating
connection because of crash of another server process
May  3 06:58:49 e-filter postgres[250]: [23-1] LOG:  all server
processes terminated; reinitializing
May  3 06:58:51 e-filter postgres[13478]: [24-1] LOG:  database system
was interrupted at 2005-05-03 06:58:16 EST
May  3 06:58:51 e-filter postgres[13478]: [25-1] LOG:  checkpoint
record is at 14/99F69378
May  3 06:58:51 e-filter postgres[13478]: [26-1] LOG:  redo record is
at 14/99F69378; undo record is at 0/0; shutdown FALSE
May  3 06:58:51 e-filter postgres[13478]: [27-1] LOG:  next
transaction ID: 3639687; next OID: 388415
May  3 06:58:51 e-filter postgres[13478]: [28-1] LOG:  database system
was not properly shut down; automatic recovery in progress
May  3 06:58:51 e-filter postgres[13478]: [29-1] LOG:  redo starts at
14/99F693B4
May  3 06:58:53 e-filter postgres[13478]: [30-1] LOG:  record with
zero length at 14/9AE223F0
May  3 06:58:53 e-filter postgres[13478]: [31-1] LOG:  redo done at 14/9AE223C8
May  3 06:58:54 e-filter postgres[13484]: [24-1] FATAL:  the database
system is starting up
May  3 06:58:54 e-filter postgres[13485]: [24-1] FATAL:  the database
system is starting up
May  3 06:58:55 e-filter postgres[13488]: [24-1] FATAL:  the database
system is starting up
May  3 06:58:57 e-filter postgres[13478]: [32-1] LOG:  database system is ready


and some time later it occurs again:
May  3 09:59:38 e-filter postgres[250]: [24-1] LOG:  server process
(PID 34743) was terminated by signal 10
May  3 09:59:38 e-filter postgres[250]: [25-1] LOG:  terminating any
other active server processes
May  3 09:59:38 e-filter postgres[35215]: [24-1] WARNING:  terminating
connection because of crash of another server process
May  3 09:59:38 e-filter postgres[35215]: [24-2] DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server
May  3 09:59:38 e-filter postgres[35215]: [24-3]  process exited
abnormally and possibly corrupted shared memory.
May  3 09:59:38 e-filter postgres[35215]: [24-4] HINT:  In a moment
you should be able to reconnect to the database and repeat your
command.
May  3 09:59:38 e-filter postgres[34744]: [24-1] WARNING:  terminating
connection because of crash of another server process
May  3 09:59:38 e-filter postgres[34744]: [24-2] DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server
May  3 09:59:38 e-filter postgres[33592]: [24-1] WARNING:  terminating
connection because of crash of another server process
May  3 09:59:38 e-filter postgres[34744]: [24-3]  process exited
abnormally and possibly corrupted shared memory.


This is my postgresql.conf

max_connections = 70
superuser_reserved_connections = 2
shared_buffers = 81920
work_mem = 10240
maintenance_work_mem = 51200
fsync = true
checkpoint_segments = 8
effective_cache_size = 100000
log_destination = 'syslog'
silent_mode = true
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'


and the shared memory configuration:

kern.ipc.shmmax: 700000000
kern.ipc.shmmin: 1
kern.ipc.shmmni: 192
kern.ipc.shmseg: 256
kern.ipc.shmall: 700000000


Do I have a configuration error that could cause this kind of problem?

Any ideas? Any thoughts?

Best Regards,
Alexandre

Re: postgresql 8 abort with signal 10

From
Tom Lane
Date:
Alexandre Biancalana <biancalana@gmail.com> writes:
> Watching the PostgreSQL logs, I see the following messages occur many
> times a day:

> May  3 06:58:44 e-filter postgres[250]: [21-1] LOG:  server process
> (PID 59608) was terminated by signal 10

You need to find out what's triggering that.  Turning on query logging
would be a good way of investigating.

            regards, tom lane
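
A first step toward finding what triggers the crash is to list which
backend PIDs died and when, so that they can later be matched against
whatever query logging produces for the same PID. The following is a
minimal Python sketch, not part of the original thread. It assumes the
syslog format shown in the excerpt above, including the hard line wraps
added by the mail client; the names SAMPLE_LOG, CRASH_RE and
find_crashes are made up for illustration.

```python
import re

# Crash lines copied from the log excerpt in this thread, including the
# line wraps introduced when the log was pasted into the mail.
SAMPLE_LOG = """\
May  3 06:58:44 e-filter postgres[250]: [21-1] LOG:  server process
(PID 59608) was terminated by signal 10
May  3 06:58:44 e-filter postgres[250]: [22-1] LOG:  terminating any
other active server processes
May  3 09:59:38 e-filter postgres[250]: [24-1] LOG:  server process
(PID 34743) was terminated by signal 10
"""

# \s+ between words tolerates single spaces as well as wrapped lines.
CRASH_RE = re.compile(
    r"^(?P<when>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+postgres\[\d+\]:"
    r"\s+\[[\d-]+\]\s+LOG:\s+server\s+process\s+\(PID\s+(?P<pid>\d+)\)\s+"
    r"was\s+terminated\s+by\s+signal\s+(?P<signal>\d+)",
    re.MULTILINE,
)


def find_crashes(log_text):
    """Return (timestamp, pid, signal) for every backend killed by a signal."""
    return [
        (m.group("when"), int(m.group("pid")), int(m.group("signal")))
        for m in CRASH_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    for when, pid, sig in find_crashes(SAMPLE_LOG):
        print(f"{when}: backend PID {pid} died from signal {sig}")
```

With statement logging enabled, the last statement logged under one of
these PIDs before the crash timestamp is the natural first suspect.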

Re: postgresql 8 abort with signal 10

From
Scott Marlowe
Date:
On Tue, 2005-05-03 at 08:39, Alexandre Biancalana wrote:
> Hi list,
>
>  I'm running PostgreSQL 8.0.1 on FreeBSD 4.11-STABLE; the machine is
> an AMD Sempron 2.2 with 1 GB of RAM.
>
>  I use PostgreSQL as the database for dspam, a spam classification
> program. The database sees moderate use: on average 10 simultaneous
> connections executing relatively big queries that use "IN" clauses.
>
> Watching the PostgreSQL logs, I see the following messages occur many
> times a day:
>
> May  3 06:58:44 e-filter postgres[250]: [21-1] LOG:  server process
> (PID 59608) was terminated by signal 10
> May  3 06:58:44 e-filter postgres[250]: [22-1] LOG:  terminating any
> other active server processes

SNIP

> This is my postgresql.conf
>
> max_connections = 70
> superuser_reserved_connections = 2
> shared_buffers = 81920

Rather large shared_buffers for a machine with only 1 gig of RAM.  640
Meg of shared buffers means the kernel is basically double buffering
everything.  Have you tested with smaller settings and found this
setting to be the best?
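
The 640 Meg figure follows from the posted settings. The sketch below
(Python, not part of the original thread) redoes the arithmetic. It
assumes the default 8 kB block size and the 8.0 units: shared_buffers
and effective_cache_size counted in 8 kB pages, work_mem and
maintenance_work_mem counted in kB. The function name to_megabytes and
the result keys are made up for illustration.

```python
# Assumption: PostgreSQL was built with the default 8 kB block size.
BLOCK_SIZE_BYTES = 8192
KB = 1024
MB = 1024 * 1024

# Values copied from the postgresql.conf posted in this thread.
SETTINGS = {
    "max_connections": 70,
    "shared_buffers": 81920,         # number of 8 kB buffers
    "work_mem": 10240,               # kB
    "maintenance_work_mem": 51200,   # kB
    "effective_cache_size": 100000,  # number of 8 kB pages
}


def to_megabytes(settings):
    """Convert the posted settings to megabytes for easier comparison."""
    work_mem_mb = settings["work_mem"] * KB / MB
    return {
        "shared_buffers_mb": settings["shared_buffers"] * BLOCK_SIZE_BYTES / MB,
        "effective_cache_size_mb":
            settings["effective_cache_size"] * BLOCK_SIZE_BYTES / MB,
        "work_mem_mb": work_mem_mb,
        "maintenance_work_mem_mb": settings["maintenance_work_mem"] * KB / MB,
        # Rough figure: one work_mem allocation per allowed connection.
        # A single query can use several, so this is not an upper bound.
        "work_mem_all_connections_mb":
            work_mem_mb * settings["max_connections"],
    }


if __name__ == "__main__":
    for name, value in to_megabytes(SETTINGS).items():
        print(f"{name}: {value:.2f}")
```

On a machine with 1 GB of RAM, 640 MB of shared buffers plus an
effective_cache_size of roughly 781 MB cannot both be backed by
physical memory, which is the concern raised above.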

You might want to look in the signal man page on your BSD system and
see what signal 10 means.  On Solaris it's a bus error.  I don't have a
clue what it is on FreeBSD myself, though.
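
Signal numbers are not the same on every operating system, so the
lookup has to be done on the affected machine. FreeBSD's signal(3)
lists 10 as SIGBUS (bus error) and 11 as SIGSEGV, whereas on x86 Linux
10 is SIGUSR1. As a small sketch, not part of the original thread, the
Python standard library can report the local name for a number; the
helper name signal_name is made up for illustration.

```python
import signal


def signal_name(number):
    """Return the symbolic name this platform gives to a signal number."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a signal known to this platform.
        return f"unknown signal {number}"


if __name__ == "__main__":
    # Output depends on the operating system the script runs on.
    for number in (10, 11):
        print(number, signal_name(number))
```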

> work_mem = 10240
> maintenance_work_mem = 51200
> fsync = true
> checkpoint_segments = 8
> effective_cache_size = 100000
> log_destination = 'syslog'
> silent_mode = true
> lc_messages = 'C'
> lc_monetary = 'C'
> lc_numeric = 'C'
> lc_time = 'C'
>
>
> and the shared memory configuration:
>
> kern.ipc.shmmax: 700000000
> kern.ipc.shmmin: 1
> kern.ipc.shmmni: 192
> kern.ipc.shmseg: 256
> kern.ipc.shmall: 700000000
>
>
> I have some configuration error that could result in this kind of problem ?
>
> Any ideas ? Any thoughts ?
>
> Best Regards,
> Alexandre

Re: postgresql 8 abort with signal 10

From
Vlad
Date:
Alexandre,

I have seen reports (and observed the problem myself) of all sorts of
different software suffering from signal 11 under FreeBSD (more often
seen on 5-STABLE). So far the collection is: Apache 1.3 (myself),
MySQL (a recent discussion on the freebsd-stable list), and now
PostgreSQL... The hardware is not the point of failure here. Try
posting this to freebsd-stable; perhaps an additional problem report
will help them find the cause.

p.s. here is the last one I see in my apache error log:
[Wed Mar  9 17:50:45 2005] [notice] child pid 95642 exit signal
Segmentation fault (11)



--

Vlad

Re: postgresql 8 abort with signal 10

From
Vlad
Date:
Oops... you were writing about signal 10, not signal 11. My bad, sorry.

On 5/3/05, Vlad <marchenko@gmail.com> wrote:
> Alexandre,
>
> I saw reports (and observed the problem myself) that all sort of
> different softwares suffering from signal 11 under FreeBSD (more often
> seen on 5-STABLE). So far the collection is: Apache 1.3 (myself),
> Mysql (recent descussion on freebsd-stable list) and now postgresql...
> The hardware is not the point of failure here. Try to post this into
> freebsd-stable - perhaps additional problem report will help them find
> the cause.
>
> p.s. here is the last one I see in my apache error log:
> [Wed Mar  9 17:50:45 2005] [notice] child pid 95642 exit signal
> Segmentation fault (11)
>
> On 5/3/05, Alexandre Biancalana <biancalana@gmail.com> wrote:
> > Hi list,
> >
> >  I'm running postgresql 8.0.1 on FreeBSD 4.11-STABLE, the machine is
> > and AMD Sempron 2.2, 1GB Ram..
> >
> >  I use postgresql as database for dspam, an spam classification
> > program. This database have and moderated use, on averange 10
> > simultaneous conections executing relative big queries using "in"
> > clausule.
> >
> > Watching postgresql logs I see the following messages ocurs a lot of
> > times in a day:
> >
> > May  3 06:58:44 e-filter postgres[250]: [21-1] LOG:  server process
> > (PID 59608) was terminated by signal 10
> > May  3 06:58:44 e-filter postgres[250]: [22-1] LOG:  terminating any
> > other active server processes
> > May  3 06:58:44 e-filter postgres[59605]: [21-1] WARNING:  terminating
> > connection because of crash of another server process
> > May  3 06:58:44 e-filter postgres[59605]: [21-2] DETAIL:  The
> > postmaster has commanded this server process to roll back the current
> > transaction and exit, because another server
> > May  3 06:58:44 e-filter postgres[59605]: [21-3]  process exited
> > abnormally and possibly corrupted shared memory.
> > May  3 06:58:44 e-filter postgres[59605]: [21-4] HINT:  In a moment
> > you should be able to reconnect to the database and repeat your
> > command.
> > May  3 06:58:44 e-filter postgres[59607]: [21-1] WARNING:  terminating
> > connection because of crash of another server process
> > May  3 06:58:44 e-filter postgres[59607]: [21-2] DETAIL:  The
> > postmaster has commanded this server process to roll back the current
> > transaction and exit, because another server
> > May  3 06:58:44 e-filter postgres[59607]: [21-3]  process exited
> > abnormally and possibly corrupted shared memory.
> > May  3 06:58:44 e-filter postgres[59607]: [21-4] HINT:  In a moment
> > you should be able to reconnect to the database and repeat your
> > command.
> > May  3 06:58:44 e-filter postgres[59606]: [21-1] WARNING:  terminating
> > connection because of crash of another server process
> > May  3 06:58:44 e-filter postgres[59606]: [21-2] DETAIL:  The
> > postmaster has commanded this server process to roll back the current
> > transaction and exit, because another server
> > May  3 06:58:44 e-filter postgres[59606]: [21-3]  process exited
> > abnormally and possibly corrupted shared memory.
> > May  3 06:58:44 e-filter postgres[59606]: [21-4] HINT:  In a moment
> > you should be able to reconnect to the database and repeat your
> > command.
> > May  3 06:58:44 e-filter postgres[59626]: [21-1] WARNING:  terminating
> > connection because of crash of another server process
> > May  3 06:58:44 e-filter postgres[59626]: [21-2] DETAIL:  The
> > postmaster has commanded this server process to roll back the current
> > transaction and exit, because another server
> > May  3 06:58:44 e-filter postgres[59626]: [21-3]  process exited
> > abnormally and possibly corrupted shared memory.
> > May  3 06:58:44 e-filter postgres[59626]: [21-4] HINT:  In a moment
> > you should be able to reconnect to the database and repeat your
> > command.
> > May  3 06:58:44 e-filter postgres[59628]: [21-1] WARNING:  terminating
> > connection because of crash of another server process
> > May  3 06:58:44 e-filter postgres[59629]: [21-1] WARNING:  terminating
> > connection because of crash of another server process
> > May  3 06:58:44 e-filter postgres[59629]: [21-2] DETAIL:  The
> > postmaster has commanded this server process to roll back the current
> > transaction and exit, because another server
> > May  3 06:58:44 e-filter postgres[59629]: [21-3]  process exited
> > abnormally and possibly corrupted shared memory.
> > May  3 06:58:44 e-filter postgres[59629]: [21-4] HINT:  In a moment
> > you should be able to reconnect to the database and repeat your
> > command.
> > May  3 06:58:44 e-filter postgres[59628]: [21-2] DETAIL:  The
> > postmaster has commanded this server process to roll back the current
> > transaction and exit, because another server
> > May  3 06:58:44 e-filter postgres[59628]: [21-3]  process exited
> > abnormally and possibly corrupted shared memory.
> > May  3 06:58:44 e-filter postgres[59628]: [21-4] HINT:  In a moment
> > you should be able to reconnect to the database and repeat your
> > command.
> > May  3 06:58:44 e-filter postgres[59609]: [21-1] WARNING:  terminating
> > connection because of crash of another server process
> > May  3 06:58:44 e-filter postgres[59609]: [21-2] DETAIL:  The
> > postmaster has commanded this server process to roll back the current
> > transaction and exit, because another server
> > May  3 06:58:44 e-filter postgres[59609]: [21-3]  process exited
> > abnormally and possibly corrupted shared memory.
> > May  3 06:58:44 e-filter postgres[59609]: [21-4] HINT:  In a moment
> > you should be able to reconnect to the database and repeat your
> > command.
> > May  3 06:58:44 e-filter postgres[59627]: [21-1] WARNING:  terminating
> > connection because of crash of another server process
> > May  3 06:58:44 e-filter postgres[59627]: [21-2] DETAIL:  The
> > postmaster has commanded this server process to roll back the current
> > transaction and exit, because another server
> > May  3 06:58:44 e-filter postgres[59627]: [21-3]  process exited
> > abnormally and possibly corrupted shared memory.
> > May  3 06:58:44 e-filter postgres[59627]: [21-4] HINT:  In a moment
> > you should be able to reconnect to the database and repeat your
> > command.
> > May  3 06:58:45 e-filter postgres[69093]: [23-1] WARNING:  terminating
> > connection because of crash of another server process
> > May  3 06:58:45 e-filter postgres[69093]: [23-2] DETAIL:  The
> > postmaster has commanded this server process to roll back the current
> > transaction and exit, because another server
> > May  3 06:58:45 e-filter postgres[69093]: [23-3]  process exited
> > abnormally and possibly corrupted shared memory.
> > May  3 06:58:45 e-filter postgres[69093]: [23-4] HINT:  In a moment
> > you should be able to reconnect to the database and repeat your
> > command.
> > May  3 06:58:45 e-filter postgres[59620]: [21-1] WARNING:  terminating
> > connection because of crash of another server process
> > May  3 06:58:46 e-filter postgres[59620]: [21-2] DETAIL:  The
> > postmaster has commanded this server process to roll back the current
> > transaction and exit, because another server
> > May  3 06:58:46 e-filter postgres[59620]: [21-3]  process exited
> > abnormally and possibly corrupted shared memory.
> > May  3 06:58:46 e-filter postgres[59620]: [21-4] HINT:  In a moment
> > you should be able to reconnect to the database and repeat your
> > command.
> > May  3 06:58:46 e-filter postgres[59619]: [21-1] WARNING:  terminating
> > connection because of crash of another server process
> > May  3 06:58:46 e-filter postgres[59619]: [21-2] DETAIL:  The
> > postmaster has commanded this server process to roll back the current
> > transaction and exit, because another server
> > May  3 06:58:46 e-filter postgres[59619]: [21-3]  process exited
> > abnormally and possibly corrupted shared memory.
> > May  3 06:58:46 e-filter postgres[59619]: [21-4] HINT:  In a moment
> > you should be able to reconnect to the database and repeat your
> > command.
> > May  3 06:58:46 e-filter postgres[59624]: [21-1] WARNING:  terminating
> > connection because of crash of another server process
> > May  3 06:58:46 e-filter postgres[59624]: [21-2] DETAIL:  The
> > postmaster has commanded this server process to roll back the current
> > transaction and exit, because another server
> > May  3 06:58:46 e-filter postgres[59624]: [21-3]  process exited
> > abnormally and possibly corrupted shared memory.
> > May  3 06:58:46 e-filter postgres[59624]: [21-4] HINT:  In a moment
> > you should be able to reconnect to the database and repeat your
> > command.
> > May  3 06:58:46 e-filter postgres[59623]: [21-1] WARNING:  terminating
> > connection because of crash of another server process
> > May  3 06:58:46 e-filter postgres[59623]: [21-2] DETAIL:  The
> > postmaster has commanded this server process to roll back the current
> > transaction and exit, because another server
> > May  3 06:58:46 e-filter postgres[59623]: [21-3]  process exited
> > abnormally and possibly corrupted shared memory.
> > May  3 06:58:46 e-filter postgres[59623]: [21-4] HINT:  In a moment
> > you should be able to reconnect to the database and repeat your
> > command.
> > May  3 06:58:46 e-filter postgres[59625]: [21-1] WARNING:  terminating
> > connection because of crash of another server process
> > May  3 06:58:46 e-filter postgres[59625]: [21-2] DETAIL:  The
> > postmaster has commanded this server process to roll back the current
> > transaction and exit, because another server
> > May  3 06:58:46 e-filter postgres[59625]: [21-3]  process exited
> > abnormally and possibly corrupted shared memory.
> > May  3 06:58:46 e-filter postgres[59625]: [21-4] HINT:  In a moment
> > you should be able to reconnect to the database and repeat your
> > command.
> > May  3 06:58:46 e-filter postgres[59622]: [21-1] WARNING:  terminating
> > connection because of crash of another server process
> > May  3 06:58:46 e-filter postgres[59622]: [21-2] DETAIL:  The
> > postmaster has commanded this server process to roll back the current
> > transaction and exit, because another server
> > May  3 06:58:46 e-filter postgres[59622]: [21-3]  process exited
> > abnormally and possibly corrupted shared memory.
> > May  3 06:58:46 e-filter postgres[59622]: [21-4] HINT:  In a moment
> > you should be able to reconnect to the database and repeat your
> > command.
> > May  3 06:58:46 e-filter postgres[59621]: [22-1] WARNING:  terminating
> > connection because of crash of another server process
> > May  3 06:58:49 e-filter postgres[250]: [23-1] LOG:  all server
> > processes terminated; reinitializing
> > May  3 06:58:51 e-filter postgres[13478]: [24-1] LOG:  database system
> > was interrupted at 2005-05-03 06:58:16 EST
> > May  3 06:58:51 e-filter postgres[13478]: [25-1] LOG:  checkpoint
> > record is at 14/99F69378
> > May  3 06:58:51 e-filter postgres[13478]: [26-1] LOG:  redo record is
> > at 14/99F69378; undo record is at 0/0; shutdown FALSE
> > May  3 06:58:51 e-filter postgres[13478]: [27-1] LOG:  next
> > transaction ID: 3639687; next OID: 388415
> > May  3 06:58:51 e-filter postgres[13478]: [28-1] LOG:  database system
> > was not properly shut down; automatic recovery in progress
> > May  3 06:58:51 e-filter postgres[13478]: [29-1] LOG:  redo starts at
> > 14/99F693B4
> > May  3 06:58:53 e-filter postgres[13478]: [30-1] LOG:  record with
> > zero length at 14/9AE223F0
> > May  3 06:58:53 e-filter postgres[13478]: [31-1] LOG:  redo done at 14/9AE223C8
> > May  3 06:58:54 e-filter postgres[13484]: [24-1] FATAL:  the database
> > system is starting up
> > May  3 06:58:54 e-filter postgres[13485]: [24-1] FATAL:  the database
> > system is starting up
> > May  3 06:58:55 e-filter postgres[13488]: [24-1] FATAL:  the database
> > system is starting up
> > May  3 06:58:57 e-filter postgres[13478]: [32-1] LOG:  database system is ready
> >
> > and some time latter its ocur again:
> > May  3 09:59:38 e-filter postgres[250]: [24-1] LOG:  server process
> > (PID 34743) was terminated by signal 10
> > May  3 09:59:38 e-filter postgres[250]: [25-1] LOG:  terminating any
> > other active server processes
> > May  3 09:59:38 e-filter postgres[35215]: [24-1] WARNING:  terminating
> > connection because of crash of another server process
> > May  3 09:59:38 e-filter postgres[35215]: [24-2] DETAIL:  The
> > postmaster has commanded this server process to roll back the current
> > transaction and exit, because another server
> > May  3 09:59:38 e-filter postgres[35215]: [24-3]  process exited
> > abnormally and possibly corrupted shared memory.
> > May  3 09:59:38 e-filter postgres[35215]: [24-4] HINT:  In a moment
> > you should be able to reconnect to the database and repeat your
> > command.
> > May  3 09:59:38 e-filter postgres[34744]: [24-1] WARNING:  terminating
> > connection because of crash of another server process
> > May  3 09:59:38 e-filter postgres[34744]: [24-2] DETAIL:  The
> > postmaster has commanded this server process to roll back the current
> > transaction and exit, because another server
> > May  3 09:59:38 e-filter postgres[33592]: [24-1] WARNING:  terminating
> > connection because of crash of another server process
> > May  3 09:59:38 e-filter postgres[34744]: [24-3]  process exited
> > abnormally and possibly corrupted shared memory.
> >
> > This is my postgresql.conf
> >
> > max_connections = 70
> > superuser_reserved_connections = 2
> > shared_buffers = 81920
> > work_mem = 10240
> > maintenance_work_mem = 51200
> > fsync = true
> > checkpoint_segments = 8
> > effective_cache_size = 100000
> > log_destination = 'syslog'
> > silent_mode = true
> > lc_messages = 'C'
> > lc_monetary = 'C'
> > lc_numeric = 'C'
> > lc_time = 'C'
> >
> > and the shared memory configuration:
> >
> > kern.ipc.shmmax: 700000000
> > kern.ipc.shmmin: 1
> > kern.ipc.shmmni: 192
> > kern.ipc.shmseg: 256
> > kern.ipc.shmall: 700000000
> >
> > I have some configuration error that could result in this kind of problem ?
> >
> > Any ideas ? Any thoughts ?
> >
> > Best Regards,
> > Alexandre
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Don't 'kill -9' the postmaster
> >
>
> --
>
> Vlad
>


--

Vlad

Re: postgresql 8 abort with signal 10

From
Alexandre Biancalana
Date:
>>You need to find out what's triggering that.  Turning on query logging
>>would be a good way of investigating.

Which directives can I use to enable this?
debug_print_parse? debug_print_rewritten? debug_print_plan?
debug_pretty_print?


>>Rather large, shared buffers for a machine with only 1 gig of ram.  640
>>Meg of RAM means the kernel is basically double buffering everything.
>>have you tested with smaller settings and this setting was the best?

I had 256 MB of RAM, then I increased it to 1 GB, thinking the problem
could be running out of memory or a faulty memory module. After this
"upgrade" I increased the number of shared buffers, etc.

It's important to say that memory usage peaks at only 80%.

What values do you suggest?

>>You might want to look in your signal man page on BSD and see what
>>signal 10 means.  On solaris it's a bus error.  Not a clue what it is in
>>FreeBSD myself though.

The FreeBSD man page says: 10    SIGBUS

The system does not generate a core dump file for this error.

Regards,

Re: postgresql 8 abort with signal 10

From
Michael Fuhr
Date:
On Tue, May 03, 2005 at 09:54:03AM -0500, Scott Marlowe wrote:
>
> You might want to look in your signal man page on BSD and see what
> signal 10 means.  On solaris it's a bus error.  Not a clue what it is in
> FreeBSD myself though.

Signal 10 is SIGBUS (bus error) on FreeBSD 4.11.  Somewhere under
$PGDATA there might be a core dump named postmaster.core (or, more
specifically, with a file name based on the kern.corefile sysctl
setting) -- if there is, then a debugger like gdb might be able to
show where the problem happened, especially if the postmaster was
built with debugging info.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: postgresql 8 abort with signal 10

From
Michael Fuhr
Date:
On Tue, May 03, 2005 at 01:36:13PM -0300, Alexandre Biancalana wrote:
>
> The system does not generate core dump file for this error.....

Are you sure?  Where did you look and what file name did you look
for?  Unless you've changed the kern.corefile sysctl setting, the
file should be named "postgres.core", not just "core", and it should
be somewhere under $PGDATA.  Whether a core file is produced is
also affected by the kern.coredump sysctl setting and the coredumpsize
resource limit.
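
A rough sketch of checking the coredumpsize resource limit mentioned above,
written in Python rather than taken from this thread. It reads the limit of
the process that runs it, so it only says something about the backend if it
is started from the same user and environment as the postmaster; the helper
names are made up for illustration.

```python
import resource


def core_limit():
    """Return (allowed, soft, hard) for this process's core file size limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # A soft limit of 0 means no core file will be written at all.
    return soft != 0, soft, hard


def describe(limit):
    """Render a limit value the way a shell would report it."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit} bytes"


if __name__ == "__main__":
    allowed, soft, hard = core_limit()
    print("core dumps allowed:", allowed)
    print("soft limit:", describe(soft))
    print("hard limit:", describe(hard))
```

If the soft limit comes back as 0, no core file will appear no matter what
kern.corefile says.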

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: postgresql 8 abort with signal 10

From
Michael Fuhr
Date:
On Tue, May 03, 2005 at 10:37:03AM -0600, Michael Fuhr wrote:
>
> Signal 10 is SIGBUS (bus error) on FreeBSD 4.11.  Somewhere under
> $PGDATA there might be a core dump named postmaster.core

Correction: the core dump should be named postgres.core (at least
it is on my FreeBSD 4.11-STABLE system if I send the backend a
signal 10).

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: postgresql 8 abort with signal 10

From
Scott Marlowe
Date:
On Tue, 2005-05-03 at 11:36, Alexandre Biancalana wrote:
> >>You need to find out what's triggering that.  Turning on query logging
> >>would be a good way of investigating.
>
>  Which directives can I use to enable this ?
> debug_print_parse ? debug_print_rewritten ? debug_print_plan ?
> debug_pretty_print ?
>
>
> >>Rather large, shared buffers for a machine with only 1 gig of ram.  640
> >>Meg of RAM means the kernel is basically double buffering everything.
> >>have you tested with smaller settings and this setting was the best?
>
> I had 256 of RAM then I increase to 1GB thinking this could be a
> problem of out of memory or a buggy memory...... After this "upgrade"
> I increase the numbers of shared buffers,etc
>
> It's important to say that the max memory usage reach to only 80%.
>
> What values do you suggest ?

Generally 25% of the memory or 256 Megs, whichever is less. In your
case, they're the same.  The reasoning is that the kernel caches data,
while postgresql only really holds onto data as long as it needs it and
then frees it.  A really huge buffer space lets postgresql push data out
of the kernel cache, so the next access, after postgresql has freed the
memory that was holding the data, has to go to disk.

The kernel is generally a lot better at caching than most apps.

So, 32768 is about as big as I'd normally go, and even that may be more
than you really need.  Note that there's overhead in managing such a
large buffer as well.  With pgsql 8.x and the new caching algorithms in
place, such overhead may be lower, and larger buffer settings may be in
order.  But if testing hasn't shown them to be faster, I'd avoid them
for now and see if your signal 10 errors start going away.

If they do, then you've likely got a kernel bug in there somewhere.  If
they don't, I'd suspect bad hardware.
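
The rule of thumb above (25% of RAM or 256 Megs, whichever is less) can be
worked through as a small Python sketch. It assumes the default 8 kB buffer
size; the function names are illustrative only and are not part of
PostgreSQL.

```python
PAGE_SIZE = 8 * 1024  # bytes per shared buffer, assuming the default block size


def buffers_to_mb(buffers):
    """Convert a shared_buffers value (a count of buffers) to megabytes."""
    return buffers * PAGE_SIZE / (1024 * 1024)


def suggested_shared_buffers(ram_mb):
    """Apply the rule of thumb: 25% of RAM or 256 MB, whichever is less."""
    target_mb = min(ram_mb // 4, 256)
    return target_mb * 1024 * 1024 // PAGE_SIZE


if __name__ == "__main__":
    print(buffers_to_mb(81920))            # the posted setting: 640.0 MB
    print(suggested_shared_buffers(1024))  # 1 GB of RAM: 32768 buffers
```

For the 1 GB machine in this thread both halves of the rule give 256 MB,
which is where the 32768 figure comes from.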

> >>You might want to look in your signal man page on BSD and see what
> >>signal 10 means.  On solaris it's a bus error.  Not a clue what it is in
> >>FreeBSD myself though.
>
> FreeBSD man page say: 10    SIGBUS
>
> The system does not generate core dump file for this error.....



Re: postgresql 8 abort with signal 10

From
Alexandre Biancalana
Date:
Hi Michael,

Here is my /etc/sysctl.conf:

kern.corefile="/var/coredumps/%N.%P.core"
kern.sugid_coredump=1

and as I said before, there is no core file in /var/coredumps.
I should say that this setup for storing core files works; I have
used it a lot in the past.
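
For reference, a small Python sketch of how a kern.corefile template like
the one above expands into a file name. It handles only the %N (process
name), %P (PID), %U (UID) and %% specifiers; the UID used in the example is
a made-up value, and the PID is the crashed backend from the first log
excerpt.

```python
import re


def expand_corefile(pattern, name, pid, uid):
    """Expand %N, %P, %U and %% in a kern.corefile style template."""
    values = {"N": name, "P": str(pid), "U": str(uid), "%": "%"}
    return re.sub(r"%([NPU%])", lambda m: values[m.group(1)], pattern)


if __name__ == "__main__":
    # uid=70 is a hypothetical value for the database user.
    print(expand_corefile("/var/coredumps/%N.%P.core", "postgres", 59608, 70))
    # The default template is relative to the process's working directory.
    print(expand_corefile("%N.core", "postgres", 59608, 70))
```

With the setting above, the file to look for would therefore be something
like /var/coredumps/postgres.59608.core, and the directory has to be
writable by the user the backend runs as.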

Thanks Scott, I will lower shared_buffers to 32768 and try again, but
what about work_mem, maintenance_work_mem, and effective_cache_size?

Re: postgresql 8 abort with signal 10

From
Scott Marlowe
Date:
On Tue, 2005-05-03 at 12:25, Alexandre Biancalana wrote:
> Thanks Scott I will lower shared_buffers to 32768 and try again, but
> how about work_mem, maintenance_work_mem, effective_cache_size ??

work_mem is how much memory things like sorts can allocate.  It really
kind of depends on the kind of parallel load you're looking at possibly
handling.  If you'll never have more than a dozen or so open connections
that could be doing sorts (select distinct, order by, union, etc...)
then having it be 10 to 20 meg is fine.  If you're going to handle
hundreds or even thousands of connections, you have to be careful it's
not big enough to run your machine out of memory, or you'll start
getting swap storms.

maintenance_work_mem is used by processes like vacuum, which tend to be
run one at a time, so having it be fairly large, like 32 to 64 meg is no
big issue.  Note that you can set either of these settings higher for
one shot things, like nightly maintenance, if you need to keep them
lower during the day to ensure proper operation.

effective_cache_size is a setting that simply tells the query planner
about how much the kernel / OS is caching of your data set.  Generally
the cached value shown in top or some other system monitor on a
dedicated machine is about right.

work_mem and maintenance_work_mem are in 1k increments, while the other
two (shared_buffers and effective_cache_size) are in 8k increments, btw.
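
To make the units concrete, here is a Python sketch that converts the
values from the posted postgresql.conf into megabytes. It assumes the
default 8 kB block size, and the "one sort per connection" worst case is a
simplification chosen for illustration, since a single query can run
several sorts at once.

```python
KB = 1024
PAGE = 8 * KB  # assumed default block size

# Values from the posted postgresql.conf.
MAX_CONNECTIONS = 70
WORK_MEM = 10240               # in 1 kB units
MAINTENANCE_WORK_MEM = 51200   # in 1 kB units
EFFECTIVE_CACHE_SIZE = 100000  # in 8 kB pages


def kb_units_to_mb(value):
    """Convert a setting expressed in 1 kB units to megabytes."""
    return value * KB / (1024 * 1024)


def pages_to_mb(value):
    """Convert a setting expressed in 8 kB pages to megabytes."""
    return value * PAGE / (1024 * 1024)


def worst_case_sort_mb(connections, work_mem_kb, sorts_per_connection=1):
    """Memory used if every connection runs sorts at the work_mem limit."""
    return connections * sorts_per_connection * kb_units_to_mb(work_mem_kb)


if __name__ == "__main__":
    print("work_mem:", kb_units_to_mb(WORK_MEM), "MB")
    print("maintenance_work_mem:", kb_units_to_mb(MAINTENANCE_WORK_MEM), "MB")
    print("effective_cache_size:", pages_to_mb(EFFECTIVE_CACHE_SIZE), "MB")
    print("worst case sorts:", worst_case_sort_mb(MAX_CONNECTIONS, WORK_MEM), "MB")
```

With 70 connections each sorting at the 10 MB limit, sorts alone could ask
for about 700 MB, which is why the advice above ties work_mem to the number
of connections that can be sorting at the same time.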

Re: postgresql 8 abort with signal 10

From
Alexandre Biancalana
Date:
Thank you for the detailed explanation, Scott; it is very handy!

I reduced shared_buffers to 32768, but the problem still occurs.

Any other ideas?

Re: postgresql 8 abort with signal 10

From
Scott Marlowe
Date:
On Tue, 2005-05-03 at 15:04, Alexandre Biancalana wrote:
> Thank you for the detailed explanation Scott, they are very handy !!
>
> I reduced the shared_buffers to 32768, but the problem still occurs.....
>
> Any other idea ??

Yeah, I had a sneaking suspicion that shared_buffers wasn't causing the
issue really.

Sounds like either a hardware fault or a BSD bug.  I'd check the BSD
mailing lists for mention of such a bug, and see if you can grab a spare
drive, install the last stable version of FreeBSD 4.x, and check whether
that fixes the problem.

If you decide to try linux, avoid the 2.6 kernel, it's still got
issues...  2.4 is pretty stable.

I really doubt it's a problem in postgresql itself though.

Re: postgresql 8 abort with signal 10

From
Alexandre Biancalana
Date:
Oh god :(

The FreeBSD install is the latest STABLE version. I can try changing
some hardware; I have already replaced the memory. What should I try
next: the processor? The motherboard?




Re: postgresql 8 abort with signal 10

From
Scott Marlowe
Date:
On Tue, 2005-05-03 at 15:56, Alexandre Biancalana wrote:
> Ohhh god :(
>
> The FreeBSD is the last STABLE version..... I can try to change some
> hardware, I already changed memory, what can I try now ? the processor
> ? motherboard ??

You're running FreeBSD 5, right?  I'd try to find the last version of 4
and put it on a spare drive and see if that works or has the same
problem.

If you're running 4, then I'd try a spare machine to see if the problem
follows BSD or the hardware.

If the error really is a bus error, then this problem is way out of the
realm of what I'm familiar with.  Especially with regards to BSD.

Re: postgresql 8 abort with signal 10

From
Roman Neuhauser
Date:
# biancalana@gmail.com / 2005-05-03 17:56:53 -0300:
> The FreeBSD is the last STABLE version..... I can try to change some
> hardware, I already changed memory, what can I try now ? the processor
> ? motherboard ??

> On 5/3/05, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> > On Tue, 2005-05-03 at 15:04, Alexandre Biancalana wrote:
> > > Thank you for the detailed explanation Scott, they are very handy !!
> > >
> > > I reduced the shared_buffers to 32768, but the problem still occurs.....
> > >
> > > Any other idea ??
> >
> > Yeah, I had a sneaking suspicion that shared_buffers wasn't causing the
> > issue really.
> >
> > Sounds like either a hardware fault, or a BSD bug.  I'd check the BSD
> > mailing lists for mention of said bug, and see if you can grab a spare
> > drive and install the last stable version of FreeBSD 4.x and if that
> > fixes the problem.
> >
> > If you decide to try linux, avoid the 2.6 kernel, it's still got
> > issues...  2.4 is pretty stable.
> >
> > I really doubt it's a problem in postgresql itself though.

    For the sake of archives, what was causing the SIGBUSes?

--
How many Vietnam vets does it take to screw in a light bulb?
You don't know, man.  You don't KNOW.
Cause you weren't THERE.             http://bash.org/?255991

Re: postgresql 8 abort with signal 10

From
Alexandre Biancalana
Date:
I switched from postgresql to mysql and everything is great now ;)

Same machine, same OS, etc.
