Thread: castoroides spinlock failure on test_shm_mq
Has anybody noticed the way castoroides is randomly failing?

SELECT test_shm_mq_pipelined(16384, (select string_agg(chr(32+(random()*95)::int), '') from generate_series(1,270000)),200, 3);
! PANIC: stuck spinlock (100cb92f4) detected at atomics.c:30
! server closed the connection unexpectedly
! This probably means the server terminated abnormally
! before or while processing the request.
! connection to server was lost

-- 
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
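The report does not say what code sits at atomics.c:30. Assume it is the spinlock-based emulation that PostgreSQL uses for memory barriers on platforms without native barrier support. One way such a lock can get stuck is re-entry. A signal handler issues a barrier while the code it interrupted already holds the same spinlock. The handler then spins forever, because the holder cannot run again until the handler returns. The self-contained C sketch below illustrates this. `dummy_lock`, `emulated_barrier()`, `MAX_SPINS` and `demo_reentrant_barrier()` are hypothetical names, not PostgreSQL's code. The spin is bounded so that the deadlock appears as a return value rather than a hang.

```c
#include <signal.h>
#include <stdatomic.h>

/* Hypothetical spinlock standing in for the one a barrier emulation would use. */
static atomic_flag dummy_lock = ATOMIC_FLAG_INIT;
static volatile sig_atomic_t handler_got_stuck;

/* Hypothetical bound; real spinlock code waits far longer before giving up. */
#define MAX_SPINS 1000000

/* Barrier emulated by acquiring and releasing a spinlock. Returns -1 if the
 * lock could not be acquired within MAX_SPINS attempts ("stuck"). */
int emulated_barrier(void)
{
    int spins = 0;

    while (atomic_flag_test_and_set(&dummy_lock))
    {
        if (++spins >= MAX_SPINS)
            return -1;
    }
    atomic_flag_clear(&dummy_lock);
    return 0;
}

/* Signal handler that, like code setting a latch, issues a barrier. */
static void on_signal(int signo)
{
    (void) signo;
    if (emulated_barrier() != 0)
        handler_got_stuck = 1;
}

/* Simulate being interrupted while inside emulated_barrier(): hold the lock,
 * deliver a signal whose handler also needs the lock, then release it.
 * Returns 1 if the handler could not acquire the lock. */
int demo_reentrant_barrier(void)
{
    handler_got_stuck = 0;
    signal(SIGUSR1, on_signal);

    while (atomic_flag_test_and_set(&dummy_lock))
        ;                       /* take the lock, as the interrupted code would */
    raise(SIGUSR1);             /* handler runs before raise() returns */
    atomic_flag_clear(&dummy_lock);

    signal(SIGUSR1, SIG_DFL);
    return handler_got_stuck;
}
```

Without the artificial bound, the handler would spin until PostgreSQL's own spinlock code gives up and reports a "stuck spinlock" PANIC of the kind shown above.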
On Sat, Jun 20, 2015 at 12:24 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Has anybody noticed the way castoroides is randomly failing?
>
> SELECT test_shm_mq_pipelined(16384, (select string_agg(chr(32+(random()*95)::int), '') from generate_series(1,270000)),200, 3);
> ! PANIC: stuck spinlock (100cb92f4) detected at atomics.c:30
> ! server closed the connection unexpectedly
> ! This probably means the server terminated abnormally
> ! before or while processing the request.
> ! connection to server was lost

Yeah, Andres and I discussed it a month ago:

http://www.postgresql.org/message-id/20150527225528.GP5310@alap3.anarazel.de

I think we're going to need to try to implement real memory barriers
on all architectures we support. It's not clear whether there's some
suitable generic fallback that we could use or whether we're going to
need something different for each case. I had thought Andres was
planning to work on this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
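The sketch below illustrates what a "real" memory barrier provides. It implements the classic message-passing pattern with C11's `atomic_thread_fence()`. It assumes a C11 compiler with `<stdatomic.h>` and POSIX threads. PostgreSQL targeted C89 at the time, so this is an illustration and not a drop-in replacement. `full_barrier()`, `payload`, `ready` and `run_message_passing()` are hypothetical names. The writer's barrier keeps the data store ahead of the flag store. The reader's barrier keeps the flag load ahead of the data load.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: plain data plus a flag published after it. */
static int payload;
static atomic_bool ready;

/* Full barrier from a C11 fence (assumption: C11 compiler available). */
static inline void full_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Writer: store the data, then the barrier, then publish the flag. */
static void *producer(void *arg)
{
    payload = *(int *) arg;
    full_barrier();
    atomic_store_explicit(&ready, true, memory_order_relaxed);
    return NULL;
}

/* Reader: wait for the flag, then the barrier, then read the data. */
static int consume(void)
{
    while (!atomic_load_explicit(&ready, memory_order_relaxed))
        ;
    full_barrier();
    return payload;
}

/* Runs one producer thread against the calling thread as consumer.
 * Returns the value the consumer observed, or -1 if the thread failed to start. */
int run_message_passing(int value)
{
    pthread_t t;
    int got;

    atomic_store(&ready, false);
    if (pthread_create(&t, NULL, producer, &value) != 0)
        return -1;
    got = consume();
    pthread_join(t, NULL);
    return got;
}
```

`run_message_passing(42)` returns 42. The two fences pair up, so once the reader sees `ready` set, it is guaranteed to see the value written before it. Without the fences, a weakly ordered CPU or an optimizing compiler could let the reader observe the flag before the data.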
On 2015-06-20 09:35:39 -0400, Robert Haas wrote:
> On Sat, Jun 20, 2015 at 12:24 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > Has anybody noticed the way castoroides is randomly failing?
> >
> > SELECT test_shm_mq_pipelined(16384, (select string_agg(chr(32+(random()*95)::int), '') from generate_series(1,270000)),200, 3);
> > ! PANIC: stuck spinlock (100cb92f4) detected at atomics.c:30
> > ! server closed the connection unexpectedly
> > ! This probably means the server terminated abnormally
> > ! before or while processing the request.
> > ! connection to server was lost
>
> Yeah, Andres and I discussed it a month ago:
>
> http://www.postgresql.org/message-id/20150527225528.GP5310@alap3.anarazel.de
>
> I think we're going to need to try to implement real memory barriers
> on all architectures we support. It's not clear whether there's some
> suitable generic fallback that we could use or whether we're going to
> need something different for each case. I had thought Andres was
> planning to work on this.

I am. I'd posted on the other thread that I want to use
waitpid(PostmasterPid, WNOHANG) as the fallback for now. Unless somebody
protests, I'm going to commit that first, wait for a while to see whether
it stabilizes the Solaris members, and then commit a better fallback for
Solaris with suncc.

Greetings,

Andres Freund
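Andres's interim fallback relies on a system call rather than a lock. Below is a minimal sketch of the idea. `PostmasterPid` exists only inside PostgreSQL, so the sketch substitutes a hypothetical helper, `barrier_target_pid()`, that returns `getppid()`. That process is never our child, so `waitpid()` reaps nothing and fails with `ECHILD`. The approach rests on two assumptions that no standard guarantees. First, the compiler will not move loads and stores of shared memory across an opaque call. Second, entering the kernel orders memory on the CPU. Because the barrier takes no lock and `waitpid()` is async-signal-safe, it can be re-entered from a signal handler without the self-deadlock a spinlock-based emulation risks.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for PostgreSQL's PostmasterPid: our parent process,
 * which is never one of our children, so waitpid() never reaps anything. */
static pid_t barrier_target_pid(void)
{
    return getppid();
}

/* Fallback "barrier" built from an opaque system call (assumptions above).
 * errno is saved and restored, because when this runs inside a signal
 * handler the interrupted code may still need its own errno value.
 * Returns the errno waitpid() produced (expected: ECHILD), or 0 on success,
 * only so the sketch can be checked. */
int syscall_fallback_barrier(void)
{
    int save_errno = errno;
    pid_t rc = waitpid(barrier_target_pid(), NULL, WNOHANG);
    int err = errno;

    errno = save_errno;
    return rc == -1 ? err : 0;
}
```

A system call is far more expensive than a native barrier instruction. That cost is why the thread treats it as a stopgap until a better fallback exists for Solaris with suncc.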