Re: Fw: Windows 10 got stuck with PostgreSQL at starting up. Adding delay lets it avoid. - Mailing list pgsql-hackers

From Yugo Nagata
Subject Re: Fw: Windows 10 got stuck with PostgreSQL at starting up. Adding delay lets it avoid.
Date
Msg-id 20180801174700.e2638203ce5c5449163d6d2e@sraoss.co.jp
In response to Re: Fw: Windows 10 got stuck with PostgreSQL at starting up. Adding delay lets it avoid.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, 20 Jul 2018 10:48:15 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Yugo Nagata <nagata@sraoss.co.jp> writes:
> > Recently, one of our clients reported a problem that Windows 10 sometime 
> > (approximately once in 300 tries) hung up at OS starting up while PostgreSQL
> > 9.3.x service is starting up. My co-worker analyzed this and found that
> > PostgreSQL's auxiliary process and Windows' logon processes are in a dead-lock
> > situation.
> 
> Really?  What would they deadlock on?  Why is there any connection
> whatsoever?  Why has nobody else run into this?

It is not clear where the hang occurred, but this might be a problem
specific to one version of Windows. Our client reported that the hang
occurred with Windows 10 IoT Enterprise 2015 LTSB, but not with
Windows 10 IoT Enterprise 2016 LTSB or Windows 7.

> 
> > He reported this problem to pgsql-general list as below. Also, he created a patch
> > to add a build-time option for adding 0.5 or 3.0 seconds delay after each sub 
> > process starts.
> 
> This seems like an ugly hack that probably doesn't reliably resolve
> whatever the problem is, but does manage to kill postmaster
> responsiveness :-(.  It'd be especially awful to insert such a delay
> after forking parallel worker processes, which would be a problem in
> anything much newer than 9.3.

Agreed.

> I think we need more investigation; and to start with, reproducing
> the problem in a branch that's not within hailing distance of its EOL
> would be a good idea.  (Not that I have reason to think PG's behavior
> has changed much here ... but 9.3 is just not a good basis for asking
> us to do anything now.)

They also reported that this problem occurred with Windows 10 IoT Enterprise
2015 LTSB + PostgreSQL 10.3 as well as with PostgreSQL 9.3.22. However,
reproducing it would be hard because we don't have a Windows 10 IoT
environment, and the hang shows up only about once in 300 OS startups.
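
As a rough illustration of why reproduction is costly, the sketch below
estimates how many OS startups would be needed to see the hang at least
once. It assumes each boot hangs independently with probability 1/300
(the client's approximate figure); the function name and the 95% target
are hypothetical choices for this example, not something measured.

```python
import math


def boots_needed(p_hang: float, confidence: float) -> int:
    """Smallest n such that 1 - (1 - p_hang)**n >= confidence.

    Assumes every boot is an independent trial with the same hang
    probability, which is a simplification of the reported behaviour.
    """
    if not (0.0 < p_hang < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p_hang and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_hang))


if __name__ == "__main__":
    # Assumed rate: about one hang per 300 OS startups.
    print(boots_needed(1 / 300, 0.95))
```

Under these assumptions the answer is on the order of 900 startups for a
95% chance of seeing a single hang, so an automated reboot loop would be
needed to test any candidate fix.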

We will investigate this further and report back if we find anything.

Regards,


-- 
Yugo Nagata <nagata@sraoss.co.jp>

