Re: [9.4 bug] The database server hangs with write-heavy workload on Windows - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [9.4 bug] The database server hangs with write-heavy workload on Windows
Date
Msg-id 20141010141127.GE6670@alap3.anarazel.de
In response to Re: [9.4 bug] The database server hangs with write-heavy workload on Windows  ("MauMau" <maumau307@gmail.com>)
List pgsql-hackers
On 2014-10-10 23:08:34 +0900, MauMau wrote:
> From: "Craig Ringer" <craig@2ndquadrant.com>
> >It sounds like they've produced a test case, so they should be able to
> >with a bit of luck.
> >
> >Or even better, send you the test case.
> 
> I asked the user about this.  It sounds like the relevant test case consists
> of many scripts.  He explained to me that the simplified test steps are:
> 
> 1. initdb
> 2. pg_ctl start
> 3. Create 16 tables.  Each of those tables consists of around 10 columns.
> 4. Insert 1000 rows into each of those 16 tables.
> 5. Launch 16 psql sessions concurrently.  Each session updates all 1000 rows
> of one table, e.g., session 1 updates table 1, session 2 updates table 2,
> and so on.
> 6. Repeat step 5 50 times.
> 
> This sounds a bit complicated, but I understood that the core part is 16
> concurrent updates, which should lead to contention on xlog insert slots
> and/or spinlocks.
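
The simplified steps quoted above could be scripted roughly as follows. This is only a sketch, not the user's actual test scripts: it assumes steps 1 and 2 (initdb, pg_ctl start) are already done, that psql is on the PATH and can connect to a database named "postgres", and it invents the table names (t1..t16), the column layout (an integer key plus nine integer columns) and the form of the UPDATE, since none of those are given in the report.

```python
import subprocess

NUM_TABLES = 16        # from the report
ROWS_PER_TABLE = 1000  # from the report
NUM_ROUNDS = 50        # from the report
NUM_COLUMNS = 10       # "around 10 columns"; the real schema is not given


def table_name(i):
    # Hypothetical naming scheme: t1 .. t16.
    return f"t{i}"


def create_table_sql(i):
    # Assumed layout: one integer key plus NUM_COLUMNS - 1 integer columns.
    cols = ", ".join(f"c{n} integer" for n in range(1, NUM_COLUMNS))
    return f"CREATE TABLE {table_name(i)} (id integer PRIMARY KEY, {cols});"


def insert_sql(i):
    # Step 4: fill the table with ROWS_PER_TABLE rows of zeros.
    payload = ", ".join("0" for _ in range(1, NUM_COLUMNS))
    return (
        f"INSERT INTO {table_name(i)} "
        f"SELECT g, {payload} FROM generate_series(1, {ROWS_PER_TABLE}) AS g;"
    )


def update_sql(i):
    # Step 5: update every row of one table. A single statement is an
    # assumption; the user's sessions may update row by row instead.
    return f"UPDATE {table_name(i)} SET c1 = c1 + 1;"


def psql_command(sql, dbname="postgres"):
    # -X skips psqlrc, -q keeps output quiet, -c runs a single command.
    return ["psql", "-X", "-q", "-d", dbname, "-c", sql]


def run_round(dbname="postgres"):
    # Start all sessions first so they run concurrently, then wait for them.
    procs = [
        subprocess.Popen(psql_command(update_sql(i), dbname))
        for i in range(1, NUM_TABLES + 1)
    ]
    return [p.wait() for p in procs]


def main():
    for i in range(1, NUM_TABLES + 1):
        subprocess.run(psql_command(create_table_sql(i)), check=True)
        subprocess.run(psql_command(insert_sql(i)), check=True)
    for round_no in range(1, NUM_ROUNDS + 1):
        codes = run_round()
        if any(codes):
            raise SystemExit(f"round {round_no}: psql exit codes {codes}")


if __name__ == "__main__":
    main()
```

If the server hangs as described, run_round() never returns, because every wait() blocks on a psql session that is stuck.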

Hm. I've run similar loads on Linux for long enough that I'm relatively
sure I'd have seen this.

Could you get them to print out the contents of the lwlock all these
processes are waiting for?

> >Your next step here really needs to be to make this reproducible against
> >a debug build. Then see if reverting the xlog scalability work actually
> >changes the behaviour, given that you hypothesised that it could be
> >involved.

I don't think you can trivially revert the xlog scalability stuff.

> Thank you, but that may be labor-intensive and time-consuming.  In addition,
> the user uses a machine with multiple CPU cores, while I only have a desktop
> PC with two CPU cores.  So I doubt I can reproduce the problem on my PC.

Well, it'll also be labor-intensive for the community to debug.

> I asked the user to change S_UNLOCK to something like the following and run
> the test during this weekend (the next Monday is a national holiday in
> Japan).
> 
> #define S_UNLOCK(lock)  InterlockedExchange(lock, 0)

That shouldn't be required. For one, on 9.4 (not 9.5!) spinlock releases
only need to prevent reordering at the CPU level. As x86 is a TSO
(total store order) architecture, that doesn't require doing anything
special. And even if more were required, on MSVC volatile reads/stores
act as acquire/release fences unless you monkey with the compiler settings.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


