Re: XLogInsert scaling, revisited - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: XLogInsert scaling, revisited
Msg-id CAHGQGwGJYD=TZgqLXB57s-ZoWVeQ8FLLGzPAKDM84+s4Uf8shw@mail.gmail.com
In response to Re: XLogInsert scaling, revisited  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On Fri, Sep 28, 2012 at 12:58 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> Hmm, I cannot reproduce this on my Linux laptop. However, I think I see what
> the problem is: the assertion should assert that (*CurrPos* % XLOG_BLCKSZ >=
> SizeOfXLogShortPHD), not currpos. The former is an XLogRecPtr, the latter is
> a pointer. If the WAL buffers are aligned at 8k boundaries, the effect is
> the same, but otherwise the assertion is just wrong. And as it happens, if
> O_DIRECT is defined, we align WAL buffers at XLOG_BLCKSZ. I think that's why
> I don't see this on my laptop. Does Mac OS X not define O_DIRECT?

Yes, AFAIK Mac OS doesn't support O_DIRECT.
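
For anyone following along, here is a minimal standalone sketch of why checking the pointer instead of the XLogRecPtr only works when the WAL buffers happen to be aligned. It is not code from the patch: BLCKSZ_SKETCH and SHORT_HDR_SKETCH are stand-ins for XLOG_BLCKSZ and SizeOfXLogShortPHD (the header size used here is illustrative only), and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLCKSZ_SKETCH    8192u  /* stand-in for XLOG_BLCKSZ */
#define SHORT_HDR_SKETCH 24u    /* stand-in for SizeOfXLogShortPHD; illustrative value */

/* Offset of a WAL position (modelled as a plain 64-bit integer) within its page. */
static inline uint64_t
pos_page_offset(uint64_t wal_pos)
{
    return wal_pos % BLCKSZ_SKETCH;
}

/*
 * Offset of a memory address within an 8k window. This matches
 * pos_page_offset() only if the buffer base is 8k-aligned.
 */
static inline uint64_t
addr_page_offset(const char *p)
{
    return (uint64_t) ((uintptr_t) p % BLCKSZ_SKETCH);
}

/* Map a WAL position to an address inside a one-page buffer starting at base. */
static inline const char *
pos_to_addr(const char *base, uint64_t wal_pos)
{
    return base + (wal_pos % BLCKSZ_SKETCH);
}

/* Round an address up to the next 8k boundary (what O_DIRECT-style alignment gives). */
static inline char *
align_up(char *p)
{
    uintptr_t mask = (uintptr_t) BLCKSZ_SKETCH - 1;

    return (char *) (((uintptr_t) p + mask) & ~mask);
}
```

With an aligned base, both offsets agree, so an assertion written against the pointer passes by accident. With an unaligned base, a position well past the page header can map to an address whose offset is below the header size, and the pointer-based assertion fails even though the position is valid.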

> Anyway, attached is a patch with that fixed.

Thanks! With the new patch, initdb completed successfully.

I encountered another strange issue: when I called pg_switch_xlog() while
pgbench -j 1 -c 1 -T 600 was running, both pg_switch_xlog() and all of the
pgbench connections got stuck.

Here is the backtrace of stuck pg_switch_xlog():
(gdb) bt
#0  0x00007fff8fe13c46 in semop ()
#1  0x0000000106b97d34 in PGSemaphoreLock ()
#2  0x0000000106a2e8cf in WaitXLogInsertionsToFinish ()
#3  0x0000000106a2fe8b in XLogInsert ()
#4  0x0000000106a30576 in RequestXLogSwitch ()
#5  0x0000000106a37950 in pg_switch_xlog ()
#6  0x0000000106b19bd3 in ExecMakeFunctionResult ()
#7  0x0000000106b14be1 in ExecProject ()
#8  0x0000000106b2b83d in ExecResult ()
#9  0x0000000106b14000 in ExecProcNode ()
#10 0x0000000106b13080 in standard_ExecutorRun ()
#11 0x0000000106be919f in PortalRunSelect ()
#12 0x0000000106bea5c9 in PortalRun ()
#13 0x0000000106be8519 in PostgresMain ()
#14 0x0000000106ba4ef9 in PostmasterMain ()
#15 0x0000000106b418f1 in main ()

Here is the backtrace of stuck pgbench connection:
(gdb) bt
#0  0x00007fff8fe13c46 in semop ()
#1  0x0000000106b97d34 in PGSemaphoreLock ()
#2  0x0000000106bda95e in LWLockAcquireWithCondVal ()
#3  0x0000000106a25556 in WALInsertLockAcquire ()
#4  0x0000000106a2fa8a in XLogInsert ()
#5  0x0000000106a0386d in heap_update ()
#6  0x0000000106b2a03e in ExecModifyTable ()
#7  0x0000000106b14010 in ExecProcNode ()
#8  0x0000000106b13080 in standard_ExecutorRun ()
#9  0x0000000106be9ceb in ProcessQuery ()
#10 0x0000000106be9eec in PortalRunMulti ()
#11 0x0000000106bea71e in PortalRun ()
#12 0x0000000106be8519 in PostgresMain ()
#13 0x0000000106ba4ef9 in PostmasterMain ()
#14 0x0000000106b418f1 in main ()

Though I've not read the patch yet, the locking mechanism in XLogInsert
probably has a bug that causes the above problem.

Regards,

-- 
Fujii Masao


