Thread: Wal sync odirect

Wal sync odirect

From
Миша Тюрин
Date:
Hi, list. Here is my proposal. I would like to talk about O_DIRECT for WAL sync when wal_level is higher than minimal.
In my case WAL traffic is up to 1 GB per 2-3 minutes, but the disk hardware has a 2 GB BBU cache (or there may be an SSD
under the WAL). It would be better if WAL traffic could not harm OS memory eviction. I do not use streaming, and my
archive command may read WAL directly, without the OS cache. This is just an opinion; I have not done any tests yet.
But I am still seeing some memory eviction anomaly.

Re: Wal sync odirect

From
Craig Ringer
Date:
On 07/21/2013 10:01 PM, Миша Тюрин wrote:
> Hi, list. Here is my proposal. I would like to talk about O_DIRECT for WAL sync when wal_level is higher than minimal.
> In my case WAL traffic is up to 1 GB per 2-3 minutes, but the disk hardware has a 2 GB BBU cache (or there may be an SSD
> under the WAL). It would be better if WAL traffic could not harm OS memory eviction. I do not use streaming, and my
> archive command may read WAL directly, without the OS cache. This is just an opinion; I have not done any tests yet.
> But I am still seeing some memory eviction anomaly.
 

PostgreSQL already uses O_DIRECT for WAL writes if you use O_SYNC mode
for WAL writes. See comments in src/include/access/xlogdefs.h (search
for O_DIRECT). You should also examine
src/backend/access/transam/xlog.c, particularly the function
get_sync_bit(...)
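
As a rough illustration of that decision, here is a simplified Python model of what get_sync_bit() appears to do in the 9.x sources. It is a sketch, not the actual C code: the names OpenFlag and sync_open_flags are invented for this example, and the flag values are arbitrary stand-ins for the real open(2) flags.

```python
from enum import IntFlag


class OpenFlag(IntFlag):
    """Symbolic stand-ins for open(2) flags; the values are arbitrary."""
    NONE = 0
    SYNC = 1
    DSYNC = 2
    DIRECT = 4


def sync_open_flags(method, fsync_enabled, wal_level, is_wal_receiver=False):
    """Return the extra open() flags a WAL writer would use (simplified model)."""
    if not fsync_enabled:
        return OpenFlag.NONE
    # O_DIRECT only pays off when nothing re-reads the WAL soon after it is
    # written, i.e. no archiving or streaming (wal_level = minimal).
    direct = OpenFlag.NONE
    if wal_level == "minimal" and not is_wal_receiver:
        direct = OpenFlag.DIRECT
    if method == "open_sync":
        return OpenFlag.SYNC | direct
    if method == "open_datasync":
        return OpenFlag.DSYNC | direct
    # fsync, fsync_writethrough and fdatasync sync explicitly after write(),
    # so no special open() flags are added for them.
    return OpenFlag.NONE
```

In this model, O_DIRECT is added only for the open_* methods and only when wal_level is minimal.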

Try running some tests with pg_test_fsync and see how performance looks. If
your theory is right and WAL traffic is putting pressure on kernel write
buffers, setting wal_sync_method = open_datasync may help. (Check the
default on your platform; on Linux it is fdatasync.)
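
For a quick feel of how the sync methods differ before running pg_test_fsync, a small timing loop like the one below can be used. It is only a sketch under several assumptions: it runs on a POSIX system, the function name time_sync_writes is made up here, it writes 8 kB blocks to a temporary file, and it does not attempt O_DIRECT or aligned buffers, so it is no substitute for pg_test_fsync.

```python
import os
import tempfile
import time

BLOCK = 8192  # 8 kB, the usual PostgreSQL WAL block size


def time_sync_writes(method, writes=20, directory=None):
    """Return the average seconds per synchronous 8 kB write for `method`."""
    flags = os.O_WRONLY
    if method == "open_datasync":
        flags |= getattr(os, "O_DSYNC", os.O_SYNC)
    elif method == "open_sync":
        flags |= os.O_SYNC
    elif method not in ("fsync", "fdatasync"):
        raise ValueError("unknown sync method: %r" % (method,))

    tmp_fd, path = tempfile.mkstemp(dir=directory)
    os.close(tmp_fd)
    fd = os.open(path, flags)
    buf = b"\0" * BLOCK
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.write(fd, buf)
            if method == "fsync":
                os.fsync(fd)
            elif method == "fdatasync":
                # Not every platform has fdatasync(); fall back to fsync().
                getattr(os, "fdatasync", os.fsync)(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed / writes
```

Pass directory= to point the test at the filesystem that holds pg_xlog; results from a temporary directory on another device say little about the WAL disk.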

I'd recommend doing some detailed tracing and performance measurements
before trying to proceed further.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Wal sync odirect

From
Миша Тюрин
Date:

I am talking about the case where wal_level is higher than minimal
(wal_level != minimal). From
http://doxygen.postgresql.org/xlogdefs_8h_source.html:

"
 * Because O_DIRECT bypasses the kernel buffers, and because we never
 * read those buffers except during crash recovery or if wal_level != minimal
"


Re: Re: [HACKERS] Wal sync odirect

From
Craig Ringer
Date:
On 07/22/2013 03:30 PM, Миша Тюрин wrote:
> 
> i tell about wal_level is higher than MINIMAL

OK, so you want to be able to force O_DIRECT for wal_level = archive?

I guess that makes sense if you expect the archive_command to read the
file out of the RAID controller's write cache before it gets flushed and
your archive_command can also use direct I/O to avoid pulling it into cache.
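
To make the second half of that concrete, here is a minimal sketch of an archive-side copy that tries to read the segment with O_DIRECT. It assumes Linux; the function name copy_bypassing_cache and the 1 MB buffer size are choices made for this example, and it falls back to a normal read plus posix_fadvise(POSIX_FADV_DONTNEED) where the filesystem rejects O_DIRECT. Only the read side bypasses the cache; the destination is written normally.

```python
import mmap
import os

BLOCK = 1024 * 1024  # a multiple of any common sector and page size


def copy_bypassing_cache(src, dst):
    """Copy src to dst, trying O_DIRECT for the read; return bytes copied."""
    direct = getattr(os, "O_DIRECT", 0)  # Linux-specific flag
    try:
        fd = os.open(src, os.O_RDONLY | direct)
    except OSError:
        # Some filesystems (tmpfs, for example) reject O_DIRECT.
        fd = os.open(src, os.O_RDONLY)
        direct = 0
    buf = mmap.mmap(-1, BLOCK)  # anonymous mapping, so page-aligned
    copied = 0
    try:
        size = os.fstat(fd).st_size
        with open(dst, "wb") as out:
            while copied < size:
                n = os.readv(fd, [buf])
                if n == 0:
                    break
                out.write(buf[:n])
                copied += n
        if not direct and hasattr(os, "posix_fadvise"):
            # Fallback path: ask the kernel to drop the pages just read.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        buf.close()
        os.close(fd)
    return copied
```

An archive_command could call a script built around this with %p as src and the archive location as dst.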

You already know where to make the changes to start experimenting with this. What
exactly are you trying to ask? I don't see any risk in forcing O_DIRECT
for higher wal_level, but I'm not an expert in WAL and recovery. I'd
recommend testing on a non-critical PostgreSQL instance.




Re: Re: [HACKERS] Wal sync odirect

From
Cédric Villemain
Date:
On Monday 22 July 2013 09:39:50, Craig Ringer wrote:
> On 07/22/2013 03:30 PM, Миша Тюрин wrote:
> >
> > i tell about wal_level is higher than MINIMAL
>
> OK, so you want to be able to force O_DIRECT for wal_level = archive ?
>
> I guess that makes sense if you expect the archive_command to read the
> file out of the RAID controller's write cache before it gets flushed and
> your archive_command can also use direct I/O to avoid pulling it into cache.
>
> You already know where to change to start experimenting with this. What
> exactly are you trying to ask? I don't see any risk in forcing O_DIRECT
> for higher wal_level, but I'm not an expert in WAL and recovery. I'd
> recommend testing on a non-critical PostgreSQL instance.

IIRC there is also some fadvise() call to flush the buffer cache when using
'minimal', but not when using archiving of WAL.
I'm unsure how this has been tuned with the addition of streaming replication.

see xlog.c|h

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: 24x7 Support - Development, Expertise and Training
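
The fadvise() behaviour mentioned above can be sketched as follows. This is a Python illustration of the idea, not the code in xlog.c: the function name close_wal_segment is invented here, and the condition is a simplified reading of what the close path seems to do, namely advising the kernel to drop cached pages only when nothing is expected to re-read the segment.

```python
import os


def close_wal_segment(fd, wal_level):
    """Close fd; return True if the kernel was advised to drop its cached pages."""
    advised = False
    # Dropping the cache only makes sense when no archiver or walsender
    # will read the segment again soon, i.e. wal_level = minimal.
    if wal_level == "minimal" and hasattr(os, "posix_fadvise"):
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        advised = True
    os.close(fd)
    return advised
```

With wal_level = archive the advice is skipped, so the archive command can still find the segment in the OS cache.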