Thread: Wal sync odirect
Hi, list. Here is my proposal. I would like to talk about using O_DIRECT for WAL sync when wal_level is higher than minimal. I think that in my case, where WAL traffic is up to 1 GB per 2-3 minutes but the disk hardware has a 2 GB BBU cache (or maybe an SSD under the WAL), it would be better if WAL traffic did not affect OS memory eviction. I do not use streaming, and my archive command may read WAL directly, bypassing the OS cache. This is just an opinion; I have not done any tests yet. But I am still seeing a memory eviction anomaly.
On 07/21/2013 10:01 PM, Миша Тюрин wrote:
> Hi, list. Here is my proposal. I would like to talk about using O_DIRECT for WAL sync when wal_level is higher than minimal. I think that in my case, where WAL traffic is up to 1 GB per 2-3 minutes but the disk hardware has a 2 GB BBU cache (or maybe an SSD under the WAL), it would be better if WAL traffic did not affect OS memory eviction. I do not use streaming, and my archive command may read WAL directly, bypassing the OS cache. This is just an opinion; I have not done any tests yet. But I am still seeing a memory eviction anomaly.

PostgreSQL already uses O_DIRECT for WAL writes if you use O_SYNC mode for WAL writes. See the comments in src/include/access/xlogdefs.h (search for O_DIRECT). You should also examine src/backend/access/transam/xlog.c, particularly the function get_sync_bit(...).

Try doing some tests with pg_test_fsync and see how performance looks. If your theory is right and WAL traffic is putting pressure on kernel write buffers, using wal_sync_method=open_datasync - which should be the default on Linux - may help.

I'd recommend doing some detailed tracing and performance measurements before trying to proceed further.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
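For readers who do not have the source open, the decision made in get_sync_bit() can be modelled roughly as below. This is a simplified Python sketch based on the 9.x-era xlog.c, not the actual C code: the function name `wal_open_flags`, its parameters and the string values are hypothetical, and the real function works on internal enum constants. The point it illustrates is that O_DIRECT is only added for the open_sync / open_datasync methods, and only when nothing is expected to read the WAL back soon after it is written.

```python
import os

# Names accepted by the wal_sync_method setting.
SYNC_METHODS = ("fsync", "fdatasync", "fsync_writethrough",
                "open_sync", "open_datasync")


def wal_open_flags(sync_method, wal_level="minimal",
                   fsync_enabled=True, is_walreceiver=False):
    """Return the extra open() flags a WAL segment would be opened with.

    Hypothetical model of get_sync_bit(); not PostgreSQL's real API.
    """
    if sync_method not in SYNC_METHODS:
        raise ValueError("unknown sync method: %r" % (sync_method,))

    # With fsync turned off, the file is never opened in a sync mode.
    if not fsync_enabled:
        return 0

    # O_DIRECT is only worthwhile when the WAL will not be re-read soon:
    # no archiver, no walsender (wal_level = minimal), and not walreceiver.
    o_direct = 0
    if wal_level == "minimal" and not is_walreceiver:
        o_direct = getattr(os, "O_DIRECT", 0)  # 0 where the platform lacks it

    if sync_method == "open_sync":
        return os.O_SYNC | o_direct
    if sync_method == "open_datasync":
        return os.O_DSYNC | o_direct

    # fsync, fdatasync, fsync_writethrough: sync is a separate call,
    # so no special open() flags and therefore no O_DIRECT either.
    return 0
```

With this model, `wal_open_flags("open_datasync", "archive")` returns only the O_DSYNC bit, while the same call with `"minimal"` also sets O_DIRECT where the platform defines it.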
I am talking about the case where wal_level is higher than minimal:
wal_level != minimal
http://doxygen.postgresql.org/xlogdefs_8h_source.html
" * Because O_DIRECT bypasses the kernel buffers, and because we never"
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 07/22/2013 03:30 PM, Миша Тюрин wrote:
>
> i tell about wal_level is higher than MINIMAL

OK, so you want to be able to force O_DIRECT for wal_level = archive?

I guess that makes sense if you expect the archive_command to read the file out of the RAID controller's write cache before it gets flushed and your archive_command can also use direct I/O to avoid pulling it into cache.

You already know where to change to start experimenting with this. What exactly are you trying to ask? I don't see any risk in forcing O_DIRECT for higher wal_level, but I'm not an expert in WAL and recovery. I'd recommend testing on a non-critical PostgreSQL instance.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
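As an illustration of an archive step that tries not to pull WAL into the OS cache, here is a minimal Python sketch. Note that it does not use O_DIRECT (which needs aligned buffers and filesystem support); it swaps in a simpler technique, copying through the page cache and then asking the kernel to drop the cached pages with posix_fadvise(POSIX_FADV_DONTNEED). The function name `archive_segment` and the chunk size are assumptions made for the example, and whether this actually helps with eviction pressure would have to be measured.

```python
import os

CHUNK = 1024 * 1024  # copy in 1 MiB pieces; arbitrary choice for the sketch


def archive_segment(src_path, dst_path):
    """Copy one WAL segment, then hint the kernel to drop its cached pages.

    Hypothetical helper; refuses to overwrite an existing archive file.
    """
    src = os.open(src_path, os.O_RDONLY)
    try:
        # O_EXCL makes the copy fail if the destination already exists.
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        try:
            while True:
                buf = os.read(src, CHUNK)
                if not buf:
                    break
                view = memoryview(buf)
                while len(view) > 0:  # handle short writes
                    written = os.write(dst, view)
                    view = view[written:]
            # Dirty pages cannot be dropped, so flush before the hint.
            os.fsync(dst)
            os.posix_fadvise(dst, 0, 0, os.POSIX_FADV_DONTNEED)
        finally:
            os.close(dst)
        # Length 0 means "to end of file": drop the source pages as well.
        os.posix_fadvise(src, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(src)
```

A wrapper script around such a function could be called from archive_command with the usual %p and %f placeholders. The hint is advisory only, so the kernel is free to ignore it.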
On Monday 22 July 2013 09:39:50, Craig Ringer wrote:
> On 07/22/2013 03:30 PM, Миша Тюрин wrote:
> >
> > i tell about wal_level is higher than MINIMAL
>
> OK, so you want to be able to force O_DIRECT for wal_level = archive ?
>
> I guess that makes sense if you expect the archive_command to read the
> file out of the RAID controller's write cache before it gets flushed and
> your archive_command can also use direct I/O to avoid pulling it into cache.
>
> You already know where to change to start experimenting with this. What
> exactly are you trying to ask? I don't see any risk in forcing O_DIRECT
> for higher wal_level, but I'm not an expert in WAL and recovery. I'd
> recommend testing on a non-critical PostgreSQL instance.

IIRC there is also some fadvise() call to flush the buffer cache when using
'minimal', but not when using archiving of WAL.
I'm unsure how this has been tuned with the addition of streaming replication.

see xlog.c|h

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: 24x7 Support - Development, Expertise and Training
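The behaviour Cédric recalls can be pictured with the short Python model below. It is a sketch under assumptions, not the server's code: `close_wal_segment` is a hypothetical name, and the condition is reduced to a string comparison on wal_level. The idea is that when a finished WAL segment is closed, the cache-drop hint is only issued if nothing (archiver or walsender) is expected to read the file again.

```python
import os


def close_wal_segment(fd, wal_level):
    """Close a finished WAL segment; return True if a cache-drop hint was sent.

    Hypothetical model of the fadvise-on-close behaviour described above.
    """
    hinted = False
    # Only drop cached pages when the WAL will not be re-read soon,
    # i.e. archiving and streaming are both off (wal_level = minimal).
    if wal_level == "minimal" and hasattr(os, "posix_fadvise"):
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        hinted = True
    os.close(fd)
    return hinted
```

Forcing the hint (or O_DIRECT) at higher wal_level settings would trade a likely physical read by the archive command for less page-cache pressure, which is the trade-off under discussion in this thread.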