Thread: PostgreSQL, Asynchronous I/O, Buffered I/O and why did fsync-gate not affect Oracle or MySQL?

Hi all,


On 2019/10/09, I posted a question here concerning PostgreSQL I/O, the
(primarily) Linux fsync problem and my lecturer's attitude to
PostgreSQL (text of that email is at the bottom of this post).


I asked why the fsync issue didn't affect, for example, Oracle or MySQL.

As far as I can see, this was because Oracle uses Direct I/O whereas
PostgreSQL uses Buffered I/O. I know that the issue has been resolved
and no longer affects currently supported versions - but I'm still
curious...


From here: https://www.percona.com/blog/2019/02/22/postgresql-fsync-failure-fixed-minor-versions-released-feb-14-2019/

> Whereas, writing the modified/dirty buffers to datafiles from shared buffers is always through Buffered IO.


Now, I'm not quite sure that I completely comprehend matters: Is there
a difference between Asynchronous I/O and Buffered I/O?

If so, could some kind person point me to references that explain the
difference?


But, my foggy brain aside, I read (can't find URL - paraphrasing):
PostgreSQL is happy to let the kernel take the load off the server and
look after I/O - that's fine, but you'd better be able to trust your
kernel.
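To check that I have the basic mechanics right, here's my understanding in code form - a minimal sketch using plain POSIX-style calls from Python (nothing PostgreSQL-specific; the file name is made up):

```python
# Sketch of "trusting the kernel": a buffered write() returns as soon
# as the data is in the kernel page cache; only fsync() forces it to
# stable storage - and only fsync() can report a writeback error.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "datafile")  # illustrative path
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
try:
    os.write(fd, b"committed row\n")  # lands in the page cache, not yet on disk
    os.fsync(fd)                      # raises OSError if writeback failed
finally:
    os.close(fd)
```

If I've understood correctly, the write() "succeeding" says nothing about durability - it's the fsync() that matters, which is exactly where the kernel bug bit.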


However, MySQL also uses Asynchronous I/O by default - or does it?

From here:
https://dev.mysql.com/doc/refman/8.0/en/optimizing-innodb-diskio.html#:~:text=InnoDB%20uses%20the%20asynchronous%20I,which%20is%20enabled%20by%20default,
we have:

> InnoDB uses the asynchronous I/O subsystem (native AIO) on Linux to perform read-ahead and write requests for data file pages. This behavior is controlled by the innodb_use_native_aio configuration option, which is enabled by default.


Now, I haven't (AFAIK) seen references to problems with this fsync
issue on MySQL. Maybe they're so used to losing data, nobody noticed?
:-)


Seriously though, it does raise the question - why did this cause a
major issue for PostgreSQL but not for MySQL?


Is it because of the difference(s) between Asynchronous I/O and
Buffered I/O asked about above?


A couple of pointers (excuse the pun!) about this issue would be
great. Some stuff which shows the difference between Direct I/O and
the others would also be helpful. I seem to remember there was mention
of this for PostgreSQL but that it would be a multi-year project. Why
so if PostgreSQL can already use Direct I/O for the WAL logs?

Obviously, I can search and I have been - but I'd appreciate material
from people here who can sort the wheat from the chaff and point me to
solid references. If here is not a suitable forum, then kindly
redirect me.

TIA and rgs,


Pól Ua...



=============================

> 2019/10/09 Is my lecturer wrong about PostgreSQL? I think he is!


I recently started a Masters in Computer Science (and not at the
institution in my email address).

One of my courses is "Advanced Databases" - yummy I thought - it's not
even compulsory for me but I just *_had_* to take this module. The
lecturer is a bit of an Oracle fan-boy (ACE director no less...
hmmm...) and I want(ed) - becoming less enthusiastic by the minute - to
do my dissertation with him. So, we're having a chat and I make plain
my love of good ol' PostgreSQL as my RDBMS of choice and he tells me
that there are problems with random block corruption with PostgreSQL.
I said "really" and before that conversation could go any further,
another student came over and asked a question.

So, I toddled off and did some research - I had heard something
(vague, fuzzy memories) about a problem with the Linux kernel before,
so I searched for a bit and duly dug up a couple of pages:

https://lwn.net/Articles/752063/ : PostgreSQL's fsync() surprise - and

https://news.ycombinator.com/item?id=19238121 : Linux Fsync Issue for
Buffered IO and Its Preliminary Fix for PostgreSQL

So, this week I go back to my lecturer and say, yep, there was some
issue but it was a Linux kernel problem and not PostgreSQL's fault and
has been resolved.

He tells me that he knew about that but that there was another issue
(he had "spoken to people" at meetings!). I said "well, why isn't it
fixed?" and he replied "where's the impetus?" to which I responded
(quite shocked at this stage) something like "well, I know that the
core team values correctness very highly" to which he came back with
"yes, but they have no commercial imperative to fix anything - they
have to wait until somebody is capable enough and interested enough to
do the work". He then muttered something about this mysterious flaw
having been fixed in EnterpriseDB.

At this point, I lost interest. Having lurked on lists and going by my
general "gut feeling" - if there was a serious issue causing
irrecoverable block corruption, I'm pretty sure that it would be "all
hands on deck" until this problem had been solved and "nice-to-haves"
(GENERATED AS... for example) would have been parked till then.

Now, I have four questions:

1) Is my lecturer full of it or does he really have a point?

2) The actual concrete acknowledged problem with fsync that affected
PostgreSQL - why didn't it affect Oracle? Or MySQL? Or did it but it
was so rare that it never became apparent - it wasn't that obvious
with PostgreSQL either - one of those rare and intermittent problems?

3) Were there ever any problems with BSD?

4) What is the OS of choice for *_serious_* PostgreSQL installations?

I hope that I have been clear, but should anyone require any
clarification, please don't hesitate to ask me.

Tia and rgs,

Pól...




This wiki page.
It has PR references for MySQL and Mongo for the fsync issue.


I'd leave the more intellectual brainstorming to the experts.

Also, ask for concrete references / reproducible scenarios behind
opinions if you care; else it leads to rumours :)

Thanks,
Vijay




On 5/2/21 8:46 AM, Pól Ua Laoínecháin wrote:
[snip]
> Now, I'm not quite sure that I completely comprehend matters: Is there
> a difference between Asynchronous I/O and Buffered I/O?

* Asynchronous (a-syn-chron-ous) is an adjective which means "not together with time".
* Buffered means "read more than you need at the moment, and then do processing to/from a cache".

They are not mutually exclusive.

Their antonyms are
* Synchronous (syn-chron-ous): together with time.
* Direct, where you read/write only what you need at the moment, directly to/from the IO device.

Thus, async IO is where you tell the IO subsystem that you need something, and then go off and do something else; the IO system interrupts you when the data has been delivered.
Synchronous IO is where you request IO and then wait for the data.
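That model can be sketched in a few lines of Python - using a worker thread as a stand-in for a real kernel AIO facility such as libaio or io_uring, purely to illustrate the programming model (the file name is made up):

```python
# Sketch of the asynchronous *programming model* only: submit the
# read, keep working, and block only when the result is needed.
# A thread pool plays the role of the async I/O subsystem here.
from concurrent.futures import ThreadPoolExecutor
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "blob")  # illustrative path
with open(path, "wb") as f:
    f.write(b"some data")

def read_all(p):
    with open(p, "rb") as f:
        return f.read()

with ThreadPoolExecutor() as pool:
    future = pool.submit(read_all, path)  # initiate the I/O...
    other_work = sum(range(1000))         # ...and do something else meanwhile
    data = future.result()                # block only when the data is needed
```

Synchronous I/O would be the same read_all() call made inline: the caller waits the whole time.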

--
Angular momentum makes the world go 'round.
On 2021-05-02 14:46:41 +0100, Pól Ua Laoínecháin wrote:
> Now, I'm not quite sure that I completely comprehend matters: Is there
> a difference between Asynchronous I/O and Buffered I/O?

Yes.

Buffered I/O means that there is a buffer (or maybe several layers of
buffers) between the application and the device: A read request might
return data which has been cached from a previous read request without
touching the device. And a write request will return as soon as the data
is written to the buffer (it will be written to the device at some later
time). The opposite of buffered I/O is direct I/O, which always talks
directly to the device and doesn't use any buffers.
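A minimal sketch of that buffered behaviour in Python (illustrative file names; this shows the OS page cache, not PostgreSQL's shared buffers): a second, independent reader sees the data even though fsync() has never been called.

```python
# Sketch: with buffered I/O, a write goes to the kernel's cache, and
# a separate reader sees it immediately - durability is a separate
# question entirely (no fsync() is ever issued here).
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared")  # illustrative path
wfd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
os.write(wfd, b"not yet durable")        # in the page cache only

rfd = os.open(path, os.O_RDONLY)         # an independent reader...
assert os.read(rfd, 64) == b"not yet durable"  # ...sees it at once

os.close(rfd)
os.close(wfd)                            # no fsync: durability not guaranteed
```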

Asynchronous I/O refers to a different programming model: Any read or
write request only initiates the data transfer and returns immediately.
The application will later be notified when the request is finished.
This is very different from the traditional (in Unix) synchronous
programming model where a read would block until the data was actually
available and a write would block until the data was safely transferred
to the OS's buffer cache (and can be overwritten by the application).

In theory all four combinations (buffered synchronous, buffered
asynchronous, direct synchronous, direct asynchronous) are possible, but
some OS's may not implement all of them.
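As a rough sketch of how those combinations surface on Linux (the flag names are real, but this is illustrative, not how any particular database opens its files):

```python
# Buffered vs direct is chosen with open(2) flags; async vs sync is a
# property of the submission API, not of the open() call.
import os

buffered_sync = os.O_WRONLY                            # default: page cache, blocking
direct_sync = os.O_WRONLY | getattr(os, "O_DIRECT", 0)  # bypass the cache (Linux-only flag)

# The asynchronous variants use a different *submission* interface -
# POSIX AIO, libaio, or io_uring on Linux - each of which can be used
# with or without O_DIRECT on the underlying file descriptor.
```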

        hp


--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Hi Vijay, and thanks for your input.

> This wiki page.
> It has PR references for MySQL and Mongo for the fsync issue.
> Fsync Errors - PostgreSQL wiki


OK - so that shows that it did affect MySQL and MongoDB... I'm
surprised that there wasn't more of a brouhaha over it, what with
MySQL being so prevalent - yet there appears to have been more
coverage of the issue on the PostgreSQL side, even though it was
first spotted by PG developers.


> I'd leave the more intellectual brainstorming to the experts.

Indeed - some of those dev list conversations can be very involved...

Rgs,


Pól...

> Vijay



Hi Ron, and thanks for your input.

> > Now, I'm not quite sure that I completely comprehend matters: Is there
> > a difference between Asynchronous I/O and Buffered I/O?

> * Asynchronous (a-syn-chron-ous) is an adjective which means "not together with time".
> * Buffered means "read more than you need at the moment, and then do processing to/from a cache".
> Their antonyms are
> * Synchronous (syn-chron-ous): together with time.
> * Direct, where you read/write only what you need at the moment, directly to/from the IO device.

> Thus, async IO is where you tell the IO subsystem that you need something, and then go off and do something else; the IO system interrupts you when the data has been delivered.
> Synchronous IO is where you request IO and then wait for the data.

Grand - that's the conclusion I was coming to myself...

If you (or anyone) would have any good detailed technical references
which explain these issues, I'd be very grateful.

Rgs,

Pól...



Hi Peter, and thanks for your input.

> > Now, I'm not quite sure that I completely comprehend matters: Is there
> > a difference between Asynchronous I/O and Buffered I/O?

> Buffered I/O... <explanation snipped>

So, with buffered I/O, I might write something to the db (with a
commit) and would get a "yes, written" response (almost
instantaneously), even though the data hadn't actually touched the disk?

And any subsequent (independent - separate user) read request will see
that data as having been "written" even though it hasn't actually
reached the disk yet? It would do this, because read requests would
first look through buffers before actually checking the disk?

> Asynchronous I/O refers to a different programming model: Any read or
> write request only initiates the data transfer and returns immediately.
> The application will later be notified when the request is finished.
> This is very different from the traditional (in Unix) synchronous
> programming model where a read would block until the data was actually
> available and a write would block until the data was safely transferred
> to the OS's buffer cache (and can be overwritten by the application).


So, any separate read would see what? What does PostgreSQL do? Or is
this configurable?

Also, how does the WAL fit into all of this?

As I responded to others who have taken the trouble to reply to me -
if you (or anyone) would have some detailed references covering these
issues (either PG specific or generally), I'd be very grateful.

Best regards and many thanks.

Pól...

>         hp