PostgreSQL, Asynchronous I/O, Buffered I/O and why did fsync-gate not affect Oracle or MySQL? - Mailing list pgsql-general
From | Pól Ua Laoínecháin |
---|---|
Subject | PostgreSQL, Asynchronous I/O, Buffered I/O and why did fsync-gate not affect Oracle or MySQL? |
Date | |
Msg-id | CAF4RT5S-yddHmGzxZwpJZwxLkSzk22F-FuxpVrwaD+Pn03evaQ@mail.gmail.com Whole thread Raw |
Responses |
Re: PostgreSQL, Asynchronous I/O, Buffered I/O and why did fsync-gate not affect Oracle or MySQL?
Re: PostgreSQL, Asynchronous I/O, Buffered I/O and why did fsync-gate not affect Oracle or MySQL? Re: PostgreSQL, Asynchronous I/O, Buffered I/O and why did fsync-gate not affect Oracle or MySQL? |
List | pgsql-general |
Hi all, On 2019/10/09, I posted a question here concerning PostgreSQL I/O, the (primarily) Linux fsync problem and my lecturer's attitude to PostgreSQL (text of that email is at the bottom of this post). I asked why the fsync issue didn't affect Oracle and/or MySQL for example? As far as I can see, this was because Oracle uses Direct I/O whereas PostgreSQL uses Buffered I/O. I know that the issue has been resolved and no longer affects currently supported versions - but I'm still curious... From here: https://www.percona.com/blog/2019/02/22/postgresql-fsync-failure-fixed-minor-versions-released-feb-14-2019/ > Whereas, writing the modified/dirty buffers to datafiles from shared buffers is always through Buffered IO. Now, I'm not quite sure that I completely comprehend matters: Is there a difference between Asynchronous I/O and Buffered I/O? If so, could some kind person point me to referernces that explain the difference? But, my foggy brain aside, I read (can't find URL - paraphrasing): PostgreSQL is happy to let the kernel take the load off the server and look after I/O - that's fine, but you'd better be able to trust your kernel. However, MySQL also uses Asynchronous I/O by default - or does it? From here: https://dev.mysql.com/doc/refman/8.0/en/optimizing-innodb-diskio.html#:~:text=InnoDB%20uses%20the%20asynchronous%20I,which%20is%20enabled%20by%20default, we have: > InnoDB uses the asynchronous I/O subsystem (native AIO) on Linux to perform read-ahead and write requests for data filepages. This behavior is controlled by the innodb_use_native_aio configuration option, which is enabled by default. Now, I haven't (AFAIK) seen references to problems with this fsync issue on MySQL. Maybe they're so used to losing data, nobody noticed? :-) Seriously though, it does beg the question - why did this cause a major issue for PostgreSQL but not for MySQL? Is it because of (a) difference(s) between Asynchronous I/O and Buffered I/O asked about above? A couple of pointers (excuse the pun!) about this issue would be great. Some stuff which shows the difference between Direct I/O and the others would also be helpful. I seem to remember there was mention of this for PostgreSQL but that it would be a mulit-year project. Why so if PostgreSQL can already use Direct I/O for the WAL logs? Obviously, I can search and I have been - but I'd appreciate material from people here who can sort the wheat from the chaff and point me to solid references. If here is not a suitable forum, then kindly redirect me. TIA and rgs, Pól Ua... ============================= > 2019/10/09 Is my lecturer wrong about PostgreSQL? I think he is! I recently started a Masters in Computer Science (and not at the institution in my email address). One of my courses is "Advanced Databases" - yummy I thought - it's not even compulsory for me but I just *_had_* to take this module. The lecturer is a bit of an Oracle fan-boy (ACE director no less... hmmm...) and I want(ed) - becoming less enthusiasic by the minute - to do my dissertation with him. So, we're having a chat and I make plain my love of good 'ol PostgreSQL as my RDBMS of choice and he tells me that there are problems with random block corruption with PostgreSQL. I said "really" and before that conversation could go any further, another student came over and asked a question. So, I toddled off and did some research - I had heard something about this before (vague fuzzy memories) of a problem with the Linux kernel so I searched for a bit and duly dug up a couple of pages https://lwn.net/Articles/752063/ : PostgreSQL's fsync() surprise - and https://news.ycombinator.com/item?id=19238121 : Linux Fsync Issue for Buffered IO and Its Preliminary Fix for PostgreSQL So, this week I go back to my lecturer and say, yep, there was some issue but it was a Linux kernel problem and not PostgreSQL's fault and has been resolved. He tells me that he knew about that but that there was another issue (he had "spoken to people" at meetings!). I said "well, why isn't it fixed?" and he replied "where's the impetus?" to which I responded (quite shocked at this stage) something like "well, I know that the core team values correctness very highly" to which he came back with "yes, but they have no commercial imperative to fix anything - they have to wait until somebody is capable enough and interested enough to do the work". He then muttered something about this mysterious flaw having been fixed in EnterpriseDB. At this point, I lost interest. Having lurked on lists and going by my general "gut feeling" - if there was a serious issue causing irrecoverable block corruption, I'm pretty sure that it would be "all hands on deck" until this problem had been solved and "nice-to-haves" (GENERATED AS... for example) would have been parked till then. Now, I have four questions: 1) Is my lecturer full of it or does he really have a point? 2) The actual concrete acknowledged problem with fsync that affected PostgreSQL - why didn't it affect Oracle? Or MySQL? Or did it but it was so rare that it never became apparent - it wasn't that obvious with PostgreSQL either - one of those rare and intermittent problems? 3) Were there ever any problems with BSD? 4) What is the OS of choice for *_serious_* PostgreSQL installations? I hope that I have been clear, but should anyone require any clarification, please don't hesitate to ask me. Tia and rgs, Pól...
pgsql-general by date: