Thread: Beginner Question: Why is Postgres more reliable than plain CSV file storage for disaster recovery?
Beginner Question: Why is Postgres more reliable than plain CSV file storage for disaster recovery?
From
Wen Yi
Date:
I am a student interested in database kernels. While reviewing my database course, a question confused me.
In a file system, if an error happens while I am inserting data into a data store, the existing data can be corrupted and cannot be recovered.
But when I checked the Postgres source code, I found that Postgres also uses the write function (which is a UNIX file system API).
My question is:
Since it is all built on top of the file system, why is Postgres more reliable than plain CSV file storage for disaster recovery?
Thanks in advance!
Re: Beginner Question: Why is Postgres more reliable than plain CSV file storage for disaster recovery?
From
Adrian Klaver
Date:
On 7/3/22 20:06, Wen Yi wrote:
> I am a student interested in database kernels. While reviewing my
> database course, a question confused me.
>
> In a file system, if an error happens while I am inserting data into a
> data store, the existing data can be corrupted and cannot be recovered.
>
> But when I checked the Postgres source code, I found that Postgres also
> uses the write function (which is a UNIX file system API).
>
> My question is:
>
> Since it is all built on top of the file system, why is Postgres more
> reliable than plain CSV file storage for disaster recovery?

https://www.postgresql.org/docs/current/wal.html

> Thanks in advance!

-- 
Adrian Klaver
adrian.klaver@aklaver.com
Re: Beginner Question: Why is Postgres more reliable than plain CSV file storage for disaster recovery?
From
Tom Lane
Date:
Wen Yi <chuxuec@outlook.com> writes:
> Since it is all built on top of the file system, why is Postgres more
> reliable than plain CSV file storage for disaster recovery?

Sure, Postgres cannot be any more reliable than the filesystem it's sitting on top of (nor the physical storage underneath that, etc etc). However, if you're comparing to some program that just writes a flat file in CSV format or the like, that program is probably not even *trying* to offer reliable storage. Some things that are likely missing:

* POSIX-compatible file systems promise nothing about the durability of data that hasn't been successfully fsync'd. You need to issue fsync's, and you need a plan about what to do if you crash between writing some data and getting an fsync confirmation, because maybe those bits are safely down on disk, or maybe they aren't, or maybe just some of them are.

* If you did crash partway through an update, you'd like some assurances that the user-visible state after recovery will be what it was before starting the failed update. That CSV-using program probably isn't even trying to do that. Getting back to a consistent state after a crash typically involves some scheme along the lines of replaying a write-ahead log.

* None of this is worth anything if you can't even tell the difference between good data and bad data. CSV is pretty low on redundancy --- not as bad as some formats, sure, but it's far from checkable.

There's more to it than that, but if there's not any attention to crash recovery then it's not what I'd call a database. The filesystem alone won't promise much here.

			regards, tom lane
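The three points above (fsync before trusting a write, replaying a log after a crash, and distinguishing good data from bad with a checksum) can be sketched in a toy write-ahead log. This is a minimal illustration only, not Postgres's actual WAL format; the file name `demo.wal` and the length-plus-CRC record layout are invented for the example.

```python
# Toy write-ahead log: each record is (length, crc32, payload), fsync'd on
# append. Recovery replays only records whose checksum verifies and stops
# at the first torn or corrupted tail left by a crash.
import os
import struct
import zlib

WAL_PATH = "demo.wal"  # hypothetical file name for this sketch

def wal_append(payload: bytes) -> None:
    """Append one record and force it to stable storage with fsync."""
    rec = struct.pack("<II", len(payload), zlib.crc32(payload)) + payload
    fd = os.open(WAL_PATH, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, rec)
        os.fsync(fd)  # durability point: record survives a crash after this
    finally:
        os.close(fd)

def wal_replay() -> list[bytes]:
    """Return all intact records; ignore a torn/corrupted tail."""
    records = []
    try:
        with open(WAL_PATH, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return records
    off = 0
    while off + 8 <= len(data):
        length, crc = struct.unpack_from("<II", data, off)
        payload = data[off + 8 : off + 8 + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # partial write or bad checksum: stop replay here
        records.append(payload)
        off += 8 + length
    return records

if os.path.exists(WAL_PATH):
    os.remove(WAL_PATH)  # start from a clean file for the demo
wal_append(b"INSERT 1")
wal_append(b"INSERT 2")
print(wal_replay())  # [b'INSERT 1', b'INSERT 2']
```

A plain CSV appender has none of these pieces: nothing forces the bytes to disk, a crash mid-append leaves a half-written line indistinguishable from a good one, and there is no checksum to detect the damage.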