
Re: Large files for relations - Mailing list pgsql-hackers

From	MARK CALLAGHAN
Subject	Re: Large files for relations
Date	May 15, 2023 16:43:17
Msg-id	CAFbpF8OaxX+ZhKb=XTnLxGgJZxC8iTxEF_YeNEjwWWZNG1tAEQ@mail.gmail.com
In response to	Re: Large files for relations (Thomas Munro <thomas.munro@gmail.com>)
List	pgsql-hackers


On Fri, May 12, 2023 at 4:02 PM Thomas Munro <thomas.munro@gmail.com> wrote:

> On Sat, May 13, 2023 at 4:41 AM MARK CALLAGHAN <mdcallag@gmail.com> wrote:
> > Repeating what was mentioned on Twitter, because I had some experience with the topic. With fewer files per table there will be more contention on the per-inode mutex (which might now be the per-inode rwsem). I haven't read filesystem source in a long time. Back in the day, and perhaps today, it was locked for the duration of a write to storage (locked within the kernel) and was briefly locked while setting up a read.
> >
> > The workaround for writes was one of:
> > 1) enable disk write cache or use battery-backed HW RAID to make writes faster (yes disks, I encountered this prior to 2010)
> > 2) use XFS and O_DIRECT in which case the per-inode mutex (rwsem) wasn't locked for the duration of a write
> >
> > I have a vague memory that filesystems have improved in this regard.
>
> (I am interpreting your "use XFS" to mean "use XFS instead of ext4".)

Yes, although when the decision was made it was probably ext-3 -> XFS. We suffered from "fsync a file == fsync the filesystem"
because MySQL binlogs use buffered IO and are appended on write. Switching from ext-? to XFS was an easy perf win,
so I don't have much experience with ext-? over the past decade.

> Right, 80s file systems like UFS (and I suspect ext and ext2, which

The late 80s is when I last hacked on Unix filesystem code, excluding browsing XFS and ext source. Unix was easy back then -- one big kernel lock covered everything.

> some time sooner). Currently our code believes that it is not safe to
> call fdatasync() for files whose size might have changed. There is no

Long ago we added code for InnoDB to avoid fsync/fdatasync in some cases when O_DIRECT was used. While that was great for performance,
we also forgot to make sure the syncs were still done when files were extended. Eventually we fixed that.
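
To make the "still sync when the file is extended" rule concrete, here is a minimal Python sketch of a conservative policy: use a full fsync() whenever a write grows the file, and the cheaper fdatasync() only for overwrites within the existing size. The function name write_and_sync and the policy itself are assumptions for illustration; this is not code from InnoDB or PostgreSQL, and it uses buffered IO rather than O_DIRECT.

```python
import os


def write_and_sync(fd, data, offset):
    """Write data at offset, then choose a sync call (hypothetical helper).

    Returns the name of the sync call that was used, for illustration only.
    """
    size_before = os.fstat(fd).st_size

    # os.pwrite may write fewer bytes than requested, so loop until done.
    written = 0
    while written < len(data):
        written += os.pwrite(fd, data[written:], offset + written)

    # Conservative assumption: if the write extended the file, the new size
    # is metadata that must reach storage, so use a full fsync().
    extended = offset + len(data) > size_before
    if extended or not hasattr(os, "fdatasync"):
        os.fsync(fd)
        return "fsync"

    # Overwrite within the existing size: flush the data only.
    os.fdatasync(fd)
    return "fdatasync"
```

The bug described above corresponds to skipping the sync entirely on the "extended" path; the sketch keeps that path on the heavier call.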

Thanks for all of the details.

Mark Callaghan
mdcallag@gmail.com
