Re: Large files for relations - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Large files for relations |
Date | |
Msg-id | CA+hUKGKvcbdTnrKtD33oWf3q1RGYiPz8=wRDdPPtJgOTfsvDOw@mail.gmail.com |
In response to | Re: Large files for relations (Jim Mlodgenski <jimmy76@gmail.com>) |
List | pgsql-hackers |
On Fri, May 12, 2023 at 8:16 AM Jim Mlodgenski <jimmy76@gmail.com> wrote:
> On Mon, May 1, 2023 at 9:29 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> I am not aware of any modern/non-historic filesystem[2] that can't do
>> large files with ease.  Anyone know of anything to worry about on that
>> front?
>
> There is some trouble in the ambiguity of what we mean by "modern" and
> "large files".  There are still a large number of users of ext4, where the
> max file size is 16TB.  Switching to a single large file per relation
> would effectively cut the max table size in half for those users.  How
> would a user with, say, a 20TB table running on ext4 be impacted by this
> change?

Hrmph.  Yeah, that might be a bit of a problem.  I see it discussed in
various places that MySQL/InnoDB can't have tables bigger than 16TB on ext4
because of this, when it's in its default one-file-per-object mode (as
opposed to its big-tablespace-files-to-hold-all-the-objects mode like DB2,
Oracle etc, in which case I think you can have multiple 16TB segment files
and get past that ext4 limit).

It's frustrating because 16TB is still really, really big and you probably
should be using partitions, or more partitions, to avoid all kinds of other
scalability problems at that size.  But however hypothetical the scenario
might be, it should work, and this is certainly a plausible argument
against the "aggressive" plan described above, with the hard cut-off where
we get to drop the segmented mode.

Concretely, a 20TB pg_upgrade in copy mode would fail while trying to
concatenate with the above patches, so you'd have to use link or reflink
mode (you'd probably want to use that anyway due to the sheer volume of
data to copy otherwise, since ext4 is also not capable of block-range
sharing), but then you'd be out of luck after N future major releases,
according to that plan where we start deleting the code, so you'd need to
organise some smaller partitions before that time comes.  Or pg_upgrade to
a target on xfs etc.  I wonder if a future version of extN will increase
its max file size.

A less aggressive version of the plan would be that we just keep the
segment code for the foreseeable future with no planned cut-off, and we
make all of those "piggy back" transformations that I showed in the patch
set optional.  For example, I had it so that CLUSTER would quietly convert
your relation to large format if it was still in segmented format (might as
well if you're writing all the data out anyway, right?), but perhaps that
could depend on a GUC.  Likewise for base backup.  Etc.  Then someone
concerned about hitting the 16TB limit on ext4 could opt out.  Or something
like that.  It seems funny though: that's exactly the user who should want
this feature (they have 16,000 relation segment files).
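For context on that last point, here is a minimal sketch of the segment
arithmetic under the current segmented scheme.  It is not the actual md.c
code; it just assumes the default build settings (BLCKSZ = 8192 and
RELSEG_SIZE = 131072 blocks, i.e. 1GB segments) and an example relfilenode
of 16384, and shows why a 16TB relation means on the order of 16,000 files
on disk today:

```c
/*
 * Simplified sketch (not the real md.c code) of how segmented storage maps
 * a block number to a 1GB segment file.  Assumes default build settings:
 * BLCKSZ = 8192 bytes per block, RELSEG_SIZE = 131072 blocks per segment.
 */
#include <stdio.h>
#include <stdint.h>

#define BLCKSZ      8192        /* bytes per block (default build) */
#define RELSEG_SIZE 131072      /* blocks per segment file => 1GB */

/*
 * Print the segment file a given block lives in.  The first segment has no
 * suffix ("16384"); later segments get ".1", ".2", and so on.
 */
static void
print_segment_for_block(const char *relfilenode, uint64_t blocknum)
{
	uint64_t	segno = blocknum / RELSEG_SIZE;

	if (segno == 0)
		printf("block %llu -> %s\n",
			   (unsigned long long) blocknum, relfilenode);
	else
		printf("block %llu -> %s.%llu\n",
			   (unsigned long long) blocknum, relfilenode,
			   (unsigned long long) segno);
}

int
main(void)
{
	uint64_t	table_bytes = (uint64_t) 16 * 1024 * 1024 * 1024 * 1024;	/* 16TB */
	uint64_t	nblocks = table_bytes / BLCKSZ;
	uint64_t	nsegments = (nblocks + RELSEG_SIZE - 1) / RELSEG_SIZE;

	printf("a 16TB relation is %llu blocks in %llu segment files\n",
		   (unsigned long long) nblocks, (unsigned long long) nsegments);

	print_segment_for_block("16384", 0);			/* first block, no suffix */
	print_segment_for_block("16384", RELSEG_SIZE);	/* first block of ".1" */
	print_segment_for_block("16384", nblocks - 1);	/* last block */
	return 0;
}
```

Run as-is it reports 16,384 segment files for a 16TB relation, which is
where the "16,000 relation segment files" figure above comes from; a 20TB
table on ext4 would have ~20,480 of them under the segmented scheme, but
could not be stored as a single file at all.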