Re: [HACKERS] Problems with >2GB tables on Linux 2.0 - Mailing list pgsql-hackers
| From | Tom Lane |
|---|---|
| Subject | Re: [HACKERS] Problems with >2GB tables on Linux 2.0 |
| Date | |
| Msg-id | 17722.918518134@sss.pgh.pa.us |
| In response to | Re: [HACKERS] Problems with >2GB tables on Linux 2.0 (Thomas Reinke <reinke@e-softinc.com>) |
| Responses | Re: [HACKERS] Problems with >2GB tables on Linux 2.0 (Peter T Mount <peter@retep.org.uk>) |
| List | pgsql-hackers |
Peter T Mount wrote:
>> How about dropping the suffix, so you would have:
>> .../data/2/tablename
>> Doing that doesn't mean having to increase the filename buffer size,
>> just the format and arg order (from %s.%d to %d/%s).

I thought of that also, but concluded it was a bad idea, because it means you cannot symlink several of the /n subdirectories to the same place. It also seems just plain risky/error-prone to have different files named the same thing...

>> I'd think we could add a test when the new segment is created for the
>> symlink/directory. If it doesn't exist, then create it.

Absolutely, the system would need to auto-create a /n subdirectory if one didn't already exist.

Thomas Reinke <reinke@e-softinc.com> writes:
> ... I'm not entirely sure that this is an effective
> solution to data distribution.

Well, I'm certain we could do better if we wanted to put some direct effort into that issue, but we can get a usable scheme this way with practically no effort beyond writing a little how-to documentation.

Assume you have N big tables, where you know what N is. (You probably have a lot of little tables as well, which we assume can be ignored for the purposes of space allocation.) If you configure the max file size as M megabytes, the toplevel data directory will contain M * N megabytes of stuff (plus little files). If all the big tables are about the same size, say K * M meg apiece, then you wind up with K-1 subdirectories each also containing M * N meg, which you can readily scatter across different filesystems by setting up the subdirectories as symlinks. In practice the later subdirectories will probably be less full, because the big tables aren't all equally big, but you can put more of them on a single filesystem to make up for that.

If N varies considerably over time, then this scheme doesn't work so well, but I don't see any scheme that would cope with a highly variable database without physically moving files around every so often. When we get to the point where people are routinely complaining about what a pain in the neck it is to manage big databases this way, it'll be time enough to improve the design and write some scripts to help rearrange files on the fly. Right now, I would just like to see a scheme that doesn't require the dbadmin to symlink each individual table file in order to split a big database. (It could probably be argued that even doing that much is ahead of the demand, but since it's so cheap to provide this little bit of functionality, we might as well do it.)

> I'd suggest making the max file size 1 Gig default, configurable
> someplace, and solving the data distribution as a separate effort.

We might actually be saying the same thing, if by that remark you mean that we can come back later and write "real" data distribution management tools. I'm just pointing out that, given a configurable max file size, we can have a primitive facility almost for free.

regards, tom lane
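To make the proposed layout concrete, here is a minimal C sketch of the %d/%s scheme with auto-creation of the /n subdirectory. The helper name segment_path() and the buffer sizes are invented for illustration; this is not the actual backend (md.c) code, just the shape of the idea under discussion.

```c
#include <stdio.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical illustration only -- not the real md.c code.
 * Build the filename of segment 'segno' of table 'relname' under the
 * proposed %d/%s scheme.  Segment 0 lives in the toplevel data
 * directory; segment n lives in the /n subdirectory, which is created
 * on first use if it doesn't already exist.
 */
static int
segment_path(char *buf, size_t buflen,
             const char *datadir, const char *relname, int segno)
{
    char dir[1024];

    if (segno == 0)
        return snprintf(buf, buflen, "%s/%s", datadir, relname);

    /*
     * Auto-create the /n subdirectory.  If the dbadmin has already set
     * it up as a symlink to another filesystem, mkdir() fails with
     * EEXIST and we simply use the existing directory.
     */
    snprintf(dir, sizeof(dir), "%s/%d", datadir, segno);
    if (mkdir(dir, 0700) != 0 && errno != EEXIST)
        return -1;

    return snprintf(buf, buflen, "%s/%d/%s", datadir, segno, relname);
}
```

With this layout the dbadmin can symlink .../data/1 to another filesystem before a table first overflows the segment size, and every table's second segment lands on the other disk with no per-table work. Note the trade-off raised above: unlike the old %s.%d naming, two segments of the same table now carry identical filenames in different directories.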
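A worked example of the space arithmetic, with invented numbers: take N = 4 big tables, a configured max file size of M = 1024 MB, and suppose each table is about 4 GB, i.e. K = 4 segments apiece. Then:

```
data/    -> segment 0 of each table: 4 x 1 GB = 4 GB (plus little files)
data/1/  -> segment 1 of each table: 4 GB
data/2/  -> segment 2 of each table: 4 GB
data/3/  -> segment 3 of each table: 4 GB
```

Symlinking data/2 and data/3 to a second filesystem then splits the 16 GB database evenly across two disks.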
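One hedged sketch of what "configurable someplace" might look like as a compile-time knob, with a 1 GB default as suggested. The constant names here are invented for the sketch, expressing the segment size in 8 kB disk blocks:

```c
/*
 * Illustrative compile-time configuration of the max segment size,
 * expressed in 8 kB blocks.  Defaults to 1 GB per segment file.
 * (Names are invented for this sketch.)
 */
#define BLCKSZ         8192
#define MAX_SEG_BLOCKS ((1024 * 1024 * 1024) / BLCKSZ)  /* 131072 blocks = 1 GB */
```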