Thread: Minimizing disk space
Howdy!

I'm currently in a MySQL -> PostgreSQL migration project (Go, go, go, ... shall I cc: slashdot, too? ;-)

Part of this is in an embedded context, where a (diskless) embedded computer runs from flash. Since we don't want to stress the flash too much, the db is actually loaded from a dump at boot and run off a ramdisk (and dumped back to flash periodically). So all the nice database crash-safety that pg offers goes down the drain...

With this in mind: what can we tune to minimize diskspace (RAM-disk space) usage? And, possibly, RAM usage as well (disk access is, after all, almost as fast as RAM access in this scenario).

With our test dump, the db (after import) is ca. 300M on disk, ca. half of it in WAL files (pg_xlog). If I could mostly get rid of the WAL (keep it to a bare minimum and run pg without fsync, something like that), the remaining 160 to 180M would be ok.

Next thing would be: are there tunables (compile time?) to minimize the actual size of the rows?

Your input is valued, thanks in advance.

cheers
-- vbi

[... for the peanut gallery: we're migrating mostly because the cost of the MySQL licenses is beginning to hurt us.]

--
A budget is just a method of worrying before you spend money, as well as afterward.
Adrian von Bidder <avbidder@fortytwo.ch> writes:
> With our test dump, the db (after import) is ca. 300M on disk, ca. half in
> WAL files (pg_xlog.) If I could mostly get rid of the WAL (keep it to a
> bare minimum and run pg without fsync, something like that), the remaining
> 160 to 180M would be ok.

checkpoint_segments is the main tunable determining the amount of WAL space. Lowering it will increase the amount of write activity, but for a ramdisk I don't think you care.

Depending on your usage patterns, it might be worth making autovacuum more aggressive to ensure that table bloat stays under control.

			regards, tom lane
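A minimal postgresql.conf sketch combining the suggestions in this thread (the parameter names are standard 8.x-era settings; the specific values are illustrative assumptions to benchmark, not tested recommendations):

```ini
# Cap the WAL footprint: segments are 16MB each, so a low
# checkpoint_segments keeps pg_xlog small at the cost of more checkpoints.
checkpoint_segments = 3

# Crash safety is already gone on a ramdisk, so skip the durability overhead.
fsync = off
synchronous_commit = off
full_page_writes = off    # smaller WAL records; only sane without crash recovery

# More aggressive autovacuum, to keep table bloat under control.
autovacuum = on
autovacuum_naptime = 15s
autovacuum_vacuum_scale_factor = 0.05
```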
On Wed, 2010-01-06 at 16:39 +0100, Adrian von Bidder wrote:
> With this in mind: what can we tune to minimize diskspace (RAM-disk space)
> usage? And, possibly, RAM usage as well (disk access is, after all, almost
> as fast as RAM access in this scenario)
>
> With our test dump, the db (after import) is ca. 300M on disk, ca. half in
> WAL files (pg_xlog.) If I could mostly get rid of the WAL (keep it to a
> bare minimum and run pg without fsync, something like that), the remaining
> 160 to 180M would be ok.
>
> Next thing would be: are there tunables (compile time?) to minimize the
> actual size of the rows?
>
> Your input is valued, thanks in advance.

Wow, here are some things, but honestly Pg may not be the DB you are looking for.

You can set checkpoint_segments to 3. That will use 48 megs at any given time but will force a checkpoint (likely quite a few) during your load. This will slow the load down.

As far as rows... there isn't really anything you can do. The data and its representation is the data.

I thoroughly applaud your willingness to move to Pg, but in this case something like SQLite or even Ingres may be a better solution for you.

Sincerely,

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use of Mr. or Sir.
On Wed, Jan 6, 2010 at 8:39 AM, Adrian von Bidder <avbidder@fortytwo.ch> wrote:
> With our test dump, the db (after import) is ca. 300M on disk, ca. half in
> WAL files (pg_xlog.) If I could mostly get rid of the WAL (keep it to a
> bare minimum and run pg without fsync, something like that), the remaining
> 160 to 180M would be ok.
>
> Next thing would be: are there tunables (compile time?) to minimize the
> actual size of the rows?

Have you looked at Firebird and/or SQLite? They're both far better suited to this type of application.
Adrian von Bidder wrote:
> With our test dump, the db (after import) is ca. 300M on disk, ca. half in
> WAL files (pg_xlog.) If I could mostly get rid of the WAL (keep it to a
> bare minimum and run pg without fsync, something like that), the remaining
> 160 to 180M would be ok.

Drop checkpoint_segments to 1 and checkpoint_completion_target to 0 before running your load. It will take longer and the system will run slower, but it will reduce the typical WAL footprint. Normally the number of 16MB WAL files will never be more than (2 + checkpoint_completion_target) * checkpoint_segments + 1, although the burst size can be bigger than that under load. You should settle in at only 48MB being used there with those parameter changes.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
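As a back-of-the-envelope check on that formula (16MB is PostgreSQL's default WAL segment size; the helper name below is made up for illustration):

```python
def wal_ceiling_mb(checkpoint_segments, checkpoint_completion_target,
                   segment_mb=16):
    """Typical upper bound on pg_xlog size, in MB:
    (2 + checkpoint_completion_target) * checkpoint_segments + 1 segments,
    each segment_mb megabytes."""
    segments = (2 + checkpoint_completion_target) * checkpoint_segments + 1
    return segments * segment_mb

# Minimal settings suggested above: 3 segments * 16MB = 48MB
print(wal_ceiling_mb(1, 0))      # 48
# Stock 8.4 defaults (checkpoint_segments=3, completion_target=0.5)
print(wal_ceiling_mb(3, 0.5))    # 136.0
```

So the minimal settings cap pg_xlog at roughly 48MB, versus well over 100MB with the defaults, consistent with the original poster seeing about half of the 300M taken up by WAL.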
Heyho!

On Wednesday 06 January 2010 18.26:05 Joshua D. Drake wrote:
...
> > With this in mind: what can we tune to minimize diskspace (RAM-disk
> > space) usage?

[thanks to all for your input!]

> Wow, here are some things but honestly Pg may not be the DB you are
> looking for.

Well, parts of the code are shared with a server-side component, which we already ported from Oracle to pg, so licensing costs aside we were hoping to avoid having to maintain yet another different type of db backend (building up know-how, ...)

But if using pg won't work, I guess SQLite (or Ingres, which I haven't worked with before) would be an alternative. Although, for example with SQLite, I'd have to check carefully that we don't need more features than it has ...

cheers
-- vbi

--
featured product: vim - http://vim.org
On 7/01/2010 11:27 PM, Adrian von Bidder wrote:
> But if using pg won't work, I guess sqlite (or Ingres, which I haven't
> worked with before) would be an alternative. Although, for example with
> SQLite, I'd have to check carefully if we don't need more features than it
> has ...

Firebird is also designed for embedded use.

Pg will work - it just might not be as convenient and easy to use in that environment. If you're using Pg elsewhere, then it's going to be a trade-off between the cost of supporting another DB and the work required to make Pg do what you want.

--
Craig Ringer