Thread: Minimizing disk space

Minimizing disk space

From
Adrian von Bidder
Date:
Howdy!

I'm currently in a MySQL -> PostgreSQL migration project (Go, go, go, ...
shall I cc: slashdot, too? ;-)

Part of this is in embedded context, where a (diskless) embedded computer
runs from flash.  Since we don't want to stress the flash too much, the db
is actually loaded from a dump at boot and run off a ramdisk (and dumped
back to flash periodically.)

So all the nice database crash-safety that pg offers goes down the drain...

With this in mind: what can we tune to minimize diskspace (RAM-disk space)
usage?  And, possibly, RAM usage as well (disk access is, after all, almost
as fast as RAM access in this scenario)

With our test dump, the db (after import) is ca. 300M on disk, ca. half in
WAL files (pg_xlog.)  If I could mostly get rid of the WAL (keep it to a
bare minimum and run pg without fsync, something like that), the remaining
160 to 180M would be ok.

Next thing would be: are there tunables (compile time?) to minimize the
actual size of the rows?

Your input is valued, thanks in advance.

cheers
-- vbi

[... for the peanut gallery: we're migrating mostly because the cost of the
MySQL licenses is beginning to hurt us.]


--
A budget is just a method of worrying before you spend money, as well
as afterward.

Attachment

Re: Minimizing disk space

From
Tom Lane
Date:
Adrian von Bidder <avbidder@fortytwo.ch> writes:
> With our test dump, the db (after import) is ca. 300M on disk, ca. half in
> WAL files (pg_xlog.)  If I could mostly get rid of the WAL (keep it to a
> bare minimum and run pg without fsync, something like that), the remaining
> 160 to 180M would be ok.

checkpoint_segments is the main tunable determining the amount of WAL
space.  Lowering it will increase the amount of write activity, but
for a ramdisk I don't think you care.

Depending on your usage patterns, it might be worth making autovacuum
more aggressive to ensure that table bloat stays under control.

            regards, tom lane

Re: Minimizing disk space

From
"Joshua D. Drake"
Date:
On Wed, 2010-01-06 at 16:39 +0100, Adrian von Bidder wrote:
> Howdy!
>
> I'm currently in a MySQL -> PostgreSQL migration project (Go, go, go, ...
> shall I cc: slashdot, too? ;-)
>
> Part of this is in embedded context, where a (diskless) embedded computer
> runs from flash.  Since we don't want to stress the flash too much, the db
> is actually loaded from a dump at boot and run off a ramdisk (and dumped
> back to flash periodically.)
>
> So all the nice database crash-safety that pg offers goes down the drain...
>
> With this in mind: what can we tune to minimize diskspace (RAM-disk space)
> usage?  And, possibly, RAM usage as well (disk access is, after all, almost
> as fast as RAM access in this scenario)
>
> With our test dump, the db (after import) is ca. 300M on disk, ca. half in
> WAL files (pg_xlog.)  If I could mostly get rid of the WAL (keep it to a
> bare minimum and run pg without fsync, something like that), the remaining
> 160 to 180M would be ok.
>
> Next thing would be: are there tunables (compile time?) to minimize the
> actual size of the rows?
>
> Your input is valued, thanks in advance.

Wow, here are some things but honestly Pg may not be the DB you are
looking for.

You can set checkpoint_segments to 3. That will use 48 megs at any given
time but will force a checkpoint (likely quite a few) during your load.
This will slow the load down.

As far as rows... there isn't really anything you can do. The data and
its representation is the data.

I thoroughly applaud your willingness to move to Pg but in this case
something like SQLite or even Ingres may be a better solution for you.

Sincerely,

Joshua D. Drake


--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.

Re: Minimizing disk space

From
Scott Marlowe
Date:
On Wed, Jan 6, 2010 at 8:39 AM, Adrian von Bidder <avbidder@fortytwo.ch> wrote:
> Howdy!
> With our test dump, the db (after import) is ca. 300M on disk, ca. half in
> WAL files (pg_xlog.)  If I could mostly get rid of the WAL (keep it to a
> bare minimum and run pg without fsync, something like that), the remaining
> 160 to 180M would be ok.
>
> Next thing would be: are there tunables (compile time?) to minimize the
> actual size of the rows?
>
> Your input is valued, thanks in advance.

Have you looked at firebird and / or sqllite?  They're both far better
suited for this type of application.

Re: Minimizing disk space

From
Greg Smith
Date:
Adrian von Bidder wrote:
> With our test dump, the db (after import) is ca. 300M on disk, ca. half in
> WAL files (pg_xlog.)  If I could mostly get rid of the WAL (keep it to a
> bare minimum and run pg without fsync, something like that), the remaining
> 160 to 180M would be ok.
>
Drop checkpoint_segments=1 and checkpoint_completion_target=0 before
running your load.  It will take longer and the system will run slower,
but it will reduce the typical WAL footprint.  Normally the number of
16MB WAL files will never been more than (2 +
checkpoint_completion_target) * checkpoint_segments + 1, although the
burst size can be bigger than that under load.  You should settle into
where there's only 48MB being used there with those parameter changes.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com


Re: Minimizing disk space

From
"Joshua D. Drake"
Date:
On Wed, 2010-01-06 at 16:39 +0100, Adrian von Bidder wrote:
> Howdy!
>
> I'm currently in a MySQL -> PostgreSQL migration project (Go, go, go, ...
> shall I cc: slashdot, too? ;-)
>
> Part of this is in embedded context, where a (diskless) embedded computer
> runs from flash.  Since we don't want to stress the flash too much, the db
> is actually loaded from a dump at boot and run off a ramdisk (and dumped
> back to flash periodically.)
>
> So all the nice database crash-safety that pg offers goes down the drain...
>
> With this in mind: what can we tune to minimize diskspace (RAM-disk space)
> usage?  And, possibly, RAM usage as well (disk access is, after all, almost
> as fast as RAM access in this scenario)
>
> With our test dump, the db (after import) is ca. 300M on disk, ca. half in
> WAL files (pg_xlog.)  If I could mostly get rid of the WAL (keep it to a
> bare minimum and run pg without fsync, something like that), the remaining
> 160 to 180M would be ok.
>
> Next thing would be: are there tunables (compile time?) to minimize the
> actual size of the rows?
>
> Your input is valued, thanks in advance.

Wow, here are some things but honestly Pg may not be the DB you are
looking for.

You can set checkpoint_segments to 3. That will use 48 megs at any given
time but will force a checkpoint (likely quite a few) during your load.
This will slow the load down.

As far as rows... there isn't really anything you can do. The data and
its representation is the data.

I thoroughly applaud your willingness to move to Pg but in this case
something like SQLite or even Ingres may be a better solution for you.

Sincerely,

Joshua D. Drake


--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.


Re: Minimizing disk space

From
Adrian von Bidder
Date:
Heyho!

On Wednesday 06 January 2010 18.26:05 Joshua D. Drake wrote:
...
> > With this in mind: what can we tune to minimize diskspace (RAM-disk
> > space) usage?

[thanks to all for your input!]

> Wow, here are some things but honestly Pg may not be the DB you are
> looking for.

Well, parts of the code are shared with a server side component, which we
already ported from Oracle to pg, so licensing costs aside we were hoping
avoid having to maintain yet another different type of db backend (build up
know how, ...)

But if using pg won't work, I guess sqlite (or Ingres, which I haven't
worked with before) would be an alternative.  Although, for example with
SQLite, I'd have to check carefully if we don't need more features than it
has ...

cheers
-- vbi

--
featured product: vim - http://vim.org

Attachment

Re: Minimizing disk space

From
Craig Ringer
Date:
On 7/01/2010 11:27 PM, Adrian von Bidder wrote:
> Heyho!
>
> On Wednesday 06 January 2010 18.26:05 Joshua D. Drake wrote:
> ...
>>> With this in mind: what can we tune to minimize diskspace (RAM-disk
>>> space) usage?
>
> [thanks to all for your input!]
>
>> Wow, here are some things but honestly Pg may not be the DB you are
>> looking for.
>
> Well, parts of the code are shared with a server side component, which we
> already ported from Oracle to pg, so licensing costs aside we were hoping
> avoid having to maintain yet another different type of db backend (build up
> know how, ...)
>
> But if using pg won't work, I guess sqlite (or Ingres, which I haven't
> worked with before) would be an alternative.  Although, for example with
> SQLite, I'd have to check carefully if we don't need more features than it
> has ...

Firebird is also designed for embedded use.

Pg will work - it just might not be as convenient and easy to use in
that environment. If you're using Pg elsewhere, then it's going to be a
trade-off between the cost of supporting another DB and the work
required to make Pg do what you want.

--
Craig Ringer