Thread: fine tuning for logging server

fine tuning for logging server

From: Florin Andrei
Date: Wed, 2005-03-30 17:50 -0800
Hardware: relatively modern Intel CPU, OS and database each on its own
IDE hard-drive (separate IDE cables). Enough memory, i think, but i
can't add too much (not beyond 1GB).
Software: Linux-2.6, pgsql-8.0.1

Function: Essentially a logging server. There are two applications (like
syslog) on the same box that are logging to pgsql, each one to its own
database. There are a few tables in one DB, and exactly one table in the
other.
Most of the time, the apps are just doing mindless INSERTs to the DB.
Every now and then, an admin performs some SELECTs via a PHP interface.

Objective: Make the DB as fast as possible. Of course i'd like the
SELECTs to be fast, but the INSERTs take precedence. It's gotta be able
to swallow as many messages per second as possible given the hardware.

Question: What are the pgsql parameters that need to be tweaked? What
are the guidelines for such a situation?

--
Florin Andrei

http://florin.myip.org/


Re: fine tuning for logging server

From: John Arbash Meinel
Date: Wed, 2005-03-30 19:59 -0600
Florin Andrei wrote:

>Hardware: relatively modern Intel CPU, OS and database each on its own
>IDE hard-drive (separate IDE cables). Enough memory, i think, but i
>can't add too much (not beyond 1GB).
>Software: Linux-2.6, pgsql-8.0.1
>
>Function: Essentially a logging server. There are two applications (like
>syslog) on the same box that are logging to pgsql, each one to its own
>database. There are a few tables in one DB, and exactly one table in the
>other.
>Most of the time, the apps are just doing mindless INSERTs to the DB.
>Every now and then, an admin performs some SELECTs via a PHP interface.
>
>Objective: Make the DB as fast as possible. Of course i'd like the
>SELECTs to be fast, but the INSERTs take precedence. It's gotta be able
>to swallow as many messages per second as possible given the hardware.
>
>Question: What are the pgsql parameters that need to be tweaked? What
>are the guidelines for such a situation?
>
Put pg_xlog onto the same drive as the OS, not the drive with the database.
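
The usual trick is a symlink - untested sketch, the paths are just
examples, and stop the postmaster first:

    /etc/init.d/postgresql stop
    # move the WAL directory to the OS disk, then link it back
    mv /var/lib/pgsql/data/pg_xlog /disk0/pg_xlog
    ln -s /disk0/pg_xlog /var/lib/pgsql/data/pg_xlog
    /etc/init.d/postgresql start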

Do as many inserts per transaction as you can get away with.
100-1000 per transaction is pretty good.
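
For example (table and columns entirely made up), instead of letting
every INSERT autocommit:

    BEGIN;
    INSERT INTO logs (ts, host, msg) VALUES (now(), 'app1', 'message 1');
    INSERT INTO logs (ts, host, msg) VALUES (now(), 'app1', 'message 2');
    -- ... a few hundred more ...
    COMMIT;  -- one fsync for the whole batch instead of one per row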

Keep the number of indexes and foreign key references low to keep
INSERTs fast.

Keep a few indexes around to keep SELECTs reasonably speedy.

If you are doing lots and lots of logging, and only need slow, archival
access to old data but fast access to new data, consider partitioning
your table and then using a view to join the pieces back together.
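
Something along these lines (untested, names made up; it's just
per-period tables glued back together with a UNION ALL view):

    CREATE TABLE logs_2005_03 (ts timestamptz, host text, msg text);
    CREATE TABLE logs_2005_04 (ts timestamptz, host text, msg text);

    -- SELECTs go through the view; INSERTs go straight into the
    -- newest table, and expiring a month is just a DROP TABLE
    CREATE VIEW logs_all AS
        SELECT * FROM logs_2005_03
        UNION ALL
        SELECT * FROM logs_2005_04;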

If you only have a couple of processes accessing the db at any given
time, you can probably increase work_mem and maintenance_work_mem a bit.
With 1GB of RAM, maybe around 50MB for work_mem. But really this only
applies if you have 1-3 SELECTs going on at a time.
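
In postgresql.conf that would be something like this (8.0 takes these
in kilobytes, and the maintenance_work_mem number is just a guess):

    work_mem = 51200              # ~50MB per sort/hash, per backend
    maintenance_work_mem = 65536  # ~64MB for VACUUM, CREATE INDEX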

With 2 disks, and fixed hardware, it's a lot more about configuring your
schema and the application. If you want more performance, adding more
disks is probably the first thing to do.

John
=:->



Re: fine tuning for logging server

From: Florin Andrei
On Wed, 2005-03-30 at 17:50 -0800, Florin Andrei wrote:

> Function: Essentially a logging server. There are two applications (like
> syslog) on the same box that are logging to pgsql, each one to its own
> database. There are a few tables in one DB, and exactly one table in the
> other.
> Most of the time, the apps are just doing mindless INSERTs to the DB.
> Every now and then, an admin performs some SELECTs via a PHP interface.

For performance reasons, i was thinking of keeping the tables append-only,
and simply rotating them out every so often (daily?) and deleting the
tables that are too old. Is that a good idea?

--
Florin Andrei

http://florin.myip.org/


Re: fine tuning for logging server

From: John Arbash Meinel
Date: Wed, 2005-03-30 20:11 -0600
Florin Andrei wrote:

>On Wed, 2005-03-30 at 17:50 -0800, Florin Andrei wrote:
>
>>Function: Essentially a logging server. There are two applications (like
>>syslog) on the same box that are logging to pgsql, each one to its own
>>database. There are a few tables in one DB, and exactly one table in the
>>other.
>>Most of the time, the apps are just doing mindless INSERTs to the DB.
>>Every now and then, an admin performs some SELECTs via a PHP interface.
>
>For performance reasons, i was thinking of keeping the tables append-only,
>and simply rotating them out every so often (daily?) and deleting the
>tables that are too old. Is that a good idea?
>
If you aren't doing updates, then I'm pretty sure the data stays packed
pretty well. I don't know that you need daily rotations, but you
certainly could consider some sort of rotation schedule.

The biggest performance improvement, though, is probably to group
inserts into transactions.
I had an application (in a different db, but it should be relevant)
where using a transaction changed the time from 6 min to 6 sec.
It was just thrashing on all the little inserts that it had to fsync to
disk.

How fast is fast? How many log messages are you expecting? 1/s, 100/s, 1000/s?
I think the hardware should be capable of the 10-100/s range if things are
properly configured. Naturally that depends on all sorts of factors, but
it should give you an idea.

John
=:->



Re: fine tuning for logging server

From: Florin Andrei
On Wed, 2005-03-30 at 19:59 -0600, John Arbash Meinel wrote:

> Put pg_xlog onto the same drive as the OS, not the drive with the database.

I forgot to mention: the OS drive is purposefully made very slow - the
write cache is turned off and the FS is Ext3 with data=journal. Is it
still ok to put pg_xlog on it?

The reason: if the power cord is yanked, the OS _must_ boot back up in
good condition. If the DB is corrupted, whatever, nuke it and
re-initialize it. But the OS must survive act-of-god events.

No, there is no uninterruptible power supply. It sucks, but that's how
it is. I cannot change that.

--
Florin Andrei

http://florin.myip.org/


Re: fine tuning for logging server

From: Florin Andrei
On Wed, 2005-03-30 at 20:11 -0600, John Arbash Meinel wrote:
> Florin Andrei wrote:
> >
> >For performance reasons, i was thinking of keeping the tables append-only,
> >and simply rotating them out every so often (daily?) and deleting the
> >tables that are too old. Is that a good idea?
> >
> If you aren't doing updates, then I'm pretty sure the data stays packed
> pretty well. I don't know that you need daily rotations, but you
> certainly could consider some sort of rotation schedule.

(sorry for re-asking, i'm coming from a mysql mindset and i still have a
lot to learn about pgsql)

So, it is indeed a bad idea to delete rows from tables, right? Better
just rotate to preserve the performance.

Daily rotation may simplify the application logic - then i'll know that
each table is one day's worth of data.

> The biggest performance improvement, though, is probably to group
> inserts into transactions.

Yes, i know that. I have little control over the apps, though. I'll see
what i can do.

> How fast is fast? How many log messages are you expecting? 1/s, 100/s, 1000/s?

More is better. <shrug>
I guess i'll put it together and give it a spin and see just how far it
goes.

I actually have some controls over the data that's being sent (in some
places i can limit the number of events/second), so that might save me
right there.

--
Florin Andrei

http://florin.myip.org/


Re: fine tuning for logging server

From: John Arbash Meinel
Florin Andrei wrote:

>On Wed, 2005-03-30 at 19:59 -0600, John Arbash Meinel wrote:
>
>>Put pg_xlog onto the same drive as the OS, not the drive with the database.
>
>I forgot to mention: the OS drive is purposefully made very slow - the
>write cache is turned off and the FS is Ext3 with data=journal. Is it
>still ok to put pg_xlog on it?
>
>The reason: if the power cord is yanked, the OS _must_ boot back up in
>good condition. If the DB is corrupted, whatever, nuke it and
>re-initialize it. But the OS must survive act-of-god events.
>
>No, there is no uninterruptible power supply. It sucks, but that's how
>it is. I cannot change that.
>
You don't want write cache for pg_xlog either. And you could always
create a second partition that uses reiserfs, or something like that.

If you have to survive "act-of-god" events, you should probably consider
making the system a RAID 1 instead of 2 separate drives (software RAID
should be fine).

'Cause a much worse act of god is having a drive crash. No matter what
you do in software, a failed platter will prevent you from booting.
RAID 1 at least means 2 drives have to die.
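
With Linux software RAID that's roughly the following (device names
made up, and --create destroys what's on the disks, so sketch only):

    # mirror the two IDE drives into one md device
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hda /dev/hdc
    mke2fs -j /dev/md0    # ext3 on the mirror, then move the data over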

If you need insert speed, and can't do custom transactions at the
application side, you could try creating a RAM disk for the insert
table, and then a cron job that bulk-pulls rows out of that table and
inserts them into the rest of the system. That should let you get
super-fast inserts, and the bulk copies should stay reasonably fast.
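
With 8.0 tablespaces you could put the insert table on a tmpfs mount.
Untested sketch, all names made up - and the tablespace vanishes on
reboot, so this is only for data you can afford to lose:

    -- beforehand: mount -t tmpfs -o size=256m tmpfs /mnt/ramdisk
    CREATE TABLESPACE ramdisk LOCATION '/mnt/ramdisk';
    CREATE TABLE raw_log (ts timestamptz, host text, msg text)
        TABLESPACE ramdisk;

    -- the cron job: briefly block writers, copy everything to the
    -- real table, then empty the buffer
    BEGIN;
    LOCK TABLE raw_log IN SHARE ROW EXCLUSIVE MODE;
    INSERT INTO logs SELECT * FROM raw_log;
    DELETE FROM raw_log;
    COMMIT;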

Just realize that if your cron job stops running, your machine will
slowly eat up all of its ram, and really not play nice. I think adding
an extra hard-drive is probably the best way to boost performance and
reliability, but if you have a $0 budget, this is a possibility.

John
=:->



Re: fine tuning for logging server

From: John Arbash Meinel
Date: Wed, 2005-03-30 20:41 -0600
Florin Andrei wrote:

>On Wed, 2005-03-30 at 20:11 -0600, John Arbash Meinel wrote:
>
>>Florin Andrei wrote:
>>
>>>For performance reasons, i was thinking of keeping the tables append-only,
>>>and simply rotating them out every so often (daily?) and deleting the
>>>tables that are too old. Is that a good idea?
>>
>>If you aren't doing updates, then I'm pretty sure the data stays packed
>>pretty well. I don't know that you need daily rotations, but you
>>certainly could consider some sort of rotation schedule.
>
>(sorry for re-asking, i'm coming from a mysql mindset and i still have a
>lot to learn about pgsql)
>
>So, it is indeed a bad idea to delete rows from tables, right? Better
>just rotate to preserve the performance.
>
The only problem is if you get a lot of old tuples in places you don't
want them. If you are always appending new values that are increasing,
and you are deleting from the other side, I think vacuum will do a fine
job at cleaning up. It's deleting/updating every 3rd entry that starts
to cause holes (though probably vacuum still does a pretty good job).

>Daily rotation may simplify the application logic - then i'll know that
>each table is one day's worth of data.
>
I don't think it is necessary, but if you like it, go for it. I would
tend to think that you would want a "today" table and an "everything
else" table, as it simplifies your queries and lets you have foreign
keys (though if you are from mysql, you may not be used to using them).

>>The biggest performance improvement, though, is probably to group
>>inserts into transactions.
>
>Yes, i know that. I have little control over the apps, though. I'll see
>what i can do.
>
You could always add a layer in between. Or look at my mention of a fast
temp table, with a periodic cron job to pull in the new data. You can
run cron as often as once a minute, which might be just right depending
on your needs. It also means that you could ignore foreign keys and
indexes on the temp table, and only evaluate them on the main table.

>
>>How fast is fast? How many log messages are you expecting? 1/s, 100/s, 1000/s?
>
>More is better. <shrug>
>I guess i'll put it together and give it a spin and see just how far it
>goes.
>
>I actually have some controls over the data that's being sent (in some
>places i can limit the number of events/second), so that might save me
>right there.
>
Good luck. And remember, tuning your queries can be just as important.
(Though if you are doing append-only inserts, there probably isn't much
that you can do.)

If all you are doing is append-only logging, the fastest thing is
probably just a flat file. You could have something that comes along
later to move it into the database. It doesn't really sound like you are
using any features a database provides (normalization, foreign keys,
indexes, etc.).

John
=:->



Re: fine tuning for logging server

From: PFC
> The reason: if the power cord is yanked, the OS _must_ boot back up in
> good condition. If the DB is corrupted, whatever, nuke it and
> re-initialize it. But the OS must survive act-of-god events.

    Well, in that case:
    - Use reiserfs3 for your disks
    - Use MySQL with MyISAM tables

Re: fine tuning for logging server

From: Michael Adler
On Wed, Mar 30, 2005 at 08:41:43PM -0600, John Arbash Meinel wrote:
> If all you are doing is append-only logging, the fastest thing is
> probably just a flat file. You could have something that comes along
> later to move it into the database. It doesn't really sound like you are
> using any features a database provides (normalization, foreign keys,
> indexes, etc.).

Here are two ideas that I don't think have been mentioned yet: use COPY
to bulk-load the data instead of doing individual INSERTs. And if you
get desperate, you can run pg with fsync=false, since you don't seem to
care about re-initializing your whole database in the case of an
unexpected interruption.
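
Rough sketch (path and table made up; COPY FROM a server-side file has
to run as the superuser, or use psql's \copy instead):

    -- load a whole batch of log lines in one statement
    COPY logs (ts, host, msg) FROM '/var/tmp/log_batch.txt';

and in postgresql.conf, if you truly don't care about crash recovery:

    fsync = false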

 -Mike