Thread: fine tuning for logging server
Hardware: relatively modern Intel CPU, OS and database each on its own IDE hard-drive (separate IDE cables). Enough memory, i think, but i can't add too much (not beyond 1GB).
Software: Linux-2.6, pgsql-8.0.1

Function: Essentially a logging server. There are two applications (like syslog) on the same box that are logging to pgsql, each one to its own database. There are a few tables in one DB, and exactly one table in the other.
Most of the time, the apps are just doing mindless INSERTs to the DB. Every now and then, an admin performs some SELECTs via a PHP interface.

Objective: Make the DB as fast as possible. Of course i'd like the SELECTs to be fast, but the INSERTs take precedence. It's gotta be able to swallow as many messages per second as possible given the hardware.

Question: What are the pgsql parameters that need to be tweaked? What are the guidelines for such a situation?

--
Florin Andrei
http://florin.myip.org/
Florin Andrei wrote:

> Hardware: relatively modern Intel CPU, OS and database each on its own
> IDE hard-drive (separate IDE cables). Enough memory, i think, but i
> can't add too much (not beyond 1GB).
> Software: Linux-2.6, pgsql-8.0.1
>
> Function: Essentially a logging server. There are two applications (like
> syslog) on the same box that are logging to pgsql, each one to its own
> database. There are a few tables in one DB, and exactly one table in the
> other.
> Most of the time, the apps are just doing mindless INSERTs to the DB.
> Every now and then, an admin performs some SELECTs via a PHP interface.
>
> Objective: Make the DB as fast as possible. Of course i'd like the
> SELECTs to be fast, but the INSERTs take precedence. It's gotta be able
> to swallow as many messages per second as possible given the hardware.
>
> Question: What are the pgsql parameters that need to be tweaked? What
> are the guidelines for such a situation?

Put pg_xlog onto the same drive as the OS, not the drive with the database.

Do as many inserts per transaction as you can get away with; 100-1000 per transaction is pretty good.

Keep the number of indexes and foreign key references low to keep INSERTs fast. Keep a few indexes around to keep SELECTs reasonably speedy.

If you are doing lots and lots of logging, and need only archival and slow access for old data but fast access to new data, consider partitioning your table and then using a view to join the pieces back together.

If you only have a couple of processes accessing the db at any given time, you can probably increase work_mem and maintenance_work_mem a bit. If you have 1GB of RAM, maybe around 50MB for work_mem. But really this is only if you have 1-3 SELECTs going on at a time.

With 2 disks and fixed hardware, it's a lot more about configuring your schema and the application. If you want more performance, adding more disks is probably the first thing to do.

John
=:->
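To make the "many inserts per transaction" advice concrete, here is a minimal sketch; the log_messages table and its columns are invented for the example:

    -- one fsync at COMMIT instead of one per statement;
    -- batch a few hundred rows per transaction if the logging app allows it
    BEGIN;
    INSERT INTO log_messages (logged_at, host, message)
        VALUES (now(), 'web1', 'client connected');
    INSERT INTO log_messages (logged_at, host, message)
        VALUES (now(), 'web1', 'request served');
    -- ... a few hundred more INSERTs ...
    COMMIT;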
On Wed, 2005-03-30 at 17:50 -0800, Florin Andrei wrote:

> Function: Essentially a logging server. There are two applications (like
> syslog) on the same box that are logging to pgsql, each one to its own
> database. There are a few tables in one DB, and exactly one table in the
> other.
> Most of the time, the apps are just doing mindless INSERTs to the DB.
> Every now and then, an admin performs some SELECTs via a PHP interface.

For performance reasons, i was thinking to keep the tables append-only, and simply rotate them out every so often (daily?) and delete those tables that are too old. Is that a good idea?

--
Florin Andrei
http://florin.myip.org/
Florin Andrei wrote:

> On Wed, 2005-03-30 at 17:50 -0800, Florin Andrei wrote:
>
>> Function: Essentially a logging server. There are two applications (like
>> syslog) on the same box that are logging to pgsql, each one to its own
>> database. There are a few tables in one DB, and exactly one table in the
>> other.
>> Most of the time, the apps are just doing mindless INSERTs to the DB.
>> Every now and then, an admin performs some SELECTs via a PHP interface.
>
> For performance reasons, i was thinking to keep the tables append-only,
> and simply rotate them out every so often (daily?) and delete those
> tables that are too old. Is that a good idea?

If you aren't doing updates, then I'm pretty sure the data stays packed pretty well. I don't know that you need daily rotations, but you certainly could consider some sort of rotation schedule.

The biggest performance improvement, though, is probably to group inserts into transactions. I had an application (in a different db, but it should be relevant) where using a transaction changed the time from 6 min to 6 sec. It was just thrashing on all the little inserts that it had to fsync to disk.

How fast is fast? How many log messages are you expecting? 1/s, 100/s, 1000/s? I think the hardware should be capable of the 10-100 range if things are properly configured. Naturally that depends on all sorts of factors, but it should give you an idea.

John
=:->
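A rough sketch of the kind of rotation schedule being discussed; the log_YYYY_MM_DD names and the column list are invented, and on 8.0 a small nightly script would have to run something like this by hand:

    -- nightly: start a fresh table for the new day ...
    CREATE TABLE log_2005_03_31 (
        logged_at  timestamptz NOT NULL DEFAULT now(),
        host       text,
        message    text
    );

    -- ... and drop the day that just fell out of the retention window
    DROP TABLE log_2005_03_01;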
On Wed, 2005-03-30 at 19:59 -0600, John Arbash Meinel wrote:

> Put pg_xlog onto the same drive as the OS, not the drive with the database.

I forgot to mention: the OS drive is purposefully made very slow - the write cache is turned off and the FS is Ext3 with data=journal. Is it then still OK to put pg_xlog on it?

The reason: if the power cord is yanked, the OS _must_ boot back up in good condition. If the DB is corrupted, whatever, nuke it and re-initialize it. But the OS must survive act-of-god events.

No, there is no uninterruptible power supply. It sucks, but that's how it is. I cannot change that.

--
Florin Andrei
http://florin.myip.org/
On Wed, 2005-03-30 at 20:11 -0600, John Arbash Meinel wrote:

> Florin Andrei wrote:
>
>> For performance reasons, i was thinking to keep the tables append-only,
>> and simply rotate them out every so often (daily?) and delete those
>> tables that are too old. Is that a good idea?
>
> If you aren't doing updates, then I'm pretty sure the data stays packed
> pretty well. I don't know that you need daily rotations, but you
> certainly could consider some sort of rotation schedule.

(sorry for re-asking, i'm coming from a mysql mindset and i still have a lot to learn about pgsql)

So, it is indeed a bad idea to delete rows from tables, right? Better to just rotate tables to preserve performance.

Daily rotation may simplify the application logic - then i'll know that each table is one day's worth of data.

> The biggest performance improvement, though, is probably to group
> inserts into transactions.

Yes, i know that. I have little control over the apps, though. I'll see what i can do.

> How fast is fast? How many log messages are you expecting? 1/s, 100/s, 1000/s?

More is better. <shrug>
I guess i'll put it together and give it a spin and see just how far it goes.

I actually have some control over the data that's being sent (in some places i can limit the number of events/second), so that might save me right there.

--
Florin Andrei
http://florin.myip.org/
Florin Andrei wrote:

> On Wed, 2005-03-30 at 19:59 -0600, John Arbash Meinel wrote:
>
>> Put pg_xlog onto the same drive as the OS, not the drive with the database.
>
> I forgot to mention: the OS drive is purposefully made very slow - the
> write cache is turned off and the FS is Ext3 with data=journal. Is it
> then still OK to put pg_xlog on it?
>
> The reason: if the power cord is yanked, the OS _must_ boot back up in
> good condition. If the DB is corrupted, whatever, nuke it and
> re-initialize it. But the OS must survive act-of-god events.
>
> No, there is no uninterruptible power supply. It sucks, but that's how
> it is. I cannot change that.

You don't want write cache for pg_xlog either. And you could always create a second partition that uses reiserfs, or something like that.

If you have to survive "act-of-god" events, you probably should consider making the system into a RAID 1 instead of 2 separate drives (software RAID should be fine). 'Cause a much worse act of god is having a drive crash. No matter what you do in software, a failed platter will prevent you from booting. RAID 1 at least means 2 drives have to die.

If you need insert speed, and can't do custom transactions at the application side, you could try creating a RAM disk for the insert table, and then create a cron job that bulk-pulls data out of that table and inserts it into the rest of the system. That should let you get a super-fast insert speed, and the bulk copies should stay reasonably fast. Just realize that if your cron job stops running, your machine will slowly eat up all of its RAM, and really not play nice.

I think adding an extra hard-drive is probably the best way to boost performance and reliability, but if you have a $0 budget, this is a possibility.

John
=:->
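As a minimal sketch of the bulk-move step such a cron job could run (the messages and messages_staging names are made up; the serializable snapshot keeps the DELETE from touching rows that arrive after the copy starts):

    -- run periodically from cron, e.g. psql -f flush_staging.sql logdb
    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;  -- both statements share one snapshot
    INSERT INTO messages SELECT * FROM messages_staging;
    DELETE FROM messages_staging;                  -- removes only the rows just copied
    COMMIT;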
Florin Andrei wrote:

> On Wed, 2005-03-30 at 20:11 -0600, John Arbash Meinel wrote:
>
>> Florin Andrei wrote:
>>
>>> For performance reasons, i was thinking to keep the tables append-only,
>>> and simply rotate them out every so often (daily?) and delete those
>>> tables that are too old. Is that a good idea?
>>
>> If you aren't doing updates, then I'm pretty sure the data stays packed
>> pretty well. I don't know that you need daily rotations, but you
>> certainly could consider some sort of rotation schedule.
>
> (sorry for re-asking, i'm coming from a mysql mindset and i still have a
> lot to learn about pgsql)
>
> So, it is indeed a bad idea to delete rows from tables, right? Better
> to just rotate tables to preserve performance.

The only problem is if you get a lot of old tuples in places you don't want them. If you are always appending new values that are increasing, and you are deleting from the other side, I think vacuum will do a fine job at cleaning up. It's deleting/updating every 3rd entry that starts to cause holes (though probably vacuum still does a pretty good job).

> Daily rotation may simplify the application logic - then i'll know that
> each table is one day's worth of data.

I don't think it is necessary, but if you like it, go for it. I would tend to think that you would want a "today" table and an "everything else" table, as it simplifies your queries and lets you have foreign keys (though if you are coming from mysql, you may not be used to using them).

>> The biggest performance improvement, though, is probably to group
>> inserts into transactions.
>
> Yes, i know that. I have little control over the apps, though. I'll see
> what i can do.

You could always add a layer in between. Or look at my mention of a fast temp table, with a periodic cron job to pull in the new data. You can run cron as often as once a minute, which might be just right depending on your needs. It also means that you could ignore foreign keys and indexes on the temp table, and only evaluate them on the main table.

>> How fast is fast? How many log messages are you expecting? 1/s, 100/s, 1000/s?
>
> More is better. <shrug>
> I guess i'll put it together and give it a spin and see just how far it
> goes.
>
> I actually have some control over the data that's being sent (in some
> places i can limit the number of events/second), so that might save me
> right there.

Good luck. And remember, tuning your queries can be just as important (though if you are doing append-only inserts, there probably isn't much that you can do there).

If all you are doing is append-only logging, the fastest thing is probably just a flat file. You could have something that comes along later to move it into the database. It doesn't really sound like you are using any of the features a database provides (normalization, foreign keys, indexes, etc.).

John
=:->
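For what it's worth, a rough sketch of the "today table plus everything-else table" idea, with a view gluing them together for the PHP SELECTs (all names and columns are invented; on 8.0 the nightly move from one table to the other still has to be scripted):

    -- fast, index-light table that takes the INSERTs
    CREATE TABLE log_today (
        logged_at  timestamptz NOT NULL DEFAULT now(),
        host       text,
        message    text
    );

    -- older data, indexed for the occasional admin queries
    CREATE TABLE log_archive (
        logged_at  timestamptz NOT NULL,
        host       text,
        message    text
    );
    CREATE INDEX log_archive_logged_at_idx ON log_archive (logged_at);

    -- one place for the PHP interface to SELECT from
    CREATE VIEW log_all AS
        SELECT * FROM log_today
        UNION ALL
        SELECT * FROM log_archive;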
> The reason: if the power cord is yanked, the OS _must_ boot back up in
> good condition. If the DB is corrupted, whatever, nuke it and
> re-initialize it. But the OS must survive act-of-god events.

Well, in that case:

- Use reiserfs3 for your disks
- Use MySQL with MyISAM tables
On Wed, Mar 30, 2005 at 08:41:43PM -0600, John Arbash Meinel wrote:

> If all you are doing is append-only logging, the fastest thing is
> probably just a flat file. You could have something that comes along
> later to move it into the database. It doesn't really sound like you are
> using any of the features a database provides (normalization, foreign
> keys, indexes, etc.).

Here are two ideas that I don't think have been mentioned yet:

Use COPY to bulk-load the data instead of doing individual INSERTs.

And if you get desperate, you can run pg with fsync=false, since you don't seem to care about re-initializing your whole database in the case of an unexpected interruption.

-Mike
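A minimal sketch of the COPY idea, assuming a flat tab-delimited spool file and a hypothetical messages table (COPY FROM a server-side file needs superuser; from a client you would use psql's \copy instead):

    -- load a whole batch of log lines in one shot instead of row-by-row INSERTs
    COPY messages (logged_at, host, message)
        FROM '/var/spool/pglog/batch.tsv';   -- path is made up for the example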