Re: Syslog to postgresql - need peaks of 5.000/s - Mailing list pgsql-admin
From | paul rivers |
---|---|
Subject | Re: Syslog to postgresql - need peaks of 5.000/s |
Date | |
Msg-id | 479BB8F0.9090508@gmail.com Whole thread Raw |
In response to | Syslog to postgresql - need peaks of 5.000/s ("Michael Monnerie" <michael.monnerie@it-management.at>) |
Responses |
Re: Syslog to postgresql - need peaks of 5.000/s
|
List | pgsql-admin |
Michael Monnerie wrote: > Dear list, > > I shall build up a central syslog server able to do 5.000 log lines per > second peak, 1.000 sustained. I found some pages describing current > syslog-ng with magic "log to file then insert": > http://ben.muppethouse.com/SYSLOG-DOC.html > http://www.kdough.net/docs/syslog_postgresql/ > > But according to syslog, they will release the "direct syslog to postgresql" > feature soon: > http://www.balabit.com/network-security/syslog-ng/ > > Anyway, I'd like to know if anybody has a syslog-ng with postgresql that > does up to 5000 inserts/s, and what hardware I need for that. Of course, > some selects for statistical purposes must also be possible, so not only > writes here. Any ideas? > > mfg zmi > Hi Michael, I do something like this with syslog-ng logs. The processing looks something like this: - Today, about 20 different devices feed the same kind of logs to a central syslog server. - For each device,a parse-and-load (p&l) daemon is running that parses data elements out of the log and then stores the parsed data plus a copy of the original log line to the db. The parsing is probably less trivial than one would guess (much more complex and varied than say web server logs, for example). - This p&l daemon is safely following the tail of the file, and so this parsing and loading is happening in near real time, where each log line turns into an insert. - For a given <device>/<year>/<month>/<day> file, the p&l of course follows to <day+1> when the next day's file appears. - Everything is of course transactional with a status table in the db tracking where it is, so if a p&l is shutdown for whatever reason, it knows what line in the log to pick up and start at again. There are config options to commit every so many log lines or every x seconds and that sort of thing. Using the same libraries I wrote for the p&l daemon, there is a bulk load utility to take a raw syslog file, parse and load it. The main difference between this bulk utility and the p&l is the bulk utility completely parses the syslog file and generates an intermediate file it then does a COPY to load in one fell swoop. There is some selects done on this throughout the day, but that's hard to quantify -- it varies a lot depending on need. On a nightly basis, though, we end up touching the entire day's input to generate summary data that allows us to do rapid trend analysis. In addition to some ad-hoc access, a slew of reports run periodically against both the raw and rollup data. There has been talk of revising the rollup process, so it can happen more or less continuously throughout the day. No work has been done, but I believe it's entirely feasible. We also use table partitioning. There is a partition for each day + device. This dramatically help performance, so long as you're very up on the documented caveats of Pg's partitioning. I did extensive metrics on what our throughput was for both the p&l process and the bulk-loader. In our case, the p&l seemed to peak around 1,600 lines/sec and the bulk loader around 5,500 lines/sec for the whole process: read, parse, load and commit the data to the db. For each additional p&l daemon running on the same host, we lose a small percentage of peak (same for the bulk processor). The machine is absolutely nothing special: a 32 bit dell blade, dual core 2 cpu machine with 4gb running RHEL4 with postgres 8.2.6. The disk is SAN-attached to a hitachi 9990, but that's because that's what where I work insists on for any box. This box is running everything -- receiving the logs, parsing and loading them, and running the db. [There is, incidentally, a redundant machine receiving the same logs and doing the same thing.] I'm sorry I don't know offhand how many spindles in the SAN back the machine's filesystems, but we're not talking dozens. This machine is really considered nothing special. This isn't even the only thing this machine is doing. Also, the parsing of our files adds a significant amount of time. If you are doing minimal parsing and more or less bulk-loading the data, I would expect you to see far better numbers than the 1,600 l/s and 5,500 l/s. Our system has been running for about a year and a half, and postgres has always been rock solid. I just ran a query to see we processed about 1.5 trillion log lines last month. That works out to an average of around 560 lines/s or so, which is less than what you're targeting. But if rates doubled (either if we doubled the number of tracked devices or the existing doubled their volumes), I'm confident there would be no problems. Were I in your shoes, I would start collecting the syslogs immediately, and then prototype something up, so you can profile your process end to end, think about what you do if you ever need to catch up the loading [we never have needed to, but we know we can, which is great], and so forth. I think a lot will be determined by how much you need to process the logs before loading, whether your loads are going to be via insert or copy, and how many logfiles will you be feeding the db at the same time. Hardware and all that will come into play, but until you answer these questions first, it's going to be hard to say more about it. Anyway, that's my experience with this, hope it helps. Regards, Paul
pgsql-admin by date: