Thread: best way to store logs

best way to store logs

From: Ibrahim Edib Kokdemir <kokdemir@gmail.com>
Date: Tue, 30 Jan 2018 00:38:02 +0300
Hi,
In our environment, we log "all" statements for security reasons (to allow auditing later if necessary). But the system is very large and produces about 100 GB of logs per hour, and we expect that to grow. We are having trouble finding enough disk for this amount of data.
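
For reference, this comes from the following postgresql.conf setting:

    log_statement = 'all'    # log every SQL statement, for later auditing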

Now, we are considering one of the following paths:
- a deduplicating file system for all logs.
- parsing out only the useful lines with a syslog server or pgaudit. Are there any drawbacks to using remote syslog for the logs?

What would be the right path here?

Regards,
Ibrahim.

Re: best way to store logs

From: PT
Date:
On Tue, 30 Jan 2018 00:38:02 +0300
Ibrahim Edib Kokdemir <kokdemir@gmail.com> wrote:

> Hi,
> In our environment, we log "all" statements for security reasons (to
> allow auditing later if necessary). But the system is very large and
> produces about 100 GB of logs per hour, and we expect that to grow. We
> are having trouble finding enough disk for this amount of data.
> 
> Now, we are considering one of the following paths:
> - a deduplicating file system for all logs.
> - parsing out only the useful lines with a syslog server or pgaudit.
> Are there any drawbacks to using remote syslog for the logs?
> 
> What would be the right path here?

I did something like this quite recently. Here's what I learned:

rsyslog works fine. It can handle quite a bit of traffic with no problem.
There is a theoretical risk of data loss if you use UDP, so if you need
true auditability, be sure to use TCP. We had a single dedicated logging
server receiving tons of log data from multiple Postgres servers and
never had a problem with performance.
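
As a sketch, the setup looked something like this (the port, facility,
and paths below are just examples). On each Postgres server:

    # postgresql.conf: send the server log to syslog on the local0 facility
    log_destination = 'syslog'
    syslog_facility = 'LOCAL0'

    # local rsyslog rule: forward local0 over TCP (@@) to the central host
    local0.*    @@logserver.example.com:10514

And on the central logging server:

    # /etc/rsyslog.d/pg-central.conf: accept TCP and write one file per host
    module(load="imtcp")
    input(type="imtcp" port="10514")
    template(name="PerHost" type="string" string="/var/log/pg/%HOSTNAME%.log")
    if ($syslogfacility-text == "local0") then {
        action(type="omfile" dynaFile="PerHost")
        stop
    }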

Rotate the logs often. Many smaller logs are easier to manage than a few
large ones. We rotated once an hour, and the typical volume was tens of
GB up to about 100 GB per hour.
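
If the logs land as files on the central host, hourly rotation is easy
to do with logrotate; a sketch (paths and retention are examples, and
logrotate itself must be run hourly from cron for "hourly" to fire):

    # /etc/logrotate.d/pg-central
    /var/log/pg/*.log {
        hourly              # only takes effect if logrotate runs hourly
        rotate 168          # keep roughly a week of hourly files locally
        compress            # gzip older rotations
        delaycompress       # leave the newest rotation uncompressed
        dateext             # date-stamped names instead of .1, .2, ...
        missingok
        notifempty
    }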

Compress older logs. Everyone raves about newer compression algorithms,
like bzip2, but remember that they save a little more space at the cost of
a lot more processing time. gzip is still the way to go, if you ask me, but
YMMV.
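
It's easy to measure on your own data; for example, on a copy of an
hour's log (the file name is made up, and -k needs a reasonably recent
gzip):

    time gzip  -k sample.log      # fast, decent ratio on text logs
    time bzip2 -k sample.log      # typically a bit smaller, much slower
    ls -l sample.log*             # compare the resulting sizes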

Once logs were a week old, we shipped them off to AWS Glacier storage.
It was a pretty cost-effective way to offload the problem of
ever-increasing storage requirements. YMMV on that as well.
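
One way to do that with the AWS CLI is to upload through S3 using the
Glacier storage class (the bucket name here is made up):

    # ship the compressed archive logs to Glacier-class storage
    aws s3 cp /var/log/pg/archive/ s3://example-pg-log-archive/ \
        --recursive --exclude '*' --include '*.gz' \
        --storage-class GLACIER

An S3 lifecycle rule that transitions objects to Glacier after a week
works too, if the logs are already being copied to S3 anyway.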

We didn't bother to trim out any data from the logs, so I can't say how
much that would or wouldn't help.

-- 
Bill Moran