Re: PG vs ElasticSearch for Logs - Mailing list pgsql-general

From Terry Schmitt
Subject Re: PG vs ElasticSearch for Logs
Date
Msg-id CAOOcyswtD_anQXPD6Ru9r-FRMU7dxii436CxbjO8qShAF9S6xw@mail.gmail.com
In response to Re: PG vs ElasticSearch for Logs  (Andy Colson <andy@squeakycode.net>)
Responses Graylog  (Thomas Güttler <guettliml@thomas-guettler.de>)
List pgsql-general
Certainly Postgres is capable of handling this volume just fine. Throw in some partition rotation handling and you have a solution.
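For illustration, a daily-partitioned table along those lines could look something like this (a sketch assuming PostgreSQL 10+ declarative partitioning; the column names are adapted from the schema quoted below, and the partition bounds are just examples):

CREATE TABLE logs (
    id       bigserial,
    ts       timestamptz NOT NULL,
    host     text        NOT NULL,
    service  text        NOT NULL,
    loglevel text        NOT NULL,
    msg      text        NOT NULL,
    extra    jsonb                     -- optional structured payload
) PARTITION BY RANGE (ts);

-- one partition per day; creating the next day's partition can be cron'd
CREATE TABLE logs_2016_08_23 PARTITION OF logs
    FOR VALUES FROM ('2016-08-23') TO ('2016-08-24');

-- "rotation" is then just dropping the oldest partition (assuming it exists), which is instant
DROP TABLE logs_2016_07_23;

At roughly 200k rows per day a single table with an index on ts would also cope, but dropping whole partitions is far cheaper than expiring old rows with DELETE.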
If you want to play with something different, check out Graylog, which is backed by Elasticsearch. It's a bit more work to set up than a single Postgres table, but it has been a success for us, storing syslog, app logs, and Postgres logs from several hundred network devices, Windows servers, and Linux servers. Rotation is handled based on your requirements, drilling down to the details is trivial, and alerting is baked in as well. It could well be overkill for your needs, but I don't know what your environment looks like.

T

On Mon, Aug 22, 2016 at 7:03 AM, Andy Colson <andy@squeakycode.net> wrote:
On 8/22/2016 2:39 AM, Thomas Güttler wrote:


Am 19.08.2016 um 19:59 schrieb Andy Colson:
On 8/19/2016 2:32 AM, Thomas Güttler wrote:
I want to store logs in a simple table.

Here my columns:

  Primary-key (auto generated)
  timestamp
  host
  service-on-host
  loglevel
  msg
  json (optional)

I am unsure which DB to choose: Postgres, ElasticSearch or ...?

We don't have high traffic. About 200k rows per day.

My heart beats for postgres. We have been using it for several years.

On the other hand, the sentence "Don't store logs in a DB" is
somewhere in my head.....

What do you think?




I played with ElasticSearch a little, mostly because I wanted to use Kibana, which looks really pretty. I dumped a ton of logs into it and made a pretty dashboard ... but in the end it didn't really help me and wasn't that useful. My problem is, I don't want to have to go look at it. If something goes bad, then I want an email alert, at which point I'm going to go run top and tail the logs.

Another problem I had with Kibana/ES is that the syntax for searching is different from what I'm used to. It made it hard to find stuff in Kibana.

Right now, I have a Perl script that reads the apache logs and fires off updates into PG to keep stats. But it's an hourly summary, which the website turns around and queries to show pretty usage graphs.
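(Something along these lines, presumably; the table and column names here are only illustrative, not the actual schema, and ON CONFLICT needs PostgreSQL 9.5 or newer:)

CREATE TABLE hourly_hits (
    hr   timestamptz NOT NULL,
    url  text        NOT NULL,
    hits bigint      NOT NULL DEFAULT 0,
    PRIMARY KEY (hr, url)
);

-- one upsert per request (or per batch), bumping the counter for that hour
INSERT INTO hourly_hits (hr, url, hits)
VALUES (date_trunc('hour', now()), '/index.html', 1)
ON CONFLICT (hr, url) DO UPDATE SET hits = hourly_hits.hits + EXCLUDED.hits;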

You use Perl to read apache logs. Does this work?

Forwarding logs reliably is not easy. Logs are streams, but files in unix are not streams. Sooner or later the files get rotated. RELP exists, but AFAIK its usage is not widespread:

  https://en.wikipedia.org/wiki/Reliable_Event_Logging_Protocol

Let's see how to get the logs into postgres ....

In the end, PG or ES, all depends on what you want.

Most of my logs start from an http request. I want a unique id per request in every log line that gets created. This way I can trace the request, even if its impact spans several hosts and systems that do not themselves receive http requests.
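(With everything in one Postgres table that would be a single query; a sketch, assuming the log table grows a request_id column:)

SELECT ts, host, service, loglevel, msg
FROM logs
WHERE request_id = '9f3c2a7e'     -- hypothetical per-request id
ORDER BY ts, host;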

Regards,
  Thomas Güttler



I don't read the file.  In apache.conf:

# v, countyia, ip, sess, ts, url, query, status
LogFormat "3,%{countyName}e,%a,%{VCSID}C,%{%Y-%m-%dT%H:%M:%S%z}t,\"%U\",\"%q\",%>s" csv3

CustomLog "|/usr/local/bin/statSender.pl -r 127.0.0.1" csv3

I think I read somewhere that if you pipe to a script (like above) and you don't read fast enough, it could slow apache down. That's why the script above dumps to redis first. That way I can move processes around, restart the database, etc., and not break apache in any way.

The important part of the script:

# $redis, $qname, and redis_connect() are set up earlier in the script (not shown).
# Apache writes each formatted log line to our STDIN via the CustomLog pipe.
while (my $x = <>)
{
        chomp($x);
        next unless ($x);       # skip blank lines
try_again:
        if ($redis)
        {
                # push the raw log line onto the redis list
                eval {
                        $redis->lpush($qname, $x);
                };
                if ($@)
                {
                        # connection died; reconnect and retry the push
                        $redis = redis_connect();
                        goto try_again;
                }
                # just silence this one; it caps the queue at ~1000 entries
                eval {
                        $redis->ltrim($qname, 0, 1000);
                };
        }
}

Any other machine, or even multiple, then reads from redis and inserts into PG.

You can see, in my script, I trim the queue to 1000 items, but that's because I'm not as worried about losing results. Your setup would probably be different. I also set up redis to not save anything to disk, again because I don't mind if I lose a few hits here or there. But you get the idea.

-Andy



