Re: PG vs ElasticSearch for Logs - Mailing list pgsql-general

From Andy Colson
Subject Re: PG vs ElasticSearch for Logs
Msg-id 054bf95c-93ad-3ae0-2ad3-9652ed81a3af@squeakycode.net
In response to Re: PG vs ElasticSearch for Logs  (Thomas Güttler <guettliml@thomas-guettler.de>)
Responses Re: PG vs ElasticSearch for Logs  (Terry Schmitt <terry.schmitt@gmail.com>)
List pgsql-general
On 8/22/2016 2:39 AM, Thomas Güttler wrote:
>
>
> Am 19.08.2016 um 19:59 schrieb Andy Colson:
>> On 8/19/2016 2:32 AM, Thomas Güttler wrote:
>>> I want to store logs in a simple table.
>>>
>>> Here my columns:
>>>
>>>   Primary-key (auto generated)
>>>   timestamp
>>>   host
>>>   service-on-host
>>>   loglevel
>>>   msg
>>>   json (optional)
>>>
>>> I am unsure which DB to choose: Postgres, ElasticSearch or ...?
>>>
>>> We don't have high traffic. About 200k rows per day.
>>>
>>> My heart beats for postgres. We have been using it for several years.
>>>
>>> On the other hand, the sentence "Don't store logs in a DB" is
>>> somewhere in my head.....
>>>
>>> What do you think?
>>>
>>>
>>>
>>
>> I played with ElasticSearch a little, mostly because I wanted to use
>> Kibana which looks really pretty.  I dumped a ton
>> of logs into it, and made a pretty dashboard ... but in the end it
>> didn't really help me, and wasn't that useful.  My
>> problem is, I don't want to have to go look at it.  If something goes
>> bad, then I want an email alert, at which point
>> I'm going to go run top, and tail the logs.
>>
>> Another problem I had with kibana/ES is that the search syntax is
>> different from what I'm used to.  It made it hard to find stuff in
>> kibana.
>>
>> Right now, I have a perl script that reads apache logs and fires off
>> updates into PG to keep stats.  But it's an hourly
>> summary, which the website turns around and queries the stats to show
>> pretty usage graphs.
>
> You use Perl to read apache logs. Does this work?
>
> Forwarding logs reliably is not easy. Logs are streams, but files in
> unix are not streams. Sooner or later the files get rotated. RELP
> exists, but AFAIK its usage is not widespread:
>
>   https://en.wikipedia.org/wiki/Reliable_Event_Logging_Protocol
>
> Let's see how to get the logs into postgres ....
>
>> In the end, PG or ES: it all depends on what you want.
>
> Most of my logs start from an http request. I want a unique id per
> request in every log line that gets created. This way I can trace the
> request, even if its impact spans several hosts and systems which do
> not receive http requests themselves.
>
> Regards,
>   Thomas Güttler
>
>

I don't read the file.  In apache.conf:

# fields: v, county, ip, sess, ts, url, query, status
#   (%{...}e = env var, %a = client ip, %{...}C = cookie,
#    %{...}t = timestamp, %>s = final status)
LogFormat "3,%{countyName}e,%a,%{VCSID}C,%{%Y-%m-%dT%H:%M:%S%z}t,\"%U\",\"%q\",%>s" csv3

# pipe every log line to a script instead of writing a file
CustomLog "|/usr/local/bin/statSender.pl -r 127.0.0.1" csv3
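
Side note on the unique request id you mentioned: if mod_unique_id is
loaded, apache sets a UNIQUE_ID environment variable for every request,
and you can log it the same way.  A sketch (the LoadModule path depends
on your distro):

LoadModule unique_id_module modules/mod_unique_id.so

# same format, plus a per-request id as the second field
LogFormat "3,%{UNIQUE_ID}e,%{countyName}e,%a,%{VCSID}C,%{%Y-%m-%dT%H:%M:%S%z}t,\"%U\",\"%q\",%>s" csv3uid

You could also hand the same id to your backends (e.g. with mod_headers:
RequestHeader set X-Request-Id "%{UNIQUE_ID}e") so downstream logs carry it.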

I think I read somewhere that if you pipe to a script (like above) and
you don't read fast enough, it can slow apache down.  That's why the
script above dumps to redis first.  That way I can move processes
around, restart the database, etc., and not break apache in any way.

The important part of the script:

while (my $x = <>)         # every log line apache pipes to us
{
    chomp($x);
    next unless ($x);      # skip blank lines
try_again:
    if ($redis)
    {
        # queue the line; on failure, reconnect and retry
        eval {
            $redis->lpush($qname, $x);
        };
        if ($@)
        {
            $redis = redis_connect();
            goto try_again;
        }
        # cap the queue at the newest ~1000 entries; just silence this one
        eval {
            $redis->ltrim($qname, 0, 1000);
        };
    }
}

Any other machine, or even multiple, then reads from redis and inserts
into PG.
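
Roughly like this -- a sketch, not my actual script.  It assumes the
Redis, DBI (with DBD::Pg) and Text::CSV modules, and the table and
column names are made up; map them to whatever schema you end up with
(your optional json column would be jsonb in PG):

#!/usr/bin/perl
use strict;
use warnings;
use Redis;
use DBI;
use Text::CSV;

my $qname = 'stats';   # same queue the sender pushes to
my $redis = Redis->new(server => '127.0.0.1:6379');
my $csv   = Text::CSV->new({ binary => 1 });

my $dbh = DBI->connect('dbi:Pg:dbname=stats', 'stats', '',
                       { RaiseError => 1, AutoCommit => 1 });
my $ins = $dbh->prepare(
    'insert into hits (v, county, ip, sess, ts, url, query, status)
     values (?, ?, ?, ?, ?, ?, ?, ?)');

while (1)
{
    # lpush on the sender plus rpop here makes the list a FIFO queue
    my $x = $redis->rpop($qname);
    if (!defined $x)
    {
        sleep 1;    # queue empty, back off a little
        next;
    }
    # real code would want error handling around the insert
    $ins->execute($csv->fields) if $csv->parse($x);
}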

You can see, in my script, I trim the queue to 1000 items, but that's
because I'm not as worried about losing results.  Your setup would
probably be different.  I also set up redis to not save anything to disk,
again, because I don't mind if I lose a few hits here or there.  But
you get the idea.
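
If you want the same no-persistence setup, it's two lines in redis.conf
(disable RDB snapshots and the append-only file):

# keep the queue in memory only
save ""
appendonly no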

-Andy

