Log Data Analytics : Confused about the choice of Database - Mailing list pgsql-general

From Peeyush Agarwal
Subject Log Data Analytics : Confused about the choice of Database
Msg-id CA+-6-YJs_OQ=jqWZgy_Q-t2zVUEaGDOb-EH8W5BANOfPg3T-tw@mail.gmail.com
Responses Re: Log Data Analytics : Confused about the choice of Database  (Dorian Hoxha <dorian.hoxha@gmail.com>)
List pgsql-general
Hi,

I have log data of the following format:


Session    Timestamp    Event                Parameters
1          1            Started Session      
1          2            Logged In            Username:"user1"
2          3            Started Session
1          3            Started Challenge    title:"Challenge 1", level:"2"
2          4            Logged In            Username:"user2"
Now, a person wants to carry out analytics on this log data and would like to receive it as a JSON blob after appropriate transformations. For example, he may want a JSON blob in which the log data is grouped by Session, with TimeFromSessionStart and CountOfEvents added before the data is sent, so that he can carry out meaningful analysis. In that case I should return:


[
  {
    "session": 1,
    "CountOfEvents": 3,
    "Actions": [
      {"TimeFromSessionStart": 0, "Event": "Started Session"},
      {"TimeFromSessionStart": 1, "Event": "Logged In", "Username": "user1"},
      {"TimeFromSessionStart": 2, "Event": "Started Challenge", "title": "Challenge 1", "level": "2"}
    ]
  },
  {
    "session": 2,
    "CountOfEvents": 2,
    "Actions": [
      {"TimeFromSessionStart": 0, "Event": "Started Session"},
      {"TimeFromSessionStart": 1, "Event": "Logged In", "Username": "user2"}
    ]
  }
]
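
A minimal sketch of the transformation described above, in Python purely for illustration. The row layout (session, timestamp, event, parameters) and the name build_sessions are assumptions made for this example and are not part of any existing code; it groups rows by session and derives TimeFromSessionStart and CountOfEvents, producing a structure of the shape shown above.

```python
import json

# Sample rows taken from the log table: (session, timestamp, event, parameters).
LOG_ROWS = [
    (1, 1, "Started Session", {}),
    (1, 2, "Logged In", {"Username": "user1"}),
    (2, 3, "Started Session", {}),
    (1, 3, "Started Challenge", {"title": "Challenge 1", "level": "2"}),
    (2, 4, "Logged In", {"Username": "user2"}),
]


def build_sessions(rows):
    """Group rows by session and add the derived fields (hypothetical helper)."""
    grouped = {}
    # Sort by (session, timestamp) so each session's events are in time order.
    for session, ts, event, params in sorted(rows, key=lambda r: (r[0], r[1])):
        grouped.setdefault(session, []).append((ts, event, params))

    result = []
    for session, events in grouped.items():
        start = events[0][0]  # earliest timestamp in this session
        actions = []
        for ts, event, params in events:
            action = {"TimeFromSessionStart": ts - start, "Event": event}
            action.update(params)  # flatten the key/value parameters into the action
            actions.append(action)
        result.append(
            {"session": session, "CountOfEvents": len(events), "Actions": actions}
        )
    return result


if __name__ == "__main__":
    print(json.dumps(build_sessions(LOG_ROWS), indent=2))
```

The same grouping could equally be done in SQL before the data reaches the application; this version only shows the shape of the computation.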

Here, TimeFromSessionStart, CountOfEvents, etc. (let's call it synthetic additional data) will not be hard-coded; I will make a web interface to allow the person to decide what kind of synthetic data he requires in the JSON blob. I would like to give him a good amount of flexibility in that choice.
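
One possible way to keep the synthetic fields from being hard-coded is a small registry that maps field names to functions, so that the web interface only has to pass a list of names. The sketch below is in Python for illustration only; SESSION_FIELDS, ACTION_FIELDS, summarize_session, and the SessionDuration field are all hypothetical names invented for this example.

```python
# Per-session fields: each function receives the session's events,
# a time-ordered list of (timestamp, event, parameters) tuples.
SESSION_FIELDS = {
    "CountOfEvents": lambda events: len(events),
    # Hypothetical extra field, included only to show how new ones are added.
    "SessionDuration": lambda events: events[-1][0] - events[0][0],
}

# Per-action fields: each function receives the session start time and one event.
ACTION_FIELDS = {
    "TimeFromSessionStart": lambda start, ev: ev[0] - start,
}


def summarize_session(session, events, session_fields=(), action_fields=()):
    """Build one session entry containing only the requested synthetic fields."""
    events = sorted(events, key=lambda e: e[0])  # order by timestamp
    start = events[0][0]
    out = {"session": session}
    for name in session_fields:
        out[name] = SESSION_FIELDS[name](events)
    actions = []
    for ev in events:
        action = {"Event": ev[1]}
        for name in action_fields:
            action[name] = ACTION_FIELDS[name](start, ev)
        action.update(ev[2])  # merge the logged key/value parameters
        actions.append(action)
    out["Actions"] = actions
    return out


if __name__ == "__main__":
    session_1 = [
        (1, "Started Session", {}),
        (2, "Logged In", {"Username": "user1"}),
        (3, "Started Challenge", {"title": "Challenge 1", "level": "2"}),
    ]
    print(summarize_session(1, session_1, ["CountOfEvents"], ["TimeFromSessionStart"]))
```

Adding a new kind of synthetic data then means adding one entry to a registry rather than changing the code that assembles the blob.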

If I use PostgreSQL, I can store the data in the following manner: Session and Event can be strings, Timestamp can be a date, and Parameters can be hstore (the key-value pair type available in PostgreSQL). After that, I can use SQL queries to compute the synthetic (or additional) data, store it temporarily in variables in a Rails application (which will interact with the PostgreSQL database and act as the interface for the person who wants the JSON blob), and create the JSON blob from it.

However, I am not sure whether PostgreSQL is the best choice for this use case. I have posted the detailed question on SO at http://stackoverflow.com/questions/23544604/log-data-analytics

Looking for some help from the community.

Peeyush Agarwal
