Need advice for handling big data in postgres - Mailing list pgsql-general

From Tobias Fielitz
Subject Need advice for handling big data in postgres
Msg-id 54B5BB85.7060309@streethawk.com
Responses Re: Need advice for handling big data in postgres  (Vincent Veyron <vv.lists@wanadoo.fr>)
List pgsql-general
Hi,

I am quite new to handling a lot of data in Postgres and would be happy
to get as much advice from experienced data jugglers as possible.

THE SCENARIO:
I have a big table `client_log`. It is the "main gate" for clients (and
there are a lot of them) to talk to the backend.
Each client sends a varying number of log lines, so the table is written
to many times every second and produces a lot of data. The table
definition is here:
http://stackoverflow.com/questions/25542010/postgres-huge-table-with-delayed-read-and-write-access
*code* determines the nature of the log line. Example: code=20 marks a
location log line (the important information is in *latitude* and
*longitude*), code=8999 carries a comment that just needs to be stored
(in the *comment* field).
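For reference, a simplified sketch of the table (the column names are the
ones mentioned above; the types and the id column are just my shorthand,
the real definition is in the SO link):

    CREATE TABLE client_log (
        id                bigserial PRIMARY KEY,
        created           timestamptz NOT NULL,            -- client-side timestamp
        created_on_server timestamptz NOT NULL DEFAULT now(),
        code              integer NOT NULL,                -- nature of the log line
        latitude          double precision,                -- only used for code=20
        longitude         double precision,                -- only used for code=20
        comment           text                             -- only used for code=8999
        -- plus the remaining columns from the full definition
    );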

Just storing the data is useless; I need to look at it. Some log lines I
query quite regularly, some just once, others every now and then.
I need indexes on nearly every column for quick access. So what is the
best way of storing the data?

OPTION 1 - PARTITIONING:
For each query only a few columns are interesting, and I could partition
the table (as was suggested on SO)
by *created* and by *code*.
There are roughly 10 different codes and I would keep data for the last
two months (partitioned by day), so I would end up with 10 * 60 = 600
partitions.
For every partition I could create indexes that are relevant to that
partition (examples: *created*, *created_on_server* or *latitude* and
*longitude*).
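With inheritance-based partitioning (which, as far as I can tell, is what
Postgres offers right now), a single code/day partition would look roughly
like this (table name and CHECK bounds are only illustrative):

    -- the parent client_log stays empty; the children carry the data
    CREATE TABLE client_log_code20_20150114 (
        CHECK (code = 20
               AND created >= '2015-01-14'
               AND created <  '2015-01-15')
    ) INHERITS (client_log);

    -- only the indexes this partition actually needs
    CREATE INDEX ON client_log_code20_20150114 (created);
    CREATE INDEX ON client_log_code20_20150114 (latitude, longitude);

    -- rows get routed either by an INSERT trigger on client_log or by
    -- the application inserting into the child table directly;
    -- constraint_exclusion lets the planner skip non-matching partitions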

OPTION 2 - MULTIPLE TABLES:
I could create the tables myself: one for location log lines, one for
comment log lines, etc., and route each line via Python into the correct
table (depending on *code*). Each of these tables would only have the
columns and indexes it needs.
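Sketched out (column types again just my guess), that would be something
like:

    CREATE TABLE location_log (
        id        bigserial PRIMARY KEY,
        created   timestamptz NOT NULL,
        latitude  double precision NOT NULL,
        longitude double precision NOT NULL
    );
    CREATE INDEX ON location_log (created);
    CREATE INDEX ON location_log (latitude, longitude);

    CREATE TABLE comment_log (
        id      bigserial PRIMARY KEY,
        created timestamptz NOT NULL,
        comment text
    );
    CREATE INDEX ON comment_log (created);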

OUTCOME:
I expect partitioning to be faster because Postgres selects the correct
partition for me automatically. I can easily get rid of old data by
dropping the corresponding partition. The downside of the partition
approach is that all partitions inherit all columns of the master table,
which is unnecessary (and consumes disk space?).
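E.g. (sticking with the partition naming from the sketch above), expiring
one old day would just be:

    DROP TABLE client_log_code20_20141114;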

Which option should I go for?
Is there anything else that I haven't considered?

Any help and comments appreciated.

PS.: Does partitioning work trouble-free with master-slave replication?
