Need advice for handling big data in postgres - Mailing list pgsql-general

From Tobias Fielitz
Subject Need advice for handling big data in postgres
Msg-id 54B5BB85.7060309@streethawk.com
Responses Re: Need advice for handling big data in postgres  (Vincent Veyron <vv.lists@wanadoo.fr>)
List pgsql-general
Hi,

I am quite new to handling a lot of data in Postgres and would be happy
to get as much advice from experienced data jugglers as possible.

THE SCENARIO:
I have a big table `client_log`. It is the "main gate" for clients (and
there are a lot of them) to talk to the backend.
Each client sends a varying number of log lines, so the table is written
to many times every second and produces a lot of data. The table
definition is here:
http://stackoverflow.com/questions/25542010/postgres-huge-table-with-delayed-read-and-write-access
*code* determines the nature of the log line. Example: code=20 marks a
location log line (the important information is in *latitude* and
*longitude*), code=8999 carries a comment that just needs to be stored
(in the *comment* field).
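For reference, a simplified sketch of the table (the column names are the
ones mentioned above; the types and the id column are just my shorthand,
the real definition is in the SO link):

    CREATE TABLE client_log (
        id                bigserial PRIMARY KEY,
        created           timestamptz NOT NULL,            -- client-side timestamp
        created_on_server timestamptz NOT NULL DEFAULT now(),
        code              integer NOT NULL,                -- nature of the log line
        latitude          double precision,                -- only used for code=20
        longitude         double precision,                -- only used for code=20
        comment           text                             -- only used for code=8999
        -- plus the remaining columns from the full definition
    );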

Just storing the data is useless; I need to look at it. Some log lines I
query quite regularly, some just once, others every now and then.
I need indexes on nearly every column for quick access. So what is the
best way of storing the data?

OPTION 1 - PARTITIONING:
For each query only a few columns are interesting, and I could partition
the table (as was suggested on SO)
by *created* and by *code*.
There are roughly 10 different codes and I would keep data for the last
two months (partitioned by day), so I would end up with 10 * 60 = 600
partitions.
For every partition I could create indexes that are relevant to that
partition (examples: *created*, *created_on_server* or *latitude* and
*longitude*).
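With inheritance-based partitioning (which, as far as I can tell, is what
Postgres offers right now), a single code/day partition would look roughly
like this (table name and CHECK bounds are only illustrative):

    -- the parent client_log stays empty; the children carry the data
    CREATE TABLE client_log_code20_20150114 (
        CHECK (code = 20
               AND created >= '2015-01-14'
               AND created <  '2015-01-15')
    ) INHERITS (client_log);

    -- only the indexes this partition actually needs
    CREATE INDEX ON client_log_code20_20150114 (created);
    CREATE INDEX ON client_log_code20_20150114 (latitude, longitude);

    -- rows get routed either by an INSERT trigger on client_log or by
    -- the application inserting into the child table directly;
    -- constraint_exclusion lets the planner skip non-matching partitions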

OPTION 2 - MULTIPLE TABLES:
I could create the tables myself: one for location log lines, one for
comment log lines, etc., and route each line via Python into the correct
table (depending on *code*). Each of these tables would only have the
columns and indexes it needs.
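Sketched out (column types again just my guess), that would be something
like:

    CREATE TABLE location_log (
        id        bigserial PRIMARY KEY,
        created   timestamptz NOT NULL,
        latitude  double precision NOT NULL,
        longitude double precision NOT NULL
    );
    CREATE INDEX ON location_log (created);
    CREATE INDEX ON location_log (latitude, longitude);

    CREATE TABLE comment_log (
        id      bigserial PRIMARY KEY,
        created timestamptz NOT NULL,
        comment text
    );
    CREATE INDEX ON comment_log (created);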

OUTCOME:
I expect partitioning to be faster because Postgres selects the correct
partition for me automatically. I can easily get rid of old data by
dropping the corresponding partition. The downside of the partition
approach is that all partitions inherit all columns of the master table,
which is unnecessary (and consumes disk space?).
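E.g. (sticking with the partition naming from the sketch above), expiring
one old day would just be:

    DROP TABLE client_log_code20_20141114;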

Which option should I go for?
Is there anything else that I haven't considered?

Any help and comments appreciated.

PS.: Does partitioning work trouble-free with master-slave replication?
