Re: how to handle a big table for data log - Mailing list pgsql-performance

From kuopo
Subject Re: how to handle a big table for data log
Date
Msg-id AANLkTilcCUP2ohhotmvoOoDlp2j1OiqwcZusEvYjjfzS@mail.gmail.com
In response to Re: how to handle a big table for data log  ("Jorge Montero" <jorge_montero@homedecorators.com>)
List pgsql-performance
Let me make my problem clearer. I have a requirement to log data from a set of objects continuously. For example, an object may be a mobile phone that reports its location every 30 seconds. To record its historical trace, I create a table like:
CREATE TABLE log_table
(
  id integer NOT NULL,
  data_type integer NOT NULL,
  data_value double precision,
  ts timestamp with time zone NOT NULL,
  CONSTRAINT log_table_pkey PRIMARY KEY (id, data_type, ts)
);
In my location log example, the field data_type could be longitude or latitude.

I created the primary key (id, data_type, ts) to make my queries more efficient. The main type of query asks for the latest data_value of a given data_type for a given id at or before a given timestamp. For this kind of query, I use the following SQL statement:
SELECT * FROM log_table
WHERE id = [given id] AND data_type = 'longitude'
  AND ts = (SELECT max(ts) FROM log_table
            WHERE id = [given id] AND data_type = 'longitude'
              AND ts <= [given timestamp]);
According to my evaluation, its performance is acceptable.
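
For reference, the same lookup can also be written as an ORDER BY ... LIMIT 1 over the (id, data_type, ts) primary key; this is only a sketch using the same placeholders as above, not something I have benchmarked:

-- Sketch only: same placeholders as above; it should be able to use a
-- backward scan of the (id, data_type, ts) primary key index.
SELECT *
FROM log_table
WHERE id = [given id]
  AND data_type = 'longitude'
  AND ts <= [given timestamp]
ORDER BY ts DESC
LIMIT 1;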

However, I am more concerned about the performance of the insert operation. As I mentioned, log_table keeps growing, so I decided to partition it. Currently, I partition it by date and keep only 60 days of data; this partitioning is helpful. But when I also partition it by data_type (in my case, the number of data_types is limited, say 10), insert performance degrades. I guess this is caused by multiple vacuum/analyze passes over the per-data_type partitions. On the other hand, if I keep all data_types together, I expect insert performance will also degrade once I expand the system to support more mobile phones or more data_types.
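
For reference, my date partitions follow roughly this inheritance-based pattern (the table name, date range, and constraint name below are only illustrative, not my actual DDL):

-- Illustrative sketch of one daily child table; names and the exact
-- date range are placeholders.
CREATE TABLE log_table_20100720
(
  CHECK (ts >= '2010-07-20' AND ts < '2010-07-21')
) INHERITS (log_table);

-- Each child carries the same primary key as the parent.
ALTER TABLE log_table_20100720
  ADD CONSTRAINT log_table_20100720_pkey PRIMARY KEY (id, data_type, ts);

-- Inserts go to the current day's child; dropping a child table retires
-- a whole day once it falls outside the 60-day window.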

This is my current situation. Please give me some hints to improve the performance (especially for the insert part).


kuopo.


On Mon, Jul 19, 2010 at 11:37 PM, Jorge Montero <jorge_montero@homedecorators.com> wrote:
> Large tables, by themselves, are not necessarily a problem. The problem is
> what you might be trying to do with them. Depending on the operations you
> are trying to do, partitioning the table might help performance or make it
> worse.
>  
> What kind of queries are you running? How many days of history are you
> keeping? Could you post an explain analyze output of a query that is being
> problematic?
> Given the amount of data you hint at, your server configuration and the
> custom statistics targets for the big tables in question would also be useful.
>
>>>> kuopo <spkuo@cs.nctu.edu.tw> 7/19/2010 1:27 AM >>>
> Hi,
>
> I have to handle a log table that accumulates a large amount of logs.
> This table only involves insert and query operations. To limit the
> table size, I tried to split this table by date. However, the number
> of log records is still large (46 million records per day). To further
> limit its size, I tried to split this log table by log type. However,
> this did not improve performance; it is much slower than the big-table
> solution. I guess this is because of the extra auto-vacuum/analyze
> cost across all the split tables.
>
> Can anyone comment on this situation? Thanks in advance.
>
>
> kuopo.
>
>
