Hi,
On 13 October 2012 01:44, Chitra Creta <chitracreta@gmail.com> wrote:
> I currently have a table that is growing very quickly - i.e 7 million
> records in 5 days. This table acts as a placeholder for statistics, and
> hence the records are merely inserted and never updated or deleted.
>
> Many queries are run on this table to obtain trend analysis. However, these
> queries are now starting to take a very long time (hours) to execute due to
> the size of the table.
Have you changed autovacuum settings to make it more aggressive?
Another option is to run ANALYZE after loading.
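For example (only a sketch; the table name "stats" and the thresholds
below are placeholders, pick your own), autovacuum can be tuned per
table, or you can analyse right after a bulk load:

  -- make autovacuum kick in sooner for this one table
  ALTER TABLE stats SET (
      autovacuum_vacuum_scale_factor  = 0.02,
      autovacuum_analyze_scale_factor = 0.01
  );

  -- or refresh planner statistics immediately after loading
  ANALYZE stats;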
>
> I have put indexes on this table, to no significant benefit. Some of the
> other strategies I have thought of:
> 1. Purge old data
> 3. Partition
Those two go together. Partitioning is useful if you can constrain
queries to specific ranges, e.g. "this query only needs the last two
days". You shouldn't go over 200 - 300 partitions per table. Partition
granularity should match the range covered by a typical query: if you
run weekly queries then use weekly partitions (bi-weekly or daily
partitions might work, but I do not have good experience with them).
Purging old data becomes easy because you just drop the unwanted
partitions (no table/index bloat). Loading is a little bit tricky
because you have to load data into the right partition, as sketched
below.
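Something along these lines (only a sketch of the classic
inheritance-based approach; table and column names such as "stats" and
"created_at" are made up, and you would create one child per week plus
the matching trigger branch):

  -- parent table holds no rows itself; children carry the data
  CREATE TABLE stats (
      id         bigserial,
      created_at timestamptz NOT NULL,
      value      numeric
  );

  -- one child per week, with a CHECK constraint the planner can use
  CREATE TABLE stats_2012_w41 (
      CHECK (created_at >= '2012-10-08' AND created_at < '2012-10-15')
  ) INHERITS (stats);
  CREATE INDEX ON stats_2012_w41 (created_at);

  -- route inserts into the right child
  CREATE OR REPLACE FUNCTION stats_insert() RETURNS trigger AS $$
  BEGIN
      IF NEW.created_at >= '2012-10-08' AND NEW.created_at < '2012-10-15' THEN
          INSERT INTO stats_2012_w41 VALUES (NEW.*);
      ELSE
          RAISE EXCEPTION 'no partition for %', NEW.created_at;
      END IF;
      RETURN NULL;  -- row is not stored in the parent
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER stats_insert_trg
      BEFORE INSERT ON stats
      FOR EACH ROW EXECUTE PROCEDURE stats_insert();

  -- purging old data is then just dropping a partition
  DROP TABLE stats_2012_w41;

With constraint_exclusion = partition (the default) the planner skips
children whose CHECK constraint cannot match the query's date range.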
> 4. Creation of daily, monthly, yearly summary tables that contains
> aggregated data specific to the statistics required
I think this is the way to hell. You start with a few tables and then
you add more until you realise that it takes longer to update them
than to run your queries :)
You might benefit from query parallelisation, for example with
pgpool-II, Stado or Postgres-XC, or you might not use Postgres at all:
any column-oriented NoSQL database might be a good choice for this
kind of workload.
--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)
(http://www.linkedin.com/in/ondrejivanic)