Thread: CLUSTER

CLUSTER

From
Adam Kavan
Date:
I have a table that I am constantly inserting into (around 10 times a
second right now, but I hope to increase that later).  I hold these rows
for a week, then summarize and delete them.  During that week I need to
access ranges of these rows based on a timestamp in each row set to now()
when I insert them.  I have this column indexed, but if I let the system
run, after a few weeks it can take a fairly long time to get information
from this table (for instance, the avg() of values inserted in the last 5
minutes can take over a minute to calculate).  If I run CLUSTER on the
index of the timestamps, my time to do this drops to under 5 seconds
again.  I checked the plan with EXPLAIN ANALYZE, and originally it was
always doing seqscans, so I set enable_seqscan = FALSE just before I run
this query; this forces it to use the index and speeds it up quite a bit.
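
For reference, the sequence looks roughly like this (table and column
names are placeholders, not my real schema):

  CREATE INDEX samples_ts_idx ON samples (ts);  -- index on the insert timestamp

  SET enable_seqscan = FALSE;                   -- force the planner onto the index
  SELECT avg(value)
    FROM samples
   WHERE ts > now() - interval '5 minutes';
  SET enable_seqscan = TRUE;

  CLUSTER samples_ts_idx ON samples;            -- restores speed, but locks the table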

Does anyone know a way I can reorder the table without doing a CLUSTER?
CLUSTER stops all insertions into the table and takes several minutes,
during which time a large backlog builds up.  Or should I be using some
other method of speeding up the table?

--- Adam Kavan
--- akavan@cox.net


Re: CLUSTER

From
"Claudio Lapidus"
Date:
Hello Adam

I had a similar problem, although my table is much smaller; it keeps
around 50K rows.  If you are inserting 10 per second you have
10*3600*24*7, that's about 6 million records.  I'm also assuming here
that you constantly delete any record more than one week old; is this
correct, or do you delete all rows once a week and start over with the
table empty?

Essentially what we do here is vacuum very frequently (every five minutes
or so), be very careful about the FSM settings, and run a VACUUM FULL and
a REINDEX once a week; that keeps the table's performance within limits.
We never run CLUSTER for that.
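
In rough terms it looks like this -- the table name and FSM values are
only illustrative, size the free space map for your own churn:

  VACUUM ANALYZE mytable;     -- every five minutes, from cron or similar

  VACUUM FULL mytable;        -- once a week, in a quiet window
  REINDEX TABLE mytable;

  -- postgresql.conf: the FSM must be able to track the pages freed
  -- between vacuums, otherwise the table bloats anyway
  -- max_fsm_pages = 100000
  -- max_fsm_relations = 1000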

We run PG 7.3.2.  In any case, if your version is below the 7.4 line then
you are exposed to index bloat, which is probably the cause of your
problem: before 7.4, index pages emptied by deletes are never reclaimed,
so an index on an ever-increasing timestamp grows without bound until you
REINDEX.

hth
cl.

----- Original Message -----
From: "Adam Kavan" <akavan@cox.net>
To: <pgsql-general@postgresql.org>
Sent: Thursday, October 23, 2003 2:38 PM
Subject: [GENERAL] CLUSTER


> I have a table that I am constantly inserting into (around 10 times a
> second right now, but I hope to increase that later).  I hold these rows
> for a week, then summarize and delete them.  During that week I need to
> access ranges of these rows based on a timestamp in each row set to now()
> when I insert them.  I have this column indexed, but if I let the system
> run, after a few weeks it can take a fairly long time to get information
> from this table (for instance, the avg() of values inserted in the last 5
> minutes can take over a minute to calculate).  If I run CLUSTER on the
> index of the timestamps, my time to do this drops to under 5 seconds
> again.  I checked the plan with EXPLAIN ANALYZE, and originally it was
> always doing seqscans, so I set enable_seqscan = FALSE just before I run
> this query; this forces it to use the index and speeds it up quite a bit.
>
> Does anyone know a way I can reorder the table without doing a CLUSTER?
> CLUSTER stops all insertions into the table and takes several minutes,
> during which time a large backlog builds up.  Or should I be using some
> other method of speeding up the table?
>
> --- Adam Kavan
> --- akavan@cox.net

Re: CLUSTER

From
Adam Kavan
Date:
At 02:02 AM 10/24/03 -0300, Claudio Lapidus wrote:
>Hello Adam
>
>I had a similar problem, although my table is much smaller; it keeps
>around 50K rows.  If you are inserting 10 per second you have
>10*3600*24*7, that's about 6 million records.  I'm also assuming here
>that you constantly delete any record more than one week old; is this
>correct, or do you delete all rows once a week and start over with the
>table empty?

Every night at midnight I delete everything over 7 days old.
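
In other words, roughly (placeholder names again):

  DELETE FROM samples
   WHERE ts < now() - interval '7 days';  -- drop everything over a week old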


>Essentially what we do here is vacuum very frequently (every five minutes
>or so), be very careful about the FSM settings, and run a VACUUM FULL and
>a REINDEX once a week; that keeps the table's performance within limits.
>We never run CLUSTER for that.

I'm running pg_autovacuum on my database, and 7.4 (updating to beta 5
today to see if it improves things).
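
To see whether the vacuuming keeps up, one thing worth watching (assuming
stats_row_level is turned on; the table name is a placeholder) is the
insert/delete churn in the stats views:

  SELECT relname, n_tup_ins, n_tup_del
    FROM pg_stat_user_tables
   WHERE relname = 'samples';  -- rows inserted/deleted since stats were reset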

--- Adam Kavan
--- akavan@cox.net