Thread: Re: [PATCHES] Maintaining cluster order on insert
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> While thinking about index-organized-tables and similar ideas, it
> occurred to me that there's some low-hanging fruit: maintaining cluster
> order on inserts by trying to place new heap tuples close to other
> similar tuples.

Doesn't this happen for free if there's enough space? UPDATE tries to
place the new tuple on the same page it's already on.

In practice people are only likely to cluster by primary keys (or other
things that seldom change), so I don't particularly agree with inventing
a large wart to support "optimally" placing things somewhere else...

			regards, tom lane
On 8/9/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> UPDATE tries to place the new tuple on the same page it's already
> on.

I think he meant for INSERT.

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation | fax: 732.331.1301
33 Wood Ave S, 2nd Floor | jharris@enterprisedb.com
Iselin, New Jersey 08830 | http://www.enterprisedb.com/
I have a table that receives lots of inserts (a million+ rows per day) with an int8 primary key, and I cluster by a timestamp column that roughly matches the insert time, so it is in increasing order and never changes. Most of the rows are updated about 3 times, roughly within the next 30 minutes. Should I assume that all of these updates will land on separate pages unless I perform a CLUSTER (which takes a long time), and that performance will suffer because of this? Is it possible to preallocate space on the same page for future updates (whatever the average number of updates per row may be), reducing the number of page reads when querying?
Gene
--
Eugene Hart
Gene <genekhart@gmail.com> writes:
> I have a table that inserts lots of rows (million+ per day) int8 as primary
> key, and I cluster by a timestamp which is approximately the timestamp of
> the insert beforehand and is therefore in increasing order and doesn't
> change. Most of the rows are updated about 3 times over time roughly within
> the next 30 minutes.

ISTM you should hardly need to worry about clustering that --- the data
will be in timestamp order pretty naturally.

The main problem you're going to have is the update-3-times bit. You
could keep updated rows on the same page as the original if you ran the
table at fillfactor 25% (which you'll be able to do in 8.2) ... but
while this might be sane for the leading edge of the table, you hardly
want such low storage density in the stable part.

You could reduce the fillfactor requirement if you could vacuum the
table constantly (every 10 minutes or so) but I assume the table is
large enough to make that unattractive. (Eventually we should have
a version of vacuum that understands where the dirty stuff is, which
might make this approach tenable ... but not in 8.2.)

Your best bet might be to partition the table into two subtables, one
with "stable" data and one with the fresh data, and transfer rows from
one to the other once they get stable. Storage density in the "fresh"
part would be poor, but it should be small enough you don't care.

			regards, tom lane
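The fillfactor suggestion above can be sketched with 8.2 syntax; the table and column names here are made up for illustration:

```sql
-- Hypothetical schema; fillfactor is new in PostgreSQL 8.2.
-- Leave ~75% of each heap page free so the ~3 expected updates of a
-- freshly inserted row have room to land on the same page.
CREATE TABLE events (
    id      int8 PRIMARY KEY,
    ts      timestamptz NOT NULL,
    payload text
) WITH (fillfactor = 25);

-- Later, once the data has gone stable, the setting can be raised:
ALTER TABLE events SET (fillfactor = 100);
```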
You are correct, the main part I'm worried about is the updates being so far from the originals. FYI, I am partitioning the tables by the timestamp column, vacuum-analyzing once per hour, and creating one child partition per day in a cron job. Because I'm using Hibernate for database abstraction (stateless sessions), I can only have one RULE: having more than one insert rule will not return the correct number of affected rows, which confuses Hibernate. The one rule just directs inserts to the latest child partition, which seems to work well.
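The setup described above can be sketched roughly as follows (all table, column, and rule names are hypothetical):

```sql
-- One child table per day, inheriting from the parent:
CREATE TABLE events_20060809 (
    CHECK (ts >= '2006-08-09' AND ts < '2006-08-10')
) INHERITS (events);

-- A single unconditional INSTEAD rule redirecting inserts on the parent
-- to the latest child, so the client still sees the expected row count:
CREATE OR REPLACE RULE events_insert AS
    ON INSERT TO events
    DO INSTEAD
    INSERT INTO events_20060809 VALUES (NEW.*);
```

A conditional or multi-rule setup would report a row count Hibernate does not expect, which is why a single rule, rewritten by the cron job each day, is used here.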
The reason I'm doing the clustering is that I was hoping that, with the "stable" non-updating partitions, I could execute a CLUSTER at night (slow...) and it would compact the tables into their most efficient state for querying, which always involves a date range. Bad idea? With this fillfactor feature, will I be able to set it to 100% once I know that no more updates will occur? Or will doing a CLUSTER effectively do this? Will the fillfactor only apply to inserts?
"Your best bet might be to partition the table into two subtables, one
with "stable" data and one with the fresh data, and transfer rows from
one to the other once they get stable. Storage density in the "fresh"
part would be poor, but it should be small enough you don't care."
This sounds interesting. I could create an insert RULE on the unstable table; I will know during the update whether a row is ready to be put in the stable table. What would be an efficient way to do the transfer? Since the updates occur somewhat randomly, wouldn't the tuples in the stable table then be out of natural timestamp order?
Thanks for all of your help and comments! It is greatly appreciated!
Gene Hart
--
Eugene Hart
Gene wrote:
> You are correct the main part I'm worried about is the updates, being
> so far from the originals.

Yeah, you won't benefit from the patch at all.

> The reason I'm doing the clustering is I was hoping that with the
> "stable" non-updating partitions I could execute a CLUSTER at night
> (slow...) and it would compact the tables into their most efficient
> state for querying which always involves a date range. bad idea? In
> this fillfactor feature, will you be able to set it to 100% once you
> know that no more updates will occur? Or will doing a cluster
> effectively do this? Will the fill factor only apply for inserts?

That sounds like a good approach. CLUSTER obeys the fillfactor, so
you'll want to set it to 100 for the older partitions before you
CLUSTER.

You might want to experiment with the fillfactor. You might get the best
performance if you just set it to 100 even for the latest partition, if
your queries usually have to scan most of it anyway. Fillfactor 100 will
help to keep it dense and in memory, so it won't matter so much if it's
disorganized.

> "Your best bet might be to partition the table into two subtables, one
> with "stable" data and one with the fresh data, and transfer rows from
> one to the other once they get stable. Storage density in the "fresh"
> part would be poor, but it should be small enough you don't care."
>
> This sounds interesting, I could create a RULE/INSERT on the unstable
> table, I will know during the update if it is ready to be put in the
> stable table. What would be an efficient way to do the transfer? Since
> the updates occur somewhat randomly, wouldnt the tuples in the stable
> table then be out of natural timestamp order?

I'm not sure I understand the last sentence. I thought the updates
usually occur within 30 minutes of the insert. So if you transfer the
rows to the stable table after 30 minutes, there won't be updates to the
stable table.

- Heikki
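The nightly maintenance described above could look something like this; partition and index names are hypothetical, and the `CLUSTER indexname ON tablename` form is the syntax used up through 8.2:

```sql
-- Once a daily partition has gone stable: pack it densely, rewrite it
-- in index order, and refresh its statistics.
ALTER TABLE events_20060808 SET (fillfactor = 100);
CLUSTER events_20060808_ts_idx ON events_20060808;
ANALYZE events_20060808;
```

Note that CLUSTER rewrites the whole table, which is why raising the fillfactor first matters: the rewrite honors whatever fillfactor is set at the time it runs.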
Jonah H. Harris wrote:
> On 8/9/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> UPDATE tries to place the new tuple on the same page it's already
>> on.
>
> I think he meant for INSERT.

Right. Update is indeed taken care of already.

One example where this would help would be a customer_history table that
stores all transactions of a customer. Primary key is (customer_id,
transaction_id). You would want to keep the table clustered by
customer_id to make it quick to retrieve all transactions of a customer.

In general, any table with more or less random insert/delete activity
that you want to keep in order would benefit.

- Heikki
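A minimal sketch of that customer_history example (the non-key columns are invented for illustration):

```sql
-- Clustered on customer_id so that all of one customer's transactions
-- tend to share a few adjacent heap pages:
CREATE TABLE customer_history (
    customer_id    int8 NOT NULL,
    transaction_id int8 NOT NULL,
    amount         numeric,
    PRIMARY KEY (customer_id, transaction_id)
);
CLUSTER customer_history_pkey ON customer_history;
-- New rows arrive in more or less random customer_id order; the proposed
-- patch would try to place each new tuple near existing tuples with the
-- same customer_id, instead of simply appending at the end of the heap.
```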
Gene <genekhart@gmail.com> writes:
> "Your best bet might be to partition the table into two subtables, one
> with "stable" data and one with the fresh data, and transfer rows from
> one to the other once they get stable. Storage density in the "fresh"
> part would be poor, but it should be small enough you don't care."
>
> This sounds interesting, I could create a RULE/INSERT on the unstable
> table, I will know during the update if it is ready to be put in the
> stable table. What would be an efficient way to do the transfer? Since
> the updates occur somewhat randomly, wouldnt the tuples in the stable
> table then be out of natural timestamp order?

You may find it easier to handle some of the logic in a low-level
application layer or a layer of stored procedures rather than trying to
make it entirely transparent with rules. If you do want it to be
transparent, you might also consider whether you want triggers instead
of rules.

Another direction you might want to consider is whether the columns
you're updating would be better normalized into a separate table. You
might really want a record of those past states as well, so having three
records in this other table for each record in your main table might
actually work out better.

Even if you only have a 1-1 relationship, this kind of vertical
partitioning (splitting columns into a separate table) can still be
worthwhile. If the columns being updated are very small, or often not
needed at all, it may be reasonably efficient to look them up separately
while still letting you store the bulk of the data densely and access it
in a fast sequential scan.

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
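One hedged sketch of the transfer itself, done as a periodic batch rather than per-row (table names hypothetical). Inserting with an ORDER BY also addresses the earlier worry about the stable table ending up out of timestamp order, since rows are appended in sorted order:

```sql
-- Move rows old enough that no further updates are expected.
BEGIN;
INSERT INTO events_stable
    SELECT * FROM events_fresh
    WHERE ts < now() - interval '30 minutes'
    ORDER BY ts;   -- append in timestamp order to preserve locality
DELETE FROM events_fresh
    WHERE ts < now() - interval '30 minutes';
COMMIT;
```

Running this from cron every few minutes keeps the "fresh" table small, so its poor storage density stays cheap; the stable table then needs CLUSTER rarely, if at all.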