Thread: Re: [PATCHES] Maintaining cluster order on insert
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> While thinking about index-organized-tables and similar ideas, it
> occurred to me that there's some low-hanging fruit: maintaining cluster
> order on inserts by trying to place new heap tuples close to other
> similar tuples.

Doesn't this happen for free if there's enough space? UPDATE tries to
place the new tuple on the same page it's already on.

In practice people are only likely to cluster by primary keys (or other
things that seldom change), so I don't particularly agree with inventing
a large wart to support "optimally" placing things somewhere else...

			regards, tom lane
On 8/9/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> UPDATE tries to place the new tuple on the same page it's already
> on.

I think he meant for INSERT.

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation | fax: 732.331.1301
33 Wood Ave S, 2nd Floor | jharris@enterprisedb.com
Iselin, New Jersey 08830 | http://www.enterprisedb.com/
I have a table that receives lots of inserts (a million+ rows per day) with an int8 primary key, and I cluster by a timestamp column that roughly matches the insert time, so it is in increasing order and never changes. Most of the rows are updated about 3 times, roughly within the next 30 minutes. Should I assume that all of these updates will land on separate pages unless I perform a CLUSTER (which takes a long time), and that performance will suffer because of this? Is it possible to preallocate space on the same page for future updates (whatever the average number of updates per row may be), reducing the number of page reads when querying?
Gene
--
Eugene Hart
Gene <genekhart@gmail.com> writes:
> I have a table that inserts lots of rows (million+ per day) int8 as primary
> key, and I cluster by a timestamp which is approximately the timestamp of
> the insert beforehand and is therefore in increasing order and doesn't
> change. Most of the rows are updated about 3 times over time roughly within
> the next 30 minutes.

ISTM you should hardly need to worry about clustering that --- the data
will be in timestamp order pretty naturally.

The main problem you're going to have is the update-3-times bit. You
could keep updated rows on the same page as the original if you ran the
table at fillfactor 25% (which you'll be able to do in 8.2) ... but
while this might be sane for the leading edge of the table, you hardly
want such low storage density in the stable part.

You could reduce the fillfactor requirement if you could vacuum the
table constantly (every 10 minutes or so) but I assume the table is
large enough to make that unattractive. (Eventually we should have
a version of vacuum that understands where the dirty stuff is, which
might make this approach tenable ... but not in 8.2.)

Your best bet might be to partition the table into two subtables, one
with "stable" data and one with the fresh data, and transfer rows from
one to the other once they get stable. Storage density in the "fresh"
part would be poor, but it should be small enough you don't care.

			regards, tom lane
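The fillfactor suggestion above can be sketched with 8.2 syntax; the table and column names here are made up for illustration:

```sql
-- Hypothetical schema; fillfactor is new in PostgreSQL 8.2.
-- Leave ~75% of each heap page free so the ~3 expected updates of a
-- freshly inserted row have room to land on the same page.
CREATE TABLE events (
    id      int8 PRIMARY KEY,
    ts      timestamptz NOT NULL,
    payload text
) WITH (fillfactor = 25);

-- Later, once the data has gone stable, the setting can be raised:
ALTER TABLE events SET (fillfactor = 100);
```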
You are correct, the main part I'm worried about is the updates being so far from the originals. FYI, I am partitioning the tables by the timestamp column, vacuum-analyzing once per hour, and creating one child partition per day in a cron job. Because I'm using Hibernate for database abstraction (stateless sessions), I can only have one RULE: having more than one insert rule will not return the correct number of affected rows, which confuses Hibernate. The one rule just directs inserts to the latest child partition, which seems to work well.
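The setup described above can be sketched roughly as follows (all table, column, and rule names are hypothetical):

```sql
-- One child table per day, inheriting from the parent:
CREATE TABLE events_20060809 (
    CHECK (ts >= '2006-08-09' AND ts < '2006-08-10')
) INHERITS (events);

-- A single unconditional INSTEAD rule redirecting inserts on the parent
-- to the latest child, so the client still sees the expected row count:
CREATE OR REPLACE RULE events_insert AS
    ON INSERT TO events
    DO INSTEAD
    INSERT INTO events_20060809 VALUES (NEW.*);
```

A conditional or multi-rule setup would report a row count Hibernate does not expect, which is why a single rule, rewritten by the cron job each day, is used here.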
The reason I'm doing the clustering is that I was hoping that, with the "stable" non-updating partitions, I could execute a CLUSTER at night (slow...) and it would compact the tables into their most efficient state for querying, which always involves a date range. Bad idea? With this fillfactor feature, will I be able to set it to 100% once I know that no more updates will occur? Or will doing a CLUSTER effectively do this? Will the fillfactor only apply to inserts?
"Your best bet might be to partition the table into two subtables, one
with "stable" data and one with the fresh data, and transfer rows from
one to the other once they get stable. Storage density in the "fresh"
part would be poor, but it should be small enough you don't care."
This sounds interesting. I could create an insert RULE on the unstable table; I will know during the update whether a row is ready to be put in the stable table. What would be an efficient way to do the transfer? Since the updates occur somewhat randomly, wouldn't the tuples in the stable table then be out of natural timestamp order?
Thanks for all of your help and comments! It is greatly appreciated!
Gene Hart
--
Eugene Hart
Gene wrote:
> You are correct the main part I'm worried about is the updates, being
> so far from the originals.

Yeah, you won't benefit from the patch at all.

> The reason I'm doing the clustering is I was hoping that with the
> "stable" non-updating partitions I could execute a CLUSTER at night
> (slow...) and it would compact the tables into their most efficient
> state for querying which always involves a date range. bad idea? In
> this fillfactor feature, will you be able to set it to 100% once you
> know that no more updates will occur? Or will doing a cluster
> effectively do this? Will the fill factor only apply for inserts?

That sounds like a good approach. CLUSTER obeys the fillfactor, so
you'll want to set it to 100 for the older partitions before you
CLUSTER.

You might want to experiment with the fillfactor. You might get the best
performance if you just set it to 100 even for the latest partition, if
your queries usually have to scan most of it anyway. Fillfactor 100 will
help to keep it dense and in memory, so it won't matter so much if it's
disorganized.

> "Your best bet might be to partition the table into two subtables, one
> with "stable" data and one with the fresh data, and transfer rows from
> one to the other once they get stable. Storage density in the "fresh"
> part would be poor, but it should be small enough you don't care."
>
> This sounds interesting, I could create a RULE/INSERT on the unstable
> table, I will know during the update if it is ready to be put in the
> stable table. What would be an efficient way to do the transfer? Since
> the updates occur somewhat randomly, wouldnt the tuples in the stable
> table then be out of natural timestamp order?

I'm not sure I understand the last sentence. I thought the updates
usually occur within 30 minutes of the insert. So if you transfer the
rows to the stable table after 30 minutes, there won't be updates to the
stable table.

- Heikki
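The nightly maintenance described above could look something like this; partition and index names are hypothetical, and the `CLUSTER indexname ON tablename` form is the syntax used up through 8.2:

```sql
-- Once a daily partition has gone stable: pack it densely, rewrite it
-- in index order, and refresh its statistics.
ALTER TABLE events_20060808 SET (fillfactor = 100);
CLUSTER events_20060808_ts_idx ON events_20060808;
ANALYZE events_20060808;
```

Note that CLUSTER rewrites the whole table, which is why raising the fillfactor first matters: the rewrite honors whatever fillfactor is set at the time it runs.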
Jonah H. Harris wrote:
> On 8/9/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> UPDATE tries to place the new tuple on the same page it's already
>> on.
>
> I think he meant for INSERT.

Right. Update is indeed taken care of already.

One example where this would help would be a customer_history table that
stores all transactions of a customer. Primary key is (customer_id,
transaction_id). You would want to keep the table clustered by
customer_id to make it quick to retrieve all transactions of a customer.

In general, any table with more or less random insert/delete activity
that you want to keep in order would benefit.

- Heikki
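A minimal sketch of that customer_history example (the non-key columns are invented for illustration):

```sql
-- Clustered on customer_id so that all of one customer's transactions
-- tend to share a few adjacent heap pages:
CREATE TABLE customer_history (
    customer_id    int8 NOT NULL,
    transaction_id int8 NOT NULL,
    amount         numeric,
    PRIMARY KEY (customer_id, transaction_id)
);
CLUSTER customer_history_pkey ON customer_history;
-- New rows arrive in more or less random customer_id order; the proposed
-- patch would try to place each new tuple near existing tuples with the
-- same customer_id, instead of simply appending at the end of the heap.
```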
Gene <genekhart@gmail.com> writes:
> "Your best bet might be to partition the table into two subtables, one
> with "stable" data and one with the fresh data, and transfer rows from
> one to the other once they get stable. Storage density in the "fresh"
> part would be poor, but it should be small enough you don't care."
>
> This sounds interesting, I could create a RULE/INSERT on the unstable
> table, I will know during the update if it is ready to be put in the
> stable table. What would be an efficient way to do the transfer? Since
> the updates occur somewhat randomly, wouldnt the tuples in the stable
> table then be out of natural timestamp order?

You may find it easier to handle some of the logic in a low-level
application layer or a layer of stored procedures rather than trying to
make it entirely transparent with rules. If you do want it to be
transparent, you might also consider whether you want triggers instead
of rules.

Another direction you might want to consider is whether the columns
you're updating would be better normalized into a separate table. You
might really want a record of those past states as well, so having three
records in this other table for each record in your main table might
actually work out better.

Even if you only have a 1-1 relationship, this kind of vertical
partitioning (splitting columns into a separate table) can still be
worthwhile. If the columns being updated are very small, or often not
needed at all, it may be reasonably efficient to look them up separately
while still letting you store the bulk of the data densely and access it
in a fast sequential scan.

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
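One hedged sketch of the transfer itself, done as a periodic batch rather than per-row (table names hypothetical). Inserting with an ORDER BY also addresses the earlier worry about the stable table ending up out of timestamp order, since rows are appended in sorted order:

```sql
-- Move rows old enough that no further updates are expected.
BEGIN;
INSERT INTO events_stable
    SELECT * FROM events_fresh
    WHERE ts < now() - interval '30 minutes'
    ORDER BY ts;   -- append in timestamp order to preserve locality
DELETE FROM events_fresh
    WHERE ts < now() - interval '30 minutes';
COMMIT;
```

Running this from cron every few minutes keeps the "fresh" table small, so its poor storage density stays cheap; the stable table then needs CLUSTER rarely, if at all.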