Re: partitionning - Mailing list pgsql-general
From: Greg Stark
Subject: Re: partitionning
Msg-id: 87oeds5rfq.fsf@stark.xeocode.com
In response to: Re: partitionning (Scott Marlowe <smarlowe@g2switchworks.com>)
List: pgsql-general
Scott Marlowe <smarlowe@g2switchworks.com> writes:

> Wait, I'm not sure I understand your point here yet. Are you saying
> something along the lines of that with a 1TB storage array, and putting
> all the data in one big partition, the DBAs had problems, but when they
> partitioned it down to say 10 100G partitions, and split up the data set
> across it they had fewer problems?

I'm not sure we're talking about the same thing here.

Partitioned tables are typically used when the data falls naturally into different buckets and the different buckets need to be treated differently, especially when all the data in a single bucket often needs to be moved en masse. Typical scenarios are accounting systems that need to make past years' books read-only, or that archive detail records for past years en masse.

Without partitioning you're stuck doing massive deletes to archive the old data for an entire year. The bigger your database (and I imagine if you're paying $100k+ for a storage subsystem you have a pretty massive database), the more such data you're going to be trying to delete at once. A similar situation arises when you have to reload this data -- either from a backup or for a DSS system backing your primary database. If you want to access the data in a single normalized table you're stuck doing massive inserts to move the data around.

With partitioned tables you can pull an entire partition out of the table with a single DDL command. It's instantaneous; it doesn't even have to look at the data in the partition. Similarly, you can load data into your table quickly (once it's been loaded into the database, that is).

I think it's obvious that these features are particularly useful for large databases, especially ones that can grow very quickly or load lots of data from elsewhere very quickly -- i.e., precisely the ones that have humongous fast storage subsystems.

Yes, you can implement this using views of unions in Postgres.
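The "views of unions" approach could be sketched roughly like this (table and column names are invented for illustration; a real setup would also need rules or triggers on the view to route inserts to the right underlying table):

```sql
-- One physical table per year; one logical table presented via a view.
CREATE TABLE orders_2003 (order_id integer, order_date date, amount numeric);
CREATE TABLE orders_2004 (order_id integer, order_date date, amount numeric);

CREATE VIEW orders AS
    SELECT * FROM orders_2003
    UNION ALL
    SELECT * FROM orders_2004;

-- Archiving all of 2003 is then cheap DDL instead of a massive DELETE:
CREATE OR REPLACE VIEW orders AS
    SELECT * FROM orders_2004;
DROP TABLE orders_2003;
```

The view redefinition never touches the rows themselves, which is the point being made above about single-DDL-command archiving.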
Or you can implement it using inherited tables. But implementing it by hand has disadvantages. Oracle already went through this: they had DBAs using union views as early as version 7, and they even had an optimizer hack you could enable to take advantage of it. But it was a tremendous kludge. The real support they added in 8.1 is *much* better.

> Was this some kind of massive batch processing system?

Actually it was a web site, so an OLTP database -- about 1-3M new records per day in the most important table. Before we implemented partitioning we had a regular job that spent 10+ hours deleting old records. If it failed in any way, things became very unhappy quickly. After partitioning we could create a new partition and move the old partition out of the table even at peak hours, then back up and drop the table containing the old partition's data at our leisure.

> While I'm sure administration overhead is the major problem, I'm
> wondering what other ones you've seen, i.e. performance, reliability,
> user error, that are more common on a home rolled partitioning.

Well, we never even tried to implement it as a home-brew solution. I certainly would have been pretty afraid of trying anything so complex myself with live data.

> But what I'm really saying is that between good home grown partitioning
> and fast hardware, the need for the pg devel team to implement
> partitioning is pretty low.

Ah. I thought you were saying that the fast hardware made partitioning in any form unnecessary, not merely that it made home-brew partitioning an acceptable solution.

But that's a bit of a silly proviso, isn't it? You could do anything with enough plpgsql code and fast enough hardware. The real question is where the best place is to implement this. Issuing a single atomic command makes me feel much better than setting up a half dozen triggers/rules on a view and hoping I got it all right.
Especially when you consider that I'll probably have to do this for several tables at the same time.

Actually, I have a strong feeling that what really _ought_ to happen here is that the inherited-tables support in Postgres, which never really worked anyway, should be deprecated and eventually removed, and all that infrastructure repurposed into partitioned tables. That seems like it would be a nice fit.

--
greg