Thread: partitioning
hi

does table partitioning exist under PostgreSQL? or does an alternative exist?

Oracle implements this feature: it allows rows to be spread across different
partitions depending on an attribute.
For example, my application stores XML documents in the database, and
partitioning is used to spread the different documents across different
partitions.

thanks
On Wed, 2005-03-09 at 10:31, FERREIRA William (COFRAMI) wrote:
> hi
>
> does table partitioning exist under PostgreSQL? or does an alternative
> exist?
>
> Oracle implements this feature: it allows rows to be spread across
> different partitions depending on an attribute.
> For example, my application stores XML documents in the database, and
> partitioning is used to spread the different documents across different
> partitions.

Currently, doing this in PostgreSQL is more of a "roll your own" thing.
You create the base tables on different tablespaces, then create an
updatable view with the proper triggers to make sure that the incoming
records go to the right tables.

With the advent of very large RAID arrays with very fast caching
controllers, this methodology is becoming less and less necessary.
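For concreteness, a minimal sketch of the "roll your own" approach Scott
describes might look like the following (table and column names are invented
here; in 8.0 the routing has to be done with rewrite rules, since plain views
cannot have triggers):

create table docs_current (id integer, doc text, doc_class text);  -- e.g. on a fast tablespace
create table docs_archive (id integer, doc text, doc_class text);  -- e.g. on a slower tablespace

create view docs as
    select * from docs_current
    union all
    select * from docs_archive;

create rule docs_ins_current as on insert to docs
    where NEW.doc_class = 'current'
    do instead insert into docs_current values (NEW.*);

create rule docs_ins_archive as on insert to docs
    where NEW.doc_class = 'archive'
    do instead insert into docs_archive values (NEW.*);

-- note: an insert whose doc_class matches neither rule will fail,
-- since the bare view itself is not insertable.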
Tablespaces are the closest thing. They were introduced in 8.0:
http://www.postgresql.org/docs/8.0/static/sql-createtablespace.html

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Mar 9, 2005, at 10:31 AM, FERREIRA William (COFRAMI) wrote:
> [original question quoted above]
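For reference, basic tablespace usage in 8.0 looks something like this (the
directory path and names below are only placeholders):

-- run as a superuser; the directory must already exist and be owned
-- by the postgres system user
create tablespace fast_ts location '/mnt/fastarray/pgdata';

-- place a table (or index) on that tablespace at creation time
create table xml_docs (id serial, doc text) tablespace fast_ts;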
Scott Marlowe <smarlowe@g2switchworks.com> writes:

> With the advent of very large RAID arrays with very fast caching
> controllers, this methodology is becoming less and less necessary.

I think the evidence is to the contrary. Witness the rather dramatic surge in
inquiries about this on this list. A year ago there were only two or three of
us pining for this feature. Now it's a weekly refrain.

Very large, very fast RAID arrays just mean that people want to gather that
much more data. They would still like to make the best use of their hardware
and not waste their resources on tremendously inefficient purging and loading
procedures when it would be possible to do these things instantaneously. That
only becomes more important as the investment they want to leverage becomes
larger.

--
greg
Scott Marlowe wrote:

> On Wed, 2005-03-09 at 10:31, FERREIRA William (COFRAMI) wrote:
>
>> [original question snipped]
>
> Currently, doing this in PostgreSQL is more of a "roll your own" thing.
> You create the base tables on different tablespaces, then create an
> updatable view with the proper triggers to make sure that the incoming
> records go to the right tables.
>
> With the advent of very large RAID arrays with very fast caching
> controllers, this methodology is becoming less and less necessary.

I don't agree with this. There are many PostgreSQL users who have 100, 200,
600 GB databases that don't have the budget to purchase a $20,000 array.

Table partitioning is a way to keep things efficient. That should be true
regardless of technology. RAM is cheap, so let's just use as much of it as we
can even if it means we swap.

Sincerely,

Joshua D. Drake

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
PostgreSQL support, programming, shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL
On Wed, 2005-03-09 at 13:07, Greg Stark wrote:
> Scott Marlowe <smarlowe@g2switchworks.com> writes:
>
> > With the advent of very large RAID arrays with very fast caching
> > controllers, this methodology is becoming less and less necessary.
>
> I think the evidence is to the contrary. Witness the rather dramatic surge
> in inquiries about this on this list. A year ago there were only two or
> three of us pining for this feature. Now it's a weekly refrain.
>
> Very large, very fast RAID arrays just mean that people want to gather that
> much more data. They would still like to make the best use of their
> hardware and not waste their resources on tremendously inefficient purging
> and loading procedures when it would be possible to do these things
> instantaneously. That only becomes more important as the investment they
> want to leverage becomes larger.

Actually, I think it is the more common scenario of migrating off of Oracle
or DB2 and onto PostgreSQL, and bringing along the experience gained there
over the years, that has caused this refrain to sprout up more and more
often. With a database sitting on an enterprise-class storage subsystem with
8 gigs of battery-backed cache onboard, the need for partitioning is less and
less pressing. Not completely gone, and there are times when it can come in
handy (i.e. when you have to "roll your own" on a more limited budget).

I'm not sure what your point about purging and loading is. A properly built
home-rolled partitioning setup requires none of that. It's just that whereas
Oracle does it semi-automagically, the PostgreSQL DBA sets up the equivalent
by hand. No purging or loading that I'm aware of is needed.

Witness how often OTHER issues pop up (=NULL versus IS NULL, autonomous
transactions, quotas) that are from Oracle land nowadays. It isn't that the
Oracle way is always the best way, it's just what folks are often used to.

I really didn't see anything in your post that argued against my point that a
large enterprise-class RAID array largely eliminates the need for
application-level partitioning of data.
On Wed, 2005-03-09 at 13:21, Joshua D. Drake wrote:
> Scott Marlowe wrote:
>
> > [earlier discussion snipped]
> >
> > With the advent of very large RAID arrays with very fast caching
> > controllers, this methodology is becoming less and less necessary.
>
> I don't agree with this. There are many PostgreSQL users who have 100, 200,
> 600 GB databases that don't have the budget to purchase a $20,000 array.

A good AMI MegaRAID card with 512 meg of battery-backed cache will work the
same here, i.e. eliminate the need for partitioning. It doesn't have to cost
$20,000 to do it. And let's never forget that the data is where the value is,
not the hardware.

Notice I wasn't saying there was never a need anyway, just that it's becoming
less of an issue each day. Which it is. Back in the day a decent
single-SCSI-interface RAID card with battery-backed cache was >$2,000.
Nowadays they've dropped to the $300 to $500 range. I can't see the cost of a
DBA figuring out and implementing partitioning schemes costing less than that
in time. Plus it puts your data on a more reliable system. Penny wise and
pound foolish to go cheap, in my opinion. Anyone keeping 600 gigs of data and
trying to save $500 in hardware costs is saving costs in the wrong places,
imnsho...

> Table partitioning is a way to keep things efficient. That should be true
> regardless of technology.

Yes, it is a way. It's just a less necessary one than it once was, with
hardware now able to provide the same performance increase with little or no
work on the user's part. We've got to weigh the increased complexity it would
take to implement it in PostgreSQL and maintain it versus the gain, and I say
the gain is smaller every day.

> RAM is cheap, so let's just use as much of it as we can even if it means we
> swap.

What does table partitioning have to do with RAM?
Scott Marlowe <smarlowe@g2switchworks.com> writes:

> Actually, I think it is the more common scenario of migrating off of Oracle
> or DB2 and onto PostgreSQL, and bringing along the experience gained there
> over the years, that has caused this refrain to sprout up more and more
> often. With a database sitting on an enterprise-class storage subsystem
> with 8 gigs of battery-backed cache onboard, the need for partitioning is
> less and less pressing. Not completely gone, and there are times when it
> can come in handy (i.e. when you have to "roll your own" on a more limited
> budget).

You don't think the people using Oracle are even *more* likely to have a big
storage subsystem with gobs of cache? At my previous job we had a Hitachi
system that was really ludicrously fast. Nonetheless, when we implemented
partitioning it was a real life saver for the DBAs. Their workload went from
being >50% dealing with problems with the large weekly jobs to basically
being able to ignore these jobs. In fact the job to handle the changeover was
moved to the middle of the peak period because it was just more convenient
that way.

The bigger the storage, the *more* important partitioning is. Not less.
That's why it's such a huge feature for Oracle, and it's why databases used
on smaller projects don't find it a compelling feature.

> I'm not sure what your point about purging and loading is. A properly built
> home-rolled partitioning setup requires none of that.

Well, that's sort of the point. But home-rolled partitioning setups have
other problems. That's why it would be good to have a solid implementation
that didn't have these problems.

> Witness how often OTHER issues pop up (=NULL versus IS NULL, autonomous
> transactions, quotas) that are from Oracle land nowadays. It isn't that the
> Oracle way is always the best way, it's just what folks are often used to.

=NULL is an Access thing, actually. But yes, these other features are also
things that the bigger boys need. But the most commonly requested one these
days seems to be partitioning. Maybe I'm biased though.

> I really didn't see anything in your post that argued against my point that
> a large enterprise-class RAID array largely eliminates the need for
> application-level partitioning of data.

Firstly, it's not application level if it's native. The Postgres options such
as inherited tables or union views do indeed impose application-level
constraints, but a good native implementation is completely transparent to
the programmer.

Well, so you're saying that you believe me that on my 1GB database I find it
more convenient to be able to pull off 100MB of data instantaneously and
without generating any garbage for vacuum to clean up. But you don't believe
someone running a 1TB storage subsystem would appreciate the same feature as
much when they have to pull off 10GB of data, because their system is 10x
faster at doing this unnecessary work than mine would be, so it only takes
100x as much time?

--
greg
On Wed, 2005-03-09 at 14:47, Greg Stark wrote:
> Scott Marlowe <smarlowe@g2switchworks.com> writes:
>
> > [snip]
>
> You don't think the people using Oracle are even *more* likely to have a
> big storage subsystem with gobs of cache? At my previous job we had a
> Hitachi system that was really ludicrously fast. Nonetheless, when we
> implemented partitioning it was a real life saver for the DBAs. Their
> workload went from being >50% dealing with problems with the large weekly
> jobs to basically being able to ignore these jobs. In fact the job to
> handle the changeover was moved to the middle of the peak period because it
> was just more convenient that way.

Wait, I'm not sure I understand your point here yet. Are you saying something
along the lines of: with a 1TB storage array, putting all the data in one big
partition gave the DBAs problems, but when they partitioned it down to, say,
ten 100GB partitions and split the data set across them, they had fewer
problems? Or are you talking more about quick access, because when you have a
table with three fields and one is rather narrow, that field can be accessed
much more quickly since you aren't dragging around the extra fields you don't
need? Was this some kind of massive batch processing system?

> The bigger the storage, the *more* important partitioning is. Not less.
> That's why it's such a huge feature for Oracle, and it's why databases used
> on smaller projects don't find it a compelling feature.
>
> > I'm not sure what your point about purging and loading is. A properly
> > built home-rolled partitioning setup requires none of that.
>
> Well, that's sort of the point. But home-rolled partitioning setups have
> other problems. That's why it would be good to have a solid implementation
> that didn't have these problems.

While I'm sure administration overhead is the major problem, I'm wondering
what other ones you've seen, i.e. performance, reliability, user error, that
are more common with home-rolled partitioning.

> > I really didn't see anything in your post that argued against my point
> > that a large enterprise-class RAID array largely eliminates the need for
> > application-level partitioning of data.
>
> Firstly, it's not application level if it's native.

Sorry, by application I meant the PostgreSQL client layer, i.e. implemented
in the user layer of PostgreSQL, not in the core of it.

> Well, so you're saying that you believe me that on my 1GB database I find
> it more convenient to be able to pull off 100MB of data instantaneously and
> without generating any garbage for vacuum to clean up. But you don't
> believe someone running a 1TB storage subsystem would appreciate the same
> feature as much when they have to pull off 10GB of data, because their
> system is 10x faster at doing this unnecessary work than mine would be, so
> it only takes 100x as much time?

Why in the world would a view-encased union of several tables generate
garbage for a vacuum to clean up?

I'd think a well-rolled view/union/trigger based solution would operate
identically to a PostgreSQL internally implemented partitioning system.

But what I'm really saying is that between good home-grown partitioning and
fast hardware, the need for the pg devel team to implement partitioning is
pretty low.
Scott Marlowe <smarlowe@g2switchworks.com> writes:

> Wait, I'm not sure I understand your point here yet. Are you saying
> something along the lines of: with a 1TB storage array, putting all the
> data in one big partition gave the DBAs problems, but when they partitioned
> it down to, say, ten 100GB partitions and split the data set across them,
> they had fewer problems?

I'm not sure we're talking about the same thing here. Partitioned tables are
typically used when the data falls naturally into different buckets and the
different buckets need to be treated differently. Especially when all the
data in a single bucket often needs to be moved en masse.

Typical scenarios are accounting systems that need to make past years' books
read-only or archive detail records for past years en masse. Without
partitioning you're stuck doing massive deletes to archive the old data for
an entire year. The bigger your database (and I imagine if you're paying
$100k+ for a storage subsystem you have a pretty massive database) the more
such data you're going to be trying to delete at once.

A similar situation also arises when you have to reload this data -- either
from a backup or for a DSS system backing your primary database. If you want
to access the data in a single normalized table you're stuck doing massive
inserts to move the data around.

With partitioned tables you can pull an entire partition out of the table
with a single DDL command. It's instantaneous; it doesn't even have to look
at the data in the partition. Similarly you can load data into your table
quickly (once it's been loaded into the database, that is).

I think it's obvious that these features are particularly useful for large
databases. Especially ones that have the ability to grow very quickly or load
lots of data from elsewhere very quickly. Ie, precisely the ones that have
humongous fast storage subsystems.

Yes, you can implement this using views of unions in Postgres. Or you can
implement it using inherited tables. But implementing it by hand has
disadvantages. Oracle already went through this. They had DBAs using union
views as early as version 7. They even had an optimizer hack that you could
enable to take advantage of it. But it was a tremendous kludge. The real
support they added in 8.1 is *much* better.

> Was this some kind of massive batch processing system?

Actually it was a web site, so an OLTP database. About 1-3M new records per
day in the most important table. Before we implemented partitioning we had a
regular job that spent 10+ hours deleting old records. If it failed in any
way things became very unhappy quickly. After partitioning we could create a
new partition and move the old partition out of the table at peak hours. Then
we could back up and drop the table containing the old partition's data at
our leisure.

> While I'm sure administration overhead is the major problem, I'm wondering
> what other ones you've seen, i.e. performance, reliability, user error,
> that are more common with home-rolled partitioning.

Well, we never even tried to implement it as a home-brew solution. I
certainly would have been pretty afraid of trying anything so complex myself
with live data.

> But what I'm really saying is that between good home-grown partitioning and
> fast hardware, the need for the pg devel team to implement partitioning is
> pretty low.

Ah. I thought you were saying that the fast hardware made partitioning in any
form unnecessary. Not merely that it made home-brew partitioning an
acceptable solution.
But that's a bit of a silly proviso though, isn't it? I mean, you could do
anything with enough plpgsql code and fast enough hardware. The real question
is where the best place is for this to be implemented.

Issuing a single atomic command sure makes me feel much better about
something than trying to set up a half dozen triggers/rules on a view and
hoping I get it all set up right. Especially when you think that I'll
probably have to do this for several tables at the same time.

Actually, I have a strong feeling that what really _ought_ to happen here is
that the inherited tables support in Postgres, which never really worked
anyway, should be deprecated and eventually removed. All that infrastructure
should be repurposed into partitioned tables. That seems like it would be a
nice fit.

--
greg
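To make the comparison concrete, the "single DDL command" Greg refers to
looks roughly like this on the Oracle side (a sketch from memory; partition
and table names are invented):

-- instantly detach one year's data into a standalone table
alter table orders exchange partition orders_2003 with table orders_2003_archive;

-- or simply discard it, without scanning the rows
alter table orders drop partition orders_2003;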
Scott Marlowe <smarlowe@g2switchworks.com> writes:

> Yes, it is a way. It's just a less necessary one than it once was, with
> hardware now able to provide the same performance increase with little or
> no work on the user's part. We've got to weigh the increased complexity it
> would take to implement it in PostgreSQL and maintain it versus the gain,
> and I say the gain is smaller every day.

Now I think you're contradicting your argument in the other subthread. It's
certainly *much* more complex to have to implement this yourself for each
table than to have it as a native Postgres feature. So I think you're saying
using partitioning in any form, whether native or home-brew, is better
because of the simplicity.

But if that's the argument then you're wrong about the high-end controllers
making this less urgent. High-end hardware controllers only make it easier to
gather the kind of data that requires some form of partitioning in one form
or another to make it manageable.

In any case partitioning offers algorithmic improvements in performance. No
matter how fast your controller is, it's not going to be able to delete 100GB
of data and match the speed of simply dropping a partition using DDL.

Partitioning is something DBAs are doing more and more often as the data sets
grow. And it's something Postgres DBAs are doing more and more often as
Postgres moves into problem domains that were previously the domain of Oracle
and DB2 DBAs. The only choice is whether they're doing it by kludging
together a failure-prone and suboptimal system or whether it's built into the
database in a reliable, convenient, and well designed form.

--
greg
Greg Stark wrote:
>
> Actually, I have a strong feeling that what really _ought_ to happen here
> is that the inherited tables support in Postgres, which never really worked
> anyway, should be deprecated and eventually removed. All that
> infrastructure should be repurposed into partitioned tables. That seems
> like it would be a nice fit.

I don't know about deprecating inheritance, but I agree with pretty much
everything Greg has said on this thread. In particular, I have felt for
several years now that the inheritance infrastructure could be used to
implement table partitioning.

We're using inheritance for DIY table partitioning on very expensive storage
hardware (~$500K), and we'd be dead in the water without it.

Joe
On Wed, 2005-03-09 at 17:29, Greg Stark wrote:
> Scott Marlowe <smarlowe@g2switchworks.com> writes:
>
> > But what I'm really saying is that between good home-grown partitioning
> > and fast hardware, the need for the pg devel team to implement
> > partitioning is pretty low.
>
> Ah. I thought you were saying that the fast hardware made partitioning in
> any form unnecessary. Not merely that it made home-brew partitioning an
> acceptable solution.
>
> But that's a bit of a silly proviso though, isn't it? I mean, you could do
> anything with enough plpgsql code and fast enough hardware. The real
> question is where the best place is for this to be implemented.

Well, this is PostgreSQL, an extensible database, so the answer, to me, is to
implement it with a simple set of UDFs in userland as a proof of concept,
much like the materialized views recently discussed here. After that, if
someone thinks the basic concept is sound, it should get migrated into the
backend.

> Issuing a single atomic command sure makes me feel much better about
> something than trying to set up a half dozen triggers/rules on a view and
> hoping I get it all set up right. Especially when you think that I'll
> probably have to do this for several tables at the same time.

Sure, I'd love that too. But I think it's a bit too far ahead of the game
right now.

> Actually, I have a strong feeling that what really _ought_ to happen here
> is that the inherited tables support in Postgres, which never really worked
> anyway, should be deprecated and eventually removed. All that
> infrastructure should be repurposed into partitioned tables. That seems
> like it would be a nice fit.

I would imagine that both might be saved by the same task, i.e. if indexes
and triggers could span multiple tables etc., then partitioned tables would
be a pretty simple view creation.
> Actually, I have a strong feeling that what really _ought_ to happen here
> is that the inherited tables support in Postgres, which never really worked
> anyway, should be deprecated and eventually removed.

Hopefully not. They are useful for other things, too.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Back to the original question on this thread, and using PG 8.0.1. Perhaps
someone would like to poke holes in this (other than the need to set up fkeys
and indexes on the inherited tables...):

-------------------------------------------------------------------------------
begin;
create schema partition_test;
set search_path to partition_test,public;

create table test_base (
    id serial,
    value text,
    partitioner timestamp with time zone default now());

create table test_2005_03 () inherits (test_base);
create table test_2005_04 () inherits (test_base);

create rule base_partitioner_test_2005_03 as on insert
    to test_base where partitioner AT TIME ZONE 'UTC' between
        '2005-03-01'::timestamp and '2005-04-01'::timestamp
    do instead insert into test_2005_03 values (NEW.*);

create rule base_partitioner_test_2005_04 as on insert
    to test_base where partitioner AT TIME ZONE 'UTC' between
        '2005-04-01'::timestamp and '2005-05-01'::timestamp
    do instead insert into test_2005_04 values (NEW.*);

insert into test_base (value) values ('first string');
insert into test_base (value, partitioner) values ('a string', '2004-01-30 10:17:08');
insert into test_base (value, partitioner) values ('a string', '2005-04-01 14:17:08');

explain analyze select * from test_base;
select tableoid::regclass,* from test_base;

rollback;
-------------------------------------------------------------------------------

This could easily be wrapped up in a
'create_partition_by_date(base_table, partition_start, partition_length)'
function, I just haven't gotten around to that part yet. The function could
even look up the indexes and fkeys on the base table using the
INFORMATION_SCHEMA views.

One thing to note about PG not having indexes across tables: if we do get
in-memory bitmap indexes, I believe that the indexes on each inherited table
would actually be combined (if the planner sees it as a win).

Comments?

On Thu, 10 Mar 2005 12:59:35 +0100, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote:
> > Actually, I have a strong feeling that what really _ought_ to happen here
> > is that the inherited tables support in Postgres, which never really
> > worked anyway, should be deprecated and eventually removed.
> Hopefully not. They are useful for other things, too.
>
> Karsten

--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org
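A rough sketch of the 'create_partition_by_date' helper Mike describes might
look like this (untested; the function name, the hard-coded 'partitioner'
column, and the YYYY_MM naming are assumptions based on the example above):

create or replace function create_partition_by_date(
    base_table  text,
    part_start  timestamp with time zone,
    part_length interval
) returns text as $$
declare
    part_end timestamp with time zone := part_start + part_length;
    child    text := base_table || '_' || to_char(part_start, 'YYYY_MM');
begin
    -- create the child table; indexes/fkeys could be copied here as well,
    -- e.g. by consulting the INFORMATION_SCHEMA views
    execute 'create table ' || quote_ident(child)
         || ' () inherits (' || quote_ident(base_table) || ')';

    -- route inserts that fall inside this partition's range to the child
    execute 'create rule ' || quote_ident(child || '_ins')
         || ' as on insert to ' || quote_ident(base_table)
         || ' where NEW.partitioner >= ' || quote_literal(part_start::text)
         || ' and NEW.partitioner < '    || quote_literal(part_end::text)
         || ' do instead insert into ' || quote_ident(child)
         || ' values (NEW.*)';

    return child;
end;
$$ language plpgsql;

-- usage sketch:
-- select create_partition_by_date('test_base', '2005-05-01', '1 month');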
Hi,

On Sat, 12 Mar 2005, Mike Rylander wrote:

> Back to the original question on this thread, and using PG 8.0.1. Perhaps
> someone would like to poke holes in this (other than the need to set up
> fkeys and indexes on the inherited tables...):
>
> [example SQL snipped]

Any ideas how we would get

    select * from test_base where partitioner between a and b

to just look in the correct partition and not do a full sweep over the other
partitions or the base?

Greetings
Christian

--
Christian Kratzer                       ck@cksoft.de
CK Software GmbH                        http://www.cksoft.de/
Phone: +49 7452 889 135                 Fax: +49 7452 889 136
On Sat, 12 Mar 2005 17:39:38 +0100 (CET), Christian Kratzer <ck-lists@cksoft.de> wrote:
> Any ideas how we would get
>
>     select * from test_base where partitioner between a and b
>
> to just look in the correct partition and not do a full sweep over the
> other partitions or the base?

There is no way to have PG completely skip the unused partitions. However,
with an index on the "partitioner" column of each partition table, PG will
basically fall off the end of the index almost immediately, and then use an
appropriate plan for the useful partitions. It's not the same as Oracle
partitioning, but without explicit planner work it can't be.

It should be about the same speed as one gigantic table, perhaps faster for
some queries where there are many total partitions and few useful ones, but
for my money that's not where the biggest benefit of partitioning lies.

My favorite feature of classic partitioning is that you can load and unload
large chunks of the data without affecting everyone else; you can very easily
archive unused partitions or transfer them to a statistics server for data
mining with no impact on other writers.

Another benefit would be the ability to put the currently write-active
partition on a very fast/expensive storage area and move the older data to a
slower/cheaper storage area using tablespaces.

--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org
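Continuing the earlier example, Mike's two suggestions might look roughly
like this (the tablespace names are invented, and the exact ALTER TABLE
syntax should be checked against the 8.0 documentation):

-- an index on the partitioning column of each child table
create index test_2005_03_partitioner_idx on test_2005_03 (partitioner);
create index test_2005_04_partitioner_idx on test_2005_04 (partitioner);

-- keep the write-active partition on fast storage...
create table test_2005_05 () inherits (test_base) tablespace fast_ts;

-- ...and retire older partitions to cheaper storage
alter table test_2005_03 set tablespace slow_ts;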
On Sat, Mar 12, 2005 at 08:24:20PM +0000, Mike Rylander wrote:
> There is no way to have PG completely skip the unused partitions. However,
> with an index on the "partitioner" column of each partition table, PG will
> basically fall off the end of the index almost immediately, and then use an
> appropriate plan for the useful partitions. It's not the same as Oracle
> partitioning, but without explicit planner work it can't be.

Note, the planner does have most of the intelligence needed to determine when
it can skip a partition. It has code that can determine whether the
intersection of:

    Partition constraint:  lower bound < value < upper bound
    Query clause:          value [operator] constant

will always be false or can be true. It's just a matter of getting the clause
to the right place to make it skip...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
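In terms of the earlier example, the "partition constraint" Martijn mentions
could be expressed as a CHECK constraint on each child table; as of 8.0 the
planner does not use these to skip child tables, which is exactly the missing
piece being discussed (constraint names and literals below are illustrative):

alter table test_2005_03 add constraint test_2005_03_range
    check (partitioner >= '2005-03-01' and partitioner < '2005-04-01');

alter table test_2005_04 add constraint test_2005_04_range
    check (partitioner >= '2005-04-01' and partitioner < '2005-05-01');

-- with such constraints in place, a query like
--   select * from test_base where partitioner >= '2005-04-10';
-- can provably return nothing from test_2005_03, so that scan could be skipped.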
On Sun, 13 Mar 2005 00:18:39 +0100, Martijn van Oosterhout <kleptog@svana.org> wrote:
> Note, the planner does have most of the intelligence needed to determine
> when it can skip a partition. It has code that can determine whether the
> intersection of:
>
>     Partition constraint:  lower bound < value < upper bound
>     Query clause:          value [operator] constant
>
> will always be false or can be true. It's just a matter of getting the
> clause to the right place to make it skip...

I see. I should have qualified my statement with "*as of now* there is no
way". If I felt comfortable digging into the planner I'd offer to take that
on, but... Thanks for pointing out that detail!

--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org
Martijn van Oosterhout <kleptog@svana.org> writes:
> Note, the planner does have most of the intelligence needed to determine
> when it can skip a partition.

The issue isn't really whether the planner *could* use a constraint to decide
that a table need not be scanned at all. As you say, we have practically all
the infrastructure needed for such inferences. To me the real reason why we
don't do that is the lack of a mechanism to force the plan to be reconsidered
if the constraint is dropped. Without the constraint, the plan will still
run, but it may deliver wrong answers. (The closest current equivalent is the
possibility of dropping an index that a plan depends on --- but at least we
will fail outright if the index isn't there anymore.)

In short, I won't accept any such patch until after we build a mechanism for
invalidating cached plans. Which we need anyway for other reasons. So the
path ahead is clear, in my mind anyway ... but this is not the next step to
take.

			regards, tom lane
On Sat, Mar 12, 2005 at 11:35:51PM -0500, Tom Lane wrote:
> The issue isn't really whether the planner *could* use a constraint to
> decide that a table need not be scanned at all. As you say, we have
> practically all the infrastructure needed for such inferences. To me the
> real reason why we don't do that is the lack of a mechanism to force the
> plan to be reconsidered if the constraint is dropped. Without the
> constraint, the plan will still run, but it may deliver wrong answers. (The
> closest current equivalent is the possibility of dropping an index that a
> plan depends on --- but at least we will fail outright if the index isn't
> there anymore.)

Wow, now that's a corner case I hadn't thought of. Actually, it seems to me a
similar thing may happen if you have a query on a table and you create a new
inherited table from it. Is the new table included in the old plan? There are
any number of DDL statements that can affect planning. All of them, I
guess...

> In short, I won't accept any such patch until after we build a mechanism
> for invalidating cached plans. Which we need anyway for other reasons. So
> the path ahead is clear, in my mind anyway ... but this is not the next
> step to take.

People still keep running into the invalid plan issue; consider temporary
tables. Would it be feasible to have a plan maintain a list of objects it
depends on -- every column, table, constraint, domain, function, type, etc.
referred to -- and regenerate the plan if any of them change? This list could
become huge for any moderately complicated query, and there doesn't seem to
be a lot of gain most of the time. But it would be needed for completeness.

Actually, it would probably be enough to just list tables and columns and
have changes in constraints, triggers and default values be treated as
changes on the columns they refer to. I'm sure this has been discussed to
death already, though, and is just waiting for a suitable implementation.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.