Thread: What needs to be done for real Partitioning?
Folks, I may (or may not) soon have funding for implementing full table partitioning in PostgreSQL. I thought it would be a good idea to discuss with people here who are already using pseudo-partitioning what things need to be added to Postgresql in order to make full paritioning a reality; that is, what do other databases do that we don't? Implementations are seperated into phases I and II, II being harder-and-optional-stuff that may get done later, I being essential features. Ph. I -- CREATE TABLE ... WITH PARTITION ON {expression} ---- should automatically create expression index on {expression} -- INSERT INTO should automatically create new partitions where necessary ---- new tables should automatically inherit all constraints, indexes, keys of "parent" table -- UPDATE should automatically move rows between partitions where applicable -- Query Planner/Executor should be improved to not always materialize paritioned tables used in subqueries and joins. Ph. II -- Foreign Keys to/from partitioned tables should become possible -- Query Planner/Executor should be improved to only join partitions which are compliant with the query's WHERE or JOIN clauses where reasonable -- DELETE FROM should automatically drop empty partitions -- setting of WITH PARTITION ON {expression} TABLESPACE should automatically create a new tablespace for each new partition and its indexes. -- It should be possible to create new, empty partitions via a CREATE TABLE PARTITION OF {table} ON {value} expression. All syntax above is, of course, highly debatable. -- Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: > -- CREATE TABLE ... WITH PARTITION ON {expression} I'd rather see the partition control stuff as ALTER TABLE commands, not decoration on CREATE TABLE. See the WITH OIDS business we just went through: adding nonstandard decoration to a standard command isn't good. > -- INSERT INTO should automatically create new partitions where necessary > -- DELETE FROM should automatically drop empty partitions I am not sure I agree with either of those, and the reason is that they would turn low-lock operations into high-lock operations. DELETE FROM would be particularly bad. Furthermore, who wants to implement DROP PARTITION as a DELETE FROM? ISTM the whole point of partitioning is to be able to load and unload whole partitions quickly, and having to DELETE all the rows in a partition isn't my idea of quick. > -- setting of WITH PARTITION ON {expression} TABLESPACE should automatically > create a new tablespace for each new partition and its indexes. This is a bad idea. Where are you going to create these automatic tablespaces? What will they be named? Won't this require superuser privileges? And what's the point anyway? > -- It should be possible to create new, empty partitions via a CREATE TABLE > PARTITION OF {table} ON {value} expression. Huh? ISTM this confuses establishment of a table's partition rule with the act of pre-creating empty partitions for not-yet-used ranges of partition keys. Or are you trying to suggest that a table could be partitioned more than one way at a time? If so, how? regards, tom lane
This is really great ! Think about altering the partitioning (this is quite complex) : imagine a table split in several partitions "archive" and "current" where a row is moved from current to archive when it will not be updated anymore. Sometimes you can partition on a simple numeric value, or even a boolean value in this case. Other times you'd have to partition on a date, (current month, current year, archive...). So, how to move the partition between the two tables so that the oldest rows in the current month table are moved to the current year table at the end of each month ? Some ideas : hidden field (like oid was) to indicate in which partition the tuple is ? On Sat, 19 Mar 2005 21:02:38 +0100, Josh Berkus <josh@agliodbs.com> wrote: > Folks, > > I may (or may not) soon have funding for implementing full table > partitioning > in PostgreSQL. I thought it would be a good idea to discuss with people > here > who are already using pseudo-partitioning what things need to be added to > Postgresql in order to make full paritioning a reality; that is, what do > other databases do that we don't? > > Implementations are seperated into phases I and II, II being > harder-and-optional-stuff that may get done later, I being essential > features. > > Ph. I > -- CREATE TABLE ... WITH PARTITION ON {expression} > ---- should automatically create expression index on {expression} > -- INSERT INTO should automatically create new partitions where necessary > ---- new tables should automatically inherit all constraints, > indexes, > keys of "parent" table > -- UPDATE should automatically move rows between partitions where > applicable > -- Query Planner/Executor should be improved to not always materialize > paritioned tables used in subqueries and joins. > > Ph. II > -- Foreign Keys to/from partitioned tables should become possible > -- Query Planner/Executor should be improved to only join partitions > which are > compliant with the query's WHERE or JOIN clauses where reasonable > -- DELETE FROM should automatically drop empty partitions > -- setting of WITH PARTITION ON {expression} TABLESPACE should > automatically > create a new tablespace for each new partition and its indexes. > -- It should be possible to create new, empty partitions via a CREATE > TABLE > PARTITION OF {table} ON {value} expression. > > All syntax above is, of course, highly debatable. >
From: "Tom Lane" <tgl@sss.pgh.pa.us> > Josh Berkus <josh@agliodbs.com> writes: > > -- INSERT INTO should automatically create new partitions where necessary > > -- DELETE FROM should automatically drop empty partitions > > I am not sure I agree with either of those, and the reason is that they > would turn low-lock operations into high-lock operations. I second this. We're current using an inheritance based partitioning scheme with automatic partition creation in the application code, and have seen at least one case of deadlock due to partition creation. Other phase II/III items might include: - Modify the partitioning scheme of a table. In the above example, adding a '200504' partition, and moving the '200502' orders into 'ARCHIVE' - The ability to place a partition in a tablespace. In the example above, it would be nice to put the 'ARCHIVE' partition would likely be placed on a slower set of disks than the most recent month's partition. - Global indexes (that is to say, an index spanning the the table rather than an individual partition). This seems counterintuitive, but they've dramatically increased performance on one of our Oracle systems and should at least be worth considering.
On Sat, Mar 19, 2005 at 11:24:39PM +0100, PFC wrote: > Some ideas : > hidden field (like oid was) to indicate in which partition the tuple > is ? I think that to make partitioning really possible we need to have multi-relfilenode tables. We probably also need multi-table indexes. Implementing these would be good for inheritance too. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) Dios hizo a Adán, pero fue Eva quien lo hizo hombre.
PFC <lists@boutiquenumerique.com> writes: > Some ideas : > hidden field (like oid was) to indicate in which partition the tuple is ? tableoid would accomplish that already, assuming that the "partitioned table" is effectively a view on separate physical tables. regards, tom lane
Tom, Stacy, Alvaro, > I'd rather see the partition control stuff as ALTER TABLE commands, > not decoration on CREATE TABLE. See the WITH OIDS business we just went > through: adding nonstandard decoration to a standard command isn't good. OK, sure. > > -- INSERT INTO should automatically create new partitions where necessary > > -- DELETE FROM should automatically drop empty partitions > > I am not sure I agree with either of those, and the reason is that they > would turn low-lock operations into high-lock operations. For INSERT, I think that's a problem we need to work through. Partitioning on any scheme where you have to depend on the middleware to create new partitions could never be more than a halfway implementation. For one thing, if we can't have 100% dependence on the idea that Table M, Partition 34 contains index values Y-Z, then that form of advanced query rewriting (which is a huge performance gain on really large tables) becomes inaccessable. Or are you proposing, instead, that attempts to insert beyond the range raise an error? > DELETE FROM > would be particularly bad. Furthermore, who wants to implement DROP > PARTITION as a DELETE FROM? ISTM the whole point of partitioning is to > be able to load and unload whole partitions quickly, and having to > DELETE all the rows in a partition isn't my idea of quick. I mostly threw DELETE in for obvious symmetry. If it's complicated, we can drop it. And you're right, I forgot DROP PARTITION. > This is a bad idea. Where are you going to create these automatic > tablespaces? What will they be named? Won't this require superuser > privileges? And what's the point anyway? Stacy White suggests the more sensible version of this: ALTER TABLE {table} CREATE PARTITION WITH VALUE {value} ON TABLESPACE {tablespacename}. Manually creating the partitions in the appropriate location probably makes the most sense. The point, btw, is that if you have a 2TB table, you probably want to put its partitions on several seperate disk arrays. > Huh? ISTM this confuses establishment of a table's partition rule with > the act of pre-creating empty partitions for not-yet-used ranges of > partition keys. I don't understand why this would be confusing. If INSERT isn't creating partitions on new value breakpoint, then CREATE PARTITION needs to. > Or are you trying to suggest that a table could be > partitioned more than one way at a time? If so, how? No. > - Modify the partitioning scheme of a table. In the above example, adding > a '200504' partition, and moving the '200502' orders into 'ARCHIVE' Hmmm ... I don't see the point in automating this. Can you explain? > - Global indexes (that is to say, an index spanning the the table rather > than an individual partition). This seems counterintuitive, but they've > dramatically increased performance on one of our Oracle systems and should > at least be worth considering. Hmmm, again can you detail this? Maybe some performance examples? It seems to me that global indexes might interfere with the maintenance advantages of partitioning. > We probably also need multi-table indexes. Implementing these would be > good for inheritance too. They would be nice, but I don't see them as a requirement for making partitioning work. -- Josh Berkus Aglio Database Solutions San Francisco
On Sat, Mar 19, 2005 at 12:02:38PM -0800, Josh Berkus wrote: > Folks, > > I may (or may not) soon have funding for implementing full table partitioning > in PostgreSQL. I thought it would be a good idea to discuss with people here > who are already using pseudo-partitioning what things need to be added to > Postgresql in order to make full paritioning a reality; that is, what do > other databases do that we don't? > > Implementations are seperated into phases I and II, II being > harder-and-optional-stuff that may get done later, I being essential > features. > > Ph. I > -- CREATE TABLE ... WITH PARTITION ON {expression} > ---- should automatically create expression index on {expression} ALTER TABLE might be cleaner, perhaps? > -- INSERT INTO should automatically create new partitions where necessary > ---- new tables should automatically inherit all constraints, indexes, > keys of "parent" table > -- UPDATE should automatically move rows between partitions where applicable > -- Query Planner/Executor should be improved to not always materialize > paritioned tables used in subqueries and joins. Would the SELECT also look at the parent table, if it weren't empty? I can think of cases where that'd be useful, especially if an existing table can be partitioned with an ALTER TABLE. This covers almost everything I'd want from table partitioning in the short term. > Ph. II > -- Foreign Keys to/from partitioned tables should become possible > -- Query Planner/Executor should be improved to only join partitions which are > compliant with the query's WHERE or JOIN clauses where reasonable > -- DELETE FROM should automatically drop empty partitions > -- setting of WITH PARTITION ON {expression} TABLESPACE should automatically > create a new tablespace for each new partition and its indexes. > -- It should be possible to create new, empty partitions via a CREATE TABLE > PARTITION OF {table} ON {value} expression. > > All syntax above is, of course, highly debatable. Multi-table indexes would be nice too, though that leads to some problems when a partition is truncated or dropped, I guess. Cheers, Steve
> tableoid would accomplish that already, assuming that the "partitioned > table" is effectively a view on separate physical tables. > > regards, tom lane Very good. Also note the possibility to mark a partition READ ONLY. Or even a table. It does not seem very useful but just think that for instance the "1999", "2000" ... "2004" partitions of a big archive probably never change. READLONY means we're sure they never change, thus no need to backup them every time. Keeping the example of some DB arranged by years / current year / current month, Just backup the "current month" part every day and the "current year" every month when you switch partitions. This could be achieved also by storing the time of last modification of a table somewhere.
Josh Berkus <josh@agliodbs.com> writes: >>> -- INSERT INTO should automatically create new partitions where necessary >>> -- DELETE FROM should automatically drop empty partitions >> >> I am not sure I agree with either of those, and the reason is that they >> would turn low-lock operations into high-lock operations. > For INSERT, I think that's a problem we need to work through. Possibly, but I'm concerned about locking and deadlock issues. The reason that this is iffy is you would start the operation with only an INSERT-grade lock, and then discover that you needed to add a partition, which is surely something that needs an exclusive-grade lock (consider two sessions trying to add the same partition at the same time). So I don't see how to do it without lock upgrading, and lock upgrading is always a recipe for deadlocks. The DELETE case is even worse because you can't physically release storage until you're sure nothing in it is needed anymore by any open transaction --- that introduces VACUUM-like issues as well as the deadlock problem. > Or are you proposing, instead, that attempts to insert beyond the > range raise an error? That was what I had in mind --- then adding partitions would require a manual operation. This would certainly be good enough for "phase I" IMHO. regards, tom lane
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > We probably also need multi-table indexes. As Josh says, that seems antithetical to the main point of partitioning, which is to be able to rapidly remove (and add) partitions of a table. If you have to do index cleaning before you can drop a partition, what's the point of partitioning? regards, tom lane
On Sat, Mar 19, 2005 at 07:03:19PM -0500, Tom Lane wrote: > Possibly, but I'm concerned about locking and deadlock issues. The > reason that this is iffy is you would start the operation with only > an INSERT-grade lock, and then discover that you needed to add a > partition, which is surely something that needs an exclusive-grade > lock (consider two sessions trying to add the same partition at the > same time). So I don't see how to do it without lock upgrading, > and lock upgrading is always a recipe for deadlocks. What about letting something periodical (say, vacuum) do this? /* Steinar */ -- Homepage: http://www.sesse.net/
On Sat, Mar 19, 2005 at 07:05:53PM -0500, Tom Lane wrote: > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > We probably also need multi-table indexes. > > As Josh says, that seems antithetical to the main point of partitioning, > which is to be able to rapidly remove (and add) partitions of a table. > If you have to do index cleaning before you can drop a partition, what's > the point of partitioning? Hmm. You are right, but without that we won't be able to enforce uniqueness on the partitioned table (we could only enforce it on each partition, which would mean we can't partition on anything else than primary keys if the tables have one). IMHO this is something to consider. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) "El hombre nunca sabe de lo que es capaz hasta que lo intenta" (C. Dickens)
On Sun, 2005-03-20 at 00:29 -0400, Alvaro Herrera wrote: > On Sat, Mar 19, 2005 at 07:05:53PM -0500, Tom Lane wrote: > > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > > We probably also need multi-table indexes. > > > > As Josh says, that seems antithetical to the main point of partitioning, > > which is to be able to rapidly remove (and add) partitions of a table. > > If you have to do index cleaning before you can drop a partition, what's > > the point of partitioning? > > Hmm. You are right, but without that we won't be able to enforce > uniqueness on the partitioned table (we could only enforce it on each > partition, which would mean we can't partition on anything else than > primary keys if the tables have one). IMHO this is something to > consider. Could uniqueness across partitions be checked for using a mechanism similar to what a deferred unique constraint would use (trigger / index combination)?
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > Hmm. You are right, but without that we won't be able to enforce > uniqueness on the partitioned table (we could only enforce it on each > partition, which would mean we can't partition on anything else than > primary keys if the tables have one). IMHO this is something to > consider. Well, partitioning on the primary key would be Good Enough for 95% or 99% of the real problems out there. I'm not excited about adding a large chunk of complexity to cover another few percent. regards, tom lane
Josh Berkus <josh@agliodbs.com> writes: > -- INSERT INTO should automatically create new partitions where necessary > ---- new tables should automatically inherit all constraints, indexes, > keys of "parent" table I think you're going about this backwards. Phase I should be an entirely manual system where you add and remove partitions manually and create and drop indexes you want manually. You need these low level interfaces anyways for a complete system, it doesn't make sense to have everything automatic and then later try to wedge in a low level interface. Only once you have that do you then start offering options to do these things automatically. I also think there are a few other components mixed up in your proposal that are really not integral to partitioned tables. Tablespaces and expression indexes may well be useful features to use in combination with partitioned tables, but they shouldn't be required or automatic. From my experience with Oracle I think there's one big concept that makes the whole system make a lot more sense: individual partitions are really tables. The partitioned tables themselves are just meta-objects like views. Once you get that concept the whole featureset makes a lot more sense. You can pull a partition out of a partitioned table and it becomes a normal table. You can take a normal table and put it into a partitioned table. Creating a new partition or altering a partition is just the same as creating or altering a new table (except for the actual data definition part). Given that understanding it's clear that tablespaces are an entirely orthogonal feature. One that happens to play well with partitioned tables, but not one that partitioned tables need any special support for. When you create a new partition or create a table intending to add it as a partition to a partitioned table you specify the tablespace just as you would normally do. It's also clear that the last thing you want is an index on the partition key. A big part of the advantage of partitioned tables is precisely that you get the advantage of an index on a column without the extra expense. It would also be reasonable to allow clustering individual partitions; creating table or column constraints on some partitions and not others; or even allow having indexes on some partitions and not others. In general the only operations that you wouldn't be able to do on an individual partition would be operations that make the column definitions incompatible with the parent table. The $64 question is how to specify the partitioning rules. That is, the rule for determining which partition an insert should go into and which partitions to look for records in. Oracle handles this by specifying a list of columns when creating the partitioned table and then specifying either a range or specific values for each individual partition. I can imagine other approaches but none that allow for the planner and optimizer to take as much advantage of the information. So I think Phase I should look like: An ALTER TABLE command to make an inherited table "abstract" in the object oriented sense. That is, no records can be inserted in the parent table. If you follow the oracle model this is also where you specify the partition key. There's no index associated with this partition key though. A command to create a new partition, essentially syntactic sugar for a CREATE TABLE with an implied INHERITS clause and a constraint on the partition key. If you follow the oracle model then you explicitly specify which range or specific value of the partition key this partition holds. A command to remove a partition from the partitioned table and turn it into a regular table. A command to take a regular table and turn it into a partition. Again here you specify the range or value of the partition key. There has to be some verification that the table really holds the correct data though. Perhaps this could be skipped by providing a table with a properly constructed constraint in place. Magic to make INSERT/UPDATE figure out the correct partition to insert the new record. (Normally I would have suggested that UPDATE wasn't really necessary but in Postgres it seems like it would fall out naturally from having INSERT.) Phase II would be planner and executor improvements to take advantage of the information to speed up queries and allow for individual partitions to be read-only or otherwise inaccessible without impeding queries that don't need that partition. Phase III would be autopilot features like having new partitions automatically created and destroyed and being able to specify in advance rules for determining which tablespaces to use for these new partitions. I'm not sure whether to put global indexes under Phase II or III. Personally I think there's no point to them at all. They defeat the whole point of partitioned tables. Once you have global indexes adding and removing partitions becomes a lot harder and slower. You may as well have kept everything in one table in the first place. But apparently some people find them useful. -- greg
> It would also be reasonable to allow clustering individual partitions; > creating table or column constraints on some partitions and not others; I have a session mamagement which works like that, using views now. sessions.online is a table of the online sessions. It has a UNIQUE on user_id. sessions.archive contains all the closed sessions. Obviously it does not have a UNIQUE on user_id.
Greg Stark <gsstark@mit.edu> writes: > So I think Phase I should look like: > An ALTER TABLE command to make an inherited table "abstract" in the object > oriented sense. That is, no records can be inserted in the parent table. If > you follow the oracle model this is also where you specify the partition > key. There's no index associated with this partition key though. Check. > A command to create a new partition, essentially syntactic sugar for a > CREATE TABLE with an implied INHERITS clause and a constraint on the > partition key. If you follow the oracle model then you explicitly specify > which range or specific value of the partition key this partition holds. Check. > A command to remove a partition from the partitioned table and turn it into > a regular table. Ugh. Why? You can access the table directly anyway. > A command to take a regular table and turn it into a partition. Double ugh. Verifying that the table matches the partition scheme seems like a lot of ugly, bug-prone, unnecessary code. What's the use case for this anyway? Those last two are *certainly* not Phase I requirements, and I don't think we need them at all ever. regards, tom lane
From: "Tom Lane" <tgl@sss.pgh.pa.us> > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > We probably also need multi-table indexes. > As Josh says, that seems antithetical to the main point of partitioning, > which is to be able to rapidly remove (and add) partitions of a table. > If you have to do index cleaning before you can drop a partition, what's > the point of partitioning? Global indexes (as opposed to partition local indexes) are useful in cases where you have a large number of partitions, index columns different than the partition key, and index values that limit the query to just a subset of the partitions. The two domains that I'm most familiar with are warehouse management, and the film industry. In both these cases it's logical to partition on day/week/month, it's frequently important to keep a lot of history, and it's common to have products that only show activity for a few months. In one of our production systems we have 800 partitions (by week, with a lot of history), but a popular product might have only 20 weeks worth of activity. Selecting records for the product requires at least 800 random-access reads if you have local indexes on 'product_no', 780 of which just tell the executor that the partition doesn't include any information on the product. This is definitely a phase II item, but as I said before it's worth considering since good DBAs can do a lot with global indexes. FWIW, we see large benefits from partitioning other than the ability to easily drop data, for example: - We can vacuum only the active portions of a table - Postgres automatically keeps related records clustered together on disk, which makes it more likely that the blocks used by common queries can be found in cache - The query engine uses full table scans on the relevant sections of data, and quickly skips over the irrelevant sections - 'CLUSTER'ing a single partition is likely to be significantly more performant than clustering a large table In fact, we have yet to drop a partition on any of our Oracle or Postgres production systems.
Alvaro, Greg, Tom, > Hmm. You are right, but without that we won't be able to enforce > uniqueness on the partitioned table (we could only enforce it on each > partition, which would mean we can't partition on anything else than > primary keys if the tables have one). IMHO this is something to > consider. Sure. However, for most partitioned use cases, the partition column will be part of the real key of the table (for example, for a security log, the real key might be (timestamp, machine, application, event_type) with the partition on extract(hour from timestamp)). As a result, there is no need to enforce inter-partition uniqueness; the paritioning scheme enforces it already. The only need for inter-partition uniqueness is on surrogate integer keys. This can already be enforced de-facto simply by using a sequence. While it would be possible to create a uniqueness check that spans partitions, it would be very expensive to do so, thus elminating some of the advantage of partitioning in the first place. I'm not saying that we won't want this some day as an option, I just see it as a Phase III refinement. Greg, first of all, thanks for helping clean up my muddy thinking about implementing partitions. Comments below: > Phase I should be an entirely manual system where you add and remove > partitions manually and create and drop indexes you want manually. You need > these low level interfaces anyways for a complete system, it doesn't make > sense to have everything automatic and then later try to wedge in a low > level interface. Only once you have that do you then start offering options > to do these things automatically. This makes sense. Thanks! > whole system make a lot more sense: individual partitions are really > tables. The partitioned tables themselves are just meta-objects like views. So, like the current pseudo-partitioning implementation, partitions would be "full tables" just with some special rules for query-rewriting when they are pulled. This makes sense, I think I just got carried away in another direction. > It's also clear that the last thing you want is an index on the partition > key. A big part of the advantage of partitioned tables is precisely that > you get the advantage of an index on a column without the extra expense. Well, you need it with the current pseudo-partitioning. What would allow us to eliminate indexing the partition key is special re-writing rules that only pull the partitions compliant with the outer query. Until that step takes place, the indexes are very much needed. So maybe the advanced planner rewriting is a Phase I item, not a Phase II item? > The $64 question is how to specify the partitioning rules. That is, the > rule for determining which partition an insert should go into and which > partitions to look for records in. Oracle handles this by specifying a list > of columns when creating the partitioned table and then specifying either a > range or specific values for each individual partition. I can imagine other > approaches but none that allow for the planner and optimizer to take as > much advantage of the information. Well, I would think that specifying an expression that defines a new partition at each change in value (like EXTRACT(day FROM timestamp) on a time-based partitioning) would cover 90% of implemenations and be a lot simpler to administer. The Oracle approach has the advantage of allowing "custom paritioning" at the expense of greater complexity. > A command to remove a partition from the partitioned table and turn it > into a regular table. > > A command to take a regular table and turn it into a partition. Again > here you specify the range or value of the partition key. There has to be > some verification that the table really holds the correct data though. > Perhaps this could be skipped by providing a table with a properly > constructed constraint in place. Like Tom, I don't see the point in these. What do they do that CREATE TABLE AS and/or INSERT INTO do not? -- Josh Berkus Aglio Database Solutions San Francisco
On Sun, 20 Mar 2005, Josh Berkus wrote: > >> whole system make a lot more sense: individual partitions are really >> tables. The partitioned tables themselves are just meta-objects like views. If partition is a table, so I could define different indices for them ? In our prototype of scaled full text search we create another index which is optimized for "archived" (not changed) data - it's sort of standard inverted index which is proven to be scaled, while tsearch2's index is good for "online" data. All interfaces ( dictionaries, parsers, ranking) are the same, so it's possible to combine search results. This is rather easy to implement using table inheritance, but I'd like to do this with partitioning Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Tom Lane <tgl@sss.pgh.pa.us> writes: > > A command to remove a partition from the partitioned table and turn it into > > a regular table. > > Ugh. Why? You can access the table directly anyway. > > > A command to take a regular table and turn it into a partition. > > Double ugh. Verifying that the table matches the partition scheme seems > like a lot of ugly, bug-prone, unnecessary code. What's the use case > for this anyway? > > Those last two are *certainly* not Phase I requirements, and I don't > think we need them at all ever. These are effectively equivalent to "ALTER TABLE RENAME". Without these commands you would be in pretty much the same position as a DBA without the ability to rename tables. The heart of partitioned tables is being able to load and unload entire partitions quickly. You have to have somewhere to "unload" them too. Most people aren't happy just watching their data disappear entirely. They want to move them other tables or even other databases. Similarly, they have to have somewhere to load them from. They're usually not happy loading data directly into their production data warehouse tables without manipulating the data, or doing things like clustering or indexing. You could argue for some sort of setup where you could take a partition "offline" during which you could safely do things like export or manipulate the data. But that's awfully limiting. What if I want to do things like add columns, or change data types, or any other manipulation that breaks the symmetry with the production partitioned table. I don't think it's really hard at all to check that the table matches the partition scheme. You can just require that there be an existing table constraint in place that matches the partitioning scheme. I think you can even be fascist about the exact syntax of the constraint fitting precisely a specified format. -- greg
"Stacy White" <harsh@computer.org> writes: > FWIW, we see large benefits from partitioning other than the ability to > easily drop data, for example: > - We can vacuum only the active portions of a table > - Postgres automatically keeps related records clustered together on disk, > which makes it more likely that the blocks used by common queries can be > found in cache > - The query engine uses full table scans on the relevant sections of data, > and quickly skips over the irrelevant sections > - 'CLUSTER'ing a single partition is likely to be significantly more > performant than clustering a large table Global indexes would seriously reduce the performance of both vacuum and cluster for a single partition, and if you want seq scans you don't need an index for that at all. So the above doesn't strike me as a strong argument for global indexes ... regards, tom lane
Greg Stark <gsstark@mit.edu> writes: > You could argue for some sort of setup where you could take a partition > "offline" during which you could safely do things like export or manipulate > the data. But that's awfully limiting. What if I want to do things like add > columns, or change data types, or any other manipulation that breaks the > symmetry with the production partitioned table. [ scrapes eyebrows off ceiling... ] You don't really expect to be able to do that kind of thing to just one partition do you? regards, tom lane
Josh Berkus <josh@agliodbs.com> writes: > Well, I would think that specifying an expression that defines a new partition > at each change in value (like EXTRACT(day FROM timestamp) on a time-based > partitioning) would cover 90% of implemenations and be a lot simpler to > administer. The Oracle approach has the advantage of allowing "custom > paritioning" at the expense of greater complexity. Hm. This is where I might be less helpful. Once you're submersed in one way of doing things it can be hard to think outside the box like this. But I fear this scheme might be harder to actually take advantage of. If I do a query like WHERE timestamp BETWEEN '2005-01-01 11:00' AND '2005-01-01 12:00' How do you determine which partitions that range will cover? Also, it seems like it would be inconvenient to try to construct expressions to handle things like "start a new partition ever 1 million values". And worse, how would you handle changing schemes with this? Like, say we want to switch from starting one partition per month to starting one partition per week? I think some actual use cases might be helpful for you. I can contribute an interesting one, though I have to be intentionally vague even though I don't work on that system any more. We had a table with a layout like: txnid serial, groupid integer, data... Each day a cron job created 6 new groups (actually later that was changed to some other number). It then added a new partition to handle the range of the new day's groups. Later another cron job exchanged out the partition from a week earlier and exported that table, transfered it to another machine and loaded it there. txnid was a unique identifier but we couldn't have a unique constraint because that would have required a global index. That didn't cause any problems since it was a sequence generated column anyways. We did have a unique index on <groupid,txnid> which is a local index because groupid was the partition key. In reality nothing in our system ever really needed a txn without knowing which group it came from anyways, so it was easy to change our queries to take advantage of this. We had a lot of jobs, some *extremely* performance sensitive that depended on being able to scan the entire list of txns for a given day or a given set of groupids. The partitions meant it could do a full table scan which made these extremely fast. This was with Oracle 8i. All partition keys in 8i were ranges. In 9 Oracle added the ability to make partition reference specific id values. Sort of like how you're describing having a key expression. We might have considered using that scheme with groupid but then it would have meant adding a bunch of new partitions each day and having some queries that would involve scanning multiple partitions. -- Greg
Tom Lane <tgl@sss.pgh.pa.us> writes: > Global indexes would seriously reduce the performance of both vacuum and > cluster for a single partition, and if you want seq scans you don't need > an index for that at all. So the above doesn't strike me as a strong > argument for global indexes ... I think he means some sort of plan for queries like select * from invoices where customer_id = 1 where customer 1 only did business with us for two years. One could imagine some kind of very coarse grained bitmap index that just knows which partitions customer_id=1 appears in, and then does a sequential scan of those partitions. But I think you can do nearly as well without using global indexes of any type. Assuming you had local indexes on customer_id for each partition and separate histograms for each partition the planner could conclude that it needs sequential scans for some partitions and a quick index lookup expecting 0 records for other partitions. Not as good as pruning partitions entirely but if you're doing a sequential scan the performance hit of a few index lookups isn't a problem. -- greg
Tom Lane <tgl@sss.pgh.pa.us> writes: > Greg Stark <gsstark@mit.edu> writes: > > You could argue for some sort of setup where you could take a partition > > "offline" during which you could safely do things like export or manipulate > > the data. But that's awfully limiting. What if I want to do things like add > > columns, or change data types, or any other manipulation that breaks the > > symmetry with the production partitioned table. > > [ scrapes eyebrows off ceiling... ] You don't really expect to be able > to do that kind of thing to just one partition do you? Well no. That's exactly why I would want to pull the partition out of the partitioned table so that I can then do whatever work I need to archive it without affecting the partitioned table. Take an analogous situation. I have a huge log file I want to rotate. The quickest most efficient way to do this would be to move it aside, HUP the daemon (or whatever else I have to do to get it to open a new file) then gzip and archive the old log files. -- greg
From: "Tom Lane" <tgl@sss.pgh.pa.us> > "Stacy White" <harsh@computer.org> writes: > > FWIW, we see large benefits from partitioning other than the ability to > > easily drop data, for example: > > > - We can vacuum only the active portions of a table > > - Postgres automatically keeps related records clustered together on disk, > > which makes it more likely that the blocks used by common queries can be > > found in cache > > - The query engine uses full table scans on the relevant sections of data, > > and quickly skips over the irrelevant sections > > - 'CLUSTER'ing a single partition is likely to be significantly more > > performant than clustering a large table > Global indexes would seriously reduce the performance of both vacuum and > cluster for a single partition, and if you want seq scans you don't need > an index for that at all. So the above doesn't strike me as a strong > argument for global indexes ... Tom, this list was in response to your question "If you have to do index cleaning before you can drop a partition, what's the point of partitioning?". I was trying to make the point that partioning isn't just about being able to quickly drop data. The argument for global indexes came in the form of my war story and the description of the conditions under which global indexes will perform better than local indexes (see my original email for details) . But, like I said, this would definitely be a phase II/III item.
From: "Greg Stark" <gsstark@mit.edu> > Tom Lane <tgl@sss.pgh.pa.us> writes: > Not as good as pruning partitions entirely but if you're doing a sequential > scan the performance hit of a few index lookups isn't a problem. Greg, I think you've got the right idea. For large databases, though, it won't be uncommon to have large numbers of partitions, in which case we're not talking about a few index lookups. The database I used in my example wasn't huge, but the table in question had over 800 partitions. A larger database could have thousands. I suppose the importance of global indexes depends on the sizes of the databases your target audience is running. Here's some more detail on our real-world experience: The group made the decision to partition some of the larger tables for better performance. The idea that global indexes aren't useful is pretty common in the database world, and 2 or 3 good DBAs suggested that the 'product_no' index be local. But with the local indexes, performance on some queries was bad enough that the group actually made the decision to switch back to unpartitioned tables. (The performance problems came about because of the overhead involved in searching >800 indices to find the relevant rows). Luckily they that had the chance to work with a truly fantastic DBA (the author of an Oracle Press performance tuning book even) before they could switch back. He convinced them to make some of their indexes global. Performance dramatically improved (compared with both the unpartitioned schema, and the partitioned-and-locally-indexed schema), and they've since stayed with partitioned tables and a mix of local and global indexes. But once again, I think that global indexes aren't as important as the Phase I items in any of the Phase I/Phase II breakdowns that have been proposed in this thread.
Stacy, > Luckily they that had the chance to work with a truly fantastic DBA (the > author of an Oracle Press performance tuning book even) before they could > switch back. He convinced them to make some of their indexes global. > Performance dramatically improved (compared with both the unpartitioned > schema, and the partitioned-and-locally-indexed schema), and they've since > stayed with partitioned tables and a mix of local and global indexes. Hmmm. Wouldn't Greg's suggestion of a bitmap index which holds information on what values are found in what partition also solve this? Without 1/2 of the overhead imposed by global indexes? I can actually see such a bitmap as being universally useful to the partitioning concept ... for one, it would resolve the whole "partition on {value}" issue. -- Josh Berkus Aglio Database Solutions San Francisco
On Mon, Mar 21, 2005 at 09:55:03AM -0800, Josh Berkus wrote: > Stacy, > > > Luckily they that had the chance to work with a truly fantastic DBA (the > > author of an Oracle Press performance tuning book even) before they could > > switch back. He convinced them to make some of their indexes global. > > Performance dramatically improved (compared with both the unpartitioned > > schema, and the partitioned-and-locally-indexed schema), and they've since > > stayed with partitioned tables and a mix of local and global indexes. > > Hmmm. Wouldn't Greg's suggestion of a bitmap index which holds information on > what values are found in what partition also solve this? Without 1/2 of > the overhead imposed by global indexes? > > I can actually see such a bitmap as being universally useful to the > partitioning concept ... for one, it would resolve the whole "partition on > {value}" issue. I suspect both will have their uses. I've read quite a bit about global v. local indexs in Oracle, and there are definately cases where global is much better than local. Granted, there's some things with how Oracle handles their catalog, etc. that might make local indexes more expensive for them than they would be for PostgreSQL. It's also not clear how much a 'partition bitmap' index would help. As for the 'seqscan individual partitions' argument, that's not going to work well at all for a case where you need to hit a relatively small percentage of rows in a relatively large number of partitions. SELECT ... WHERE customer_id = 1 would be a good example of such a query (assuming the table is partitioned on something like invoice_date). -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
On Sat, Mar 19, 2005 at 07:05:53PM -0500, Tom Lane wrote: > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > We probably also need multi-table indexes. > > As Josh says, that seems antithetical to the main point of partitioning, > which is to be able to rapidly remove (and add) partitions of a table. > If you have to do index cleaning before you can drop a partition, what's > the point of partitioning? Why would you need to do index cleaning first? Presumably the code that goes to check a heap tuple that an index pointed at to ensure that it was visible in the current transaction would be able to recognize if the partition that tuple was in had been removed, and just ignore that index entry. Granted, you'd need to clean the index up at some point (presumably via vacuum), but it doesn't need to occur at partition drop time. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
I think Greg's email did a good job of putting this on track. Phase 1 should be manual, low-level type of support. Oracle has had partitioning for years now, and IF they've added automated partition management, it's only happened in 10g which is pretty recent. For inserts that don't currently have a defined partition to fit in, the Oracle model might be better than tossing an error: a partitioned table in Oracle also contains a default partition. Any rows that don't match a defined partition go into the default partition. For many cases you'll never have anything in the default partition, but sometimes you'll have some partition values that occur infrequenttly enough in the table so as not to warrant their own partition. There's also another partitioning application that I think is often overlooked. I have a table with about 130M rows that is 'pseudo-partitioned' by project_id. Right now, there are 5 different project IDs that account for the bulk of those 130M rows. Oracle provides a means to partition on discreet values. When you do this, there's not actually any reason to even store the partition field in the partition tables, since it will be the same for every row in the partition. In my case, since the smallint project ID is being aligned to a 4 byte boundary, having this feature would save ~120M rows * 4 bytes = 480MB in the table. Granted, 480MB isn't anything for today's disk sizes, but it makes a huge difference when you look at it from an I/O standpoint. Knowing that a partition contains only one value of a field or set of fields also means you can drop those fields from local indexes without losing any effectiveness. In my case, I have 2 indexes I could drop project_id from. Does each node in a B-tree index have the full index key? If so, then there would be substantial I/O gains to be had there, as well. Even if each node doesn't store the full key, there could still be faster to handle a narrower index. I realize this might be a more difficult case to support. It probably couldn't be done using inheritance, though I don't know if inheritence or a union view is better for partitioning. In either case, this case might not be a good candidate for phase 1, but I think partitioning should be designed with it in mind. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
On Sun, 2005-03-20 at 01:14 -0500, Greg Stark wrote: > Josh Berkus <josh@agliodbs.com> writes: > > > -- INSERT INTO should automatically create new partitions where necessary > > ---- new tables should automatically inherit all constraints, indexes, > > keys of "parent" table > > I think you're going about this backwards. Certainly, there are two schools of thought here. I have been in two minds about which those two designs previously, and indeed here which one to support. > Phase I should be an entirely manual system where you add and remove > partitions manually and create and drop indexes you want manually. You need > these low level interfaces anyways for a complete system, it doesn't make > sense to have everything automatic and then later try to wedge in a low level > interface. Only once you have that do you then start offering options to do > these things automatically. Maybe its just me, but ISTM that implementing an automatic system is actually easier to begin with. No commands, no syntax etc. You're right, you need the low level interfaces anyway... > From my experience with Oracle I think there's one big concept that makes the > whole system make a lot more sense: individual partitions are really tables. > The partitioned tables themselves are just meta-objects like views. Hmmm. Oracle provides a very DBA-intensive implementation that as Stacy points out, many people still do not understand. It does work, well. And has many of the wrinkles ironed out, even if not all of them are easy to understand why they exist at first glance. I think it most likely that Phase I should be a simplified blend of both ideas, with a clear view towards minimum impact and implementability, otherwise it may not make the cut for 8.1 Best Regards, Simon Riggs
On E, 2005-03-21 at 09:55 -0800, Josh Berkus wrote: > Stacy, > > > Luckily they that had the chance to work with a truly fantastic DBA (the > > author of an Oracle Press performance tuning book even) before they could > > switch back. He convinced them to make some of their indexes global. > > Performance dramatically improved (compared with both the unpartitioned > > schema, and the partitioned-and-locally-indexed schema), and they've since > > stayed with partitioned tables and a mix of local and global indexes. > > Hmmm. Wouldn't Greg's suggestion of a bitmap index which holds information on > what values are found in what partition also solve this? Without 1/2 of > the overhead imposed by global indexes? > > I can actually see such a bitmap as being universally useful to the > partitioning concept ... for one, it would resolve the whole "partition on > {value}" issue. I once (maybe about a year ago) tried to elaborate using bitmap index(es) with page granularity as a tool for simultaneous clustering and lookup for data warehousing using postgres. the main idea was to determine storage location from AND of all "clustered" bitmap indexes and corresponding fast and clustered lookups. This could/should/maybe :) possibly be combined with clustering as well. -- Hannu Krosing <hannu@tm.ee>
On L, 2005-03-19 at 12:02 -0800, Josh Berkus wrote: > Folks, > > I may (or may not) soon have funding for implementing full table partitioning > in PostgreSQL. If you don't get it, contact me as there is a small possibility that I know a company interested enough to fund (some) of it :) > I thought it would be a good idea to discuss with people here > who are already using pseudo-partitioning what things need to be added to > Postgresql in order to make full paritioning a reality; that is, what do > other databases do that we don't? As these are already discussed in this thread, I'll try to outline a method of providing a global index (unique or not) in a way that will still make it possible to quickly remove (and not-quite-so-quickly add) a partition. The structure is inspired by the current way of handling >1Gb tables. As each tid consists of 32 bit page pointer we have pointerspace of 35184372088832 bytes/index (4G of 8k pages). currently this is directly partitioned mapped to 1Gbyte/128kpage files, but we can, with minimal changes to indexes, put a lookup table between index and page lookup. In case of global index over partitions this table could point to 1G subtables from different partition tables. The drop partition table can also be fast - just record the pages in lookup table as deleted - means one change per 1G of dropped table. The next vacuum should free pointers to deleted subfiles. Adding partitions is trickier - If the added table forms part of partitioning index (say names from C to E), and there is a matching index on subtable, Then that part of btree can probably copied into the main btree index as a tree btanch, which should be relatively fast (compared to building it one tid at a time). Else adding the the index could probably also be sped up by some kind of index merge - faster than building from scratch but slower than above. To repeat - the global index over partitioned table should have te same structure as our current b-tree index, only with added map of 128k index partitions to 1G subfiles of (possibly different) tables. This map will be quite small - for 1Tb of data it will be only 1k entries - this will fit in cache on all modern processors and thus should add only tiny slowdown from current direct tid.page/128k method -- Hannu Krosing <hannu@tm.ee>
On P, 2005-03-20 at 00:52 +0100, PFC wrote: > > tableoid would accomplish that already, assuming that the "partitioned > > table" is effectively a view on separate physical tables. > > > > regards, tom lane > > Very good. > > Also note the possibility to mark a partition READ ONLY. Or even a table. > It does not seem very useful but just think that for instance the "1999", > "2000" ... "2004" partitions of a big archive probably never change. > READLONY means we're sure they never change, thus no need to backup them > every time. Keeping the example of some DB arranged by years / current > year / current month, Just backup the "current month" part every day and > the "current year" every month when you switch partitions. > This could be achieved also by storing the time of last modification of a > table somewhere. Would we still need regular VACUUMing of read-only table to avoid OID-wraparound ? -- Hannu Krosing <hannu@tm.ee>
On L, 2005-03-19 at 19:03 -0500, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: > >>> -- INSERT INTO should automatically create new partitions where necessary > >>> -- DELETE FROM should automatically drop empty partitions > >> > >> I am not sure I agree with either of those, and the reason is that they > >> would turn low-lock operations into high-lock operations. > > > For INSERT, I think that's a problem we need to work through. > > Possibly, but I'm concerned about locking and deadlock issues. The > reason that this is iffy is you would start the operation with only > an INSERT-grade lock, and then discover that you needed to add a > partition, which is surely something that needs an exclusive-grade > lock (consider two sessions trying to add the same partition at the > same time). So I don't see how to do it without lock upgrading, > and lock upgrading is always a recipe for deadlocks. > > The DELETE case is even worse because you can't physically release > storage until you're sure nothing in it is needed anymore by any open > transaction --- that introduces VACUUM-like issues as well as the > deadlock problem. > If we go with my proposal (other post in this thread) of doing most of the partitioning in the level between logical file and physikal 1Gb storage files, then adding a partition should be nearly the same as crossing the 1G boundary is now. removing the partition would be just plain vacuum (if we can make pg shring each 1G subfile independently) > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend -- Hannu Krosing <hannu@tm.ee>
On L, 2005-03-19 at 23:47 -0500, Tom Lane wrote: > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > Hmm. You are right, but without that we won't be able to enforce > > uniqueness on the partitioned table (we could only enforce it on each > > partition, which would mean we can't partition on anything else than > > primary keys if the tables have one). IMHO this is something to > > consider. > > Well, partitioning on the primary key would be Good Enough for 95% or > 99% of the real problems out there. I'm not excited about adding a > large chunk of complexity to cover another few percent. That automatically means that partitioning expression has to be a range over PK. (you dont want to have every tuple in separate tabel :) And it also means that you have to automatically create new partitions. Are you sure that partitioning on anything else than PK would be significantly harder ? I have a case where I do manual partitioning over start_time (timestamp), but the PK is an id from a sequence. They are almost, but not exactly in the same order. And I don't think that moving the PK to be (start_time, id) just because of "partitioning on PK only" would be a good design in any way. So please don't design the system to partition on PK only. -- Hannu Krosing <hannu@tm.ee>
On Mon, Mar 21, 2005 at 08:26:24PM +0200, Hannu Krosing wrote: > On P, 2005-03-20 at 00:52 +0100, PFC wrote: > > Also note the possibility to mark a partition READ ONLY. Or even a table. > Would we still need regular VACUUMing of read-only table to avoid > OID-wraparound ? You could VACUUM FREEZE the table or partition, so you wouldn't need to vacuum it again. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) "Someone said that it is at least an order of magnitude more work to do production software than a prototype. I think he is wrong by at least an order of magnitude." (Brian Kernighan)
On T, 2005-03-22 at 09:10 -0400, Alvaro Herrera wrote: > On Mon, Mar 21, 2005 at 08:26:24PM +0200, Hannu Krosing wrote: > > On P, 2005-03-20 at 00:52 +0100, PFC wrote: > > > > Also note the possibility to mark a partition READ ONLY. Or even a table. > > > Would we still need regular VACUUMing of read-only table to avoid > > OID-wraparound ? > > You could VACUUM FREEZE the table or partition, so you wouldn't need to > vacuum it again. But when I do just VACUUM; will this know to avoid vacuuming VACUUM FREEZE'd partitions ? Or could this be somehow liked to READ ONLY + VACUUM FREEZE state ? -- Hannu Krosing <hannu@tm.ee>
Hannu, > If you don't get it, contact me as there is a small possibility that I > know a company interested enough to fund (some) of it :) Enough people have been interested in this that if we get our acts together, we may do it as multi-funded. Easier on our budget ... > As these are already discussed in this thread, I'll try to outline a > method of providing a global index (unique or not) in a way that will > still make it possible to quickly remove (and not-quite-so-quickly add) > a partition. <snip> > To repeat - the global index over partitioned table should have te same > structure as our current b-tree index, only with added map of 128k index > partitions to 1G subfiles of (possibly different) tables. This map will > be quite small - for 1Tb of data it will be only 1k entries - this will > fit in cache on all modern processors and thus should add only tiny > slowdown from current direct tid.page/128k method I think this is a cool idea. It would need to be linked to clustering, so that each partition can be an iteration of the clustered index instead of a specifc # of bytes. But it would give us the "fully automated partitioning" which is one fork of the two we want. Plus I'm keen on any idea that presents an alternative to aping Oracle. How difficult would your proposal be to code? -- Josh Berkus Aglio Database Solutions San Francisco
Added to TODO: * Support table partitioning that allows a single table to be stored in subtables that are partitioned based on the primary key or a WHERE clause --------------------------------------------------------------------------- Josh Berkus wrote: > Hannu, > > > If you don't get it, contact me as there is a small possibility that I > > know a company interested enough to fund (some) of it :) > > Enough people have been interested in this that if we get our acts together, > we may do it as multi-funded. Easier on our budget ... > > > As these are already discussed in this thread, I'll try to outline a > > method of providing a global index (unique or not) in a way that will > > still make it possible to quickly remove (and not-quite-so-quickly add) > > a partition. > <snip> > > To repeat - the global index over partitioned table should have te same > > structure as our current b-tree index, only with added map of 128k index > > partitions to 1G subfiles of (possibly different) tables. This map will > > be quite small - for 1Tb of data it will be only 1k entries - this will > > fit in cache on all modern processors and thus should add only tiny > > slowdown from current direct tid.page/128k method > > I think this is a cool idea. It would need to be linked to clustering, so > that each partition can be an iteration of the clustered index instead of a > specifc # of bytes. But it would give us the "fully automated partitioning" > which is one fork of the two we want. > > Plus I'm keen on any idea that presents an alternative to aping Oracle. > > How difficult would your proposal be to code? > > -- > Josh Berkus > Aglio Database Solutions > San Francisco > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On March 21, 2005 8:07 AM, Hannu Krosing wrote: > On L, 2005-03-19 at 23:47 -0500, Tom Lane wrote: > > Well, partitioning on the primary key would be Good Enough for 95% or > > 99% of the real problems out there. I'm not excited about adding a > > large chunk of complexity to cover another few percent. > > Are you sure that partitioning on anything else than PK would be > significantly harder ? > > I have a case where I do manual partitioning over start_time > (timestamp), but the PK is an id from a sequence. They are almost, but > not exactly in the same order. And I don't think that moving the PK to > be (start_time, id) just because of "partitioning on PK only" would be a > good design in any way. > > So please don't design the system to partition on PK only. I agree. I have used table partitioning to implement pseudo-partitioning, and I am very pleased with the results so far.Real partitioning would be even better, but I am partitioning by timestamp, and this is not the PK, and I don't wishto make it one. -Roger
Hi, On Sun, Mar 20, 2005 at 06:01:49PM -0500, Tom Lane wrote: > Global indexes would seriously reduce the performance of both vacuum and > cluster for a single partition, and if you want seq scans you don't need > an index for that at all. So the above doesn't strike me as a strong > argument for global indexes ... I'd like to describe a usecase where a global index is usefull. We have a datawarehouse with invoices for a rolling window of a few years. Each invoice has several positions so a uk is (invoice,position). Dur to the fact that most of the queries are only on a few months or some quarters of a year, our pk starts with the time-attribute (followed by the dimension ids) which is the partition key (range). During the nightly update, we receive each updated invoice so we have to update that special (global unique) row which is resolved very fast by using the uk. So you can see, that there is a usefull case for providing a global index while using partitining and local indexes as well. Regards, Yann