Thread: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
just remembering that -patches is a dead list, so i'm sending this to -hackers where it will have more visibility... ---------- Forwarded message ---------- From: Jaime Casanova <jcasanov@systemguards.com.ec> Date: Oct 22, 2008 9:43 AM Subject: Re: [PATCHES] Auto Partitioning Patch - WIP version 1 To: Nikhil Sontakke <nikhil.sontakke@enterprisedb.com> Cc: Bruce Momjian <bruce@momjian.us>, NikhilS <nikkhils@gmail.com>, Simon Riggs <simon@2ndquadrant.com>, pgsql-patches@postgresql.org On 10/22/08, Nikhil Sontakke <nikhil.sontakke@enterprisedb.com> wrote: > Hi, > > > > > > >> Thanks for taking a look. But if I am not mistaken Gavin and co. are > working > > >> on a much exhaustive proposal. In light of that maybe this patch might > not > > >> be needed in the first place? > > >> > > >> I will wait for discussion and a subsequent collective consensus here, > > >> before deciding the further course of actions. > > > > > > I think it is unwise to wait on Gavin for a more complex implemention > > > --- we might end up with nothing for 8.4. As long as your syntax is > > > compatible with whatever Gavin proposed Gavin can add on to your patch > > > once it is applied. > > > > > > > seems like you're a prophet... or i miss something? > > > > :) > > Maybe I will try to summarize the functionality of this patch, rebase it > against latest CVS head and try to get it on the commitfest queue atleast > for further feedback to keep the ball rolling on auto-partitioning... > yeah! i was thinking on doing that but still have no time... and frankly you're the best man for the job ;) one thing i was thinking of is to use triggers instead of rules just as our current docs recommends http://www.postgresql.org/docs/8.3/static/ddl-partitioning.html with the benefit that a trigger can check if the child table exists for the range being inserted and if not it can create it first... haven't looked at the code in the detail but seems that your patch is still missing the "create rule" part so we are in time to change that... no? -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
Another advantage of triggers over rules is that it would work with COPY which is probably a desired feature. Emmanuel Jaime Casanova wrote: > just remembering that -patches is a dead list, so i'm sending this to > -hackers where it will have more visibility... > > ---------- Forwarded message ---------- > From: Jaime Casanova <jcasanov@systemguards.com.ec> > Date: Oct 22, 2008 9:43 AM > Subject: Re: [PATCHES] Auto Partitioning Patch - WIP version 1 > To: Nikhil Sontakke <nikhil.sontakke@enterprisedb.com> > Cc: Bruce Momjian <bruce@momjian.us>, NikhilS <nikkhils@gmail.com>, > Simon Riggs <simon@2ndquadrant.com>, pgsql-patches@postgresql.org > > > On 10/22/08, Nikhil Sontakke <nikhil.sontakke@enterprisedb.com> wrote: > >> Hi, >> >> >>>>> Thanks for taking a look. But if I am not mistaken Gavin and co. are >>>>> >> working >> >>>>> on a much exhaustive proposal. In light of that maybe this patch might >>>>> >> not >> >>>>> be needed in the first place? >>>>> >>>>> I will wait for discussion and a subsequent collective consensus here, >>>>> before deciding the further course of actions. >>>>> >>>> I think it is unwise to wait on Gavin for a more complex implemention >>>> --- we might end up with nothing for 8.4. As long as your syntax is >>>> compatible with whatever Gavin proposed Gavin can add on to your patch >>>> once it is applied. >>>> >>>> >>> seems like you're a prophet... or i miss something? >>> >>> >> :) >> >> Maybe I will try to summarize the functionality of this patch, rebase it >> against latest CVS head and try to get it on the commitfest queue atleast >> for further feedback to keep the ball rolling on auto-partitioning... >> >> > > yeah! i was thinking on doing that but still have no time... and > frankly you're the best man for the job ;) > > one thing i was thinking of is to use triggers instead of rules just > as our current docs recommends > http://www.postgresql.org/docs/8.3/static/ddl-partitioning.html > > with the benefit that a trigger can check if the child table exists > for the range being inserted and if not it can create it first... > haven't looked at the code in the detail but seems that your patch is > still missing the "create rule" part so we are in time to change > that... no? > -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: manu@frogthinker.org Skype: emmanuel_cecchet
Hi,
http://www.enterprisedb.com
On Wed, Oct 22, 2008 at 8:14 PM, Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
Yes triggers should be used instead of rules. Automatic generation of rules/triggers would be kind of hard and needs some looking into. Also there are issues like checking mutual exclusivity of the partition clauses specified too (I have been maintaining that the onus of ensuring sane partition ranges/clauses should rest with the users atleast initially..).
I will take a stab at this again whenever I get some free cycles.
Regards,
Nikhils
-- just remembering that -patches is a dead list, so i'm sending this to
-hackers where it will have more visibility...
---------- Forwarded message ----------
On 10/22/08, Nikhil Sontakke <nikhil.sontakke@enterprisedb.com> wrote:
> Hi,
>
> > >
> > >> Thanks for taking a look. But if I am not mistaken Gavin and co. are
> working
> > >> on a much exhaustive proposal. In light of that maybe this patch might
> not
> > >> be needed in the first place?
> > >>
> > >> I will wait for discussion and a subsequent collective consensus here,
> > >> before deciding the further course of actions.
> > >
> > > I think it is unwise to wait on Gavin for a more complex implemention
> > > --- we might end up with nothing for 8.4. As long as your syntax is
> > > compatible with whatever Gavin proposed Gavin can add on to your patch
> > > once it is applied.
> > >
> >
> > seems like you're a prophet... or i miss something?
> >
>
> :)
>
> Maybe I will try to summarize the functionality of this patch, rebase it
> against latest CVS head and try to get it on the commitfest queue atleast
> for further feedback to keep the ball rolling on auto-partitioning...
>
yeah! i was thinking on doing that but still have no time... and
frankly you're the best man for the job ;)
one thing i was thinking of is to use triggers instead of rules just
as our current docs recommends
http://www.postgresql.org/docs/8.3/static/ddl-partitioning.html
with the benefit that a trigger can check if the child table exists
for the range being inserted and if not it can create it first...
haven't looked at the code in the detail but seems that your patch is
still missing the "create rule" part so we are in time to change
that... no?
Yes triggers should be used instead of rules. Automatic generation of rules/triggers would be kind of hard and needs some looking into. Also there are issues like checking mutual exclusivity of the partition clauses specified too (I have been maintaining that the onus of ensuring sane partition ranges/clauses should rest with the users atleast initially..).
I will take a stab at this again whenever I get some free cycles.
Regards,
Nikhils
http://www.enterprisedb.com
Hi,
> > >
> > >> Thanks for taking a look. But if I am not mistaken Gavin and co. are
> working
> > >> on a much exhaustive proposal. In light of that maybe this patch might
> not
> > >> be needed in the first place?
> > >>
> > >> I will wait for discussion and a subsequent collective consensus here,
> > >> before deciding the further course of actions.
> > >
> > > I think it is unwise to wait on Gavin for a more complex implemention
> > > --- we might end up with nothing for 8.4. As long as your syntax is
> > > compatible with whatever Gavin proposed Gavin can add on to your patch
> > > once it is applied.
> > >
> >
> > seems like you're a prophet... or i miss something?
> >
>
> :)
>
> Maybe I will try to summarize the functionality of this patch, rebase it
> against latest CVS head and try to get it on the commitfest queue atleast
> for further feedback to keep the ball rolling on auto-partitioning...
>
yeah! i was thinking on doing that but still have no time... and
frankly you're the best man for the job ;)
one thing i was thinking of is to use triggers instead of rules just
as our current docs recommends
http://www.postgresql.org/docs/8.3/static/ddl-partitioning.html
with the benefit that a trigger can check if the child table exists
for the range being inserted and if not it can create it first...
haven't looked at the code in the detail but seems that your patch is
still missing the "create rule" part so we are in time to change
that... no?
Yes triggers should be used instead of rules. Automatic generation of rules/triggers would be kind of hard and needs some looking into. Also there are issues like checking mutual exclusivity of the partition clauses specified too (I have been maintaining that the onus of ensuring sane partition ranges/clauses should rest with the users atleast initially..).
I will take a stab at this again whenever I get some free cycles.
I have synced up and modified the patch against latest CVS sources. Am attaching the latest WIP patch here.
Am restating that its a WIP patch, more so because we really need feedback on this before trying to expend any energy trying to come up with a commit-able patch.
As per me, the syntax introduced by this patch should be similar to what was proposed by Gavin quite a while back and this patch essentially tries to bring together a bunch of ddl that would otherwise have been performed step-by-step in a manual fashion earlier. To summarize this patch provides a one-shot mechanism to:
-- * create master table
-- * create several child tables that inherit from this master table
-- * add appropriate constraints to each of the child tables
-- * create a trigger function to redirect insert, updates, deletes to
-- appropriate child tables (plpgsql language)
-- * create the trigger using the trigger function
I have created a new file (src/test/regress/sql/partition.sql) to show a couple of examples of the grammar and the working functionality:
There are TODOs like:
-- logic to ensure unique trigger function and trigger names
-- The trigger function body could raise an exception if the insert/update/delete operation does not fit into any single partition
-- logic to check mutual exclusivity of ranges/lists
-- misc. issues to convert it from wip to commit-ready
If we think this is ok as a first step towards auto-partitioning then we can do something more with this patch.
Regards,
Nikhils
http://www.enterprisedb.com
Attachment
Hi Nikhil, Here are a couple of questions: - How do you ALTER the table to repartition it? - The trigger function for inserts could be improved by using ELSE instead of independent IFs. This would ensure that the row is inserted in at most 1 partition. The last ELSE should raise an exception if there was no match (that would solve point 2 of your TODO list). - Another option is to have a separate trigger per child table and chain them to the master table. For example something like: CREATE OR REPLACE FUNCTION child_insert_trigger() RETURNS TRIGGER AS $$ BEGIN IF (NEW.date >= DATE(TG_ARGV[1]) AND NEW.date < DATE(TG_ARGV[2]) ) THEN INSERT INTO TG_ARGV[0] VALUES (NEW.*); RETURN NULL; END IF; RETURN NEW; END; $$ LANGUAGE plpgsql; DROP TRIGGER IF EXISTS insert_child_trigger ON master; CREATE TRIGGER insert_child_trigger_y2008m01 BEFORE INSERT ON master FOR EACH ROW EXECUTE PROCEDURE child_insert_trigger_date(child_y2008m01, '2008-01-01', '2008-02-01'); CREATE TRIGGER insert_child_trigger_y2008m02 BEFORE INSERT ON master FOR EACH ROW EXECUTE PROCEDURE child_insert_trigger_date(child_y2008m02, '2008-02-01', '2008-03-01'); CREATE TRIGGER insert_child_trigger_y2008m03 BEFORE INSERT ON master FOR EACH ROW EXECUTE PROCEDURE child_insert_trigger_date(child_y2008m03, '2008-03-01', '2008-04-01'); This might make it easier when you want to alter a specific partition rather than rewriting the whole trigger. Performance-wise, I am not sure how chained triggers will compare to the big if/then/else trigger. - In the case of an insert, could it be possible to avoid the cost of a new INSERT statement (parser, planner, executor, etc...) by moving directly the tuple in the right table like the COPY code does? If we had an INSERT trigger code in C, given a HeapTuple and a target Relation we should be able to call heap_insert_tuple directly, with no parsing, planning, etc. required. Thanks for your time, Emmanuel > Hi, > > > > > > > > > >> Thanks for taking a look. But if I am not mistaken > Gavin and co. are > > working > > > >> on a much exhaustive proposal. In light of that maybe > this patch might > > not > > > >> be needed in the first place? > > > >> > > > >> I will wait for discussion and a subsequent collective > consensus here, > > > >> before deciding the further course of actions. > > > > > > > > I think it is unwise to wait on Gavin for a more complex > implemention > > > > --- we might end up with nothing for 8.4. As long as > your syntax is > > > > compatible with whatever Gavin proposed Gavin can add on > to your patch > > > > once it is applied. > > > > > > > > > > seems like you're a prophet... or i miss something? > > > > > > > :) > > > > Maybe I will try to summarize the functionality of this > patch, rebase it > > against latest CVS head and try to get it on the commitfest > queue atleast > > for further feedback to keep the ball rolling on > auto-partitioning... > > > > yeah! i was thinking on doing that but still have no time... and > frankly you're the best man for the job ;) > > one thing i was thinking of is to use triggers instead of > rules just > as our current docs recommends > http://www.postgresql.org/docs/8.3/static/ddl-partitioning.html > > with the benefit that a trigger can check if the child table > exists > for the range being inserted and if not it can create it first... > haven't looked at the code in the detail but seems that your > patch is > still missing the "create rule" part so we are in time to change > that... no? > > > Yes triggers should be used instead of rules. Automatic generation > of rules/triggers would be kind of hard and needs some looking > into. Also there are issues like checking mutual exclusivity of > the partition clauses specified too (I have been maintaining that > the onus of ensuring sane partition ranges/clauses should rest > with the users atleast initially..). > > I will take a stab at this again whenever I get some free cycles. > > > I have synced up and modified the patch against latest CVS sources. Am > attaching the latest WIP patch here. > > Am restating that its a WIP patch, more so because we really need > feedback on this before trying to expend any energy trying to come up > with a commit-able patch. > > As per me, the syntax introduced by this patch should be similar to > what was proposed by Gavin quite a while back and this patch > essentially tries to bring together a bunch of ddl that would > otherwise have been performed step-by-step in a manual fashion > earlier. To summarize this patch provides a one-shot mechanism to: > > -- * create master table > -- * create several child tables that inherit from this master table > -- * add appropriate constraints to each of the child tables > -- * create a trigger function to redirect insert, updates, deletes to > -- appropriate child tables (plpgsql language) > -- * create the trigger using the trigger function > > I have created a new file (src/test/regress/sql/partition.sql) to show > a couple of examples of the grammar and the working functionality: > > There are TODOs like: > -- logic to ensure unique trigger function and trigger names > -- The trigger function body could raise an exception if the > insert/update/delete operation does not fit into any single partition > -- logic to check mutual exclusivity of ranges/lists > -- misc. issues to convert it from wip to commit-ready > > If we think this is ok as a first step towards auto-partitioning then > we can do something more with this patch. > > Regards, > Nikhils > -- > http://www.enterprisedb.com > ------------------------------------------------------------------------ > > > -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: manu@frogthinker.org Skype: emmanuel_cecchet
On Fri, Oct 31, 2008 at 7:42 PM, Emmanuel Cecchet <manu@frogthinker.org> wrote: > Hi Nikhil, > i'm looking at this one: http://archives.postgresql.org/message-id/a301bfd90810310750pf108c69x36499546f406650f@mail.gmail.com > Here are a couple of questions: > - How do you ALTER the table to repartition it? fair question. but the patch was advertized to only automate some tasks that we do manually... so keeping the same limitations seems reasonably to me... > - Another option is to have a separate trigger per child table and chain > them to the master table. For example something like: that sounds like a lot of overhead... --------------- Now, about the patch... - seems strange the need to create plpgsql language before we can create any partitioned table but given that the trigger is a plpgsql function (and a c function can't be used because we could need to add new partitions) it seems necesary... ideas? - the update part of the trigger looks very simplistic... if the new values isn't in the range accepted by the partition it errors out because of the check constraint... can't we be a little smarter, delete from the actual partition and insert in the new one... for the rest, the patch passes all regression tests and seems to work as advertized.... -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
On Wed, Nov 5, 2008 at 11:47 PM, Jaime Casanova <jcasanov@systemguards.com.ec> wrote: > On Fri, Oct 31, 2008 at 7:42 PM, Emmanuel Cecchet <manu@frogthinker.org> wrote: >> Hi Nikhil, >> > > i'm looking at this one: > http://archives.postgresql.org/message-id/a301bfd90810310750pf108c69x36499546f406650f@mail.gmail.com > 'cause the great interest this one has (i'm being ironic, just in case ;) can we safely say this was returned with feedback and remove it from the list of pending patches? -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
> 'cause the great interest this one has (i'm being ironic, just in case > ;) can we safely say this was returned with feedback and remove it > from the list of pending patches? Um... are you referring to lack of interest from the patch author, or from the community? If the patch author is no longer interested in the patch, of course it should be withdrawn. But as for the community, the patch is on the commitfest wiki[1] and you are listed as the reviewer, so I wouldn't necessarily expect anyone else to comment at this point - although, in fact, Emmanuel Cecchet wrote in as well, so I would say you have exactly the opposite of a lack of interest. If you think the patch needs further review from another reviewer, say so. I'm sure someone else can be assigned to do an additional review. If you think the patch is ready to commit, say so, and update the wiki accordingly. ...Robert [1] http://wiki.postgresql.org/wiki/CommitFest_2008-11
On Wed, Nov 26, 2008 at 10:52 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> 'cause the great interest this one has (i'm being ironic, just in case >> ;) can we safely say this was returned with feedback and remove it >> from the list of pending patches? > > the patch is on the > commitfest wiki[1] and you are listed as the reviewer, so I wouldn't > necessarily expect anyone else to comment at this point - although, in > fact, Emmanuel Cecchet wrote in as well, so I would say you have > exactly the opposite of a lack of interest. > i review it on nov 6, and there were open questions by me and by Emmanuel none of those has been answered: http://archives.postgresql.org/pgsql-hackers/2008-11/msg00362.php -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
> i review it on nov 6, and there were open questions by me and by > Emmanuel none of those has been answered: > http://archives.postgresql.org/pgsql-hackers/2008-11/msg00362.php Hmm, there's only one actual question in that email, which is a request for ideas about PL/pgsql vs. C. I suspect you didn't get any responses because the rest of the email seems to indicate that the patch is not very mature at this point: for example, being able to handle updates that move rows between partitions would seem to me to be an essential feature for a project of this type, even though there are many practical scenarios were it's unimportant. Likewise, being able to repartition sounds important. With respect to the specific question about PL/pgsql vs C, I suspect it's very unlikely that any patch of this type that relies on PL/pgsql being loaded would be accepted into core. However, it's possible that a useful contrib module or pgfoundry project could be spawned on that basis, and that might be a good place to start. I think having a useful toolkit, or a core language feature, that supports table partitioning would be awesome and would find very broad application... but it sounds like there is quite a bit of work left to be done to get there. ...Robert
Hi,
This patch does introduce some basic syntax to help create partitions.
The status has always being WIP, because what has not happened is that we have not had consensus on whether this is a logical first baby step ahead with partitioning. I haven't seen core members commenting on whether trying to aggregate the current set of manual operations together via this approach is worth spending further efforts, to get it into commitable shape.
To summarize, the community should decide if this is indeed the first step ahead.
Regards,
Nikhils
--
http://www.enterprisedb.com
i review it on nov 6, and there were open questions by me and byHmm, there's only one actual question in that email, which is a
> Emmanuel none of those has been answered:
> http://archives.postgresql.org/pgsql-hackers/2008-11/msg00362.php
request for ideas about PL/pgsql vs. C. I suspect you didn't get any
responses because the rest of the email seems to indicate that the
patch is not very mature at this point: for example, being able to
handle updates that move rows between partitions would seem to me to
be an essential feature for a project of this type, even though there
are many practical scenarios were it's unimportant. Likewise, being
able to repartition sounds important.
With respect to the specific question about PL/pgsql vs C, I suspect
it's very unlikely that any patch of this type that relies on PL/pgsql
being loaded would be accepted into core. However, it's possible that
a useful contrib module or pgfoundry project could be spawned on that
basis, and that might be a good place to start.
I think having a useful toolkit, or a core language feature, that
supports table partitioning would be awesome and would find very broad
application... but it sounds like there is quite a bit of work left
to be done to get there.
This patch does introduce some basic syntax to help create partitions.
The status has always being WIP, because what has not happened is that we have not had consensus on whether this is a logical first baby step ahead with partitioning. I haven't seen core members commenting on whether trying to aggregate the current set of manual operations together via this approach is worth spending further efforts, to get it into commitable shape.
Regards,
Nikhils
--
http://www.enterprisedb.com
Nikhil Sontakke escribió: > The status has always being WIP, because what has not happened is that we > have not had consensus on whether this is a logical first baby step ahead > with partitioning. I haven't seen core members commenting on whether trying > to aggregate the current set of manual operations together via this approach > is worth spending further efforts, to get it into commitable shape. There was a lenghty, interesting discussion about this topic in the developer meeting in Ottawa. http://wiki.postgresql.org/wiki/PgCon_2008_Developer_Meeting#Partitioning_Roadmap -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Thu, Nov 27, 2008 at 7:04 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Nikhil Sontakke escribió: > >> The status has always being WIP, because what has not happened is that we >> have not had consensus on whether this is a logical first baby step ahead >> with partitioning. I haven't seen core members commenting on whether trying >> to aggregate the current set of manual operations together via this approach >> is worth spending further efforts, to get it into commitable shape. > > There was a lenghty, interesting discussion about this topic in the > developer meeting in Ottawa. > http://wiki.postgresql.org/wiki/PgCon_2008_Developer_Meeting#Partitioning_Roadmap Interesting - too bad there aren't some mode detailed notes. The semantics of PARTITION ON (<expr>) are unclear to me. I was thinking maybe it would make sense to do something like: CREATE PARTITION <name> ON <table> WHERE <expr> Then you could: CREATE PARTITION transaction_2008_11 ON transaction WHERE record_date BETWEEN '2008-11-01' AND '2008-11-30'; I like the idea of using table inheritance as a foundation for this feature, but I think it's not going to be very useful for real-world applications without cross-table indexes. Suppose for example that I have five years worth of data (thus, 60 partitions) and each transaction has a unique identifier of some sort that is unrelated to the date. It's bad enough that a query like this has to check every partition: SELECT * FROM transaction WHERE uuid = ? What's even worse (at least IMHO) is that there's no way to use transaction (uuid) as a reference for a foreign key. ...Robert
"Robert Haas" <robertmhaas@gmail.com> writes: > CREATE PARTITION transaction_2008_11 ON transaction WHERE record_date > BETWEEN '2008-11-01' AND '2008-11-30'; I think the main advantage to a better partitioning method would be teaching Postgres about the partition key. Instead of a collection of different constraints Postgres would know that "record_date" is *always* the partition key. So it wouldn't have to be specified every time you declare a partition. > I like the idea of using table inheritance as a foundation for this > feature, but I think it's not going to be very useful for real-world > applications without cross-table indexes. Well we could add support for cross-table indexes. It's not hard from the point of low level implementation -- just include the table oid in the index pointers. Figuring out how to represent such a thing at the index description point of view would be quite tricky though. *But*... in practice I would suggest that cross-table indexes are actually very rarely useful. Having them defeats much of the advantage of partitioning in the first place. Suddenly you would not be able to instantly drop and load whole partitions. They're a big check-list item that people want to have before they partition in case they need them but then they find out that the down-sides of actually using them makes them quite useless. Postgres's current architecture actually has a big advantage over more methodical partitioning methods in this case. You can always add additional constraints on other columns even if they aren't the "real" partitioning key. So for example if you partition the invoice table by month once you close the books for a previous month you can add a constraint WHERE invoice_id < 'xxx'. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB'sPostgreSQL training!
Hi,
The discussion is indeed interesting. But again the notes do not indicate any broad consensus on the roadmap :).
The current inheritance based mechanism has its pros-cons and there seem to be a multitude of requests/expectations around partitioning from different quarters. Even basic consensus about the syntax is missing. What we need is a step-by-step approach (starting with fixing up the syntax - if it can be done like that) and working our way downwards towards the underlying representation/planning for partitions...
Regards,
Nikhils
--
http://www.enterprisedb.com
There was a lenghty, interesting discussion about this topic in the
> The status has always being WIP, because what has not happened is that we
> have not had consensus on whether this is a logical first baby step ahead
> with partitioning. I haven't seen core members commenting on whether trying
> to aggregate the current set of manual operations together via this approach
> is worth spending further efforts, to get it into commitable shape.
developer meeting in Ottawa.
http://wiki.postgresql.org/wiki/PgCon_2008_Developer_Meeting#Partitioning_Roadmap
The discussion is indeed interesting. But again the notes do not indicate any broad consensus on the roadmap :).
The current inheritance based mechanism has its pros-cons and there seem to be a multitude of requests/expectations around partitioning from different quarters. Even basic consensus about the syntax is missing. What we need is a step-by-step approach (starting with fixing up the syntax - if it can be done like that) and working our way downwards towards the underlying representation/planning for partitions...
Regards,
Nikhils
http://www.enterprisedb.com
On Thu, Nov 27, 2008 at 8:07 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > The semantics of PARTITION ON (<expr>) are unclear to me. I was > thinking maybe it would make sense to do something like: > > CREATE PARTITION <name> ON <table> WHERE <expr> > At first look seems nice but s Gregory said the ideal would be to identify the key partition. > > I like the idea of using table inheritance as a foundation for this > feature, but I think it's not going to be very useful for real-world > applications without cross-table indexes. Suppose for example that I > have five years worth of data (thus, 60 partitions) and each > transaction has a unique identifier of some sort that is unrelated to > the date. It's bad enough that a query like this has to check every > partition: > you haven't. the WHERE clause in your hipotetical CREATE PARTITION should create a check constraint on the child (inherited) table and if you have constraint_exclusion to on you will check just the partition(s) that match with the check constraint. > > What's even worse (at least IMHO) is that there's no way to use > transaction (uuid) as a reference for a foreign key. > not directly, but you always can create a trigger instead of the foreign key constraint... mmm...the docs says that there is no good workaround, what about mention a trigger? -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
On Thu, Nov 27, 2008 at 8:31 AM, Gregory Stark <stark@enterprisedb.com> wrote: >> CREATE PARTITION transaction_2008_11 ON transaction WHERE record_date >> BETWEEN '2008-11-01' AND '2008-11-30'; > > I think the main advantage to a better partitioning method would be teaching > Postgres about the partition key. Instead of a collection of different > constraints Postgres would know that "record_date" is *always* the partition > key. So it wouldn't have to be specified every time you declare a partition. Hmm... I thought the main advantage would be that you wouldn't have to manually add constraints to all of the child tables, and you wouldn't have to manually add rules/triggers to the parent table to redirect DML operations. What do you see as the advantage of pre-declaring record_date as the partition key? The major advantage I can think of is that it should simplify constraint exclusion calculations considerably. Also, you can easily enforce that partitions are non-overlapping. The disadvantage is that you can't support more complex partitioning schemes that can't be expressed in terms of ranges on a single key (an obvious case is when you want to partition by date AND transaction type, though that could probably be made to work if you allow specifying multiple partition keys; less tractable cases are imaginable). I guess we could decide we don't care about the more complex scenarios. Or we could offer: CREATE TABLE (...) WITH PARTITIONING; -- ad-hoc partitioning CREATE TABLE (...) WITH PARTITIONING ON (...); -- partition keys must be non-overlapping slices based only on the given columns > *But*... in practice I would suggest that cross-table indexes are actually > very rarely useful. Having them defeats much of the advantage of partitioning > in the first place. Suddenly you would not be able to instantly drop and load > whole partitions. They're a big check-list item that people want to have > before they partition in case they need them but then they find out that the > down-sides of actually using them makes them quite useless. That's possible. My every attempt to use inheritance has been stymied by lack of this feature, but my attempts may not be representative. In any case, the projects are severable. > Postgres's current architecture actually has a big advantage over more > methodical partitioning methods in this case. You can always add additional > constraints on other columns even if they aren't the "real" partitioning key. > So for example if you partition the invoice table by month once you close the > books for a previous month you can add a constraint WHERE invoice_id < 'xxx'. That's cool. ...Robert
>> I like the idea of using table inheritance as a foundation for this >> feature, but I think it's not going to be very useful for real-world >> applications without cross-table indexes. Suppose for example that I >> have five years worth of data (thus, 60 partitions) and each >> transaction has a unique identifier of some sort that is unrelated to >> the date. It's bad enough that a query like this has to check every >> partition: > you haven't. the WHERE clause in your hipotetical CREATE PARTITION > should create a check constraint on the child (inherited) table and if > you have constraint_exclusion to on you will check just the > partition(s) that match with the check constraint. The problem is that constraint exclusion will not be able to exclude anything for queries unrelated to the partition key. If my transactions are identified by UUIDs or similar, there's no way to predict which table will contain any particular value. You end up having to scan them all, and even if they all have individual indices on the column in question, that's still 60 index scans instead of 1. >> What's even worse (at least IMHO) is that there's no way to use >> transaction (uuid) as a reference for a foreign key. > not directly, but you always can create a trigger instead of the > foreign key constraint... > mmm...the docs says that there is no good workaround, what about > mention a trigger? I think it's pretty hard to make this bulletproof. I think the triggers that enforce ordinary foreign key constraints contain some magical cross-checks on transaction commit that can't easily be emulated by user-written triggers. In any case, it's a long way from "Oh, yeah, that just works." ...Robert
On Thu, Nov 27, 2008 at 9:41 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Nov 27, 2008 at 8:31 AM, Gregory Stark <stark@enterprisedb.com> wrote: >>> CREATE PARTITION transaction_2008_11 ON transaction WHERE record_date >>> BETWEEN '2008-11-01' AND '2008-11-30'; >> >> I think the main advantage to a better partitioning method would be teaching >> Postgres about the partition key. Instead of a collection of different >> constraints Postgres would know that "record_date" is *always* the partition >> key. So it wouldn't have to be specified every time you declare a partition. > > Hmm... I thought the main advantage would be that you wouldn't have > to manually add constraints to all of the child tables, and you > wouldn't have to manually add rules/triggers to the parent table to > redirect DML operations. > ok. what about let CREATE TABLE WITH PARTITIONING to create an entry in a catalog indicating the key of the partition and install the triggers and let the trigger decide if it has the partition to insert the new row (making UPDATE working almost as DELETE+INSERT if it needs to change of partitions) or create the new partition maybe with an apropiate CREATE PARTITION... that way i don't need to create triggers nor inherit tables manually... and because of that maybe we can make possible to add <expr> as partition key... PS: i'm against using CREATE TABLE because we are inventing new syntax but it seems like using ALTER TABLE is a *lot* of work altough ISTM more usefull -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
On Thu, Nov 27, 2008 at 10:10 AM, Jaime Casanova <jcasanov@systemguards.com.ec> wrote: > On Thu, Nov 27, 2008 at 9:41 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Thu, Nov 27, 2008 at 8:31 AM, Gregory Stark <stark@enterprisedb.com> wrote: >>>> CREATE PARTITION transaction_2008_11 ON transaction WHERE record_date >>>> BETWEEN '2008-11-01' AND '2008-11-30'; >>> >>> I think the main advantage to a better partitioning method would be teaching >>> Postgres about the partition key. Instead of a collection of different >>> constraints Postgres would know that "record_date" is *always* the partition >>> key. So it wouldn't have to be specified every time you declare a partition. >> >> Hmm... I thought the main advantage would be that you wouldn't have >> to manually add constraints to all of the child tables, and you >> wouldn't have to manually add rules/triggers to the parent table to >> redirect DML operations. >> > > ok. what about let CREATE TABLE WITH PARTITIONING to create an entry > in a catalog indicating the key of the partition and install the > triggers and let the trigger decide if it has the partition to insert > the new row (making UPDATE working almost as DELETE+INSERT if it needs > to change of partitions) or create the new partition maybe with an > apropiate CREATE PARTITION... > i thik i have to clarify this... i intend to say that, the trigger will insert or create the partition and insert... -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
Hi all, I have been following that discussion very closely but it seems that we are debating solutions without a good specification of the problem/requirements. I would suggest that we collect all the partitioning requirements on a dedicated Wiki page. There might not be a one size fits it all solution for all requirements. We can also look at what other databases are proposing to address these issues. If we can prioritize features, that should also allow us to stage the partitioning implementation. I have a prototype insert trigger in C that directly move inserts in a master table to the appropriate child table (directly moving the tuple). Let me know if anyone is interested. Emmanuel Jaime Casanova wrote: > On Thu, Nov 27, 2008 at 10:10 AM, Jaime Casanova > <jcasanov@systemguards.com.ec> wrote: > >> On Thu, Nov 27, 2008 at 9:41 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> >>> On Thu, Nov 27, 2008 at 8:31 AM, Gregory Stark <stark@enterprisedb.com> wrote: >>> >>>>> CREATE PARTITION transaction_2008_11 ON transaction WHERE record_date >>>>> BETWEEN '2008-11-01' AND '2008-11-30'; >>>>> >>>> I think the main advantage to a better partitioning method would be teaching >>>> Postgres about the partition key. Instead of a collection of different >>>> constraints Postgres would know that "record_date" is *always* the partition >>>> key. So it wouldn't have to be specified every time you declare a partition. >>>> >>> Hmm... I thought the main advantage would be that you wouldn't have >>> to manually add constraints to all of the child tables, and you >>> wouldn't have to manually add rules/triggers to the parent table to >>> redirect DML operations. >>> >>> >> ok. what about let CREATE TABLE WITH PARTITIONING to create an entry >> in a catalog indicating the key of the partition and install the >> triggers and let the trigger decide if it has the partition to insert >> the new row (making UPDATE working almost as DELETE+INSERT if it needs >> to change of partitions) or create the new partition maybe with an >> apropiate CREATE PARTITION... >> >> > > i thik i have to clarify this... > > i intend to say that, the trigger will insert or create the partition > and insert... > > -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: manu@frogthinker.org Skype: emmanuel_cecchet
> ok. what about let CREATE TABLE WITH PARTITIONING to create an entry > in a catalog indicating the key of the partition and install the > triggers and let the trigger decide if it has the partition to insert > the new row (making UPDATE working almost as DELETE+INSERT if it needs > to change of partitions) or create the new partition maybe with an > apropiate CREATE PARTITION... > > that way i don't need to create triggers nor inherit tables > manually... and because of that maybe we can make possible to add > <expr> as partition key... > > > PS: i'm against using CREATE TABLE because we are inventing new syntax > but it seems like using ALTER TABLE is a *lot* of work altough ISTM > more usefull I think that's one of the useful things that could be done in this area (not the only one, certainly), but I don't think we've defined the semantics well enough to start talking about exactly which commands to use. As to CREATE TABLE and ALTER TABLE, I suspect you'll need both. We have to come to some consensus on whether predefining a partition key is necessary, optional, or not supported. And we need to define ways both to set things up and to change them later. If there is no predefined partition key, there's probably nothing terribly special that needs to be done to prepare a table for partitioning. You could decide that all the data will live in the parent table except for the partitions that are explicitly created. When the user creates a partition, you create the new child table, set it to inherit from the parent, add the necessary constraint, create/update an automatically generated rule/trigger on the parent that redirects DML to the appropriate partition, and move any EXISTING tuples that belong in that partition into it. You'd also need operations to merge a partition back into the parent table (moving the data back), drop a partition (lose the data), and change the definition of a partition (move data around). A significant problem with this design is that you don't know that the partition constraints are mutually exclusive. What do you do with data that matches multiple partition constraints? You'll have to devise some rule, like maybe picking the first partition alphabetically, which will complicate the rearrangement of data when partitions are added or removed. If there IS a predefined partition key, then you'll need a way to tell the parent table what it is (and a way to remove it later if you change your mind). Then it should be possible to validate that child partitions are defined only in terms of that key and that they are mutually exclusive. You'll still need basically all the same operations: create partition, modify partition, merge partition back into parent, drop partition. ...Robert
On Thu, Nov 27, 2008 at 11:09 AM, Emmanuel Cecchet <manu@frogthinker.org> wrote: > I have been following that discussion very closely but it seems that we are > debating solutions without a good specification of the problem/requirements. > I would suggest that we collect all the partitioning requirements on a > dedicated Wiki page. There might not be a one size fits it all solution for > all requirements. We can also look at what other databases are proposing to > address these issues. > If we can prioritize features, that should also allow us to stage the > partitioning implementation. This might be a good idea. Want to take a crack at it? > I have a prototype insert trigger in C that directly move inserts in a > master table to the appropriate child table (directly moving the tuple). Let > me know if anyone is interested. Can't hurt to post it. ...Robert
Hi all, While I was trying to find the right place to add a new page on the wiki, I found the document of Simon on partitioning requirements (http://wiki.postgresql.org/wiki/Image:Partitioning_Requirements.pdf) referenced from http://wiki.postgresql.org/wiki/Development_projects I think this is a good base to start from. Should we convert the doc into a wiki page or get the source for the doc and go from there? I attach what I have come up with so far for the C trigger I was talking about for efficient automatic auto-partitioning of inserts in child tables. Emmanuel Robert Haas wrote: > On Thu, Nov 27, 2008 at 11:09 AM, Emmanuel Cecchet <manu@frogthinker.org> wrote: > >> I have been following that discussion very closely but it seems that we are >> debating solutions without a good specification of the problem/requirements. >> I would suggest that we collect all the partitioning requirements on a >> dedicated Wiki page. There might not be a one size fits it all solution for >> all requirements. We can also look at what other databases are proposing to >> address these issues. >> If we can prioritize features, that should also allow us to stage the >> partitioning implementation. >> > > This might be a good idea. Want to take a crack at it? > > >> I have a prototype insert trigger in C that directly move inserts in a >> master table to the appropriate child table (directly moving the tuple). Let >> me know if anyone is interested. >> > > Can't hurt to post it. > > ...Robert > > -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: manu@frogthinker.org Skype: emmanuel_cecchet ### Eclipse Workspace Patch 1.0 #P Postgres-HEAD Index: src/test/regress/regress.c =================================================================== RCS file: /root/cvsrepo/pgsql/src/test/regress/regress.c,v retrieving revision 1.71 diff -u -r1.71 regress.c --- src/test/regress/regress.c 25 Mar 2008 22:42:46 -0000 1.71 +++ src/test/regress/regress.c 13 Nov 2008 06:11:08 -0000 @@ -10,6 +10,9 @@ #include "utils/geo_decls.h" /* includes <math.h> */ #include "executor/executor.h" /* For GetAttributeByName */ #include "commands/sequence.h" /* for nextval() */ +#include "catalog/namespace.h" +#include "executor/executor.h" +#include "executor/tuptable.h" #define P_MAXDIG 12 #define LDELIM '(' @@ -732,3 +735,90 @@ *--walk = '\0'; PG_RETURN_CSTRING(result); } + + +/* + * Partition trigger test + * + * The trigger should be used this way: + * CREATE TRIGGER child_table_name + BEFORE INSERT ON master_table + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger(); + */ + +extern Datum partition_insert_trigger(PG_FUNCTION_ARGS); + +PG_FUNCTION_INFO_V1(partition_insert_trigger); + +Datum +partition_insert_trigger(PG_FUNCTION_ARGS) +{ + TriggerData *trigdata = (TriggerData *) fcinfo->context; + HeapTuple trigtuple= trigdata->tg_trigtuple; + char *child_table_name; + Relation child_table_relation; + Oid relation_id; + + /* make sure it's called as a trigger at all */ + if (!CALLED_AS_TRIGGER(fcinfo)) + elog(ERROR, "partition_insert_trigger: not called by trigger manager"); + + /* Sanity checks */ + if (!TRIGGER_FIRED_BY_INSERT(trigdata->tg_event) || !TRIGGER_FIRED_BEFORE(trigdata->tg_event)) + elog(ERROR, "partition_insert_trigger: not called on insert before"); + + // Child table name is either given as the unique parameter or it is the name of the trigger + if (trigdata->tg_trigger->tgnargs == 1) + child_table_name = trigdata->tg_trigger->tgargs[0]; + else + child_table_name = trigdata->tg_trigger->tgname; + + // Lookup the child relation + relation_id = RelnameGetRelid(child_table_name); + if (relation_id == InvalidOid) + elog(ERROR, "partition_insert_trigger: Invalid child table %s", child_table_name); + child_table_relation = RelationIdGetRelation(relation_id); + if (child_table_relation == NULL) + elog(ERROR, "partition_insert_trigger: Failed to locate relation for child table %s", child_table_name); + + { // Check the constraints + TupleConstr *constr = child_table_relation->rd_att->constr; + + if (constr->num_check > 0) + { + ResultRelInfo *resultRelInfo; + TupleTableSlot *slot; + EState *estate= CreateExecutorState(); + + resultRelInfo = makeNode(ResultRelInfo); + resultRelInfo->ri_RangeTableIndex = 1; /* dummy */ + resultRelInfo->ri_RelationDesc = child_table_relation; + + estate->es_result_relations = resultRelInfo; + estate->es_num_result_relations = 1; + estate->es_result_relation_info = resultRelInfo; + + /* Set up a tuple slot too */ + slot = MakeSingleTupleTableSlot(trigdata->tg_relation->rd_att); + ExecStoreTuple(trigtuple, slot, InvalidBuffer, false); + + if (ExecRelCheck(resultRelInfo, slot, estate) == NULL) + { // Constraints satisfied, insert the row in the child table + bool use_wal = true; + bool use_fsm=true; + + heap_insert(child_table_relation, trigtuple, GetCurrentCommandId(true), use_wal, use_fsm); + RelationClose(child_table_relation); + ExecDropSingleTupleTableSlot(slot); + return PointerGetDatum(NULL); + } + ExecDropSingleTupleTableSlot(slot); + } + else + elog(ERROR, "partition_insert_trigger: No constraint found for child table %s", child_table_name); + } + RelationClose(child_table_relation); + + return PointerGetDatum(trigdata->tg_trigtuple); +} + Index: src/backend/executor/execMain.c =================================================================== RCS file: /root/cvsrepo/pgsql/src/backend/executor/execMain.c,v retrieving revision 1.314 diff -u -r1.314 execMain.c --- src/backend/executor/execMain.c 31 Oct 2008 21:07:54 -0000 1.314 +++ src/backend/executor/execMain.c 13 Nov 2008 06:11:08 -0000 @@ -1947,7 +1947,7 @@ /* * ExecRelCheck --- check that tuple meets constraints for result relation */ -static const char * +const char * ExecRelCheck(ResultRelInfo *resultRelInfo, TupleTableSlot *slot, EState *estate) { Index: src/test/regress/output/create_function_1.source =================================================================== RCS file: /root/cvsrepo/pgsql/src/test/regress/output/create_function_1.source,v retrieving revision 1.34 diff -u -r1.34 create_function_1.source --- src/test/regress/output/create_function_1.source 31 Oct 2008 19:37:56 -0000 1.34 +++ src/test/regress/output/create_function_1.source 13 Nov 2008 06:11:08 -0000 @@ -47,6 +47,10 @@ RETURNS int4 AS '@libdir@/regress@DLSUFFIX@' LANGUAGE C STRICT; +CREATE FUNCTION partition_insert_trigger () + RETURNS trigger + AS '@libdir@/regress@DLSUFFIX@' + LANGUAGE C STRICT; -- Things that shouldn't work: CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL AS 'SELECT ''not an integer'';'; @@ -80,3 +84,70 @@ CREATE FUNCTION test1 (int) RETURNS int LANGUAGE internal AS 'nosuch'; ERROR: there is no built-in function named "nosuch" +-- Partitioning trigger test +CREATE TABLE master ( + id int not null, + date date not null, + value int +); +CREATE TABLE child_y2008m01 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' ) +) INHERITS (master); +CREATE TABLE child_y2008m02 ( + CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' ) +) INHERITS (master); +CREATE TABLE child_y2008m03 ( + CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' ) +) INHERITS (master); +CREATE TRIGGER insert_child_y2008m01 + BEFORE INSERT ON master + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger('child_y2008m01'); +CREATE TRIGGER child_y2008m02 + BEFORE INSERT ON master + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger(); +CREATE TRIGGER child_y2008m03 + BEFORE INSERT ON master + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger(); +INSERT INTO master VALUES (1, '2008-01-15', 1); +INSERT INTO master VALUES (2, '2008-02-15', 2); +INSERT INTO master VALUES (3, '2008-03-15', 3); +INSERT INTO master VALUES (4, '2008-04-15', 4); +COPY master FROM '/root/pg_partitions/copy_input.txt'; +select * from master; + id | date | value +----+------------+------- + 4 | 04-15-2008 | 4 + 1 | 01-15-2008 | 1 + 11 | 01-10-2008 | 11 + 2 | 02-15-2008 | 2 + 12 | 02-15-2008 | 12 + 3 | 03-15-2008 | 3 + 13 | 03-15-2008 | 13 +(7 rows) + + select * from child_y2008m01; + id | date | value +----+------------+------- + 1 | 01-15-2008 | 1 + 11 | 01-10-2008 | 11 +(2 rows) + + select * from child_y2008m02; + id | date | value +----+------------+------- + 2 | 02-15-2008 | 2 + 12 | 02-15-2008 | 12 +(2 rows) + + select * from child_y2008m03; + id | date | value +----+------------+------- + 3 | 03-15-2008 | 3 + 13 | 03-15-2008 | 13 +(2 rows) + +DROP TABLE master CASCADE; +NOTICE: drop cascades to 3 other objects +DETAIL: drop cascades to table child_y2008m01 +drop cascades to table child_y2008m02 +drop cascades to table child_y2008m03 Index: src/include/executor/executor.h =================================================================== RCS file: /root/cvsrepo/pgsql/src/include/executor/executor.h,v retrieving revision 1.152 diff -u -r1.152 executor.h --- src/include/executor/executor.h 31 Oct 2008 21:07:55 -0000 1.152 +++ src/include/executor/executor.h 13 Nov 2008 06:11:08 -0000 @@ -155,6 +155,8 @@ extern bool ExecContextForcesOids(PlanState *planstate, bool *hasoids); extern void ExecConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot, EState *estate); +extern const char *ExecRelCheck(ResultRelInfo *resultRelInfo, + TupleTableSlot *slot, EState *estate); extern TupleTableSlot *EvalPlanQual(EState *estate, Index rti, ItemPointer tid, TransactionId priorXmax); extern PlanState *ExecGetActivePlanTree(QueryDesc *queryDesc); Index: src/test/regress/input/create_function_1.source =================================================================== RCS file: /root/cvsrepo/pgsql/src/test/regress/input/create_function_1.source,v retrieving revision 1.19 diff -u -r1.19 create_function_1.source --- src/test/regress/input/create_function_1.source 1 Oct 2008 22:38:57 -0000 1.19 +++ src/test/regress/input/create_function_1.source 13 Nov 2008 06:11:08 -0000 @@ -52,6 +52,12 @@ AS '@libdir@/regress@DLSUFFIX@' LANGUAGE C STRICT; +CREATE FUNCTION partition_insert_trigger () + RETURNS trigger + AS '@libdir@/regress@DLSUFFIX@' + LANGUAGE C STRICT; + + -- Things that shouldn't work: CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL @@ -77,3 +83,48 @@ CREATE FUNCTION test1 (int) RETURNS int LANGUAGE internal AS 'nosuch'; + +-- Partitioning trigger test + +CREATE TABLE master ( + id int not null, + date date not null, + value int +); + +CREATE TABLE child_y2008m01 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' ) +) INHERITS (master); + +CREATE TABLE child_y2008m02 ( + CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' ) +) INHERITS (master); + +CREATE TABLE child_y2008m03 ( + CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' ) +) INHERITS (master); + +CREATE TRIGGER insert_child_y2008m01 + BEFORE INSERT ON master + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger('child_y2008m01'); + +CREATE TRIGGER child_y2008m02 + BEFORE INSERT ON master + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger(); + +CREATE TRIGGER child_y2008m03 + BEFORE INSERT ON master + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger(); + +INSERT INTO master VALUES (1, '2008-01-15', 1); +INSERT INTO master VALUES (2, '2008-02-15', 2); +INSERT INTO master VALUES (3, '2008-03-15', 3); +INSERT INTO master VALUES (4, '2008-04-15', 4); +COPY master FROM '/root/pg_partitions/copy_input.txt'; + +select * from master; +select * from child_y2008m01; +select * from child_y2008m02; +select * from child_y2008m03; + +DROP TABLE master CASCADE;
Hackers, We don't yet seem to have a clear specification for this feature, and the Other Open Source DB has shown us how problematic it is to get auto-partitioning wrong. Should we defer auto-partitioning to 8.5? --Josh
Josh Berkus <josh@agliodbs.com> writes: > Hackers, > > We don't yet seem to have a clear specification for this feature, and the Other > Open Source DB has shown us how problematic it is to get auto-partitioning > wrong. > > Should we defer auto-partitioning to 8.5? If we're serious about having a "next generation" partitioning with a concept of partition keys then it seems to me to make more sense to do that first and then add on a feature like this. This is still very useful. I haven't looked at the actual patch, does it require core changes or can it be stashed in a pgfoundry or contrib module? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
On Mon, Dec 15, 2008 at 1:46 PM, Gregory Stark <stark@enterprisedb.com> wrote: > Josh Berkus <josh@agliodbs.com> writes: > >> Hackers, >> >> We don't yet seem to have a clear specification for this feature, and the Other >> Open Source DB has shown us how problematic it is to get auto-partitioning >> wrong. >> >> Should we defer auto-partitioning to 8.5? > > If we're serious about having a "next generation" partitioning with a concept > of partition keys then it seems to me to make more sense to do that first and > then add on a feature like this. > +1 > This is still very useful. I haven't looked at the actual patch, does it > require core changes or can it be stashed in a pgfoundry or contrib module? > what i don't like about this one is that it creates partitions at create table time and to manually add all new partitions (inherit tables and modify the trigger)... and what i want to see is an automatic creation when it's needed... -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
Hi all, I will be working on a roadmap for the partitioning features. I think that there are different needs and that we will not be able to address them all in 8.5 or even 8.6. The goal will be to get things done step by step but possibly with a design that will not require major refactoring to support new features. I will try to setup the new wiki page tomorrow. In the meantime, I have made some more tests with the trigger in C (see attached patch). To prevent duplicating too much code, it requires the ExecRelCheck method to be exported (that would be nice to have this function exported in 8.4 so that we can start experimenting in 8.4 and don't have to wait another year for 8.5). If there is locality in the inserts (which might be the case if you COPY sorted data), the performance remains constant regardless the number of child tables. My initial tests to insert 140k rows are as follows: - direct inserts in a child table: 2 seconds - pgplsql trigger (IF ... ELSE IF ... blocks) : 14.5 seconds. - C trigger: 4 seconds (actually the overhead is in the constraint check) Right now if the row cannot be inserted in any child table, it is inserted in the parent. If you want to fail, we can add a 'fail trigger' (after all other triggers) that generates an error if previous triggers did not capture the row. If you want to create a new partition, you can have another trigger to handle that. So I think that this trigger approach is pretty flexible like people used AOP in J2EE servers to process requests. It has also the advantage of allowing fast prototyping. It should also be easy to push that functionality down in the core as needed. Is it ok if I move Simon's requirement document under a more generic 'Table partitioning' page on the Wiki? Thanks for your feedback, manu Jaime Casanova wrote: > On Mon, Dec 15, 2008 at 1:46 PM, Gregory Stark <stark@enterprisedb.com> wrote: > >> Josh Berkus <josh@agliodbs.com> writes: >> >> >>> Hackers, >>> >>> We don't yet seem to have a clear specification for this feature, and the Other >>> Open Source DB has shown us how problematic it is to get auto-partitioning >>> wrong. >>> >>> Should we defer auto-partitioning to 8.5? >>> >> If we're serious about having a "next generation" partitioning with a concept >> of partition keys then it seems to me to make more sense to do that first and >> then add on a feature like this. >> >> > > +1 > > >> This is still very useful. I haven't looked at the actual patch, does it >> require core changes or can it be stashed in a pgfoundry or contrib module? >> >> > > what i don't like about this one is that it creates partitions at > create table time and to manually add all new partitions (inherit > tables and modify the trigger)... and what i want to see is an > automatic creation when it's needed... > > > -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: manu@frogthinker.org Skype: emmanuel_cecchet ### Eclipse Workspace Patch 1.0 #P Postgres-HEAD Index: src/test/regress/regress.c =================================================================== RCS file: /root/cvsrepo/pgsql/src/test/regress/regress.c,v retrieving revision 1.71 diff -u -r1.71 regress.c --- src/test/regress/regress.c 25 Mar 2008 22:42:46 -0000 1.71 +++ src/test/regress/regress.c 16 Dec 2008 01:42:50 -0000 @@ -10,6 +10,9 @@ #include "utils/geo_decls.h" /* includes <math.h> */ #include "executor/executor.h" /* For GetAttributeByName */ #include "commands/sequence.h" /* for nextval() */ +#include "catalog/namespace.h" +#include "executor/executor.h" +#include "executor/tuptable.h" #define P_MAXDIG 12 #define LDELIM '(' @@ -732,3 +735,141 @@ *--walk = '\0'; PG_RETURN_CSTRING(result); } + + +/* + * Partition trigger test + * + * The trigger should be used this way: + * CREATE TRIGGER trigger_name + BEFORE INSERT ON master_table + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger(child_table_name, column_number, min_val, max_val); + */ + +extern Datum partition_insert_trigger(PG_FUNCTION_ARGS); + +static Datum +check_constraints_and_insert_tuple(Relation child_table_relation, TriggerData *trigdata ,HeapTuple trigtuple) +{ // Check the constraints + ResultRelInfo *resultRelInfo; + TupleTableSlot *slot; + EState *estate = CreateExecutorState(); + Datum result; + + result = PointerGetDatum(trigdata->tg_trigtuple); + + resultRelInfo = makeNode(ResultRelInfo); + resultRelInfo->ri_RangeTableIndex = 1; /* dummy */ + resultRelInfo->ri_RelationDesc = child_table_relation; + + estate->es_result_relations = resultRelInfo; + estate->es_num_result_relations = 1; + estate->es_result_relation_info = resultRelInfo; + + /* Set up a tuple slot too */ + slot = MakeSingleTupleTableSlot(trigdata->tg_relation->rd_att); + ExecStoreTuple(trigtuple, slot, InvalidBuffer, false); + + if (ExecRelCheck(resultRelInfo, slot, estate) == NULL) + { // Constraints satisfied, insert the row in the child table + bool use_wal = true; + bool use_fsm=true; + + /* BEFORE ROW INSERT Triggers */ + if (resultRelInfo->ri_TrigDesc && + resultRelInfo->ri_TrigDesc->n_before_row[TRIGGER_EVENT_INSERT] > 0) + { + HeapTuple newtuple; + newtuple = ExecBRInsertTriggers(estate, resultRelInfo, trigtuple); + + if (newtuple != trigtuple) /* modified by Trigger(s) */ + { + heap_freetuple(trigtuple); + trigtuple = newtuple; + } + } + + /* Perform the insert + * TODO: Check that we detect constraint violation if before row insert does something bad */ + heap_insert(child_table_relation, trigtuple, GetCurrentCommandId(true), use_wal, use_fsm); + + /* Update indices */ + ExecOpenIndices(resultRelInfo); + if (resultRelInfo->ri_NumIndices > 0) + ExecInsertIndexTuples(slot, &(trigtuple->t_self), estate, false); + + /* AFTER ROW INSERT Triggers */ + ExecARInsertTriggers(estate, resultRelInfo, trigtuple); + + result = PointerGetDatum(NULL); + } + // Free resources + FreeExecutorState(estate); + ExecDropSingleTupleTableSlot(slot); + + return result; +} + + +PG_FUNCTION_INFO_V1(partition_insert_trigger); + +static Relation last_inserted_relation; + +Datum +partition_insert_trigger(PG_FUNCTION_ARGS) +{ + TriggerData *trigdata = (TriggerData *) fcinfo->context; + HeapTuple trigtuple= trigdata->tg_trigtuple; + TupleConstr *constr; + char *child_table_name; + Relation child_table_relation; + Oid relation_id; + Datum result; + + // Try to exploit locality for bulk inserts + // We expect consecutive insert to go to the same child table + if (last_inserted_relation != NULL) + { // Try the last table we used + result = check_constraints_and_insert_tuple(last_inserted_relation, trigdata, trigtuple); + if (result == PointerGetDatum(NULL)) + return result; + // We got a miss + last_inserted_relation = NULL; + } + + /* make sure it's called as a trigger at all */ + if (!CALLED_AS_TRIGGER(fcinfo)) + elog(ERROR, "partition_insert_trigger: not called by trigger manager"); + + /* Sanity checks */ + if (!TRIGGER_FIRED_BY_INSERT(trigdata->tg_event) || !TRIGGER_FIRED_BEFORE(trigdata->tg_event)) + elog(ERROR, "partition_insert_trigger: not called on insert before"); + + // Child table name is either given as the unique parameter or it is the name of the trigger + if (trigdata->tg_trigger->tgnargs == 1) + child_table_name = trigdata->tg_trigger->tgargs[0]; + else + child_table_name = trigdata->tg_trigger->tgname; + + // Lookup the child relation + relation_id = RelnameGetRelid(child_table_name); + if (relation_id == InvalidOid) + elog(ERROR, "partition_insert_trigger: Invalid child table %s", child_table_name); + child_table_relation = RelationIdGetRelation(relation_id); + if (child_table_relation == NULL) + elog(ERROR, "partition_insert_trigger: Failed to locate relation for child table %s", child_table_name); + + constr = child_table_relation->rd_att->constr; + if (constr->num_check == 0) + elog(ERROR, "partition_insert_trigger: No constraint found for child table %s", child_table_name); + + result = check_constraints_and_insert_tuple(child_table_relation, trigdata, trigtuple); + + if (result == PointerGetDatum(NULL)) + last_inserted_relation = child_table_relation; + + RelationClose(child_table_relation); + + return result; +} + Index: src/backend/executor/execMain.c =================================================================== RCS file: /root/cvsrepo/pgsql/src/backend/executor/execMain.c,v retrieving revision 1.314 diff -u -r1.314 execMain.c --- src/backend/executor/execMain.c 31 Oct 2008 21:07:54 -0000 1.314 +++ src/backend/executor/execMain.c 16 Dec 2008 01:42:50 -0000 @@ -1947,7 +1947,7 @@ /* * ExecRelCheck --- check that tuple meets constraints for result relation */ -static const char * +const char * ExecRelCheck(ResultRelInfo *resultRelInfo, TupleTableSlot *slot, EState *estate) { Index: src/test/regress/output/create_function_1.source =================================================================== RCS file: /root/cvsrepo/pgsql/src/test/regress/output/create_function_1.source,v retrieving revision 1.34 diff -u -r1.34 create_function_1.source --- src/test/regress/output/create_function_1.source 31 Oct 2008 19:37:56 -0000 1.34 +++ src/test/regress/output/create_function_1.source 16 Dec 2008 01:42:50 -0000 @@ -47,6 +47,10 @@ RETURNS int4 AS '@libdir@/regress@DLSUFFIX@' LANGUAGE C STRICT; +CREATE FUNCTION partition_insert_trigger () + RETURNS trigger + AS '@libdir@/regress@DLSUFFIX@' + LANGUAGE C STRICT; -- Things that shouldn't work: CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL AS 'SELECT ''not an integer'';'; @@ -80,3 +84,70 @@ CREATE FUNCTION test1 (int) RETURNS int LANGUAGE internal AS 'nosuch'; ERROR: there is no built-in function named "nosuch" +-- Partitioning trigger test +CREATE TABLE master ( + id int not null, + date date not null, + value int +); +CREATE TABLE child_y2008m01 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' ) +) INHERITS (master); +CREATE TABLE child_y2008m02 ( + CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' ) +) INHERITS (master); +CREATE TABLE child_y2008m03 ( + CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' ) +) INHERITS (master); +CREATE TRIGGER insert_child_y2008m01 + BEFORE INSERT ON master + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger('child_y2008m01'); +CREATE TRIGGER child_y2008m02 + BEFORE INSERT ON master + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger(); +CREATE TRIGGER child_y2008m03 + BEFORE INSERT ON master + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger(); +INSERT INTO master VALUES (1, '2008-01-15', 1); +INSERT INTO master VALUES (2, '2008-02-15', 2); +INSERT INTO master VALUES (3, '2008-03-15', 3); +INSERT INTO master VALUES (4, '2008-04-15', 4); +COPY master FROM '/root/pg_partitions/copy_input.txt'; +select * from master; + id | date | value +----+------------+------- + 4 | 04-15-2008 | 4 + 1 | 01-15-2008 | 1 + 11 | 01-10-2008 | 11 + 2 | 02-15-2008 | 2 + 12 | 02-15-2008 | 12 + 3 | 03-15-2008 | 3 + 13 | 03-15-2008 | 13 +(7 rows) + + select * from child_y2008m01; + id | date | value +----+------------+------- + 1 | 01-15-2008 | 1 + 11 | 01-10-2008 | 11 +(2 rows) + + select * from child_y2008m02; + id | date | value +----+------------+------- + 2 | 02-15-2008 | 2 + 12 | 02-15-2008 | 12 +(2 rows) + + select * from child_y2008m03; + id | date | value +----+------------+------- + 3 | 03-15-2008 | 3 + 13 | 03-15-2008 | 13 +(2 rows) + +DROP TABLE master CASCADE; +NOTICE: drop cascades to 3 other objects +DETAIL: drop cascades to table child_y2008m01 +drop cascades to table child_y2008m02 +drop cascades to table child_y2008m03 Index: src/include/executor/executor.h =================================================================== RCS file: /root/cvsrepo/pgsql/src/include/executor/executor.h,v retrieving revision 1.152 diff -u -r1.152 executor.h --- src/include/executor/executor.h 31 Oct 2008 21:07:55 -0000 1.152 +++ src/include/executor/executor.h 16 Dec 2008 01:42:50 -0000 @@ -155,6 +155,8 @@ extern bool ExecContextForcesOids(PlanState *planstate, bool *hasoids); extern void ExecConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot, EState *estate); +extern const char *ExecRelCheck(ResultRelInfo *resultRelInfo, + TupleTableSlot *slot, EState *estate); extern TupleTableSlot *EvalPlanQual(EState *estate, Index rti, ItemPointer tid, TransactionId priorXmax); extern PlanState *ExecGetActivePlanTree(QueryDesc *queryDesc); Index: src/test/regress/input/create_function_1.source =================================================================== RCS file: /root/cvsrepo/pgsql/src/test/regress/input/create_function_1.source,v retrieving revision 1.19 diff -u -r1.19 create_function_1.source --- src/test/regress/input/create_function_1.source 1 Oct 2008 22:38:57 -0000 1.19 +++ src/test/regress/input/create_function_1.source 16 Dec 2008 01:42:50 -0000 @@ -52,6 +52,12 @@ AS '@libdir@/regress@DLSUFFIX@' LANGUAGE C STRICT; +CREATE FUNCTION partition_insert_trigger () + RETURNS trigger + AS '@libdir@/regress@DLSUFFIX@' + LANGUAGE C STRICT; + + -- Things that shouldn't work: CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL @@ -77,3 +83,48 @@ CREATE FUNCTION test1 (int) RETURNS int LANGUAGE internal AS 'nosuch'; + +-- Partitioning trigger test + +CREATE TABLE master ( + id int not null, + date date not null, + value int +); + +CREATE TABLE child_y2008m01 ( + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' ) +) INHERITS (master); + +CREATE TABLE child_y2008m02 ( + CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' ) +) INHERITS (master); + +CREATE TABLE child_y2008m03 ( + CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' ) +) INHERITS (master); + +CREATE TRIGGER insert_child_y2008m01 + BEFORE INSERT ON master + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger('child_y2008m01'); + +CREATE TRIGGER child_y2008m02 + BEFORE INSERT ON master + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger(); + +CREATE TRIGGER child_y2008m03 + BEFORE INSERT ON master + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger(); + +INSERT INTO master VALUES (1, '2008-01-15', 1); +INSERT INTO master VALUES (2, '2008-02-15', 2); +INSERT INTO master VALUES (3, '2008-03-15', 3); +INSERT INTO master VALUES (4, '2008-04-15', 4); +COPY master FROM '/root/pg_partitions/copy_input.txt'; + +select * from master; +select * from child_y2008m01; +select * from child_y2008m02; +select * from child_y2008m03; + +DROP TABLE master CASCADE;
Hi,
+1.
A similar DELETE trigger should be pretty easy to write up in C. I think the main challenge is with UPDATE triggers especially if the new row will fall into another child table - but we can always throw an error for such a case initially.
One of the work items related to partitioning eventually is to avoid having to APPEND the parent in all queries involving children. Maybe having an overflow child table might help to catch failed triggers for those cases?
Regards,
Nikhils
--
http://www.enterprisedb.com
I will be working on a roadmap for the partitioning features. I think that there are different needs and that we will not be able to address them all in 8.5 or even 8.6.
The goal will be to get things done step by step but possibly with a design that will not require major refactoring to support new features. I will try to setup the new wiki page tomorrow.
+1.
In the meantime, I have made some more tests with the trigger in C (see attached patch). To prevent duplicating too much code, it requires the ExecRelCheck method to be exported (that would be nice to have this function exported in 8.4 so that we can start experimenting in 8.4 and don't have to wait another year for 8.5). If there is locality in the inserts (which might be the case if you COPY sorted data), the performance remains constant regardless the number of child tables.
A similar DELETE trigger should be pretty easy to write up in C. I think the main challenge is with UPDATE triggers especially if the new row will fall into another child table - but we can always throw an error for such a case initially.
Right now if the row cannot be inserted in any child table, it is inserted in the parent. If you want to fail, we can add a 'fail trigger' (after all other triggers) that generates an error if previous triggers did not capture the row. If you want to create a new partition, you can have another trigger to handle that.
One of the work items related to partitioning eventually is to avoid having to APPEND the parent in all queries involving children. Maybe having an overflow child table might help to catch failed triggers for those cases?
Regards,
Nikhils
So I think that this trigger approach is pretty flexible like people used AOP in J2EE servers to process requests. It has also the advantage of allowing fast prototyping. It should also be easy to push that functionality down in the core as needed.
Is it ok if I move Simon's requirement document under a more generic 'Table partitioning' page on the Wiki?
Thanks for your feedback,
manu
Jaime Casanova wrote:On Mon, Dec 15, 2008 at 1:46 PM, Gregory Stark <stark@enterprisedb.com> wrote:
Josh Berkus <josh@agliodbs.com> writes:
Hackers,If we're serious about having a "next generation" partitioning with a concept
We don't yet seem to have a clear specification for this feature, and the Other
Open Source DB has shown us how problematic it is to get auto-partitioning
wrong.
Should we defer auto-partitioning to 8.5?
of partition keys then it seems to me to make more sense to do that first and
then add on a feature like this.
+1
This is still very useful. I haven't looked at the actual patch, does it
require core changes or can it be stashed in a pgfoundry or contrib module?
what i don't like about this one is that it creates partitions at
create table time and to manually add all new partitions (inherit
tables and modify the trigger)... and what i want to see is an
automatic creation when it's needed...
--Emmanuel Cecchet
FTO @ Frog Thinker Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu@frogthinker.org
Skype: emmanuel_cecchet
### Eclipse Workspace Patch 1.0
#P Postgres-HEAD
Index: src/test/regress/regress.c
===================================================================
RCS file: /root/cvsrepo/pgsql/src/test/regress/regress.c,v
retrieving revision 1.71
diff -u -r1.71 regress.c
--- src/test/regress/regress.c 25 Mar 2008 22:42:46 -0000 1.71
+++ src/test/regress/regress.c 16 Dec 2008 01:42:50 -0000
@@ -10,6 +10,9 @@
#include "utils/geo_decls.h" /* includes <math.h> */
#include "executor/executor.h" /* For GetAttributeByName */
#include "commands/sequence.h" /* for nextval() */
+#include "catalog/namespace.h"
+#include "executor/executor.h"
+#include "executor/tuptable.h"
#define P_MAXDIG 12
#define LDELIM '('
@@ -732,3 +735,141 @@
*--walk = '\0';
PG_RETURN_CSTRING(result);
}
+
+
+/*
+ * Partition trigger test
+ *
+ * The trigger should be used this way:
+ * CREATE TRIGGER trigger_name
+ BEFORE INSERT ON master_table
+ FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger(child_table_name, column_number, min_val, max_val);
+ */
+
+extern Datum partition_insert_trigger(PG_FUNCTION_ARGS);
+
+static Datum
+check_constraints_and_insert_tuple(Relation child_table_relation, TriggerData *trigdata ,HeapTuple trigtuple)
+{ // Check the constraints
+ ResultRelInfo *resultRelInfo;
+ TupleTableSlot *slot;
+ EState *estate = CreateExecutorState();
+ Datum result;
+
+ result = PointerGetDatum(trigdata->tg_trigtuple);
+
+ resultRelInfo = makeNode(ResultRelInfo);
+ resultRelInfo->ri_RangeTableIndex = 1; /* dummy */
+ resultRelInfo->ri_RelationDesc = child_table_relation;
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ /* Set up a tuple slot too */
+ slot = MakeSingleTupleTableSlot(trigdata->tg_relation->rd_att);
+ ExecStoreTuple(trigtuple, slot, InvalidBuffer, false);
+
+ if (ExecRelCheck(resultRelInfo, slot, estate) == NULL)
+ { // Constraints satisfied, insert the row in the child table
+ bool use_wal = true;
+ bool use_fsm=true;
+
+ /* BEFORE ROW INSERT Triggers */
+ if (resultRelInfo->ri_TrigDesc &&
+ resultRelInfo->ri_TrigDesc->n_before_row[TRIGGER_EVENT_INSERT] > 0)
+ {
+ HeapTuple newtuple;
+ newtuple = ExecBRInsertTriggers(estate, resultRelInfo, trigtuple);
+
+ if (newtuple != trigtuple) /* modified by Trigger(s) */
+ {
+ heap_freetuple(trigtuple);
+ trigtuple = newtuple;
+ }
+ }
+
+ /* Perform the insert
+ * TODO: Check that we detect constraint violation if before row insert does something bad */
+ heap_insert(child_table_relation, trigtuple, GetCurrentCommandId(true), use_wal, use_fsm);
+
+ /* Update indices */
+ ExecOpenIndices(resultRelInfo);
+ if (resultRelInfo->ri_NumIndices > 0)
+ ExecInsertIndexTuples(slot, &(trigtuple->t_self), estate, false);
+
+ /* AFTER ROW INSERT Triggers */
+ ExecARInsertTriggers(estate, resultRelInfo, trigtuple);
+
+ result = PointerGetDatum(NULL);
+ }
+ // Free resources
+ FreeExecutorState(estate);
+ ExecDropSingleTupleTableSlot(slot);
+
+ return result;
+}
+
+
+PG_FUNCTION_INFO_V1(partition_insert_trigger);
+
+static Relation last_inserted_relation;
+
+Datum
+partition_insert_trigger(PG_FUNCTION_ARGS)
+{
+ TriggerData *trigdata = (TriggerData *) fcinfo->context;
+ HeapTuple trigtuple= trigdata->tg_trigtuple;
+ TupleConstr *constr;
+ char *child_table_name;
+ Relation child_table_relation;
+ Oid relation_id;
+ Datum result;
+
+ // Try to exploit locality for bulk inserts
+ // We expect consecutive insert to go to the same child table
+ if (last_inserted_relation != NULL)
+ { // Try the last table we used
+ result = check_constraints_and_insert_tuple(last_inserted_relation, trigdata, trigtuple);
+ if (result == PointerGetDatum(NULL))
+ return result;
+ // We got a miss
+ last_inserted_relation = NULL;
+ }
+
+ /* make sure it's called as a trigger at all */
+ if (!CALLED_AS_TRIGGER(fcinfo))
+ elog(ERROR, "partition_insert_trigger: not called by trigger manager");
+
+ /* Sanity checks */
+ if (!TRIGGER_FIRED_BY_INSERT(trigdata->tg_event) || !TRIGGER_FIRED_BEFORE(trigdata->tg_event))
+ elog(ERROR, "partition_insert_trigger: not called on insert before");
+
+ // Child table name is either given as the unique parameter or it is the name of the trigger
+ if (trigdata->tg_trigger->tgnargs == 1)
+ child_table_name = trigdata->tg_trigger->tgargs[0];
+ else
+ child_table_name = trigdata->tg_trigger->tgname;
+
+ // Lookup the child relation
+ relation_id = RelnameGetRelid(child_table_name);
+ if (relation_id == InvalidOid)
+ elog(ERROR, "partition_insert_trigger: Invalid child table %s", child_table_name);
+ child_table_relation = RelationIdGetRelation(relation_id);
+ if (child_table_relation == NULL)
+ elog(ERROR, "partition_insert_trigger: Failed to locate relation for child table %s", child_table_name);
+
+ constr = child_table_relation->rd_att->constr;
+ if (constr->num_check == 0)
+ elog(ERROR, "partition_insert_trigger: No constraint found for child table %s", child_table_name);
+
+ result = check_constraints_and_insert_tuple(child_table_relation, trigdata, trigtuple);
+
+ if (result == PointerGetDatum(NULL))
+ last_inserted_relation = child_table_relation;
+
+ RelationClose(child_table_relation);
+
+ return result;
+}
+
Index: src/backend/executor/execMain.c
===================================================================
RCS file: /root/cvsrepo/pgsql/src/backend/executor/execMain.c,v
retrieving revision 1.314
diff -u -r1.314 execMain.c
--- src/backend/executor/execMain.c 31 Oct 2008 21:07:54 -0000 1.314
+++ src/backend/executor/execMain.c 16 Dec 2008 01:42:50 -0000
@@ -1947,7 +1947,7 @@
/*
* ExecRelCheck --- check that tuple meets constraints for result relation
*/
-static const char *
+const char *
ExecRelCheck(ResultRelInfo *resultRelInfo,
TupleTableSlot *slot, EState *estate)
{
Index: src/test/regress/output/create_function_1.source
===================================================================
RCS file: /root/cvsrepo/pgsql/src/test/regress/output/create_function_1.source,v
retrieving revision 1.34
diff -u -r1.34 create_function_1.source
--- src/test/regress/output/create_function_1.source 31 Oct 2008 19:37:56 -0000 1.34
+++ src/test/regress/output/create_function_1.source 16 Dec 2008 01:42:50 -0000
@@ -47,6 +47,10 @@
RETURNS int4
AS '@libdir@/regress@DLSUFFIX@'
LANGUAGE C STRICT;
+CREATE FUNCTION partition_insert_trigger ()
+ RETURNS trigger
+ AS '@libdir@/regress@DLSUFFIX@'
+ LANGUAGE C STRICT;
-- Things that shouldn't work:
CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL
AS 'SELECT ''not an integer'';';
@@ -80,3 +84,70 @@
CREATE FUNCTION test1 (int) RETURNS int LANGUAGE internal
AS 'nosuch';
ERROR: there is no built-in function named "nosuch"
+-- Partitioning trigger test
+CREATE TABLE master (
+ id int not null,
+ date date not null,
+ value int
+);
+CREATE TABLE child_y2008m01 (
+ CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' )
+) INHERITS (master);
+CREATE TABLE child_y2008m02 (
+ CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' )
+) INHERITS (master);
+CREATE TABLE child_y2008m03 (
+ CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' )
+) INHERITS (master);
+CREATE TRIGGER insert_child_y2008m01
+ BEFORE INSERT ON master
+ FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger('child_y2008m01');
+CREATE TRIGGER child_y2008m02
+ BEFORE INSERT ON master
+ FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger();
+CREATE TRIGGER child_y2008m03
+ BEFORE INSERT ON master
+ FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger();
+INSERT INTO master VALUES (1, '2008-01-15', 1);
+INSERT INTO master VALUES (2, '2008-02-15', 2);
+INSERT INTO master VALUES (3, '2008-03-15', 3);
+INSERT INTO master VALUES (4, '2008-04-15', 4);
+COPY master FROM '/root/pg_partitions/copy_input.txt';
+select * from master;
+ id | date | value
+----+------------+-------
+ 4 | 04-15-2008 | 4
+ 1 | 01-15-2008 | 1
+ 11 | 01-10-2008 | 11
+ 2 | 02-15-2008 | 2
+ 12 | 02-15-2008 | 12
+ 3 | 03-15-2008 | 3
+ 13 | 03-15-2008 | 13
+(7 rows)
+
+ select * from child_y2008m01;
+ id | date | value
+----+------------+-------
+ 1 | 01-15-2008 | 1
+ 11 | 01-10-2008 | 11
+(2 rows)
+
+ select * from child_y2008m02;
+ id | date | value
+----+------------+-------
+ 2 | 02-15-2008 | 2
+ 12 | 02-15-2008 | 12
+(2 rows)
+
+ select * from child_y2008m03;
+ id | date | value
+----+------------+-------
+ 3 | 03-15-2008 | 3
+ 13 | 03-15-2008 | 13
+(2 rows)
+
+DROP TABLE master CASCADE;
+NOTICE: drop cascades to 3 other objects
+DETAIL: drop cascades to table child_y2008m01
+drop cascades to table child_y2008m02
+drop cascades to table child_y2008m03
Index: src/include/executor/executor.h
===================================================================
RCS file: /root/cvsrepo/pgsql/src/include/executor/executor.h,v
retrieving revision 1.152
diff -u -r1.152 executor.h
--- src/include/executor/executor.h 31 Oct 2008 21:07:55 -0000 1.152
+++ src/include/executor/executor.h 16 Dec 2008 01:42:50 -0000
@@ -155,6 +155,8 @@
extern bool ExecContextForcesOids(PlanState *planstate, bool *hasoids);
extern void ExecConstraints(ResultRelInfo *resultRelInfo,
TupleTableSlot *slot, EState *estate);
+extern const char *ExecRelCheck(ResultRelInfo *resultRelInfo,
+ TupleTableSlot *slot, EState *estate);
extern TupleTableSlot *EvalPlanQual(EState *estate, Index rti,
ItemPointer tid, TransactionId priorXmax);
extern PlanState *ExecGetActivePlanTree(QueryDesc *queryDesc);
Index: src/test/regress/input/create_function_1.source
===================================================================
RCS file: /root/cvsrepo/pgsql/src/test/regress/input/create_function_1.source,v
retrieving revision 1.19
diff -u -r1.19 create_function_1.source
--- src/test/regress/input/create_function_1.source 1 Oct 2008 22:38:57 -0000 1.19
+++ src/test/regress/input/create_function_1.source 16 Dec 2008 01:42:50 -0000
@@ -52,6 +52,12 @@
AS '@libdir@/regress@DLSUFFIX@'
LANGUAGE C STRICT;
+CREATE FUNCTION partition_insert_trigger ()
+ RETURNS trigger
+ AS '@libdir@/regress@DLSUFFIX@'
+ LANGUAGE C STRICT;
+
+
-- Things that shouldn't work:
CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL
@@ -77,3 +83,48 @@
CREATE FUNCTION test1 (int) RETURNS int LANGUAGE internal
AS 'nosuch';
+
+-- Partitioning trigger test
+
+CREATE TABLE master (
+ id int not null,
+ date date not null,
+ value int
+);
+
+CREATE TABLE child_y2008m01 (
+ CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' )
+) INHERITS (master);
+
+CREATE TABLE child_y2008m02 (
+ CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' )
+) INHERITS (master);
+
+CREATE TABLE child_y2008m03 (
+ CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' )
+) INHERITS (master);
+
+CREATE TRIGGER insert_child_y2008m01
+ BEFORE INSERT ON master
+ FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger('child_y2008m01');
+
+CREATE TRIGGER child_y2008m02
+ BEFORE INSERT ON master
+ FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger();
+
+CREATE TRIGGER child_y2008m03
+ BEFORE INSERT ON master
+ FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger();
+
+INSERT INTO master VALUES (1, '2008-01-15', 1);
+INSERT INTO master VALUES (2, '2008-02-15', 2);
+INSERT INTO master VALUES (3, '2008-03-15', 3);
+INSERT INTO master VALUES (4, '2008-04-15', 4);
+COPY master FROM '/root/pg_partitions/copy_input.txt';
+
+select * from master;
+select * from child_y2008m01;
+select * from child_y2008m02;
+select * from child_y2008m03;
+
+DROP TABLE master CASCADE;
--
http://www.enterprisedb.com
Emmanuel Cecchet <manu@frogthinker.org> wrote: > In the meantime, I have made some more tests with the trigger in C (see > attached patch). Hmm... The inserting partition is passed by trigger arguments. Users must replace triggers when the target is changed (ex. every month). Is it possible to expand all of child paritions from pg_inherits and determine a suitable parition by checking their constraints? We can also use it when there are multiple inserting paritions, something like hash or list paritioning. Fixed target is only applicable to time-based range paritioning. BTW, there is another issue in trigger approach. If INSERT commands are interrupted by triggers, server says "INSERT 0 row" though rows are inserted into child tables. Since using C, we could use some back doors to modify a variable counting affected rows. We could use partitioned tables more transparently if we have it. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Nikhil Sontakke wrote: > A similar DELETE trigger should be pretty easy to write up in C. I > think the main challenge is with UPDATE triggers especially if the new > row will fall into another child table - but we can always throw an > error for such a case initially. I agree. A first implementation could be restricted to updates within the same partition. > > Right now if the row cannot be inserted in any child table, it is > inserted in the parent. If you want to fail, we can add a 'fail > trigger' (after all other triggers) that generates an error if > previous triggers did not capture the row. If you want to create a > new partition, you can have another trigger to handle that. > > > One of the work items related to partitioning eventually is to avoid > having to APPEND the parent in all queries involving children. Maybe > having an overflow child table might help to catch failed triggers for > those cases? This is a good option too. Emmanuel > > Regards, > Nikhils > > > > So I think that this trigger approach is pretty flexible like > people used AOP in J2EE servers to process requests. It has also > the advantage of allowing fast prototyping. It should also be easy > to push that functionality down in the core as needed. > > Is it ok if I move Simon's requirement document under a more > generic 'Table partitioning' page on the Wiki? > > Thanks for your feedback, > manu > > > Jaime Casanova wrote: > > On Mon, Dec 15, 2008 at 1:46 PM, Gregory Stark > <stark@enterprisedb.com <mailto:stark@enterprisedb.com>> wrote: > > > Josh Berkus <josh@agliodbs.com <mailto:josh@agliodbs.com>> > writes: > > > > Hackers, > > We don't yet seem to have a clear specification for > this feature, and the Other > Open Source DB has shown us how problematic it is to > get auto-partitioning > wrong. > > Should we defer auto-partitioning to 8.5? > > > If we're serious about having a "next generation" > partitioning with a concept > of partition keys then it seems to me to make more sense > to do that first and > then add on a feature like this. > > > > > +1 > > > > This is still very useful. I haven't looked at the actual > patch, does it > require core changes or can it be stashed in a pgfoundry > or contrib module? > > > > > what i don't like about this one is that it creates partitions at > create table time and to manually add all new partitions (inherit > tables and modify the trigger)... and what i want to see is an > automatic creation when it's needed... >
ITAGAKI Takahiro wrote: > Emmanuel Cecchet <manu@frogthinker.org> wrote >> In the meantime, I have made some more tests with the trigger in C (see >> attached patch). >> > > Hmm... The inserting partition is passed by trigger arguments. > Actually this is just a fallback option. The preferred option is to name the trigger after the child table name. This way the trigger retrieve the table name directly from the trigger name and no argument has to be passed to the trigger. > Users must replace triggers when the target is changed (ex. every month). > I am not sure what you mean. There is one trigger per child table but the trigger is always the same, it is just the name that is given to it that changes. > Is it possible to expand all of child paritions from pg_inherits and > determine a suitable parition by checking their constraints? > Ideally it would be better to do this way. I have not found yet how to automatically get all the child partitions of a parent table from the trigger. This would simplify things by having a single trigger. > We can also use it when there are multiple inserting paritions, > something like hash or list paritioning. Fixed target is only applicable > to time-based range paritioning. > I think there is a misunderstanding on how the trigger works. You have 1 trigger per child table and they are all chained on the parent table. When a tuple is inserted on the parent table, the first trigger is fired, if the constraints of the 1st child table are satisfied, the tuple is moved in the 1st child table and that's it. If it is a miss, the tuple is passed to the next trigger that checks the constraints of the 2nd table. And so on. This will work with any type of partitioning (hash or even UDF) as long as the constraints on the child table reflect the partitioning. > BTW, there is another issue in trigger approach. If INSERT commands > are interrupted by triggers, server says "INSERT 0 row" though > rows are inserted into child tables. Since using C, we could > use some back doors to modify a variable counting affected rows. > We could use partitioned tables more transparently if we have it. > Even if you don't abort the query, the query reports 0 row if it has been moved to another table (you can COPY 100k lines and the server will return 0 if they were all successfully moved to child tables). Technically this is correct since 0 rows were inserted in the parent table. Right now any number >0 is the number of rows that did not satisfy any child table constraint and were inserted in the master table (useful if you don't want the copy command to fail). Regards, Emmanuel > Regards, > --- > ITAGAKI Takahiro > NTT Open Source Software Center > -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: manu@frogthinker.org Skype: emmanuel_cecchet
Emmanuel Cecchet <manu@frogthinker.org> wrote: > I think there is a misunderstanding on how the trigger works. You have 1 > trigger per child table and they are all chained on the parent table. Oops, I misunderstand your patch, sorry. > > Is it possible to expand all of child paritions from pg_inherits and > > determine a suitable parition by checking their constraints? > > > Ideally it would be better to do this way. I have not found yet how to > automatically get all the child partitions of a parent table from the > trigger. This would simplify things by having a single trigger. The chained triggers would have better flexibilty, and the auto expanding trigger would have better usability. I'm not sure about performance because expanding child partitions is not always faster than chained calls of triggers. I think chained triggers are hard to maintain. If we drop one of partition tables, we need to reconnect the single-linked-list of the triggers. > > server says "INSERT 0 row" though rows are inserted into child tables. > Technically this is correct since 0 rows were inserted in the parent > table. Yes, but users expect non-0 result normally. Some O/R mapping tools also checks the result exactly and raises errors (it could be turned off, but default is on). Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
ITAGAKI Takahiro wrote: > The chained triggers would have better flexibilty, and the auto expanding > trigger would have better usability. I'm not sure about performance > because expanding child partitions is not always faster than chained > calls of triggers. > > I think chained triggers are hard to maintain. If we drop one of partition > tables, we need to reconnect the single-linked-list of the triggers. > When you drop one child table, you would also have to drop the trigger that has the same name on the parent table. This does not seem too hard but I may be missing something. >>> server says "INSERT 0 row" though rows are inserted into child tables. >>> >> Technically this is correct since 0 rows were inserted in the parent >> table. >> > Yes, but users expect non-0 result normally. Some O/R mapping tools > also checks the result exactly and raises errors (it could be turned > off, but default is on). > If the O/R mapping tool is also creating the table it should be aware of the semantics specifics to partition. But your comment is well taken, this seems counterintuitive and against most API semantics to return 0 when the number of inserted rows is expected. This would certainly require some additional hooks to return the proper value. Best regards, Emmanuel > Regards, > --- > ITAGAKI Takahiro > NTT Open Source Software Center > -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: manu@frogthinker.org Skype: emmanuel_cecchet
On 2008-12-16, at 07:58, ITAGAKI Takahiro wrote: > > >>> server says "INSERT 0 row" though rows are inserted into child >>> tables. >> Technically this is correct since 0 rows were inserted in the parent >> table. > > Yes, but users expect non-0 result normally. Some O/R mapping tools > also checks the result exactly and raises errors (it could be turned > off, but default is on). this is a general problem with triggers on inserts/updates/deletes. To be honest, I would love to see someone fixing it in 8.4, cos it is quite annoying - that developer is unable to figure out number of rows affected - just because there's trigger on that table.
<snip> Is there any progress on this patch? I was asked about this feature last month, during a PostgreSQL talk. I am willing to spend time for testing this patch, if needed. -- Devrim GÜNDÜZ, RHCE devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.gunduz.org