Thread: Feature: triggers on materialized views
Hi! Based on discussion about observing changes on an open query in a reactive manner (to support reactive web applications) [1], I identified that one critical feature is missing to fully implement discussed design of having reactive queries be represented as materialized views, and changes to these materialized views would then be observed and pushed to the client through LISTEN/NOTIFY. This is my first time contributing to PostgreSQL, so I hope I am starting this process well. I would like to propose that support for AFTER triggers are added to materialized views. I experimented a bit and it seems this is mostly just a question of enabling/exposing them. See attached patch. This enabled me to add trigger to a material view which mostly worked. Here are my findings. Running REFRESH MATERIALIZED VIEW CONCURRENTLY calls triggers. Both per statement and per row. There are few improvements which could be done: - Currently only insert and remove operations are done on the materialized view. This is because the current logic just removes changed rows and inserts new rows. - In current concurrently refresh logic those insert and remove operations are made even if there are no changes to be done. Which triggers a statement trigger unnecessary. A small improvement could be to skip the statement in that case, but looking at the code this seems maybe tricky because both each of inserts and deletions are done inside one query each. - Current concurrently refresh logic does never do updates on existing rows. It would be nicer to have that so that triggers are more aligned with real changes to the data. So current two queries could be changed to three, each doing one of the insert, update, and delete. Non-concurrent refresh does not trigger any trigger. But it seems all data to do so is there (previous table, new table), at least for the statement-level trigger. Row-level triggers could also be simulated probably (with TRUNCATE and INSERT triggers). [1] https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Attachment
On Mon, Dec 24, 2018 at 12:59:43PM -0800, Mitar wrote: > Hi! > > Based on discussion about observing changes on an open query in a > reactive manner (to support reactive web applications) [1], I > identified that one critical feature is missing to fully implement > discussed design of having reactive queries be represented as > materialized views, and changes to these materialized views would then > be observed and pushed to the client through LISTEN/NOTIFY. > > This is my first time contributing to PostgreSQL, so I hope I am > starting this process well. You've got the right mailing list, a description of what you want, and a PoC patch. You also got the patch in during the time between Commitfests. You're doing great! > I would like to propose that support for AFTER triggers are added to > materialized views. I experimented a bit and it seems this is mostly > just a question of enabling/exposing them. See attached patch. About that. When there's a change (or possible change) in user-visible behavior, it should come with regression tests, which it would make sense to add to src/tests/regress/matview.sql along with the corresponding changes to src/tests/regress/expected/matview.out > This enabled me to add trigger to a material view which mostly > worked. Here are my findings. > > Running REFRESH MATERIALIZED VIEW CONCURRENTLY calls triggers. Both > per statement and per row. You'd want at least one test for each of those new features. > There are few improvements which could be > done: > > - Currently only insert and remove operations are done on the > materialized view. This is because the current logic just removes > changed rows and inserts new rows. What other operations might you want to support? > - In current concurrently refresh logic those insert and remove > operations are made even if there are no changes to be done. Which > triggers a statement trigger unnecessary. A small improvement could be > to skip the statement in that case, but looking at the code this seems > maybe tricky because both each of inserts and deletions are done > inside one query each. As far as you can tell, is this just an efficiency optimization, or might it go to correctness of the behavior? > - Current concurrently refresh logic does never do updates on existing > rows. It would be nicer to have that so that triggers are more aligned > with real changes to the data. So current two queries could be changed > to three, each doing one of the insert, update, and delete. I'm not sure I understand the problem being described here. Do you see these as useful to separate for some reason? > Non-concurrent refresh does not trigger any trigger. But it seems > all data to do so is there (previous table, new table), at least for > the statement-level trigger. Row-level triggers could also be > simulated probably (with TRUNCATE and INSERT triggers). Would it make more sense to fill in the missing implementations of NEW and OLD for per-row triggers instead of adding another hack? Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Hi! Thanks for reply! On Mon, Dec 24, 2018 at 2:20 PM David Fetter <david@fetter.org> wrote: > You've got the right mailing list, a description of what you want, and > a PoC patch. You also got the patch in during the time between > Commitfests. You're doing great! Great! One thing I am unclear about is how it is determined if this is a viable feature to be eventually included. You gave me some suggestions to improve in my patch (adding tests and so on). Does this mean that the patch should be fully done before a decision is made? Also, the workflow is that I improve things, and resubmit a patch to the mailing list, for now? > > - Currently only insert and remove operations are done on the > > materialized view. This is because the current logic just removes > > changed rows and inserts new rows. > > What other operations might you want to support? Update. So if a row is changing, instead of doing a remove and insert, what currently is being done, I would prefer an update. Then UPDATE trigger operation would happen as well. Maybe the INSERT query could be changed to INSERT ... ON CONFLICT UPDATE query (not sure if this one does UPDATE trigger operation on conflict), and REMOVE changed to remove just rows which were really removed, but not only updated. > As far as you can tell, is this just an efficiency optimization, or > might it go to correctness of the behavior? It is just an optimization. Or maybe even just a surprise. Maybe a documentation addition could help here. In my use case I would loop over OLD and NEW REFERENCING TABLE so if they are empty, nothing would be done. But it is just surprising that DELETE trigger is called even when no rows are being deleted in the materialized view. > I'm not sure I understand the problem being described here. Do you see > these as useful to separate for some reason? So rows which are just updated currently get first DELETE trigger called and then INSERT. The issue is that if I am observing this behavior from outside, it makes it unclear when I see DELETE if this means really that a row has been deleted or it just means that later on an INSERT would happen. Now I have to wait for an eventual INSERT to determine that. But how long should I wait? It makes consuming these notifications tricky. If I just blindly respond to those notifications, this could introduce other problems. For example, if I have a reactive web application it could mean a visible flicker to the user. Instead of updating rendered row, I would first delete it and then later on re-insert it. > > Non-concurrent refresh does not trigger any trigger. But it seems > > all data to do so is there (previous table, new table), at least for > > the statement-level trigger. Row-level triggers could also be > > simulated probably (with TRUNCATE and INSERT triggers). > > Would it make more sense to fill in the missing implementations of NEW > and OLD for per-row triggers instead of adding another hack? You lost me here. But I agree, we should implement this fully, without hacks. I just do not know how exactly. Are you saying that we should support only row-level triggers, or that we should support both statement-level and row-level triggers, but just make sure we implement this properly? I think that my suggestion of using TRUNCATE and INSERT triggers is reasonable in the case of full refresh. This is what happens. If we would want to have DELETE/UPDATE/INSERT triggers, we would have to compute the diff like concurrent version has to do, which would defeat the difference between the two. But yes, all INSERT trigger calls should have NEW provided. So per-statement trigger would have TRUNCATE and INSERT called. And per-row trigger would have TRUNCATE and per-row INSERTs called. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Hi! I made another version of the patch. This one does UPDATEs for changed row instead of DELETE/INSERT. All existing regression tests are still passing (make check). Mitar On Mon, Dec 24, 2018 at 4:13 PM Mitar <mmitar@gmail.com> wrote: > > Hi! > > Thanks for reply! > > On Mon, Dec 24, 2018 at 2:20 PM David Fetter <david@fetter.org> wrote: > > You've got the right mailing list, a description of what you want, and > > a PoC patch. You also got the patch in during the time between > > Commitfests. You're doing great! > > Great! > > One thing I am unclear about is how it is determined if this is a > viable feature to be eventually included. You gave me some suggestions > to improve in my patch (adding tests and so on). Does this mean that > the patch should be fully done before a decision is made? > > Also, the workflow is that I improve things, and resubmit a patch to > the mailing list, for now? > > > > - Currently only insert and remove operations are done on the > > > materialized view. This is because the current logic just removes > > > changed rows and inserts new rows. > > > > What other operations might you want to support? > > Update. So if a row is changing, instead of doing a remove and insert, > what currently is being done, I would prefer an update. Then UPDATE > trigger operation would happen as well. Maybe the INSERT query could > be changed to INSERT ... ON CONFLICT UPDATE query (not sure if this > one does UPDATE trigger operation on conflict), and REMOVE changed to > remove just rows which were really removed, but not only updated. > > > As far as you can tell, is this just an efficiency optimization, or > > might it go to correctness of the behavior? > > It is just an optimization. Or maybe even just a surprise. Maybe a > documentation addition could help here. In my use case I would loop > over OLD and NEW REFERENCING TABLE so if they are empty, nothing would > be done. But it is just surprising that DELETE trigger is called even > when no rows are being deleted in the materialized view. > > > I'm not sure I understand the problem being described here. Do you see > > these as useful to separate for some reason? > > So rows which are just updated currently get first DELETE trigger > called and then INSERT. The issue is that if I am observing this > behavior from outside, it makes it unclear when I see DELETE if this > means really that a row has been deleted or it just means that later > on an INSERT would happen. Now I have to wait for an eventual INSERT > to determine that. But how long should I wait? It makes consuming > these notifications tricky. > > If I just blindly respond to those notifications, this could introduce > other problems. For example, if I have a reactive web application it > could mean a visible flicker to the user. Instead of updating rendered > row, I would first delete it and then later on re-insert it. > > > > Non-concurrent refresh does not trigger any trigger. But it seems > > > all data to do so is there (previous table, new table), at least for > > > the statement-level trigger. Row-level triggers could also be > > > simulated probably (with TRUNCATE and INSERT triggers). > > > > Would it make more sense to fill in the missing implementations of NEW > > and OLD for per-row triggers instead of adding another hack? > > You lost me here. But I agree, we should implement this fully, without > hacks. I just do not know how exactly. > > Are you saying that we should support only row-level triggers, or that > we should support both statement-level and row-level triggers, but > just make sure we implement this properly? I think that my suggestion > of using TRUNCATE and INSERT triggers is reasonable in the case of > full refresh. This is what happens. If we would want to have > DELETE/UPDATE/INSERT triggers, we would have to compute the diff like > concurrent version has to do, which would defeat the difference > between the two. But yes, all INSERT trigger calls should have NEW > provided. > > So per-statement trigger would have TRUNCATE and INSERT called. And > per-row trigger would have TRUNCATE and per-row INSERTs called. > > > Mitar > > -- > http://mitar.tnode.com/ > https://twitter.com/mitar_m -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Attachment
Hi! So I think this makes it work great for REFRESH MATERIALIZED VIEW CONCURRENTLY. I think we can leave empty statement triggers as they are. I have not found a nice way to not do them. For adding triggers to REFRESH MATERIALIZED VIEW I would need some help and pointers. I am not sure how to write calling triggers there. Any reference to an existing code which does something similar would be great. So I think after swapping heaps we should call TRUNCATE trigger and then INSERT for all new rows. Mitar On Mon, Dec 24, 2018 at 6:17 PM Mitar <mmitar@gmail.com> wrote: > > Hi! > > I made another version of the patch. This one does UPDATEs for changed > row instead of DELETE/INSERT. > > All existing regression tests are still passing (make check). > > > Mitar > > On Mon, Dec 24, 2018 at 4:13 PM Mitar <mmitar@gmail.com> wrote: > > > > Hi! > > > > Thanks for reply! > > > > On Mon, Dec 24, 2018 at 2:20 PM David Fetter <david@fetter.org> wrote: > > > You've got the right mailing list, a description of what you want, and > > > a PoC patch. You also got the patch in during the time between > > > Commitfests. You're doing great! > > > > Great! > > > > One thing I am unclear about is how it is determined if this is a > > viable feature to be eventually included. You gave me some suggestions > > to improve in my patch (adding tests and so on). Does this mean that > > the patch should be fully done before a decision is made? > > > > Also, the workflow is that I improve things, and resubmit a patch to > > the mailing list, for now? > > > > > > - Currently only insert and remove operations are done on the > > > > materialized view. This is because the current logic just removes > > > > changed rows and inserts new rows. > > > > > > What other operations might you want to support? > > > > Update. So if a row is changing, instead of doing a remove and insert, > > what currently is being done, I would prefer an update. Then UPDATE > > trigger operation would happen as well. Maybe the INSERT query could > > be changed to INSERT ... ON CONFLICT UPDATE query (not sure if this > > one does UPDATE trigger operation on conflict), and REMOVE changed to > > remove just rows which were really removed, but not only updated. > > > > > As far as you can tell, is this just an efficiency optimization, or > > > might it go to correctness of the behavior? > > > > It is just an optimization. Or maybe even just a surprise. Maybe a > > documentation addition could help here. In my use case I would loop > > over OLD and NEW REFERENCING TABLE so if they are empty, nothing would > > be done. But it is just surprising that DELETE trigger is called even > > when no rows are being deleted in the materialized view. > > > > > I'm not sure I understand the problem being described here. Do you see > > > these as useful to separate for some reason? > > > > So rows which are just updated currently get first DELETE trigger > > called and then INSERT. The issue is that if I am observing this > > behavior from outside, it makes it unclear when I see DELETE if this > > means really that a row has been deleted or it just means that later > > on an INSERT would happen. Now I have to wait for an eventual INSERT > > to determine that. But how long should I wait? It makes consuming > > these notifications tricky. > > > > If I just blindly respond to those notifications, this could introduce > > other problems. For example, if I have a reactive web application it > > could mean a visible flicker to the user. Instead of updating rendered > > row, I would first delete it and then later on re-insert it. > > > > > > Non-concurrent refresh does not trigger any trigger. But it seems > > > > all data to do so is there (previous table, new table), at least for > > > > the statement-level trigger. Row-level triggers could also be > > > > simulated probably (with TRUNCATE and INSERT triggers). > > > > > > Would it make more sense to fill in the missing implementations of NEW > > > and OLD for per-row triggers instead of adding another hack? > > > > You lost me here. But I agree, we should implement this fully, without > > hacks. I just do not know how exactly. > > > > Are you saying that we should support only row-level triggers, or that > > we should support both statement-level and row-level triggers, but > > just make sure we implement this properly? I think that my suggestion > > of using TRUNCATE and INSERT triggers is reasonable in the case of > > full refresh. This is what happens. If we would want to have > > DELETE/UPDATE/INSERT triggers, we would have to compute the diff like > > concurrent version has to do, which would defeat the difference > > between the two. But yes, all INSERT trigger calls should have NEW > > provided. > > > > So per-statement trigger would have TRUNCATE and INSERT called. And > > per-row trigger would have TRUNCATE and per-row INSERTs called. > > > > > > Mitar > > > > -- > > http://mitar.tnode.com/ > > https://twitter.com/mitar_m > > > > -- > http://mitar.tnode.com/ > https://twitter.com/mitar_m -- http://mitar.tnode.com/ https://twitter.com/mitar_m
On Mon, Dec 24, 2018 at 04:13:44PM -0800, Mitar wrote: > Hi! > > Thanks for reply! > > On Mon, Dec 24, 2018 at 2:20 PM David Fetter <david@fetter.org> wrote: > > You've got the right mailing list, a description of what you want, and > > a PoC patch. You also got the patch in during the time between > > Commitfests. You're doing great! > > Great! > > One thing I am unclear about is how it is determined if this is a > viable feature to be eventually included. You gave me some suggestions > to improve in my patch (adding tests and so on). Does this mean that > the patch should be fully done before a decision is made? > > Also, the workflow is that I improve things, and resubmit a patch to > the mailing list, for now? > > > > - Currently only insert and remove operations are done on the > > > materialized view. This is because the current logic just removes > > > changed rows and inserts new rows. > > > > What other operations might you want to support? > > Update. So if a row is changing, instead of doing a remove and insert, > what currently is being done, I would prefer an update. Then UPDATE > trigger operation would happen as well. Maybe the INSERT query could > be changed to INSERT ... ON CONFLICT UPDATE query (not sure if this > one does UPDATE trigger operation on conflict), and REMOVE changed to > remove just rows which were really removed, but not only updated. There might be a reason it's the way it is. Looking at the commits that introduced this might shed some light. > > I'm not sure I understand the problem being described here. Do you see > > these as useful to separate for some reason? > > So rows which are just updated currently get first DELETE trigger > called and then INSERT. The issue is that if I am observing this > behavior from outside, it makes it unclear when I see DELETE if this > means really that a row has been deleted or it just means that later > on an INSERT would happen. Now I have to wait for an eventual INSERT > to determine that. But how long should I wait? It makes consuming > these notifications tricky. If it helps you think about it better, all NOTIFICATIONs are sent on COMMIT, i.e. you don't need to worry as much about what things should or shouldn't have arrived. The down side, such as it is, is that they don't convey premature knowledge about a state that may never arrive. > If I just blindly respond to those notifications, this could introduce > other problems. For example, if I have a reactive web application it > could mean a visible flicker to the user. Instead of updating rendered > row, I would first delete it and then later on re-insert it. This is at what I hope is a level quite distinct from database operations. Separation of concerns via the model-view-controller (or similar) architecture and all that. > > > Non-concurrent refresh does not trigger any trigger. But it seems > > > all data to do so is there (previous table, new table), at least for > > > the statement-level trigger. Row-level triggers could also be > > > simulated probably (with TRUNCATE and INSERT triggers). > > > > Would it make more sense to fill in the missing implementations of NEW > > and OLD for per-row triggers instead of adding another hack? > > You lost me here. But I agree, we should implement this fully, without > hacks. I just do not know how exactly. Sorry I was unclear. The SQL standard defines both transition tables, which we have, for per-statement triggers, and transition variables, which we don't, for per-row triggers. Here's the relevant part of the syntax: <trigger definition> ::= CREATE TRIGGER <trigger name> <trigger action time> <trigger event> ON <table name> [ REFERENCING <transition table or variable list> ] <triggered action> <transition table or variable list> ::= <transition table or variable>... <transition table or OLD [ ROW ] [ AS | NEW [ ROW ] [ AS | OLD TABLE [ AS ] | NEW TABLE [ AS ] variable> ::= ] <old transition variable name> ] <new transition variable name> <old transition table name> <new transition table name> > Are you saying that we should support only row-level triggers, or that > we should support both statement-level and row-level triggers, but > just make sure we implement this properly? The latter, although we might need to defer the row-level triggers until we support transition variables. Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Hi! On Tue, Dec 25, 2018 at 10:03 AM David Fetter <david@fetter.org> wrote: > If it helps you think about it better, all NOTIFICATIONs are sent on > COMMIT, i.e. you don't need to worry as much about what things should > or shouldn't have arrived. The down side, such as it is, is that they > don't convey premature knowledge about a state that may never arrive. This fact does not really help me. My client code listening to NOTIFICATIONs does not know when some other client made COMMIT. There is no NOTIFICATION saying "this is the end of the commit for which I just sent you notifications". > This is at what I hope is a level quite distinct from database > operations. Separation of concerns via the model-view-controller (or > similar) architecture and all that. Of course, but garbage in, garbage out. If notifications are superfluous then another abstraction layer has to fix them. I would prefer if this would not have to be the case. But it seems it is relatively easy to fix this logic and have both INSERTs, DELETEs and UPDATEs. The patch I updated and attached previously does that. > Sorry I was unclear. The SQL standard defines both transition tables, > which we have, for per-statement triggers, and transition variables, > which we don't, for per-row triggers. I thought that PostgreSQL has transition variables per-row triggers, only that it is not possible to (re)name them (are depend on the trigger function language)? But there are OLD and NEW variables available in per-row triggers, or equivalent? > The latter, although we might need to defer the row-level triggers > until we support transition variables. Not sure how transition variables are implemented currently for regular tables, but we could probably do the same? Anyway, I do not know how to proceed here to implement or statement-level or row-level triggers here. It could be just a matter a calling some function to call them, but I am not familiar with the codebase enough to know what. So any pointers to existing code which does something similar would be great. So, what to call once material views' heaps are swapped to call triggers? Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
On 2018-Dec-24, Mitar wrote: > I made another version of the patch. This one does UPDATEs for changed > row instead of DELETE/INSERT. > > All existing regression tests are still passing (make check). Okay, but you don't add any for your new feature, which is not good. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi! On Tue, Dec 25, 2018 at 6:47 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > I made another version of the patch. This one does UPDATEs for changed > > row instead of DELETE/INSERT. > > > > All existing regression tests are still passing (make check). > > Okay, but you don't add any for your new feature, which is not good. Yes, I have not yet done that. I want first to also add calling triggers for non-concurrent refresh, but I would need a bit help there (what to call, example of maybe code which does something similar already). Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
On 2018-Dec-25, Mitar wrote: > On Tue, Dec 25, 2018 at 6:47 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > > I made another version of the patch. This one does UPDATEs for changed > > > row instead of DELETE/INSERT. > > > > > > All existing regression tests are still passing (make check). > > > > Okay, but you don't add any for your new feature, which is not good. > > Yes, I have not yet done that. I want first to also add calling > triggers for non-concurrent refresh, but I would need a bit help there > (what to call, example of maybe code which does something similar > already). Well, REFRESH CONCURRENTLY runs completely different than non-concurrent REFRESH. The former updates the existing data by observing the differences with the previous data; the latter simply re-runs the query and overwrites everything. So if you simply enabled triggers on non-concurrent refresh, you'd just see a bunch of inserts into a throwaway data area (a transient relfilenode, we call it), then a swap of the MV's relfilenode with the throwaway one. I doubt it'd be useful. But then I'm not clear *why* you would like to do a non-concurrent refresh. Maybe your situation would be best served by forbidding non- concurrent refresh when the MV contains any triggers. Alternatively, maybe reimplement non-concurrent refresh so that it works identically to concurrent refresh (except with a stronger lock). Not sure if this implies any performance penalties. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi! On Tue, Dec 25, 2018 at 7:05 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > But then I'm not clear *why* you would like to do a non-concurrent > refresh. I mostly wanted to support if for two reasons: - completeness: maybe we cannot imagine the use case yet, but somebody might in the future - getting trigger calls for initial inserts: you can then create materialized view without data, attach triggers, and then run a regular refresh; this allows you to have only one code path to process any (including initial) changes to the view through notifications, instead of handling initial data differently > Maybe your situation would be best served by forbidding non- > concurrent refresh when the MV contains any triggers. If this would be acceptable by the community, I could do it. I worry though that one could probably get themselves into a situation where materialized view losses all data through some WITH NO DATA operation and concurrent refresh is not possible. Currently concurrent refresh works only with data. We could make concurrent refresh also work when materialized view has no data easily (it would just insert data and not compute diff). > Alternatively, maybe reimplement non-concurrent refresh so that it works > identically to concurrent refresh (except with a stronger lock). Not > sure if this implies any performance penalties. Ah, yes. I could just do TRUNCATE and INSERT, instead of heap swap. That would then generate reasonable trigger calls. Are there any existing benchmarks for such operations I could use to see if there are any performance changes if I change implementation here? Any guidelines how to evaluate this? Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Hi! I did a bit of benchmarking. It seems my version with UPDATE takes even slightly less time (~5%). Mitar On Mon, Dec 24, 2018 at 6:17 PM Mitar <mmitar@gmail.com> wrote: > > Hi! > > I made another version of the patch. This one does UPDATEs for changed > row instead of DELETE/INSERT. > > All existing regression tests are still passing (make check). > > > Mitar > > On Mon, Dec 24, 2018 at 4:13 PM Mitar <mmitar@gmail.com> wrote: > > > > Hi! > > > > Thanks for reply! > > > > On Mon, Dec 24, 2018 at 2:20 PM David Fetter <david@fetter.org> wrote: > > > You've got the right mailing list, a description of what you want, and > > > a PoC patch. You also got the patch in during the time between > > > Commitfests. You're doing great! > > > > Great! > > > > One thing I am unclear about is how it is determined if this is a > > viable feature to be eventually included. You gave me some suggestions > > to improve in my patch (adding tests and so on). Does this mean that > > the patch should be fully done before a decision is made? > > > > Also, the workflow is that I improve things, and resubmit a patch to > > the mailing list, for now? > > > > > > - Currently only insert and remove operations are done on the > > > > materialized view. This is because the current logic just removes > > > > changed rows and inserts new rows. > > > > > > What other operations might you want to support? > > > > Update. So if a row is changing, instead of doing a remove and insert, > > what currently is being done, I would prefer an update. Then UPDATE > > trigger operation would happen as well. Maybe the INSERT query could > > be changed to INSERT ... ON CONFLICT UPDATE query (not sure if this > > one does UPDATE trigger operation on conflict), and REMOVE changed to > > remove just rows which were really removed, but not only updated. > > > > > As far as you can tell, is this just an efficiency optimization, or > > > might it go to correctness of the behavior? > > > > It is just an optimization. Or maybe even just a surprise. Maybe a > > documentation addition could help here. In my use case I would loop > > over OLD and NEW REFERENCING TABLE so if they are empty, nothing would > > be done. But it is just surprising that DELETE trigger is called even > > when no rows are being deleted in the materialized view. > > > > > I'm not sure I understand the problem being described here. Do you see > > > these as useful to separate for some reason? > > > > So rows which are just updated currently get first DELETE trigger > > called and then INSERT. The issue is that if I am observing this > > behavior from outside, it makes it unclear when I see DELETE if this > > means really that a row has been deleted or it just means that later > > on an INSERT would happen. Now I have to wait for an eventual INSERT > > to determine that. But how long should I wait? It makes consuming > > these notifications tricky. > > > > If I just blindly respond to those notifications, this could introduce > > other problems. For example, if I have a reactive web application it > > could mean a visible flicker to the user. Instead of updating rendered > > row, I would first delete it and then later on re-insert it. > > > > > > Non-concurrent refresh does not trigger any trigger. But it seems > > > > all data to do so is there (previous table, new table), at least for > > > > the statement-level trigger. Row-level triggers could also be > > > > simulated probably (with TRUNCATE and INSERT triggers). > > > > > > Would it make more sense to fill in the missing implementations of NEW > > > and OLD for per-row triggers instead of adding another hack? > > > > You lost me here. But I agree, we should implement this fully, without > > hacks. I just do not know how exactly. > > > > Are you saying that we should support only row-level triggers, or that > > we should support both statement-level and row-level triggers, but > > just make sure we implement this properly? I think that my suggestion > > of using TRUNCATE and INSERT triggers is reasonable in the case of > > full refresh. This is what happens. If we would want to have > > DELETE/UPDATE/INSERT triggers, we would have to compute the diff like > > concurrent version has to do, which would defeat the difference > > between the two. But yes, all INSERT trigger calls should have NEW > > provided. > > > > So per-statement trigger would have TRUNCATE and INSERT called. And > > per-row trigger would have TRUNCATE and per-row INSERTs called. > > > > > > Mitar > > > > -- > > http://mitar.tnode.com/ > > https://twitter.com/mitar_m > > > > -- > http://mitar.tnode.com/ > https://twitter.com/mitar_m -- http://mitar.tnode.com/ https://twitter.com/mitar_m
On 2018-Dec-25, Mitar wrote: > On Tue, Dec 25, 2018 at 7:05 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > But then I'm not clear *why* you would like to do a non-concurrent > > refresh. > > I mostly wanted to support if for two reasons: > > - completeness: maybe we cannot imagine the use case yet, but somebody > might in the future Understood. We don't like features that fail to work in conjunction with other features, so this is a good goal to keep in mind. > - getting trigger calls for initial inserts: you can then create > materialized view without data, attach triggers, and then run a > regular refresh; this allows you to have only one code path to process > any (including initial) changes to the view through notifications, > instead of handling initial data differently Sounds like you could do this by fixing concurrent refresh to also work when the MV is WITH NO DATA. > > Maybe your situation would be best served by forbidding non- > > concurrent refresh when the MV contains any triggers. > > If this would be acceptable by the community, I could do it. I think your chances are 50%/50% that this would appear acceptable. > > Alternatively, maybe reimplement non-concurrent refresh so that it works > > identically to concurrent refresh (except with a stronger lock). Not > > sure if this implies any performance penalties. > > Ah, yes. I could just do TRUNCATE and INSERT, instead of heap swap. > That would then generate reasonable trigger calls. Right. > Are there any existing benchmarks for such operations I could use to > see if there are any performance changes if I change implementation > here? Any guidelines how to evaluate this? Not that I know of. Typically the developer of a feature comes up with appropriate performance tests also, targetting average and worst cases. If the performance worsens with the different implementation, one idea is to keep both and only use the slow one when triggers are present. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi! On Wed, Dec 26, 2018 at 4:38 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Sounds like you could do this by fixing concurrent refresh to also work > when the MV is WITH NO DATA. Yes, I do not think this would be too hard to fix. I could do this nevertheless. > > Ah, yes. I could just do TRUNCATE and INSERT, instead of heap swap. > > That would then generate reasonable trigger calls. > > Right. I have tested this yesterday and performance is 2x worse that heap swap on the benchmark I made. So I do not think this is a viable approach. I am now looking into simply triggering TRUNCATE and INSERT triggers after heap swap simulating the above. I made AFTER STATEMENT triggers and it looks like it is working, only NEW table is not populated for some reason. Any suggestions? See attached patch. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Attachment
Hi! I have made an updated version of the patch, added tests and documentation changes. This is my view now a complete patch. Please provide any feedback or comments you might have for me to improve the patch. I will also add it to commitfest. A summary of the patch: This patch enables adding AFTER triggers (both ROW and STATEMENT) on materialized views. They are fired when doing REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed. Triggers are not fired if you call REFRESH without CONCURRENTLY. This is based on some discussion on the mailing list because implementing it for without CONCURRENTLY would require us to add logic for firing triggers where there was none before (and is just an efficient heap swap). To be able to create a materialized view without data, specify triggers, and REFRESH CONCURRENTLY so that those triggers would be called also for initial data, I have tested and determined that there is no reason why REFRESH CONCURRENTLY could not be run on uninitialized materialized view. So I removed that check and things seem to just work. Including triggers being called for initial data. I think this makes REFRESH CONCURRENTLY have one less special case which is in general nicer. I have run tests and all old tests still succeed. I have added more tests for the new feature. I have run benchmark to evaluate the impact of me changing refresh_by_match_merge to do UPDATE instead of DELETE and INSERT for rows which were just updated. In fact it seems this improves performance slightly (4% in my benchmark, mean over 10 runs). I guess that this is because it is cheaper to just change one column's values (what benchmark is doing when changing rows) instead of removing and inserting the whole row. Because REFRESH MATERIALIZED VIEW CONCURRENTLY is meant to be used when not a lot of data has been changed anyway, I find this a pleasantly surprising additional improvement in this patch. I am attaching the benchmark script I have used. I compared the time of the final refresh query in the script. (I would love if pgbench could take a custom init script to run before the timed part of the script.) Mitar On Mon, Dec 24, 2018 at 12:59 PM Mitar <mmitar@gmail.com> wrote: > > Hi! > > Based on discussion about observing changes on an open query in a > reactive manner (to support reactive web applications) [1], I > identified that one critical feature is missing to fully implement > discussed design of having reactive queries be represented as > materialized views, and changes to these materialized views would then > be observed and pushed to the client through LISTEN/NOTIFY. > > This is my first time contributing to PostgreSQL, so I hope I am > starting this process well. > > I would like to propose that support for AFTER triggers are added to > materialized views. I experimented a bit and it seems this is mostly > just a question of enabling/exposing them. See attached patch. This > enabled me to add trigger to a material view which mostly worked. Here > are my findings. > > Running REFRESH MATERIALIZED VIEW CONCURRENTLY calls triggers. Both > per statement and per row. There are few improvements which could be > done: > > - Currently only insert and remove operations are done on the > materialized view. This is because the current logic just removes > changed rows and inserts new rows. > - In current concurrently refresh logic those insert and remove > operations are made even if there are no changes to be done. Which > triggers a statement trigger unnecessary. A small improvement could be > to skip the statement in that case, but looking at the code this seems > maybe tricky because both each of inserts and deletions are done > inside one query each. > - Current concurrently refresh logic does never do updates on existing > rows. It would be nicer to have that so that triggers are more aligned > with real changes to the data. So current two queries could be changed > to three, each doing one of the insert, update, and delete. > > Non-concurrent refresh does not trigger any trigger. But it seems all > data to do so is there (previous table, new table), at least for the > statement-level trigger. Row-level triggers could also be simulated > probably (with TRUNCATE and INSERT triggers). > > [1] https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com > > > Mitar > > -- > http://mitar.tnode.com/ > https://twitter.com/mitar_m -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Attachment
Hi! One more version of the patch with slightly more deterministic tests. Mitar On Thu, Dec 27, 2018 at 11:43 PM Mitar <mmitar@gmail.com> wrote: > > Hi! > > I have made an updated version of the patch, added tests and > documentation changes. This is my view now a complete patch. Please > provide any feedback or comments you might have for me to improve the > patch. I will also add it to commitfest. > > A summary of the patch: This patch enables adding AFTER triggers (both > ROW and STATEMENT) on materialized views. They are fired when doing > REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed. > Triggers are not fired if you call REFRESH without CONCURRENTLY. This > is based on some discussion on the mailing list because implementing > it for without CONCURRENTLY would require us to add logic for firing > triggers where there was none before (and is just an efficient heap > swap). > > To be able to create a materialized view without data, specify > triggers, and REFRESH CONCURRENTLY so that those triggers would be > called also for initial data, I have tested and determined that there > is no reason why REFRESH CONCURRENTLY could not be run on > uninitialized materialized view. So I removed that check and things > seem to just work. Including triggers being called for initial data. I > think this makes REFRESH CONCURRENTLY have one less special case which > is in general nicer. > > I have run tests and all old tests still succeed. I have added more > tests for the new feature. > > I have run benchmark to evaluate the impact of me changing > refresh_by_match_merge to do UPDATE instead of DELETE and INSERT for > rows which were just updated. In fact it seems this improves > performance slightly (4% in my benchmark, mean over 10 runs). I guess > that this is because it is cheaper to just change one column's values > (what benchmark is doing when changing rows) instead of removing and > inserting the whole row. Because REFRESH MATERIALIZED VIEW > CONCURRENTLY is meant to be used when not a lot of data has been > changed anyway, I find this a pleasantly surprising additional > improvement in this patch. I am attaching the benchmark script I have > used. I compared the time of the final refresh query in the script. (I > would love if pgbench could take a custom init script to run before > the timed part of the script.) > > > Mitar > > On Mon, Dec 24, 2018 at 12:59 PM Mitar <mmitar@gmail.com> wrote: > > > > Hi! > > > > Based on discussion about observing changes on an open query in a > > reactive manner (to support reactive web applications) [1], I > > identified that one critical feature is missing to fully implement > > discussed design of having reactive queries be represented as > > materialized views, and changes to these materialized views would then > > be observed and pushed to the client through LISTEN/NOTIFY. > > > > This is my first time contributing to PostgreSQL, so I hope I am > > starting this process well. > > > > I would like to propose that support for AFTER triggers are added to > > materialized views. I experimented a bit and it seems this is mostly > > just a question of enabling/exposing them. See attached patch. This > > enabled me to add trigger to a material view which mostly worked. Here > > are my findings. > > > > Running REFRESH MATERIALIZED VIEW CONCURRENTLY calls triggers. Both > > per statement and per row. There are few improvements which could be > > done: > > > > - Currently only insert and remove operations are done on the > > materialized view. This is because the current logic just removes > > changed rows and inserts new rows. > > - In current concurrently refresh logic those insert and remove > > operations are made even if there are no changes to be done. Which > > triggers a statement trigger unnecessary. A small improvement could be > > to skip the statement in that case, but looking at the code this seems > > maybe tricky because both each of inserts and deletions are done > > inside one query each. > > - Current concurrently refresh logic does never do updates on existing > > rows. It would be nicer to have that so that triggers are more aligned > > with real changes to the data. So current two queries could be changed > > to three, each doing one of the insert, update, and delete. > > > > Non-concurrent refresh does not trigger any trigger. But it seems all > > data to do so is there (previous table, new table), at least for the > > statement-level trigger. Row-level triggers could also be simulated > > probably (with TRUNCATE and INSERT triggers). > > > > [1] https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com > > > > > > Mitar > > > > -- > > http://mitar.tnode.com/ > > https://twitter.com/mitar_m > > > > -- > http://mitar.tnode.com/ > https://twitter.com/mitar_m -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Attachment
Hi! Hm, why in commitfest it does not display the latest patch? https://commitfest.postgresql.org/21/1953/ It does display correctly the latest e-mail, but not the link to the patch. :-( Mitar On Thu, Dec 27, 2018 at 11:51 PM Mitar <mmitar@gmail.com> wrote: > > Hi! > > One more version of the patch with slightly more deterministic tests. > > > Mitar > > On Thu, Dec 27, 2018 at 11:43 PM Mitar <mmitar@gmail.com> wrote: > > > > Hi! > > > > I have made an updated version of the patch, added tests and > > documentation changes. This is my view now a complete patch. Please > > provide any feedback or comments you might have for me to improve the > > patch. I will also add it to commitfest. > > > > A summary of the patch: This patch enables adding AFTER triggers (both > > ROW and STATEMENT) on materialized views. They are fired when doing > > REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed. > > Triggers are not fired if you call REFRESH without CONCURRENTLY. This > > is based on some discussion on the mailing list because implementing > > it for without CONCURRENTLY would require us to add logic for firing > > triggers where there was none before (and is just an efficient heap > > swap). > > > > To be able to create a materialized view without data, specify > > triggers, and REFRESH CONCURRENTLY so that those triggers would be > > called also for initial data, I have tested and determined that there > > is no reason why REFRESH CONCURRENTLY could not be run on > > uninitialized materialized view. So I removed that check and things > > seem to just work. Including triggers being called for initial data. I > > think this makes REFRESH CONCURRENTLY have one less special case which > > is in general nicer. > > > > I have run tests and all old tests still succeed. I have added more > > tests for the new feature. > > > > I have run benchmark to evaluate the impact of me changing > > refresh_by_match_merge to do UPDATE instead of DELETE and INSERT for > > rows which were just updated. In fact it seems this improves > > performance slightly (4% in my benchmark, mean over 10 runs). I guess > > that this is because it is cheaper to just change one column's values > > (what benchmark is doing when changing rows) instead of removing and > > inserting the whole row. Because REFRESH MATERIALIZED VIEW > > CONCURRENTLY is meant to be used when not a lot of data has been > > changed anyway, I find this a pleasantly surprising additional > > improvement in this patch. I am attaching the benchmark script I have > > used. I compared the time of the final refresh query in the script. (I > > would love if pgbench could take a custom init script to run before > > the timed part of the script.) > > > > > > Mitar > > > > On Mon, Dec 24, 2018 at 12:59 PM Mitar <mmitar@gmail.com> wrote: > > > > > > Hi! > > > > > > Based on discussion about observing changes on an open query in a > > > reactive manner (to support reactive web applications) [1], I > > > identified that one critical feature is missing to fully implement > > > discussed design of having reactive queries be represented as > > > materialized views, and changes to these materialized views would then > > > be observed and pushed to the client through LISTEN/NOTIFY. > > > > > > This is my first time contributing to PostgreSQL, so I hope I am > > > starting this process well. > > > > > > I would like to propose that support for AFTER triggers are added to > > > materialized views. I experimented a bit and it seems this is mostly > > > just a question of enabling/exposing them. See attached patch. This > > > enabled me to add trigger to a material view which mostly worked. Here > > > are my findings. > > > > > > Running REFRESH MATERIALIZED VIEW CONCURRENTLY calls triggers. Both > > > per statement and per row. There are few improvements which could be > > > done: > > > > > > - Currently only insert and remove operations are done on the > > > materialized view. This is because the current logic just removes > > > changed rows and inserts new rows. > > > - In current concurrently refresh logic those insert and remove > > > operations are made even if there are no changes to be done. Which > > > triggers a statement trigger unnecessary. A small improvement could be > > > to skip the statement in that case, but looking at the code this seems > > > maybe tricky because both each of inserts and deletions are done > > > inside one query each. > > > - Current concurrently refresh logic does never do updates on existing > > > rows. It would be nicer to have that so that triggers are more aligned > > > with real changes to the data. So current two queries could be changed > > > to three, each doing one of the insert, update, and delete. > > > > > > Non-concurrent refresh does not trigger any trigger. But it seems all > > > data to do so is there (previous table, new table), at least for the > > > statement-level trigger. Row-level triggers could also be simulated > > > probably (with TRUNCATE and INSERT triggers). > > > > > > [1] https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com > > > > > > > > > Mitar > > > > > > -- > > > http://mitar.tnode.com/ > > > https://twitter.com/mitar_m > > > > > > > > -- > > http://mitar.tnode.com/ > > https://twitter.com/mitar_m > > > > -- > http://mitar.tnode.com/ > https://twitter.com/mitar_m -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Hi! False alarm. It just looks like updating patches takes longer than updating e-mails. Mitar On Fri, Dec 28, 2018 at 12:11 AM Mitar <mmitar@gmail.com> wrote: > > Hi! > > Hm, why in commitfest it does not display the latest patch? > > https://commitfest.postgresql.org/21/1953/ > > It does display correctly the latest e-mail, but not the link to the patch. :-( > > > Mitar > > On Thu, Dec 27, 2018 at 11:51 PM Mitar <mmitar@gmail.com> wrote: > > > > Hi! > > > > One more version of the patch with slightly more deterministic tests. > > > > > > Mitar > > > > On Thu, Dec 27, 2018 at 11:43 PM Mitar <mmitar@gmail.com> wrote: > > > > > > Hi! > > > > > > I have made an updated version of the patch, added tests and > > > documentation changes. This is my view now a complete patch. Please > > > provide any feedback or comments you might have for me to improve the > > > patch. I will also add it to commitfest. > > > > > > A summary of the patch: This patch enables adding AFTER triggers (both > > > ROW and STATEMENT) on materialized views. They are fired when doing > > > REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed. > > > Triggers are not fired if you call REFRESH without CONCURRENTLY. This > > > is based on some discussion on the mailing list because implementing > > > it for without CONCURRENTLY would require us to add logic for firing > > > triggers where there was none before (and is just an efficient heap > > > swap). > > > > > > To be able to create a materialized view without data, specify > > > triggers, and REFRESH CONCURRENTLY so that those triggers would be > > > called also for initial data, I have tested and determined that there > > > is no reason why REFRESH CONCURRENTLY could not be run on > > > uninitialized materialized view. So I removed that check and things > > > seem to just work. Including triggers being called for initial data. I > > > think this makes REFRESH CONCURRENTLY have one less special case which > > > is in general nicer. > > > > > > I have run tests and all old tests still succeed. I have added more > > > tests for the new feature. > > > > > > I have run benchmark to evaluate the impact of me changing > > > refresh_by_match_merge to do UPDATE instead of DELETE and INSERT for > > > rows which were just updated. In fact it seems this improves > > > performance slightly (4% in my benchmark, mean over 10 runs). I guess > > > that this is because it is cheaper to just change one column's values > > > (what benchmark is doing when changing rows) instead of removing and > > > inserting the whole row. Because REFRESH MATERIALIZED VIEW > > > CONCURRENTLY is meant to be used when not a lot of data has been > > > changed anyway, I find this a pleasantly surprising additional > > > improvement in this patch. I am attaching the benchmark script I have > > > used. I compared the time of the final refresh query in the script. (I > > > would love if pgbench could take a custom init script to run before > > > the timed part of the script.) > > > > > > > > > Mitar > > > > > > On Mon, Dec 24, 2018 at 12:59 PM Mitar <mmitar@gmail.com> wrote: > > > > > > > > Hi! > > > > > > > > Based on discussion about observing changes on an open query in a > > > > reactive manner (to support reactive web applications) [1], I > > > > identified that one critical feature is missing to fully implement > > > > discussed design of having reactive queries be represented as > > > > materialized views, and changes to these materialized views would then > > > > be observed and pushed to the client through LISTEN/NOTIFY. > > > > > > > > This is my first time contributing to PostgreSQL, so I hope I am > > > > starting this process well. > > > > > > > > I would like to propose that support for AFTER triggers are added to > > > > materialized views. I experimented a bit and it seems this is mostly > > > > just a question of enabling/exposing them. See attached patch. This > > > > enabled me to add trigger to a material view which mostly worked. Here > > > > are my findings. > > > > > > > > Running REFRESH MATERIALIZED VIEW CONCURRENTLY calls triggers. Both > > > > per statement and per row. There are few improvements which could be > > > > done: > > > > > > > > - Currently only insert and remove operations are done on the > > > > materialized view. This is because the current logic just removes > > > > changed rows and inserts new rows. > > > > - In current concurrently refresh logic those insert and remove > > > > operations are made even if there are no changes to be done. Which > > > > triggers a statement trigger unnecessary. A small improvement could be > > > > to skip the statement in that case, but looking at the code this seems > > > > maybe tricky because both each of inserts and deletions are done > > > > inside one query each. > > > > - Current concurrently refresh logic does never do updates on existing > > > > rows. It would be nicer to have that so that triggers are more aligned > > > > with real changes to the data. So current two queries could be changed > > > > to three, each doing one of the insert, update, and delete. > > > > > > > > Non-concurrent refresh does not trigger any trigger. But it seems all > > > > data to do so is there (previous table, new table), at least for the > > > > statement-level trigger. Row-level triggers could also be simulated > > > > probably (with TRUNCATE and INSERT triggers). > > > > > > > > [1] https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com > > > > > > > > > > > > Mitar > > > > > > > > -- > > > > http://mitar.tnode.com/ > > > > https://twitter.com/mitar_m > > > > > > > > > > > > -- > > > http://mitar.tnode.com/ > > > https://twitter.com/mitar_m > > > > > > > > -- > > http://mitar.tnode.com/ > > https://twitter.com/mitar_m > > > > -- > http://mitar.tnode.com/ > https://twitter.com/mitar_m -- http://mitar.tnode.com/ https://twitter.com/mitar_m
On 28/12/2018 08:43, Mitar wrote: > A summary of the patch: This patch enables adding AFTER triggers (both > ROW and STATEMENT) on materialized views. They are fired when doing > REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed. What bothers me about this patch is that it subtly changes what a trigger means. It currently means, say, INSERT was executed on this table. You are expanding that to mean, a row was inserted into this table -- somehow. Triggers should generally refer to user-facing commands. Could you not make a trigger on REFRESH itself? > Triggers are not fired if you call REFRESH without CONCURRENTLY. This > is based on some discussion on the mailing list because implementing > it for without CONCURRENTLY would require us to add logic for firing > triggers where there was none before (and is just an efficient heap > swap). This is also a problem, because it would allow bypassing the trigger accidentally. Moreover, consider that there could be updatable materialized views, just like there are updatable normal views. And there could be triggers on those updatable materialized views. Those would look similar but work quite differently from what you are proposing here. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi! I am new to contributing to PostgreSQL and this is my first time having patches in commit fest, so I am not sure about details of the process here, but I assume that replying and discuss the patch during this period is one of the actives, so I am replying to the comment. If I should wait or something like that, please advise. On Fri, Jan 4, 2019 at 3:23 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > A summary of the patch: This patch enables adding AFTER triggers (both > > ROW and STATEMENT) on materialized views. They are fired when doing > > REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed. > > What bothers me about this patch is that it subtly changes what a > trigger means. It currently means, say, INSERT was executed on this > table. You are expanding that to mean, a row was inserted into this > table -- somehow. Aren't almost all statements these days generated by some sort of automatic logic? Which generates those INSERTs and then you get triggers on them? I am not sure where is this big difference in your view coming from? Triggers are not defined as "user-made INSERT was executed on this table". I think it has always been defined as "INSERT modified this table", no matter where this insert came from (from user, from some other trigger, by backup process). I mean, this is the beauty of declarative programming. You define it once and you do not care who triggers it. Materialized views are anyway just built-in implementation of tables + triggers to rerun the query. You could reconstruct them manually. And why would not triggers be called if tables is being modified through INSERTs? So if PostgreSQL has such a feature, why make it limited and artificially make it less powerful? It is literally not possible to have triggers only because there is "if trigger on a materialized view, throw an exception". > Triggers should generally refer to user-facing commands So triggers on table A are not run when some other trigger from table B decides to insert data into table A? Not true. I think triggers have never cared who and where an INSERT came from. They just trigger. From user, from another trigger, or from some built-in PostgreSQL procedure called REFRESH. > Could you not make a trigger on REFRESH itself? If you mean if I could simulate this somehow before or after I call REFRESH, then not really. I would not have access to previous state of the table to compute the diff anymore. Moreover, I would have to recompute the diff again, when REFRESH already did it once. I could implement materialized views myself using regular tables and triggers. And then have triggers after change on that table. But this sounds very sad. Or, are you saying that we should introduce a whole new type of of trigger, REFRESH trigger, which would be valid only on materialized views, and get OLD and NEW relations for previous and old state? I think this could be an option, but it would require much more work, and more changes to API. Is this what community would prefer? > This is also a problem, because it would allow bypassing the trigger > accidentally. Sure, this is why it is useful to explain that CONCURRENT REFRESH uses INSERT/UPDATE/DELETE and this is why you get triggers, and REFRESH does not (but it is faster). I explained this in documentation. But yes, this is downside. I checked the idea of calling row-level triggers after regular REFRESH, but it seems it will introduce a lot of overhead and special handling. I tried implementing it as TRUNCATE + INSERTS instead of heap swap and it is 2x slower. > Moreover, consider that there could be updatable materialized views, > just like there are updatable normal views. And there could be triggers > on those updatable materialized views. Those would look similar but > work quite differently from what you are proposing here. Hm, not really. I would claim they would behave exactly the same. AFTER trigger on INSERT on a materialized view would trigger for rows which have changed through user updating materialized view directly, or by calling CONCURRENT REFRESH which inserted a row. In both cases the same trigger would run because materialized view had a row inserted. Pretty nice. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Dear,
You can try https://github.com/ntqvinh/PgMvIncrementalUpdate to generate triggers in C for incremental updates of matviews.
For asynchronous updates, the tool does generate the triggers for collecting updated/inserted/deleted rows and then the codes for doing incremental updating as well.
Tks and best regards,
Vinh
On Sat, Jan 5, 2019 at 5:10 AM Mitar <mmitar@gmail.com> wrote:
Hi!
I am new to contributing to PostgreSQL and this is my first time
having patches in commit fest, so I am not sure about details of the
process here, but I assume that replying and discuss the patch during
this period is one of the actives, so I am replying to the comment. If
I should wait or something like that, please advise.
On Fri, Jan 4, 2019 at 3:23 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> > A summary of the patch: This patch enables adding AFTER triggers (both
> > ROW and STATEMENT) on materialized views. They are fired when doing
> > REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed.
>
> What bothers me about this patch is that it subtly changes what a
> trigger means. It currently means, say, INSERT was executed on this
> table. You are expanding that to mean, a row was inserted into this
> table -- somehow.
Aren't almost all statements these days generated by some sort of
automatic logic? Which generates those INSERTs and then you get
triggers on them? I am not sure where is this big difference in your
view coming from? Triggers are not defined as "user-made INSERT was
executed on this table". I think it has always been defined as "INSERT
modified this table", no matter where this insert came from (from
user, from some other trigger, by backup process). I mean, this is the
beauty of declarative programming. You define it once and you do not
care who triggers it.
Materialized views are anyway just built-in implementation of tables +
triggers to rerun the query. You could reconstruct them manually. And
why would not triggers be called if tables is being modified through
INSERTs? So if PostgreSQL has such a feature, why make it limited and
artificially make it less powerful? It is literally not possible to
have triggers only because there is "if trigger on a materialized
view, throw an exception".
> Triggers should generally refer to user-facing commands
So triggers on table A are not run when some other trigger from table
B decides to insert data into table A? Not true. I think triggers have
never cared who and where an INSERT came from. They just trigger. From
user, from another trigger, or from some built-in PostgreSQL procedure
called REFRESH.
> Could you not make a trigger on REFRESH itself?
If you mean if I could simulate this somehow before or after I call
REFRESH, then not really. I would not have access to previous state of
the table to compute the diff anymore. Moreover, I would have to
recompute the diff again, when REFRESH already did it once.
I could implement materialized views myself using regular tables and
triggers. And then have triggers after change on that table. But this
sounds very sad.
Or, are you saying that we should introduce a whole new type of of
trigger, REFRESH trigger, which would be valid only on materialized
views, and get OLD and NEW relations for previous and old state? I
think this could be an option, but it would require much more work,
and more changes to API. Is this what community would prefer?
> This is also a problem, because it would allow bypassing the trigger
> accidentally.
Sure, this is why it is useful to explain that CONCURRENT REFRESH uses
INSERT/UPDATE/DELETE and this is why you get triggers, and REFRESH
does not (but it is faster). I explained this in documentation.
But yes, this is downside. I checked the idea of calling row-level
triggers after regular REFRESH, but it seems it will introduce a lot
of overhead and special handling. I tried implementing it as TRUNCATE
+ INSERTS instead of heap swap and it is 2x slower.
> Moreover, consider that there could be updatable materialized views,
> just like there are updatable normal views. And there could be triggers
> on those updatable materialized views. Those would look similar but
> work quite differently from what you are proposing here.
Hm, not really. I would claim they would behave exactly the same.
AFTER trigger on INSERT on a materialized view would trigger for rows
which have changed through user updating materialized view directly,
or by calling CONCURRENT REFRESH which inserted a row. In both cases
the same trigger would run because materialized view had a row
inserted. Pretty nice.
Mitar
--
http://mitar.tnode.com/
https://twitter.com/mitar_m
Hi! On Sat, Jan 5, 2019 at 2:53 AM Nguyễn Trần Quốc Vinh <ntquocvinh@gmail.com> wrote: > You can try https://github.com/ntqvinh/PgMvIncrementalUpdate to generate triggers in C for incremental updates of matviews. > > For asynchronous updates, the tool does generate the triggers for collecting updated/inserted/deleted rows and then thecodes for doing incremental updating as well. Thank you for sharing this. This looks interesting, but I could not test it myself (not using Windows), so I just read through the code. Having better updating of materialized views using incremental approach would really benefit my use case as well. Then triggers being added through my patch here on materialized view itself could communicate those changes which were done to the client. If I understand things correctly, this IVM would benefit the speed of how quickly we can do refreshes, and also if would allow that we call refresh on a materialized view for every change on the source tables, knowing exactly what we have to update in the materialized view. Really cool. I also see that there was recently more discussion about IVM on the mailing list. [1] [1] https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp [2] https://www.postgresql.org/message-id/flat/FC784A9F-F599-4DCC-A45D-DBF6FA582D30@QQdd.eu Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
On 1/5/19 11:57 PM, Mitar wrote: > > Having better updating of materialized views using incremental > approach would really benefit my use case as well. Then triggers being > added through my patch here on materialized view itself could > communicate those changes which were done to the client. If I > understand things correctly, this IVM would benefit the speed of how > quickly we can do refreshes, and also if would allow that we call > refresh on a materialized view for every change on the source tables, > knowing exactly what we have to update in the materialized view. > Really cool. I also see that there was recently more discussion about > IVM on the mailing list. [1] There doesn't seem to be consensus on whether or not we want this patch. Peter has issues with the way it works and Andres [1] thinks it should be pushed to PG13 or possibly rejected. I'll push this to PG13 for now. Regards, -- -David david@pgmasters.net [1] https://www.postgresql.org/message-id/20190216054526.zss2cufdxfeudr4i%40alap3.anarazel.de
Hi! On Thu, Mar 7, 2019 at 12:13 AM David Steele <david@pgmasters.net> wrote: > There doesn't seem to be consensus on whether or not we want this patch. > Peter has issues with the way it works and Andres [1] thinks it should > be pushed to PG13 or possibly rejected. > > I'll push this to PG13 for now. Sorry, I am new to PostgreSQL development process. So this has been pushed for maybe (if at all) release planned for 2020 and is not anymore in consideration for PG12 to be released this year? From my very inexperienced eye this looks like a very far push. What is expected to happen in the year which would make it clearer if this is something which has a chance of going and/or what should be improved, if improving is even an option? I worry that nothing will happen for a year and we will all forget about this and then we will be all to square zero. I must say that i do not really see a reason why this would not be included. I mean, materialized views are really just a sugar on top of having a table you refresh with a stored query, and if that table can have triggers, why not also a materialized view. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
On 3/14/19 12:05 PM, Mitar wrote: > Hi! > > On Thu, Mar 7, 2019 at 12:13 AM David Steele <david@pgmasters.net> wrote: >> There doesn't seem to be consensus on whether or not we want this patch. >> Peter has issues with the way it works and Andres [1] thinks it should >> be pushed to PG13 or possibly rejected. >> >> I'll push this to PG13 for now. > > Sorry, I am new to PostgreSQL development process. So this has been > pushed for maybe (if at all) release planned for 2020 and is not > anymore in consideration for PG12 to be released this year? From my > very inexperienced eye this looks like a very far push. What is > expected to happen in the year which would make it clearer if this is > something which has a chance of going and/or what should be improved, > if improving is even an option? I worry that nothing will happen for a > year and we will all forget about this and then we will be all to > square zero. > > I must say that i do not really see a reason why this would not be > included. I mean, materialized views are really just a sugar on top of > having a table you refresh with a stored query, and if that table can > have triggers, why not also a materialized view. The reason is that you have not gotten any committer support for this patch or attracted significant review, that I can see. On the contrary, three committers have expressed doubts about all or some of the patch and it doesn't seem to me that their issues have been addressed. This is also a relatively new patch which makes large changes -- we generally like to get those in earlier than the second-to-last CF. I can only spend so much time looking at each patch, so Peter, Álvaro, or Andres are welcome to jump in and let me know if I have it wrong. What you need to be doing for PG13 is very specifically addressing committer concerns and gathering a consensus that the behavior of this patch is the way to go. Regards, -- -David david@pgmasters.net
Hi! On Fri, Mar 15, 2019 at 2:46 AM David Steele <david@pgmasters.net> wrote: > The reason is that you have not gotten any committer support for this > patch or attracted significant review, that I can see. On the contrary, > three committers have expressed doubts about all or some of the patch > and it doesn't seem to me that their issues have been addressed. To my understanding many comments were to early version of the patch which were or addressed or explained why proposed changes not work (use TRUNCATE/INSERT instead of swapping heads is slower). If I missed anything and have not addressed it, please point this out. The only pending/unaddressed comment is about the philosophical question of what it means to be a trigger. There it seems we simply disagree with the reviewer and I do not know how to address that. I just see this as a very pragmatical feature which provides features you would have if you would not use PostgreSQL abstraction. If you can use features then, why not also with the abstraction? So in my view this looks more like a lack of review feedback on the latest version of the patch. I really do not know how to ask for more feedback or to move the philosophical discussion further. I thought that commit fest is in fact exactly a place to motivate and collect such feedback instead of waiting for it in the limbo. > What you need to be doing for PG13 is very specifically addressing > committer concerns and gathering a consensus that the behavior of this > patch is the way to go. To my understanding the current patch addresses all concerns made by reviewers on older versions of the patch, or explains why proposals cannot work out, modulo the question of "does this change what trigger is". Moreover, it improves performance of CONCURRENT REFRESH for about 5% based on my tests because of the split to INSERT/UPDATE/DELETE instead of TRUNCATE/INSERT, when measuring across a mixed set of queries which include just UPDATEs to source tables. Thank you everyone who is involved in this process for your time, I do appreciate. I am just trying to explain that I am a bit at loss on concrete next steps I could take here. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
On 3/15/19 8:15 PM, Mitar wrote: > > The only pending/unaddressed comment is about the philosophical > question of what it means to be a trigger. There it seems we simply > disagree with the reviewer and I do not know how to address that. I > just see this as a very pragmatical feature which provides features > you would have if you would not use PostgreSQL abstraction. If you can > use features then, why not also with the abstraction? This seems to be a pretty big deal to me. When the reviewer is also a committer I think you need to give serious thought to their objections. > So in my view this looks more like a lack of review feedback on the > latest version of the patch. I really do not know how to ask for more > feedback or to move the philosophical discussion further. I thought > that commit fest is in fact exactly a place to motivate and collect > such feedback instead of waiting for it in the limbo. Yes, but sometimes these things take time. >> What you need to be doing for PG13 is very specifically addressing >> committer concerns and gathering a consensus that the behavior of this >> patch is the way to go. > > To my understanding the current patch addresses all concerns made by > reviewers on older versions of the patch, or explains why proposals > cannot work out, modulo the question of "does this change what trigger > is". Still a pretty important question... > Thank you everyone who is involved in this process for your time, I do > appreciate. I am just trying to explain that I am a bit at loss on > concrete next steps I could take here. This is the last commitfest, so committers and reviewers are focused on what is most likely to make it into PG12. Your patch does not seem to be attracting the attention it needs to make it into this release. Regards, -- -David david@pgmasters.net
On Tue, Dec 25, 2018 at 10:05 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Well, REFRESH CONCURRENTLY runs completely different than non-concurrent > REFRESH. The former updates the existing data by observing the > differences with the previous data; the latter simply re-runs the query > and overwrites everything. So if you simply enabled triggers on > non-concurrent refresh, you'd just see a bunch of inserts into a > throwaway data area (a transient relfilenode, we call it), then a swap > of the MV's relfilenode with the throwaway one. I doubt it'd be useful. > But then I'm not clear *why* you would like to do a non-concurrent > refresh. Maybe your situation would be best served by forbidding non- > concurrent refresh when the MV contains any triggers. > > Alternatively, maybe reimplement non-concurrent refresh so that it works > identically to concurrent refresh (except with a stronger lock). Not > sure if this implies any performance penalties. Sorry to jump in late, but all of this sounds very strange to me. It's possible for either concurrent or non-concurrent refresh to be faster, depending on the circumstances; for example, if a concurrent refresh would end up deleting all the rows and inserting them again, I think that could be slower than just blowing all the data away and starting over. So disabling non-concurrent refresh sounds like a bad idea. For the same reason, reimplementing it to work like a concurrent refresh also sounds like a bad idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jan 4, 2019 at 6:23 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > What bothers me about this patch is that it subtly changes what a > trigger means. It currently means, say, INSERT was executed on this > table. You are expanding that to mean, a row was inserted into this > table -- somehow. Yeah. The fact that a concurrent refresh currently does DELETE+INSERT rather than UPDATE is currently an implementation detail. If you allow users to hook up triggers to the inserts, then suddenly it's no longer an implementation detail: it is a user-visible behavior that can't be changed in the future without breaking backward compatibility. > Triggers should generally refer to user-facing commands. Could you not > make a trigger on REFRESH itself? I'm not sure that would help with the use case... but that seems like something to think about, especially if it could use the transition table machinery somehow. > > Triggers are not fired if you call REFRESH without CONCURRENTLY. This > > is based on some discussion on the mailing list because implementing > > it for without CONCURRENTLY would require us to add logic for firing > > triggers where there was none before (and is just an efficient heap > > swap). > > This is also a problem, because it would allow bypassing the trigger > accidentally. > > Moreover, consider that there could be updatable materialized views, > just like there are updatable normal views. And there could be triggers > on those updatable materialized views. Those would look similar but > work quite differently from what you are proposing here. Yeah. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Mar 21, 2019 at 03:41:08PM -0400, Robert Haas wrote: > Yeah. The fact that a concurrent refresh currently does DELETE+INSERT > rather than UPDATE is currently an implementation detail. If you > allow users to hook up triggers to the inserts, then suddenly it's no > longer an implementation detail: it is a user-visible behavior that > can't be changed in the future without breaking backward > compatibility. We are visibly going nowhere for this thread for v12, so I have marked the proposal as returned with feedback. -- Michael