Thread: Postgresql Materialized views
Dear Friends, In my past development projects, I always used the concept of Materialized VIEW to speed-up SELECTs over INSERTs. You are well aware of Jonathan Gardner preliminary work: http://www.varlena.com/varlena/GeneralBits/Tidbits/matviews.html When do you plan to add MATERIALIZED VIEWS to PostgreSQL? This would be major improvement ni the case of Web applications. I run a 400.000+ message board using PhpBB 3.0. After optimization, some queries still need 30 millisecond to run. With Materialized views, it should be possible to drive these queries to 1 millisecond. This means that in some situations a PostgreSQL backend could handle 10 times more queries. My database handles 10 to 20 queries every second. There are 100 selects for 1 INSERT. But my database could well handle over 500 queries a second using materialized views. At my level, here are my plans: 1) Publish some long query LOGs from my database, longuer than 30 milliseconds. 2) Write some PL code to demonstrate the interest in Materialized Views. Publish benckmarks showing time improvement, like 1 milisecond. 30x faster. 3) Then wait for someone on Hackers mailing list to pick-up this important issue and integrate Materialized views in PostgreSQL schema and SQL language. Any information and discussion about materialized views is welcome. Please pick-up this important issue for developpers. There is no need to concentrate on complex issues, when handling materialized views could boost somme web apps. by a factor of 10 or more. Kind regards and happy new year. I hope that 2008 will be the year of materialized views. Jean-Michel Pouré
Jean-Michel Pouré wrote: > In my past development projects, I always used the concept of > Materialized VIEW to speed-up SELECTs over INSERTs Unless you are going to *pay* for it - you do realize that the best way to get it implemented, would be to open up the source code, and give it a try yourself? If it was so easy, and such a clear win, I think one of the very competent people using PostgreSQL today would have already done it? Cheers, mark -- Mark Mielke <mark@mielke.cc>
On 12/01/2008, Mark Mielke <mark@mark.mielke.cc> wrote: > Jean-Michel Pouré wrote: > > In my past development projects, I always used the concept of > > Materialized VIEW to speed-up SELECTs over INSERTs > > Unless you are going to *pay* for it - you do realize that the best way > to get it implemented, would be to open up the source code, and give it > a try yourself? In fairness to Jean-Michel, he has spent hundreds of hours in the past doing just that and far more for the pgAdmin users in the community - I'm sure we can excuse him for asking for what many do think would be a useful feature in the hopes that someone listening might just decide to pick it up. In the meantime though - have you tried rolling your own materialised views with some triggers Jean-Michel? I have good results doing that in the past. Regards, Dave
Dave Page wrote: <blockquote cite="mid:937d27e10801121029n10fd242gc5a7089e8905b0dc@mail.gmail.com" type="cite"><pre wrap="">On12/01/2008, Mark Mielke <a class="moz-txt-link-rfc2396E" href="mailto:mark@mark.mielke.cc"><mark@mark.mielke.cc></a>wrote: </pre><blockquote type="cite"><pre wrap="">Jean-MichelPouré wrote: </pre><blockquote type="cite"><pre wrap="">In my past development projects, I always usedthe concept of Materialized VIEW to speed-up SELECTs over INSERTs </pre></blockquote><pre wrap="">Unless you are going to *pay* forit - you do realize that the best way to get it implemented, would be to open up the source code, and give it a try yourself? </pre></blockquote><pre wrap=""> In fairness to Jean-Michel, he has spent hundreds of hours in the past doing just that and far more for the pgAdmin users in the community - I'm sure we can excuse him for asking for what many do think would be a useful feature in the hopes that someone listening might just decide to pick it up. In the meantime though - have you tried rolling your own materialised views with some triggers Jean-Michel? I have good results doing that in the past</pre></blockquote> I'm not good with names - I suppose Jean-Michel should be asking his benefactors to returnthe favour then? :-)<br /><br /> In my own case - I use a combination of triggers and application to maintain materializedviews - but the subject does seem complex to me.<br /><br /> The last two uses of materialized views I used:<br/><br /> Counts, because as we all know, PostgreSQL count(*) is slow, and in any case, my count(*) is not on thewhole table, but on a subset. Doing this in a general way seems complex to me as it would need to be able to evaluatewhether a given INSERT or UPDATE or one of the dependent tables would impact the WHERE clause for the materializedview, and it still wouldn't know which rows to add/update/remove without detailed analysis, so it would eitherbe throwing out the entire materialized view and recreating it on INSERT or UPDATE (or deferring until the next query?)in which case it may be very slow, or it may be very complex.<br /><br /> Another one that I use is a complex joinof several tables, and merging 1:N tables including aggregate queries into a 1:1 materialized view. I see this as thesame problem where it needs to do dependency analysis, and it still doesn't know how to INSERT/UPDATE/DELETE materializedrows without complex analysis forcing a re-build. In my case, it is 1 ms to query my materialized view and 1500ms to rebuild the materialized view. I do NOT want to rebuild this view after every update.<br /><br /> In summary, Idon't think materialized views is an easy thing to do. Perhaps the very simplest of cases - but the simplest of cases canbe easily managed with triggers or application logic.<br /><br /> Cheers,<br /> mark<br /><br /><pre class="moz-signature"cols="72">-- Mark Mielke <a class="moz-txt-link-rfc2396E" href="mailto:mark@mielke.cc"><mark@mielke.cc></a> </pre>
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sat, 12 Jan 2008 10:26:49 -0500 Mark Mielke <mark@mark.mielke.cc> wrote: > Jean-Michel Pouré wrote: > > In my past development projects, I always used the concept of > > Materialized VIEW to speed-up SELECTs over INSERTs > > Unless you are going to *pay* for it - you do realize that the best > way to get it implemented, would be to open up the source code, and > give it a try yourself? > > If it was so easy, and such a clear win, I think one of the very > competent people using PostgreSQL today would have already done it? > No actually, and your reply is offensive. There are a lot of things PostgreSQL is missing that are "easy" and a clear win, yet people still don't do them. A simple one is the ridiculous usage of pg_dump and pg_dumpall. Or that we can't use pg_restore to use the plain text backup. I think his email was very well written and a simple request of discussion of alternatives as well as future plans. Sincerely, Joshua D. Drake > Cheers, > mark > - -- The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD' -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFHiQ4LATb/zqfZUUQRAvEpAKCmLzjPg1+95ddWHBqogK4Ea981AgCfa4FG 2AVhx/5BNwYbCta086Iz1oo= =ng7C -----END PGP SIGNATURE-----
Mark Mielke wrote: > Counts, because as we all know, PostgreSQL count(*) is slow, and in > any case, my count(*) is not on the whole table, but on a subset. > Doing this in a general way seems complex to me as it would need to be > able to evaluate whether a given INSERT or UPDATE or one of the > dependent tables would impact the WHERE clause for the materialized > view, and it still wouldn't know which rows to add/update/remove > without detailed analysis, so it would either be throwing out the > entire materialized view and recreating it on INSERT or UPDATE (or > deferring until the next query?) in which case it may be very slow, or > it may be very complex. Bah. I forgot to add: The feature I've been wondering about (and not necessarily looking for somebody else to do, although I don't think I know the code well enough to do it at this point): Web applications often make the same queries over and over. While memcache can be used to cache results, the memcache interface is different from the web application interfere requiring complex code, and as well, one loses the transaction guarantees as the memcache results are not guaranteed to be up-to-date with the database. I see the greatest overall performance gain for web applications to be for PostgreSQL to hang on to the results of the previous X queries along with transactions numbers of each of the dependent tables as of the snapshot of the table that is used, and if one of them matches, return the results immediately. I believe MySQL does this (although not sure how reliable their implementation is). I believe I have seen this subject talked about on this list in the past. For web applications, I believe this gives most of the benefits that materialized views would provide, with less of the costs? Cheers, mark -- Mark Mielke <mark@mielke.cc>
On Sat, 2008-01-12 at 13:27 +0100, Jean-Michel Pouré wrote: > Please pick-up this important issue for developpers. There is no need to > concentrate on complex issues, when handling materialized views could > boost somme web apps. by a factor of 10 or more. It's more complex than you think, but the main reason was that HOT was a prerequisite for making summary tables work efficiently, which is only now just about to go live into 8.3 -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Joshua D. Drake wrote:<br /><blockquote cite="mid:20080112105920.561e97d4@commandprompt.com" type="cite"><blockquote type="cite"><prewrap="">Unless you are going to *pay* for it - you do realize that the best way to get it implemented, would be to open up the source code, and give it a try yourself? If it was so easy, and such a clear win, I think one of the very competent people using PostgreSQL today would have already done it? </pre></blockquote><pre wrap="">No actually, and yourreply is offensive. There are a lot of things PostgreSQL is missing that are "easy" and a clear win, yet people still don't do them. A simple one is the ridiculous usage of pg_dump and pg_dumpall. Or that we can't use pg_restore to use the plain text backup. I think his email was very well written and a simple request of discussion of alternatives as well as future plans. </pre></blockquote> Offensive is relative. I find it offensive when peopledemand things on one of the many mailing lists I read without providing anything to the community.<br /><br /> I didn'trealize the original poster did not fit this class of person. For this, I apologize. As for tone - I don't see anythingtechnically wrong with my response. The best way to get something done *is* to pay for it, or do it yourself. It'sa tried and true practice in the open source community. Also, I do not think it is as easy as you say - but feel freeto continue the discussion and prove how idiotic I am for calling the problem "not easy". :-)<br /><br /> Cheers,<br/> mark<br /><br /><pre class="moz-signature" cols="72">-- Mark Mielke <a class="moz-txt-link-rfc2396E" href="mailto:mark@mielke.cc"><mark@mielke.cc></a> </pre>
On Jan 12, 2008 5:31 PM, Mark Mielke <mark@mark.mielke.cc> wrote: > Joshua D. Drake wrote: >> I think his email was very well written and a simple request of >> discussion of alternatives as well as future plans. Agreed, JD. > Offensive is relative. I find it offensive when people demand things on one > of the many mailing lists I read without providing anything to the > community. I have to agree with JD. Your response was quite negative. Similarly, your statement, "If it was so easy, and such a clear win, I think one of the very competent people using PostgreSQL today would have already done it?" is misplaced. Jean-Michel didn't say it was easy at all, he just explained some of the benefits. Having used materialized views for years, I can tell you they are an important feature Postgres currently lacks (in native form). Also, as I have personally looked into implementing materialized views in Postgres, I can tell you it's not too difficult. However, as Simon stated, it's an iterative process. You don't need to jump on someone for a well-written request with a specific use-case. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com Edison, NJ 08837 | http://www.enterprisedb.com/
On Sat, 12 Jan 2008, Jean-Michel Pour� wrote: > You are well aware of Jonathan Gardner preliminary work: > http://www.varlena.com/varlena/GeneralBits/Tidbits/matviews.html There's also PostgreSQL::Snapshots ; intro at http://cunha17.cristianoduarte.pro.br/postgresql/snapshots.en_us.php and main project page at http://pgfoundry.org/projects/snapshot/ > I run a 400.000+ message board using PhpBB 3.0...My database handles 10 > to 20 queries every second. There are 100 selects for 1 INSERT. But my > database could well handle over 500 queries a second using materialized > views. I hope you don't take this the wrong way, but if you can't hit 500 queries/second on that volume of messages I would guess that something is wrong with either the design scalability of the PhpBB software running against a PostgreSQL database or some detail of how you've got it setup. A quick read suggests it's not unusual for people to drop PhpBB and use something vBulletin instead exactly because of PhpBB's issues handling larger communities. It's probably out of date but I found the discussion of query optimization for larger message boards at http://www.phpbb.com/community/viewtopic.php?t=135383 to be informative on this topic. > At my level, here are my plans: > 1) Publish some long query LOGs from my database, longuer than 30 > milliseconds. Rather than chasing after core product features that are some distance off, I think what might be more productive for you in the short term is to collect this information--including EXPLAIN ANALYZE plans--and include it along with information about your server and how the postgresql.conf is setup in a message to the performance list. That might get you immediate suggestions. In addition to "query is badly written/indexed for PostgreSQL" (which is another potential side to the query issues discussed in the phpbb forum topic I referenced), problems you might not have caught that could be nailing you include things like not allocating enough memory for use by the database and tables not being analyzed frequently enough. I would love to have materialized views in the core database. But it's a ways off no matter what, is moderately hard to accomplish, and you can emulate some of the benefits using things like Gardner's trigger-based approach. All of that makes it harder to kick off such a project. I don't think you need to convince anyone that it's important--the occasional person has been screaming about needing this feature for years now. The real question is who cares enough about the feature that it's worth their trouble to fund development, and I'm not sure whether your personal attempts to rouse demand will impact that. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
> In fairness to Jean-Michel, he has spent hundreds of hours in the past > doing just that and far more for the pgAdmin users in the community - > I'm sure we can excuse him for asking for what many do think would be > a useful feature in the hopes that someone listening might just decide > to pick it up. > > In the meantime though - have you tried rolling your own materialised > views with some triggers Jean-Michel? I have good results doing that > in the past. Dear friends, For of all, thanks for picking up my message and replying. I agree with Dave Page and others that it is hard to find contributors for a Free project. When writing 2% of pgAdmin2 code, I noticed that you could count 10.000 end-users for ONE developper. This is why I usually offer a bounty for any kind of feature. Two months ago, my last bounty for a Kdenlive feature (100€), but no-one aggreed to receive the money after developing the feature, probably because there were dozens of contibutors (people writing code on top of someone else code). But the power of PostgreSQL is to be a real community, like Xorg or Apache. This makes all the difference. This was the heart of my message. This being said, our behind the scene needs are: * We are a non-profit organisation based in France, providing real-estate listing services. * We are going to spend 2000€ buying two servers (Phantom 4-die + 8GB 2U servers), running on Debian: one for PostgreSQL, the other for Apache2. * We plan to launch the first virtual real-estate agency managed by a free community. * We are going to use a mixture of PhpBB 3.0, Drupal and a custom applications. * We plan to handle thousands of simultaneous users. So ... Based on my current logs, I know that I may have to buy and run a web farm. Which I do not want, because it implies extra hosting costs. AND I know that ONE PostgreSQL database can handle 1000 simultaneous queries, when using server-side code. I wrote a small PostgreSQL query optimization HOWTO : http://area51.phpbb.com/phpBB/viewtopic.php?f=3&t=29292 Pardon me if some information is not accurate. I tried my best to educate PhpBB developers choosing the right technology. PostgreSQL... Now, I am going to write the requested server-side code for PhpBB 3.0 Then I will benchmark queries on our large server. In the end I hope that someone will be willing to pick-up this MATERIALIZED VIEW issue, so that any applications running on PostgreSQL can benefit from "lightning speed". If someone is willing to receive a bounty, please contact me. Being a non-profit organisation, I can ask money to contributors on our web site. This will not be any kind of large sum of money, something between 100€ and 500€. Kind regards and happy new year. Jean-Michel Pouré
Mark Mielke wrote: > Joshua D. Drake wrote: >>> Unless you are going to *pay* for it - you do realize that the best >>> way to get it implemented, would be to open up the source code, and >>> give it a try yourself? >>> Because users possibly want to do that - use it? Some of us have better things to do than go through the learning curve of how the internals of a non-trivial system work. Does that really mean its unreasonable to voice an opinion of what would make the system more useful? >> > Offensive is relative. I find it offensive when people demand things > on one of the many mailing lists I read without providing anything to > the community. > If your view of the community is that it should be insular and closed to those who can't or won't be developers, then fine. But taking that attitude will pretty much guarantee that your system will never amount to more than a hill of beans. One of the major problems with open source as a whole is that you get this 'fix it yourself or pay for it' business which provides no way to spread the cost over many users who would all have something to gain - but none of whom can justify footing the bill for the entire development. Most of us are in that position as users, even if we do have skills that would enable us to help - we have our own issues to deal with. Buying support isn't the answer - its not support that's needed, after all, so much as an ability to buy a share of influence over a roadmap.. Do you want ensure that only the very rich in cash or time can have any influence? You're going the right way about it with your attitude, which appears deliberately user-hostile. What do you want? James
FYI: I don't like being attacked for stating the truth, nor distracting the mailing list with these emotional discussions. However, there are things that need to be clarified. Feel free to kill the thread in your mail browser. James Mansion wrote: > Mark Mielke wrote: >> Joshua D. Drake wrote: >>>> Unless you are going to *pay* for it - you do realize that the best >>>> way to get it implemented, would be to open up the source code, and >>>> give it a try yourself? > Because users possibly want to do that - use it? Some of us have > better things to do than go > through the learning curve of how the internals of a non-trivial > system work. Does that really > mean its unreasonable to voice an opinion of what would make the > system more useful? It's unreasonable to demand or set a time schedule like "drop everything and work on materialized views in 2008 because you are wasting your time on other things - web developers can get 10X improvement today if you do this one thing". It's not unreasonable to ask for support for an idea. As it was, I believe I misinterpreted Jean-Michel's message as a demand (the words seemed demanding), whereas it may have been a plea. I was far more sympathetic to his second post. >> Offensive is relative. I find it offensive when people demand things >> on one of the many mailing lists I read without providing anything to >> the community. >> > If your view of the community is that it should be insular and closed > to those who can't or won't be developers, then fine. But taking that > attitude will pretty much guarantee that your system will never amount > to more than a hill of beans. First point: I don't speak for the developers and it is not my system. Second point: This has nothing to do with insular vs non-insular community. Everything I stated was true - you may not like what I stated, but it was true. It's possible I could have adjusted a word or two, or added a smiley to get an affect you would find more pleasing, but there was nothing incorrect about what I stated. The best way to get a feature in is to pay for somebody to do it, or do it yourself. If you need reference points look to the last 20 large features that went into PostgreSQL. See how it progresses. Also, people have been working on materialized views on and off for years with differing degrees of success. It is *NOT* the simple problem that people make it out to be. (Although it seems that some people may have a good grasp of the problem, and may be verging on a solution) > One of the major problems with open source as a whole is that you get > this 'fix it yourself or pay for it' business which provides no way to > spread the cost over many users who would all have something to gain - > but none of whom can justify footing the bill for the entire > development. Most of us are in that position as users, even if we do > have skills that would enable us to help - we have our own issues to > deal with. Buying support isn't the answer - its not support that's > needed, after all, so much as an ability to buy a share of influence > over a roadmap.. Open source is a double-edged sword for sure. This does not give any right to demand anything. If you want to work on something *with* other people, the message is "I am looking at doing tihs, but don't know how to start - does anybody with similar interest want to help?" This is *not* the message that was presented. > Do you want ensure that only the very rich in cash or time can have > any influence? You're going the right way about it with your > attitude, which appears deliberately user-hostile. > What do you want? I want people to respect the few developers we have, because I value them and respect them. As for my "attitude" - words in email are hardly capable of showing emotion, especially something as short as I wrote. You will think what you wish of my words, and your decision was already made the moment you ignored the fact that I spoke truth and focused only on the tone. I have an apology for Jean-Michel. I don't have an apology for people such as you who took this opportunity to attack me. Pot, kettle, black. Cheers, mark -- Mark Mielke <mark@mielke.cc>
James Mansion wrote: > If your view of the community is that it should be insular and closed to > those who can't or won't be developers, then fine. But taking that > attitude will pretty much guarantee that your system will never amount to > more than a hill of beans. Keep in mind that Mark Mielke is not a PostgreSQL developer. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Jan 12, 2008 4:19 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Sat, 2008-01-12 at 13:27 +0100, Jean-Michel Pouré wrote: > > > Please pick-up this important issue for developpers. There is no need to > > concentrate on complex issues, when handling materialized views could > > boost somme web apps. by a factor of 10 or more. > > It's more complex than you think, but the main reason was that HOT was a > prerequisite for making summary tables work efficiently, which is only > now just about to go live into 8.3 +1 If you know how to write triggers, materialization techniques aren't all that difficult. The real technical limitation was not lack of materialization techniques (write triggers), but was dealing with the mvcc penalty. Previously to HOT, for summary tables I would redirect the trigger to insert to a 'roll up' table and move the data to the summary on cron or from an application event. Materialized views are syntax sugar (but still very sweet). merlin
<sarcasm>Good to see</sarcasm> things haven't changed, and requests for features and improvements on the pgsql-hackers list can still degenerate rapidly into a discussion about how to request features and improvements. As Joshua Drake has pointed out before, most of the core people working on PostgreSQL don't actually use it for anything themselves. I will expand a little on that and say that this means that while they are extremely good at what they do, they really don't have a clue what might be useful to someone "in the wild". Sort of like automotive engineers who in the 1970's made the Cadillac's engine so large that you couldn't change the spark plugs without taking the motor mounts loose and lifting the engine. The basic question this brings up in the context of this latest flurry of hurt feelings is whether user demand is officially a driving force in PostgreSQL development. If the answer to that question is yes, then the next question is how is that structured? I'm not sure that the collective answer to the first question is actually yes. If it actually is yes, then the next question has barely been touched, as witnessed by these flurries of electrons on the list, unless the structure is Anarchy. Which is oxymoronic of course. There doesn't appear to be an easy way to officially "take the temperature" of either the developer community, or the user community, and there certainly is no official way to clearly and easily communicate between the two in order to effect change. Unfortunately for all of us, the communications, social, organizational, and people skills/talents necessary to envision and create the type of social structure that benefits the entire community are outside the range of experience of everyone on this list. How do I know that? Because if even one person had those talents/skills it would have happened already. That is what those type of people do, they can't help themselves.* I'm not sure that anyone reading this would even be able to recognize such a person if they met them. Perhaps we should go fishing for some help from one of those "University Places"? From people outside the Computer Science department? Maybe even some people in Industrial Psychology? Somebody probably needs a Master's project.... -- Sean Utt * Actually I'm being optimistic, while organizers are compulsive, they know a hopeless cause when they see it, and quietly disappear. I'm operating under the assumption that the PostgreSQL community is not a hopeless cause organizationally. [entire original message deleted for lack of usefulness]
Sean Utt wrote: > <sarcasm>Good to see</sarcasm> things haven't changed, and requests for > features and improvements on the pgsql-hackers list can still degenerate > rapidly into a discussion about how to request features and improvements. > > As Joshua Drake has pointed out before, most of the core people working > on PostgreSQL don't actually use it for anything themselves. <snip> > Unfortunately for all of us, > the communications, social, organizational, and people skills/talents > necessary to envision and create the type of social structure that > benefits the entire community are outside the range of experience of > everyone on this list. How do I know that? Because if even one person > had those talents/skills it would have happened already. That is what > those type of people do, they can't help themselves.* I'm not sure that > anyone reading this would even be able to recognize such a person if > they met them. I would argue that they know and recognize them but... they don't care. They are -hackers. They care about code. Some of them in a tertiary manner care about the entire paragraph above, others not at all. I think if you look around what you will see is disparate group of people all organizing the community in their own particular way for the benefit of the community as a whole. Consider the following, in the last 2 years we have had a number of significant community events happen that allow for exactly what you are stating above. Formation of the Fundraising group and affiliation with Software in the Public Interest. This allowed us to execute the Anniversary, PostgreSQL Conference West, as well as the upcoming EAST conference. It also allowed to to provide help to PgDay.IT for last years Italian PostgreSQL day. Since that time they have also formed ITPUG and Italian non-profit for PostgreSQL. Lastly this affiliation has allowed allowed the community as a whole to sponsor travel for PostgreSQL speakers to various conferences and support major events such as OSCON and LinuxWorld. > Perhaps we should go fishing for some help from one of > those "University Places"? From people outside the Computer Science > department? Maybe even some people in Industrial Psychology? Somebody > probably needs a Master's project.... Actually you just need to not be on -hackers :). This belongs over in -advocacy. Sincerely, Joshua D. Drake
Just my two cents on this (rapidly degenerating) thread. On 1/13/08, Sean Utt <sean@strateja.com> wrote: > <sarcasm>Good to see</sarcasm> things haven't changed, and requests for > features and improvements on the pgsql-hackers list can still degenerate > rapidly into a discussion about how to request features and improvements. > > As Joshua Drake has pointed out before, most of the core people working on > PostgreSQL don't actually use it for anything themselves. I will expand a > little on that and say that this means that while they are extremely good at > what they do, they really don't have a clue what might be useful to someone > "in the wild". Sort of like automotive engineers who in the 1970's made the > Cadillac's engine so large that you couldn't change the spark plugs without > taking the motor mounts loose and lifting the engine. As a very satisfied Postgres "customer", I take exception to the comparison. <SNIP> > There doesn't appear to be an easy way to officially "take the temperature" > of either the developer community, or the user community, and there > certainly is no official way to clearly and easily communicate between the > two in order to effect change. Huh? A politely worded feature request generally gets discussed and then put on the TODO list if there is some consensus about its usefulness. If there is no consensus, then the requester usually has to do more work, which might involve prototyping some code etc. Admittedly, some developers get grumpy sometimes, but, as the man said, "Let him who is without sin throw the first stone..." There was an issue with the tone of the request for material views in the beginning of this thread, but that seems to ironed out among those who are actually interested in accomplishing something. May I propose the following: (1) can we put materialized views on the TODO, perhaps as a library or as core, still subject to a lot of design work and with no particular deadline? (2) Can we discontinue this particular flame war about the responsive of the developers (who in my estimation do a huge amount of work that benefits me immensely with nary a "thank you")? Sorry for the meta rant, I just couldn't take it anymore.
"Webb Sprague" <webb.sprague@gmail.com> writes: > May I propose the following: (1) can we put materialized views on the > TODO, perhaps as a library or as core, still subject to a lot of > design work and with no particular deadline? Actually, I had thought they *were* on the TODO list, because certainly it's been suggested before. I can't find anything about 'em in the list though, so +1 for adding the entry. There are plenty of other TODO items that we have no idea how to do, so the lack of a clear design is hardly an objection ;-) regards, tom lane
Sean Utt wrote: > > As Joshua Drake has pointed out before, most of the core people > working on PostgreSQL don't actually use it for anything themselves. I > will expand a little on that and say that this means that while they > are extremely good at what they do, they really don't have a clue what > might be useful to someone "in the wild". Sort of like automotive > engineers who in the 1970's made the Cadillac's engine so large that > you couldn't change the spark plugs without taking the motor mounts > loose and lifting the engine. > > This is both gratuitously offensive and based on a demonstrably false premise. The definition of "core people working on PostgreSQL" is somewhat vague. But if you were to take it as, say, the group of active committers, then I would say that the majority of us earn our living in whole or in part using PostgreSQL. Certainly I do (there's a reason I use an elephant logo for my business). Many of the things I have contributed to PostgreSQL have been in response to user requests, and often things I personally have no immediate need for. The same is true of many contributors. And almost all of mine have been aimed at increasing usability (e.g. dollar quoting, CSV import/export). So I rather take offense at the quite inappropriate spark plug analogy. As for feature requests, everyone has a right to air an opinion. Personally, I will pay more attention to people who contribute to the community than to those who don't. That doesn't just mean coders, though - contribution comes in many forms. In the case of the present request, Jean-Michel is a contributor, and any suggestion to the contrary is right out of line. The real question for me is whether we want to support Materialized Views at the grammar level. If we do then it should be worked on. If not then it can probably be dealt with via a bolt-on module. Personally I'm inclined to say we should support it via the grammar. cheers andrew
Andrew Dunstan wrote: > > > Sean Utt wrote: >> >> As Joshua Drake has pointed out before, most of the core people >> working on PostgreSQL don't actually use it for anything themselves. I >> will expand a little on that and say that this means that while they >> are extremely good at what they do, they really don't have a clue what >> might be useful to someone "in the wild". Sort of like automotive >> engineers who in the 1970's made the Cadillac's engine so large that >> you couldn't change the spark plugs without taking the motor mounts >> loose and lifting the engine. >> >> > > This is both gratuitously offensive and based on a demonstrably false > premise. The definition of "core people working on PostgreSQL" is > somewhat vague. But if you were to take it as, say, the group of active > committers, then I would say that the majority of us earn our living in > whole or in part using PostgreSQL. Certainly I do (there's a reason I > use an elephant logo for my business). The consideration of my comment (which I believe was made some time ago) was not about -hacking which as I understand it is what you and most everyone else on -hackers does. My comment was a consideration to the amount of "core" that are managing postgresql in a production environment. E.g; being DBAs. I would argue that very few committers actually qualify as that either. Feel free to prove me wrong :) Sincerely, Joshua D. Drake
Joshua D. Drake wrote: > Andrew Dunstan wrote: >> >> >> Sean Utt wrote: >>> >>> As Joshua Drake has pointed out before, most of the core people >>> working on PostgreSQL don't actually use it for anything themselves. >>> I will expand a little on that and say that this means that while >>> they are extremely good at what they do, they really don't have a >>> clue what might be useful to someone "in the wild". Sort of like >>> automotive engineers who in the 1970's made the Cadillac's engine so >>> large that you couldn't change the spark plugs without taking the >>> motor mounts loose and lifting the engine. >>> >>> >> >> This is both gratuitously offensive and based on a demonstrably false >> premise. The definition of "core people working on PostgreSQL" is >> somewhat vague. But if you were to take it as, say, the group of >> active committers, then I would say that the majority of us earn our >> living in whole or in part using PostgreSQL. Certainly I do (there's >> a reason I use an elephant logo for my business). > > The consideration of my comment (which I believe was made some time > ago) was not about -hacking which as I understand it is what you and > most everyone else on -hackers does. My comment was a consideration to > the amount of "core" that are managing postgresql in a production > environment. E.g; being DBAs. I would argue that very few committers > actually qualify as that either. Feel free to prove me wrong :) > > I'm not going to bother trying, because you just moved the goalposts (managing in a production environment vs using). And why should the number of DBAs matter one whit? Why should they matter more than, say application developers, when it comes to language level features? cheers andrew
Andrew Dunstan wrote: >> > > I'm not going to bother trying, because you just moved the goalposts > (managing in a production environment vs using). And why should the > number of DBAs matter one whit? Why should they matter more than, say > application developers, when it comes to language level features? Andrew don't get your knickers in a bunch. This was not an attack, I wasn't declaring that one type of user was better than another, nor was I moving goal posts. I was simply explaining the intent of my statement because Sean quoted me without context. Oh and as a note... the fact that you wonder about the worth of the number of DBAs proves my point infinitely. Joshua D. Drake
Joshua D. Drake wrote: > Andrew Dunstan wrote: > >> >> I'm not going to bother trying, because you just moved the goalposts >> (managing in a production environment vs using). And why should the >> number of DBAs matter one whit? Why should they matter more than, say >> application developers, when it comes to language level features? > > > Andrew don't get your knickers in a bunch. This was not an attack, I > wasn't declaring that one type of user was better than another, nor > was I moving goal posts. I was simply explaining the intent of my > statement because Sean quoted me without context. > > Oh and as a note... the fact that you wonder about the worth of the > number of DBAs proves my point infinitely. > > What is your point? You are implying something, but I'm damned if I know what. I did not wonder about the worth of DBAs - that would be silly. Please do not put words in my mouth that I did not speak. What I wondered was why DBAs should be uniquely important, and why we should think that there is any significance to the number of DBAs as opposed to other classes of user, among the ranks of either the core team, or the active committers, or hackers generally. I don't think they should - there are many classes of user whose needs we should cater for, of which DBAs are just one. cheers andrew
My point is simply this: The lack of a clear formal process for feature requests leads to this degradation in the conversation. Without a formalized structure, the conversation devolves rapidly into an argument over semantics and word choice. It is not my contention that the "core" developers need to be different in any way. It is also not my contention that the users need to be different in any way. It is my contention that the "process" currently generates more ill will than it needs to, and needs to be replaced. The problem is a systemic one. There needs to be a more formal structure put in place than just the -hackers mailing list. There needs to be a way to evaluate the demand for a specific feature as well as the benefits and the effort it will require. It needs to be done in as neutral a way as possible. In order to be effective, it will have to be driven into being by the developers, because they will be the ones who can hamstring it -- not the users. Or we can just keep bickering among ourselves over semantics and word choice. That seems to be fun for everyone so far. Sean
Sean Utt wrote: > My point is simply this: The lack of a clear formal process for > feature requests leads to this degradation in the conversation. > Without a formalized structure, the conversation devolves rapidly into > an argument over semantics and word choice. It is not my contention > that the "core" developers need to be different in any way. It is also > not my contention that the users need to be different in any way. It > is my contention that the "process" currently generates more ill will > than it needs to, and needs to be replaced. The problem is a systemic > one. There needs to be a more formal structure put in place than just > the -hackers mailing list. There needs to be a way to evaluate the > demand for a specific feature as well as the benefits and the effort > it will require. It needs to be done in as neutral a way as possible. > In order to be effective, it will have to be driven into being by the > developers, because they will be the ones who can hamstring it -- not > the users. > > Or we can just keep bickering among ourselves over semantics and word > choice. That seems to be fun for everyone so far. > > Very lofty sentiments. Perhaps next time you have a suggestion to make you could rise to that level in the first rather than the second instance, and avoid the unnecessary insults. cheers andrew
"Sean Utt" <sean@strateja.com> writes: > My point is simply this: The lack of a clear formal process for feature > requests leads to this degradation in the conversation. Two comments: 1) The existing informal process has served us very well for more than ten years now. I'm disinclined to consider replacing it, because that would risk altering the community's dynamics for the worse. 2) In the end, this is an open source *community*; no amount of formal feature requesting will have any material impact on what actually gets implemented, because there isn't any central control. What gets implemented is whatever individual contributors choose to work on, either because they find it interesting or (in some cases) because someone pays them to do something specific. Certainly, some contributors pay attention to what's being requested, but I see no reason to think that increasing the level of formality will help them. In short: we haven't got a "road map" and we like it that way. regards, tom lane
Sean Utt wrote: > My point is simply this: The lack of a clear formal process for > feature requests leads to this degradation in the conversation. > Without a formalized structure, the conversation devolves rapidly into > an argument over semantics and word choice. It is not my contention > that the "core" developers need to be different in any way. It is also > not my contention that the users need to be different in any way. It > is my contention that the "process" currently generates more ill will > than it needs to, and needs to be replaced. The problem is a systemic > one. There needs to be a more formal structure put in place than just > the -hackers mailing list. There needs to be a way to evaluate the > demand for a specific feature as well as the benefits and the effort > it will require. It needs to be done in as neutral a way as possible. > In order to be effective, it will have to be driven into being by the > developers, because they will be the ones who can hamstring it -- not > the users. What sort of structure are you envisioning? Features do make it into PostgreSQL - PostgreSQL has had, and arguable still has a more complete feature set than well funded alternatives such as MySQL. Their is a TODO list that both grows and shrinks with each release. I have myself seen major changes in 8.0, 8.1, 8.2, and 8.3 that have personally benefited me. What problem needs to be solved? I agree with your sentiment. For somebody who wants to request a feature, expect it to be taken up by a champion (other than them), and monitor it's regular process, no clear infrastructure seems to exist. However, I question the relevance or value of such a system. My own initial contribution was a reaction to the notion that somebody should be able to demand other people to volunteer their time to work on something. It may have been unfair and cold to the original poster, and for this, I apologize. I don't believe the user community should necessary be able to demand or prioritize a feature unless they are willing to put up resources to support the effort. Resources usually means either people or money. If people truly have a strong business case for a feature, there are several qualified companies willing to take their money and turn it into something real. If people are not able to produce such a business case or justify the expenditure of funds, attracting volunteers to do the same work requires a very different approach. It requires zeal, compelling argument, and a reasonable amount of ego stroking or challenge. Putting in place an official process can have the opposite effect. People such as myself are very willing to volunteer efforts in an informal manner without formal deadlines or processes, because we enjoy it. Forcing the regular amount of red tape many of us need to cut just to get our jobs done at work is not very motivational for people such as myself. Now, while I have contributed to other open source projects, I have not personally contributed much to PostgreSQL. The core PostgreSQL contributors would have to make their voice heard. I think, though, that telling them that they must work on a certain feature, because that's what the users are asking for, is the wrong approach. Not to say that is exactly what you are requesting, but I suggest that is where you are leading. Cheers, mark -- Mark Mielke <mark@mielke.cc>
> I think, though, that telling them that they must work on a > certain feature, because that's what the users are asking for, is the > wrong approach. Not to say that is exactly what you are requesting, but > I suggest that is where you are leading. > > Cheers, > mark > The more communication between users and developers the better. Companies would pay a fortune for the user complaints, moans and groans, suggestions, bug reports and demands that postgresql sees in a single day. IMHO developers need reality checks ... well so do users but for different reasons. There is no quicker way to kill a product or project than to ignore the needs/wants of the user/customer base (open-source or not). andrew
I realize that some very important navel-gazing (^H^H^H "group process") is happening, but let us remember where bona-fide feature requests should go: http://www.postgresql.org/docs/faqs.TODO.html So far, I don't see any mention of materialized views on this page, and I did refresh ... :)
Webb Sprague wrote: > I realize that some very important navel-gazing (^H^H^H "group > process") is happening, but let us remember where bona-fide feature > requests should go: > > http://www.postgresql.org/docs/faqs.TODO.html > > So far, I don't see any mention of materialized views on this page, > and I did refresh ... :) But did you clear your cache? :P Joshua D. Drake > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly >
Andrew Chernow wrote: >> I think, though, that telling them that they must work on a certain >> feature, because that's what the users are asking for, is the wrong >> approach. Not to say that is exactly what you are requesting, but I >> suggest that is where you are leading. > The more communication between users and developers the better. > Companies would pay a fortune for the user complaints, moans and > groans, suggestions, bug reports and demands that postgresql sees in a > single day. > > IMHO developers need reality checks ... well so do users but for > different reasons. There is no quicker way to kill a product or > project than to ignore the needs/wants of the user/customer base > (open-source or not). Nobody is ignoring users or needs or wants. It is a question of priority. My priorities may not match yours may not match Tom's or one of the other core contributors. Valuable features are being added to PostgreSQL with each release as community efforts (or business efforts). With the exception of some arguably naive requests such as "give me a super fast count(*) - why is that so hard?" users are certainly being listened to - because we are the users. I have personally requested items, found myself to be not alone, and seen the group request implemented ahead of my expectations. So again, my question is - what is broken? What needs to be fixed? I, for one, think the core developers are doing a great job, and the extended team (community) is working very well together. I am a PostgreSQL advocate *because* of the model, not in spite of the model. If you want a corporate model, I believe Oracle is also a great product and it may be more compatible with your expectations? (Ironically, I would expect to see new features that *I* ask for implemented in PostgreSQL *before* Oracle would listen to me) Cheers, mark -- Mark Mielke <mark@mielke.cc>
Mark Mielke wrote: > Nobody is ignoring users or needs or wants. It is a question of > priority. My priorities may not match yours may not match Tom's or one > of the other core contributors. Valuable features are being added to FYI, the terminology core contributors is confusing. There is not really any such thing in PostgreSQL land. There is "core" which isn't really relevant to development (except in some specific circumstances such as addressing security concerns). There are "contributors" which are people who are constantly contributing to the community. An incomplete list of them is here: http://www.postgresql.org/community/contributors Sincerely, Joshua D. Drake
> But did you clear your cache? :P Freud might say it takes a lifetime to clear one's cache .... Luckily, in therapy you don't have to wait for those darn Postgres developers ;) > > Joshua D. Drake > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: if posting/reading through Usenet, please send an appropriate > > subscribe-nomail command to majordomo@postgresql.org so that your > > message can get through to the mailing list cleanly > > > >
On Sun, 2008-01-13 at 17:44 -0800, Sean Utt wrote: > It is not my contention that the "core" developers need to > be different in any way. It is also not my contention that the users need to > be different in any way. First, this is an open forum, so thank you for expressing your views openly in the manner you've felt them. General comment: I think one perspective I have on the above statement is the feeling that there is a distinction between two groups of people and that one group is put here to serve the other group better. Many new users of Postgres are so used to the closed source situation of Developers being the only people who can see the code that they often perpetuate the concept of tiering or groups, when it doesn't exist. Almost all of the people on the list are users of Postgres. There's just a complete range of people from new users to experienced hackers. Postgres is well documented, well commented and completely open source, so there is no barrier to anyone who wishes to change, and if you choose to define that change positively, improve. So I support Mark Mielke's views on writing code. Anybody who wants to code, can. There's probably a project of a size and complexity that's right for your first project. Apparently the guy that invented the new scheduling algorithms for Linux wasn't even a coder, but he sat down and worked it out. This is Hackers: Write some code today, everybody. You *can*. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Dear Friends, I hope that this flame war can stop, as it is useless. The logic of free software is that developers pick-up issues, based on their skills and interest. The power of the cummunity is to gather very talented developers from all over the planet. Freedom is the logic and there is no need to drive the community. In the end, PostgreSQL relies on the knownledge of talented developers. When posting this thread, I hope that a talented developer would some day pick-up the materialized view issue and work on it, during a process of discussion. In a few days, I will post some precise statistics on how much MATERIALIZED VIEW could boost a 400.000 posts PhpBB 3.0 forum. I will write the required PL trigger, test them and install them on my server. Then each individual is able to decide whether materiazed views are important or not. This is my definition of freedom. Freedom of choice. I don't ask for more. Now, I hope that the list can return to a more peaceful state. Kind regards, Jean-Michel Pouré
On Mon, 2008-01-14 at 09:22 +0000, Simon Riggs wrote: > So I support Mark Mielke's views on writing code. Anybody who wants to > code, can. There's probably a project of a size and complexity that's > right for your first project. The main problem is that usually that initial thing is not what you desperately need today... so the motivation will be pretty low unless you just have loads of time to start off playing with the code. > Apparently the guy that invented the new > scheduling algorithms for Linux wasn't even a coder, but he sat down and > worked it out. > This is Hackers: Write some code today, everybody. You *can*. Certainly everybody can write code, but the barrier to accept it is pretty high in the postgres community. So you better be a damn good coder if you expect your code to be accepted... and even then with considerable fight for justifying the use case for your feature ;-) This is all good for a stable product, but it really makes the barrier between simple users and hackers pretty high. Cheers, Csaba.
Jean-Michel Pouré wrote: > When posting this thread, I hope that a talented developer would some > day pick-up the materialized view issue and work on it, during a process > of discussion. > In case it was lost in the noise - there are several capable people that have been personally on this issue for years. You are not the first to suggest it, and your opinion that the feature is valuable is shared by a larger group. There is still a difference between talking about it and doing it. A few of the doers told me off privately stating that it is not that difficult. My suspicion is that it *is* difficult and they are not doing themselves credit, or their solution is incomplete, but whatever - the result is the same. When one or more of these people are ready, you will likely see it released. It may even be complete before 2008 is complete. > In a few days, I will post some precise statistics on how much > MATERIALIZED VIEW could boost a 400.000 posts PhpBB 3.0 forum. I will > write the required PL trigger, test them and install them on my server. > The two factors of note here: 1) If you can write a PL trigger for it, the value of the feature is limited. In my own case, I found it fairly easy to write triggers, or update the summary table from the application (poor man's view). 2) In my experience, a custom PL trigger can make assumptions about the application that allow greater optimization that a general 'syntactical sugar' solution could. In my own case, performance of queries leapt from 1500 ms to 1 ms. Even if materialized views were implemented to a level that most people would consider "full", I do not expect to see the same speed improvement, because a generalized implementation would not be able to make the assumptions that I can. FYI, my triggers are perhaps 10 lines each, and I believe I have three triggers in the 1500 ms -> 1 ms example. I have a view and a summary table. I update the summary table from the view. In my opinion, this solution is very manageable given the 1500:1 performance improvement it grants me. > Then each individual is able to decide whether materiazed views are > important or not. This is my definition of freedom. Freedom of choice. > > I don't ask for more. > Now, I hope that the list can return to a more peaceful state You are doing fine. I am sorry for assuming you intended more and giving you a cold-ish shoulder. Cheers, mark -- Mark Mielke <mark@mielke.cc>
Mark Mielke wrote: > FYI, my triggers are perhaps 10 lines each, and I believe I have three > triggers in the 1500 ms -> 1 ms example. I have a view and a summary > table. I update the summary table from the view. In my opinion, this > solution is very manageable given the 1500:1 performance improvement > it grants me. But you had to modify your queries. I would think that a materialized views implementation worth its salt would put the view to work on the original, unmodified queries. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Sunday, January 13, 2008 8:18 PM > To: Sean Utt > Cc: Andrew Dunstan; Joshua D. Drake; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Postgresql Materialized views > > "Sean Utt" <sean@strateja.com> writes: > > My point is simply this: The lack of a clear formal process for feature > > requests leads to this degradation in the conversation. > > Two comments: > > 1) The existing informal process has served us very well for more than > ten years now. I'm disinclined to consider replacing it, because that > would risk altering the community's dynamics for the worse. > > 2) In the end, this is an open source *community*; no amount of formal > feature requesting will have any material impact on what actually gets > implemented, because there isn't any central control. Wow. Being new to Open Source, this amazes me. > What gets > implemented is whatever individual contributors choose to work on, > either because they find it interesting or (in some cases) because > someone pays them to do something specific. Certainly, some > contributors pay attention to what's being requested, but I see no > reason to think that increasing the level of formality will help them. What happens when a person adds a feature or changes the architecture of the database that is perceived by some as incorrect or going in the wrong direction? Jon
Alvaro Herrera wrote: <blockquote cite="mid:20080114132644.GD4584@alvh.no-ip.org" type="cite"><pre wrap="">Mark Mielke wrote:</pre><blockquote type="cite"><pre wrap="">FYI, my triggers are perhaps 10 lines each, and I believe I have three triggers in the 1500 ms -> 1 ms example. I have a view and a summary table. I update the summary table from the view. In my opinion, this solution is very manageable given the 1500:1 performance improvement it grants me. </pre></blockquote><pre wrap=""> But you had to modify your queries. I would think that a materialized views implementation worth its salt would put the view to work on the original, unmodified queries. </pre></blockquote><br /> I might be slow today (everyday? :-) ) - but what do you mean bythis? The only difference between *_table and *_view is that *_table is the summary table and *_view is the view. The triggerson the tables the view is derived from select from *_view and update *_table. The queries remain unchanged exceptfor deciding whether to use *_table or *_view. Yes, syntactical sugar would make it prettier and more manageable -but I am confused as to how a syntactical sugar solution would improve performance, and I highly suspect it would not achievethe same performance benefit. As I said - I can make assumptions about how the base tables are updated. A generalizedsolution would not be able to make these assumptions?<br /><br /> For some further background - the base tablesare a mirror of accpac tables (augh!) from mssql. The view and summary table gathers information from 5 or so of thesetables including aggregates, conditionals, sub-selects (different queries to the same base tables) and deep joins. Perhapsmy imagination is too limited - but I don't see how it would be easy to make syntactical sugar for this and stillmaintain the performance I describe above. For about 30 lines of pl/pgsql and some application-side updates (again fromthe view to the summary table) in the synchronization script it seems acceptable.<br /><br /> Cheers,<br /> mark<br /><br/><pre class="moz-signature" cols="72">-- Mark Mielke <a class="moz-txt-link-rfc2396E" href="mailto:mark@mielke.cc"><mark@mielke.cc></a> </pre>
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Mon, Jan 14, 2008 at 07:28:52AM -0600, Roberts, Jon wrote: [...] > What happens when a person adds a feature or changes the architecture of the > database that is perceived by some as incorrect or going in the wrong > direction? (S)he gets to hold onto her/his own version. If there is enough power behind both variants, a fork might happen. Forks are not bad per se -- on the contrary, the variants can explore a wider area of design space. If they stay sufficiently open, they later steal ideas (and even code) from each other. Think genetic algorithms on a global scale ;-) But now it's definitely off-topic, so I'll shut up already. Regards - -- tomás -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFHi20GBcgs9XrR2kYRAppzAJ0VSKzWb8jjEQfMCwW1ycpCAyEZUQCdFmtz tDpywf0ywvk+3j0qI1GUHkI= =3TME -----END PGP SIGNATURE-----
Mark Mielke wrote: > Alvaro Herrera wrote: >> But you had to modify your queries. I would think that a materialized >> views implementation worth its salt would put the view to work on the >> original, unmodified queries. > > I might be slow today (everyday? :-) ) - but what do you mean by this? The > only difference between *_table and *_view is that *_table is the summary > table and *_view is the view. My point is that you should be able to query _table and the system should automatically use the view, without you saying so (except by initially creating them). At least if you had eagerly-updated materialized views. If you had lazy ones, I think those should be used explicitely only. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > owner@postgresql.org] On Behalf Of Alvaro Herrera > Sent: Monday, January 14, 2008 8:20 AM > To: Mark Mielke > Cc: Jean-Michel Pouré; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Postgresql Materialized views > > Mark Mielke wrote: > > Alvaro Herrera wrote: > > >> But you had to modify your queries. I would think that a materialized > >> views implementation worth its salt would put the view to work on the > >> original, unmodified queries. > > > > I might be slow today (everyday? :-) ) - but what do you mean by this? > The > > only difference between *_table and *_view is that *_table is the > summary > > table and *_view is the view. > > My point is that you should be able to query _table and the system > should automatically use the view, without you saying so (except by > initially creating them). > I agree! From a BI perspective, a materialized view is worthless if you have to re-write your query. There isn't a tool on the market that is smart enough to rewrite a query to a view because all other databases handle the rewriting internally. Jon
Roberts, Jon wrote: >> What gets >> implemented is whatever individual contributors choose to work on, >> either because they find it interesting or (in some cases) because >> someone pays them to do something specific. Certainly, some >> contributors pay attention to what's being requested, but I see no >> reason to think that increasing the level of formality will help them. >> > > What happens when a person adds a feature or changes the architecture of the > database that is perceived by some as incorrect or going in the wrong > direction? > > > If that's the general perception it doesn't get added to our source tree. It's very rare that it gets to anything like as formal as a vote. To avoid the possibility of people spending lots of time doing work which is ultimately not adopted, we strongly discourage ivory tower development. For major features especially, developers are encouraged to discuss early and often. But that's a different issue from which items people work on, which is very much a matter of individual choice, or at least something the community has little control over. cheers andrew
On Mon, Jan 14, 2008 at 08:30:53AM -0600, Roberts, Jon wrote: > > My point is that you should be able to query _table and the system > > should automatically use the view, without you saying so (except by > > initially creating them). > > > I agree! From a BI perspective, a materialized view is worthless if you > have to re-write your query. There isn't a tool on the market that is smart > enough to rewrite a query to a view because all other databases handle the > rewriting internally. I don't know about worthless, given that people are doing materialised views on postgres already. It is however a completely orthoginal problem. Someone needs to write the code to maintain such a view before you can even think about working on the planner. However, step 1 would be to get them onto the TODO list. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Those who make peaceful revolution impossible will make violent revolution inevitable. > -- John F Kennedy
> > But you had to modify your queries. I would think that a materialized > > views implementation worth its salt would put the view to work on the > > original, unmodified queries. > > > > I might be slow today (everyday? :-) ) - but what do you mean by this? > The only difference between *_table and *_view is that *_table is the > summary table and *_view is the view. The triggers on the tables the > view is derived from select from *_view and update *_table. The queries > remain unchanged except for deciding whether to use *_table or *_view. Traditionally materialized views exist, so that you do not need to code differently. Your queries still run on the detail table, but are silently answered by a suitable MV. The MV might have count + other aggregated columns grouped by some columns, and thus be able e.g. shortcircuit a "select count(*) from atab". The MV should be MVCC aware (have different values for different snapshots) and not substantially reduce possible concurrency of updates to the base table. > For some further background - the base tables are a mirror of accpac > tables (augh!) from mssql. The view and summary table gathers > information from 5 or so of these tables including aggregates, > conditionals, sub-selects (different queries to the same base tables) > and deep joins. Perhaps my imagination is too limited - but I > don't see > how it would be easy to make syntactical sugar for this and still > maintain the performance I describe above. For about 30 lines of > pl/pgsql and some application-side updates (again from the > view to the > summary table) in the synchronization script it seems acceptable. As long as you can formulate a normal view on the above statement, you should be able to tell the db to materialize that. A good MV feature would be able to use that MV regardless of whether you select from the view, or use a statement that the view is a generalization of. I think MV's where originally invented to boost benchmark results and thus had to operate on given sql to base tables. Andreas
"Zeugswetter Andreas ADI SD" <Andreas.Zeugswetter@s-itsolutions.at> writes: > Traditionally materialized views exist, so that you do not need to code differently. > Your queries still run on the detail table, but are silently answered > by a suitable MV. The MV might have count + other aggregated columns > grouped by some columns, and thus be able e.g. shortcircuit a > "select count(*) from atab". The MV should be MVCC aware (have different > values for different snapshots) and not substantially reduce possible > concurrency of updates to the base table. Note that you just raised the minimum bar for implementation of the feature by a couple orders of magnitude. We cannot automatically substitute an MV into queries unless this is guaranteed not to change the results. No lazy updates, MVCC transparency required, etc. regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 > My point is simply this: The lack of a clear formal process for feature > requests leads to this degradation in the conversation. Without a formalized > structure, the conversation devolves rapidly into an argument over semantics > and word choice. ... > There needs to be a way to evaluate the demand for a specific feature as > well as the benefits and the effort it will require. You could always start a page on the developer's wiki: http://developer.postgresql.org/ That would seem to be a good place to at least describe the problem in detail, show how you would like a feature to behave, and have people add the pros and cons of certain approaches. Certainly would be better to have a page to point to rather than trying to trawl through mailing archives (heck, the page could even mostly be a collection of such links). - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200801141104 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iD8DBQFHi4gvvJuQZxSWSsgRA66dAKCGCPBPDfTFDoizE0WDwXBzDK/W3ACg8dwZ 99OvuSU9PPmG6XDPPK2iQzA= =Xseg -----END PGP SIGNATURE-----
> > Traditionally materialized views exist, so that you do not need to code differently. > > Your queries still run on the detail table, but are silently answered > > by a suitable MV. The MV might have count + other aggregated columns > > grouped by some columns, and thus be able e.g. shortcircuit a > > "select count(*) from atab". The MV should be MVCC aware (have different > > values for different snapshots) and not substantially reduce possible > > concurrency of updates to the base table. > > Note that you just raised the minimum bar for implementation of the > feature by a couple orders of magnitude. We cannot automatically > substitute an MV into queries unless this is guaranteed not to change > the results. No lazy updates, MVCC transparency required, etc. Yes, unfortunately. But don't you also think that this is what makes it a worthwhile feature ? I mean, we do have the doityourself triggered summary table approach, which is not overly difficult to set up. It needs some thought and possibly design by the user to solve the most obvious concurrency issues, but it is doable. Imho MV could be separated in 2 parts: 1: materialized and MVCC aware views (only used explicitly) 2: add the smarts to rewrite sql Part 1 is already useful by itself since it provides a generic and easy solution to concurrency for the user. (probably nice and mindboggling, how to best implement that, though :-) The "lazy update" and non MVCC approach imho sounds too much like your "you can make it arbitrarily fast if it does not need to be correct" :-) Andreas
Roberts, Jon wrote: >> 2) In the end, this is an open source *community*; no amount of formal >> feature requesting will have any material impact on what actually gets >> implemented, because there isn't any central control. > > Wow. Being new to Open Source, this amazes me. Well it depends on what you mean by formal feature requesting. I will agree with Tom that we are a bunch of cats, you can not herd us. However feature requests do get done and do have material impact. However it usually takes more work than it should to actually get the feature accepted and or committed. >> someone pays them to do something specific. Certainly, some >> contributors pay attention to what's being requested, but I see no >> reason to think that increasing the level of formality will help them. > > What happens when a person adds a feature or changes the architecture of the > database that is perceived by some as incorrect or going in the wrong > direction? They can't add a feature. They can submit a feature for inclusion but if we don't like it we don't take it. It is the hope that people who are trying to submit follow this: http://www.postgresql.org/docs/faqs.FAQ_DEV.html Sincerely, Joshua D. Drake > > > > Jon > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster >
Mark Mielke wrote: > Mark Mielke wrote: >> Counts, because as we all know, PostgreSQL count(*) is slow, and in >> any case, my count(*) is not on the whole table, but on a subset. >> Doing this in a general way seems complex to me as it would need to be >> able to evaluate whether a given INSERT or UPDATE or one of the >> dependent tables would impact the WHERE clause for the materialized >> view, and it still wouldn't know which rows to add/update/remove >> without detailed analysis, so it would either be throwing out the >> entire materialized view and recreating it on INSERT or UPDATE (or >> deferring until the next query?) in which case it may be very slow, or >> it may be very complex. > > Bah. I forgot to add: The feature I've been wondering about (and not > necessarily looking for somebody else to do, although I don't think I > know the code well enough to do it at this point): > > Web applications often make the same queries over and over. While > memcache can be used to cache results, the memcache interface is > different from the web application interfere requiring complex code, and > as well, one loses the transaction guarantees as the memcache results > are not guaranteed to be up-to-date with the database. Regarding up-to-dateness note that there is a pgfoundry project that helps there. http://pgfoundry.org/projects/pgmemcache/ The other advantages of doing the caching outside the database is that (a) the memory for the cached results don't have to sit in the database machine, and (b) you can cache post-processed (rendered into HTML or gifs) fragments rather than raw data.
Added to TODO: * Add the ability to automatically create materialized views Right now materialized views require the user to create triggerson the main table to keep the summary table current. SQL syntax should be able to manager the triggers and summarytable automatically. A more sophisticated implementation would automatically retrieve from the summary table whenthe main table is referenced, if possible. I also thought this was on the TODO list. --------------------------------------------------------------------------- Zeugswetter Andreas ADI SD wrote: > > > > Traditionally materialized views exist, so that you do not need to code differently. > > > Your queries still run on the detail table, but are silently answered > > > by a suitable MV. The MV might have count + other aggregated columns > > > grouped by some columns, and thus be able e.g. shortcircuit a > > > "select count(*) from atab". The MV should be MVCC aware (have different > > > values for different snapshots) and not substantially reduce possible > > > concurrency of updates to the base table. > > > > Note that you just raised the minimum bar for implementation of the > > feature by a couple orders of magnitude. We cannot automatically > > substitute an MV into queries unless this is guaranteed not to change > > the results. No lazy updates, MVCC transparency required, etc. > > Yes, unfortunately. But don't you also think that this is what makes it > a worthwhile feature ? > > I mean, we do have the doityourself triggered summary table approach, > which is not overly difficult to set up. It needs some thought and possibly design > by the user to solve the most obvious concurrency issues, but it is doable. > > Imho MV could be separated in 2 parts: > 1: materialized and MVCC aware views (only used explicitly) > 2: add the smarts to rewrite sql > > Part 1 is already useful by itself since it provides a generic and easy > solution to concurrency for the user. (probably nice and mindboggling, how to > best implement that, though :-) > > The "lazy update" and non MVCC approach imho sounds too much like your > "you can make it arbitrarily fast if it does not need to be correct" :-) > > Andreas > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
"Zeugswetter Andreas ADI SD" <Andreas.Zeugswetter@s-itsolutions.at> writes: >> Note that you just raised the minimum bar for implementation of the >> feature by a couple orders of magnitude. > Yes, unfortunately. But don't you also think that this is what makes it > a worthwhile feature ? Well, my point is that taking automatic rewriting as a required feature has at least two negative impacts: * it rules out any form of lazy update, even though for many applications an out-of-date summary view would be acceptable for some purposes; * requiring MVCC consistency will probably hugely reduce the variety of views that we can figure out how to materialize, and cost performance even for the ones we can do at all. It's not zero-cost, even if you consider implementation effort and complexity as free (which I don't). regards, tom lane
Merlin Moncure wrote: > On Jan 12, 2008 4:19 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> On Sat, 2008-01-12 at 13:27 +0100, Jean-Michel Pouré wrote: >> >>> Please pick-up this important issue for developpers. There is no need to >>> concentrate on complex issues, when handling materialized views could >>> boost somme web apps. by a factor of 10 or more. >> It's more complex than you think, but the main reason was that HOT was a >> prerequisite for making summary tables work efficiently, which is only >> now just about to go live into 8.3 > > +1 I don't quite agree with that. HOT certainly speeds up UPDATEs on small tables, like you a summary table, but there's a lot of use cases like data warehousing, where the summary tables are not updated that often for the updates to become a bottleneck. > If you know how to write triggers, materialization techniques aren't > all that difficult. The real technical limitation was not lack of > materialization techniques (write triggers), but was dealing with the > mvcc penalty. Previously to HOT, for summary tables I would redirect > the trigger to insert to a 'roll up' table and move the data to the > summary on cron or from an application event. > > Materialized views are syntax sugar (but still very sweet). There's two things involved in materialized views: 1. Automatically updating the materialized view, when the tables change. This can be done with triggers, right now, but requires quite a bit of manual work to set up, especially with more complex views. 2. Using the materialized views to speed up existing queries. For example, if you have a materialized view on "SELECT COUNT(*) FROM foo", and someone issues the query "SELECT COUNT(*) FROM foo", the planner should automatically use the view to satisfy that. 1 is syntactic sugar, but 2 isn't. These are orthogonal features. Implementing just 1 without 2 would still be very useful, and in fact that seems to be what most people mean by materialized views. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2008-01-16 at 11:16 +0000, Heikki Linnakangas wrote: > Merlin Moncure wrote: > > On Jan 12, 2008 4:19 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > >> On Sat, 2008-01-12 at 13:27 +0100, Jean-Michel Pouré wrote: > >> > >>> Please pick-up this important issue for developpers. There is no need to > >>> concentrate on complex issues, when handling materialized views could > >>> boost somme web apps. by a factor of 10 or more. > >> It's more complex than you think, but the main reason was that HOT was a > >> prerequisite for making summary tables work efficiently, which is only > >> now just about to go live into 8.3 > > > > +1 > > I don't quite agree with that. HOT certainly speeds up UPDATEs on small > tables, like you a summary table, but there's a lot of use cases like > data warehousing, where the summary tables are not updated that often > for the updates to become a bottleneck. I should have said that was *my* reason for not doing it sooner. My thinking was if you load a 1000 rows and they all have the same key in your summary table then you'll be doing 1000 updates on a single row. With HOT that will cause the block to fill up and then we wipe out the previously updated rows, since they are inserted and deleted in same transaction. So all 1000 updates can occur without going beyond that single block. Much faster, no mess. Before HOT, large loads were worse, since there was no chance of VACUUMing the table between updates. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs wrote: > > My thinking was if you load a 1000 rows and they all have the same key > in your summary table then you'll be doing 1000 updates on a single row. > This is true because the statement level triggers are still rudimentary, with no OLD and NEW support. A single AFTER statement trigger execution could maintain the summary table with much less effort. Regards, Andreas
Tom Lane wrote: > Well, my point is that taking automatic rewriting as a required feature > has at least two negative impacts: > > * it rules out any form of lazy update, even though for many applications > an out-of-date summary view would be acceptable for some purposes; > > * requiring MVCC consistency will probably hugely reduce the variety of > views that we can figure out how to materialize, and cost performance > even for the ones we can do at all. > > It's not zero-cost, even if you consider implementation effort and > complexity as free (which I don't). There is one big additional advantage of automatic rewriting though, I believe. If we had the infrastructure to recognize that possibility of using a predefined (materialized) view for executing a query, we can also use that infrastructure to get implement a kind of optimizer hints. How? We'd need "statistics-materialized views", which don't materialize the full result of the view, but instead compute it's statistical properties (the same which ANALYZE computes for a table). When planning a query we can then substitute the guessed values for rowcount and friends of a subplan by the values computed for the corresponding statistics-materialized view. However, until someone figures out *how* to *actually* recognize that possibility of using a MV for executing a query, this is just hand-wavering of course... regards, Florian Pflug