Thread: knngist patch support
Hello,
I noticed this morning that the k nearest neighbor gist patch https://commitfest.postgresql.org/action/patch_view?id=230 was still being considered for inclusion in 9. Sadly, this feature appears to have been dropped from 9.
It seems to me that the functionality this brings is one of the most common use cases for mobile applications (or even web in some cases) that are location enabled. How many times do we find ourselves "Finding the 10 nearest restaurants" in our smart phones? Well, this patch solves exactly that in the most efficient manner.
Granted, one can currently solve this problem with PostgreSQL/PostGIS as it stands, however the performance improvements that one can gain for those types of (extremely common) queries leveraging the Gist enhancements are, well, exciting.
300x time improvements? Sounds great to me.
I would love if you guys could consider applying this patch for this upcoming release.
Best Regards,
- Ragi Burhum
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, Feb 10, 2010 at 04:49:59PM -0800, Ragi Y. Burhum wrote: > Hello, > > I noticed this morning that the k nearest neighbor gist patch > https://commitfest.postgresql.org/action/patch_view?id=230 was still being > considered for inclusion in 9. Sadly, this feature appears to have been > dropped from 9. This has been discussed recently on this list. Seems the patch would need more review to be considered stable. So it's the hard choice of letting the schedule for 9.0 slip or not letting this patch in. But some prerequisites will go in, that's the good news. (BTW: I tried to find this discussion in the Web archives, but had no luck. It's in my mailbox, though -- e.g. message-ID 603c8f071002070527j1dada7cdseb42e7cbc71bf71a@mail.gmail.com part of the long thread "Damage control mode", starting on Jan 8, 2010; this one mail is from Feb 7 -- but that might be me) Regards - -- tomás -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFLc5YoBcgs9XrR2kYRAoW3AJ94tYWPenLOjH4B4GHD9DCYSSWYOQCeOcoM RYDhINv+k9YeD23xFHyj9yw= =K1E0 -----END PGP SIGNATURE-----
This is very disgraceful from my point of view and reflects real problem in scheduling of CF. The patch was submitted Nov 23 2009, discussed and reworked Nov 25. Long holidays in December-January, probably are reason why there were no any movement on reviewing the patch. People with inspiration spent time to discuss rbtree, while it was clear, that rbtree is a minor issue. Now we have no review and great feature is missing. I understand that some healthy resistance is useful to let developers more accurate and discipline, but, hey, not dropping great feature ! I'd understand if developer is missing, or just not willing to contact, but I and Teodor are here and we readily answer any questions. I failed to find any documents about commitfest to understand if we already discussed all possible scenario of feature drop. If we say 'A', when started to formalize our development process instead of old way discuss&vote in -hackers, then we should say 'B' - formalize procedure and possible collision of interests. Oleg On Thu, 11 Feb 2010, tomas@tuxteam.de wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Wed, Feb 10, 2010 at 04:49:59PM -0800, Ragi Y. Burhum wrote: >> Hello, >> >> I noticed this morning that the k nearest neighbor gist patch >> https://commitfest.postgresql.org/action/patch_view?id=230 was still being >> considered for inclusion in 9. Sadly, this feature appears to have been >> dropped from 9. > > This has been discussed recently on this list. Seems the patch would > need more review to be considered stable. So it's the hard choice of > letting the schedule for 9.0 slip or not letting this patch in. > > But some prerequisites will go in, that's the good news. > > (BTW: I tried to find this discussion in the Web archives, but had no > luck. It's in my mailbox, though -- > > e.g. > > message-ID 603c8f071002070527j1dada7cdseb42e7cbc71bf71a@mail.gmail.com > > part of the long thread "Damage control mode", starting on Jan 8, 2010; > this one mail is from Feb 7 -- but that might be me) > > Regards > - -- tomЪЪs > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.6 (GNU/Linux) > > iD8DBQFLc5YoBcgs9XrR2kYRAoW3AJ94tYWPenLOjH4B4GHD9DCYSSWYOQCeOcoM > RYDhINv+k9YeD23xFHyj9yw= > =K1E0 > -----END PGP SIGNATURE----- > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes: > This is very disgraceful from my point of view and reflects real problem > in scheduling of CF. The patch was submitted Nov 23 2009, discussed and > reworked Nov 25. Long holidays in December-January, probably are reason why > there were no any movement on reviewing the patch. There was a scheduling problem all right, which was that this patch *did not make* the deadline for the November CF. The fact that it got any review at all in November was more than expected under the CF process. And I remind you that we stated more than once that we didn't want major feature patches to show up only at the last CF. If it had come from anyone other than you and Teodor, there would have not been even a moment's consideration of letting it into 9.0. My own feeling about it is that I much preferred the original proposal of a contrib module with little or no change to core code. I don't want to be changing core code for this at this late hour. If it were only touching GIST I'd be willing to rely on your and Teodor's expertise in that module, but it's not. It whacks around the planner, it makes questionable changes in the operator class structure, and the last version I saw hadn't any documentation whatever. It's not committable on documentation grounds alone, even if everybody was satisfied about the code. How do you feel about going back to the original contrib module for now and resubmitting the builtin version for 9.1? regards, tom lane
2010/2/11 Oleg Bartunov <oleg@sai.msu.su>: > This is very disgraceful from my point of view and reflects real problem > in scheduling of CF. The patch was submitted Nov 23 2009, discussed and > reworked Nov 25. Long holidays in December-January, probably are reason why > there were no any movement on reviewing the patch. People with So... I think the reason why there was no movement between November 25th and January 15th is because no CommitFest started between November 25th and January 15th. Had you submitted the patch on November 14th, you would have gotten a lot more feedback in November; I agree that we don't have a lot of formal documentation about the CommitFest process, but I would think that much would be pretty clear, but maybe not. The reason there was no movement after January 15th is because (1) I couldn't get anyone to volunteer to review it, except Mark Cave-Ayland who didn't actually do so (or anyway didn't post anything publicly), and (2) we were still working on rbtree. Personally, I am a little irritated about the whole way this situation has unfolded. I devoted a substantial amount of time over my Christmas vacation to patch review, and many of those patches went on to be committed. Some of the patches I reviewed were yours. I did not get paid one dime for any of that work. I expressed candidly, from the very beginning, that getting such a large patch done by the end of this CommitFest would likely be difficult, especially given that it had two precursor patches. In exchange for giving you my honest opinions about your patches two weeks before the scheduled start of the CommitFest, over my Christmas vacation, and for free, I got a long stream of complaints from you and others about how the process is unfair, and as nearly zero help making the prerequisite patches committable as it is possible for anyone to achieve. It regularly took 4-6 days for a new version of the patch to appear, and as often as not questions in my reviews were ignored for days, if not weeks. It took a LOT of iterations before my performance concerns were addressed; and I believe that process could have been done MUCH more quickly. Now, it is possible that as you are sitting there reading this email, you are thinking to yourself "well, your feedback didn't actually make that patch any better, so this whole thing is just pure obstructionism." I don't believe that's the case, but obviously I'm biased and everyone is entitled to their own opinion. What I can tell you for sure is that all of my reviewing was done with the best of motivations and in a sincere attempt to do the right thing. You may be right that January 15th was a bad time to start a CommitFest, although it's very unclear to me why that might be. At least in the US, the holidays are over long before January 15th, but we had a very small crop of reviewers this time around, and a number of them failed to review the patches they picked up, or did only a very cursory review. It might be mentioned that if you have concerns about getting your own patches reviewed, you might want to think about reviewing some patches by other people. Of the 60 patches currently in the 2010-01 CommitFest, I'm listed as a reviewer on 12 of them. Needless to say, if someone else had volunteered to do some or all of the review work on some of those patches, I would have had more time to work on other patches. ...Robert
I have to say that as a 3rd party observer it is quite obvious to understand why the PostgreSQL software is so good - people are very passionate about the work they are doing. However, in this instance, as a by-stander, it seems that there is a lot of energy being spent on pointing fingers. At the end, the only people that loose are users like me who would love to have a feature like this since it would literally make one of the most common types of spatial queries, for lack of better wording, ridiculously fast. I sincerely apologize if I triggered any kind of trouble by asking a questions about this feature. - Ragi On Wed, Feb 10, 2010 at 11:28 PM, Robert Haas <robertmhaas@gmail.com> wrote: > 2010/2/11 Oleg Bartunov <oleg@sai.msu.su>: >> This is very disgraceful from my point of view and reflects real problem >> in scheduling of CF. The patch was submitted Nov 23 2009, discussed and >> reworked Nov 25. Long holidays in December-January, probably are reason why >> there were no any movement on reviewing the patch. People with > > So... I think the reason why there was no movement between November > 25th and January 15th is because no CommitFest started between > November 25th and January 15th. Had you submitted the patch on > November 14th, you would have gotten a lot more feedback in November; > I agree that we don't have a lot of formal documentation about the > CommitFest process, but I would think that much would be pretty clear, > but maybe not. The reason there was no movement after January 15th is > because (1) I couldn't get anyone to volunteer to review it, except > Mark Cave-Ayland who didn't actually do so (or anyway didn't post > anything publicly), and (2) we were still working on rbtree. > > Personally, I am a little irritated about the whole way this situation > has unfolded. I devoted a substantial amount of time over my > Christmas vacation to patch review, and many of those patches went on > to be committed. Some of the patches I reviewed were yours. I did > not get paid one dime for any of that work. I expressed candidly, > from the very beginning, that getting such a large patch done by the > end of this CommitFest would likely be difficult, especially given > that it had two precursor patches. In exchange for giving you my > honest opinions about your patches two weeks before the scheduled > start of the CommitFest, over my Christmas vacation, and for free, I > got a long stream of complaints from you and others about how the > process is unfair, and as nearly zero help making the prerequisite > patches committable as it is possible for anyone to achieve. It > regularly took 4-6 days for a new version of the patch to appear, and > as often as not questions in my reviews were ignored for days, if not > weeks. It took a LOT of iterations before my performance concerns > were addressed; and I believe that process could have been done MUCH > more quickly. > > Now, it is possible that as you are sitting there reading this email, > you are thinking to yourself "well, your feedback didn't actually make > that patch any better, so this whole thing is just pure > obstructionism." I don't believe that's the case, but obviously I'm > biased and everyone is entitled to their own opinion. What I can tell > you for sure is that all of my reviewing was done with the best of > motivations and in a sincere attempt to do the right thing. > > You may be right that January 15th was a bad time to start a > CommitFest, although it's very unclear to me why that might be. At > least in the US, the holidays are over long before January 15th, but > we had a very small crop of reviewers this time around, and a number > of them failed to review the patches they picked up, or did only a > very cursory review. It might be mentioned that if you have concerns > about getting your own patches reviewed, you might want to think about > reviewing some patches by other people. Of the 60 patches currently > in the 2010-01 CommitFest, I'm listed as a reviewer on 12 of them. > Needless to say, if someone else had volunteered to do some or all of > the review work on some of those patches, I would have had more time > to work on other patches. > > ...Robert >
On Thu, 11 Feb 2010, Tom Lane wrote: > Oleg Bartunov <oleg@sai.msu.su> writes: >> This is very disgraceful from my point of view and reflects real problem >> in scheduling of CF. The patch was submitted Nov 23 2009, discussed and >> reworked Nov 25. Long holidays in December-January, probably are reason why >> there were no any movement on reviewing the patch. > > There was a scheduling problem all right, which was that this patch *did > not make* the deadline for the November CF. The fact that it got any > review at all in November was more than expected under the CF process. > And I remind you that we stated more than once that we didn't want major > feature patches to show up only at the last CF. If it had come from > anyone other than you and Teodor, there would have not been even a > moment's consideration of letting it into 9.0. there were several long threads, which I have no possibility to follow, so we relied on the wisdom of people, who can discuss. So, it's true we didn't track all nuances of our development schedule. We just developed. Looked on commitfest page we didn't find any summary and it's hard to understand what is the final word. In the old-good time we also discussed a lot, we release faster and we always had tolerance of time, since many things were not formalized, we were developers and reviewed each other. Now, the whole process of development redesigned to be more enterprize, but we still have problem with resources - developers, reviewers. And I don't see, how all changes try to solve this problem. We have problem with long release cycle, it's getting worse and worse, in spite of CF. The main problem is not in scheduling - we have little delta here, the problem is in human resources and unclear regulations make it worse. > > My own feeling about it is that I much preferred the original proposal > of a contrib module with little or no change to core code. I don't want > to be changing core code for this at this late hour. If it were only > touching GIST I'd be willing to rely on your and Teodor's expertise in > that module, but it's not. It whacks around the planner, it makes > questionable changes in the operator class structure, and the last aha, we originally submit contrib module, which didn't touch anything you mentioned, we improve stuff to follow discussion and now we are out of luck %( > version I saw hadn't any documentation whatever. It's not committable > on documentation grounds alone, even if everybody was satisfied about > the code. well, there is enough documentation to review patch. In my understanding this was always enough to submit code. User's documentation is depend on discussion and review and can be added later before releasing beta. > > How do you feel about going back to the original contrib module for now > and resubmitting the builtin version for 9.1? Hmm, one good thing is that rbtree seems ok for submisson. We need to discuss this, if it's good for PostGIS community. I'd not complain about this decision if it touch my interests only, I could live with closed-source patch. Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
On Thu, 11 Feb 2010, Robert Haas wrote: > 2010/2/11 Oleg Bartunov <oleg@sai.msu.su>: >> This is very disgraceful from my point of view and reflects real problem >> in scheduling of CF. The patch was submitted Nov 23 2009, discussed and >> reworked Nov 25. Long holidays in December-January, probably are reason why >> there were no any movement on reviewing the patch. People with > > So... I think the reason why there was no movement between November > 25th and January 15th is because no CommitFest started between > November 25th and January 15th. Had you submitted the patch on > November 14th, you would have gotten a lot more feedback in November; > I agree that we don't have a lot of formal documentation about the > CommitFest process, but I would think that much would be pretty clear, > but maybe not. The reason there was no movement after January 15th is > because (1) I couldn't get anyone to volunteer to review it, except > Mark Cave-Ayland who didn't actually do so (or anyway didn't post > anything publicly), and (2) we were still working on rbtree. > > Personally, I am a little irritated about the whole way this situation > has unfolded. I devoted a substantial amount of time over my Robert, please accept my public apology, if you feel I offense you. There are nothing against you. Your contribution is very important and I really don't understand why on the Earth you're not paid ! I remember discussion to paid you from our foundation. That's shame. Does nybody ever got support for development from our foundation ? > Christmas vacation to patch review, and many of those patches went on > to be committed. Some of the patches I reviewed were yours. I did > not get paid one dime for any of that work. I expressed candidly, > from the very beginning, that getting such a large patch done by the > end of this CommitFest would likely be difficult, especially given > that it had two precursor patches. In exchange for giving you my > honest opinions about your patches two weeks before the scheduled > start of the CommitFest, over my Christmas vacation, and for free, I > got a long stream of complaints from you and others about how the > process is unfair, and as nearly zero help making the prerequisite > patches committable as it is possible for anyone to achieve. It > regularly took 4-6 days for a new version of the patch to appear, and > as often as not questions in my reviews were ignored for days, if not > weeks. It took a LOT of iterations before my performance concerns > were addressed; and I believe that process could have been done MUCH > more quickly. Robert, it's very hard to marshal all developers, who are not-paid people with their regular duties and problems and their own interests in postgres. You just discovered we have long-long holidays in Russia, when people try to spend somewhere. I always beaten with Christmas in December, when I tried to communicate with business people un US. Earlier, we lived with this and our releases were faster. I'd not say, CF is a step back, but our system should have tolerance in time if we're open-source community, or go enterprize way - we are all paid, we follow business plan, ... etc. Something is really wrong, that's what I can say. > > Now, it is possible that as you are sitting there reading this email, > you are thinking to yourself "well, your feedback didn't actually make > that patch any better, so this whole thing is just pure > obstructionism." I don't believe that's the case, but obviously I'm > biased and everyone is entitled to their own opinion. What I can tell > you for sure is that all of my reviewing was done with the best of > motivations and in a sincere attempt to do the right thing. > > You may be right that January 15th was a bad time to start a > CommitFest, although it's very unclear to me why that might be. At > least in the US, the holidays are over long before January 15th, but > we had a very small crop of reviewers this time around, and a number > of them failed to review the patches they picked up, or did only a > very cursory review. It might be mentioned that if you have concerns > about getting your own patches reviewed, you might want to think about > reviewing some patches by other people. Of the 60 patches currently > in the 2010-01 CommitFest, I'm listed as a reviewer on 12 of them. > Needless to say, if someone else had volunteered to do some or all of > the review work on some of those patches, I would have had more time > to work on other patches. Robert, human resources are the main problem and, first of all, our system should work for developers ! If we will not understand each other and follow only some unclear rules, we'll lost current developers and will not attract new. We, probably, in our particulary case, will follow our original suggestion -just contrib module, but I concern about future. Now I have to think not just about algorithms and implementation, but about reviewer and current regulation. Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
On Thu, Feb 11, 2010 at 3:38 AM, Oleg Bartunov <oleg@sai.msu.su> wrote: > Robert, please accept my public apology, if you feel I offense you. There > are > nothing against you. Your contribution is very important and I really don't > understand why on the Earth you're not paid ! I remember discussion to paid > you from our foundation. That's shame. Does nybody ever got support for > development from our foundation ? No, I don't feel like you offended me. It's more that, from my point of view, it seems like all the things you're complaining about are things that you more or less have control over, or at least could have foreseen. I have only been involved in this project for a year and a half, so the CommitFest process is the only process that I know or understand. On the whole, I've found it to be a pretty good process. I get my patches in; I help other people get their patches in (and hopefully improve them along the way). It's particularly appealing when you're a non-committer, as it gives you a formal structure to make sure your work gets looked at. It seems that you're sort of frustrated with the system and the need to go through a process before committing a patch; and that you feel that the rules are unclear. I don't think it's a bad thing to go through a process before committing a patch, especially a large patch like knngist, but of course that's just my opinion. I agree that the fact that the rules are unclear is a problem, though I'm not sure what to do about it. I am not sure they are so unclear as you are making them out to be, but again, I'm biased by being a relative newcomer, as well as someone who has been in the middle of many of the process discussions. > Robert, human resources are the main problem and, first of all, > our system should work for developers ! If we will not understand each other > and follow only some unclear rules, we'll lost current developers and will > not attract new. We, probably, in our particulary case, will follow our > original suggestion -just contrib module, but I concern about future. Now I > have to think not just about algorithms and implementation, but about > reviewer and current regulation. IMHO, our system has to work for both developers and users, and it has to work for both committers and non-committers. ...Robert
On Thu, Feb 11, 2010 at 3:00 AM, Oleg Bartunov <oleg@sai.msu.su> wrote: >> version I saw hadn't any documentation whatever. It's not committable >> on documentation grounds alone, even if everybody was satisfied about >> the code. > > well, there is enough documentation to review patch. Where is there any documentation at all? There are no changes to doc/ at all; no README; and not even a lengthy comment block anywhere that I saw. Nor did the email in which the patch was submitted clearly lay out the design of the feature. > In my understanding > this was always enough to submit code. User's documentation is depend on > discussion and review and can be added later > before releasing beta. Several people have said this lately, but it doesn't match what I've seen of our practice over the last year and a half; Tom regularly boots patches that lack documentation (or necessary regression test updates). Sure, people often submit small patches without documentation thinking to fill it in later, but anything major pretty much has to have it, AFAICS. From my own point of view, I would never commit anything that lacked documentation, for fear of being asked to write it myself if the patch author didn't. Of course it's a bit different for committers, who can presumably be counted on to clean up their own mess, but I still think it's fair to expect at least some effort to be put into the docs before commit. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > It seems that you're sort of frustrated with the system and the need > to go through a process before committing a patch; I've been handling arround here for years (since 2005 or before) and I think there always was a process. The only change is it's getting more and more formal. But still not any clearer. It's good to try to keep the major releases one year apart, but until now we had some flexibility towards developpers. They could have their own agenda then appear with a big patch and it was getting considered. We never asked contributors to arrange for being able to find a sponsor, do the closed source version, prepare for publishing, and then send a patch in a timely maneer so that to ease the integration and release. Before there was a Commit Fest process, we took some weeks then months at the end of the cycle to consider what had been accumulated. The new process is there for giving more feedback to developpers, and is being considered now as a way to get better control about the release agenda. I'm not sure it's a good tool for that. I'm not sure insisting that much on the release schedule is a good idea. Once more making compromises is easy. What's hard and challenging is making *good* compromises. > IMHO, our system has to work for both developers and users, and it has > to work for both committers and non-committers. That's an easy goal to share. The question is how you get there without losing existing developpers and possibly attracting new developpers on the way. -- dim
On Thu, Feb 11, 2010 at 1:18 PM, Robert Haas <robertmhaas@gmail.com> wrote: > >> In my understanding >> this was always enough to submit code. User's documentation is depend on >> discussion and review and can be added later >> before releasing beta. > > Several people have said this lately, but it doesn't match what I've > seen of our practice over the last year and a half; Perhaps the confusion is that we often say not to worry about the quality of the English in the documentation. That's because it's easy for a reviewer to fix up the English but not so easy to figure out what you intend the behaviour to be. -- greg
On Thu, 11 Feb 2010, Robert Haas wrote: > On Thu, Feb 11, 2010 at 3:00 AM, Oleg Bartunov <oleg@sai.msu.su> wrote: >>> version I saw hadn't any documentation whatever. It's not committable >>> on documentation grounds alone, even if everybody was satisfied about >>> the code. >> >> well, there is enough documentation to review patch. > > Where is there any documentation at all? There are no changes to doc/ > at all; no README; and not even a lengthy comment block anywhere that > I saw. Nor did the email in which the patch was submitted clearly lay > out the design of the feature. Well, initial knngist announce http://archives.postgresql.org/pgsql-hackers/2009-11/msg01547.php isn't enough to review ? We made test data available to reproduce results, see http://www.sai.msu.su/~megera/wiki/2009-11-25 We are here and open to any reviewer's question. > >> In my understanding >> this was always enough to submit code. User's documentation is depend on >> discussion and review and can be added later >> before releasing beta. > > Several people have said this lately, but it doesn't match what I've > seen of our practice over the last year and a half; Tom regularly > boots patches that lack documentation (or necessary regression test > updates). Sure, people often submit small patches without > documentation thinking to fill it in later, but anything major pretty > much has to have it, AFAICS. From my own point of view, I would never > commit anything that lacked documentation, for fear of being asked to > write it myself if the patch author didn't. Of course it's a bit > different for committers, who can presumably be counted on to clean up > their own mess, but I still think it's fair to expect at least some > effort to be put into the docs before commit. I think nobody will spend his time to write sgml code for user's documentation for fear his patch will be rejected/moved/getting rewritten, so his time will be just wasted. Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
On Thu, 11 Feb 2010, Greg Stark wrote: > On Thu, Feb 11, 2010 at 1:18 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> >>> In my understanding >>> this was always enough to submit code. User's documentation is depend on >>> discussion and review and can be added later >>> before releasing beta. >> >> Several people have said this lately, but it doesn't match what I've >> seen of our practice over the last year and a half; > > Perhaps the confusion is that we often say not to worry about the > quality of the English in the documentation. That's because it's easy > for a reviewer to fix up the English but not so easy to figure out > what you intend the behaviour to be. English + SGML stuff. We usually provide information in plain text, posted in -hackers and published in my wiki. I don't remember, that there were no information about patches. Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
On Thu, 11 Feb 2010, Oleg Bartunov wrote: > On Thu, 11 Feb 2010, Tom Lane wrote: >> >> My own feeling about it is that I much preferred the original proposal >> of a contrib module with little or no change to core code. I don't want >> to be changing core code for this at this late hour. If it were only >> touching GIST I'd be willing to rely on your and Teodor's expertise in >> that module, but it's not. It whacks around the planner, it makes >> questionable changes in the operator class structure, and the last We splitted patch to make review easy, probably by several reviewers, since we touched several subsystems. http://archives.postgresql.org/message-id/4B4CCB9F.8080708@sigaev.ru Patch for planner is 5600 bytes long, not so big. > > aha, we originally submit contrib module, which didn't touch anything you > mentioned, we improve stuff to follow discussion and now we are out of luck > %( > >> version I saw hadn't any documentation whatever. It's not committable >> on documentation grounds alone, even if everybody was satisfied about >> the code. > > well, there is enough documentation to review patch. In my understanding this > was always enough to submit code. User's documentation is depend on > discussion and review and can be added later > before releasing beta. > >> >> How do you feel about going back to the original contrib module for now >> and resubmitting the builtin version for 9.1? > > Hmm, one good thing is that rbtree seems ok for submisson. We need to discuss > this, if it's good for PostGIS community. I'd not complain about this > decision > if it touch my interests only, I could live with closed-source patch. Contrib module is a big step backward and will produce compatibility problem, since we'll have to use awkward operation ><, which works different with/without index. Also, it'd be very difficult to add support to other contrib modules (btree_gist, pg_trgm). Links: Heikki complaint - http://archives.postgresql.org/message-id/4B0B8C30.2080400@enterprisedb.com Simon - http://archives.postgresql.org/message-id/1259115190.27757.11194.camel@ebony Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, Feb 11, 2010 at 03:19:14PM +0100, Dimitri Fontaine wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > It seems that you're sort of frustrated with the system and the need > > to go through a process before committing a patch; > > I've been handling arround here for years (since 2005 or before) and I > think there always was a process. The only change is it's getting more > and more formal. But still not any clearer. I didn't want to give the impression that I see the process as too heavy. Granted, knngist is a cool feature, and I'm a big fan of Oleg and Teodor (since they pulled the hstore trick :) Still I think Robert is doing a terrific job. The weight of the big patches has increased considerably in the last years, and without the heroic efforts of some people (adn Robert is definitely among them!), PostgreSQL's release process wouldn't have been as smooth. > It's good to try to keep the major releases one year apart, but until > now we had some flexibility towards developpers. They could have their > own agenda then appear with a big patch and it was getting considered. > > We never asked contributors to arrange for being able to find a > sponsor, do the closed source version, prepare for publishing, and then > send a patch in a timely maneer so that to ease the integration and > release. > > Before there was a Commit Fest process, we took some weeks then months > at the end of the cycle to consider what had been accumulated. Yes. But remember HOT. Things seem smoother nowadays (disclaimer: I'm just a bystander). > The new process is there for giving more feedback to developpers, and is > being considered now as a way to get better control about the release > agenda. I'm not sure it's a good tool for that. I'm not sure insisting > that much on the release schedule is a good idea. What's your proposal then? Allow more slippage in the release schedule? How much? When? > Once more making compromises is easy. What's hard and challenging is > making *good* compromises. Agreed Regards - -- tomás -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFLdHkNBcgs9XrR2kYRAhSsAJ4n899St+aPYkRA7XBX78vF/xZh9wCfZ3T0 h226KRarnFrEEIcOY+zBG9E= =X9fa -----END PGP SIGNATURE-----
2010/2/11 Oleg Bartunov <oleg@sai.msu.su>: > On Thu, 11 Feb 2010, Robert Haas wrote: > >> On Thu, Feb 11, 2010 at 3:00 AM, Oleg Bartunov <oleg@sai.msu.su> wrote: >>>> >>>> version I saw hadn't any documentation whatever. It's not committable >>>> on documentation grounds alone, even if everybody was satisfied about >>>> the code. >>> >>> well, there is enough documentation to review patch. >> >> Where is there any documentation at all? There are no changes to doc/ >> at all; no README; and not even a lengthy comment block anywhere that >> I saw. Nor did the email in which the patch was submitted clearly lay >> out the design of the feature. > > Well, initial knngist announce > http://archives.postgresql.org/pgsql-hackers/2009-11/msg01547.php > isn't enough to review ? We made test data available to reproduce results, > see http://www.sai.msu.su/~megera/wiki/2009-11-25 > We are here and open to any reviewer's question. Well, just for example, that doesn't document the changes you made to the opclass infrastructure, and the reasons for those decisions. Actually, I've been working on an email on that topic but haven't gotten around to finishing it. There's some description of the overall goal of the feature but not much about how you got there. Is it enough to review? Perhaps, but it would certainly be nice to have some more discussion of the overall design. IMHO, anyway. ...Robert
On Fri, 12 Feb 2010, Robert Haas wrote: > 2010/2/11 Oleg Bartunov <oleg@sai.msu.su>: > > On Thu, 11 Feb 2010, Robert Haas wrote: > > > >> On Thu, Feb 11, 2010 at 3:00 AM, Oleg Bartunov <oleg@sai.msu.su> wrote: > >>>> > >>>> version I saw hadn't any documentation whatever. =A0It's not committab= > le > >>>> on documentation grounds alone, even if everybody was satisfied about > >>>> the code. > >>> > >>> well, there is enough documentation to review patch. > >> > >> Where is there any documentation at all? =A0There are no changes to doc/ > >> at all; no README; and not even a lengthy comment block anywhere that > >> I saw. =A0Nor did the email in which the patch was submitted clearly lay > >> out the design of the feature. > > > > Well, initial knngist announce > > http://archives.postgresql.org/pgsql-hackers/2009-11/msg01547.php > > isn't enough to review ? We made test data available to reproduce results= > , > > see http://www.sai.msu.su/~megera/wiki/2009-11-25 > > We are here and open to any reviewer's question. > > Well, just for example, that doesn't document the changes you made to > the opclass infrastructure, and the reasons for those decisions. > Actually, I've been working on an email on that topic but haven't > gotten around to finishing it. There's some description of the > overall goal of the feature but not much about how you got there. Is > it enough to review? Perhaps, but it would certainly be nice to have > some more discussion of the overall design. IMHO, anyway. This is not fair,Robert. Everything was discussed in -hackers.I assume reviewer should follow discussion at least, he is a member of our community. Mailing list archive was/is/will our the best knowledge base. For example, regarding changes in the opclass infrastructure you complain, you can see your reply to Teodor's message http://archives.postgresql.org/message-id/603c8f070912292255u30e1983bi22ed5778bd2ce890@mail.gmail.com which contains description of amcanorderbyop flag. Frankly, I think we see here limit of our resources. Let me explain this. We splitted patch by several parts - 2 parts are about contrib modules (rather trivial), 1 - is a gist changes, 1 - planner changes and 1 part - some proc changes. The most serious are gist and planner patches. We develop GiST for many years and know almost everything there and could say that we're responsible for GiST. I don't know if anybody from -hackers could review our patch for planner better than Tom, but he is busy and will be busy. So, any serious feature, which touch planner doomed to be rejected because of lack of reviewer. We tried to find compromise for 9.0 (Tom suggests contrib module), but all variants are ugly and bring incompatibility in future. If there are no hackers willing/capable to review our patches, then, please, help us how to save neighbourhood search without incompatibility in future. Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
On Fri, Feb 12, 2010 at 3:44 PM, Oleg Bartunov <oleg@sai.msu.su> wrote: > This is not fair,Robert. Everything was discussed in -hackers.I assume > reviewer > should follow discussion at least, he is a member of our community. Mailing > list archive was/is/will our the best knowledge base. Dude, there's no fair or unfair here; I'm just telling you what I think. There are not six people who follow this mailing list more closely than I do, and when I started looking at this patch it took me two hours to figure out why you made those changes to the opclass machinery. It's not documented anywhere either in the patch or in the email in which the patch was submitted. That made it hard for me. If that makes me stupid, then I'm stupid, but then probably there are a lot of stupid people around here. > For example, regarding > changes in the opclass infrastructure you complain, you can see your reply > to Teodor's message > http://archives.postgresql.org/message-id/603c8f070912292255u30e1983bi22ed5778bd2ce890@mail.gmail.com > which contains description of amcanorderbyop flag. OK, so in one email message that is not the email in which you submitted the patch there is one sentence of explanation that I failed to remember six weeks later when I looked at the patch. > Frankly, I think we see here limit of our resources. Let me explain this. > We splitted patch by several parts - 2 parts are about contrib modules > (rather trivial), 1 - is a gist changes, 1 - planner changes and 1 part - > some proc changes. The most serious are gist and planner patches. We develop > GiST for many years and know almost everything there and could say that > we're > responsible for GiST. I don't know if anybody from -hackers could review our > patch for planner better than Tom, but he is busy and will be busy. > So, any serious feature, which touch planner doomed to be rejected because > of > lack of reviewer. I do think resource limitations play a role. For at least as long as I have been involved in the community, we have relied very heavily on Tom Lane to review nearly every patch and commit more than half of them. As our group of developers grows, that is unsustainable, which I believe to be part of the reason that -core just created four new committers; as well as part of the reason for the CommitFest process, which tries to enlist non-committer reviewers. But none of us are Tom Lane. The part of your patch that took me two hours to figure out probably would have taken Tom twenty minutes (maybe less). But even if we assume that I am as smart as Tom Lane and that I will spend as much time working on PostgreSQL as Tom Lane, both of which may well be false, I won't know the code base as well as he does now until ~2020. The only way we can get past that is, first, by splitting up the work across as many people as possible, and second, by making it as easy as possible for the reviewer to understand what the code is about. And at least if the reviewer is me, *documentation helps*. I'm going to respond to the part of this email that's about moving this patch forward with a separate email. ...Robert
On Fri, Feb 12, 2010 at 3:44 PM, Oleg Bartunov <oleg@sai.msu.su> wrote: > We tried to find compromise for 9.0 (Tom suggests contrib module), but all > variants are ugly and bring incompatibility in future. If there are no > hackers > willing/capable to review our patches, then, please, help us how to save > neighbourhood search without incompatibility in future. Here's what I've gathered from my read through this patch. Let me know if it's right: In CVS HEAD, if an AM is marked amcanorder, that means that the index defines some kind of intrinsic order over the tuples so that, from a given starting point, it makes sense to talk about scanning either forward or backward. GIST indices don't have an intrinsic notion of ordering, but the structure of the index does lend itself to finding index tuples that are "nearby" to some specified point. So this patch wants to introduce the notion of an AM that is marked amcanorderbyop to accelerate queries that order by <indexed column> <op> <constant>. To do that, we need some way of recognizing which operators are candidates for this optimization. This patch implements that by putting the relevant operators into the operator class. That requires relaxing the rule that operator class operators must return bool; so instead when the AM is marked amcanorderbyop we allow the operator class operators to return any type with a default btree operator class. Does that sound right? Tom remarked in another email that he wasn't too happy with the opclass changes. They seem kind of grotty to me, too, but I don't immediately have a better idea. My fear is that there may be places in the code that rely on opclass operators only ever returning bool, and that changing that may break things. It also feels like allowing non-bool-returning opclass members only for this one specific case is kind of a hack: is this an instance of some more general problem that we ought to be solving in some more general way? Not sure. In any case, it seems to me that figuring out how we're going to solve the problem of marking the operators (or otherwise identifying the expression trees) that are index-optimizable is the key issue for this patch. If we can agree on that, the rest should be a SMOP. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > Tom remarked in another email that he wasn't too happy with the > opclass changes. They seem kind of grotty to me, too, but I don't > immediately have a better idea. My fear is that there may be places > in the code that rely on opclass operators only ever returning bool, > and that changing that may break things. It also feels like allowing > non-bool-returning opclass members only for this one specific case is > kind of a hack: is this an instance of some more general problem that > we ought to be solving in some more general way? Not sure. Yes, that's exactly what I didn't like about it: the proposed changes create confusion between opclass members that represent index-optimizable WHERE conditions and those that represent index-optimizable ORDER BY conditions. You can get away with that to some extent as long as you assume that the latter type of operator never yields boolean and so can never appear at the top level of WHERE. But that assumption sucks. There are plenty of cases where people ORDER BY boolean values, so who's to argue that we will never want an operator returning boolean in the second category? And as soon as you put it in, the planner is going to think that it's also a potential index-qualification operator, which is something the AM might or might not be prepared to support. I think this is really unacceptable and there needs to be some cleaner way of distinguishing the two types of operators. Possibly a couple of boolean columns added to pg_amop (you'd need two because there are three possible states, in case an operator really can serve both purposes in a particular opclass). Or maybe we should do something else. But ignoring the issue won't do. Maybe a more general idea would be to invent "categories" of opclass members, where the only existing category is "index search qualifier", and these new knngist thingies are another, and maybe plus and minus for window function ranges are a third. But I'm not sure what you do if one operator can be in more than one category. regards, tom lane
On Fri, Feb 12, 2010 at 7:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Maybe a more general idea would be to invent "categories" of opclass > members, where the only existing category is "index search qualifier", > and these new knngist thingies are another, and maybe plus and minus for > window function ranges are a third. But I'm not sure what you do if one > operator can be in more than one category. Well, if you were willing to change pg_amop so that the key was (amopfamily, amoplefttype, amoprighttype, amopcategory) rather than just (amopfamily, amoplefttype, amoprighttype), the issue of what to do if an operator can be in more than one category becomes moot. You just specify the operator more than once if need be. If you don't want the amopcategory to be part of the key, then you just need to define it as a type that can handle multiple values yet has a fast membership test. A character string of some type would be flexible - you could use any single character as a category identifier - but given that we don't expect many categories and we do want good performance, it seems like an int4 used as a bitmap field would be more appropriate. I think the first approach is better, partly because it seems to lend itself to a cleaner syntax for CREATE OPERATOR CLASS. Something like: CREATE OPERATOR CLASS blah blah AS OPERATOR 3 &&, OPERATOR ORDER 15 <->; ...Robert
On Fri, Feb 12, 2010 at 9:10 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Feb 12, 2010 at 7:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Maybe a more general idea would be to invent "categories" of opclass >> members, where the only existing category is "index search qualifier", >> and these new knngist thingies are another, and maybe plus and minus for >> window function ranges are a third. But I'm not sure what you do if one >> operator can be in more than one category. > > Well, if you were willing to change pg_amop so that the key was > (amopfamily, amoplefttype, amoprighttype, amopcategory) rather than > just (amopfamily, amoplefttype, amoprighttype), the issue of what to > do if an operator can be in more than one category becomes moot. You > just specify the operator more than once if need be. Except I'm full of it, because amopstrategy is in there too. Hmm. And that's unfortunate because the syscache machinery is limited to four columns as lookup keys. This is a bit ugly, but one idea that occurs to me is to change amopstrategy from int16 to int32. Internally, we'll treat the low 16 bits as the strategy number and the high 16 bits as the strategy category, with strategy category 0 being "index search qualifier". ...Robert
Robert Haas <robertmhaas@gmail.com> writes: >> Well, if you were willing to change pg_amop so that the key was >> (amopfamily, amoplefttype, amoprighttype, amopcategory) rather than >> just (amopfamily, amoplefttype, amoprighttype), the issue of what to >> do if an operator can be in more than one category becomes moot. �You >> just specify the operator more than once if need be. Yeah, that occurred to me too after sending my earlier email. > Except I'm full of it, because amopstrategy is in there too. Hmm. > And that's unfortunate because the syscache machinery is limited to > four columns as lookup keys. Ugh. Still, we could certainly change the 4-key limit to 5, though it might be a tad tedious to go round and edit all the SearchSysCache and related calls. Maybe while we were at it we should change them to SearchSysCache1, SearchSysCache2, etc to not have the limit hardwired textually in quite so many places... > This is a bit ugly, but one idea that occurs to me is to change > amopstrategy from int16 to int32. Internally, we'll treat the low 16 > bits as the strategy number and the high 16 bits as the strategy > category, with strategy category 0 being "index search qualifier". Hm, yeah that would work, but I agree it's ugly. While thinking about different possible solutions here: one of the things that was worrying me is that for cases where the same operator can serve in more than one role, it might have to have either the same opstrategy or different ones in different roles, depending on how the AM has assigned strategy numbers. The method with an extra index column side-steps that nicely since there are two unrelated pg_amop entries. If there's only one entry then you lose if you need different strategies. Robert's use-the-high-bits method works too, since there would still be two separate entries, but some other possible representations are eliminated by that worry. regards, tom lane
On Fri, Feb 12, 2010 at 9:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >>> Well, if you were willing to change pg_amop so that the key was >>> (amopfamily, amoplefttype, amoprighttype, amopcategory) rather than >>> just (amopfamily, amoplefttype, amoprighttype), the issue of what to >>> do if an operator can be in more than one category becomes moot. You >>> just specify the operator more than once if need be. > > Yeah, that occurred to me too after sending my earlier email. > >> Except I'm full of it, because amopstrategy is in there too. Hmm. >> And that's unfortunate because the syscache machinery is limited to >> four columns as lookup keys. > > Ugh. Still, we could certainly change the 4-key limit to 5, though it > might be a tad tedious to go round and edit all the SearchSysCache and > related calls. Maybe while we were at it we should change them to > SearchSysCache1, SearchSysCache2, etc to not have the limit hardwired > textually in quite so many places... Maybe. It sounds sort of awful though; and there's probably a distributed performance penalty involved >> This is a bit ugly, but one idea that occurs to me is to change >> amopstrategy from int16 to int32. Internally, we'll treat the low 16 >> bits as the strategy number and the high 16 bits as the strategy >> category, with strategy category 0 being "index search qualifier". > > Hm, yeah that would work, but I agree it's ugly. On further review there's a serious problem with this idea: pg_amop_opr_fam_index. > While thinking about different possible solutions here: one of the > things that was worrying me is that for cases where the same operator > can serve in more than one role, it might have to have either the same > opstrategy or different ones in different roles, depending on how the AM > has assigned strategy numbers. The method with an extra index column > side-steps that nicely since there are two unrelated pg_amop entries. > If there's only one entry then you lose if you need different > strategies. Robert's use-the-high-bits method works too, since there > would still be two separate entries, but some other possible > representations are eliminated by that worry. OK, here's another idea. Let's just add a new column to pg_amop called amoporderstrategy. If an operator can only be used for one purpose or the other, we'll set the other value to -1. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > OK, here's another idea. Let's just add a new column to pg_amop > called amoporderstrategy. If an operator can only be used for one > purpose or the other, we'll set the other value to -1. ... problem for unique index, no? regards, tom lane
On Fri, Feb 12, 2010 at 10:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> OK, here's another idea. Let's just add a new column to pg_amop >> called amoporderstrategy. If an operator can only be used for one >> purpose or the other, we'll set the other value to -1. > > ... problem for unique index, no? Dang. What a pain in the tail. I guess we could make that column nullable, but that's got it's own fair share of problems. Is the only reasonable way to solve this problem a new catalog? That's not tremendously scalable, but it's starting to feel like the only way of solving this problem that doesn't involve massive surgery. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Feb 12, 2010 at 9:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> This is a bit ugly, but one idea that occurs to me is to change >>> amopstrategy from int16 to int32. �Internally, we'll treat the low 16 >>> bits as the strategy number and the high 16 bits as the strategy >>> category, with strategy category 0 being "index search qualifier". >> >> Hm, yeah that would work, but I agree it's ugly. > On further review there's a serious problem with this idea: > pg_amop_opr_fam_index. I think that's soluble though. The reason that index exists is to enforce the rule that an operator can stand in only one relationship to an opfamily. In this design the natural rule would be "one relationship per role", ie, the unique key would become (operator, category, opfamily). However, that does make it even uglier to have category shoehorned in as part of a different field. Back to wanting 5-key syscaches ... regards, tom lane
On Fri, Feb 12, 2010 at 10:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Feb 12, 2010 at 9:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Robert Haas <robertmhaas@gmail.com> writes: >>>> This is a bit ugly, but one idea that occurs to me is to change >>>> amopstrategy from int16 to int32. Internally, we'll treat the low 16 >>>> bits as the strategy number and the high 16 bits as the strategy >>>> category, with strategy category 0 being "index search qualifier". >>> >>> Hm, yeah that would work, but I agree it's ugly. > >> On further review there's a serious problem with this idea: >> pg_amop_opr_fam_index. > > I think that's soluble though. The reason that index exists is to > enforce the rule that an operator can stand in only one relationship > to an opfamily. In this design the natural rule would be "one > relationship per role", ie, the unique key would become > (operator, category, opfamily). > > However, that does make it even uglier to have category shoehorned in as > part of a different field. Back to wanting 5-key syscaches ... Sigh. ...Robert
>> However, that does make it even uglier to have category shoehorned in as >> part of a different field. Back to wanting 5-key syscaches ... > Sigh. I see your point. May be it's better to introduce new system table? pg_amorderop to store ordering operations for index. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Teodor Sigaev <teodor@sigaev.ru> writes: > I see your point. May be it's better to introduce new system table? pg_amorderop > to store ordering operations for index. We could, but that approach doesn't scale to wanting more categories in the future --- you're essentially decreeing that every new category of opclass-associated operator will require a new system catalog, along with all the infrastructure needed for that. That guarantees that the temptation to take shortcuts will remain high. If we didn't already have the plus/minus-for-WINDOW-RANGE example staring us in the face, I might think that an extensible solution wasn't needed here ... but we do so I think we really need to allow for multiple categories in some form. Now on the flip side, adding new catalogs would allow flexibility to add columns that aren't there in pg_amop, which could come in handy if some future category requires auxiliary data that's not needed for the existing category of index search operators. But since the two examples we have at hand don't appear to need any extra data, this argument isn't real strong. regards, tom lane
On Sat, Feb 13, 2010 at 1:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Teodor Sigaev <teodor@sigaev.ru> writes: >> I see your point. May be it's better to introduce new system table? pg_amorderop >> to store ordering operations for index. > > We could, but that approach doesn't scale to wanting more categories > in the future --- you're essentially decreeing that every new category > of opclass-associated operator will require a new system catalog, > along with all the infrastructure needed for that. That guarantees > that the temptation to take shortcuts will remain high. Yeah. PFA a patch to allow 5-key syscaches. ...Robert
Attachment
On Sat, Feb 13, 2010 at 01:31:44PM -0500, Robert Haas wrote: > On Sat, Feb 13, 2010 at 1:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Teodor Sigaev <teodor@sigaev.ru> writes: > >> I see your point. May be it's better to introduce new system table? pg_amorderop > >> to store ordering operations for index. > > > > We could, but that approach doesn't scale to wanting more categories > > in the future --- you're essentially decreeing that every new category > > of opclass-associated operator will require a new system catalog, > > along with all the infrastructure needed for that. That guarantees > > that the temptation to take shortcuts will remain high. > > Yeah. PFA a patch to allow 5-key syscaches. (Realizing I'm a lurker in this conversation, and hoping not to ask irritating questions) Do we need to rename SearchSysCache et al. to SearchSysCache1, etc.? It seems to me that requires changes to all kinds of software without any real need. The four lines of PL/LOLCODE that inspired this thought aren't themselves a great burden, but when combined with everyone else using SearchSysCache already... -- Joshua Tolley / eggyknap End Point Corporation http://www.endpoint.com
On Sat, Feb 13, 2010 at 2:03 PM, Joshua Tolley <eggyknap@gmail.com> wrote: > On Sat, Feb 13, 2010 at 01:31:44PM -0500, Robert Haas wrote: >> On Sat, Feb 13, 2010 at 1:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> > Teodor Sigaev <teodor@sigaev.ru> writes: >> >> I see your point. May be it's better to introduce new system table? pg_amorderop >> >> to store ordering operations for index. >> > >> > We could, but that approach doesn't scale to wanting more categories >> > in the future --- you're essentially decreeing that every new category >> > of opclass-associated operator will require a new system catalog, >> > along with all the infrastructure needed for that. That guarantees >> > that the temptation to take shortcuts will remain high. >> >> Yeah. PFA a patch to allow 5-key syscaches. > > (Realizing I'm a lurker in this conversation, and hoping not to ask irritating > questions) Do we need to rename SearchSysCache et al. to SearchSysCache1, > etc.? It seems to me that requires changes to all kinds of software without > any real need. The four lines of PL/LOLCODE that inspired this thought aren't > themselves a great burden, but when combined with everyone else using > SearchSysCache already... If we want to allow 5-key syscaches, we have to add an extra parameter to SearchSysCache and friends. So everyone caller of SearchSysCache is going to break. (Well, unless we instead leave SearchSysCache alone and add SearchSysCacheExtended or similar; but that's not really project style.) The point of the macros is that if someone some day decides they want to allow SIX-key syscaches, we don't have to break all those people again; we can just update the macro definitions. ...Robert
2010/2/14 Tom Lane <tgl@sss.pgh.pa.us>: > Teodor Sigaev <teodor@sigaev.ru> writes: >> I see your point. May be it's better to introduce new system table? pg_amorderop >> to store ordering operations for index. > > We could, but that approach doesn't scale to wanting more categories > in the future --- you're essentially decreeing that every new category > of opclass-associated operator will require a new system catalog, > along with all the infrastructure needed for that. That guarantees > that the temptation to take shortcuts will remain high. > > If we didn't already have the plus/minus-for-WINDOW-RANGE example > staring us in the face, I might think that an extensible solution > wasn't needed here ... but we do so I think we really need to allow > for multiple categories in some form. I'm not so familiar with knn gist topic, but I think there are not enough fundamentals yet. Because we started from defining operators if they can be "ordered or indexed", general operator semantics is over them, not adding columns to pg_amop. It will be something like datatype-independent human word: "add and subtract", "take distance", "ordered". And we don't have time to invent such new world. Regards. -- Hitoshi Harada
2010/2/14 Robert Haas <robertmhaas@gmail.com>: > If we want to allow 5-key syscaches, we have to add an extra parameter > to SearchSysCache and friends. So everyone caller of SearchSysCache > is going to break. (Well, unless we instead leave SearchSysCache > alone and add SearchSysCacheExtended or similar; but that's not really > project style.) DirectFunctionCall1/2/3...? -- Hitoshi Harada
Hitoshi Harada <umi.tanuki@gmail.com> writes: > And we don't have time to invent such new world. Huh? This is all discussion for 9.1 (or even later). There's plenty of time. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, Feb 13, 2010 at 2:03 PM, Joshua Tolley <eggyknap@gmail.com> wrote: >> (Realizing I'm a lurker in this conversation, and hoping not to ask irritating >> questions) Do we need to rename SearchSysCache et al. to SearchSysCache1, >> etc.? It seems to me that requires changes to all kinds of software without >> any real need. The four lines of PL/LOLCODE that inspired this thought aren't >> themselves a great burden, but when combined with everyone else using >> SearchSysCache already... > If we want to allow 5-key syscaches, we have to add an extra parameter > to SearchSysCache and friends. So everyone caller of SearchSysCache > is going to break. (Well, unless we instead leave SearchSysCache > alone and add SearchSysCacheExtended or similar; but that's not really > project style.) It's fair to stop and think about how this would affect external packages and what they'd have to do to support building against both new and old-style backends. Reflecting on it, it seems to me that the separate SearchSysCacheN() macros are obviously cleaner and closer to preferred project style than the existing code with all those explicit zeroes. So I think there's a case for migrating to that style even if we didn't have a concern about the max number of lookup keys. What would probably be the recommended solution for backwards-compatible source code is to convert the actual calls to new style, and then provide a block of macro definitions along the lines of #if CATALOG_VERSION_NO < something #define SearchSysCache1(...) SearchSysCache(..., 0, 0, 0) #define SearchSysCache2(...) SearchSysCache(..., 0, 0) etc So that seems okay so far. What's not clear from this is whether we should reserve SearchSysCache as an equivalent to SearchSysCache4() and therefore name the underlying function something different. That would allow being lazy about converting individual calls over. I'm inclined to think not, and here's the reason: with the old call style it was never apparent from the callee side how many arguments the caller intended to be valid, and so for example we couldn't have an Assert that checked it was right. I'm not sure if we should go so far as to add an explicit nkeys argument to SearchSysCache, but it's worth thinking about at least. regards, tom lane
On Sat, Feb 13, 2010 at 3:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > What would probably be the recommended solution for backwards-compatible > source code is to convert the actual calls to new style, and then > provide a block of macro definitions along the lines of > > #if CATALOG_VERSION_NO < something > #define SearchSysCache1(...) SearchSysCache(..., 0, 0, 0) > #define SearchSysCache2(...) SearchSysCache(..., 0, 0) > etc > > So that seems okay so far. Where would we put this block of code? ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, Feb 13, 2010 at 3:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> What would probably be the recommended solution for backwards-compatible >> source code is to convert the actual calls to new style, and then >> provide a block of macro definitions along the lines of >> >> #if CATALOG_VERSION_NO < something >> #define SearchSysCache1(...) SearchSysCache(..., 0, 0, 0) >> #define SearchSysCache2(...) SearchSysCache(..., 0, 0) >> etc >> >> So that seems okay so far. > Where would we put this block of code? *We* wouldn't. This is a solution that might be used by pg_foundry projects for instance. Some people want to distribute loadable modules for which the same source code can be compiled against multiple PG releases. They'd stick those macro definitions, conditionalized as above, into their own source file. regards, tom lane
On Sat, Feb 13, 2010 at 2:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hitoshi Harada <umi.tanuki@gmail.com> writes: >> And we don't have time to invent such new world. > > Huh? This is all discussion for 9.1 (or even later). There's > plenty of time. Just to be clear, I was intending this patch, at least, to be applied now. I actually think there's a good argument that we should do at least this much for 9.0, namely that now is probably the time when there are the fewest outstanding patches that will be broken by this. If we try to apply this for the first 9.1 CommitFest, then (1) it'll have to be completely redone and (2) it'll force massive rebasing. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > Just to be clear, I was intending this patch, at least, to be applied > now. I actually think there's a good argument that we should do at > least this much for 9.0, namely that now is probably the time when > there are the fewest outstanding patches that will be broken by this. > If we try to apply this for the first 9.1 CommitFest, then (1) it'll > have to be completely redone and (2) it'll force massive rebasing. What I think we should do is not change SearchSysCache for 9.0, but provide the SearchSysCacheN macros as syntactic sugar, and go around and change (at least most of) the call sites to use the macros. This is clearly cleaner source code, and with that method we'll still be ABI-compatible in 9.0 for anybody who doesn't fix their source right away. regards, tom lane
> Reflecting on it, it seems to me that the separate SearchSysCacheN() > macros are obviously cleaner and closer to preferred project style than > the existing code with all those explicit zeroes. So I think there's > a case for migrating to that style even if we didn't have a concern > about the max number of lookup keys. http://www.sigaev.ru/misc/syscache-0.1.gz Patch enlarges number of arguments of SearchSysCache and friends to 5 and introduces SearchSysCacheN (and friends) macroses. All SearchSysCache/SearchSysCacheCopy/SearchSysCacheExists/SearchSysCacheList/GetSysCacheOid calls are converted into calling macroses. For compatibility, we could rename SearchSysCache into SearchSysCacheSomething and provide SearchSysCache function for old code. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Tom Lane <tgl@sss.pgh.pa.us> writes: > Teodor Sigaev <teodor@sigaev.ru> writes: >> I see your point. May be it's better to introduce new system table? pg_amorderop >> to store ordering operations for index. > > We could, but that approach doesn't scale to wanting more categories > in the future --- you're essentially decreeing that every new category > of opclass-associated operator will require a new system catalog, > along with all the infrastructure needed for that. That guarantees > that the temptation to take shortcuts will remain high. On the other hand here's how the fine manual define an operator class: Operator classes are so called because one thing they specify is the set of WHERE-clause operators that can be used withan index (i.e., can be converted into an index-scan qualification). An operator class can also specify some support proceduresthat are needed by the internal operations of the index method, but do not directly correspond to any WHERE-clauseoperator that can be used with the index. > If we didn't already have the plus/minus-for-WINDOW-RANGE example > staring us in the face, I might think that an extensible solution > wasn't needed here ... but we do so I think we really need to allow > for multiple categories in some form. Agreed. And we're talking about the basic operators + and -, and about a distance or metric operator. Those remind me of groups and etc. http://en.wikipedia.org/wiki/Group_(mathematics) http://en.wikipedia.org/wiki/Abelian_group http://en.wikipedia.org/wiki/Ring_(mathematics)http://en.wikipedia.org/wiki/Metric_space A group is defined by a data type, an operator (+), and an identity element. If the group is abelian, the given operation is associative and commutative, and each element of the data type has an inverse. Then there's the metric space which is a data type with a distance function. This function must be non-negative, commutative, etc. So I guess what we need here is a Operator Group to define our plus and minus operators, and the fact that it's a group says (by convention, like the total ordering of a BTree) that the + is commutative and the - its opposite. Or we have an "option" called abelian for specifying the commutativity? Then I'm for tricking the maths to say that our notion of a metric space will apply only to our notion of an Operator Group, unless someone really insists on separating the concepts. So an Operator Group defines 2 strategies that you have to attach to your + and - operators, and a support function which is the distance, and optional. Now, we want to have Groups able to work on more than one datatype, for example talking about the distance of a point to a circle or a box. So we need an Operator Group Family, don't we? How much does all that help in this case? -- dim PS: I'm sad to have this discussion after having read it's too late for 9.0. The KNN-Gist stuff with extended ORDER BY indexing was too good to be true.
On Sat, 2010-02-13 at 13:28 -0500, Tom Lane wrote: > If we didn't already have the plus/minus-for-WINDOW-RANGE example > staring us in the face, I might think that an extensible solution > wasn't needed here ... but we do so I think we really need to allow > for multiple categories in some form. Is this also an opportunity to provide the infrastructure to assign meaning to operators? Related discussion: http://archives.postgresql.org/pgsql-hackers/2009-10/msg01443.php Regards,Jeff Davis
On Sat, Feb 13, 2010 at 3:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Just to be clear, I was intending this patch, at least, to be applied >> now. I actually think there's a good argument that we should do at >> least this much for 9.0, namely that now is probably the time when >> there are the fewest outstanding patches that will be broken by this. >> If we try to apply this for the first 9.1 CommitFest, then (1) it'll >> have to be completely redone and (2) it'll force massive rebasing. > > What I think we should do is not change SearchSysCache for 9.0, > but provide the SearchSysCacheN macros as syntactic sugar, and > go around and change (at least most of) the call sites to use the > macros. This is clearly cleaner source code, and with that method > we'll still be ABI-compatible in 9.0 for anybody who doesn't fix their > source right away. Let me just think out loud for a second about the roadmap for knngist at this point. 1. Add support for 5-key syscaches using either my patch or Teodor's or some third version that may crop up. This could possibly be split into one version that converts everything to using macros (for 9.0) and a second version that actually increases the maximum number of keys from 4 to 5 (for 9.1). 2. Modify pg_amop by adding a new column amopcategory, probably either int2 or maybe even just char. Add this key to both unique indices on pg_amop. Modify CREATE OPERATOR CLASS (and maybe ALTER OPERATOR FAMILY?) to support some new syntax to add "order-by-op" operators to opclasses/families (I'm thinking OPERATOR ORDER <strategy-number> <operator> vs. the usual OPRATOR <strategy-number> <operator>). Also add an amcanorderbyop flag to pg_am. Teach the planner to generate an index scan path when amcanorderbyop is set on the AM and a suitable order-by-op clause is present. 3. Modify GIST to use the infrastructure introduced by #2 for the new <-> operator. I'm not prepared to endorse doing #3 in core for 9.0, but I wonder if it would be feasible to think about doing #1 and #2 and putting something into contrib for #3. The last round of knngist patches had a "planner" patch which basically did #2 (with the problems previously discussed, to which we now seem to have a fairly clean and straightforward solution) and then the "itself" and "proc" patches did #3. The planner patch is actually quite small, so maybe it's not out of the question to think that it might go in, with some suitable fixing up along the lines discussed here and upthread. On the other hand, maybe I'm getting too ambitious. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > ... > 2. Modify pg_amop by adding a new column amopcategory, probably either > int2 or maybe even just char. > ... > I'm not prepared to endorse doing #3 in core for 9.0, but I wonder if > it would be feasible to think about doing #1 and #2 and putting > something into contrib for #3. No, we are not touching the system catalogs for this in 9.0, much less fooling with any planner logic. If it had been submitted a few months earlier that could have happened, but we are barely 48 hours away from alpha freeze. The only part of this that I think can even be considered at this point is to do a backwards-compatible source code upgrade that will decouple the various call sites from knowledge of exactly how many key columns the syscaches allow. And even that is not for the benefit of this feature; it's mainly to minimize breakage of other patches that will be developed between now and whenever knngist does land. (IOW, I agree with your point that making the call syntax change now is the least painful time to do that.) regards, tom lane
On Sat, Feb 13, 2010 at 10:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> ... >> 2. Modify pg_amop by adding a new column amopcategory, probably either >> int2 or maybe even just char. >> ... >> I'm not prepared to endorse doing #3 in core for 9.0, but I wonder if >> it would be feasible to think about doing #1 and #2 and putting >> something into contrib for #3. > > No, we are not touching the system catalogs for this in 9.0, much less > fooling with any planner logic. If it had been submitted a few months > earlier that could have happened, but we are barely 48 hours away from > alpha freeze. The only part of this that I think can even be considered > at this point is to do a backwards-compatible source code upgrade that > will decouple the various call sites from knowledge of exactly how many > key columns the syscaches allow. And even that is not for the benefit > of this feature; it's mainly to minimize breakage of other patches > that will be developed between now and whenever knngist does land. > (IOW, I agree with your point that making the call syntax change now > is the least painful time to do that.) OK. In that case, any objections to my applying the attached patch, which I believe implements this as you suggested? ...Robert
Attachment
Dimitri Fontaine wrote: > Then there's the metric space which is a data type with a distance > function. This function must be non-negative, commutative, etc. > > So I guess what we need here is a Operator Group to define our plus and > minus operators, and the fact that it's a group says (by convention, > like the total ordering of a BTree) that the + is commutative and the - > its opposite. Or we have an "option" called abelian for specifying the > commutativity? > Would the group analogy work with partially ordered domains, e.g. with a location on a sphere datatype together with an identifier for the sphere - so poi on earth can be compared to another, but not a poi on earth with a poi on the moon. ? regards, Yeb Havinga
Robert Haas <robertmhaas@gmail.com> writes: > OK. In that case, any objections to my applying the attached patch, > which I believe implements this as you suggested? Um, did you test this version? It looks like the macros are still defined according to the idea that SearchSysCache takes five arguments. Also, I'd suggest adding explicit comments to syscache.h suggesting that SearchSysCache etc are meant to be called via the macros rather than directly. I didn't check all the individual calls, but it looks generally sane except for those points. regards, tom lane
On Sun, Feb 14, 2010 at 12:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> OK. In that case, any objections to my applying the attached patch, >> which I believe implements this as you suggested? > > Um, did you test this version? It looks like the macros are still > defined according to the idea that SearchSysCache takes five arguments. You are correct. I realized that this morning while I was shaving. Sorry about that. > Also, I'd suggest adding explicit comments to syscache.h suggesting > that SearchSysCache etc are meant to be called via the macros > rather than directly. Good idea. > I didn't check all the individual calls, but it looks generally > sane except for those points. Will fix and commit. ...Robert