Thread: Email not searchable in our archives
Would someone please find out why the attached email from Heikki is not appearing in a search of our archives? I tried the subject and a line from the email and neither came up as a hit: http://search.postgresql.org/search?q=git+patch+review&m=1&l=&d=-1&s=r -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Would someone please fine out why the attached email from Heikki is not > appearing in a search of our archives? I've noticed some curious omissions lately, too. For instance I searched for "Include Lists for Text Search" this morning, and successfully got hits on yesterday's and today's posts with that title, but not the ones I wanted from last September. Perhaps there are chunks of last year that are missing from the tsearch index for some reason? regards, tom lane
On Mon, Mar 10, 2008 at 6:55 PM, Bruce Momjian <bruce@momjian.us> wrote: > Would someone please fine out why the attached email from Heikki is not > appearing in a search of our archives? I tried the subject and a line > from the email and neither came up as a hit: > > http://search.postgresql.org/search?q=git+patch+review&m=1&l=&d=-1&s=r Eh? The message you included is at the top of the results when I use the query above. -- Dave Page EnterpriseDB UK Ltd: http://www.enterprisedb.com PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
BTW, your mail broke our archiver. If you go to your message page, http://archives.postgresql.org/pgsql-www/2008-03/msg00183.php you'll notice it only displays your part of the message -- not the attached message. Then you notice that the date index has no msg00184.php nearby ... until you go to the end of it and you notice that there's a message from Heikki dated 23 May 2007. The problem here is that your message contains a text/plain attachment with the dreaded "^From " line, which causes Mhonarc to think it's a separate message. Not sure if there's something we can do about this. One idea would be making the separator contain the list address in the "^From " line. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
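A minimal sketch of the failure described above, assuming an archiver that treats every line beginning with "From " as the start of a new message. The mbox fragment, the example.org addresses, and the naive_split helper are hypothetical illustrations, not Mhonarc's actual code.

```python
import re

# Hypothetical mbox fragment: one delivered message whose text/plain
# attachment contains another message's unescaped "From " line.
MBOX = (
    "From bruce@example.org Mon Mar 10 18:55:00 2008\n"
    "Subject: Email not searchable in our archives\n"
    "\n"
    "See the attached message.\n"
    "\n"
    "From heikki@example.org Wed May 23 10:00:00 2007\n"
    "Subject: an older message, included as an attachment\n"
    "\n"
    "Original text.\n"
)


def naive_split(mbox_text):
    """Split at every line that starts with 'From ', ignoring MIME structure."""
    # Zero-width split (Python 3.7+): cut just before each "From " at line start.
    parts = re.split(r"(?m)^(?=From )", mbox_text)
    return [p for p in parts if p]


messages = naive_split(MBOX)
print(len(messages))  # two "messages", although only one was delivered
```

The second fragment carries the 2007 date from the attachment, which matches the symptom of a stray message from Heikki showing up at the end of the date index.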
Dave Page wrote: > On Mon, Mar 10, 2008 at 6:55 PM, Bruce Momjian <bruce@momjian.us> wrote: > > Would someone please fine out why the attached email from Heikki is not > > appearing in a search of our archives? I tried the subject and a line > > from the email and neither came up as a hit: > > > > http://search.postgresql.org/search?q=git+patch+review&m=1&l=&d=-1&s=r > > Eh? The message you included is at the top of the results when I use > the query above. It shows up now for me too, but did not this morning, about 12 hours ago. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Alvaro Herrera wrote: > BTW, your mail broke our archiver. If you go to your message page, > http://archives.postgresql.org/pgsql-www/2008-03/msg00183.php > you'll notice it only displays your part of the message -- not the > attached message. Then you notice that the date index has no > msg00184.php nearby ... until you go to the end of it and you notice > that there's a message from Heikki dated 23 May 2007. > > The problem here is that your message contains a text/plain attachment > with the dreaded "^From " line, which causes Mhonarc to think it's a > separate message. Not sure if there's something we can do about this. > One idea would be making the separator contain the list address in the > "^From " line. Yea, I can see how that would happen. Sorry. I can probably modify my mailer to escape those "From" lines but that isn't going to fix it for other posters. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Alvaro Herrera wrote: >> The problem here is that your message contains a text/plain attachment >> with the dreaded "^From " line, which causes Mhonarc to think it's a >> separate message. Not sure if there's something we can do about this. >> One idea would be making the separator contain the list address in the >> "^From " line. > Yea, I can see how that would happen. Sorry. I can probably modify my > mailer to escape those "From" lines but that isn't going to fix it for > other posters. The "From " to ">From " kluge is supposed to happen during delivery into a Unix-format mailbox. There *is not* any restriction on messages in flight that they not contain lines starting with "From ". So if this broke the archives, the fault is on the archives' side not Bruce's. http://www.faqs.org/rfcs/rfc822.html regards, tom lane
On Mon, Mar 10, 2008 at 11:37:40PM -0400, Tom Lane wrote: > a Unix-format mailbox. There *is not* any restriction on messages in > flight that they not contain lines starting with "From ". So if this > broke the archives, the fault is on the archives' side not Bruce's. > > http://www.faqs.org/rfcs/rfc822.html Well, you probably want to look at 2821 and 2822, also, but you're quite right. The way the archiver is using From sounds like a filthy hack to me. A
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Alvaro Herrera wrote: > >> The problem here is that your message contains a text/plain attachment > >> with the dreaded "^From " line, which causes Mhonarc to think it's a > >> separate message. Not sure if there's something we can do about this. > >> One idea would be making the separator contain the list address in the > >> "^From " line. > > > Yea, I can see how that would happen. Sorry. I can probably modify my > > mailer to escape those "From" lines but that isn't going to fix it for > > other posters. > > The "From " to ">From " kluge is supposed to happen during delivery into > a Unix-format mailbox. There *is not* any restriction on messages in > flight that they not contain lines starting with "From ". So if this > broke the archives, the fault is on the archives' side not Bruce's. Well, I'm not sure the problem is the delivery either, because the "From " line here occurred in an attachment, not the email body itself. I think this is more a bug in Mhonarc's message separation, which is way too primitive. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Bruce Momjian wrote: > Dave Page wrote: > > On Mon, Mar 10, 2008 at 6:55 PM, Bruce Momjian <bruce@momjian.us> wrote: > > > Would someone please fine out why the attached email from Heikki is not > > > appearing in a search of our archives? I tried the subject and a line > > > from the email and neither came up as a hit: > > > > > > http://search.postgresql.org/search?q=git+patch+review&m=1&l=&d=-1&s=r > > > > Eh? The message you included is at the top of the results when I use > > the query above. > > It shows up now for me too, but did not this morning, about 12 hours > ago. Of course it shows up, but it's the copy added to the 2008-03 mbox. Note the URL. The original still doesn't appear. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote: > Bruce Momjian wrote: > > Dave Page wrote: > > > On Mon, Mar 10, 2008 at 6:55 PM, Bruce Momjian <bruce@momjian.us> wrote: > > > > Would someone please fine out why the attached email from Heikki is not > > > > appearing in a search of our archives? I tried the subject and a line > > > > from the email and neither came up as a hit: > > > > > > > > http://search.postgresql.org/search?q=git+patch+review&m=1&l=&d=-1&s=r > > > > > > Eh? The message you included is at the top of the results when I use > > > the query above. > > > > It shows up now for me too, but did not this morning, about 12 hours > > ago. > > Of course it shows up, but it's the copy added to the 2008-03 mbox. > Note the URL. The original still doesn't appear. Oh, I see now, yea. So who is going to find out why that email is missing? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Tue, Mar 11, 2008 at 1:02 PM, Bruce Momjian <bruce@momjian.us> wrote: > > Of course it shows up, but it's the copy added to the 2008-03 mbox. > > Note the URL. The original still doesn't appear. > > Oh, I see now, yea. So who is going to find out why that email is > missing? Google seems to find the May 07 version, so it must be in the archives. Any ideas Magnus - being your code 'n' all :-) ? -- Dave Page EnterpriseDB UK Ltd: http://www.enterprisedb.com PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> The "From " to ">From " kluge is supposed to happen during delivery into >> a Unix-format mailbox. There *is not* any restriction on messages in >> flight that they not contain lines starting with "From ". So if this >> broke the archives, the fault is on the archives' side not Bruce's. > Well, I'm not sure the problem is the delivery either, because the > "From " line here occured in an attachment, not the email body itself. > I think this is more a bug in Mhonarc's message separation, which is way > too primitive. Whether it's an attachment or not is irrelevant --- the standards for this don't even know that there is such a thing as an attachment. regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Well, I'm not sure the problem is the delivery either, because the > > "From " line here occurred in an attachment, not the email body itself. > > I think this is more a bug in Mhonarc's message separation, which is way > > too primitive. > > Whether it's an attachment or not is irrelevant --- the standards for > this don't even know that there is such a thing as an attachment. Oh, RFC2822 clearly does -- it refers to RFC2045 through 2049, which define MIME. In any case, that "From " line is not defined by RFC822 either, it is purely an implementation matter. As such, Mhonarc would stand better if it followed the MIME standard which says that the message should be split at the terminators defined in the header. Only if no such terminators are defined should the "From " line be used. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Tue, Mar 11, 2008 at 10:57:43AM -0400, Tom Lane wrote: > > Whether it's an attachment or not is irrelevant --- the standards for > this don't even know that there is such a thing as an attachment. That's not quite right: MIME knows about message parts, which is really what we mean by "attachment". My bet is that the problem (I haven't looked at the pieces that are doing this) is that something (mhonarc?) isn't handling MIME message parts correctly (maybe during the transition to mailbox format?). A
Andrew Sullivan <ajs@crankycanuck.ca> writes: > On Tue, Mar 11, 2008 at 10:57:43AM -0400, Tom Lane wrote: >> Whether it's an attachment or not is irrelevant --- the standards for >> this don't even know that there is such a thing as an attachment. > That's not quite right: MIME knows about message parts, which is really what > we mean by "attachment". My bet is that the problem (I haven't looked at > the pieces that are doing this) is that something (mhonarc?) isn't handling > MIME message parts correctly (maybe during the transition to mailbox > format?). Right, the problem is exactly that Unix mbox format knows about "From " (and nothing else) as a message separator. Whatever code dumps messages into such a file *must* escape data lines beginning with "From ". regards, tom lane
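A minimal sketch of the escaping step Tom describes, assuming the simplest variant of the convention (prefix any data line starting with "From " with ">"). The to_mbox_entry helper, its parameters, and the example.org addresses are hypothetical; this is not the code Majordomo or any other delivery agent actually runs.

```python
import re

# Matches "From " at the start of any line of the message being stored.
_FROM_LINE = re.compile(r"^From ", re.MULTILINE)


def to_mbox_entry(envelope_sender, date_str, raw_message):
    """Return one mbox entry: a separator line, then the escaped message.

    Header lines use "From:" (with a colon), so only body or attachment
    lines can match and get the ">" prefix. The mbox layer knows nothing
    about MIME, so text inside MIME parts is escaped like any other line.
    """
    separator = f"From {envelope_sender} {date_str}\n"
    escaped = _FROM_LINE.sub(">From ", raw_message)
    if not escaped.endswith("\n"):
        escaped += "\n"
    # A blank line terminates the entry.
    return separator + escaped + "\n"


raw = (
    "Subject: Email not searchable in our archives\n"
    "\n"
    "See the attached message.\n"
    "\n"
    "From heikki@example.org Wed May 23 10:00:00 2007\n"
    "Original text.\n"
)
entry = to_mbox_entry("bruce@example.org", "Mon Mar 10 18:55:00 2008", raw)
separators = [line for line in entry.splitlines() if line.startswith("From ")]
print(len(separators))  # only the real separator line remains unescaped
```

With the attachment's line stored as ">From ...", a reader that splits on "From " sees a single message again.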
Tom Lane wrote: > Andrew Sullivan <ajs@crankycanuck.ca> writes: > > That's not quite right: MIME knows about message parts, which is really what > > we mean by "attachment". My bet is that the problem (I haven't looked at > > the pieces that are doing this) is that something (mhonarc?) isn't handling > > MIME message parts correctly (maybe during the transition to mailbox > > format?). > > Right, the problem is exactly that Unix mbox format knows about "From " > (and nothing else) as a message separator. Whatever code dumps messages > into such a file *must* escape data lines beginning with "From ". It would be Majordomo's fault then. However, I think you'd find that if you complain about it to them, they will tell you that they correctly handle the "From " line in the message body but they don't touch it inside MIME parts. And they would be right ... Still, it would be a good idea to ask. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
bruce wrote: > Would someone please fine out why the attached email from Heikki is not > appearing in a search of our archives? I tried the subject and a line > from the email and neither came up as a hit: OK, it has been 24 hours since I reported some emails are not being archived and no one has even responded they are looking at the problem. Are we unable to manage our own archive search? If we can't, I will start linking to another archive from the TODO list. Right now, every time I need a URL for the TODO list I have to troll through the archives by date until I find the email. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > bruce wrote: > > Would someone please fine out why the attached email from Heikki is not > > appearing in a search of our archives? I tried the subject and a line > > from the email and neither came up as a hit: > > OK, it has been 24 hours since I reported some emails are not being > archived and no one has even responded they are looking at the problem. I can fix the problem by providing URLs with message ids. Would that help? You can find out the Message-Id trivially from the original email, and the URL would take you to the main message page complete with thread links and all. I haven't done it yet because I noticed that I'd need to create a directory with thousands of files and I'm not sure how is it going to work. I've been trying to generate something of the form msgid/f/e/fedup2007234234@momjian.us (i.e. creating subdirs for the first letters) but apparently Mhonarc doesn't let me do that. Perhaps I should just try without the subdir and see if it works. Thoughts? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
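A minimal sketch of the directory layout proposed above, assuming the path is built from the first two characters of the Message-Id. The msgid_path function, its depth and root parameters, and the handling of unusual characters are hypothetical choices for illustration; nothing here reflects what Mhonarc supports.

```python
import posixpath


def msgid_path(message_id, depth=2, root="msgid"):
    """Map a Message-Id to a sharded path such as msgid/f/e/<id>.

    One subdirectory per leading character keeps any single directory
    from holding thousands of files.
    """
    msgid = message_id.strip().strip("<>")
    if not msgid or "/" in msgid:
        # Assumption: reject ids that cannot be used safely as a file name.
        raise ValueError(f"unusable Message-Id: {message_id!r}")
    # Assumption: fold case and replace non-alphanumeric shard characters.
    shards = [c.lower() if c.isalnum() else "_" for c in msgid[:depth]]
    return posixpath.join(root, *shards, msgid)


print(msgid_path("<fedup2007234234@momjian.us>"))
# msgid/f/e/fedup2007234234@momjian.us
```

Setting depth=0 gives the flat layout ("without the subdir") that Alvaro mentions trying next.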
On Tue, Mar 11, 2008 at 9:35 PM, Bruce Momjian <bruce@momjian.us> wrote: > bruce wrote: > > Would someone please fine out why the attached email from Heikki is not > > appearing in a search of our archives? I tried the subject and a line > > from the email and neither came up as a hit: > > OK, it has been 24 hours since I reported some emails are not being > archived and no one has even responded they are looking at the problem. > > Are we unable to manage our own archive search? If we can't, I will > start linking to another archive from the TODO list. Right now, every > time I need a URL for the TODO list I have to troll through the archives > by date until I find the email. The people that know that code well all have day jobs and limited spare time. I don't think it's unreasonable for them to take more than 24 hours to respond. Google seems to work fine on our archives, so there is no need to link elsewhere. -- Dave Page EnterpriseDB UK Ltd: http://www.enterprisedb.com PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
Dave Page wrote: > On Tue, Mar 11, 2008 at 9:35 PM, Bruce Momjian <bruce@momjian.us> wrote: > > bruce wrote: > > > Would someone please fine out why the attached email from Heikki is not > > > appearing in a search of our archives? I tried the subject and a line > > > from the email and neither came up as a hit: > > > > OK, it has been 24 hours since I reported some emails are not being > > archived and no one has even responded they are looking at the problem. > > > > Are we unable to manage our own archive search? If we can't, I will > > start linking to another archive from the TODO list. Right now, every > > time I need a URL for the TODO list I have to troll through the archives > > by date until I find the email. > > The people that know that code well all have day jobs and limited > spare time. I don't think it's unreasonable for them to take more than > 24 hours to respond. > > Google seems to work fine on our archives, so there is no need to link > elsewhere. Does Google link _into_ our archives --- ah, that does work and is a good work-around. Thanks. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Alvaro Herrera wrote: > Bruce Momjian wrote: > > bruce wrote: > > > Would someone please fine out why the attached email from Heikki is not > > > appearing in a search of our archives? I tried the subject and a line > > > from the email and neither came up as a hit: > > > > OK, it has been 24 hours since I reported some emails are not being > > archived and no one has even responded they are looking at the problem. > > I can fix the problem by providing URLs with message ids. Would that > help? You can find out the Message-Id trivially from the original > email, and the URL would take you to the main message page complete with > thread links and all. I have been pasting the email subject line into the search and usually it is the first hit (when search works). > I haven't done it yet because I noticed that I'd need to create a > directory with thousands of files and I'm not sure how is it going to > work. I've been trying to generate something of the form > msgid/f/e/fedup2007234234@momjian.us > (i.e. creating subdirs for the first letters) but apparently Mhonarc > doesn't let me do that. > > Perhaps I should just try without the subdir and see if it works. I am thinking we need the searches to actually work. I can find the emails eventually, and using Google with site:archives.postgresql.org works pretty well for the time being. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Dave Page wrote: > On Tue, Mar 11, 2008 at 9:35 PM, Bruce Momjian <bruce@momjian.us> wrote: > > bruce wrote: > > > Would someone please fine out why the attached email from Heikki is not > > > appearing in a search of our archives? I tried the subject and a line > > > from the email and neither came up as a hit: > > > > OK, it has been 24 hours since I reported some emails are not being > > archived and no one has even responded they are looking at the problem. > > > > Are we unable to manage our own archive search? If we can't, I will > > start linking to another archive from the TODO list. Right now, every > > time I need a URL for the TODO list I have to troll through the archives > > by date until I find the email. > > The people that know that code well all have day jobs and limited > spare time. I don't think it's unreasonable for them to take more than > 24 hours to respond. > > Google seems to work fine on our archives, so there is no need to link > elsewhere. OK, but also consider I am not the only one who is doing searches, and I have no idea how long it has been broken. I am now finding 80% of emails missing for June, 2007, so it is a massive issue, not just a few emails. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Tue, 11 Mar 2008 19:02:25 -0300 Alvaro Herrera <alvherre@commandprompt.com> wrote: > Perhaps I should just try without the subdir and see if it works. I am wondering if we should bail out of Mhonarc altogether. Do we actually need it? We have the actual mbox files, right? Couldn't we build our own parser for whatever? As a note, mailman also uses mbox files. We could try its archive generation capability. I am not suggesting we move to mailman, just that we use an existing tool that may work better to generate the archives themselves. Either way, this is certainly not urgent, although it is important. Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL SPI Liaison | SPI Director | PostgreSQL political pundit
Joshua D. Drake wrote: > On Tue, 11 Mar 2008 19:02:25 -0300 > Alvaro Herrera <alvherre@commandprompt.com> wrote: > > > Perhaps I should just try without the subdir and see if it works. > > I am wondering if we should bail out of Mhonarc altogether. Do we > actually need it? We have the actual mbox files, right? Couldn't we > build our own parser for whatever? > > As a note, mailman also uses mbox files. We could try its archive > generation capability. I am not suggesting we move to mailman, just > that we use an existing tool that may work better to generate the > archives themselves. > > Either way, this is certainly not urgent, although it is important. FYI, I have set up a custom Google search site for archives.postgresql.org: http://www.google.com/coop/cse?cx=008259951665801127283%3A7jpbk6al2qu -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > Alvaro Herrera wrote: > > I can fix the problem by providing URLs with message ids. Would that > > help? You can find out the Message-Id trivially from the original > > email, and the URL would take you to the main message page complete with > > thread links and all. > > I have been pasting the email subject line into the search and usually > it is the first hit (when search works). I am not saying we should continue to have a broken search -- I only say that I can fix your particular use case. If you're not interested in it I can easily push the issue down in my TODO list. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
bruce wrote: > Dave Page wrote: > > On Tue, Mar 11, 2008 at 9:35 PM, Bruce Momjian <bruce@momjian.us> wrote: > > > bruce wrote: > > > > Would someone please fine out why the attached email from Heikki is not > > > > appearing in a search of our archives? I tried the subject and a line > > > > from the email and neither came up as a hit: > > > > > > OK, it has been 24 hours since I reported some emails are not being > > > archived and no one has even responded they are looking at the problem. > > > > > > Are we unable to manage our own archive search? If we can't, I will > > > start linking to another archive from the TODO list. Right now, every > > > time I need a URL for the TODO list I have to troll through the archives > > > by date until I find the email. > > > > The people that know that code well all have day jobs and limited > > spare time. I don't think it's unreasonable for them to take more than > > 24 hours to respond. > > > > Google seems to work fine on our archives, so there is no need to link > > elsewhere. > > Does Google link _into_ our archives --- ah, that does work and is a > good work-around. Thanks. OK, now Google search isn't finding this email either: http://archives.postgresql.org/pgsql-hackers/2007-08/msg00055.php See this search: http://www.google.com/search?hl=en&client=firefox-a&rls=com.ubuntu%3Aen-US%3Aofficial&hs=eqZ&q=Re%3A+clog_buffers+to+64+in+8.3+site%3Aarchives.postgresql.org&btnG=Search It sees this email: http://archives.postgresql.org/pgsql-hackers/2007-09/msg00636.php but not the emails from August on the same subject. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Alvaro Herrera wrote: > Bruce Momjian wrote: > > Alvaro Herrera wrote: > > > > I can fix the problem by providing URLs with message ids. Would that > > > help? You can find out the Message-Id trivially from the original > > > email, and the URL would take you to the main message page complete with > > > thread links and all. > > > > I have been pasting the email subject line into the search and usually > > it is the first hit (when search works). > > I am not saying we should continue to have a broken search -- I only say > that I can fix your particular use case. If you're not interested in it > I can easily push the issue down in my TODO list. I don't feel it is right that you have to push up a TODO item just because the search is broken. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Joshua D. Drake wrote: > I am wondering if we should bail out of Mhonarc all together. Do we > actually need it? We have the actual mbox files right? Couldn't we > build our own parser for whatever? > > As a note, mailman also uses mbox files. We could try its archive > generation capability. Mailman archives are just as crappy, if not crappier. And they know it. It's based on Hypermail; I note that Hypermail's latest version happened on 2003. The Mailman guys are rethinking the issue; see http://wiki.list.org/display/DEV/ModernArchiving Somebody suggests Lurker as one alternative: http://lurker.sourceforge.net/ It is a very different interface. Perhaps we could try it as an experiment. I have seen the Debian lists under it and it feels really martian. I don't want to lose Mhonarc, at least not for the moment. It is powerful and customizable and has served us reasonably well for a very long time. (Longer than most of us, actually.) -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Bruce Momjian wrote: > Alvaro Herrera wrote: > > I am not saying we should continue to have a broken search -- I only say > > that I can fix your particular use case. If you're not interested in it > > I can easily push the issue down in my TODO list. > > I don't feel it is right that you have to push up a TODO item just > because the search is broken. Well, then it means somebody else has to push up the "fix the search" TODO item ;-) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > I don't want to lose Mhonarc, at least not for the moment. It is > powerful and customizable and has served us reasonably well for a very > long time. (Longer than most of us, actually.) Agreed. One of the problems with moving off it is that if we change to something else that breaks mbox files at different points, we will invalidate archive URLs. We went there once already by accident and it was not fun. The From-line problem is minor anyway. I think the real issue is why the heck is our text search missing some old mail? AFAIK no one has a clue where that problem is, so it's premature to blame any particular component. regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Alvaro, have you checked with the mhonarc folks about the From-line issue? - --On Tuesday, March 11, 2008 22:03:46 -0300 Alvaro Herrera <alvherre@commandprompt.com> wrote: > Joshua D. Drake wrote: > >> I am wondering if we should bail out of Mhonarc all together. Do we >> actually need it? We have the actual mbox files right? Couldn't we >> build our own parser for whatever? >> >> As a note, mailman also uses mbox files. We could try its archive >> generation capability. > > Mailman archives are just as crappy, if not crappier. And they know it. > It's based on Hypermail; I note that Hypermail's latest version happened > on 2003. The Mailman guys are rethinking the issue; see > http://wiki.list.org/display/DEV/ModernArchiving > > Somebody suggests Lurker as one alternative: > http://lurker.sourceforge.net/ > > It is a very different interface. Perhaps we could try it as an > experiment. I have seen the Debian lists under it and it feels really > martian. > > I don't want to lose Mhonarc, at least not for the moment. It is > powerful and customizable and has served us reasonably well for a very > long time. (Longer than most of us, actually.) > > -- > Alvaro Herrera http://www.CommandPrompt.com/ > PostgreSQL Replication, Consulting, Custom Development, 24x7 support > > -- > Sent via pgsql-www mailing list (pgsql-www@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-www - ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . scrappy@hub.org MSN . scrappy@hub.org Yahoo . yscrappy Skype: hub.org ICQ . 7615664 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) iD8DBQFH1zEy4QvfyHIvDvMRAiHVAJ9JKBeJueMskm4TwHGyL0y48PIELQCgzC4N BhGEZe2qt4oyrLnWsjFVy04= =ZjTJ -----END PGP SIGNATURE-----
> > Perhaps I should just try without the subdir and see if it works. > > I am wondering if we should bail out of Mhonarc all together. Do we > actually need it? We have the actual mbox files right? Couldn't we > build our own parser for whatever? Let's not throw it out until we know that's where the problem is. A lot of work has been put into our mhonarc install over the years to make it fit with our website etc. Dave has looked at the custom parser thing, but it's a lot more work than you'd initially think. There's a zillion corner-cases. /Magnus
> bruce wrote: > > Would someone please fine out why the attached email from Heikki is not > > appearing in a search of our archives? I tried the subject and a line > > from the email and neither came up as a hit: > > OK, it has been 24 hours since I reported some emails are not being > archived and no one has even responded they are looking at the problem. > > Are we unable to manage our own archive search? If we can't, I will > start linking to another archive from the TODO list. Right now, every > time I need a URL for the TODO list I have to troll through the archives > by date until I find the email. Obviously I will look at this as soon as I can. But as Dave has already pointed out, most of us have a day job that has to be prioritised, so everything cannot be done within 24 hours. I was hoping somebody else would have time to look at it meanwhile, but so far nobody has had the time. If this is not acceptable then the answer to your question is no, we currently can't do it. I notice, however, that when we have a similar issue with for example the patch queue not being updated for many many weeks, that is considered a *feature*, and not a problem. Are we not able to manage our own patch queue? If we can't, perhaps we should stop all development until we can be sure it's always up-to-date? /Magnus
On Tue, Mar 11, 2008 at 10:20 PM, Joshua D. Drake <jd@commandprompt.com> wrote: > On Tue, 11 Mar 2008 19:02:25 -0300 > Alvaro Herrera <alvherre@commandprompt.com> wrote: > > > Perhaps I should just try without the subdir and see if it works. > > I am wondering if we should bail out of Mhonarc all together. Do we > actually need it? We have the actual mbox files right? Couldn't we > build our own parser for whatever? As I've mentioned a number of times, I've spent quite a bit of time doing that already - I just don't have the spare cycles to finish right now. We currently have a parser/archiver that will incrementally archive messages from the (growing) mboxes into a database, and a web frontend which resolves many of the problems with the current archives. There are optimisation/performance issues to be solved, as well as some PHP crashes that seem to manifest themselves only on the FreeBSD production server. The mime handling could also use some improvement to properly reconstruct multi-part messages, but in reality I'm not sure we ever have any where that's actually an issue. If anyone else wants to pickup where I've left off, please let me know. -- Dave Page EnterpriseDB UK Ltd: http://www.enterprisedb.com PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
On Wed, Mar 12, 2008 at 1:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > I don't want to lose Mhonarc, at least not for the moment. It is > > powerful and customizable and has served us reasonably well for a very > > long time. (Longer than most of us, actually.) > > Agreed. One of the problems with moving off it is that if we change to > something else that breaks mbox files at different points, we will > invalidate archive URLs. We went there once already by accident and it > was not fun. The replacement archives system I've been working on takes its data from the existing monthly mboxes, and although it uses its own URL scheme, it does understand and accept the old URLs. That (and fixing the thread-breaks-over-a-month issue) were pretty much my top requirements. -- Dave Page EnterpriseDB UK Ltd: http://www.enterprisedb.com PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
Magnus Hagander wrote: > > bruce wrote: > > > Would someone please fine out why the attached email from Heikki is not > > > appearing in a search of our archives? I tried the subject and a line > > > from the email and neither came up as a hit: > > > > OK, it has been 24 hours since I reported some emails are not being > > archived and no one has even responded they are looking at the problem. > > > > Are we unable to manage our own archive search? If we can't, I will > > start linking to another archive from the TODO list. Right now, every > > time I need a URL for the TODO list I have to troll through the archives > > by date until I find the email. > > Obviously I wil look at this as soon as I can. But as Dave has > already pointed out, most of us has a dayjob that has to be > prioritised, so everything cannot be done within 24 hours. I > was hopeing somebody else would have time to look at it meanwhile, > but so far nobody has had the time. If this is not acceptable > then the answer to your question is no, we currently can't do > it. OK, so should we look to outsource our searching? (Of course, Google isn't indexing all the emails either so I am worried about outsourcing too.) > I notice, however, that when we have a similar issue with for > example the patch queue not being updated for many many weeks, > that is considered a *feature*, and not a problem. Are we not > able to manage our own patch queue? If we can't, perhaps we > should stop all development until we can be sure it's always > up-to-date? The patch emails have always been available and online. What wasn't done is processing them as TODO items and applying, and that isn't going to be done for weeks still, I bet. We don't have an option to outsource that, but we do have the option for search. Also, search is a public infrastructure issue, while the patch queue is a development tool --- I don't consider them to have the same reliability requirements.
Also, I have been working on the patch queue for a week, and so has Tom. The search problem, a more public infrastructure with a higher promise of reliability, isn't even being worked on yet. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Marc G. Fournier wrote: > Alvaro, have you checked with the mhonarc folks about the From-line issue? Nope, not yet. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Bruce Momjian wrote: > bruce wrote: > > Would someone please fine out why the attached email from Heikki is not > > appearing in a search of our archives? I tried the subject and a line > > from the email and neither came up as a hit: > > OK, it has been 24 hours since I reported some emails are not being > archived and no one has even responded they are looking at the problem. I'm looking at the problem. On a quick glance it is obvious that there's something bogus going on -- the search database only contains 472 emails for May 2007, but Mhonarc reports 1187. I have to go chase something at the bank right now, I'll update you later. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Wed, Mar 12, 2008 at 1:09 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > I'm looking at the problem. On a quick glance it is obvious that > there's something bogus going on -- the search database only contains > 472 emails for May 2007, but Mhonarc reports 1187. Confirmed. I added a test mode to a copy of the archives indexer, and running that it claims it would index a further 715 messages, which would give us a total of 1187. So I guess the next step is to try running out of test mode to see if the data actually makes it into the index now, but I didn't want to do that and stomp on any testing you're doing. -- Dave Page EnterpriseDB UK Ltd: http://www.enterprisedb.com PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
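To illustrate the "test mode" idea in the message above: a dry run works out what a real run would add, but writes nothing. The following is only a minimal Python sketch, not the actual PHP indexer. The function names and the in-memory lists standing in for the Mhonarc archive and the search database are assumptions made for illustration; the counts are the ones reported for May 2007.

```python
def plan_index_run(archived_ids, indexed_ids):
    """Return the message ids a real run would add, without writing anything."""
    indexed = set(indexed_ids)
    return [msg_id for msg_id in archived_ids if msg_id not in indexed]


def index_messages(archived_ids, indexed_ids, dry_run=True):
    """Report how many messages are missing; only modify the index if dry_run is False."""
    missing = plan_index_run(archived_ids, indexed_ids)
    if not dry_run:
        indexed_ids.extend(missing)  # stand-in for the real INSERT into the search database
    return len(missing)


# Hypothetical stand-ins: 1187 archived messages, 472 of them already indexed.
archive = list(range(1187))
index = list(range(472))

would_add = index_messages(archive, index, dry_run=True)
print(would_add, len(index))  # prints: 715 472 (the index itself is untouched)
```

Running the same call with `dry_run=False` would be the "out of test mode" step that Dave holds back on so as not to disturb anyone else's testing.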
On Wed, Mar 12, 2008 at 2:14 PM, Dave Page <dpage@pgadmin.org> wrote: > Confirmed. I added a test mode to a copy of the archives indexer, and > running that it claims it would index a further 715 messages, which > would give us a total of 1187. > > So I guess the next step is to try running out of test mode to see if > the data actually makes it into the index now, but I didn't want to do > that and stomp on any testing you're doing. OK, so running it properly has added those missing 715 messages. I think we need to run a full index run which should restore any missing pages, but before we do that, I'd kinda like to gather any ideas on why this has happened before removing any evidence. My best guess is simply that the indexer failed for some time and no one noticed for a few weeks. By the time it was re-run, some messages that it had missed were outside the timeframe that an incremental crawl would have picked up (the current, plus last month). Thoughts? Stefan; any thoughts on how we might monitor that the indexer has been running correctly? I assume that should be fairly easy if we have it drop a timestamp someplace? -- Dave Page EnterpriseDB UK Ltd: http://www.enterprisedb.com PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
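Dave's theory can be made concrete with a small Python sketch. It assumes, as he describes, that an incremental crawl only looks at the current month and the previous one; the function names and the outage dates are hypothetical and are not taken from the real indexer.

```python
from datetime import date


def incremental_window(today):
    """(year, month) pairs an incremental crawl covers: previous month and current month."""
    if today.month == 1:
        previous = (today.year - 1, 12)
    else:
        previous = (today.year, today.month - 1)
    return [previous, (today.year, today.month)]


def is_recrawled(message_month, today):
    """True if a message archived in message_month is still picked up on `today`."""
    return message_month in incremental_window(today)


# Hypothetical outage: the indexer stops in May 2007 and only resumes on 1 August 2007.
resumed = date(2007, 8, 1)
print(is_recrawled((2007, 5), resumed))  # False: May has already fallen out of the window
print(is_recrawled((2007, 7), resumed))  # True: July is still covered
```

Under that assumption, any outage long enough for a month to leave the two-month window leaves a permanent hole that only a full run can repair.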
Dave Page wrote: > On Wed, Mar 12, 2008 at 2:14 PM, Dave Page <dpage@pgadmin.org> wrote: >> Confirmed. I added a test mode to a copy of the archives indexer, and >> running that it claims it would index a further 715 messages, which >> would give us a total of 1187. >> >> So I guess the next step is to try running out of test mode to see if >> the data actually makes it into the index now, but I didn't want to do >> that and stomp on any testing you're doing. > > OK, so running it properly has added those missing 715 messages. I > think we need to run a full index run which should restore any missing > pages, but before we do that, I'd kinda like to gather any ideas on > why this has happened before removing any evidence. hmm weird ... > > My best guess is simply that the indexer failed for some time and > noone noticed for a few weeks. By the time it was re-run, some > messages that it had missed were outside the timeframe that an > incremental crawl would have picked up (the current, plus last month). > Thoughts? > > Stefan; any thoughts on how we might monitor that the indexer has been > running correctly? I assume that should be fairly easy if we have it > drop a timestamp someplace? yes - iirc there is even some discussion on that on pmt - will work something out for that in the next few days. Stefan
On Wed, Mar 12, 2008 at 03:25:00PM +0000, Dave Page wrote: > On Wed, Mar 12, 2008 at 2:14 PM, Dave Page <dpage@pgadmin.org> wrote: > > Confirmed. I added a test mode to a copy of the archives indexer, and > > running that it claims it would index a further 715 messages, which > > would give us a total of 1187. > > > > So I guess the next step is to try running out of test mode to see if > > the data actually makes it into the index now, but I didn't want to do > > that and stomp on any testing you're doing. > > OK, so running it properly has added those missing 715 messages. I > think we need to run a full index run which should restore any missing > pages, but before we do that, I'd kinda like to gather any ideas on > why this has happened before removing any evidence. > > My best guess is simply that the indexer failed for some time and > noone noticed for a few weeks. By the time it was re-run, some > messages that it had missed were outside the timeframe that an > incremental crawl would have picked up (the current, plus last month). > Thoughts? > > Stefan; any thoughts on how we might monitor that the indexer has been > running correctly? I assume that should be fairly easy if we have it > drop a timestamp someplace? I admit to having a ticket on pmt to get that set up. Actually, it might be better to look into the actual database, and find the latest email indexed? If it's older than <nn> something is wrong. It could be the archives that's wrong and not the indexer of course, but the point is we'll get notified and someone can look into it. Do you think we need to track it on a per-list basis, or just check for the latest timestamp across all lists? //Magnus
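The two monitoring options Magnus asks about (per list, or one timestamp across all lists) could look roughly like this in Python. This is a sketch under assumptions: the dictionary stands in for whatever query returns the newest indexed message per list, the list names and the two-day threshold are made up for the example, and nothing here reflects the real search schema.

```python
from datetime import datetime, timedelta


def stale_lists(latest_by_list, now, max_age):
    """Per-list check: names of lists whose newest indexed message is older than max_age."""
    return sorted(
        name for name, latest in latest_by_list.items() if now - latest > max_age
    )


def globally_stale(latest_by_list, now, max_age):
    """Global check: only the newest indexed message across all lists is considered."""
    return now - max(latest_by_list.values()) > max_age


# Hypothetical data: one busy list and one low-traffic list.
now = datetime(2008, 3, 12, 12, 0)
latest = {
    "pgsql-hackers": now - timedelta(hours=1),
    "persianpug": now - timedelta(days=30),
}

print(stale_lists(latest, now, timedelta(days=2)))     # ['persianpug']
print(globally_stale(latest, now, timedelta(days=2)))  # False
```

The example shows the trade-off: a single threshold applied per list flags quiet lists that are simply idle, while the global check stays silent as long as any one busy list is still being indexed. A per-list threshold would be one way to reconcile the two, at the cost of more configuration.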
"Dave Page" <dpage@pgadmin.org> writes: > My best guess is simply that the indexer failed for some time and > noone noticed for a few weeks. By the time it was re-run, some > messages that it had missed were outside the timeframe that an > incremental crawl would have picked up (the current, plus last month). > Thoughts? That would explain a contiguous range of messages that were not indexed, but is that what we have? I think the thing to do before you destroy the old index is make a list of which messages were indexed and which weren't. regards, tom lane
"Dave Page" <dpage@pgadmin.org> writes: > On Wed, Mar 12, 2008 at 3:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> That would explain a contiguous range of messages that were not indexed, >> but is that what we have? > Looking at the debug output, the messages that were missed were all contiguous: OK, that seems to support your theory. Might as well go ahead and reindex. +1 for getting some monitoring in there somewhere. regards, tom lane
On Wed, Mar 12, 2008 at 4:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > OK, that seems to support your theory. Might as well go ahead and > reindex. +1 for getting some monitoring in there somewhere. Yeah. Tried that - on the first attempt it died with a deadlock:
----------------
[search@community2 search]$ php archives.php -f
Indexing list 42 (atlpug)
Indexing list 43 (lapug)
Indexing list 41 (pdxpug)
Indexing list 40 (persianpug)
PHP Warning: pg_execute(): Query failed: ERROR: deadlock detected
DETAIL: Process 9245 waits for ShareLock on transaction 378698; blocked by process 9297.
Process 9297 waits for ShareLock on transaction 378690; blocked by process 9245.
CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."lists" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR SHARE OF x" in /home/search/portal/tools/search/classes/SearchDB.class.php on line 60
#0 SearchDB::mydie(Query failed: 0.94209700 1205339104 ) called at [/home/search/portal/tools/search/classes/SearchDB.class.php:61]
#1 SearchDB::ExecutePrepared(0.94209700 1205339104, Array ([0] => 0,[1] => 1183059217,[2] => PostgreSQL & our ML,[3] => Mohsen Pahlevanzadeh <mohsen@pahlevanzadeh.org>,[4] => [Persian-language message body and sender signature, mis-encoded in this copy],[5] => PostgreSQL & our ML,[6] => [the same message body, repeated])) called at [/home/search/portal/tools/search/classes/ArchiveIndexer.class.php:114]
#2 ArchiveIndexer->IndexSinglePage(40, persianpug, 2007, 6, 0) called at [/home/search/portal/tools/search/classes/ArchiveIndexer.class.php:62]
#3 ArchiveIndexer->IndexMonth(40, persianpug, 2007, 6) called at [/home/search/portal/tools/search/classes/ArchiveIndexer.class.php:40]
#4 ArchiveIndexer->Index(1, , -1, -1) called at [/home/search/portal/tools/search/archives.php:28]
Query failed: 0.94209700 1205339104
----------------
Running it again now and it's up to 4000+ messages indexed (ie. ones that weren't already indexed) and is still working on pgsql-advocacy. We'll see how it goes.... -- Dave Page EnterpriseDB UK Ltd: http://www.enterprisedb.com PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
Tom Lane wrote: > "Dave Page" <dpage@pgadmin.org> writes: >> On Wed, Mar 12, 2008 at 3:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> That would explain a contiguous range of messages that were not indexed, >>> but is that what we have? > >> Looking at the debug output, the messages that were missed were all contiguous: > > OK, that seems to support your theory. Might as well go ahead and > reindex. +1 for getting some monitoring in there somewhere. yeah we will work on that and add some new ones to our current list of 354 active checks ;-) Stefan
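Tom's test for the outage theory, quoted above, is whether the missed messages form a contiguous range. One way to check that from a list of archived and indexed message numbers is sketched below in Python; the function name and the sample numbers are hypothetical and this is not the debug output Dave looked at.

```python
def missing_runs(archived, indexed):
    """Group un-indexed messages into runs of consecutive positions in the archive."""
    indexed = set(indexed)
    runs, current = [], []
    for msg in archived:
        if msg not in indexed:
            current.append(msg)
        elif current:
            # An indexed message ends the current run of missing ones.
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


archived = list(range(1, 11))

# One block of misses: consistent with an indexer that was down for a while.
print(missing_runs(archived, [1, 2, 3, 9, 10]))  # [[4, 5, 6, 7, 8]]

# Scattered misses: an outage alone would not explain this pattern.
print(missing_runs(archived, [1, 3, 5, 7, 9]))   # [[2], [4], [6], [8], [10]]
```

A single long run per list supports the "indexer was down" explanation; many short runs would point to messages being skipped individually, which is why it is worth recording the list before a full reindex erases the evidence.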
On Wed, Mar 12, 2008 at 4:55 PM, Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> wrote: > > Tom Lane wrote: > > "Dave Page" <dpage@pgadmin.org> writes: > >> On Wed, Mar 12, 2008 at 3:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>> That would explain a contiguous range of messages that were not indexed, > >>> but is that what we have? > > > >> Looking at the debug output, the messages that were missed were all contiguous: > > > > OK, that seems to support your theory. Might as well go ahead and > > reindex. +1 for getting some monitoring in there somewhere. > > yeah we will work on that and add some new ones to our current list of > 354 active checks ;-) One thing that crosses my mind - perhaps we should run a full index once per week to try to catch this sort of thing in the future? BTW, up to 13500 messages now... -- Dave Page EnterpriseDB UK Ltd: http://www.enterprisedb.com PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
Dave Page wrote: > On Wed, Mar 12, 2008 at 4:55 PM, Stefan Kaltenbrunner > <stefan@kaltenbrunner.cc> wrote: > > > > Tom Lane wrote: > > > "Dave Page" <dpage@pgadmin.org> writes: > > >> On Wed, Mar 12, 2008 at 3:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > >>> That would explain a contiguous range of messages that were not indexed, > > >>> but is that what we have? > > > > > >> Looking at the debug output, the messages that were missed were all contiguous: > > > > > > OK, that seems to support your theory. Might as well go ahead and > > > reindex. +1 for getting some monitoring in there somewhere. > > > > yeah we will work on that and add some new ones to our current list of > > 354 active checks ;-) > > One thing that crosses my mind - perhaps we should run a full index > once per week to try to catch this sort of thing in the future? Seems running it weekly would mean failures would disappear and not be diagnosed. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Wed, Mar 12, 2008 at 5:16 PM, Bruce Momjian <bruce@momjian.us> wrote: > Dave Page wrote: > > One thing that crosses my mind - perhaps we should run a full index > > once per week to try to catch this sort of thing in the future? > > Seems running it weekly would mean failures would disappear and not be > diagnosed. There is that - but then at least the index should be up to date within 7 days at all times regardless of any corner cases that we might otherwise not notice for some time. BTW, the reindexing just finished - it added 31,269 messages that were previously missing :-( -- Dave Page EnterpriseDB UK Ltd: http://www.enterprisedb.com PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
Dave Page wrote: > On Wed, Mar 12, 2008 at 5:16 PM, Bruce Momjian <bruce@momjian.us> wrote: > > Dave Page wrote: > > > One thing that crosses my mind - perhaps we should run a full index > > > once per week to try to catch this sort of thing in the future? > > > > Seems running it weekly would mean failures would disappear and not be > > diagnosed. > > There is that - but then at least the index should be up to date > within 7 days at all times regardless of any corner cases that we > might otherwise not notice for some time. > > > BTW, the reindexing just finished - it added 31,269 messages that were > previously missing :-( 31k emails. Wow. Thanks, I just checked an email that used to be missing and it is there now. Thanks! -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +