Thread: Mail archive indexes are broken, URLs too
When Marc fixed the message-boundary pattern and regenerated the
archives, many of the existing messages changed URLs because they
got assigned slightly different numbers.  I notice that the archive
search engine hasn't yet tracked this change --- if you do a search
and click on a link to a message, you'll arrive at a message close
to the one you want but probably not quite it.

Regenerating the archive indexes is presumably not hard, but there's
a bigger problem: for a while now many of us have been in the habit
of citing old discussions by archive URLs.  All those links are now
broken too, and I can't think of any easy way to fix them.  And then
there's Google etc.

I wonder if it'd be better to revert the regeneration of the archives,
and only apply the new message-boundary pattern to future messages.

regards, tom lane

On Sun, 16 Jul 2006, Tom Lane wrote:

> I wonder if it'd be better to revert the regeneration of the archives,
> and only apply the new message-boundary pattern to future messages.

Nope, for one simple reason ... if, for some reason, at some point in
the future, we have to regenerate everything anyway (e.g. the last time
we did a major template change for the archives), all the numbering is
going to end up reverting back to what it is now ... so we'd only be
'delaying the inevitable' ...

----
Marc G. Fournier    Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org    MSN . scrappy@hub.org
Yahoo . yscrappy    Skype: hub.org    ICQ . 7615664

On Jul 16, 2006, at 2:43 PM, Marc G. Fournier wrote:

> On Sun, 16 Jul 2006, Tom Lane wrote:
>> I wonder if it'd be better to revert the regeneration of the
>> archives, and only apply the new message-boundary pattern to
>> future messages.
>
> Nope, for one simple reason ... if, for some reason, at some point
> in the future, we have to regenerate everything anyway (ie. the
> last time we did a major template change for the archives), all the
> #'ng is going to end up reverting back to what it is now ... so
> we'd only be 'delaying the inevitable' ...

This is a problem for most mailing lists, but I think it's a critical
one for us since we depend very, very heavily on the archives.

Can we change the lists so that they will generate a UUID and add it
to message headers, and then allow the archive software to key off of
that?

--
Jim C. Nasby, Sr. Engineering Consultant    jnasby@pervasive.com
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf    cell: 512-569-9461

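A minimal sketch of the UUID idea in the message above, using only the Python standard library. The header name `X-Archive-UUID`, the helper names, and the `/message/<uuid>` URL layout are assumptions made for illustration; nothing in the thread says how the list or archive software would actually implement this.

```python
import uuid
from email.message import EmailMessage

# Hypothetical header name; the thread does not specify one.
ARCHIVE_HEADER = "X-Archive-UUID"


def stamp_message(msg):
    """Add a UUID header once, at list-delivery time, and return its value.

    A message that already carries the header is left untouched, so
    re-running the stamping step never changes an existing identifier.
    """
    if msg[ARCHIVE_HEADER] is None:
        msg[ARCHIVE_HEADER] = str(uuid.uuid4())
    return str(msg[ARCHIVE_HEADER])


def archive_path(msg):
    """Derive the archive URL path from the header, not from message order."""
    return "/message/" + str(msg[ARCHIVE_HEADER])


if __name__ == "__main__":
    m = EmailMessage()
    m["Subject"] = "Mail archive indexes are broken, URLs too"
    m.set_content("example body")
    stamp_message(m)
    print(archive_path(m))
```

Because the path depends only on a header stored in the message itself, regenerating the archive with a different boundary pattern or template would not change it. It would not help messages that were archived before any such header existed.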
Tom Lane wrote:

> I wonder if it'd be better to revert the regeneration of the archives,
> and only apply the new message-boundary pattern to future messages.

Agreed.  There have been no changes since we discussed this.

The best proposal was to renumber the newly-found items to the end of
the numeric range for the pre-July 2006 archives, and to properly number
July 2006 and later archives.  And this date range has to be embedded in
the archive script so if it is ever run again, this behavior continues
to happen.

The longer we take to fix this, the more likely that people are creating
URLs that refer to the existing pre-July 2006 numbering, which should
change.  It needs to be fixed quickly.

And we can't just leave it alone because old archive emails have URLs
that point to now-incorrect numbers, and there is no good way to fix
that everywhere emails are archived.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

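The numbering proposal above can be sketched as follows. This is an illustration under stated assumptions, not the actual archive script: the function name, the use of a per-message key (such as a Message-ID) to recognise already-numbered messages, the `old_numbers` mapping, and numbering that starts at 0 are all hypothetical.

```python
from datetime import date

# Assumed cutoff: months starting on or after this date are numbered normally.
CUTOFF = date(2006, 7, 1)


def assign_numbers(month_start, message_keys, old_numbers, first_number=0):
    """Return {message key: archive number} for one month of one list.

    message_keys: keys in the order found by the fixed boundary pattern.
    old_numbers:  numbers the messages had before the regeneration
                  (hypothetical input; empty for new-scheme months).
    """
    if month_start >= CUTOFF:
        # July 2006 and later: plain sequential numbering.
        return {key: n for n, key in enumerate(message_keys, start=first_number)}

    # Pre-July 2006: keep every old number so existing URLs stay valid,
    # and place newly found messages after the end of the old range.
    numbers = {}
    next_free = max(old_numbers.values(), default=first_number - 1) + 1
    for key in message_keys:
        if key in old_numbers:
            numbers[key] = old_numbers[key]
        else:
            numbers[key] = next_free
            next_free += 1
    return numbers
```

Keeping the cutoff date inside the script is what makes a later full regeneration reproduce the same numbers, which is the concern Marc raised about only delaying the inevitable.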
Is anyone working on this?  Marc?  If not, who can make these
modifications to the archive numbering?

--
  Bruce Momjian   bruce@momjian.us

Bruce Momjian wrote:

> Is anyone working on this?  Marc?  If not, who can make these
> modifications to the archive numbering?

I believe Marc is the only one who can. The last I heard on this, he
disagreed with rolling back the change.

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Joshua D. Drake wrote:

> I believe Marc is the only one that can at last I heard on this, he
> disagreed with rolling back the change.

I have heard no reason he doesn't like the change, and unless he can
convince most of us, it is time to make the change.

--
  Bruce Momjian   bruce@momjian.us

Bruce Momjian wrote:

> I have heard no reason he doesn't like the change, and unless he can
> convince most of us, it is time to make the change.

Marc wrote:

    Nope, for one simple reason ... if, for some reason, at some point
    in the future, we have to regenerate everything anyway (ie. the
    last time we did a major template change for the archives), all the
    #'ng is going to end up reverting back to what it is now ... so
    we'd only be 'delaying the inevitable' ...

On July 17th.

Joshua D. Drake

> Marc wrote:
>
> Nope, for one simple reason ... if, for some reason, at some point in
> the future, we have to regenerate everything anyway (ie. the last time
> we did a major template change for the archives), all the #'ng is going
> to end up reverting back to what it is now ... so we'd only be 'delaying
> the inevitable' ...
>
> On July 17th.
>
> Joshua D. Drake

If you look below you will see my idea was to hack the script to always
use the method of putting newly found items numerically at the end for
pre-July 2006 dumps.  That addresses Marc's concern.

Marc hasn't responded so I assume he is busy and will hack on this when
he gets back.

---------------------------------------------------------------------------

Bruce Momjian wrote:

> The best proposal was to renumber the newly-found items to the end of
> the numeric range for the pre-July 2006 archives, and to properly number
> July 2006 and later archives.  And this date range has to be embedded in
> the archive script so if it is ever run again, this behavior continues
> to happen.

--
  Bruce Momjian   bruce@momjian.us

On Tue, 1 Aug 2006, Bruce Momjian wrote:

> If you look below you will see my idea was to hack the script to always
> use the method of putting newly found items numerically at the end for
> pre-July 2006 dumps.  That addresses Marc's concern.
>
> Marc hasn't responded so I assume he is busy and will hack on this when
> he gets back.

Yup, been busy dealing with an Adaptec driver issue, will try and get
something hacked up over the coming weekend, sorry for the delay ...

----
Marc G. Fournier

Marc G. Fournier wrote:

> Yup, been busy dealing with an Adaptec driver issue, will try and get
> something hacked up over the coming weekend, sorry for the delay ...

Thanks.  When we talked via IM I couldn't tell whether you agreed that
this was a good idea or not, so I figured I should ask on the lists
so others know it is in process.

--
  Bruce Momjian   bruce@momjian.us

On Tue, 1 Aug 2006, Bruce Momjian wrote:

> Thanks.  When talking via IM I didn't get the sense whether you agreed
> that this was a good idea or not, so I figured I should ask on the lists
> so others know it is in process.

Oh, I still don't think it's a good idea, but understand why ...

Marc G. Fournier

Just shut down rsync while I rebuild the archives for the 'old/new'
scheme, where old is pre-July 2006 ...

will post once it's all been rebuilt ...

On Tue, 1 Aug 2006, Bruce Momjian wrote:

> Is anyone working on this?  Marc?  If not, who can make these
> modifications to the archive numbering?

----
Marc G. Fournier

'k, rsync is back up ... for a short period, part of the archives will
disappear, but a large portion of them has been regenerated, and I
figured I may as well let the 'feed server' start downloading now :)

On Wed, 9 Aug 2006, Marc G. Fournier wrote:

> Just shutdown rsync while I rebuild the archives for the 'old/new' scheme,
> where old is pre-July 2006 ...
>
> will post once its been all rebuilt ...

----
Marc G. Fournier

Nice, thanks.

---------------------------------------------------------------------------

Marc G. Fournier wrote:

> 'k, rsync is back up ... for a short period, part of the archives will
> disappear, but a large portion of it is re-generated, and figured may as
> well let the 'feed server' start downloading now :)

--
  Bruce Momjian   bruce@momjian.us