Thread: Archives too slow
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


I am not happy with how slow the archives site is. It reflects
badly on PostgreSQL and supports the stereotype of PG being "slow".
For example, this page:

http://archives.postgresql.org/pgsql-hackers-win32/2004-07/msg00003.php

took 27 seconds to load completely. Some days the times are better, but
there is always a noticeable delay before the page loads. It also seems
to load in three sections: the top banner ad / search box, the title
and header section, and then the rest of the page. The first two loaded
today in about 10 seconds, the third about 16 seconds after that.
Subsequent hits tend to be much faster (disk caching, one presumes), so
if you test, change the above URL to a random month, day, and message.
I don't think it is on my end: I'm on a very fast connection, and (for
comparison) pgsql.ru loads very, very fast, and runs its searches of
the entire archives faster than archives.postgresql.org can display
a single message.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200408281951
-----BEGIN PGP SIGNATURE-----

iD8DBQFBMRs3vJuQZxSWSsgRAgTqAJ4rjb9HpR+glJS4y4fGYrTLnahIxwCgjrUv
WteQbp0fd3QaZPbVb5wZM+4=
=puf4
-----END PGP SIGNATURE-----
I can confirm the performance currently is terrible.

---------------------------------------------------------------------------

Greg Sabino Mullane wrote:
> I am not happy with how slow the archives site is. It reflects
> badly on PostgreSQL and supports the stereotype of PG being "slow".
> For example, this page:
>
> http://archives.postgresql.org/pgsql-hackers-win32/2004-07/msg00003.php
>
> took 27 seconds to load completely. [...]

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
I'm going to look at mirroring it onto the same server that runs
ftp.postgresql.org ... archives is the worst site that we run, since it's
all a bunch of little flat files, so when it gets indexed by the various
search engines, disk I/O goes through the roof ... we had googlebot index
it once where we literally had to shut down the server for a few minutes
while we waited for the load to drop ...

Will work on this this weekend.

On Sat, 28 Aug 2004, Greg Sabino Mullane wrote:
> I am not happy with how slow the archives site is. It reflects
> badly on PostgreSQL and supports the stereotype of PG being "slow".
> [...]

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
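[Editorial sketch] One low-effort stopgap for crawler load is a robots.txt hint asking spiders to pace themselves. Note this is only a polite-crawler convention: Crawl-delay is a non-standard extension that some crawlers of the era (e.g. Yahoo! Slurp, msnbot) honor, while Googlebot ignores it, so Google itself would still need the contact-address route. A minimal sketch:

```text
# /robots.txt at the site root -- a hint for polite crawlers only.
# Crawl-delay (seconds between requests) is a non-standard extension:
# some crawlers honor it, Googlebot does not.
User-agent: *
Crawl-delay: 10
```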
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> I'm going to look at mirroring it onto the same server that runs
> ftp.postgresql.org ... archives is the worst site that we run, since its
> all a bunch of little flat files, so when it gets indexed by the various
> search engines, disk I/O goes through the roof ... we had googlebot index
> it once where we had to literally shut down the server for a few minutes
> while we waited for load to drop ...

On googlebot's page[1], they claim they never hit a site more than once
every few seconds. Surely this should not be a problem as long as these
are static pages. They also have an email address on that page where you
can request that Google go a little gentler on your site.

Also, if the pages are static (or static plus simple CGIs), have you
considered using boa? [2] I use it for a large site that has a lot of
static pages and it does great - it's a small, clean, minimal web server
written in C.

A final option is an accelerator cache [3]. Not sure if PG is using one
yet, but it probably should be.

[1] http://www.google.com/bot.html
[2] http://www.boa.org/
[3] http://www.squid-cache.org/Doc/FAQ/FAQ-20.html#what-is-httpd-accelerator

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200408290732
-----BEGIN PGP SIGNATURE-----

iD8DBQFBMcEGvJuQZxSWSsgRAhK9AKCLQ4CIW2JQQDg+BFI12DyhaFiFVgCg1DW5
VkM2ayTI9OK6M1kIscAwgxs=
=1uGd
-----END PGP SIGNATURE-----
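[Editorial sketch] The httpd accelerator Greg points at in [3] would sit in front of the existing Apache and serve repeat hits from its cache, so only cache misses reach the real server. A minimal sketch of the relevant squid.conf directives in the Squid 2.5 style; the hostname and ports are placeholders, not the real archives setup:

```text
# squid.conf -- run Squid on port 80 as an httpd accelerator
# in front of the real web server listening on port 8080
http_port 80
httpd_accel_host archives-backend.example.org
httpd_accel_port 8080
httpd_accel_single_host on
httpd_accel_with_proxy off
httpd_accel_uses_host_header off
```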
I never understood why we don't use static pages for the mailing list
archives when they are already generated.

Another thing to consider is using a lightweight frontend with the
ability to cache pages generated by a heavy backend. www.pgsql.ru uses a
three-server setup - frontend (apache+mod_accel), backend (apache +
mod_perl), and thttpd (very light and fast) for serving binary data
(images, for example). Only the frontend interacts with the user, so slow
clients (bad connectivity) don't tie up the heavy backend and,
consequently, the db server.

Oleg

On Sun, 29 Aug 2004, Greg Sabino Mullane wrote:
>> On googlebot's page[1], they claim they never hit a site more than once
>> every few seconds. [...]
>>
>> Also, if the pages are static (or static plus simple CGIs), have you
>> considered using boa? [2] [...]
>>
>> A final option is an accelerator cache [3]. Not sure if PG is using one
>> yet, but it probably should be.
>>
>> [3] http://www.squid-cache.org/Doc/FAQ/FAQ-20.html#what-is-httpd-accelerator

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
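[Editorial sketch] Oleg's frontend/backend split can be approximated with stock Apache if mod_accel is unfamiliar: the lightweight frontend proxies every request to the heavy backend and caches the responses on local disk. A rough sketch using Apache 1.3's mod_proxy and its built-in proxy cache; the hostname, paths, and sizes below are illustrative only:

```text
# httpd.conf on the lightweight frontend -- forward everything
# to the heavy backend and cache the responses locally
ProxyPass        / http://backend.example.org:8080/
ProxyPassReverse / http://backend.example.org:8080/
CacheRoot        /var/cache/httpd-proxy
CacheSize        512000   # disk cache size, in KB
CacheDefaultExpire 1      # hours, for responses with no Expires header
```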
The reason why they are dynamic was so that we could 'strip' the garbage
out for the search engines, so that they didn't index all of the extra
verbiage, only the message itself ... it also meant that if we changed the
format (ie. added a new list to the left menu), we didn't have to
regenerate all the pages ...

I'm open to moving them back to static myself, if nobody objects *shrug*

In fact, if we get rid of the php code altogether, we could go with what
one other person suggested, and use a 'light weight' web server instead of
apache ... you mention thttpd below, someone else mentioned one called
Boa(?) ... never having used either, I'm flexible either way ...

Does anyone care if I get rid of the .php code? Before I do that,
assuming no ... does anyone know a way of 'hiding' sections of HTML code
from search engines? Right now, we're doing that with the PHP, and I know
there is/was a <!-- --> way of doing it, but someone (Oleg?) mentioned
that it isn't very consistent in being honored by search engines ... ?

On Sun, 29 Aug 2004, Oleg Bartunov wrote:
> I never understood why we don't use static pages for the mailing list
> archives when they are already generated.
>
> Another thing to consider is using a lightweight frontend with the
> ability to cache pages generated by a heavy backend. [...]

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
On Sun, 29 Aug 2004, Marc G. Fournier wrote:

> The reason why they are dynamic was so that we could 'strip' the garbage
> out for the search engines, so that they didn't index all of the extra
> verbiage, only the message itself ... it also meant that if we changed the
> format (ie. added a new list to the left menu), we didn't have to
> regenerate all the pages ...

You could always use SSI for that.

> I'm open to moving them back to static myself, if nobody objects *shrug*
>
> In fact, if we get rid of the php code altogether, we could go with what
> one other person suggested, and use a 'light weight' web server instead of
> apache ... you mention thttpd below, someone else mentioned one called
> Boa(?) ... never having used either, I'm flexible either way ...

thttpd is nice for simple binary data, but currently we are evaluating
lighttpd http://jan.kneschke.de/projects/lighttpd/, which looks really
powerful. It's faster than thttpd and has many more features.

> Does anyone care if I get rid of the .php code? Before I do that,
> assuming no ... does anyone know a way of 'hiding' sections of HTML code
> from search engines? Right now, we're doing that with the PHP, and I know
> there is/was a <!-- --> way of doing it, but someone (Oleg?) mentioned
> that it isn't very consistent in being honored by search engines ... ?

<!-- --> doesn't help, because comment tags also hide the content from the
browser :) In principle, smart search engines should recognize fixed
elements like a navigation bar and penalize their weight. We do that, at
least.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
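[Editorial sketch] Oleg's SSI suggestion would keep the message pages static while still letting the left menu be changed in one place: each archived page pulls in a shared navigation fragment at request time, so a menu change never requires regenerating the pages. A sketch using Apache's mod_include; the file names are hypothetical:

```html
<!-- msg00003.shtml: a static message page. Requires mod_include
     ("Options +Includes" plus a server-parsed handler or XBitHack). -->
<html>
<body>
<!--#include virtual="/fragments/leftmenu.html" -->
<div>
  ... the archived message itself ...
</div>
</body>
</html>
```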
> <!-- --> doesn't help, because comment tags also hide content from browser :)
> In principle, smart search engines should understand firm elements
> like navigation bar and penalize their weight. We do that, at least.
I think you misunderstood this.
The comment tags do not hide things from the browser, think of this:
<html>
...
<!--noindex-->
stuff we want hidden
<!--/noindex-->
stuff we don't want hidden
...
</html>
... John
Folks: Archives are more than slow now. They're down. And the rest of www.postgresql.org is molasses-slow as well. I have another suggestion: Command Prompt has offered to host the archives; they have excess capacity and believe that they can make them run faster. Why not let them? -- Josh Berkus Aglio Database Solutions San Francisco
On Mon, 30 Aug 2004, Josh Berkus wrote: > Folks: > > Archives are more than slow now. They're down. And the rest of > www.postgresql.org is molasses-slow as well. > > I have another suggestion: Command Prompt has offered to host the archives; > they have excess capacity and believe that they can make them run faster. > Why not let them? 'k, I must have missed the offer, but sure ... I can work with Joshua to get the scripts and all put into place ... ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
Marc G. Fournier wrote: > On Mon, 30 Aug 2004, Josh Berkus wrote: > > > Folks: > > > > Archives are more than slow now. They're down. And the rest of > > www.postgresql.org is molasses-slow as well. > > > > I have another suggestion: Command Prompt has offered to host the archives; > > they have excess capacity and believe that they can make them run faster. > > Why not let them? > > 'k, I must have missed the offer, but sure ... I can work with Joshua to > get the scripts and all put into place ... Agreed. I got the offer from Joshua Drake via private email but was not sure how to address the issue. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Tue, 31 Aug 2004, John Hansen wrote:

> <!-- --> doesn't help, because comment tags also hide content from browser :)
> In principle, smart search engines should understand firm elements
> like navigation bar and penalize their weight. We do that, at least.
>
> I think you misunderstood this.
> The comment tags do not hide things from the browser, think of this:
>
> <html>
> ...
> <!--noindex-->
> stuff we want hidden
> <!--/noindex-->
> stuff we don't want hidden
> ...
> </html>

Aha, got that. I didn't know that syntax and am not sure it's well
supported. I know about <NOINDEX>.....</NOINDEX> by atomz.com, though.

> ... John

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
> -----Original Message-----
> From: pgsql-www-owner@postgresql.org
> [mailto:pgsql-www-owner@postgresql.org] On Behalf Of Oleg Bartunov
> Sent: 30 August 2004 20:41
> To: John Hansen
> Cc: Marc G. Fournier; Greg Sabino Mullane; pgsql-www@postgresql.org
> Subject: Re: [pgsql-www] Archives too slow
>
> Aha, got that. I don't know such syntax and not sure if it's
> well supported.
> I know about <NOINDEX>.....</NOINDEX> by atomz.com, though

That's the problem, I think - there is no standard, just lots of
variations :-(

/D
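[Editorial sketch] Dave is right that the section-level comments (noindex, atomz's NOINDEX, etc.) are all vendor-specific. At the whole-page level, though, there is a convention the major engines, including Google, do honor: the robots meta tag. It cannot hide just the navigation bar, only an entire page, so it does not solve the problem in this thread, but for completeness:

```html
<html>
<head>
  <!-- standard, per-page only: asks compliant crawlers not to
       index this page or follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>
<body>
  ...
</body>
</html>
```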
On Mon, 30 Aug 2004, Bruce Momjian wrote: > Marc G. Fournier wrote: >> On Mon, 30 Aug 2004, Josh Berkus wrote: >> >>> Folks: >>> >>> Archives are more than slow now. They're down. And the rest of >>> www.postgresql.org is molasses-slow as well. >>> >>> I have another suggestion: Command Prompt has offered to host the archives; >>> they have excess capacity and believe that they can make them run faster. >>> Why not let them? >> >> 'k, I must have missed the offer, but sure ... I can work with Joshua to >> get the scripts and all put into place ... > > Agreed. I got the offer from Joshua Drake via private email but was not > sure how to address the issue. Joshua and I are talking about it ... :) ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
On Mon, 30 Aug 2004, Oleg Bartunov wrote: > On Tue, 31 Aug 2004, John Hansen wrote: > >> <!-- --> doesn't help, because comment tags also hide content from browser :) >> In principle, smart search engines should understand firm elements >> like navigation bar and penalize their weight. We do that, at least. >> >> I think you misunderstood this. >> The comment tags do not hide things from the browser, think of this: >> >> <html> >> ... >> <!--noindex--> >> stuff we want hidden >> <!--/noindex--> >> stuff we don't want hidden >> ... >> </html> > > Aha, got that. I don't know such syntax and not sure if it's well supported. > I know about <NOINDEX>.....</NOINDEX> by atomz.com, though Does Google have an equivalent that they honor? ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
>>> Aha, got that. I don't know such syntax and not sure if
>>> it's well supported.
>>> I know about <NOINDEX>.....</NOINDEX> by atomz.com, though
>>
>> Does Google have an equivalent that they honor?

I've asked Google, and their reply was consistent with the information
available on their website. Google does not allow cloaking, and any
attempt to do so will result in a permanent ban.

They define cloaking as presenting different content to search engines
from what regular visitors see.

... John
On Wed, 1 Sep 2004, John Hansen wrote:

>>>> Aha, got that. I don't know such syntax and not sure if
>>>> it's well supported.
>>>> I know about <NOINDEX>.....</NOINDEX> by atomz.com, though
>>>
>>> Does Google have an equivalent that they honor?
>>
>> I've asked google, and their reply was consistent with the information
>> available on their website. Google does not allow cloaking, and any
>> attempt to do so, will result in a permanent ban.
>>
>> They define cloaking as presenting different content to search engines
>> as to what regular visitors see.

Guess they don't check often then since we've been doing it for months
now :)

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
>> They define cloaking as presenting different content to search engines
>> from what regular visitors see.
>
> Guess they don't check often then since we've been doing it for months
> now :)

Yea, I think, tho, that it's meant as a way to discourage so-called SEOs
from creating pages whose sole purpose is to skew the rankings. One could,
for example, just join all these link-exchange offers one gets, and just
not show them to the search engines. That would improve your own ranking
in a positive way, hehe.

... John

Btw, Marc, could you tar.bz2 up the archives for me? I'm moving machines
and will need to recrawl all of it.