Thread: Postgres mail list traffic over time
I got interested by Bruce's plot of PG email traffic here
http://momjian.us/main/img/pgincoming.gif
and decided to try to extend it into the past. The data I have
available is just my own incoming mail log, but being a pack-rat by
nature I have that back to April 1998. Attached is a graph of Postgres
list messages per month since then. I should note that this covers only
the mail lists I'm subscribed to, which has been most of them since
about 1999; but the first few numbers in this chart are undercounts by
comparison. Also, the very last dot is month-to-date for November and
so is an underestimate.

So, to a first approximation, the PG list traffic has been constant
since 2000. Not the result I expected.

regards, tom lane
Tom Lane wrote:
> I got interested by Bruce's plot of PG email traffic here
> http://momjian.us/main/img/pgincoming.gif
> and decided to try to extend it into the past. The data I have
> available is just my own incoming mail log, but being a pack-rat by
> nature I have that back to April 1998. Attached is a graph of Postgres
> list messages per month since then. I should note that this covers only
> the mail lists I'm subscribed to, which has been most of them since
> about 1999; but the first few numbers in this chart are undercounts by
> comparison. Also, the very last dot is month-to-date for November and
> so is an underestimate.
>
> So, to a first approximation, the PG list traffic has been constant
> since 2000. Not the result I expected.

Yes, I know Magnus did a graph for the PG-EU conference and it was also
flat; perhaps he can post it here. His chart was pulled from the
Postgres archives, so it is even more accurate than our graphs.

I also was confused by its flatness. I am finding the email traffic
almost impossible to continue tracking, so something different is
happening, but it seems it is not volume-related.

I am going to post another blog tomorrow with more thoughts.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> So, to a first approximation, the PG list traffic has been constant
>> since 2000. Not the result I expected.

> I also was confused by its flatness. I am finding the email traffic
> almost impossible to continue tracking, so something different is
> happening, but it seems it is not volume-related.

Yes, my perception also is that it's getting harder and harder to keep
up with the list traffic; so something is happening that a simple
volume count doesn't capture.

Does anyone have the data to break it down per mailing list? That might
yield some more insight.

regards, tom lane
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> So, to a first approximation, the PG list traffic has been constant
> >> since 2000. Not the result I expected.
>
> > I also was confused by its flatness. I am finding the email traffic
> > almost impossible to continue tracking, so something different is
> > happening, but it seems it is not volume-related.
>
> Yes, my perception also is that it's getting harder and harder to keep
> up with the list traffic; so something is happening that a simple
> volume count doesn't capture.

Agreed. I am struggling to put into words some of my angst, but I am
concerned I will not be able to offer the same guarantees I have done in
previous releases that every bug has been either fixed or added to the
TODO list, and every submitted patch has been either applied or rejected.

There, I said it. :-(

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On Thu, 2008-11-20 at 22:36 -0500, Tom Lane wrote:
> I got interested by Bruce's plot of PG email traffic here
> http://momjian.us/main/img/pgincoming.gif
> and decided to try to extend it into the past. The data I have
> available is just my own incoming mail log, but being a pack-rat by
> nature I have that back to April 1998. Attached is a graph of Postgres
> list messages per month since then. I should note that this covers only
> the mail lists I'm subscribed to, which has been most of them since
> about 1999; but the first few numbers in this chart are undercounts by
> comparison. Also, the very last dot is month-to-date for November and
> so is an underestimate.
>
> So, to a first approximation, the PG list traffic has been constant
> since 2000. Not the result I expected.

Am I reading your graph wrong? I show a sharp increase right before 2006
and then a small drop off but a constant after that?

I know that my email (I am pretty sure I am subscribed to at least as
many lists as you) has been on a steady incline, especially through
-general and -hackers.

Joshua D. Drake
"Joshua D. Drake" <jd@commandprompt.com> writes: > I know that my email (I am pretty sure I am subscribed to at least as > many lists as you) has been on a steady incline, especially through > -general and -hackers. I would have said the same, which is why I find it noteworthy that my mail logs don't seem to support that impression. Have you got actual log data on the point? regards, tom lane
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>>
>> I am finding the email traffic
>> almost impossible to continue tracking, so something different is
>> happening, but it seems it is not volume-related.
>
> Yes, my perception also is that it's getting harder and harder to keep
> up with the list traffic; so something is happening that a simple
> volume count doesn't capture.

Perhaps it's just subjective: we're all getting older.

Soon, these pesky whippersnappers will want to twitter their PG
questions to this list over YouTube.
On Thu, 2008-11-20 at 23:46 -0500, Tom Lane wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
> > I know that my email (I am pretty sure I am subscribed to at least as
> > many lists as you) has been on a steady incline, especially through
> > -general and -hackers.
>
> I would have said the same, which is why I find it noteworthy that
> my mail logs don't seem to support that impression. Have you got
> actual log data on the point?

I purge my postgresql logs except for some specific ones (like PGFG).
However, I have the entire archives.postgresql.org.

pgsql-hackers (since inception, 1997), number of messages per month.
Each row is a year; columns run January through December, and "-" marks
a month with no data.

Year   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
1997   939   300   534   865   484   601   392   399   579   594   381   351
1998   870  1326  1121   707   632   493   490   867   675  1221   609   600
1999   769   699  1008   217  1155  1241  1052   705   945   962   929  1065
2000  1688  1460   288   187  1686  1283  1477   890   642  1320  1419  1234
2001  1469  1178  1708  1181  1478  1151   955  1220   921  1165  1318   970
2002  1411  1233  1246  1565  1169  1045  1339  2308  1843  1469  1257  1172
2003  1356  1324  1262  1033   812  1316  1068  1373  1695  1631  1643   836
2004   878  1017  1352  1177  1495  1025  1430  1620   953  1084  1226   963
2005  1116   987  1086  1022  1626  1598  1162  1217  1484  1442  1587  1278
2006  1050  1282  1343  1158  1386  1645  1660  2060  2397  1583  1031  1437
2007  1663  1953  1871  1285  1201  1140  1019  1244  1230  1575  1380  1000
2008  1236  1324  1308  1928  1128  1161  1512  1391  1910  1715  1431     -
On Thu, 2008-11-20 at 21:19 -0800, Joshua D. Drake wrote:
> On Thu, 2008-11-20 at 23:46 -0500, Tom Lane wrote:
> > "Joshua D. Drake" <jd@commandprompt.com> writes:
> > > I know that my email (I am pretty sure I am subscribed to at least as
> > > many lists as you) has been on a steady incline, especially through
> > > -general and -hackers.
> >
> > I would have said the same, which is why I find it noteworthy that
> > my mail logs don't seem to support that impression. Have you got
> > actual log data on the point?
>
> I purge my postgresql logs except for some specific ones (like PGFG).
> however, I have the entire archives.postgresql.org.
>
> pgsql-hackers (since inception, 1997), first line date, second line
> number of messages.
>

pgsql-general, number of messages per month. Each row is a year; columns
run January through December, and "-" marks a month with no data.

Year   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
1998     -     -     -     -   139   337   438   226   187   283   269   242
1999   302   356   385   332   404   470   411   496   385   606   512   631
2000   667   477   219   705   843   803  1180   861   999  1337  1084  1002
2001  1700  1623  1656  1568  1710  1651  1342  1303  1195  1223  1124   901
2002  1216  1419  1388  1287  1192  1366  1893  1261  1438  1444  1517  1225
2003  1657  1760  1597  1611  1295  1951  1586  1836  1880  1604  1768  1664
2004  1708  1355  1215  1210   965  1236   973  1677  1337  1579  1557  1358
2005  1877  1535  1622  1460  1379  1413  1332  1632  1232  1945  1438  1402
2006  1743  1218  1602  1372  1604  1268  1170  1501  1289  1588  1866  1619
2007  1953  1720  1724  1304  1650  1796  1257  2097  1385  1722  1770  1487
2008  1621  1527  1666  1446  1144  1055  1251  1188  1252  1485  1045     -
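One way to check the "flat since 2000" impression against the monthly counts posted in this thread is to average them per year. The following is a minimal sketch, assuming the counts are available as whitespace-separated `YYYY-MM count` pairs; the helper name `yearly_means` is hypothetical, and the sample holds only the pgsql-hackers figures for 2001 and 2007 from the posting above.

```ruby
# Sketch: average messages per month for each year, from "YYYY-MM count" pairs.
# SAMPLE is the pgsql-hackers data for 2001 and 2007 as posted in the thread.
SAMPLE = <<~DATA
  2001-01 1469 2001-02 1178 2001-03 1708 2001-04 1181 2001-05 1478 2001-06 1151
  2001-07 955 2001-08 1220 2001-09 921 2001-10 1165 2001-11 1318 2001-12 970
  2007-01 1663 2007-02 1953 2007-03 1871 2007-04 1285 2007-05 1201 2007-06 1140
  2007-07 1019 2007-08 1244 2007-09 1230 2007-10 1575 2007-11 1380 2007-12 1000
DATA

# Returns a hash mapping each year ("2001") to its mean monthly count.
# (Hypothetical helper; the name and input format are assumptions.)
def yearly_means(text)
  pairs = text.split.each_slice(2).map do |month, count|
    [month[0, 4], Integer(count, 10)]   # keep only the year part of "YYYY-MM"
  end
  pairs.group_by(&:first).transform_values do |rows|
    counts = rows.map(&:last)
    counts.sum.to_f / counts.size
  end
end

yearly_means(SAMPLE).each do |year, mean|
  puts format('%s %.1f', year, mean)
end
```

A partial month, such as the last entry of the current year, pulls that year's mean down, so it is probably best left out before comparing years.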
On Fri, 2008-11-21 at 00:06 -0500, brian wrote:
> Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> >>
> >> I am finding the email traffic
> >> almost impossible to continue tracking, so something different is
> >> happening, but it seems it is not volume-related.
> >
> > Yes, my perception also is that it's getting harder and harder to keep
> > up with the list traffic; so something is happening that a simple
> > volume count doesn't capture.
>
> Perhaps it's just subjective: we're all getting older.

ouch

> Soon, these pesky whippersnappers will want to twitter their PG
> questions to this list over YouTube.
>

I assume you don't realize that is already happening :P

Joshua D. Drake
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> So, to a first approximation, the PG list traffic has been constant
> >> since 2000. Not the result I expected.
>
> > I also was confused by its flatness. I am finding the email traffic
> > almost impossible to continue tracking, so something different is
> > happening, but it seems it is not volume-related.
>
> Yes, my perception also is that it's getting harder and harder to keep
> up with the list traffic; so something is happening that a simple
> volume count doesn't capture.
The numbers posted show a slow but steady increase, but I am wondering if there are more distinct subjects?
Can we get a count of distinct threads per month? (Obviously there will be some slop, as some threads last for a while.)
Greg Williamson
Senior DBA
DigitalGlobe
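The distinct-threads-per-month count asked for above could be approximated from Subject headers alone. This is only a sketch under stated assumptions: messages are already parsed into hashes with `:month` and `:subject` keys (an assumed shape), and a thread is identified by its subject after stripping list tags and reply prefixes, which will merge unrelated threads that share a subject and split threads whose subject was edited. The helper names are hypothetical.

```ruby
require 'set'

# Strip list tags such as "[GENERAL]" and any run of "Re:" / "Fwd:" prefixes,
# so that replies collapse onto the subject of the thread they belong to.
def thread_key(subject)
  s = subject.strip
  loop do
    stripped = s.sub(/\A(\[[^\]]*\]|re:|fwd?:)\s*/i, '')
    break if stripped == s
    s = stripped
  end
  s.downcase.squeeze(' ')
end

# messages: array of { month: "YYYY-MM", subject: "..." } hashes (assumed shape).
# Returns { "YYYY-MM" => number of distinct thread keys seen in that month }.
def threads_per_month(messages)
  seen = Hash.new { |hash, month| hash[month] = Set.new }
  messages.each { |m| seen[m[:month]] << thread_key(m[:subject]) }
  seen.transform_values(&:size).sort.to_h
end

# Illustrative input only; these are not real archive records.
example = [
  { month: '2008-11', subject: 'Postgres mail list traffic over time' },
  { month: '2008-11', subject: 'Re: [GENERAL] Postgres mail list traffic over time' },
  { month: '2008-11', subject: 'Another question' }
]
p threads_per_month(example)
```

Grouping on the `References` or `In-Reply-To` headers would likely be more accurate than subjects, at the cost of needing the full headers of every message.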
Magnus Hagander wrote:
> Tom Lane wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>>> Tom Lane wrote:
>>>> So, to a first approximation, the PG list traffic has been constant
>>>> since 2000. Not the result I expected.
>>> I also was confused by its flatness. I am finding the email traffic
>>> almost impossible to continue tracking, so something different is
>>> happening, but it seems it is not volume-related.
>> Yes, my perception also is that it's getting harder and harder to keep
>> up with the list traffic; so something is happening that a simple
>> volume count doesn't capture.
>>
>> Does anyone have the data to break it down per mailing list? That might
>> yield some more insight.
>
> Here's a graph of the more popular mailinglists (I couldn't include all
> - the graph was completely unreadable) as seen in the archives search db.

Pfft, -general didn't like that file even though it was only 60k or so.
Here's a link to an uploaded version:
http://www.smugmug.com/photos/421507651_8pe6C-O.png

//Magnus
Tom Lane wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
> > I know that my email (I am pretty sure I am subscribed to at least as
> > many lists as you) has been on a steady incline, especially through
> > -general and -hackers.
>
> I would have said the same, which is why I find it noteworthy that
> my mail logs don't seem to support that impression. Have you got
> actual log data on the point?

Markmail shows some graphs. The one on the "main page" gives the
traffic for all the lists:
http://pgsql.markmail.org/

If you search for "pgsql-general" you get a graph for that list:
http://pgsql.markmail.org/search/?q=list%3Aorg.postgresql.pgsql-general

Same for -hackers:
http://pgsql.markmail.org/search/?q=list%3Aorg.postgresql.pgsql-hackers

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Thu, Nov 20, 2008 at 10:59:31PM -0500, Tom Lane wrote:
> Yes, my perception also is that it's getting harder and harder to keep
> up with the list traffic; so something is happening that a simple
> volume count doesn't capture.
>
> Does anyone have the data to break it down per mailing list? That might
> yield some more insight.

The markmail archives generate pretty graphs and they seem to have a
good coverage from quite a few of the lists. e.g.:

http://markmail.org/search/?q=list:org.postgresql.pgsql-general
http://markmail.org/search/?q=list:org.postgresql.pgsql-hackers

the following has links to more:

http://markmail.org/search/?q=list:org.postgresql

be interesting to see how their servers take the hammering!

Sam
On Fri, 2008-11-21 at 10:43 -0300, Alvaro Herrera wrote:
> Tom Lane wrote:
> Markmail shows some graphs. The one on the "main page" gives the
> traffic for all the lists:
> http://pgsql.markmail.org/
>
> If you search for "pgsql-general" you get a graph for that list:
> http://pgsql.markmail.org/search/?q=list%3Aorg.postgresql.pgsql-general
>
> Same for -hackers:
> http://pgsql.markmail.org/search/?q=list%3Aorg.postgresql.pgsql-hackers
>

The top "Who sent it" list is very telling. It says, "Paging Tom Lane...
take a vacation!" :)

Joshua D. Drake
Sam Mason wrote:
> the following has links to more:
>
> http://markmail.org/search/?q=list:org.postgresql

Wow, the Spanish list is the 3rd in traffic after hackers and general!

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Bruce Momjian wrote:
> Tom Lane wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>>> I also was confused by its flatness. I am finding the email traffic
>>> almost impossible to continue tracking, so something different is
>>> happening, but it seems it is not volume-related.
>> Yes, my perception also is that it's getting harder and harder to keep
>> up with the list traffic; so something is happening that a simple
>> volume count doesn't capture.

If measured in "bytes of the gzipped mbox" it looks like there's a
*huge* increase of volume on Hackers in the past 3 months - well
over twice the historical levels; and maybe 4X 2002-2006.

Graphs of this metric can be seen here:
http://0ape.com/postgres_mailinglist_size/

In some ways I think compressed mbox sizes are a more fair way of
measuring the bandwidth for these lists since it (correctly) counts
a large gzipped patch as requiring more mental effort than people
top-posting brief messages on top of old threads.

(Data from commands like
HEAD http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.2008-09.gz | grep Content-Length
)
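The `HEAD ... | grep Content-Length` measurement above can also be sketched with Ruby's standard `net/http` library instead of the external `HEAD` command. The function names here are hypothetical, the sample headers are invented for illustration, and `mbox_size` is deliberately not called because the 2008 archive URL layout may no longer exist.

```ruby
require 'net/http'
require 'uri'

# Pull the byte count out of raw response headers, as `grep Content-Length` did.
# Returns an Integer, or nil when the header is absent.
def content_length_from(headers_text)
  match = headers_text.match(/^Content-Length:\s*(\d+)/i)
  match && match[1].to_i
end

# Issue a HEAD request and return the advertised size in bytes, or nil.
# Only the headers are transferred, so the mbox itself is never downloaded.
def mbox_size(url)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)
  end
  length = response['Content-Length']
  length && length.to_i
end

# Invented headers, only to exercise the parsing step.
sample = "HTTP/1.1 200 OK\r\nContent-Type: application/x-gzip\r\nContent-Length: 1048576\r\n"
puts content_length_from(sample)
```

Note that this measures compressed size, so months dominated by highly compressible text (long quoted replies) will look smaller than months dominated by already-dense attachments.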
On Thursday 20 November 2008 7:59:31 pm Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> So, to a first approximation, the PG list traffic has been constant
> >> since 2000. Not the result I expected.
> >
> > I also was confused by its flatness. I am finding the email traffic
> > almost impossible to continue tracking, so something different is
> > happening, but it seems it is not volume-related.
>
> Yes, my perception also is that it's getting harder and harder to keep
> up with the list traffic; so something is happening that a simple
> volume count doesn't capture.

I am still relatively new to Postgres, but my impression is that the questions
have gotten harder/more in depth. Fewer, How do you pronounce Postgres? and
more, Explain the various isolation levels for transactions and how does that
affect my particular situation?

>
> Does anyone have the data to break it down per mailing list? That might
> yield some more insight.
>
> regards, tom lane

--
Adrian Klaver
aklaver@comcast.net
On Fri, 2008-11-21 at 08:18 -0800, Ron Mayer wrote:
> Bruce Momjian wrote:
> > Tom Lane wrote:
> >> Bruce Momjian <bruce@momjian.us> writes:
> >>> I also was confused by its flatness. I am finding the email traffic
> >>> almost impossible to continue tracking, so something different is
> >>> happening, but it seems it is not volume-related.
> >> Yes, my perception also is that it's getting harder and harder to keep
> >> up with the list traffic; so something is happening that a simple
> >> volume count doesn't capture.
>
> If measured in "bytes of the gzipped mbox" it looks like there's a
> *huge* increase of volume on Hackers in the past 3 months - well
> over twice the historical levels; and maybe 4X 2002-2006.

It's because we eliminated the -patches mailing list.

Joshua D. Drake
"Joshua D. Drake" <jd@commandprompt.com> writes: > On Fri, 2008-11-21 at 08:18 -0800, Ron Mayer wrote: >> If measured in "bytes of the gzipped mbox" it looks like there's a >> *huge* increase of volume on Hackers in the past 3 months - well >> over twice the historical levels; and maybe 4X 2002-2006. > Its because we eliminated the -patches mailing list. Yeah, I think this is most probably explained by repeat postings of successive versions of large patches. Still, Ron might be on to something. I had not considered message lengths in my previous numbers ... regards, tom lane
Joshua D. Drake wrote:
> On Fri, 2008-11-21 at 08:18 -0800, Ron Mayer wrote:
>> Bruce Momjian wrote:
>>> Tom Lane wrote:
>>>> ... harder to keep
>>>> up with the list traffic; so something is happening that a simple
>>>> volume count doesn't capture.
>> If measured in "bytes of the gzipped mbox" it ...
>
> It's because we eliminated the -patches mailing list.

That's part of it. I've added -patches to the graph at
http://0ape.com/postgres_mailinglist_size/ as well as
a graph of hackers+patches combined; and it still looks
like hackers+patches is quite high in the past 3 months.

With hackers+patches it looks like 2002-08 was the biggest
month; but the past 3 months still look roughly twice
late 2007's numbers.
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
> Joshua D. Drake wrote:
>> It's because we eliminated the -patches mailing list.

> That's part of it. I've added -patches to the graph at
> http://0ape.com/postgres_mailinglist_size/ as well as
> a graph of hackers+patches combined; and it still looks
> like hackers+patches is quite high in the past 3 months.

One of the reasons we got rid of -patches was the frequency of
cross-posting to both -hackers and -patches. Are you double-counting
cross-posted messages?

regards, tom lane
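The double-counting concern can be illustrated with a small sketch. It assumes the messages of each list have already been extracted from the mboxes into hashes with `:message_id` and `:bytes` keys (an assumed shape, as are the function names), and that a cross-posted message carries the same Message-ID in both archives. Summing compressed mbox sizes, as in the graphs discussed here, cannot remove duplicates this way; the archives would have to be unpacked first.

```ruby
# Each message is a hash like { message_id: "<id@host>", bytes: 1234 }
# (assumed shape; not the format of any particular archive tool).

# Total bytes across several lists, counting every copy of every message.
def naive_bytes(*lists)
  lists.flatten(1).sum { |m| m[:bytes] }
end

# Total bytes across several lists, counting each Message-ID only once,
# so a message cross-posted to two lists contributes a single copy.
def combined_bytes(*lists)
  lists.flatten(1).uniq { |m| m[:message_id] }.sum { |m| m[:bytes] }
end

# Illustrative data only: message <b@example> was sent to both lists.
hackers = [{ message_id: '<a@example>', bytes: 100 },
           { message_id: '<b@example>', bytes: 50 }]
patches = [{ message_id: '<b@example>', bytes: 50 },
           { message_id: '<c@example>', bytes: 25 }]

puts naive_bytes(hackers, patches)
puts combined_bytes(hackers, patches)
```

The gap between the two totals is the volume attributable to cross-posting.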
Adrian Klaver wrote:
>> Yes, my perception also is that it's getting harder and harder to keep
>> up with the list traffic; so something is happening that a simple
>> volume count doesn't capture.
>
> I am still relatively new to Postgres, but my impression is that the questions
> have gotten harder/more in depth. Fewer, How do you pronounce Postgres? and
> more, Explain the various isolation levels for transactions and how does that
> affect my particular situation?

This is definitely the case. Whether it's because the documentation is
better, or we're getting a more sophisticated user, the questions are
certainly more involved.

Some of the EXPLAINs on the performance list are practically impossible
to read unless you've got the time to cut+paste and fix line-endings.

--
Richard Huxton
Archonet Ltd
Richard Huxton wrote:
> Some of the EXPLAINs on the performance list are practically impossible
> to read unless you've got the time to cut+paste and fix line-endings.

Maybe we should start recommending people to post those via
http://explain-analyze.info/

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Tom Lane wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
>
>> It's because we eliminated the -patches mailing list.
>>
>
> Yeah, I think this is most probably explained by repeat postings
> of successive versions of large patches. Still, Ron might be on to
> something. I had not considered message lengths in my previous
> numbers ...

Message size matters, but the new patches are also more sophisticated,
which makes them harder to keep up with, and more people are writing
code, which takes time to review. These factors probably explain why
you and Bruce feel the increase more than the rest of us.
Tom Lane wrote:
> Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
>> Joshua D. Drake wrote:
>>> It's because we eliminated the -patches mailing list.
>
>> That's part of it. I've added -patches to the graph at
>> http://0ape.com/postgres_mailinglist_size/ as well as
>> a graph of hackers+patches combined; and it still looks
>> like hackers+patches is quite high in the past 3 months.
>
> One of the reasons we got rid of -patches was the frequency of
> cross-posting to both -hackers and -patches. Are you double-counting
> cross-posted messages?

For the combined graph I just summed the output of:

HEAD http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.2008-09.gz | grep Content-Length
HEAD http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-patches.2008-09.gz | grep Content-Length

I didn't look to see if the downloadable mboxes had duplicate messages.

If people want the raw data, here's the script I used to get it.

============================================================================
#!/usr/bin/env ruby
%W{rubygems hpricot open-uri gruff}.each{|l| require l}

# Print "<mbox file> <Content-Length>" for every .gz archive linked from url.
def chart(url)
  h = Hpricot.parse(open(url){|f| f.read})
  # Collect the href of every link that points at a gzipped mbox.
  mboxes = (h / "//a").map{|x| x.attributes['href']}.
                       select{|x| x=~/\.gz/}
  mboxes.sort.each{|x|
    # HEAD is an external command; keep only the byte count it reports.
    y = `HEAD #{url}/#{x}` =~ /Content-Length: (\d+)/ && $1
    puts "#{x} #{y}"
  }
end

patches = chart('http://archives.postgresql.org/pgsql-patches')
general = chart('http://archives.postgresql.org/pgsql-general')
hackers = chart('http://archives.postgresql.org/pgsql-hackers')
============================================================================

Perhaps some of the extra burden on the experienced hackers is a larger
volume of newer people trying to contribute that are needing more
handholding (and thus more re-posted updated patches, etc)?
brian wrote:
> Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> >>
> >> I am finding the email traffic
> >> almost impossible to continue tracking, so something different is
> >> happening, but it seems it is not volume-related.
> >
> > Yes, my perception also is that it's getting harder and harder to keep
> > up with the list traffic; so something is happening that a simple
> > volume count doesn't capture.
>
> Perhaps it's just subjective: we're all getting older.

I thought about that, which is scary in itself. :-( But I don't think
Tom and I have both gotten significantly older in the past year, and we
are slightly different ages.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Steve Crawford wrote:
> Bruce Momjian wrote:
> > brian wrote:
> >
> >> Tom Lane wrote:
> >>
> >> Perhaps it's just subjective: we're all getting older.
> >>
> Which, as "Dr. A" (aka Isaac Asimov) pointed out in "The Sensuous Dirty
> Old Man", beats the alternative.
> > I thought about that, which is scary in itself. :-( But I don't think
> > Tom and I have both gotten significantly older in the past year, and we
> > are slightly different ages.
> >
> >
> Would that it were linear. The change from 2 to 3 is striking. 32 to 33,
> not so much. 82 to 83 may be life and death.
>
> The rate of change of my near-vision has certainly been non-linear of
> late. :(

Tom, is there a Postgres old-age home yet? ;-)

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
> brian wrote:
>
>> Tom Lane wrote:
>>
>> Perhaps it's just subjective: we're all getting older.
>>

Which, as "Dr. A" (aka Isaac Asimov) pointed out in "The Sensuous Dirty
Old Man", beats the alternative.

> I thought about that, which is scary in itself. :-( But I don't think
> Tom and I have both gotten significantly older in the past year, and we
> are slightly different ages.
>
>

Would that it were linear. The change from 2 to 3 is striking. 32 to 33,
not so much. 82 to 83 may be life and death.

The rate of change of my near-vision has certainly been non-linear of
late. :(

Cheers,
Steve
On Friday 21 November 2008 19:10:45 Tom Lane wrote:
> Yeah, I think this is most probably explained by repeat postings
> of successive versions of large patches. Still, Ron might be on to
> something. I had not considered message lengths in my previous
> numbers ...

Also consider that since we started using the wiki for tracking patches, a lot
of trivial emails like "your patch has been added to the queue" and "where are
we on this" have disappeared.
Alvaro Herrera wrote:
> Sam Mason wrote:
>
>> the following has links to more:
>>
>> http://markmail.org/search/?q=list:org.postgresql
>
> Wow, the spanish list is the 3rd in traffic after hackers and general!

yeah and that tom lane guy sent over 77000(!!!) mails to the lists up to
now ...

Stefan
Ron Mayer wrote:
> Joshua D. Drake wrote:
> > On Fri, 2008-11-21 at 08:18 -0800, Ron Mayer wrote:
> >> Bruce Momjian wrote:
> >>> Tom Lane wrote:
> >>>> ... harder to keep
> >>>> up with the list traffic; so something is happening that a simple
> >>>> volume count doesn't capture.
> >> If measured in "bytes of the gzipped mbox" it ...
> >
> > It's because we eliminated the -patches mailing list.
>
> That's part of it. I've added -patches to the graph at
> http://0ape.com/postgres_mailinglist_size/ as well as
> a graph of hackers+patches combined; and it still looks
> like hackers+patches is quite high in the past 3 months.
>
> With hackers+patches it looks like 2002-08 was the biggest
> month; but the past 3 months still look roughly twice
> late 2007's numbers.

Can someone graph CVS traffic, showing the historical number of commits
and number of changed lines?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
> Ron Mayer wrote:
>> Joshua D. Drake wrote:
>>> On Fri, 2008-11-21 at 08:18 -0800, Ron Mayer wrote:
>>>> Bruce Momjian wrote:
>>>>> Tom Lane wrote:
>>>>>> ... harder to keep
>>>>>> up with the list traffic; so something is happening that a simple
>>>>>> volume count doesn't capture.
>>>> If measured in "bytes of the gzipped mbox" it ...
>>> It's because we eliminated the -patches mailing list.
>> That's part of it. I've added -patches to the graph at
>> http://0ape.com/postgres_mailinglist_size/ as well as
>> a graph of hackers+patches combined; and it still looks
>> like hackers+patches is quite high in the past 3 months.
>>
>> With hackers+patches it looks like 2002-08 was the biggest
>> month; but the past 3 months still look roughly twice
>> late 2007's numbers.
>
> Can someone graph CVS traffic, showing the historical number of commits
> and number of changed lines?

Ohloh has some graphs, are they detailed enough?
http://www.ohloh.net/projects/postgres/analyses/latest

//Magnus
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Richard Huxton wrote:
>
>> Some of the EXPLAINs on the performance list are practically impossible
>> to read unless you've got the time to cut+paste and fix line-endings.
>
> Maybe we should start recommending people to post those via
> http://explain-analyze.info/

What would be really neat would be having the mailing list do something
automatically. Either fix the message inline or generate a link to something
like this.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Bruce Momjian <bruce@momjian.us> writes:
>> Tom Lane wrote:
>>> So, to a first approximation, the PG list traffic has been constant
>>> since 2000. Not the result I expected.
>
>> I also was confused by its flatness. I am finding the email traffic
>> almost impossible to continue tracking, so something different is
>> happening, but it seems it is not volume-related.
>
> Yes, my perception also is that it's getting harder and harder to keep
> up with the list traffic; so something is happening that a simple
> volume count doesn't capture.

I've noticed recently that the mailing list traffic seems very "bursty". We
have days with hundreds of messages on lots of different in-depth topics and
other days with hardly any messages at all. I wonder if it's hard to follow
because we've been picking up more simultaneous threads instead of all being
on one thread together before moving on to the next one.

Another idea, I wonder if the project has gone more international and
therefore has more traffic at odd hours of the day for everyone. It would also
mean more long-lived threads with large latencies between messages and replies.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning
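The "bursty" impression is something a monthly total cannot show, but a spread measure over daily counts can. As a rough sketch, assuming per-day message counts are available as an array of integers, the coefficient of variation (standard deviation divided by mean) separates a steady month from a bursty one with the same total. The function name and the two sample series are illustrative, not archive data.

```ruby
# daily_counts: array of messages-per-day integers for one period (assumed input).
# Returns the population standard deviation divided by the mean; 0.0 means
# perfectly even traffic, and larger values mean burstier traffic.
def coefficient_of_variation(daily_counts)
  return 0.0 if daily_counts.empty?
  mean = daily_counts.sum.to_f / daily_counts.size
  return 0.0 if mean.zero?
  variance = daily_counts.sum { |c| (c - mean)**2 } / daily_counts.size
  Math.sqrt(variance) / mean
end

# Two made-up periods with the same total of 200 messages.
steady = [50, 50, 50, 50]
bursty = [0, 0, 200, 0]

puts coefficient_of_variation(steady)
puts coefficient_of_variation(bursty)
```

Tracking this value month by month alongside the raw totals would show whether traffic has become more uneven even where the volume stayed flat.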
Magnus Hagander wrote:
> Bruce Momjian wrote:
> > Ron Mayer wrote:
> >> Joshua D. Drake wrote:
> >>> On Fri, 2008-11-21 at 08:18 -0800, Ron Mayer wrote:
> >>>> Bruce Momjian wrote:
> >>>>> Tom Lane wrote:
> >>>>>> ... harder to keep
> >>>>>> up with the list traffic; so something is happening that a simple
> >>>>>> volume count doesn't capture.
> >>>> If measured in "bytes of the gzipped mbox" it ...
> >>> It's because we eliminated the -patches mailing list.
> >> That's part of it. I've added -patches to the graph at
> >> http://0ape.com/postgres_mailinglist_size/ as well as
> >> a graph of hackers+patches combined; and it still looks
> >> like hackers+patches is quite high in the past 3 months.
> >>
> >> With hackers+patches it looks like 2002-08 was the biggest
> >> month; but the past 3 months still look roughly twice
> >> late 2007's numbers.
> >
> > Can someone graph CVS traffic, showing the historical number of commits
> > and number of changed lines?
>
> Ohloh has some graphs, are they detailed enough?
> http://www.ohloh.net/projects/postgres/analyses/latest

I saw that but that only shows total lines, not the number of lines
changed, or commits per hour, etc.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Gregory Stark wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: > >> Bruce Momjian <bruce@momjian.us> writes: >>> Tom Lane wrote: >>>> So, to a first approximation, the PG list traffic has been constant >>>> since 2000. Not the result I expected. >>> I also was confused by its flatness. I am finding the email traffic >>> almost impossible to continue tracking, so something different is >>> happening, but it seems it is not volume-related. >> Yes, my perception also is that it's getting harder and harder to keep >> up with the list traffic; so something is happening that a simple >> volume count doesn't capture. > > I've noticed recently that the mailing list traffic seems very "bursty". We > have days with hundreds of messages on lots of different in-depth topics and > other days with hardly any messages at all. I wonder if it's hard to follow > because we've been picking up more simultaneous threads instead of all being > on one thread together before moving on to the next one. > > Another idea, I wonder if the project has gone more international and > therefore has more traffic at odd hours of the day for everyone. It would also > mean more long-lived threads with large latencies between messages and replies. I wouldn't be at all surprised if that were the case. Alas, it's not possible to analyze usefully because so many companies use .com addresses instead of addresses under a ccTLD, and because so many people use webmail services like gmail that provide no geographical information in the domain. Certainly the variety of languages seen in error messages, the variation in English language skills, etc. would tend to suggest a pretty strong user base outside the US/UK/AU. -- Craig Ringer
Craig Ringer <craig@postnewspapers.com.au> writes: > Gregory Stark wrote: >> Another idea, I wonder if the project has gone more international and >> therefore has more traffic at odd hours of the day for everyone. It would also >> mean more long-lived threads with large latencies between messages and replies. > > I wouldn't be at all surprised if that were the case. Alas, it's not > possible to analyze usefully because so many companies use .com > addresses instead of addresses under a cctld, and because so many people > use webmail services like gmail that provide no geographical information > in the domain. I would be curious to see the average lifespan of threads over time. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
Craig Ringer <craig@postnewspapers.com.au> writes: > Gregory Stark wrote: >> Another idea, I wonder if the project has gone more international and >> therefore has more traffic at odd hours of the day for everyone. > I wouldn't be at all surprised if that were the case. Alas, it's not > possible to analyze usefully because so many companies use .com > addresses instead of addresses under a cctld, and because so many people > use webmail services like gmail that provide no geographical information > in the domain. You can often get a sense of where someone is by noting the timezone of the Date: header in their messages. That seems to get localized correctly even by a lot of the big services like gmail. FWIW, this project has always been pretty diversified geographically; we've had major contributors in Russia, Japan, and Australia for as far back as I can remember, not just Europe and the Americas. I think there are more people now, but I'm not convinced that the distribution has changed much. regards, tom lane
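For anyone who wants to try the Date: header idea, here is a minimal sketch in Python. It only assumes the standard library's email.utils module; the function name and the sample headers are hypothetical and not taken from the real archives. Note that the offset reflects how the sender's mail client is configured, so it is a hint about location rather than proof.

```python
from collections import Counter
from email.utils import parsedate_to_datetime


def utc_offset_hours(date_header):
    """Return the UTC offset (in hours) found in a Date: header, or None."""
    try:
        parsed = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Unparseable header (the exception type depends on the Python version).
        return None
    offset = parsed.utcoffset()
    if offset is None:
        # A "-0000" zone means "no zone information available".
        return None
    return offset.total_seconds() / 3600


# Hypothetical sample headers, for illustration only.
sample_headers = [
    "Sun, 23 Nov 2008 20:58:00 +0000",
    "Sun, 23 Nov 2008 15:12:41 -0500",
    "Mon, 24 Nov 2008 09:30:05 +0900",
    "Mon, 24 Nov 2008 01:02:03 -0500",
]

# Tally how many messages came from each UTC offset.
offsets = Counter(utc_offset_hours(h) for h in sample_headers)
for hours, count in sorted(offsets.items()):
    print(f"UTC{hours:+.1f}: {count} message(s)")
```

Run over a whole mbox, a tally like this could be compared year by year to see whether the spread of offsets has actually changed.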
Gregory Stark wrote:
> I would be curious to see the average lifespan of threads over time.

I happen to have the mail archives stored in a database, so I've expressed this in SQL and below are some results for hackers and general, 2007-2008. count is the number of distinct threads whose oldest message is in the specified month. A thread is started as soon as a message has an In-Reply-To field pointing to an existing Message-Id.

Results for pgsql-hackers:

  month  |     avg span     |   median span   | count
---------+------------------+-----------------+-------
 2007-01 | 7 days 10:00:00  | 1 day 04:18:00  |   211
 2007-02 | 7 days 10:00:00  | 1 day 00:23:48  |   186
 2007-03 | 16 days 30:00:00 | 1 day 05:45:37  |   171
 2007-04 | 13 days 26:00:00 | 19:07:00        |   142
 2007-05 | 19 days 30:00:00 | 1 day 04:46:36  |   122
 2007-06 | 15 days 19:00:00 | 23:38:13        |   111
 2007-07 | 19 days 25:00:00 | 21:04:04        |   106
 2007-08 | 13 days 30:00:00 | 20:26:39        |   133
 2007-09 | 21 days 32:00:00 | 1 day 16:43:10  |   121
 2007-10 | 13 days 19:00:00 | 17:23:24        |   148
 2007-11 | 16 days 15:00:00 | 16:23:00        |   140
 2007-12 | 17 days 16:00:00 | 1 day 07:28:05  |    81
 2008-01 | 13 days 12:00:00 | 23:02:33        |   127
 2008-02 | 9 days 11:00:00  | 12:44:28        |   130
 2008-03 | 10 days 14:00:00 | 22:57:18        |   140
 2008-04 | 10 days 14:00:00 | 1 day 00:32:34  |   132
 2008-05 | 13 days 09:00:00 | 1 day 20:57:57  |   113
 2008-06 | 7 days 27:00:00  | 1 day 05:42:46  |   102
 2008-07 | 13 days 26:00:00 | 2 days 07:43:34 |   133
 2008-08 | 9 days 33:00:00  | 1 day 07:47:09  |   121
 2008-09 | 7 days 25:00:00  | 1 day 19:00:50  |   125
 2008-10 | 6 days 14:00:00  | 1 day 10:31:01  |   178

Results for pgsql-general:

  month  |    avg span     | median span | count
---------+-----------------+-------------+-------
 2007-01 | 1 day 25:00:00  | 10:57:11    |   329
 2007-02 | 2 days 28:00:00 | 10:50:38    |   295
 2007-03 | 3 days 08:00:00 | 14:54:08    |   310
 2007-04 | 6 days 18:00:00 | 17:40:55    |   244
 2007-05 | 3 days 22:00:00 | 16:43:54    |   287
 2007-06 | 2 days 13:00:00 | 11:26:46    |   297
 2007-07 | 2 days 19:00:00 | 11:59:40    |   263
 2007-08 | 3 days 14:00:00 | 16:35:16    |   335
 2007-09 | 3 days 14:00:00 | 13:23:09    |   245
 2007-10 | 2 days 16:00:00 | 08:46:09    |   302
 2007-11 | 3 days 07:00:00 | 08:28:06    |   294
 2007-12 | 2 days 31:00:00 | 10:25:14    |   255
 2008-01 | 2 days 14:00:00 | 13:23:12    |   248
 2008-02 | 2 days 14:00:00 | 10:02:16    |   257
 2008-03 | 1 day 25:00:00  | 13:20:06    |   245
 2008-04 | 1 day 30:00:00  | 08:26:06    |   238
 2008-05 | 3 days 22:00:00 | 18:58:27    |   211
 2008-06 | 2 days 24:00:00 | 14:46:02    |   191
 2008-07 | 1 day 29:00:00  | 10:37:17    |   221
 2008-08 | 1 day 22:00:00  | 14:14:45    |   205
 2008-09 | 1 day 24:00:00  | 14:26:26    |   202
 2008-10 | 1 day 19:00:00  | 12:32:56    |   219

"median span" is the median computed with the pl/R median function applied to intervals as a number of seconds and then cast back to intervals for display. I believe the median is useful here because it mitigates the contribution of messages with wrong dates and of posters who reply to very old messages. And the median span appears to differ a lot from the average span.

If people feel like playing with the database to build other queries, feel free to bug me off-list about it. I can arrange to make a dump available or share the scripts to build it yourself from the mailbox archives.

Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
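The thread-span computation described above can be approximated outside the database. The following is a rough Python sketch, not Daniel's actual SQL: the tuple layout, the function names and the sample messages are all assumptions made for illustration. It follows the same definitions (a thread needs at least one In-Reply-To link to a known Message-Id, threads are bucketed by the month of their oldest message, and the median is taken over spans expressed in seconds).

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


def thread_spans_by_month(messages):
    """messages: list of (message_id, in_reply_to, sent_at) tuples.

    Returns {"YYYY-MM": [span, ...]}, keyed by the month of each thread's
    oldest message. Messages that never got a reply are not counted.
    """
    parent = {mid: ref for mid, ref, _ in messages}

    def root_of(mid):
        # Walk In-Reply-To links up to the first message of the thread.
        seen = set()
        while parent.get(mid) in parent and mid not in seen:
            seen.add(mid)
            mid = parent[mid]
        return mid

    bounds = {}               # root id -> (oldest date, newest date)
    sizes = defaultdict(int)  # root id -> number of messages in the thread
    for mid, _, sent_at in messages:
        root = root_of(mid)
        oldest, newest = bounds.get(root, (sent_at, sent_at))
        bounds[root] = (min(oldest, sent_at), max(newest, sent_at))
        sizes[root] += 1

    spans = defaultdict(list)
    for root, (oldest, newest) in bounds.items():
        if sizes[root] >= 2:
            spans[oldest.strftime("%Y-%m")].append(newest - oldest)
    return dict(spans)


def median_span(spans):
    # Median over seconds, converted back to an interval for display.
    return timedelta(seconds=median(s.total_seconds() for s in spans))


def mean_span(spans):
    return timedelta(seconds=sum(s.total_seconds() for s in spans) / len(spans))


# Hypothetical messages: one long-lived thread, two short ones, one lone post.
sample = [
    ("a1", None, datetime(2008, 9, 1, 10, 0)),
    ("a2", "a1", datetime(2008, 9, 1, 12, 0)),
    ("a3", "a2", datetime(2008, 10, 3, 10, 0)),
    ("b1", None, datetime(2008, 9, 5, 8, 0)),
    ("b2", "b1", datetime(2008, 9, 5, 9, 0)),
    ("c1", None, datetime(2008, 9, 10, 0, 0)),
    ("c2", "c1", datetime(2008, 9, 11, 0, 0)),
    ("d1", None, datetime(2008, 9, 12, 0, 0)),
]

spans = thread_spans_by_month(sample)
med = median_span(spans["2008-09"])
avg = mean_span(spans["2008-09"])
print("median:", med, "average:", avg)
```

Even in this toy data a single thread revived a month later pulls the average far above the median, which is the effect the tables above show.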
Tom Lane wrote: > FWIW, this project has always been pretty diversified geographically; > we've had major contributors in Russia, Japan, and Australia for as far > back as I can remember, not just Europe and the Americas. I think there > are more people now, but I'm not convinced that the distribution has > changed much. How about getting a new version of the world map showing developer's location? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On 23/11/2008 20:58, Alvaro Herrera wrote: > How about getting a new version of the world map showing developer's > location? Cool! Definitely +1 if we can show contributors to the list generally, not just developers. Ray. ------------------------------------------------------------------ Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland rod@iol.ie Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals ------------------------------------------------------------------
Daniel Verite wrote: > Gregory Stark wrote: > > > I would be curious to see the average lifespan of threads over time. > > I happen to have the mail archives stored in a database, [...] When I saw the manitou-mail.org stuff some days ago I was curious -- how feasible would it be to host our web archives using a database of some sort, instead of the current mbox-based Mhonarc installation we use, which is so full of problems and limitations? I wondered about using Oryx some time ago, and got in contact with Abhijit Menon-Sen to that end, but that never fructified. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera wrote: > When I saw the manitou-mail.org stuff some days ago I was curious -- how > feasible would it be to host our web archives using a database of some > sort, instead of the current mbox-based Mhonarc installation we use, > which is so full of problems and limitations? > > I wondered about using Oryx some time ago, and got in contact with > Abhijit Menon-Sen to that end, but that never fructified. We are a DB project after all and hosting our own archives might make for a good example, eating our own dogfood so to speak. I would imagine the biggest problem is just finding someone who wants to take this on and maintain it.
Bruce Momjian wrote: > Magnus Hagander wrote: >> Bruce Momjian wrote: >>> Ron Mayer wrote: >>>> Joshua D. Drake wrote: >>>>> On Fri, 2008-11-21 at 08:18 -0800, Ron Mayer wrote: >>>>>> Bruce Momjian wrote: >>>>>>> Tom Lane wrote: >>>>>>>> ... harder to keep >>>>>>>> up with the list traffic; so something is happening that a simple >>>>>>>> volume count doesn't capture. >>>>>> If measured in "bytes of the gzipped mbox" it ... >>>>> Its because we eliminated the -patches mailing list. >>>> That's part of it. I've added -patches to the graph at >>>> http://0ape.com/postgres_mailinglist_size/ as well as >>>> a graph of hackers+patches combined; and it still looks >>>> like hackers+patches is quite high in the past 3 months. >>>> >>>> With hackers+patches it looks like 2002-08 was the biggest >>>> month; but the past 3 months still look roughly twice >>>> late 2007's numbers. >>> Can someoone graph CVS traffic, showing the historical number of commits >>> and number of changed lines? >> Ohloh has some graphs, are they detailed enough? >> http://www.ohloh.net/projects/postgres/analyses/latest > > I saw that but that only shows total lines, not the number of lines > changed, or commits per hour, etc. I've got a database of all our commits with info like: timestamp, author, number of rows added/deleted, number of files modified, which files modified, rows modified in each file. Basically it's data quickly parsed from a "git log --stat" of HEAD (because it was a whole lot easier to parse the git stuff). It's got about 27,500 commits in it - only the stuff that happened on HEAD, nothing for backbranches. So, if you can be a bit more specific in what you want :) Attached is for example "commits per month" and "lines per month". //Magnus
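As a rough illustration of the kind of parsing described above, here is a small Python sketch that turns "git log --stat" text into commits per month and changed lines per month. It is not Magnus's actual script: the function name and the sample log are hypothetical, and it assumes git's default date format and the usual "N files changed, ..." summary line.

```python
import re
from collections import Counter
from datetime import datetime

# Header lines start at column 0; commit message text is indented by git.
DATE_RE = re.compile(r"^Date:\s+(.+)$")
# Summary line, e.g. " 2 files changed, 9 insertions(+), 4 deletions(-)".
STAT_RE = re.compile(
    r"^ \d+ files? changed"
    r"(?:, (\d+) insertions?\(\+\))?"
    r"(?:, (\d+) deletions?\(-\))?"
)


def per_month(log_text):
    """Return (commits per month, changed lines per month) as Counters."""
    commits = Counter()
    changed = Counter()
    month = None
    for line in log_text.splitlines():
        date_match = DATE_RE.match(line)
        if date_match:
            when = datetime.strptime(
                date_match.group(1).strip(), "%a %b %d %H:%M:%S %Y %z"
            )
            month = when.strftime("%Y-%m")
            commits[month] += 1
            continue
        stat_match = STAT_RE.match(line)
        if stat_match and month is not None:
            added = int(stat_match.group(1) or 0)
            deleted = int(stat_match.group(2) or 0)
            changed[month] += added + deleted
    return commits, changed


# Hypothetical log excerpt, for illustration only.
SAMPLE_LOG = """\
commit 1111111111111111111111111111111111111111
Author: Example Dev <dev@example.org>
Date:   Tue Dec 2 15:47:00 2008 -0500

    Hypothetical commit touching code and docs.

 doc/src/sgml/example.sgml | 4 ++--
 src/backend/example.c     | 9 +++++++--
 2 files changed, 9 insertions(+), 4 deletions(-)

commit 2222222222222222222222222222222222222222
Author: Example Dev <dev@example.org>
Date:   Sun Nov 23 20:58:00 2008 +0000

    Another hypothetical commit.

 src/backend/other.c | 3 +++
 1 file changed, 3 insertions(+)
"""

commits, changed = per_month(SAMPLE_LOG)
for month in sorted(commits):
    print(month, commits[month], "commit(s),", changed[month], "line(s) changed")
```

The per-file lines between the message and the summary could be used the same way to split doc/ from src/ changes, at the cost of a slightly more involved pattern.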
Attachment
Magnus Hagander wrote: > Bruce Momjian wrote: >> Magnus Hagander wrote: >>> Bruce Momjian wrote: >>>> Ron Mayer wrote: >>>>> Joshua D. Drake wrote: >>>>>> On Fri, 2008-11-21 at 08:18 -0800, Ron Mayer wrote: >>>>>>> Bruce Momjian wrote: >>>>>>>> Tom Lane wrote: >>>>>>>>> ... harder to keep >>>>>>>>> up with the list traffic; so something is happening that a simple >>>>>>>>> volume count doesn't capture. >>>>>>> If measured in "bytes of the gzipped mbox" it ... >>>>>> Its because we eliminated the -patches mailing list. >>>>> That's part of it. I've added -patches to the graph at >>>>> http://0ape.com/postgres_mailinglist_size/ as well as >>>>> a graph of hackers+patches combined; and it still looks >>>>> like hackers+patches is quite high in the past 3 months. >>>>> >>>>> With hackers+patches it looks like 2002-08 was the biggest >>>>> month; but the past 3 months still look roughly twice >>>>> late 2007's numbers. >>>> Can someoone graph CVS traffic, showing the historical number of commits >>>> and number of changed lines? >>> Ohloh has some graphs, are they detailed enough? >>> http://www.ohloh.net/projects/postgres/analyses/latest >> I saw that but that only shows total lines, not the number of lines >> changed, or commits per hour, etc. > > I've got a database of all our commits with info like: timestamp, > author, number of rows added/deleted, number of files modified, which > files modified, rows modified in each file. Basically it's data quickly > parsed from a "git log --stat" of HEAD (because it was a whole lot > easier to parse the git stuff). It's got about 27,500 commits in it - > only the stuff that happened on HEAD, nothing for backbranches. > > So, if you can be a bit more specific in what you want :) Attached is > for example "commits per month" and "lines per month". 
Here's another one that crudely shows the amount of code vs docs commits (just looking at docs/* vs src/* - clearly very crude) Sent as a separate mail since -general won't accept large mails. //Magnus
Attachment
Alvaro Herrera wrote: > When I saw the manitou-mail.org stuff some days ago I was curious > -- how feasible would it be to host our web archives using a > database of some sort, instead of the current mbox-based Mhonarc > installation we use, which is so full of problems and limitations? One problem I've noticed on archives.postgresql.org is that threads don't cross month boundaries. For example, if I'm looking at: http://archives.postgresql.org/pgsql-general/2008-09/msg01003.php , according to the webpage, this message has neither references nor follow-ups. But actually it's a reply to this one: http://archives.postgresql.org/pgsql-general/2008-05/msg00404.php and it has this follow-up: http://archives.postgresql.org/pgsql-general/2008-10/msg00466.php In fact it looks like all threads are cut at the end of each month, and that everything is partitioned by month anyway. I guess it's because mhonarc operates only on the current month by design, which makes sense if its storage doesn't scale. What manitou-mail could provide here is the database structure and the scripts that feed the live archive, and it wouldn't have these limitations of mhonarc. As a bonus, it opens up the data to SQL interfaces, so you can think of querying messages using complex criteria, or producing statistics, reports... But it doesn't provide the generation of webpages, which is after all the whole point of this web archive. I assume that the idea is to generate everything in static pages like mhonarc seems to do rather than live-querying the database. Anyway, that HTML generation part would need to be recreated or changed to deal with a different "data source" and a different partitioning of data, if it's modular enough that such a thing is possible. How hard would that be? Personally I have no idea; is anyone familiar with that code? Best regards, -- Daniel PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
On Sun, Nov 23, 2008 at 11:31 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Daniel Verite wrote: >> Gregory Stark wrote: >> >> > I would be curious to see the average lifespan of threads over time. >> >> I happen to have the mail archives stored in a database, [...] > > When I saw the manitou-mail.org stuff some days ago I was curious -- how > feasible would it be to host our web archives using a database of some > sort, instead of the current mbox-based Mhonarc installation we use, > which is so full of problems and limitations? Didn't I send you a copy of the prototype code I'd written to do that? The biggest issue for third party code is that we need to preserve our existing URLs. /D -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com
Magnus Hagander wrote: > > I saw that but that only shows total lines, not the number of lines > > changed, or commits per hour, etc. > > I've got a database of all our commits with info like: timestamp, > author, number of rows added/deleted, number of files modified, which > files modified, rows modified in each file. Basically it's data quickly > parsed from a "git log --stat" of HEAD (because it was a whole lot > easier to parse the git stuff). It's got about 27,500 commits in it - > only the stuff that happened on HEAD, nothing for backbranches. > > So, if you can be a bit more specific in what you want :) Attached is > for example "commits per month" and "lines per month". Yea, this is the graph I was looking for; unfortunately it does not shed any light on why things seem busier; 'old age' is starting to look plausible. ;-) -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Tue, 2008-12-02 at 15:47 -0500, Bruce Momjian wrote: > Magnus Hagander wrote: > > > I saw that but that only shows total lines, not the number of lines > > > changed, or commits per hour, etc. > > > > I've got a database of all our commits with info like: timestamp, > > author, number of rows added/deleted, number of files modified, which > > files modified, rows modified in each file. Basically it's data quickly > > parsed from a "git log --stat" of HEAD (because it was a whole lot > > easier to parse the git stuff). It's got about 27,500 commits in it - > > only the stuff that happened on HEAD, nothing for backbranches. > > > > So, if you can be a bit more specific in what you want :) Attached is > > for example "commits per month" and "lines per month". > > Yea, this is the graph I was looking for; unfortunately it does not > shed any insight on why things seems busier; 'old age' is starting to > look plausible. ;-) It could also be that a lot of work is happening off channel. I know that many contributors are having the first 50 replies of the email on jabber, irc or directly and then posting to various lists at any given point. Joshua D. Drake -- PostgreSQL Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997