Thread: Postgres mail list traffic over time

Postgres mail list traffic over time

From
Tom Lane
Date:
I got interested by Bruce's plot of PG email traffic here
http://momjian.us/main/img/pgincoming.gif
and decided to try to extend it into the past.  The data I have
available is just my own incoming mail log, but being a pack-rat by
nature I have that back to April 1998.  Attached is a graph of Postgres
list messages per month since then.  I should note that this covers only
the mail lists I'm subscribed to, which has been most of them since
about 1999; but the first few numbers in this chart are undercounts by
comparison.  Also, the very last dot is month-to-date for November and
so is an underestimate.

So, to a first approximation, the PG list traffic has been constant
since 2000.  Not the result I expected.

            regards, tom lane


Attachment

Re: Postgres mail list traffic over time

From
Bruce Momjian
Date:
Tom Lane wrote:
> I got interested by Bruce's plot of PG email traffic here
> http://momjian.us/main/img/pgincoming.gif
> and decided to try to extend it into the past.  The data I have
> available is just my own incoming mail log, but being a pack-rat by
> nature I have that back to April 1998.  Attached is a graph of Postgres
> list messages per month since then.  I should note that this covers only
> the mail lists I'm subscribed to, which has been most of them since
> about 1999; but the first few numbers in this chart are undercounts by
> comparison.  Also, the very last dot is month-to-date for November and
> so is an underestimate.
>
> So, to a first approximation, the PG list traffic has been constant
> since 2000.  Not the result I expected.

Yes, I know Magnus did a graph for the PG-EU conference and it was also
flat;  perhaps he can post it here.  His chart was pulled from the
Postgres archives, so it is even more accurate than our graphs.

I also was confused by its flatness.  I am finding the email traffic
almost impossible to continue tracking, so something different is
happening, but it seems it is not volume-related.  I am going to post
another blog tomorrow with more thoughts.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Postgres mail list traffic over time

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> So, to a first approximation, the PG list traffic has been constant
>> since 2000.  Not the result I expected.

> I also was confused by its flatness.  I am finding the email traffic
> almost impossible to continue tracking, so something different is
> happening, but it seems it is not volume-related.

Yes, my perception also is that it's getting harder and harder to keep
up with the list traffic; so something is happening that a simple
volume count doesn't capture.

Does anyone have the data to break it down per mailing list?  That might
yield some more insight.

            regards, tom lane

Re: Postgres mail list traffic over time

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> So, to a first approximation, the PG list traffic has been constant
> >> since 2000.  Not the result I expected.
>
> > I also was confused by its flatness.  I am finding the email traffic
> > almost impossible to continue tracking, so something different is
> > happening, but it seems it is not volume-related.
>
> Yes, my perception also is that it's getting harder and harder to keep
> up with the list traffic; so something is happening that a simple
> volume count doesn't capture.

Agreed.  I am struggling to put into words some of my angst, but I am
concerned I will not be able to offer the same guarantees I have done in
previous releases that every bug has been either fixed or added to the
TODO list, and every submitted patch has been either applied or rejected.

There, I said it.  :-(

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Postgres mail list traffic over time

From
"Joshua D. Drake"
Date:
On Thu, 2008-11-20 at 22:36 -0500, Tom Lane wrote:
> I got interested by Bruce's plot of PG email traffic here
> http://momjian.us/main/img/pgincoming.gif
> and decided to try to extend it into the past.  The data I have
> available is just my own incoming mail log, but being a pack-rat by
> nature I have that back to April 1998.  Attached is a graph of Postgres
> list messages per month since then.  I should note that this covers only
> the mail lists I'm subscribed to, which has been most of them since
> about 1999; but the first few numbers in this chart are undercounts by
> comparison.  Also, the very last dot is month-to-date for November and
> so is an underestimate.
>
> So, to a first approximation, the PG list traffic has been constant
> since 2000.  Not the result I expected.

Am I reading your graph wrong? I show a sharp increase right before 2006
and then a small drop off but a constant after that?

I know that my email (I am pretty sure I am subscribed to at least as
many lists as you) has been on a steady incline, especially through
-general and -hackers.

Joshua D. Drake


>
>             regards, tom lane
>
--


Re: Postgres mail list traffic over time

From
Tom Lane
Date:
"Joshua D. Drake" <jd@commandprompt.com> writes:
> I know that my email (I am pretty sure I am subscribed to at least as
> many lists as you) has been on a steady incline, especially through
> -general and -hackers.

I would have said the same, which is why I find it noteworthy that
my mail logs don't seem to support that impression.  Have you got
actual log data on the point?

            regards, tom lane

Re: Postgres mail list traffic over time

From
brian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
 >>
>> I am finding the email traffic
>> almost impossible to continue tracking, so something different is
>> happening, but it seems it is not volume-related.
>
> Yes, my perception also is that it's getting harder and harder to keep
> up with the list traffic; so something is happening that a simple
> volume count doesn't capture.

Perhaps it's just subjective: we're all getting older.

Soon, these pesky whippersnappers will want to twitter their PG
questions to this list over YouTube.

Re: Postgres mail list traffic over time

From
"Joshua D. Drake"
Date:
On Thu, 2008-11-20 at 23:46 -0500, Tom Lane wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
> > I know that my email (I am pretty sure I am subscribed to at least as
> > many lists as you) has been on a steady incline, especially through
> > -general and -hackers.
>
> I would have said the same, which is why I find it noteworthy that
> my mail logs don't seem to support that impression.  Have you got
> actual log data on the point?

I purge my postgresql logs except for some specific ones (like PGFG).
However, I have the entire archives.postgresql.org.

pgsql-hackers (since inception, 1997), first line date, second line
number of messages.

1997-01
939
1997-02
300
1997-03
534
1997-04
865
1997-05
484
1997-06
601
1997-07
392
1997-08
399
1997-09
579
1997-10
594
1997-11
381
1997-12
351
1998-01
870
1998-02
1326
1998-03
1121
1998-04
707
1998-05
632
1998-06
493
1998-07
490
1998-08
867
1998-09
675
1998-10
1221
1998-11
609
1998-12
600
1999-01
769
1999-02
699
1999-03
1008
1999-04
217
1999-05
1155
1999-06
1241
1999-07
1052
1999-08
705
1999-09
945
1999-10
962
1999-11
929
1999-12
1065
2000-01
1688
2000-02
1460
2000-03
288
2000-04
187
2000-05
1686
2000-06
1283
2000-07
1477
2000-08
890
2000-09
642
2000-10
1320
2000-11
1419
2000-12
1234
2001-01
1469
2001-02
1178
2001-03
1708
2001-04
1181
2001-05
1478
2001-06
1151
2001-07
955
2001-08
1220
2001-09
921
2001-10
1165
2001-11
1318
2001-12
970
2002-01
1411
2002-02
1233
2002-03
1246
2002-04
1565
2002-05
1169
2002-06
1045
2002-07
1339
2002-08
2308
2002-09
1843
2002-10
1469
2002-11
1257
2002-12
1172
2003-01
1356
2003-02
1324
2003-03
1262
2003-04
1033
2003-05
812
2003-06
1316
2003-07
1068
2003-08
1373
2003-09
1695
2003-10
1631
2003-11
1643
2003-12
836
2004-01
878
2004-02
1017
2004-03
1352
2004-04
1177
2004-05
1495
2004-06
1025
2004-07
1430
2004-08
1620
2004-09
953
2004-10
1084
2004-11
1226
2004-12
963
2005-01
1116
2005-02
987
2005-03
1086
2005-04
1022
2005-05
1626
2005-06
1598
2005-07
1162
2005-08
1217
2005-09
1484
2005-10
1442
2005-11
1587
2005-12
1278
2006-01
1050
2006-02
1282
2006-03
1343
2006-04
1158
2006-05
1386
2006-06
1645
2006-07
1660
2006-08
2060
2006-09
2397
2006-10
1583
2006-11
1031
2006-12
1437
2007-01
1663
2007-02
1953
2007-03
1871
2007-04
1285
2007-05
1201
2007-06
1140
2007-07
1019
2007-08
1244
2007-09
1230
2007-10
1575
2007-11
1380
2007-12
1000
2008-01
1236
2008-02
1324
2008-03
1308
2008-04
1928
2008-05
1128
2008-06
1161
2008-07
1512
2008-08
1391
2008-09
1910
2008-10
1715
2008-11
1431
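
The alternating "month line, count line" layout above is easy to summarize mechanically. The following is only a minimal sketch, not something posted in the thread: the helper names `parse_counts` and `yearly_average` are hypothetical, and the sample is a four-month excerpt of the pgsql-hackers figures above, so the averages it prints cover two months per year rather than full years.

```ruby
# Parse the "month line, count line" layout into [month, count] pairs,
# then average the monthly counts for each calendar year.
# SAMPLE is a short excerpt of the pgsql-hackers figures listed above.
SAMPLE = <<~DATA
  2007-11
  1380
  2007-12
  1000
  2008-01
  1236
  2008-02
  1324
DATA

def parse_counts(text)
  # Drop blank lines, then take the remaining lines two at a time.
  lines = text.lines.map(&:strip).reject(&:empty?)
  lines.each_slice(2).map { |month, count| [month, Integer(count)] }
end

def yearly_average(pairs)
  # Group by the four-digit year prefix and average each group's counts.
  pairs.group_by { |month, _| month[0, 4] }
       .map { |year, rows| [year, rows.sum { |_, c| c }.to_f / rows.size] }
       .to_h
end

pairs = parse_counts(SAMPLE)
yearly_average(pairs).each { |year, avg| puts format("%s %.1f", year, avg) }
```

Feeding the full list through the same two helpers would give one average per year, which is a less noisy way to judge whether the traffic is flat or rising than eyeballing individual months.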



>
>             regards, tom lane
>
--


Re: Postgres mail list traffic over time

From
"Joshua D. Drake"
Date:
On Thu, 2008-11-20 at 21:19 -0800, Joshua D. Drake wrote:
> On Thu, 2008-11-20 at 23:46 -0500, Tom Lane wrote:
> > "Joshua D. Drake" <jd@commandprompt.com> writes:
> > > I know that my email (I am pretty sure I am subscribed to at least as
> > > many lists as you) has been on a steady incline, especially through
> > > -general and -hackers.
> >
> > I would have said the same, which is why I find it noteworthy that
> > my mail logs don't seem to support that impression.  Have you got
> > actual log data on the point?
>
> I purge my postgresql logs except for some specific ones (like PGFG).
> however, I have the entire archives.postgresql.org.
>
> pgsql-hackers (since inception, 1997), first line date, second line
> number of messages.
>

pgsql-general

1998-05
139
1998-06
337
1998-07
438
1998-08
226
1998-09
187
1998-10
283
1998-11
269
1998-12
242
1999-01
302
1999-02
356
1999-03
385
1999-04
332
1999-05
404
1999-06
470
1999-07
411
1999-08
496
1999-09
385
1999-10
606
1999-11
512
1999-12
631
2000-01
667
2000-02
477
2000-03
219
2000-04
705
2000-05
843
2000-06
803
2000-07
1180
2000-08
861
2000-09
999
2000-10
1337
2000-11
1084
2000-12
1002
2001-01
1700
2001-02
1623
2001-03
1656
2001-04
1568
2001-05
1710
2001-06
1651
2001-07
1342
2001-08
1303
2001-09
1195
2001-10
1223
2001-11
1124
2001-12
901
2002-01
1216
2002-02
1419
2002-03
1388
2002-04
1287
2002-05
1192
2002-06
1366
2002-07
1893
2002-08
1261
2002-09
1438
2002-10
1444
2002-11
1517
2002-12
1225
2003-01
1657
2003-02
1760
2003-03
1597
2003-04
1611
2003-05
1295
2003-06
1951
2003-07
1586
2003-08
1836
2003-09
1880
2003-10
1604
2003-11
1768
2003-12
1664
2004-01
1708
2004-02
1355
2004-03
1215
2004-04
1210
2004-05
965
2004-06
1236
2004-07
973
2004-08
1677
2004-09
1337
2004-10
1579
2004-11
1557
2004-12
1358
2005-01
1877
2005-02
1535
2005-03
1622
2005-04
1460
2005-05
1379
2005-06
1413
2005-07
1332
2005-08
1632
2005-09
1232
2005-10
1945
2005-11
1438
2005-12
1402
2006-01
1743
2006-02
1218
2006-03
1602
2006-04
1372
2006-05
1604
2006-06
1268
2006-07
1170
2006-08
1501
2006-09
1289
2006-10
1588
2006-11
1866
2006-12
1619
2007-01
1953
2007-02
1720
2007-03
1724
2007-04
1304
2007-05
1650
2007-06
1796
2007-07
1257
2007-08
2097
2007-09
1385
2007-10
1722
2007-11
1770
2007-12
1487
2008-01
1621
2008-02
1527
2008-03
1666
2008-04
1446
2008-05
1144
2008-06
1055
2008-07
1251
2008-08
1188
2008-09
1252
2008-10
1485
2008-11
1045
--


Re: Postgres mail list traffic over time

From
"Joshua D. Drake"
Date:
On Fri, 2008-11-21 at 00:06 -0500, brian wrote:
> Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
>  >>
> >> I am finding the email traffic
> >> almost impossible to continue tracking, so something different is
> >> happening, but it seems it is not volume-related.
> >
> > Yes, my perception also is that it's getting harder and harder to keep
> > up with the list traffic; so something is happening that a simple
> > volume count doesn't capture.
>
> Perhaps it's just subjective: we're all getting older.

ouch


> Soon, these pesky whippersnappers will want to twitter their PG
> questions to this list over YouTube.
>

I assume you don't realize that is already happening :P

Joshua D. Drake


--


Re: Postgres mail list traffic over time

From
"Gregory Williamson"
Date:

Tom Lane wrote:

> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> So, to a first approximation, the PG list traffic has been constant
> >> since 2000.  Not the result I expected.
>
> > I also was confused by its flatness.  I am finding the email traffic
> > almost impossible to continue tracking, so something different is
> > happening, but it seems it is not volume-related.
>
> Yes, my perception also is that it's getting harder and harder to keep
> up with the list traffic; so something is happening that a simple
> volume count doesn't capture.

The numbers posted show a slow but steady increase, but I am wondering if there are more distinct subjects?

Can we get a count of distinct threads per month (obviously with some slop, as some threads last for a while)?
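
One rough way to approximate such a count is to normalize subject lines and count the distinct ones per month. This is a sketch under stated assumptions, not a method anyone in the thread used: the `[month, subject]` input shape, the helper names, and the sample rows are all made up for illustration, and subject matching is only a heuristic (proper threading would follow the In-Reply-To and References headers).

```ruby
# Count distinct threads per month by normalizing subject lines.
# Assumption: each message is a [month, subject] pair already extracted from
# an archive; the sample rows below are invented for illustration.
MESSAGES = [
  ["2008-11", "Postgres mail list traffic over time"],
  ["2008-11", "Re: Postgres mail list traffic over time"],
  ["2008-11", "RE: [GENERAL] Postgres mail list traffic over time"],
  ["2008-11", "Slow query after upgrade"],
  ["2008-12", "Re: Slow query after upgrade"],
]

def normalize_subject(subject)
  s = subject.strip
  # Repeatedly strip leading "Re:" markers and "[list]" tags.
  loop do
    stripped = s.sub(/\A(?:re:\s*|\[[^\]]*\]\s*)/i, "")
    break if stripped == s
    s = stripped
  end
  s.downcase
end

def threads_per_month(messages)
  messages.group_by { |month, _| month }
          .map { |month, rows| [month, rows.map { |_, subj| normalize_subject(subj) }.uniq.size] }
          .to_h
end

threads_per_month(MESSAGES).each { |month, n| puts "#{month} #{n}" }
```

A thread that runs across a month boundary is counted once in each month it touches, which is the "slop" mentioned above.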

Greg Williamson
Senior DBA
DigitalGlobe

Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information and must be protected in accordance with those provisions. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.

(My corporate masters made me say this.)

Re: Postgres mail list traffic over time

From
Magnus Hagander
Date:
Magnus Hagander wrote:
> Tom Lane wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>>> Tom Lane wrote:
>>>> So, to a first approximation, the PG list traffic has been constant
>>>> since 2000.  Not the result I expected.
>>> I also was confused by its flatness.  I am finding the email traffic
>>> almost impossible to continue tracking, so something different is
>>> happening, but it seems it is not volume-related.
>> Yes, my perception also is that it's getting harder and harder to keep
>> up with the list traffic; so something is happening that a simple
>> volume count doesn't capture.
>>
>> Does anyone have the data to break it down per mailing list?  That might
>> yield some more insight.
>
> Here's a graph of the more popular mailinglists (I couldn't include all
> - the graph was completely unreadable) as seen in the archives search db.

Pfft, -general didn't like that file even though it was only 60k or so.

Here's a link to an uploaded version:
http://www.smugmug.com/photos/421507651_8pe6C-O.png

//Magnus

Re: Postgres mail list traffic over time

From
Alvaro Herrera
Date:
Tom Lane wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
> > I know that my email (I am pretty sure I am subscribed to at least as
> > many lists as you) has been on a steady incline, especially through
> > -general and -hackers.
>
> I would have said the same, which is why I find it noteworthy that
> my mail logs don't seem to support that impression.  Have you got
> actual log data on the point?

Markmail shows some graphs.  The one on the "main page" gives the
traffic for all the lists:
http://pgsql.markmail.org/

If you search for "pgsql-general" you get a graph for that list:
http://pgsql.markmail.org/search/?q=list%3Aorg.postgresql.pgsql-general

Same for -hackers:
http://pgsql.markmail.org/search/?q=list%3Aorg.postgresql.pgsql-hackers

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Postgres mail list traffic over time

From
Sam Mason
Date:
On Thu, Nov 20, 2008 at 10:59:31PM -0500, Tom Lane wrote:
> Yes, my perception also is that it's getting harder and harder to keep
> up with the list traffic; so something is happening that a simple
> volume count doesn't capture.
>
> Does anyone have the data to break it down per mailing list?  That might
> yield some more insight.

The markmail archives generate pretty graphs and they seem to have a
good coverage from quite a few of the lists. e.g.:

  http://markmail.org/search/?q=list:org.postgresql.pgsql-general
  http://markmail.org/search/?q=list:org.postgresql.pgsql-hackers

the following has links to more:

  http://markmail.org/search/?q=list:org.postgresql

be interesting to see how their servers take the hammering!


  Sam

Re: Postgres mail list traffic over time

From
"Joshua D. Drake"
Date:
On Fri, 2008-11-21 at 10:43 -0300, Alvaro Herrera wrote:
> Tom Lane wrote:

> Markmail shows some graphs.  The one on the "main page" gives the
> traffic for all the lists:
> http://pgsql.markmail.org/
>
> If you search for "pgsql-general" you get a graph for that list:
> http://pgsql.markmail.org/search/?q=list%3Aorg.postgresql.pgsql-general
>
> Same for -hackers:
> http://pgsql.markmail.org/search/?q=list%3Aorg.postgresql.pgsql-hackers
>

The top "Who sent it" list is very telling. It says, "Paging Tom Lane...
take a vacation!" :)



Joshua D. Drake



--


Re: Postgres mail list traffic over time

From
Alvaro Herrera
Date:
Sam Mason wrote:

> the following has links to more:
>
>   http://markmail.org/search/?q=list:org.postgresql

Wow, the Spanish list is the 3rd in traffic after hackers and general!

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Postgres mail list traffic over time

From
Ron Mayer
Date:
Bruce Momjian wrote:
> Tom Lane wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>>> I also was confused by its flatness.  I am finding the email traffic
>>> almost impossible to continue tracking, so something different is
>>> happening, but it seems it is not volume-related.
>> Yes, my perception also is that it's getting harder and harder to keep
>> up with the list traffic; so something is happening that a simple
>> volume count doesn't capture.

If measured in "bytes of the gzipped mbox" it looks like there's a
*huge* increase of volume on Hackers in the past 3 months - well
over twice the historical levels; and maybe 4X 2002-2006.

Graphs of this metric can be seen here:

   http://0ape.com/postgres_mailinglist_size/

In some ways I think compressed mbox sizes are a fairer way
of measuring the bandwidth for these lists since it (correctly)
counts a large gzipped patch as requiring more mental effort than
people top-posting brief messages on top of old threads.


(Data from commands like
HEAD http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.2008-09.gz | grep Content-Length
)
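
The same measurement can be sketched without shelling out to the HEAD command, using Ruby's standard net/http library. The helper names `remote_size` and `mbox_url` are hypothetical, the URL pattern is simply the one quoted above (it may no longer resolve), and the sketch does not follow redirects.

```ruby
require "net/http"
require "uri"

# Return the Content-Length reported by a HEAD request, or nil if absent.
def remote_size(url)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(uri.request_uri)
  end
  length = response["content-length"]
  length && Integer(length)
end

# Build the monthly mbox URL in the pattern quoted in this message.
def mbox_url(list, month)
  "http://archives.postgresql.org/#{list}/mbox/#{list}.#{month}.gz"
end

# Example (performs a network request, so it is left commented out):
# puts remote_size(mbox_url("pgsql-hackers", "2008-09"))
```

Because only headers are requested, the compressed size of each month can be collected without downloading the archives themselves.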

Re: Postgres mail list traffic over time

From
Adrian Klaver
Date:
On Thursday 20 November 2008 7:59:31 pm Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> So, to a first approximation, the PG list traffic has been constant
> >> since 2000.  Not the result I expected.
> >
> > I also was confused by its flatness.  I am finding the email traffic
> > almost impossible to continue tracking, so something different is
> > happening, but it seems it is not volume-related.
>
> Yes, my perception also is that it's getting harder and harder to keep
> up with the list traffic; so something is happening that a simple
> volume count doesn't capture.

I am still relatively new to Postgres, but my impression is that the questions
have gotten harder/more in depth. Fewer "How do you pronounce Postgres?" and
more "Explain the various isolation levels for transactions and how does that
affect my particular situation?"

>
> Does anyone have the data to break it down per mailing list?  That might
> yield some more insight.
>
>             regards, tom lane



--
Adrian Klaver
aklaver@comcast.net

Re: Postgres mail list traffic over time

From
"Joshua D. Drake"
Date:
On Fri, 2008-11-21 at 08:18 -0800, Ron Mayer wrote:
> Bruce Momjian wrote:
> > Tom Lane wrote:
> >> Bruce Momjian <bruce@momjian.us> writes:
> >>> I also was confused by its flatness.  I am finding the email traffic
> >>> almost impossible to continue tracking, so something different is
> >>> happening, but it seems it is not volume-related.
> >> Yes, my perception also is that it's getting harder and harder to keep
> >> up with the list traffic; so something is happening that a simple
> >> volume count doesn't capture.
>
> If measured in "bytes of the gzipped mbox" it looks like there's a
> *huge* increase of volume on Hackers in the past 3 months - well
> over twice the historical levels; and maybe 4X 2002-2006.

It's because we eliminated the -patches mailing list.

Joshua D. Drake

--


Re: Postgres mail list traffic over time

From
Tom Lane
Date:
"Joshua D. Drake" <jd@commandprompt.com> writes:
> On Fri, 2008-11-21 at 08:18 -0800, Ron Mayer wrote:
>> If measured in "bytes of the gzipped mbox" it looks like there's a
>> *huge* increase of volume on Hackers in the past 3 months - well
>> over twice the historical levels; and maybe 4X 2002-2006.

> It's because we eliminated the -patches mailing list.

Yeah, I think this is most probably explained by repeat postings
of successive versions of large patches.  Still, Ron might be on to
something.  I had not considered message lengths in my previous
numbers ...

            regards, tom lane

Re: Postgres mail list traffic over time

From
Ron Mayer
Date:
Joshua D. Drake wrote:
> On Fri, 2008-11-21 at 08:18 -0800, Ron Mayer wrote:
>> Bruce Momjian wrote:
>>> Tom Lane wrote:
>>>> ... harder to keep
>>>> up with the list traffic; so something is happening that a simple
>>>> volume count doesn't capture.
>> If measured in "bytes of the gzipped mbox" it ...
>
> It's because we eliminated the -patches mailing list.

That's part of it.  I've added -patches to the graph at
http://0ape.com/postgres_mailinglist_size/ as well as
a graph of hackers+patches combined; and it still looks
like hackers+patches is quite high in the past 3 months.

With hackers+patches it looks like 2002-08 was the biggest
month; but the past 3 months still look roughly twice
late 2007's numbers.

Re: Postgres mail list traffic over time

From
Tom Lane
Date:
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
> Joshua D. Drake wrote:
>> It's because we eliminated the -patches mailing list.

> That's part of it.  I've added -patches to the graph at
> http://0ape.com/postgres_mailinglist_size/ as well as
> a graph of hackers+patches combined; and it still looks
> like hackers+patches is quite high in the past 3 months.

One of the reasons we got rid of -patches was the frequency of
cross-posting to both -hackers and -patches.  Are you double-counting
cross-posted messages?

            regards, tom lane

Re: Postgres mail list traffic over time

From
Richard Huxton
Date:
Adrian Klaver wrote:
>> Yes, my perception also is that it's getting harder and harder to keep
>> up with the list traffic; so something is happening that a simple
>> volume count doesn't capture.
>
> I am still relatively new to Postgres, but my impression is that the questions
> have gotten harder/more in depth. Fewer, How do you pronounce Postgres? and
> more, Explain the various isolation levels for transactions and how does that
> affect my particular situation?

This is definitely the case. Whether it's because the documentation is
better or we're getting a more sophisticated user, the questions are
certainly more involved.

Some of the EXPLAINs on the performance list are practically impossible
to read unless you've got the time to cut+paste and fix line-endings.

--
  Richard Huxton
  Archonet Ltd

Re: Postgres mail list traffic over time

From
Alvaro Herrera
Date:
Richard Huxton wrote:

> Some of the EXPLAINs on the performance list are practically impossible
> to read unless you've got the time to cut+paste and fix line-endings.

Maybe we should start recommending that people post those via
http://explain-analyze.info/

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Postgres mail list traffic over time

From
"Matthew T. O'Connor"
Date:
Tom Lane wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
>
>> It's because we eliminated the -patches mailing list.
>>
>
> Yeah, I think this is most probably explained by repeat postings
> of successive versions of large patches.  Still, Ron might be on to
> something.  I had not considered message lengths in my previous
> numbers ...

Message size matters, but so does the sophistication of new patches, which
makes them harder to keep up with, and the fact that more people are writing
code, which takes time to review.  These probably explain why you
and Bruce feel the increase more than the rest of us.


Re: Postgres mail list traffic over time

From
Ron Mayer
Date:
Tom Lane wrote:
> Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
>> Joshua D. Drake wrote:
>>> It's because we eliminated the -patches mailing list.
>
>> That's part of it.  I've added -patches to the graph at
>> http://0ape.com/postgres_mailinglist_size/ as well as
>> a graph of hackers+patches combined; and it still looks
>> like hackers+patches is quite high in the past 3 months.
>
> One of the reasons we got rid of -patches was the frequency of
> cross-posting to both -hackers and -patches.  Are you double-counting
> cross-posted messages?

For the combined graph I just summed the output of:
  HEAD http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.2008-09.gz | grep Content-Length
  HEAD http://archives.postgresql.org/pgsql-patches/mbox/pgsql-patches.2008-09.gz | grep Content-Length

I didn't look to see if the downloadable mboxes had duplicate messages.
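
Summing compressed sizes cannot detect cross-posts; that would require downloading the mboxes and comparing individual messages. As a minimal sketch of the idea, assuming each message has already been parsed into a hash with its Message-ID and size (the data shape, the `combined_bytes` helper, and the sample values are invented for illustration):

```ruby
# Sum message sizes across two lists, counting a cross-posted message once.
# Assumption: each message is a hash with :id (its Message-ID header) and
# :bytes; the sample data is invented for illustration.
HACKERS = [
  { id: "<a1@example.org>", bytes: 1_200 },
  { id: "<b2@example.org>", bytes: 54_000 }, # patch cross-posted to both lists
]
PATCHES = [
  { id: "<b2@example.org>", bytes: 54_000 },
  { id: "<c3@example.org>", bytes: 8_000 },
]

def combined_bytes(*lists)
  # Keep the first copy of each Message-ID, then add up the sizes.
  lists.flatten.uniq { |m| m[:id] }.sum { |m| m[:bytes] }
end

naive   = (HACKERS + PATCHES).sum { |m| m[:bytes] }
deduped = combined_bytes(HACKERS, PATCHES)
puts "naive=#{naive} deduped=#{deduped}"
```

With one large cross-posted patch in the sample, the naive sum is almost twice the deduplicated one, which shows how much double-counting could inflate a hackers+patches total.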

If people want the raw data, here's the script I used to get it.
============================================================================
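#!/usr/bin/env ruby
# Prints "<mbox href> <Content-Length>" for every monthly .gz mbox of a list.
# Relies on an external HEAD command (e.g. from libwww-perl) being on the PATH.
%W{rubygems hpricot open-uri gruff}.each{|l| require l}
def chart(url)
   # Fetch the list's archive index and collect the links to gzipped mboxes.
   h   = Hpricot.parse(open(url){|f| f.read})
   mboxes = (h / "//a").map{|x| x.attributes['href']}.select{|x| x=~/\.gz/}
   mboxes.sort.each{|x|
     # HEAD fetches only the headers; Content-Length is the compressed size in bytes.
     y = `HEAD #{url}/#{x}` =~ /Content-Length: (\d+)/ && $1
     puts "#{x} #{y}"
   }
end
# chart prints its results; its return value is not used further.
chart('http://archives.postgresql.org/pgsql-patches')
chart('http://archives.postgresql.org/pgsql-general')
chart('http://archives.postgresql.org/pgsql-hackers')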
============================================================================

Perhaps some of the extra burden on the experienced hackers is
a larger volume of newer people trying to contribute who need
more handholding (and thus more re-posted updated patches, etc)?


Re: Postgres mail list traffic over time

From
Bruce Momjian
Date:
brian wrote:
> Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
>  >>
> >> I am finding the email traffic
> >> almost impossible to continue tracking, so something different is
> >> happening, but it seems it is not volume-related.
> >
> > Yes, my perception also is that it's getting harder and harder to keep
> > up with the list traffic; so something is happening that a simple
> > volume count doesn't capture.
>
> Perhaps it's just subjective: we're all getting older.

I thought about that, which is scary in itself.  :-(  But I don't think
Tom and I have both gotten significantly older in the past year, and we
are slightly different ages.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Postgres mail list traffic over time

From
Bruce Momjian
Date:
Steve Crawford wrote:
> Bruce Momjian wrote:
> > brian wrote:
> >
> >> Tom Lane wrote:
> >>
> >> Perhaps it's just subjective: we're all getting older.
> >>
> Which, as "Dr. A" (aka Isaac Asimov) pointed out in "The Sensuous Dirty
> Old Man", beats the alternative.
> > I thought about that, which is scary in itself.  :-(  But I don't think
> > Tom and I have both gotten significantly older in the past year, and we
> > are slightly different ages.
> >
> >
> Would that it were linear. The change from 2 to 3 is striking. 32 to 33,
> not so much. 82 to 83 may be life and death.
>
> The rate of change of my near-vision has certainly been non-linear of
> late. :(

Tom, is there a Postgres old-age home yet?  ;-)

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Postgres mail list traffic over time

From
Steve Crawford
Date:
Bruce Momjian wrote:
> brian wrote:
>
>> Tom Lane wrote:
>>
>> Perhaps it's just subjective: we're all getting older.
>>
Which, as "Dr. A" (aka Isaac Asimov) pointed out in "The Sensuous Dirty
Old Man", beats the alternative.
> I thought about that, which is scary in itself.  :-(  But I don't think
> Tom and I have both gotten significantly older in the past year, and we
> are slightly different ages.
>
>
Would that it were linear. The change from 2 to 3 is striking. 32 to 33,
not so much. 82 to 83 may be life and death.

The rate of change of my near-vision has certainly been non-linear of
late. :(

Cheers,
Steve


Re: Postgres mail list traffic over time

From
Peter Eisentraut
Date:
On Friday 21 November 2008 19:10:45 Tom Lane wrote:
> Yeah, I think this is most probably explained by repeat postings
> of successive versions of large patches.  Still, Ron might be on to
> something.  I had not considered message lengths in my previous
> numbers ...

Also consider that since we started using the wiki for tracking patches, a lot
of trivial emails like "your patch has been added to the queue" and "where
are we on this" have disappeared.

Re: Postgres mail list traffic over time

From
Stefan Kaltenbrunner
Date:
Alvaro Herrera wrote:
> Sam Mason wrote:
>
>> the following has links to more:
>>
>>   http://markmail.org/search/?q=list:org.postgresql
>
> Wow, the Spanish list is the 3rd in traffic after hackers and general!

yeah and that tom lane guy sent over 77000(!!!) mails to the lists up to
now ...


Stefan

Re: Postgres mail list traffic over time

From
Bruce Momjian
Date:
Ron Mayer wrote:
> Joshua D. Drake wrote:
> > On Fri, 2008-11-21 at 08:18 -0800, Ron Mayer wrote:
> >> Bruce Momjian wrote:
> >>> Tom Lane wrote:
> >>>> ... harder to keep
> >>>> up with the list traffic; so something is happening that a simple
> >>>> volume count doesn't capture.
> >> If measured in "bytes of the gzipped mbox" it ...
> >
> > It's because we eliminated the -patches mailing list.
>
> That's part of it.  I've added -patches to the graph at
> http://0ape.com/postgres_mailinglist_size/ as well as
> a graph of hackers+patches combined; and it still looks
> like hackers+patches is quite high in the past 3 months.
>
> With hackers+patches it looks like 2002-08 was the biggest
> month; but the past 3 months still look roughly twice
> late 2007's numbers.

Can someone graph CVS traffic, showing the historical number of commits
and number of changed lines?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Postgres mail list traffic over time

From
Magnus Hagander
Date:
Bruce Momjian wrote:
> Ron Mayer wrote:
>> Joshua D. Drake wrote:
>>> On Fri, 2008-11-21 at 08:18 -0800, Ron Mayer wrote:
>>>> Bruce Momjian wrote:
>>>>> Tom Lane wrote:
>>>>>> ... harder to keep
>>>>>> up with the list traffic; so something is happening that a simple
>>>>>> volume count doesn't capture.
>>>> If measured in "bytes of the gzipped mbox" it ...
>>> It's because we eliminated the -patches mailing list.
>> That's part of it.  I've added -patches to the graph at
>> http://0ape.com/postgres_mailinglist_size/ as well as
>> a graph of hackers+patches combined; and it still looks
>> like hackers+patches is quite high in the past 3 months.
>>
>> With hackers+patches it looks like 2002-08 was the biggest
>> month; but the past 3 months still look roughly twice
>> late 2007's numbers.
>
> Can someone graph CVS traffic, showing the historical number of commits
> and number of changed lines?

Ohloh has some graphs, are they detailed enough?
http://www.ohloh.net/projects/postgres/analyses/latest

//Magnus

Re: Postgres mail list traffic over time

From
Gregory Stark
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:

> Richard Huxton wrote:
>
>> Some of the EXPLAINs on the performance list are practically impossible
>> to read unless you've got the time to cut+paste and fix line-endings.
>
> Maybe we should start recommending people to post those via
> http://explain-analyze.info/

What would be really neat would be having the mailing list do something
automatically. Either fix the message inline or generate a link to something
like this.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

Re: Postgres mail list traffic over time

From
Gregory Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Bruce Momjian <bruce@momjian.us> writes:
>> Tom Lane wrote:
>>> So, to a first approximation, the PG list traffic has been constant
>>> since 2000.  Not the result I expected.
>
>> I also was confused by its flatness.  I am finding the email traffic
>> almost impossible to continue tracking, so something different is
>> happening, but it seems it is not volume-related.
>
> Yes, my perception also is that it's getting harder and harder to keep
> up with the list traffic; so something is happening that a simple
> volume count doesn't capture.

I've noticed recently that the mailing list traffic seems very "bursty". We
have days with hundreds of messages on lots of different in-depth topics and
other days with hardly any messages at all. I wonder if it's hard to follow
because we've been picking up more simultaneous threads instead of all being
on one thread together before moving on to the next one.

Another idea, I wonder if the project has gone more international and
therefore has more traffic at odd hours of the day for everyone. It would also
mean more long-lived threads with large latencies between messages and replies.
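
One way to test the "bursty" and "more simultaneous threads" ideas would be to count, for each day, both the messages and the distinct threads they belong to. The following is only a rough sketch: the input format (pairs of a sortable day value and a thread identifier) and the function names are assumptions made for illustration, not anything taken from an existing archive tool.

```python
from collections import defaultdict
from statistics import mean, pstdev


def daily_activity(messages):
    """messages: iterable of (sent_day, thread_id) pairs (hypothetical format).

    Returns {sent_day: (message_count, distinct_thread_count)}.
    """
    counts = defaultdict(int)
    threads = defaultdict(set)
    for day, thread_id in messages:
        counts[day] += 1
        threads[day].add(thread_id)
    return {day: (counts[day], len(threads[day])) for day in sorted(counts)}


def burstiness(daily_counts):
    """Coefficient of variation of messages per day (0 means perfectly even)."""
    avg = mean(daily_counts)
    return pstdev(daily_counts) / avg if avg else 0.0
```

Days with no traffic have to be passed to burstiness() as explicit zeros, otherwise quiet days are silently ignored. A rising distinct-thread count per day at constant volume would support the "more simultaneous threads" guess.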

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

Re: Postgres mail list traffic over time

From
Bruce Momjian
Date:
Magnus Hagander wrote:
> Bruce Momjian wrote:
> > Ron Mayer wrote:
> >> Joshua D. Drake wrote:
> >>> On Fri, 2008-11-21 at 08:18 -0800, Ron Mayer wrote:
> >>>> Bruce Momjian wrote:
> >>>>> Tom Lane wrote:
> >>>>>> ... harder to keep
> >>>>>> up with the list traffic; so something is happening that a simple
> >>>>>> volume count doesn't capture.
> >>>> If measured in "bytes of the gzipped mbox" it ...
> >>> It's because we eliminated the -patches mailing list.
> >> That's part of it.  I've added -patches to the graph at
> >> http://0ape.com/postgres_mailinglist_size/ as well as
> >> a graph of hackers+patches combined; and it still looks
> >> like hackers+patches is quite high in the past 3 months.
> >>
> >> With hackers+patches it looks like 2002-08 was the biggest
> >> month; but the past 3 months still look roughly twice
> >> late 2007's numbers.
> >
> > Can someone graph CVS traffic, showing the historical number of commits
> > and number of changed lines?
>
> Ohloh has some graphs, are they detailed enough?
> http://www.ohloh.net/projects/postgres/analyses/latest

I saw that but that only shows total lines, not the number of lines
changed, or commits per hour, etc.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Postgres mail list traffic over time

From
Craig Ringer
Date:
Gregory Stark wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>
>> Bruce Momjian <bruce@momjian.us> writes:
>>> Tom Lane wrote:
>>>> So, to a first approximation, the PG list traffic has been constant
>>>> since 2000.  Not the result I expected.
>>> I also was confused by its flatness.  I am finding the email traffic
>>> almost impossible to continue tracking, so something different is
>>> happening, but it seems it is not volume-related.
>> Yes, my perception also is that it's getting harder and harder to keep
>> up with the list traffic; so something is happening that a simple
>> volume count doesn't capture.
>
> I've noticed recently that the mailing list traffic seems very "bursty". We
> have days with hundreds of messages on lots of different in-depth topics and
> other days with hardly any messages at all. I wonder if it's hard to follow
> because we've been picking up more simultaneous threads instead of all being
> on one thread together before moving on to the next one.
>
> Another idea, I wonder if the project has gone more international and
> therefore has more traffic at odd hours of the day for everyone. It would also
> mean more long-lived threads with large latencies between messages and replies.

I wouldn't be at all surprised if that were the case. Alas, it's not
possible to analyze usefully because so many companies use .com
addresses instead of addresses under a cctld, and because so many people
use webmail services like gmail that provide no geographical information
in the domain.

Certainly the variety of languages seen in error messages, the variation
in English language skills, etc would tend to suggest a pretty strong
user base outside the US/UK/AU.

--
Craig Ringer

Re: Postgres mail list traffic over time

From
Gregory Stark
Date:
Craig Ringer <craig@postnewspapers.com.au> writes:

> Gregory Stark wrote:
>> Another idea, I wonder if the project has gone more international and
>> therefore has more traffic at odd hours of the day for everyone. It would also
>> mean more long-lived threads with large latencies between messages and replies.
>
> I wouldn't be at all surprised if that were the case. Alas, it's not
> possible to analyze usefully because so many companies use .com
> addresses instead of addresses under a cctld, and because so many people
> use webmail services like gmail that provide no geographical information
> in the domain.

I would be curious to see the average lifespan of threads over time.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!

Re: Postgres mail list traffic over time

From
Tom Lane
Date:
Craig Ringer <craig@postnewspapers.com.au> writes:
> Gregory Stark wrote:
>> Another idea, I wonder if the project has gone more international and
>> therefore has more traffic at odd hours of the day for everyone.

> I wouldn't be at all surprised if that were the case. Alas, it's not
> possible to analyze usefully because so many companies use .com
> addresses instead of addresses under a cctld, and because so many people
> use webmail services like gmail that provide no geographical information
> in the domain.

You can often get a sense of where someone is by noting the timezone of
the Date: header in their messages.  That seems to get localized
correctly even by a lot of the big services like gmail.
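
As a rough illustration of the Date: header idea, the sketch below uses Python's standard email.utils module to pull the UTC offset out of a header and tally messages per offset. The function names are made up for this example, and an offset only hints at a region; it does not identify a country.

```python
from collections import Counter
from email.utils import parsedate_to_datetime


def utc_offset_hours(date_header):
    """Return the UTC offset (in hours) of an RFC 2822 Date: header, or None."""
    try:
        dt = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return None  # malformed header
    offset = dt.utcoffset()
    if offset is None:
        return None  # "-0000" means the sender's zone is unknown
    return offset.total_seconds() / 3600


def offset_histogram(date_headers):
    """Count messages per UTC offset, skipping headers that cannot be parsed."""
    offsets = (utc_offset_hours(h) for h in date_headers)
    return Counter(o for o in offsets if o is not None)
```

Comparing such a histogram year by year would be one way to check whether the geographic distribution has actually shifted.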

FWIW, this project has always been pretty diversified geographically;
we've had major contributors in Russia, Japan, and Australia for as far
back as I can remember, not just Europe and the Americas.  I think there
are more people now, but I'm not convinced that the distribution has
changed much.

            regards, tom lane

mail list traffic

From
"Daniel Verite"
Date:
    Gregory Stark wrote:

> I would be curious to see the average lifespan of threads over time.

I happen to have the mail archives stored in a database, so I've
expressed this in SQL and below are some results for hackers and
general, 2007-2008. count is the number of distinct threads whose
oldest message is in the specified month. A thread is started as
soon as a message has an In-Reply-To field pointing to an
existing Message-Id.

Results for pgsql-hackers:

  month  |     avg span     |   median span   | count
---------+------------------+-----------------+-------
 2007-01 | 7 days 10:00:00  | 1 day 04:18:00  |   211
 2007-02 | 7 days 10:00:00  | 1 day 00:23:48  |   186
 2007-03 | 16 days 30:00:00 | 1 day 05:45:37  |   171
 2007-04 | 13 days 26:00:00 | 19:07:00        |   142
 2007-05 | 19 days 30:00:00 | 1 day 04:46:36  |   122
 2007-06 | 15 days 19:00:00 | 23:38:13        |   111
 2007-07 | 19 days 25:00:00 | 21:04:04        |   106
 2007-08 | 13 days 30:00:00 | 20:26:39        |   133
 2007-09 | 21 days 32:00:00 | 1 day 16:43:10  |   121
 2007-10 | 13 days 19:00:00 | 17:23:24        |   148
 2007-11 | 16 days 15:00:00 | 16:23:00        |   140
 2007-12 | 17 days 16:00:00 | 1 day 07:28:05  |    81
 2008-01 | 13 days 12:00:00 | 23:02:33        |   127
 2008-02 | 9 days 11:00:00  | 12:44:28        |   130
 2008-03 | 10 days 14:00:00 | 22:57:18        |   140
 2008-04 | 10 days 14:00:00 | 1 day 00:32:34  |   132
 2008-05 | 13 days 09:00:00 | 1 day 20:57:57  |   113
 2008-06 | 7 days 27:00:00  | 1 day 05:42:46  |   102
 2008-07 | 13 days 26:00:00 | 2 days 07:43:34 |   133
 2008-08 | 9 days 33:00:00  | 1 day 07:47:09  |   121
 2008-09 | 7 days 25:00:00  | 1 day 19:00:50  |   125
 2008-10 | 6 days 14:00:00  | 1 day 10:31:01  |   178

Results for pgsql-general:

  month  |    avg span     | median span | count
---------+-----------------+-------------+-------
 2007-01 | 1 day 25:00:00  | 10:57:11    |   329
 2007-02 | 2 days 28:00:00 | 10:50:38    |   295
 2007-03 | 3 days 08:00:00 | 14:54:08    |   310
 2007-04 | 6 days 18:00:00 | 17:40:55    |   244
 2007-05 | 3 days 22:00:00 | 16:43:54    |   287
 2007-06 | 2 days 13:00:00 | 11:26:46    |   297
 2007-07 | 2 days 19:00:00 | 11:59:40    |   263
 2007-08 | 3 days 14:00:00 | 16:35:16    |   335
 2007-09 | 3 days 14:00:00 | 13:23:09    |   245
 2007-10 | 2 days 16:00:00 | 08:46:09    |   302
 2007-11 | 3 days 07:00:00 | 08:28:06    |   294
 2007-12 | 2 days 31:00:00 | 10:25:14    |   255
 2008-01 | 2 days 14:00:00 | 13:23:12    |   248
 2008-02 | 2 days 14:00:00 | 10:02:16    |   257
 2008-03 | 1 day 25:00:00  | 13:20:06    |   245
 2008-04 | 1 day 30:00:00  | 08:26:06    |   238
 2008-05 | 3 days 22:00:00 | 18:58:27    |   211
 2008-06 | 2 days 24:00:00 | 14:46:02    |   191
 2008-07 | 1 day 29:00:00  | 10:37:17    |   221
 2008-08 | 1 day 22:00:00  | 14:14:45    |   205
 2008-09 | 1 day 24:00:00  | 14:26:26    |   202
 2008-10 | 1 day 19:00:00  | 12:32:56    |   219

"median span" is the median computed with the pl/R median function
applied to intervals as a number of seconds and then cast back to
intervals for display. I believe the median is good to mitigate the
contribution of messages with wrong dates and posters that reply to
very old messages. And median span appears to differ a lot from the
average span.
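
For anyone without the database at hand, the same kind of numbers can be approximated in a few lines of Python. This is a minimal sketch and not the SQL and pl/R actually used for the tables above: it assumes the messages are available as (message_id, in_reply_to, sent_at) tuples, and the function names are invented for the example. As in the definition above, a thread only counts once a message replies to an existing Message-Id, and it is filed under the month of its oldest message.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, median


def find_root(mid, parent):
    """Follow In-Reply-To links upward until no known parent remains."""
    seen = set()
    while parent.get(mid) in parent and mid not in seen:
        seen.add(mid)  # guards against reference loops in broken headers
        mid = parent[mid]
    return mid


def thread_spans(messages):
    """messages: list of (message_id, in_reply_to, sent_at) tuples.

    Returns {root_id: (oldest_sent_at, span)} for threads with a reply.
    """
    parent = {mid: irt for mid, irt, _ in messages}
    dates = defaultdict(list)
    for mid, _, sent_at in messages:
        dates[find_root(mid, parent)].append(sent_at)
    return {root: (min(d), max(d) - min(d))
            for root, d in dates.items() if len(d) > 1}


def monthly_stats(messages):
    """Return {"YYYY-MM": (avg_span, median_span, thread_count)}."""
    by_month = defaultdict(list)
    for first, span in thread_spans(messages).values():
        by_month[first.strftime("%Y-%m")].append(span.total_seconds())
    return {
        month: (timedelta(seconds=mean(s)), timedelta(seconds=median(s)), len(s))
        for month, s in sorted(by_month.items())
    }
```

With three threads lasting 1, 3 and 20 days, the average span is 8 days while the median is 3 days, which is the kind of gap a single long-lived thread produces.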

If people feel like playing with the database to build other queries,
feel free to bug me off-list about it. I can arrange to make a dump
available or share the scripts to build it yourself from the
mailbox archives.

 Best regards,
--
 Daniel
 PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org



Re: Postgres mail list traffic over time

From
Alvaro Herrera
Date:
Tom Lane wrote:

> FWIW, this project has always been pretty diversified geographically;
> we've had major contributors in Russia, Japan, and Australia for as far
> back as I can remember, not just Europe and the Americas.  I think there
> are more people now, but I'm not convinced that the distribution has
> changed much.

How about getting a new version of the world map showing developers'
locations?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Postgres mail list traffic over time

From
Raymond O'Donnell
Date:
On 23/11/2008 20:58, Alvaro Herrera wrote:

> How about getting a new version of the world map showing developers'
> locations?

Cool! Definitely +1 if we can show contributors to the list generally,
not just developers.

Ray.


------------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
------------------------------------------------------------------

Re: mail list traffic

From
Alvaro Herrera
Date:
Daniel Verite wrote:
>     Gregory Stark wrote:
>
> > I would be curious to see the average lifespan of threads over time.
>
> I happen to have the mail archives stored in a database, [...]

When I saw the manitou-mail.org stuff some days ago I was curious -- how
feasible would it be to host our web archives using a database of some
sort, instead of the current mbox-based Mhonarc installation we use,
which is so full of problems and limitations?

I wondered about using Oryx some time ago, and got in contact with
Abhijit Menon-Sen to that end, but that never fructified.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: mail list traffic

From
"Matthew T. O'Connor"
Date:
Alvaro Herrera wrote:
> When I saw the manitou-mail.org stuff some days ago I was curious -- how
> feasible would it be to host our web archives using a database of some
> sort, instead of the current mbox-based Mhonarc installation we use,
> which is so full of problems and limitations?
>
> I wondered about using Oryx some time ago, and got in contact with
> Abhijit Menon-Sen to that end, but that never fructified.

We are a DB project after all and hosting our own archives might make
for a good example, eating our own dogfood so to speak.  I would imagine
the biggest problem is just finding someone who wants to take this on
and maintain it.



Re: Postgres mail list traffic over time

From
Magnus Hagander
Date:
Bruce Momjian wrote:
> Magnus Hagander wrote:
>> Bruce Momjian wrote:
>>> Ron Mayer wrote:
>>>> Joshua D. Drake wrote:
>>>>> On Fri, 2008-11-21 at 08:18 -0800, Ron Mayer wrote:
>>>>>> Bruce Momjian wrote:
>>>>>>> Tom Lane wrote:
>>>>>>>> ... harder to keep
>>>>>>>> up with the list traffic; so something is happening that a simple
>>>>>>>> volume count doesn't capture.
>>>>>> If measured in "bytes of the gzipped mbox" it ...
>>>>> It's because we eliminated the -patches mailing list.
>>>> That's part of it.  I've added -patches to the graph at
>>>> http://0ape.com/postgres_mailinglist_size/ as well as
>>>> a graph of hackers+patches combined; and it still looks
>>>> like hackers+patches is quite high in the past 3 months.
>>>>
>>>> With hackers+patches it looks like 2002-08 was the biggest
>>>> month; but the past 3 months still look roughly twice
>>>> late 2007's numbers.
>>> Can someone graph CVS traffic, showing the historical number of commits
>>> and number of changed lines?
>> Ohloh has some graphs, are they detailed enough?
>> http://www.ohloh.net/projects/postgres/analyses/latest
>
> I saw that but that only shows total lines, not the number of lines
> changed, or commits per hour, etc.

I've got a database of all our commits with info like: timestamp,
author, number of rows added/deleted, number of files modified, which
files modified, rows modified in each file. Basically it's data quickly
parsed from a "git log --stat" of HEAD (because it was a whole lot
easier to parse the git stuff). It's got about 27,500 commits in it -
only the stuff that happened on HEAD, nothing for backbranches.
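
For reference, the kind of parsing described here can be sketched roughly as below. This is not the actual script behind the attached graphs; it assumes the text comes from "git log --stat --date=iso", and the regular expressions and function name are illustrative only. It counts one commit per Date: header line and sums the per-commit summary line.

```python
import re
from collections import defaultdict

# Header line such as "Date:   2008-11-21 08:18:00 -0800" (needs --date=iso).
DATE_RE = re.compile(r"^Date:\s+(\d{4}-\d{2})-\d{2} ")
# Summary line such as " 3 files changed, 10 insertions(+), 2 deletions(-)".
STAT_RE = re.compile(
    r"^ (\d+) files? changed"
    r"(?:, (\d+) insertions?\(\+\))?"
    r"(?:, (\d+) deletions?\(-\))?"
)


def per_month(log_text):
    """Aggregate `git log --stat --date=iso` text into per-month totals."""
    totals = defaultdict(lambda: {"commits": 0, "added": 0, "deleted": 0})
    month = None
    for line in log_text.splitlines():
        m = DATE_RE.match(line)
        if m:
            month = m.group(1)
            totals[month]["commits"] += 1
            continue
        s = STAT_RE.match(line)
        if s and month is not None:
            # Either count may be missing from the summary line.
            totals[month]["added"] += int(s.group(2) or 0)
            totals[month]["deleted"] += int(s.group(3) or 0)
    return dict(totals)
```

The log text can be captured with subprocess.run(["git", "log", "--stat", "--date=iso"], capture_output=True, text=True).stdout when run inside a checkout. Commit message lines are indented by four spaces in this output, so they do not collide with either pattern.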

So, if you can be a bit more specific in what you want :) Attached is
for example "commits per month" and "lines per month".

//Magnus

Attachment

Re: Postgres mail list traffic over time

From
Magnus Hagander
Date:
Magnus Hagander wrote:
> Bruce Momjian wrote:
>> Magnus Hagander wrote:
>>> Bruce Momjian wrote:
>>>> Ron Mayer wrote:
>>>>> Joshua D. Drake wrote:
>>>>>> On Fri, 2008-11-21 at 08:18 -0800, Ron Mayer wrote:
>>>>>>> Bruce Momjian wrote:
>>>>>>>> Tom Lane wrote:
>>>>>>>>> ... harder to keep
>>>>>>>>> up with the list traffic; so something is happening that a simple
>>>>>>>>> volume count doesn't capture.
>>>>>>> If measured in "bytes of the gzipped mbox" it ...
>>>>>> It's because we eliminated the -patches mailing list.
>>>>> That's part of it.  I've added -patches to the graph at
>>>>> http://0ape.com/postgres_mailinglist_size/ as well as
>>>>> a graph of hackers+patches combined; and it still looks
>>>>> like hackers+patches is quite high in the past 3 months.
>>>>>
>>>>> With hackers+patches it looks like 2002-08 was the biggest
>>>>> month; but the past 3 months still look roughly twice
>>>>> late 2007's numbers.
>>>> Can someone graph CVS traffic, showing the historical number of commits
>>>> and number of changed lines?
>>> Ohloh has some graphs, are they detailed enough?
>>> http://www.ohloh.net/projects/postgres/analyses/latest
>> I saw that but that only shows total lines, not the number of lines
>> changed, or commits per hour, etc.
>
> I've got a database of all our commits with info like: timestamp,
> author, number of rows added/deleted, number of files modified, which
> files modified, rows modified in each file. Basically it's data quickly
> parsed from a "git log --stat" of HEAD (because it was a whole lot
> easier to parse the git stuff). It's got about 27,500 commits in it -
> only the stuff that happened on HEAD, nothing for backbranches.
>
> So, if you can be a bit more specific in what you want :) Attached is
> for example "commits per month" and "lines per month".

Here's another one that crudely shows the amount of code vs docs commits
(just looking at docs/* vs src/* - clearly very crude)

Sent as a separate mail since -general won't accept large mails.

//Magnus


Attachment

Re: mail list traffic

From
"Daniel Verite"
Date:
    Alvaro Herrera wrote:

> When I saw the manitou-mail.org stuff some days ago I was curious
> -- how feasible would it be to host our web archives using a
> database of some sort, instead of the current mbox-based Mhonarc
> installation we use, which is so full of problems and limitations?

One problem I've noticed on archives.postgresql.org is that threads
don't cross month boundaries.
For example if I'm looking at:
http://archives.postgresql.org/pgsql-general/2008-09/msg01003.php ,
according to the webpage, this message has neither references nor
follow-ups.
But actually it's a reply to this one:
http://archives.postgresql.org/pgsql-general/2008-05/msg00404.php
and it has this followup:
http://archives.postgresql.org/pgsql-general/2008-10/msg00466.php

In fact it looks like all threads are cut at the end of each month, and
that everything is partitioned by month anyway. I guess it's because
mhonarc operates only on the current month by design, which makes sense
if its storage doesn't scale.

What manitou-mail could provide here is the database structure and the
scripts that feed the live archive, and it wouldn't have these
limitations of mhonarc. As a bonus, it opens up the data to SQL
interfaces, so you can think of querying messages using complex
criteria, or producing statistics, reports...
But it doesn't provide the generation of webpages that is after all the
whole point of this web archive. I assume that the idea is to generate
everything in static pages like mhonarc seems to do rather than
live-querying the database. Anyway that HTML generation part would need
to be recreated or changed to deal with a different "data source" and a
different partitioning of data, if it's modular enough that such a
thing is possible. How hard would that be? Personally I have no idea;
is anyone familiar with that code?

 Best regards,
--
 Daniel
 PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org

Re: mail list traffic

From
"Dave Page"
Date:
On Sun, Nov 23, 2008 at 11:31 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Daniel Verite wrote:
>>       Gregory Stark wrote:
>>
>> > I would be curious to see the average lifespan of threads over time.
>>
>> I happen to have the mail archives stored in a database, [...]
>
> When I saw the manitou-mail.org stuff some days ago I was curious -- how
> feasible would it be to host our web archives using a database of some
> sort, instead of the current mbox-based Mhonarc installation we use,
> which is so full of problems and limitations?

Didn't I send you a copy of the prototype code I'd written to do that?
The biggest issue for third party code is that we need to preserve our
existing URLs.

/D


--
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com

Re: Postgres mail list traffic over time

From
Bruce Momjian
Date:
Magnus Hagander wrote:
> > I saw that but that only shows total lines, not the number of lines
> > changed, or commits per hour, etc.
>
> I've got a database of all our commits with info like: timestamp,
> author, number of rows added/deleted, number of files modified, which
> files modified, rows modified in each file. Basically it's data quickly
> parsed from a "git log --stat" of HEAD (because it was a whole lot
> easier to parse the git stuff). It's got about 27,500 commits in it -
> only the stuff that happened on HEAD, nothing for backbranches.
>
> So, if you can be a bit more specific in what you want :) Attached is
> for example "commits per month" and "lines per month".

Yea, this is the graph I was looking for;  unfortunately it does not
shed any light on why things seem busier;  'old age' is starting to
look plausible.  ;-)

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Postgres mail list traffic over time

From
"Joshua D. Drake"
Date:
On Tue, 2008-12-02 at 15:47 -0500, Bruce Momjian wrote:
> Magnus Hagander wrote:
> > > I saw that but that only shows total lines, not the number of lines
> > > changed, or commits per hour, etc.
> >
> > I've got a database of all our commits with info like: timestamp,
> > author, number of rows added/deleted, number of files modified, which
> > files modified, rows modified in each file. Basically it's data quickly
> > parsed from a "git log --stat" of HEAD (because it was a whole lot
> > easier to parse the git stuff). It's got about 27,500 commits in it -
> > only the stuff that happened on HEAD, nothing for backbranches.
> >
> > So, if you can be a bit more specific in what you want :) Attached is
> > for example "commits per month" and "lines per month".
>
> Yea, this is the graph I was looking for;  unfortunately it does not
> shed any light on why things seem busier;  'old age' is starting to
> look plausible.  ;-)

It could also be that a lot of work is happening off channel. I know
that many contributors have the first 50 replies of a discussion on
Jabber, IRC, or directly, and only then post to the various lists at
some later point.



Joshua D. Drake


--
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997