Thread: postgresql.org Weblogs?

postgresql.org Weblogs?

From
Josh Berkus
Date:
Dave, Marc, Magnus etc.

I, GreenPlum, JasperSoft, and others are setting up a demo of data
warehousing on Bizgres/PostgreSQL for OSCON.   The demo will involve doing
sophisticated reporting on clickstream (weblog) data.

We've asked a couple of high-profile web sites for their weblog data for
this demo, but due to corporate bureaucracy, they may not come through in
time.  Would it be possible for us, you think, to use the weblogs of some
of the PostgreSQL.org sites?   The end product will be OSS and will run on
PostgreSQL.

What we're looking for is 2+ weeks of web logs in extended format.  We want
the PostgreSQL.org data as a backup in case neither of these big-name web
sites comes through.  Does this sound possible?

Thanks!

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: postgresql.org Weblogs?

From
"Marc G. Fournier"
Date:
Sure ... there is nothing confidential in them, just let us know what you
want (if you want it) ...

On Mon, 11 Jul 2005, Josh Berkus wrote:

> Dave, Marc, Magnus etc.
>
> I, GreenPlum, JasperSoft, and others are setting up a demo of data
> warehousing on Bizgres/PostgreSQL for OSCON.   The demo will involve doing
> sophisticated reporting on clickstream (weblog) data.
>
> We've asked a couple of high-profile web sites for their weblog data for
> this demo, but due to corporate bureaucracy, they may not come through in
> time.  Would it be possible for us, you think, to use the weblogs of some
> of the PostgreSQL.org sites?   The end product will be OSS and will run on
> PostgreSQL.
>
> What we're looking for is 2+ weeks of web logs in extended format.  We want
> the PostgreSQL.org data as a backup in case neither of these big-name web
> sites comes through.  Does this sound possible?
>
> Thanks!
>
> --
> --Josh
>
> Josh Berkus
> Aglio Database Solutions
> San Francisco

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

Re: postgresql.org Weblogs?

From
Josh Berkus
Date:
Marc,

> Sure ... there is nothing confidential in them, just let us know what
> you want (if you want it) ...

Yeah, I thought I'd ask now, because presumably you don't keep the weblogs
indefinitely.

We'd like two or more weeks' worth in one of the following formats:

==============================

Extended/Combined log format is OK. Cookies are not included, but for
a demo it should work. The inclusion of the user agent reduces merges,
but only to a certain extent.

Fields
------
remotehost      
rfc931          
authuser        
[date]          
"request"       
status          
bytes           
"referer"       
"user_agent"

NCSA Combined Log Format, W3CA format or the combined plus cookies are
the most useful.

So you end up with

Fields
------
remotehost      
rfc931          
authuser        
[date]          
"request"       
status          
bytes           
"referer"       
"user_agent"
In-cookie
Out-cookie
Server (optional)

==================================
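
For anyone loading these logs, here is a minimal sketch of splitting one
Combined Log Format line into the nine fields listed above. It assumes the
stock Apache "combined" layout; the function name, the regular expression,
and the sample line are illustrative only and are not taken from any real
PostgreSQL.org log. Trailing cookie or server fields, if present, are ignored.

```python
import re

# Assumed layout (Apache "combined"):
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED_RE = re.compile(
    r'(?P<remotehost>\S+) (?P<rfc931>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>(?:[^"\\]|\\.)*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<user_agent>(?:[^"\\]|\\.)*)"'
)


def parse_combined(line):
    """Return a dict of the nine combined-format fields, or None if no match."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A "-" in the bytes column means no response body was sent.
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields


# Made-up sample line for illustration only.
sample = ('192.0.2.10 - - [11/Jul/2005:13:55:36 -0700] '
          '"GET /docs/8.0/static/index.html HTTP/1.1" 200 2326 '
          '"http://www.example.com/" "Mozilla/5.0 (X11; Linux i686)"')
record = parse_combined(sample)
```

Lines that do not match return None rather than raising, so a loader can
count and skip malformed entries instead of aborting the whole import.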

--Josh

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: postgresql.org Weblogs?

From
"Marc G. Fournier"
Date:
On Mon, 11 Jul 2005, Josh Berkus wrote:

> Marc,
>
>> Sure ... there is nothing confidential in them, just let us know what
>> you want (if you want it) ...
>
> Yeah, I thought I'd ask now, because presumably you don't keep the weblogs
> indefinitely.

Actually ... unless a server crashes so that we lose them, we do ...
svr1.postgresql (developer) has stuff going back to May 2004 ... we used
to have www.postgresql.org *way* back before all the moving around ...
pgfoundry.org goes back to May 7th, 2004 ... I'm a pack rat :)

> Extended/Combined log format is OK. Cookies are not included, but for
> a demo it should work. The inclusion of the user agent reduces merges,
> but only to a certain extent.
>
> Fields
> ------
> remotehost      
> rfc931          
> authuser        
> [date]          
> "request"       
> status          
> bytes           
> "referer"       
> "user_agent"
>
> NCSA Combined Log Format, W3CA format or the combined plus cookies are
> the most useful.
>
> So you end up with
>
> Fields
> ------
> remotehost      
> rfc931          
> authuser        
> [date]          
> "request"       
> status          
> bytes           
> "referer"       
> "user_agent"
> In-cookie
> Out-cookie
> Server (optional)

Can you send me a CustomLog entry for apache that I can add to the
configuration, that lays things out exactly as you want it?


----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

Re: postgresql.org Weblogs?

From
"Magnus Hagander"
Date:
> > Marc,
> >
> >> Sure ... there is nothing confidential in them, just let us know what
> >> you want (if you want it) ...
> >
> > Yeah, I thought I'd ask now, because presumably you don't keep the
> > weblogs indefinitely.
>
> Actually ... unless a server crashes so that we lose them, we do ...
> svr1.postgresql (developer) has stuff going back to May 2004
> ... we used to have www.postgresql.org *way* back before all
> the moving around ...
> pgfoundry.org goes back to May 7th, 2004 ... I'm a pack rat :)

Hmm, IIRC the logs on all the static mirrors are going to /dev/null, for
performance reasons. Specifically, the FreeBSD jails couldn't deal with
the write activity. We got like 10-15 times better performance after
disabling it. It got significantly better on the other machines as well
(Linux mirrors), but the FreeBSD jails were the main reason we set up
that policy. With logging enabled, all the servers just fell over. IIRC,
this includes wwwmaster, which we also don't have logs for.

If the data is good enough, go for the pgfoundry ones. Then consider
disabling the logging there as well to see if it helps with the
performance issues ;-) Or use the old logs - I s'pose to test your
systems you don't need "up to date" logs?


//Magnus


Re: postgresql.org Weblogs?

From
"Marc G. Fournier"
Date:
On Tue, 12 Jul 2005, Magnus Hagander wrote:

>>> Marc,
>>>
>>>> Sure ... there is nothing confidential in them, just let us know what
>>>> you want (if you want it) ...
>>>
>>> Yeah, I thought I'd ask now, because presumably you don't keep the
>>> weblogs indefinitely.
>>
>> Actually ... unless a server crashes so that we lose them, we do ...
>> svr1.postgresql (developer) has stuff going back to May 2004
>> ... we used to have www.postgresql.org *way* back before all
>> the moving around ...
>> pgfoundry.org goes back to May 7th, 2004 ... I'm a pack rat :)
>
> Hmm, IIRC the logs on all the static mirrors are going to /dev/null, for
> performance reasons. Specifically, the FreeBSD jails couldn't deal with
> the write activity. We got like 10-15 times better performance after
> disabling it. It got significantly better on the other machines as well
> (Linux mirrors), but the FreeBSD jails were the main reason we set up
> that policy. With logging enabled, all the servers just fell over. IIRC,
> this includes wwwmaster, which we also don't have logs for.

Ya, that's why we've started to work on moving to 5.x ... I'm getting
really tired of the "unsupported version" :(

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664