Re: Partitioning / Clustering - Mailing list pgsql-performance

From Alex Turner
Subject Re: Partitioning / Clustering
Date
Msg-id 33c6269f0505120905192dc723@mail.gmail.com
Whole thread Raw
In response to Re: Partitioning / Clustering  (Alex Stapleton <alexs@advfn.com>)
Responses Re: Partitioning / Clustering
List pgsql-performance
Ok - my common sense alarm is going off here...

There are only 6.446 billion people worldwide.  100 Billion page views
would require every person in the world to view 18 pages of yahoo
every day.  Not very likely.

http://www.internetworldstats.com/stats.htm
suggests that there are around 1 billion people actualy on the internet.

That means each and every person on the internet has to view 100 pages
per day of yahoo.

pretty unlikely IMHO.  I for one don't even use Yahoo ;)

100 million page views per day suggests that 1 in 100 people on the
internet each viewed 10 pages of a site.  Thats a pretty high
percentage if you ask me.

If I visit 20 web sites in a day, and see an average of 10 pages per
site. that means only about 2000 or so sites generate 100 million page
views in a day or better.

100 million pageviews averages to 1157/sec, which we'll double for
peak load to 2314.

I can easily see a system doing 2314 hash lookups per second.  Hell I
wrote a system that could do a thousand times that four years ago on a
single 1Ghz Athlon.  Heck - you can get 2314 lookups/sec on a 486 ;)

Given that session information doesn't _have_ to persist to storage,
and can be kept in RAM.  A single server could readily manage session
information for even very large sites (of course over a million
concurrent users could really start chewing into RAM, but if you are
Yahoo, you can probably afford a box with 100GB of RAM ;).

We get over 1000 tps on a dual opteron with a couple of mid size RAID
arrays on 10k discs with fsync on for small transactions.  I'm sure
that could easily be bettered with a few more dollars.

Maybe my number are off, but somehow it doesn't seem like that many
people need a highly complex session solution to me.

Alex Turner
netEconomist

On 5/12/05, Alex Stapleton <alexs@advfn.com> wrote:
>
> On 12 May 2005, at 15:08, Alex Turner wrote:
>
> > Having local sessions is unnesesary, and here is my logic:
> >
> > Generaly most people have less than 100Mb of bandwidth to the
> > internet.
> >
> > If you make the assertion that you are transferring equal or less
> > session data between your session server (lets say an RDBMS) and the
> > app server than you are between the app server and the client, an out
> > of band 100Mb network for session information is plenty of bandwidth.
> > This also represents OLTP style traffic, which postgresql is pretty
> > good at.  You should easily be able to get over 100Tps.  100 hits per
> > second is an awful lot of traffic, more than any website I've managed
> > will ever see.
> >
> > Why solve the complicated clustered sessions problem, when you don't
> > really need to?
>
> 100 hits a second = 8,640,000 hits a day. I work on a site which does
>  > 100 million dynamic pages a day. In comparison Yahoo probably does
>  > 100,000,000,000 (100 billion) views a day
>   if I am interpreting Alexa's charts correctly. Which is about
> 1,150,000 a second.
>
> Now considering the site I work on is not even in the top 1000 on
> Alexa, theres a lot of sites out there which need to solve this
> problem I would assume.
>
> There are also only so many hash table lookups a single machine can
> do, even if its a Quad Opteron behemoth.
>
>
> > Alex Turner
> > netEconomist
> >
> > On 5/11/05, PFC <lists@boutiquenumerique.com> wrote:
> >
> >>
> >>
> >>
> >>> However, memcached (and for us, pg_memcached) is an excellent way to
> >>> improve
> >>> horizontal scalability by taking disposable data (like session
> >>> information)
> >>> out of the database and putting it in protected RAM.
> >>>
> >>
> >>         So, what is the advantage of such a system versus, say, a
> >> "sticky
> >> sessions" system where each session is assigned to ONE application
> >> server
> >> (not PHP then) which keeps it in RAM as native objects instead of
> >> serializing and deserializing it on each request ?
> >>         I'd say the sticky sessions should perform a lot better,
> >> and if one
> >> machine dies, only the sessions on this one are lost.
> >>         But of course you can't do it with PHP as you need an app
> >> server which
> >> can manage sessions. Potentially the savings are huge, though.
> >>
> >>         On Google, their distributed system spans a huge number of
> >> PCs and it has
> >> redundancy, ie. individual PC failure is a normal thing and is a
> >> part of
> >> the system, it is handled gracefully. I read a paper on this
> >> matter, it's
> >> pretty impressive. The google filesystem has nothing to do with
> >> databases
> >> though, it's more a massive data store / streaming storage.
> >>
> >> ---------------------------(end of
> >> broadcast)---------------------------
> >> TIP 1: subscribe and unsubscribe commands go to
> >> majordomo@postgresql.org
> >>
> >>
> >
> >
>
>

pgsql-performance by date:

Previous
From: Alex Stapleton
Date:
Subject: Re: Partitioning / Clustering
Next
From: John A Meinel
Date:
Subject: Re: Partitioning / Clustering