Re: Hadoop backend? - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: Hadoop backend?
Date	February 23, 2009 01:09:28
Msg-id	603c8f070902221809w21b084ddra1b1636f31699959@mail.gmail.com
In response to	Re: Hadoop backend? (pi song <pi.songs@gmail.com>)
Responses	Re: Hadoop backend? (Paul Sheer <paulsheer@gmail.com>)
List	pgsql-hackers

On Sun, Feb 22, 2009 at 5:18 PM, pi song <pi.songs@gmail.com> wrote:
> One more problem is that data placement on HDFS is inherent, meaning you
> have no explicit control. Thus, you cannot place two sets of data which are
> likely to be joined together on the same node = uncontrollable latency
> during query processing.
> Pi Song

It would only be possible to have the actual PostgreSQL backends
running on a single node anyway, because they use shared memory to
hold lock tables and things.  The advantage of a distributed file
system would be that you could access more storage (and more system
buffer cache) than would be possible on a single system (or perhaps
the same amount but at less cost).  Assuming some sort of
per-tablespace control over the storage manager, you could put your
most frequently accessed data locally and the less frequently accessed
data into the DFS.
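
A minimal sketch of that placement idea, assuming a per-tablespace choice between local storage and the DFS existed. Everything here is hypothetical: the `Relation` record, the `assign_tablespaces` helper, the "local"/"dfs" labels, and the read counts are illustrations only, not PostgreSQL APIs or measured numbers.

```python
from dataclasses import dataclass


@dataclass
class Relation:
    """Hypothetical summary of one table or index (not a PostgreSQL structure)."""
    name: str
    reads_per_day: int
    size_gb: float


def assign_tablespaces(relations, local_capacity_gb):
    """Greedily keep the most frequently read relations on local storage.

    Relations are visited from hottest to coldest; each one goes to the
    local tablespace if it still fits, otherwise to the DFS tablespace.
    """
    placement = {}
    remaining = local_capacity_gb
    for rel in sorted(relations, key=lambda r: r.reads_per_day, reverse=True):
        if rel.size_gb <= remaining:
            placement[rel.name] = "local"
            remaining -= rel.size_gb
        else:
            placement[rel.name] = "dfs"
    return placement


# Made-up example workload.
relations = [
    Relation("orders", reads_per_day=1000, size_gb=40),
    Relation("audit_log", reads_per_day=5, size_gb=500),
    Relation("customers", reads_per_day=800, size_gb=20),
]
placement = assign_tablespaces(relations, local_capacity_gb=100)
```

With 100 GB of local capacity, the two hot relations stay local and the large, rarely read one lands in the DFS. A greedy pass is the simplest possible policy; a real one would also have to weigh the cost of moving data between tablespaces.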

But you'd still have to pull all the data back to the master node to
do anything with it.  Being able to actually distribute the
computation would be a much harder problem.  Currently, we don't even
have the ability to bring multiple CPUs to bear on (for example) a
large sequential scan (even though all the data is on a single node).

...Robert
