Thread: Distributed/Parallel Computing

Distributed/Parallel Computing

From
Viji V Nair
Date:
Hi Team,

This question may have been asked many times before, but I could not find a solution in any post. Any help on the following would be greatly appreciated.

We have a PG DB with PostGIS functions. There are around 100 tables in the DB; almost all of them contain 1 million records, and around 5 tables contain more than 20 million records. The total DB size is 40GB, running on a machine with 16GB RAM, 2 x XEON 5420, RAID6, RHEL5 64bit. The questions are:
   
1. The geometry calculations we run are very complex and take a very long time to complete. We have tuned the PG config as far as we can, and now we need a mechanism to distribute these queries to multiple boxes. What is the recommended way to set up this distributed/parallel deployment? We have tried PGPOOL II, but the performance is not satisfactory. We are going to try GridSQL next.

2. How can we distribute/split these large tables across multiple disks on different nodes?

Thanks in advance

Viji


Re: Distributed/Parallel Computing

From
Jeff Janes
Date:
On Mon, Oct 5, 2009 at 12:11 PM, Viji V Nair <viji@fedoraproject.org> wrote:
> Hi Team,
>
> This question may have been asked many times before, but I could not find
> a solution in any post. Any help on the following would be greatly
> appreciated.
>
> We have a PG DB with PostGIS functions. There are around 100 tables in the
> DB; almost all of them contain 1 million records, and around 5 tables
> contain more than 20 million records. The total DB size is 40GB, running on
> a machine with 16GB RAM, 2 x XEON 5420, RAID6, RHEL5 64bit. The questions
> are:
>
> 1. The geometry calculations we run are very complex and take a very long
> time to complete. We have tuned the PG config as far as we can, and now we
> need a mechanism to distribute these queries to multiple boxes. What is the
> recommended way to set up this distributed/parallel deployment? We have
> tried PGPOOL II, but the performance is not satisfactory. We are going to
> try GridSQL next.

What is the nature of the transactions being run?  Are they primarily
read-only other than bulk updates to the GIS data, are they OLTP in
regards to the GIS data, or are they transactional with regards to
other tables but read-only with respect to the GIS?

Jeff

Re: Distributed/Parallel Computing

From
Viji V Nair
Date:
Hi Jeff,

These are bulk updates of GIS data plus OLTP. For example, we run some SQL to remove specific POIs that intersect with others; for such an exercise we need to compare and remove data from different tables, including the 20M-record tables.
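
To make the kind of cleanup described above concrete, here is a minimal sketch of how one such removal could be cut into independent id-range chunks, so that each chunk could be sent over its own connection (or to its own box). Everything named here is an assumption for illustration only: the tables `poi_candidates` and `poi_reference`, the columns `id` and `geom`, and the `%(name)s` placeholder style (the one psycopg2 accepts). `ST_Intersects` is the standard PostGIS predicate. The sketch also assumes the candidate and reference POIs live in different tables, so chunks do not affect each other. It only builds the statements and does not connect to any database.

```python
def id_ranges(min_id, max_id, chunk_size):
    """Split the inclusive id interval [min_id, max_id] into consecutive chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ranges = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + chunk_size - 1, max_id)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


# Hypothetical table/column names; only ST_Intersects is a real PostGIS function.
# Each statement deletes candidate POIs in one id range that intersect any
# reference POI.
DELETE_TEMPLATE = (
    "DELETE FROM poi_candidates AS c "
    "USING poi_reference AS r "
    "WHERE c.id BETWEEN %(lo)s AND %(hi)s "
    "AND ST_Intersects(c.geom, r.geom)"
)


def build_jobs(min_id, max_id, chunk_size):
    """Return (sql, params) pairs, one per id range, ready for a DB driver."""
    return [
        (DELETE_TEMPLATE, {"lo": lo, "hi": hi})
        for lo, hi in id_ranges(min_id, max_id, chunk_size)
    ]
```

Each `(sql, params)` pair could then be handed to a separate worker. Whether this actually helps depends on where the time goes (CPU in the geometry tests versus disk I/O), which this sketch does not address.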

Apart from these, there are also bulk selects (read-only) coming from the client systems.

Thanks
Viji

On Tue, Oct 6, 2009 at 8:10 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
What is the nature of the transactions being run?  Are they primarily
read-only other than bulk updates to the GIS data, are they OLTP in
regards to the GIS data, or are they transactional with regards to
other tables but read-only with respect to the GIS?

Jeff