Re: Postgres vs other Postgres based MPP implementations - Mailing list pgsql-general

From Ondrej Ivanič
Subject Re: Postgres vs other Postgres based MPP implementations
Date
Msg-id CAM6mie+eR8COxcvpgTQwSW0+vaPWPPk+ZfK6Xx9amtLrdVgaXQ@mail.gmail.com
In response to Re: Postgres vs other Postgres based MPP implementations  (Craig Ringer <ringerc@ringerc.id.au>)
Responses Re: Postgres vs other Postgres based MPP implementations  (John R Pierce <pierce@hogranch.com>)
Re: Postgres vs other Postgres based MPP implementations  (Craig Ringer <ringerc@ringerc.id.au>)
List pgsql-general
Hi,

On 8 November 2011 16:58, Craig Ringer <ringerc@ringerc.id.au> wrote:
> Which one(s) are you referring to? In what kind of workloads?
>
> Are you talking about Greenplum or similar?

Yes, mainly Greenplum and nCluster (AsterData). I haven't played much
with GridSQL or pgpool-II's parallel query mode. Queries are simple
aggregations/drill downs/roll ups/... -- mostly heavy read workloads,
but OLTP-like performance is required (e.g. run a query over a 100m+
dataset in 15 sec).

> Pg isn't very good at parallelism within a single query. It handles lots of
> small queries concurrently fairly well, but isn't as good at using all the
> resources of a box on one big query because it can only use one CPU per
> query and has only a very limited ability to do concurrent I/O on a single
> query too.

Usually CPU is not the bottleneck, but it was when I put Postgres on
FusionIO. The problem is that PG spreads reads too much. iostat
reports very low drive utilisation and a very low queue size.

> That said, you should be tuning effective_io_concurrency to match your
> storage; if you're not, then you aren't getting the benefit of the
> concurrent I/O that PostgreSQL *is* capable of. You'll also need to have
> tweaked your shared_buffers, work_mem etc appropriately for your query
> workload.

I've played with effective_io_concurrency (went through the entire
range: 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000) but nothing improved.
Is there a way to get PG backend I/O stats using the stock CentOS (5.7)
kernel and tools? (I can't change my environment easily.)
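
For anyone repeating the sweep, here is a minimal sketch of how it could be scripted. It assumes a hypothetical `execute(sql)` callable that runs one statement on a single open session (for example a thin wrapper around a driver cursor); the name `sweep_io_concurrency` is also made up for illustration. Keep in mind that in the 8.4/9.x releases effective_io_concurrency only affects bitmap heap scans, so a plan without one should show no difference.

```python
import time


def sweep_io_concurrency(execute, query,
                         values=(1, 2, 5, 10, 20, 50, 100, 200, 500, 1000)):
    """Time `query` once per effective_io_concurrency value.

    `execute` is a hypothetical callable (an assumption, not a real
    driver API) that runs one SQL statement on a single open session,
    so each SET applies to the query that follows it.
    """
    timings = {}
    for value in values:
        # Session-level setting; it lasts until the next SET or disconnect.
        execute("SET effective_io_concurrency = %d" % int(value))
        start = time.perf_counter()
        execute(query)
        timings[value] = time.perf_counter() - start
    return timings
```

Repeated runs of the same query tend to be served from the OS cache, so timings taken on a warm cache say little about I/O behaviour.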

> queries it won't perform all that well without something to try to
> parallelise the queries outside Pg.

Yeah, I have one monster query which needs half a day to finish, but it
finishes in less than two hours on the same hw if it is executed in
parallel...
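
For illustration only, one common way to run a big aggregation in parallel from outside Pg is to split the key range into chunks, run each chunk on its own connection (and therefore its own backend), and combine the partial results on the client. The sketch below is not the actual setup used here; `split_range`, `parallel_sum` and the `run_chunk` callable are hypothetical names, and `run_chunk(start, end)` is assumed to open its own connection and return something like the result of `SELECT sum(x) FROM t WHERE id >= start AND id < end`.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(lo, hi, parts):
    """Split the half-open key range [lo, hi) into up to `parts` contiguous chunks."""
    step, extra = divmod(hi - lo, parts)
    bounds = []
    start = lo
    for i in range(parts):
        # The first `extra` chunks get one additional key each.
        end = start + step + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when the range is small
            bounds.append((start, end))
        start = end
    return bounds


def parallel_sum(run_chunk, lo, hi, parts):
    """Run `run_chunk(start, end)` for every chunk concurrently and add up the partials.

    `run_chunk` is a hypothetical callable supplied by the caller; it
    should use its own database connection so each chunk gets its own backend.
    """
    chunks = split_range(lo, hi, parts)
    # Threads are enough here: the client only waits on the server.
    with ThreadPoolExecutor(max_workers=max(1, len(chunks))) as pool:
        partials = list(pool.map(lambda chunk: run_chunk(*chunk), chunks))
    return sum(partials)
```

This only works directly for aggregates whose partial results combine, such as sum, count, min and max; an average has to be rebuilt from per-chunk sums and counts.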

> I'm not at all surprised by that. PostgreSQL couldn't use the full resources
> of your system when it was expressed as just one query.

This is a very interesting area to work in, but my lack of knowledge of
C/C++ and PG internals puts me out of the game :)

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)
