Re: Index only scan - Mailing list pgsql-general

From	Ondrej Ivanič
Subject	Re: Index only scan
Date	October 11, 2012 02:36:23
Msg-id	CAM6mieKhkwhgkzsO2aqe51v_HOHGvB89g9BEsE_fypOBLhSm=w@mail.gmail.com
In response to	Index only scan (Lars Helge Øverland <larshelge@gmail.com>)
List	pgsql-general

Hi,

On 10 October 2012 23:03, Lars Helge Øverland <larshelge@gmail.com> wrote:
> We are now in the process of designing a new component for analytics
> and this feature got me thinking we could utilize postgres over other
> alternatives like column-oriented databases. Basically we will have a
> wide, denormalized table with 20+ columns with relatively low
> cardinality. Typically we will have queries which sums a fact column
> based on where/group by clauses on several dimension columns (standard
> data warehouse stuff). An example would be "select a, b, c, sum(d)
> from analytics where a=1 and b=2 group by a,b,c";
>
> Finally, is there anyone else who are using postgres for this purpose
> and have some good tips to share in order to achieve good performance,
> including index strategies, beyond the standard config best practices?

Yes, we had a fact table with around 250 columns and 250 million rows.
The question is whether you can partition your data set, for example
into monthly partitions. This keeps the indexes small, but every query
must be constrained on the same column that is used for partitioning
(i.e. with monthly partitions, every query should have "datetime
between ... and ...").

In my experience, postgres is not good with large group by queries.
For example, your query:
select a, b, c, sum(d) from analytics where a=1 and b=2 group by a,b,c

could be executed over multiple connections:
insert into t select a, b, c, sum(d) as d from analytics where
c >= val1 and c < val2 and a=1 and b=2 group by a,b,c
insert into t select a, b, c, sum(d) as d from analytics where
c >= val2 and c < val3 and a=1 and b=2 group by a,b,c
...
insert into t select a, b, c, sum(d) as d from analytics where
c >= valN-1 and c < valN and a=1 and b=2 group by a,b,c

and then get the final result:
select a, b, c, sum(d) from t group by a,b,c
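
A minimal sketch of the slicing idea above, for anyone who wants to check that the merged result matches the single query. It uses Python's built-in sqlite3 module as a stand-in for postgres, runs the slices one after another on a single connection (the real gain would come from running them over separate connections), and the sample data, the slice boundaries and the function names sliced_group_by and demo are all assumptions made up for illustration.

```python
import sqlite3


def sliced_group_by(conn, bounds):
    """Aggregate once per half-open range [lo, hi) of c, then merge.

    `bounds` plays the role of val1 .. valN from the post.
    """
    conn.execute(
        "create temp table t (a integer, b integer, c integer, d integer)"
    )
    # One partial aggregate per slice; against postgres each of these
    # statements could be issued over its own connection.
    for lo, hi in zip(bounds, bounds[1:]):
        conn.execute(
            "insert into t "
            "select a, b, c, sum(d) as d from analytics "
            "where c >= ? and c < ? and a = 1 and b = 2 "
            "group by a, b, c",
            (lo, hi),
        )
    # Final merge over the partial results.
    rows = conn.execute(
        "select a, b, c, sum(d) from t group by a, b, c order by c"
    ).fetchall()
    conn.execute("drop table t")
    return rows


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table analytics (a integer, b integer, c integer, d integer)"
    )
    # Hypothetical data: 100 matching rows with c in 0..9, plus one row
    # that the a=1 and b=2 filter must exclude.
    rows = [(1, 2, n % 10, n) for n in range(100)] + [(9, 9, 3, 1000)]
    conn.executemany("insert into analytics values (?, ?, ?, ?)", rows)

    direct = conn.execute(
        "select a, b, c, sum(d) from analytics "
        "where a = 1 and b = 2 group by a, b, c order by c"
    ).fetchall()
    sliced = sliced_group_by(conn, [0, 4, 8, 10])
    return direct, sliced


if __name__ == "__main__":
    direct, sliced = demo()
    print(direct == sliced)
```

Because the slices are cut on c, which is also a group by column, every group lands in exactly one slice, so the final sum(d) only passes the partial sums through. The slices must cover the whole range of c without overlapping, otherwise rows are lost or counted twice.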

You can use the pgpool-II parallel query feature instead of manual slicing.

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)
(http://www.linkedin.com/in/ondrejivanic)
