[GENERAL] Aggregate query on large tables

From: Job <Job@colliniconsulting.it>
Date: Sun, 9 Apr 2017 17:05:56 +0200
Hi,

I have a table with about 400 million rows and I need to run some aggregate queries for reporting.
I noticed that query performance is slowing down, even though indexes are present.

The query is simple (this is just an example; my real table is in Italian):
select a,sum(b) from table where a=x and c=y group by a

a is a varchar
b is an integer

x and y are the two values I use to filter the results.

I created several different indexes to try to speed up the query:
index1 (a)
index2 (c)
index3 (a,c)
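
In SQL terms, the indexes were created roughly like this ("mytable" is just
a placeholder for the real table name):

  -- single-column indexes on the two filter fields
  CREATE INDEX index1 ON mytable (a);
  CREATE INDEX index2 ON mytable (c);

  -- multi-column index covering both filter fields
  CREATE INDEX index3 ON mytable (a, c);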

I noticed, looking at the query planner, that the multi-column index is not used.
PostgreSQL 9.6.1 still uses a sequential scan without any index.

I obtain significant improvements only if I create a materialized view with the aggregated data.

But I would like to avoid, if possible, creating (and maintaining) the materialized view.
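
For reference, a rough sketch of that approach, again with placeholder names
(the view has to be refreshed periodically to pick up new rows):

  -- pre-aggregate once, then query the much smaller result
  CREATE MATERIALIZED VIEW mytable_agg AS
    SELECT a, c, sum(b) AS sum_b
    FROM mytable
    GROUP BY a, c;

  CREATE INDEX ON mytable_agg (a, c);

  -- re-run this (e.g. from cron) after new data is loaded
  REFRESH MATERIALIZED VIEW mytable_agg;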

Thank you!
/F

Re: [GENERAL] Aggregate query on large tables

From: Bill Moran <wmoran@potentialtech.com>
On Sun, 9 Apr 2017 17:05:56 +0200
Job <Job@colliniconsulting.it> wrote:

> Hi,
>
> I have a table with about 400 million rows and I need to run some aggregate queries for reporting.
> I noticed that query performance is slowing down, even though indexes are present.
>
> The query is simple (this is just an example; my real table is in Italian):
> select a,sum(b) from table where a=x and c=y group by a
>
> a is a varchar
> b is an integer
>
> x and y are the two values I use to filter the results.
>
> I created several different indexes to try to speed up the query:
> index1 (a)
> index2 (c)
> index3 (a,c)
>
> I noticed, looking at the query planner, that the multi-column index is not used.
> PostgreSQL 9.6.1 still uses a sequential scan without any index.
>
> I obtain significant improvements only if I create a materialized view with the aggregated data.

It helps a lot if you provide EXPLAIN output for questions like this. Also,
definitions of what you consider "fast" and "slow" are helpful, as everyone
seems to have a different opinion on what those words mean.

However, my guess is that your WHERE condition isn't restrictive enough to
make using the index worth the cost. If you'll be fetching a significant
percentage of the rows anyway, using the index would actually slow things
down.
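
A quick way to check this is to compare how many rows the WHERE clause
matches against the planner's estimate of the total table size, e.g.
(placeholder names, with your real filter values in place of 'x' and 'y'):

  -- rows actually selected by the filter
  SELECT count(*) FROM mytable WHERE a = 'x' AND c = 'y';

  -- approximate total row count from the planner's statistics
  SELECT reltuples::bigint AS estimated_rows
  FROM pg_class
  WHERE relname = 'mytable';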

You _might_ get better performance if you create an index on (a,c,b) which
would allow the query to run without ever needing to access the actual
table; but I'm just speculating.
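
Something along these lines (again, placeholder names); note that index-only
scans also rely on the visibility map, so the table needs to be vacuumed
reasonably often for them to kick in:

  -- b comes last so sum(b) can be computed from the index alone,
  -- without touching the heap (an "index-only scan")
  CREATE INDEX CONCURRENTLY idx_a_c_b ON mytable (a, c, b);

  -- keep the visibility map up to date so index-only scans are possible
  VACUUM mytable;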

In my experience, queries like these rarely benefit from indexes on the filter columns,
because most of the time involved is in the grouping and aggregate processing,
and the index does nothing to help with that. But, again, without EXPLAIN
output I'm only speculating.

--
Bill Moran <wmoran@potentialtech.com>


Re: [GENERAL] Aggregate query on large tables

From: Adrian Klaver <adrian.klaver@aklaver.com>
On 04/09/2017 08:05 AM, Job wrote:
> Hi,
>
> I have a table with about 400 million rows and I need to run some aggregate queries for reporting.
> I noticed that query performance is slowing down, even though indexes are present.
>
> The query is simple (this is just an example; my real table is in Italian):
> select a,sum(b) from table where a=x and c=y group by a
>
> a is a varchar
> b is an integer
>
> x and y are the two values I use to filter the results.
>
> I created several different indexes to try to speed up the query:
> index1 (a)
> index2 (c)
> index3 (a,c)
>
> I noticed, looking at the query planner, that the multi-column index is not used.
> PostgreSQL 9.6.1 still uses a sequential scan without any index.

At some point doing a scan over a table is better than using an index,
as index use incurs a lookup cost. Unfortunately there is no way to
determine what is actually happening in your case without more information.

So:

1) Table schema (the language does not matter, the names are just tags).

2) The actual query run through EXPLAIN ANALYZE.
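
Something like this, using the placeholder names from your message (BUFFERS
is optional but usually worth including):

  EXPLAIN (ANALYZE, BUFFERS)
  SELECT a, sum(b)
  FROM mytable
  WHERE a = 'x' AND c = 'y'
  GROUP BY a;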

>
> I obtain significant improvements only if I create a materialized view with the aggregated data.
>
> But I would like to avoid, if possible, creating (and maintaining) the materialized view.
>
> Thank you!
> /F
>


--
Adrian Klaver
adrian.klaver@aklaver.com