Re: Low perfomance SUM and Group by large databse - Mailing list pgsql-performance

From Craig Ringer
Subject Re: Low perfomance SUM and Group by large databse
Date
Msg-id 4C1FE277.5030603@postnewspapers.com.au
Whole thread Raw
In response to Low perfomance SUM and Group by large databse  ("Sergio Charpinel Jr." <sergiocharpinel@gmail.com>)
List pgsql-performance
On 21/06/10 22:42, Sergio Charpinel Jr. wrote:
> Hi,
>
> I'm getting low performance on SUM and GROUP BY queries.
> How can I improve my database to perform such queries.

>                ->  Sort  (cost=3499863.27..3523695.89 rows=9533049
> width=50) (actual time=165468.022..168908.828 rows=9494165 loops=1)
>                      Sort Key: ip_src, port_src, ip_dst, port_dst,
> tcp_flags, ip_proto
>                      ->  Seq Scan on acct_2010_25  (cost=0.00..352648.10
> rows=9533049 width=50) (actual time=0.038..50860.391 rows=9494165 loops=1)
>                            Filter: ((stamp_inserted >= '2010-06-20
> 10:10:00'::timestamp without time zone) AND (stamp_inserted <
> '2010-06-21 10:10:00'::timestamp without time zone))

Provide an index on at least (ip_src,port_src,ip_dst,port_dst). If you
frequently do other queries that only want some of that information you
could create several individual indexes for those columns instead, as Pg
will combine them for a query, but that is much less efficient than an
index across all four columns.

CREATE INDEX ip_peers_idx ON acct_2010_25(ip_src,port_src,ip_dst_port_dst);

Every index added costs you insert/update/delete speed, so try to find
the smallest/simplest index that gives you acceptable performance.

> Another one just summing bytes (still low):
>
> => explain analyze SELECT ip_src, port_src, ip_dst, port_dst, tcp_flags,
> ip_proto,SUM("bytes") FROM "acct_2010_25" WHERE
> "stamp_inserted">='2010-06-20 10:10' AND "stamp_inserted"<'2010-06-21
> 10:10' GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto
> LIMIT 50 OFFSET 0;

Same deal. You have no suitable index, so Pg has to do a sequential scan
of the table. Since you appear to query on stamp_inserted a lot, you
should index it.

--
Craig Ringer

pgsql-performance by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: mysql to postgresql, performance questions
Next
From: Robert Haas
Date:
Subject: Re: PostgreSQL as a local in-memory cache