optimizing query performance - Mailing list pgsql-performance

From Frits Hoogland
Subject optimizing query performance
Date
Msg-id fbb8fbcd0803310457u510f4309qfb151e6fe0e124c9@mail.gmail.com
Whole thread Raw
List pgsql-performance
Hi! I've got the following statement:

SELECT DISTINCT sub.os,
             COUNT(sub.os) as total
FROM (
            SELECT split_part(system.name, ' ', 1) as os
            FROM system, attacks
            WHERE 1 = 1
            AND  timestamp >= 1205708400
            AND timestamp <= 1206313200
            AND attacks.source = system.ip_addr
            AND NOT attacks.source IN (
                    SELECT exclusion
                    FROM org_excl
                    WHERE orgid=2
                     )
              ) as sub
             GROUP BY sub.os
             ORDER BY total DESC LIMIT 5

which has the following execution plan:

Limit  (cost=1831417.45..1831417.48 rows=5 width=34) (actual time=1599.915..1599.925 rows=3 loops=1)
  ->  Unique  (cost=1831417.45..1831417.75 rows=41 width=34) (actualtime=1599.912..1599.918 rows=3 loops=1)
        ->  Sort  (cost=1831417.45..1831417.55 rows=41 width=34) (actual time=1599.911..1599.913 rows=3 loops=1)
              Sort Key: count(split_part(("system".name)::text, ''::text, 1)), split_part(("system".name)::text, ' '::text, 1)
              ->  HashAggregate  (cost=1831415.63..1831416.35 rows=41 width=34) (actual time=1599.870..1599.876 rows=3 loops=1)
                    ->  Nested Loop  (cost=23.77..1829328.68 rows=417390 width=34) (actual time=0.075..1474.260 rows=75609 loops=1)
                          ->  Index Scan using index_attacks_timestamp on attacks  (cost=23.77..2454.92 rows=36300 width=11) (actual time=0.041..137.045 rows=72380 loops=1)
                                Index Cond: (("timestamp" >= 1205708400) AND ("timestamp" <= 1206313200))
                                Filter: (NOT (hashed subplan))
                                SubPlan
                                  ->  Seq Scan on org_excl (cost=0.00..23.75 rows=6 width=32) (actual time=0.014..0.014 rows=0 loops=1)
                                        Filter: (orgid = 2)
                          ->  Index Scan using ip_addr_name_index on "system"  (cost=0.00..50.15 rows=12 width=45) (actual time=0.009..0.012 rows=1 loops=72380)
                                Index Cond: ("outer".source = "system".ip_addr)

Total runtime: 1600.056 ms
the NL (nested loop) is accountable for most of the total query time. Is there any way to avoid the NL and/or speed up the query?

Thanks,

Frits

pgsql-performance by date:

Previous
From: Cédric Villemain
Date:
Subject: Re: Bad prepare performance
Next
From: James Mansion
Date:
Subject: POSIX file updates