Re: Better performance possible for a pathological query? - Mailing list pgsql-performance

From Alexis Lê-Quôc
Subject Re: Better performance possible for a pathological query?
Date
Msg-id CAAGz8TPfEP10twyxyWDkv1u2Gvhd9NE9EaXyM9x68-=1O2ip=A@mail.gmail.com
In response to Re: Better performance possible for a pathological query?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Wed, Aug 7, 2013 at 12:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexis Lê-Quôc <alq@datadoghq.com> writes:
>> The query itself is very simple: a primary key lookup on a table with
>> 1.5x10^7 rows. The issue is that we are looking up over 11,000 primary
>> keys at once, causing the db to consume a lot of CPU.
>
> It looks like most of the runtime is probably going into checking the
> c.key = ANY (ARRAY[...]) construct.  PG isn't especially smart about that
> if it fails to optimize the construct into an index operation --- I think
> it's just searching the array linearly for each row meeting the other
> restrictions on c.
>
> You could try writing the test like this:
>     c.key = ANY (VALUES (1), (17), (42), ...)
> to see if the sub-select code path gives better results than the array
> code path.  In a quick check it looked like this might produce a hash
> join, which seemed promising anyway.
>
>                         regards, tom lane

Thank you very much Tom, your suggestion is spot on. Runtime decreased 100-fold, from 20s to 200ms with a simple search-and-replace.
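For anyone wanting to script the same rewrite, here is a minimal sketch that builds both query shapes from a list of keys. The table and column names (dim_context, key) are taken from the plan below; the helper function names are made up for illustration:

```python
# Sketch of the search-and-replace: same key list, two query shapes.
# Table/column names come from the EXPLAIN output in this thread;
# the helper names themselves are hypothetical.

def array_form(keys):
    # Original form: c.key = ANY (ARRAY[...]) -- the array is scanned
    # linearly for every candidate row.
    return ("SELECT * FROM dim_context c WHERE c.key = ANY (ARRAY[%s])"
            % ", ".join(str(k) for k in keys))

def values_form(keys):
    # Tom's suggestion: c.key = ANY (VALUES (..), (..)) -- the planner
    # can hash the VALUES list and probe it once per row.
    return ("SELECT * FROM dim_context c WHERE c.key = ANY (VALUES %s)"
            % ", ".join("(%d)" % k for k in keys))

print(array_form([1, 17, 42]))
print(values_form([1, 17, 42]))
```

(For real key lists coming from application input, bind them as parameters rather than interpolating strings.)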

Here's the updated plan for the record.

 Nested Loop  (cost=168.22..2116.29 rows=148 width=362) (actual time=22.134..256.531 rows=10858 loops=1)  
   Buffers: shared hit=44967  
   ->  Index Scan using x_pkey on x  (cost=0.00..8.27 rows=1 width=37) (actual time=0.071..0.073 rows=1 loops=1)
         Index Cond: (id = 1)
         Buffers: shared hit=4
   ->  Nested Loop  (cost=168.22..2106.54 rows=148 width=329) (actual time=22.060..242.406 rows=10858 loops=1)
         Buffers: shared hit=44963
         ->  HashAggregate  (cost=168.22..170.22 rows=200 width=4) (actual time=21.529..32.820 rows=11215 loops=1)
               ->  Values Scan on "*VALUES*"  (cost=0.00..140.19 rows=11215 width=4) (actual time=0.005..9.527 rows=11215 loops=1)
         ->  Index Scan using dim_context_pkey on dim_context c  (cost=0.00..9.67 rows=1 width=329) (actual time=0.015..0.016 rows=1 loops=11215)
               Index Cond: (c.key = "*VALUES*".column1)
               Filter: ((c.tags @> '{blah}'::text[]) AND (c.org_id = 1))
               Buffers: shared hit=44963
 Total runtime: 263.639 ms
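The HashAggregate over the Values Scan is the key change in this plan: the 11,215 keys are hashed once up front, so each of the 11,215 index-scan rows costs a single hash probe instead of a linear pass over the array. A rough Python analogy of the two membership strategies (an illustration of the data-structure difference, not a benchmark of Postgres itself):

```python
import timeit

keys = list(range(11_215))   # analogue of the ARRAY[...] literal
key_set = set(keys)          # analogue of the hashed VALUES list

# Membership gives the same answer either way...
assert (11_000 in keys) == (11_000 in key_set)

# ...but the list is scanned element by element, while the set
# lookup is a single hash probe.
linear = timeit.timeit(lambda: 11_000 in keys, number=1_000)
hashed = timeit.timeit(lambda: 11_000 in key_set, number=1_000)
print(linear, hashed)
```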
