Thread: slow join not using index properly

slow join not using index properly

From
Stefan Amshey
Date:
We have a slow performing query that we are trying to improve, and it appears to be performing a sequential scan at a point where it should be utilizing an index. Can anyone tell me why postgres is opting to do it this way?

The original query is as follows:

SELECT DISTINCT
    a1.context_key
FROM
    virtual_ancestors a1, collection_data, virtual_ancestors a2
WHERE
    a1.ancestor_key =  collection_data.context_key
    AND collection_data.collection_context_key = a2.context_key
    AND a2.ancestor_key = ?

The key relationships should all using indexed columns, but the query plan that postgres comes up with ends up performing a sequential scan on the collection_data table (in this case about 602k rows) where we would have expected it to utilize the index:

 HashAggregate  (cost=60905.73..60935.73 rows=3000 width=4) (actual time=3366.165..3367.354 rows=3492 loops=1)
   Buffers: shared hit=16291 read=1222
   ->  Nested Loop  (cost=17546.26..60898.23 rows=3000 width=4) (actual time=438.332..3357.918 rows=13037 loops=1)
         Buffers: shared hit=16291 read=1222
         ->  Hash Join  (cost=17546.26..25100.94 rows=98 width=4) (actual time=408.554..415.767 rows=2092 loops=1)
               Hash Cond: (a2.context_key = collection_data.collection_context_key)
               Buffers: shared hit=4850 read=3
               ->  Index Only Scan using virtual_ancestors_pkey on virtual_ancestors a2  (cost=0.00..233.32 rows=270 width=4) (actual time=8.532..10.703 rows=1960 loops=1)
                     Index Cond: (ancestor_key = 1072173)
                     Heap Fetches: 896
                     Buffers: shared hit=859 read=3
               ->  Hash  (cost=10015.56..10015.56 rows=602456 width=8) (actual time=399.708..399.708 rows=602570 loops=1)
                     Buckets: 65536  Batches: 1  Memory Usage: 23538kB
                     Buffers: shared hit=3991
######## sequential scan occurs here ##########
                     ->  Seq Scan on collection_data  (cost=0.00..10015.56 rows=602456 width=8) (actual time=0.013..163.509 rows=602570 loops=1)
                           Buffers: shared hit=3991
         ->  Index Only Scan using virtual_ancestors_pkey on virtual_ancestors a1  (cost=0.00..360.70 rows=458 width=8) (actual time=1.339..1.403 rows=6 loops=2092)
               Index Cond: (ancestor_key = collection_data.context_key)
               Heap Fetches: 7067
               Buffers: shared hit=11441 read=1219
 Total runtime: 3373.058 ms


The table definitions are as follows:

  Table "public.virtual_ancestors"
    Column    |   Type   | Modifiers
--------------+----------+-----------
 ancestor_key | integer  | not null
 context_key  | integer  | not null
 degree       | smallint | not null
Indexes:
    "virtual_ancestors_pkey" PRIMARY KEY, btree (ancestor_key, context_key)
    "virtual_context_key_idx" btree (context_key)
Foreign-key constraints:
    "virtual_ancestors_ancestor_key_fkey" FOREIGN KEY (ancestor_key) REFERENCES contexts(context_key)
    "virtual_ancestors_context_key_fkey" FOREIGN KEY (context_key) REFERENCES contexts(context_key)

             Table "public.collection_data"
         Column              |         Type         | Modifiers
------------------------+----------------------+-----------
 collection_context_key | integer              | not null
 context_key                | integer              | not null
 type                            | character varying(1) | not null
 source                        | character varying(1) | not null
Indexes:
    "collection_data_context_key_idx" btree (context_key)
    "collection_data_context_key_index" btree (collection_context_key) CLUSTER
Foreign-key constraints:
    "collection_data_collection_context_key_fkey" FOREIGN KEY (collection_context_key) REFERENCES contexts(context_key) ON DELETE CASCADE
    "collection_data_context_key_fkey" FOREIGN KEY (context_key) REFERENCES contexts(context_key) ON DELETE CASCADE

Can anyone suggest a way that we can get postgres to use the collection_data_context_key_index properly? I thought that it might be related to the fact that collection_data_context_key_index is a CLUSTERED index, but we did some basic experimentation that seems to indicate otherwise, i.e. the bad plan persists despite re-clustering the index.

We are using PostgreSQL 9.2.5 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.4.5-8) 4.4.5, 64-bit

Interestingly, on an instance running PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.4.5-8) 4.4.5, 64-bit where I copied the 2 tables over to a temporary database, the plan comes out differently:

 HashAggregate  (cost=39692.03..39739.98 rows=4795 width=4) (actual time=73.285..75.141 rows=3486 loops=1)
   Buffers: shared hit=22458
   ->  Nested Loop  (cost=0.00..39680.05 rows=4795 width=4) (actual time=0.077..63.116 rows=13007 loops=1)
         Buffers: shared hit=22458
         ->  Nested Loop  (cost=0.00..32823.38 rows=164 width=4) (actual time=0.056..17.685 rows=2084 loops=1)
               Buffers: shared hit=7529
               ->  Index Only Scan using virtual_ancestors_pkey on virtual_ancestors a2  (cost=0.00..1220.85 rows=396 width=4) (actual time=0.025..2.732 rows=1954 loops=1)
                     Index Cond: (ancestor_key = 1072173)
                     Heap Fetches: 1954
                     Buffers: shared hit=1397
######## Note the index scan here - this is what it SHOULD be doing ##############
               ->  Index Scan using collection_data_context_key_index on collection_data  (cost=0.00..79.24 rows=56 width=8) (actual time=0.004..0.005 rows=1 loops=1954)
                     Index Cond: (collection_context_key = a2.context_key)
                     Buffers: shared hit=6132
         ->  Index Only Scan using virtual_ancestors_pkey on virtual_ancestors a1  (cost=0.00..35.40 rows=641 width=8) (actual time=0.007..0.015 rows=6 loops=2084)
               Index Cond: (ancestor_key = collection_data.context_key)
               Heap Fetches: 13007
               Buffers: shared hit=14929
 Total runtime: 76.431 ms

Why can't I get the Postgres 9.2.5 instance to use the optimal plan?

Thanks in advance!
     /Stefan

--
-
Stefan Amshey

Re: slow join not using index properly

From
Ilya Kosmodemiansky
Date:
Hi Stefan!

Probably you need to rewrite your query like this (check it first):

with RECURSIVE qq(cont_key, anc_key) as
(
select min(a1.context_key), ancestor_key from virtual_ancestors a1
 union select
  (SELECT
    a1.context_key, ancestor_key
  FROM
    virtual_ancestors a1 where context_key > cont_key order by
context_key limit 1) from qq where cont_key is not null
)
select a1.cont_key
 from qq a1, collection_data, virtual_ancestors a2
WHERE
    a1.anc_key =  collection_data.context_key
    AND collection_data.collection_context_key = a2.context_key
    AND a2.ancestor_key = ?

best regards,
Ilya

On Fri, Mar 21, 2014 at 12:56 AM, Stefan Amshey <sramshey@gmail.com> wrote:
> We have a slow performing query that we are trying to improve, and it
> appears to be performing a sequential scan at a point where it should be
> utilizing an index. Can anyone tell me why postgres is opting to do it this
> way?
>
> The original query is as follows:
>
> SELECT DISTINCT
>     a1.context_key
> FROM
>     virtual_ancestors a1, collection_data, virtual_ancestors a2
> WHERE
>     a1.ancestor_key =  collection_data.context_key
>     AND collection_data.collection_context_key = a2.context_key
>     AND a2.ancestor_key = ?
>
> The key relationships should all using indexed columns, but the query plan
> that postgres comes up with ends up performing a sequential scan on the
> collection_data table (in this case about 602k rows) where we would have
> expected it to utilize the index:
>
>  HashAggregate  (cost=60905.73..60935.73 rows=3000 width=4) (actual
> time=3366.165..3367.354 rows=3492 loops=1)
>    Buffers: shared hit=16291 read=1222
>    ->  Nested Loop  (cost=17546.26..60898.23 rows=3000 width=4) (actual
> time=438.332..3357.918 rows=13037 loops=1)
>          Buffers: shared hit=16291 read=1222
>          ->  Hash Join  (cost=17546.26..25100.94 rows=98 width=4) (actual
> time=408.554..415.767 rows=2092 loops=1)
>                Hash Cond: (a2.context_key =
> collection_data.collection_context_key)
>                Buffers: shared hit=4850 read=3
>                ->  Index Only Scan using virtual_ancestors_pkey on
> virtual_ancestors a2  (cost=0.00..233.32 rows=270 width=4) (actual
> time=8.532..10.703 rows=1960 loops=1)
>                      Index Cond: (ancestor_key = 1072173)
>                      Heap Fetches: 896
>                      Buffers: shared hit=859 read=3
>                ->  Hash  (cost=10015.56..10015.56 rows=602456 width=8)
> (actual time=399.708..399.708 rows=602570 loops=1)
>                      Buckets: 65536  Batches: 1  Memory Usage: 23538kB
>                      Buffers: shared hit=3991
> ######## sequential scan occurs here ##########
>                      ->  Seq Scan on collection_data  (cost=0.00..10015.56
> rows=602456 width=8) (actual time=0.013..163.509 rows=602570 loops=1)
>                            Buffers: shared hit=3991
>          ->  Index Only Scan using virtual_ancestors_pkey on
> virtual_ancestors a1  (cost=0.00..360.70 rows=458 width=8) (actual
> time=1.339..1.403 rows=6 loops=2092)
>                Index Cond: (ancestor_key = collection_data.context_key)
>                Heap Fetches: 7067
>                Buffers: shared hit=11441 read=1219
>  Total runtime: 3373.058 ms
>
>
> The table definitions are as follows:
>
>   Table "public.virtual_ancestors"
>     Column    |   Type   | Modifiers
> --------------+----------+-----------
>  ancestor_key | integer  | not null
>  context_key  | integer  | not null
>  degree       | smallint | not null
> Indexes:
>     "virtual_ancestors_pkey" PRIMARY KEY, btree (ancestor_key, context_key)
>     "virtual_context_key_idx" btree (context_key)
> Foreign-key constraints:
>     "virtual_ancestors_ancestor_key_fkey" FOREIGN KEY (ancestor_key)
> REFERENCES contexts(context_key)
>     "virtual_ancestors_context_key_fkey" FOREIGN KEY (context_key)
> REFERENCES contexts(context_key)
>
>              Table "public.collection_data"
>          Column              |         Type         | Modifiers
> ------------------------+----------------------+-----------
>  collection_context_key | integer              | not null
>  context_key                | integer              | not null
>  type                            | character varying(1) | not null
>  source                        | character varying(1) | not null
> Indexes:
>     "collection_data_context_key_idx" btree (context_key)
>     "collection_data_context_key_index" btree (collection_context_key)
> CLUSTER
> Foreign-key constraints:
>     "collection_data_collection_context_key_fkey" FOREIGN KEY
> (collection_context_key) REFERENCES contexts(context_key) ON DELETE CASCADE
>     "collection_data_context_key_fkey" FOREIGN KEY (context_key) REFERENCES
> contexts(context_key) ON DELETE CASCADE
>
> Can anyone suggest a way that we can get postgres to use the
> collection_data_context_key_index properly? I thought that it might be
> related to the fact that collection_data_context_key_index is a CLUSTERED
> index, but we did some basic experimentation that seems to indicate
> otherwise, i.e. the bad plan persists despite re-clustering the index.
>
> We are using PostgreSQL 9.2.5 on x86_64-unknown-linux-gnu, compiled by gcc
> (Debian 4.4.5-8) 4.4.5, 64-bit
>
> Interestingly, on an instance running PostgreSQL 9.2.4 on
> x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.4.5-8) 4.4.5, 64-bit
> where I copied the 2 tables over to a temporary database, the plan comes out
> differently:
>
>  HashAggregate  (cost=39692.03..39739.98 rows=4795 width=4) (actual
> time=73.285..75.141 rows=3486 loops=1)
>    Buffers: shared hit=22458
>    ->  Nested Loop  (cost=0.00..39680.05 rows=4795 width=4) (actual
> time=0.077..63.116 rows=13007 loops=1)
>          Buffers: shared hit=22458
>          ->  Nested Loop  (cost=0.00..32823.38 rows=164 width=4) (actual
> time=0.056..17.685 rows=2084 loops=1)
>                Buffers: shared hit=7529
>                ->  Index Only Scan using virtual_ancestors_pkey on
> virtual_ancestors a2  (cost=0.00..1220.85 rows=396 width=4) (actual
> time=0.025..2.732 rows=1954 loops=1)
>                      Index Cond: (ancestor_key = 1072173)
>                      Heap Fetches: 1954
>                      Buffers: shared hit=1397
> ######## Note the index scan here - this is what it SHOULD be doing
> ##############
>                ->  Index Scan using collection_data_context_key_index on
> collection_data  (cost=0.00..79.24 rows=56 width=8) (actual
> time=0.004..0.005 rows=1 loops=1954)
>                      Index Cond: (collection_context_key = a2.context_key)
>                      Buffers: shared hit=6132
>          ->  Index Only Scan using virtual_ancestors_pkey on
> virtual_ancestors a1  (cost=0.00..35.40 rows=641 width=8) (actual
> time=0.007..0.015 rows=6 loops=2084)
>                Index Cond: (ancestor_key = collection_data.context_key)
>                Heap Fetches: 13007
>                Buffers: shared hit=14929
>  Total runtime: 76.431 ms
>
> Why can't I get the Postgres 9.2.5 instance to use the optimal plan?
>
> Thanks in advance!
>      /Stefan
>
> --
> -
> Stefan Amshey



--
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com


Re: slow join not using index properly

From
Vincent
Date:
On 21-03-14 00:56, Stefan Amshey wrote:
We have a slow performing query that we are trying to improve, and it appears to be performing a sequential scan at a point where it should be utilizing an index. Can anyone tell me why postgres is opting to do it this way?

The original query is as follows:

SELECT DISTINCT
    a1.context_key
FROM
    virtual_ancestors a1, collection_data, virtual_ancestors a2
WHERE
    a1.ancestor_key =  collection_data.context_key
    AND collection_data.collection_context_key = a2.context_key
    AND a2.ancestor_key = ?

The key relationships should all using indexed columns, but the query plan that postgres comes up with ends up performing a sequential scan on the collection_data table (in this case about 602k rows) where we would have expected it to utilize the index:

 HashAggregate  (cost=60905.73..60935.73 rows=3000 width=4) (actual time=3366.165..3367.354 rows=3492 loops=1)
   Buffers: shared hit=16291 read=1222
   ->  Nested Loop  (cost=17546.26..60898.23 rows=3000 width=4) (actual time=438.332..3357.918 rows=13037 loops=1)
         Buffers: shared hit=16291 read=1222
         ->  Hash Join  (cost=17546.26..25100.94 rows=98 width=4) (actual time=408.554..415.767 rows=2092 loops=1)
               Hash Cond: (a2.context_key = collection_data.collection_context_key)
               Buffers: shared hit=4850 read=3
               ->  Index Only Scan using virtual_ancestors_pkey on virtual_ancestors a2  (cost=0.00..233.32 rows=270 width=4) (actual time=8.532..10.703 rows=1960 loops=1)
                     Index Cond: (ancestor_key = 1072173)
                     Heap Fetches: 896
                     Buffers: shared hit=859 read=3
               ->  Hash  (cost=10015.56..10015.56 rows=602456 width=8) (actual time=399.708..399.708 rows=602570 loops=1)
                     Buckets: 65536  Batches: 1  Memory Usage: 23538kB
                     Buffers: shared hit=3991
######## sequential scan occurs here ##########
                     ->  Seq Scan on collection_data  (cost=0.00..10015.56 rows=602456 width=8) (actual time=0.013..163.509 rows=602570 loops=1)
                           Buffers: shared hit=3991
         ->  Index Only Scan using virtual_ancestors_pkey on virtual_ancestors a1  (cost=0.00..360.70 rows=458 width=8) (actual time=1.339..1.403 rows=6 loops=2092)
               Index Cond: (ancestor_key = collection_data.context_key)
               Heap Fetches: 7067
               Buffers: shared hit=11441 read=1219
 Total runtime: 3373.058 ms


The table definitions are as follows:

  Table "public.virtual_ancestors"
    Column    |   Type   | Modifiers
--------------+----------+-----------
 ancestor_key | integer  | not null
 context_key  | integer  | not null
 degree       | smallint | not null
Indexes:
    "virtual_ancestors_pkey" PRIMARY KEY, btree (ancestor_key, context_key)
    "virtual_context_key_idx" btree (context_key)
Foreign-key constraints:
    "virtual_ancestors_ancestor_key_fkey" FOREIGN KEY (ancestor_key) REFERENCES contexts(context_key)
    "virtual_ancestors_context_key_fkey" FOREIGN KEY (context_key) REFERENCES contexts(context_key)

             Table "public.collection_data"
         Column              |         Type         | Modifiers
------------------------+----------------------+-----------
 collection_context_key | integer              | not null
 context_key                | integer              | not null
 type                            | character varying(1) | not null
 source                        | character varying(1) | not null
Indexes:
    "collection_data_context_key_idx" btree (context_key)
    "collection_data_context_key_index" btree (collection_context_key) CLUSTER
Foreign-key constraints:
    "collection_data_collection_context_key_fkey" FOREIGN KEY (collection_context_key) REFERENCES contexts(context_key) ON DELETE CASCADE
    "collection_data_context_key_fkey" FOREIGN KEY (context_key) REFERENCES contexts(context_key) ON DELETE CASCADE

Can anyone suggest a way that we can get postgres to use the collection_data_context_key_index properly? I thought that it might be related to the fact that collection_data_context_key_index is a CLUSTERED index, but we did some basic experimentation that seems to indicate otherwise, i.e. the bad plan persists despite re-clustering the index.

We are using PostgreSQL 9.2.5 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.4.5-8) 4.4.5, 64-bit

Interestingly, on an instance running PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.4.5-8) 4.4.5, 64-bit where I copied the 2 tables over to a temporary database, the plan comes out differently:

 HashAggregate  (cost=39692.03..39739.98 rows=4795 width=4) (actual time=73.285..75.141 rows=3486 loops=1)
   Buffers: shared hit=22458
   ->  Nested Loop  (cost=0.00..39680.05 rows=4795 width=4) (actual time=0.077..63.116 rows=13007 loops=1)
         Buffers: shared hit=22458
         ->  Nested Loop  (cost=0.00..32823.38 rows=164 width=4) (actual time=0.056..17.685 rows=2084 loops=1)
               Buffers: shared hit=7529
               ->  Index Only Scan using virtual_ancestors_pkey on virtual_ancestors a2  (cost=0.00..1220.85 rows=396 width=4) (actual time=0.025..2.732 rows=1954 loops=1)
                     Index Cond: (ancestor_key = 1072173)
                     Heap Fetches: 1954
                     Buffers: shared hit=1397
######## Note the index scan here - this is what it SHOULD be doing ##############
               ->  Index Scan using collection_data_context_key_index on collection_data  (cost=0.00..79.24 rows=56 width=8) (actual time=0.004..0.005 rows=1 loops=1954)
                     Index Cond: (collection_context_key = a2.context_key)
                     Buffers: shared hit=6132
         ->  Index Only Scan using virtual_ancestors_pkey on virtual_ancestors a1  (cost=0.00..35.40 rows=641 width=8) (actual time=0.007..0.015 rows=6 loops=2084)
               Index Cond: (ancestor_key = collection_data.context_key)
               Heap Fetches: 13007
               Buffers: shared hit=14929
 Total runtime: 76.431 ms

Why can't I get the Postgres 9.2.5 instance to use the optimal plan?

Thanks in advance!
     /Stefan

--
-
Stefan Amshey
The first plan expects to process 600000 rows in the sequential scan, and the second plan expects to process only one,
so it looks like the statistics in the first database are out of date, did you run vacuum analyse?

Re: slow join not using index properly

From
Stefan Amshey
Date:
Hi Ilya-

Thanks so much for taking a stab at optimizing that query.  I had to fiddle a bit with your proposed version in order to get it function. Here's what I came up with in the end:

with RECURSIVE qq(cont_key, anc_key) AS
  (
    SELECT
      a1.context_key, ancestor_key
    FROM
      virtual_ancestors a1
    UNION (
      SELECT
        a1.context_key, a1.ancestor_key
      FROM
        virtual_ancestors a1, qq
      WHERE
        context_key > qq.cont_key
      ORDER BY
        context_key LIMIT 1
    )
  )
  SELECT
    distinct a.cont_key
  FROM
    qq a, collection_data, virtual_ancestors a2
  WHERE
    a.cont_key IS NOT NULL
    AND a.anc_key = collection_data.context_key
    AND collection_data.collection_context_key = a2.context_key
    AND a2.ancestor_key = 1072173;

I had to drop the MIN( a1.context_key ) and LIMIT 1 from your version off of the first select statement in order to avoid syntax issues or other errors. The version above does produce the same counts as the original, but in the end it wasn't really a win for us. Here's the plan it produced:

HashAggregate  (cost=707724.36..707726.36 rows=200 width=4) (actual time=27638.844..27639.706 rows=3522 loops=1)
   Buffers: shared hit=79323, temp read=49378 written=47557
   CTE qq
     ->  Recursive Union  (cost=0.00..398869.78 rows=10814203 width=8) (actual time=0.018..20196.397 rows=10821685 loops=1)
           Buffers: shared hit=74449, temp read=49378 written=23779
           ->  Seq Scan on virtual_ancestors a1  (cost=0.00..182584.93 rows=10814193 width=8) (actual time=0.010..2585.411 rows=10821685 loops=1)
                 Buffers: shared hit=74443
           ->  Limit  (cost=0.00..0.08 rows=1 width=8) (actual time=7973.297..7973.298 rows=1 loops=1)
                 Buffers: shared hit=6, temp read=49378 written=1
                 ->  Nested Loop  (cost=0.00..30881281719119.79 rows=389822567470830 width=8) (actual time=7973.296..7973.296 rows=1 loops=1)
                       Join Filter: (a1.context_key > qq.cont_key)
                       Rows Removed by Join Filter: 22470607
                       Buffers: shared hit=6, temp read=49378 written=1
                       ->  Index Scan using virtual_context_key_idx on virtual_ancestors a1  (cost=0.00..18206859.46 rows=10814193 width=8) (actual time=0.018..0.036 rows=3 loops=1)
                             Buffers: shared hit=6
                       ->  WorkTable Scan on qq  (cost=0.00..2162838.60 rows=108141930 width=4) (actual time=0.008..1375.445 rows=7490203 loops=3)
                             Buffers: temp read=49378 written=1
   ->  Hash Join  (cost=25283.37..308847.31 rows=2905 width=4) (actual time=449.167..27629.759 rows=13152 loops=1)
         Hash Cond: (a.anc_key = collection_data.context_key)
         Buffers: shared hit=79323, temp read=49378 written=47557
         ->  CTE Scan on qq a  (cost=0.00..216284.06 rows=10760132 width=8) (actual time=0.021..25265.179 rows=10821685 loops=1)
               Filter: (cont_key IS NOT NULL)
               Buffers: shared hit=74449, temp read=49378 written=47557
         ->  Hash  (cost=25282.14..25282.14 rows=98 width=4) (actual time=373.836..373.836 rows=2109 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 75kB
               Buffers: shared hit=4874
               ->  Hash Join  (cost=17557.15..25282.14 rows=98 width=4) (actual time=368.374..373.013 rows=2109 loops=1)
                     Hash Cond: (a2.context_key = collection_data.collection_context_key)
                     Buffers: shared hit=4874
                     ->  Index Only Scan using virtual_ancestors_pkey on virtual_ancestors a2  (cost=0.00..238.57 rows=272 width=4) (actual time=0.029..1.989 rows=1976 loops=1)
                           Index Cond: (ancestor_key = 1072173)
                           Heap Fetches: 917
                           Buffers: shared hit=883
                     ->  Hash  (cost=10020.40..10020.40 rows=602940 width=8) (actual time=368.057..368.057 rows=603066 loops=1)
                           Buckets: 65536  Batches: 1  Memory Usage: 23558kB
                           Buffers: shared hit=3991
                           ->  Seq Scan on collection_data  (cost=0.00..10020.40 rows=602940 width=8) (actual time=0.006..146.447 rows=603066 loops=1)
                                 Buffers: shared hit=3991
 Total runtime: 27854.200 ms

I also tried including the MIN( a1.context_key ) in the first select statement as you had written it, but upon doing that it became necessary to add a GROUP BY clause, and doing that changed the final number of rows selected:
ERROR:  column "a1.ancestor_key" must appear in the GROUP BY clause or be used in an aggregate function
LINE 4:       min( a1.context_key ), ancestor_key
                                     ^

Including the LIMIT 1 at the end of the first select statement gave a syntax error that I couldn't seem to get past, so I think it might be invalid:
ERROR:  syntax error at or near "UNION"
LINE 8:     UNION (
            ^

So I landed on the version that I posted above, which seems to select the same set in all of the cases that I tried.

Anyway, thanks again for taking a stab at helping, I do appreciate it. If you have any other ideas that might be of help I'd certainly be happy to hear them.

Take care,
     /Stefan
 



On Thu, Mar 20, 2014 at 11:02 PM, Ilya Kosmodemiansky <ilya.kosmodemiansky@postgresql-consulting.com> wrote:
Hi Stefan!

Probably you need to rewrite your query like this (check it first):

with RECURSIVE qq(cont_key, anc_key) as
(
select min(a1.context_key), ancestor_key from virtual_ancestors a1
 union select
  (SELECT
    a1.context_key, ancestor_key
  FROM
    virtual_ancestors a1 where context_key > cont_key order by
context_key limit 1) from qq where cont_key is not null
)
select a1.cont_key
 from qq a1, collection_data, virtual_ancestors a2
WHERE
    a1.anc_key =  collection_data.context_key
    AND collection_data.collection_context_key = a2.context_key
    AND a2.ancestor_key = ?

best regards,
Ilya

On Fri, Mar 21, 2014 at 12:56 AM, Stefan Amshey <sramshey@gmail.com> wrote:
> We have a slow performing query that we are trying to improve, and it
> appears to be performing a sequential scan at a point where it should be
> utilizing an index. Can anyone tell me why postgres is opting to do it this
> way?
>
> The original query is as follows:
>
> SELECT DISTINCT
>     a1.context_key
> FROM
>     virtual_ancestors a1, collection_data, virtual_ancestors a2
> WHERE
>     a1.ancestor_key =  collection_data.context_key
>     AND collection_data.collection_context_key = a2.context_key
>     AND a2.ancestor_key = ?
>
> The key relationships should all using indexed columns, but the query plan
> that postgres comes up with ends up performing a sequential scan on the
> collection_data table (in this case about 602k rows) where we would have
> expected it to utilize the index:
>
>  HashAggregate  (cost=60905.73..60935.73 rows=3000 width=4) (actual
> time=3366.165..3367.354 rows=3492 loops=1)
>    Buffers: shared hit=16291 read=1222
>    ->  Nested Loop  (cost=17546.26..60898.23 rows=3000 width=4) (actual
> time=438.332..3357.918 rows=13037 loops=1)
>          Buffers: shared hit=16291 read=1222
>          ->  Hash Join  (cost=17546.26..25100.94 rows=98 width=4) (actual
> time=408.554..415.767 rows=2092 loops=1)
>                Hash Cond: (a2.context_key =
> collection_data.collection_context_key)
>                Buffers: shared hit=4850 read=3
>                ->  Index Only Scan using virtual_ancestors_pkey on
> virtual_ancestors a2  (cost=0.00..233.32 rows=270 width=4) (actual
> time=8.532..10.703 rows=1960 loops=1)
>                      Index Cond: (ancestor_key = 1072173)
>                      Heap Fetches: 896
>                      Buffers: shared hit=859 read=3
>                ->  Hash  (cost=10015.56..10015.56 rows=602456 width=8)
> (actual time=399.708..399.708 rows=602570 loops=1)
>                      Buckets: 65536  Batches: 1  Memory Usage: 23538kB
>                      Buffers: shared hit=3991
> ######## sequential scan occurs here ##########
>                      ->  Seq Scan on collection_data  (cost=0.00..10015.56
> rows=602456 width=8) (actual time=0.013..163.509 rows=602570 loops=1)
>                            Buffers: shared hit=3991
>          ->  Index Only Scan using virtual_ancestors_pkey on
> virtual_ancestors a1  (cost=0.00..360.70 rows=458 width=8) (actual
> time=1.339..1.403 rows=6 loops=2092)
>                Index Cond: (ancestor_key = collection_data.context_key)
>                Heap Fetches: 7067
>                Buffers: shared hit=11441 read=1219
>  Total runtime: 3373.058 ms
>
>
> The table definitions are as follows:
>
>   Table "public.virtual_ancestors"
>     Column    |   Type   | Modifiers
> --------------+----------+-----------
>  ancestor_key | integer  | not null
>  context_key  | integer  | not null
>  degree       | smallint | not null
> Indexes:
>     "virtual_ancestors_pkey" PRIMARY KEY, btree (ancestor_key, context_key)
>     "virtual_context_key_idx" btree (context_key)
> Foreign-key constraints:
>     "virtual_ancestors_ancestor_key_fkey" FOREIGN KEY (ancestor_key)
> REFERENCES contexts(context_key)
>     "virtual_ancestors_context_key_fkey" FOREIGN KEY (context_key)
> REFERENCES contexts(context_key)
>
>              Table "public.collection_data"
>          Column              |         Type         | Modifiers
> ------------------------+----------------------+-----------
>  collection_context_key | integer              | not null
>  context_key                | integer              | not null
>  type                            | character varying(1) | not null
>  source                        | character varying(1) | not null
> Indexes:
>     "collection_data_context_key_idx" btree (context_key)
>     "collection_data_context_key_index" btree (collection_context_key)
> CLUSTER
> Foreign-key constraints:
>     "collection_data_collection_context_key_fkey" FOREIGN KEY
> (collection_context_key) REFERENCES contexts(context_key) ON DELETE CASCADE
>     "collection_data_context_key_fkey" FOREIGN KEY (context_key) REFERENCES
> contexts(context_key) ON DELETE CASCADE
>
> Can anyone suggest a way that we can get postgres to use the
> collection_data_context_key_index properly? I thought that it might be
> related to the fact that collection_data_context_key_index is a CLUSTERED
> index, but we did some basic experimentation that seems to indicate
> otherwise, i.e. the bad plan persists despite re-clustering the index.
>
> We are using PostgreSQL 9.2.5 on x86_64-unknown-linux-gnu, compiled by gcc
> (Debian 4.4.5-8) 4.4.5, 64-bit
>
> Interestingly, on an instance running PostgreSQL 9.2.4 on
> x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.4.5-8) 4.4.5, 64-bit
> where I copied the 2 tables over to a temporary database, the plan comes out
> differently:
>
>  HashAggregate  (cost=39692.03..39739.98 rows=4795 width=4) (actual
> time=73.285..75.141 rows=3486 loops=1)
>    Buffers: shared hit=22458
>    ->  Nested Loop  (cost=0.00..39680.05 rows=4795 width=4) (actual
> time=0.077..63.116 rows=13007 loops=1)
>          Buffers: shared hit=22458
>          ->  Nested Loop  (cost=0.00..32823.38 rows=164 width=4) (actual
> time=0.056..17.685 rows=2084 loops=1)
>                Buffers: shared hit=7529
>                ->  Index Only Scan using virtual_ancestors_pkey on
> virtual_ancestors a2  (cost=0.00..1220.85 rows=396 width=4) (actual
> time=0.025..2.732 rows=1954 loops=1)
>                      Index Cond: (ancestor_key = 1072173)
>                      Heap Fetches: 1954
>                      Buffers: shared hit=1397
> ######## Note the index scan here - this is what it SHOULD be doing
> ##############
>                ->  Index Scan using collection_data_context_key_index on
> collection_data  (cost=0.00..79.24 rows=56 width=8) (actual
> time=0.004..0.005 rows=1 loops=1954)
>                      Index Cond: (collection_context_key = a2.context_key)
>                      Buffers: shared hit=6132
>          ->  Index Only Scan using virtual_ancestors_pkey on
> virtual_ancestors a1  (cost=0.00..35.40 rows=641 width=8) (actual
> time=0.007..0.015 rows=6 loops=2084)
>                Index Cond: (ancestor_key = collection_data.context_key)
>                Heap Fetches: 13007
>                Buffers: shared hit=14929
>  Total runtime: 76.431 ms
>
> Why can't I get the Postgres 9.2.5 instance to use the optimal plan?
>
> Thanks in advance!
>      /Stefan
>
> --
> -
> Stefan Amshey



--
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com



--
-
Stefan Amshey

Re: slow join not using index properly

From
Ilya Kosmodemiansky
Date:
Hi Stefan,

stupid me - Ive missed some

with RECURSIVE qq(cont_key, anc_key) AS
  (

    SELECT
      a1.context_key, ancestor_key
    FROM
      virtual_ancestors a1
    UNION select ( -- here, in the union
      SELECT
        a1.context_key, a1.ancestor_key
      FROM
        virtual_ancestors a1
      WHERE
        a1.context_key > cont_key
      ORDER BY
        a1.context_key LIMIT 1
    )
from qq where cont_key is not null -- and here
  )
  SELECT
    distinct a.cont_key
  FROM
    qq a, collection_data, virtual_ancestors a2
  WHERE
    a.cont_key IS NOT NULL
    AND a.anc_key = collection_data.context_key
    AND collection_data.collection_
>
> context_key = a2.context_key
>     AND a2.ancestor_key = 1072173;

sorry for disorientating

On Mon, Mar 24, 2014 at 7:40 PM, Stefan Amshey <sramshey@gmail.com> wrote:
> Hi Ilya-
>
> Thanks so much for taking a stab at optimizing that query.  I had to fiddle
> a bit with your proposed version in order to get it function. Here's what I
> came up with in the end:
>>
>> with RECURSIVE qq(cont_key, anc_key) AS
>>   (
>>
>>     SELECT
>>       a1.context_key, ancestor_key
>>     FROM
>>       virtual_ancestors a1
>>     UNION (
>>       SELECT
>>         a1.context_key, a1.ancestor_key
>>       FROM
>>         virtual_ancestors a1, qq
>>       WHERE
>>         context_key > qq.cont_key
>>       ORDER BY
>>         context_key LIMIT 1
>>     )
>>   )
>>   SELECT
>>     distinct a.cont_key
>>   FROM
>>     qq a, collection_data, virtual_ancestors a2
>>   WHERE
>>     a.cont_key IS NOT NULL
>>     AND a.anc_key = collection_data.context_key
>>     AND collection_data.collection_context_key = a2.context_key
>>     AND a2.ancestor_key = 1072173;
>
>
> I had to drop the MIN( a1.context_key ) and LIMIT 1 from your version off of
> the first select statement in order to avoid syntax issues or other errors.
> The version above does produce the same counts as the original, but in the
> end it wasn't really a win for us. Here's the plan it produced:
>
>> HashAggregate  (cost=707724.36..707726.36 rows=200 width=4) (actual
>> time=27638.844..27639.706 rows=3522 loops=1)
>>    Buffers: shared hit=79323, temp read=49378 written=47557
>>    CTE qq
>>      ->  Recursive Union  (cost=0.00..398869.78 rows=10814203 width=8)
>> (actual time=0.018..20196.397 rows=10821685 loops=1)
>>            Buffers: shared hit=74449, temp read=49378 written=23779
>>            ->  Seq Scan on virtual_ancestors a1  (cost=0.00..182584.93
>> rows=10814193 width=8) (actual time=0.010..2585.411 rows=10821685 loops=1)
>>                  Buffers: shared hit=74443
>>            ->  Limit  (cost=0.00..0.08 rows=1 width=8) (actual
>> time=7973.297..7973.298 rows=1 loops=1)
>>                  Buffers: shared hit=6, temp read=49378 written=1
>>                  ->  Nested Loop  (cost=0.00..30881281719119.79
>> rows=389822567470830 width=8) (actual time=7973.296..7973.296 rows=1
>> loops=1)
>>                        Join Filter: (a1.context_key > qq.cont_key)
>>                        Rows Removed by Join Filter: 22470607
>>                        Buffers: shared hit=6, temp read=49378 written=1
>>                        ->  Index Scan using virtual_context_key_idx on
>> virtual_ancestors a1  (cost=0.00..18206859.46 rows=10814193 width=8) (actual
>> time=0.018..0.036 rows=3 loops=1)
>>                              Buffers: shared hit=6
>>                        ->  WorkTable Scan on qq  (cost=0.00..2162838.60
>> rows=108141930 width=4) (actual time=0.008..1375.445 rows=7490203 loops=3)
>>                              Buffers: temp read=49378 written=1
>>    ->  Hash Join  (cost=25283.37..308847.31 rows=2905 width=4) (actual
>> time=449.167..27629.759 rows=13152 loops=1)
>>          Hash Cond: (a.anc_key = collection_data.context_key)
>>          Buffers: shared hit=79323, temp read=49378 written=47557
>>          ->  CTE Scan on qq a  (cost=0.00..216284.06 rows=10760132
>> width=8) (actual time=0.021..25265.179 rows=10821685 loops=1)
>>                Filter: (cont_key IS NOT NULL)
>>                Buffers: shared hit=74449, temp read=49378 written=47557
>>          ->  Hash  (cost=25282.14..25282.14 rows=98 width=4) (actual
>> time=373.836..373.836 rows=2109 loops=1)
>>                Buckets: 1024  Batches: 1  Memory Usage: 75kB
>>                Buffers: shared hit=4874
>>                ->  Hash Join  (cost=17557.15..25282.14 rows=98 width=4)
>> (actual time=368.374..373.013 rows=2109 loops=1)
>>
>>                      Hash Cond: (a2.context_key =
>> collection_data.collection_context_key)
>>                      Buffers: shared hit=4874
>>                      ->  Index Only Scan using virtual_ancestors_pkey on
>> virtual_ancestors a2  (cost=0.00..238.57 rows=272 width=4) (actual
>> time=0.029..1.989 rows=1976 loops=1)
>>
>>                            Index Cond: (ancestor_key = 1072173)
>>                            Heap Fetches: 917
>>                            Buffers: shared hit=883
>>                      ->  Hash  (cost=10020.40..10020.40 rows=602940
>> width=8) (actual time=368.057..368.057 rows=603066 loops=1)
>>                            Buckets: 65536  Batches: 1  Memory Usage:
>> 23558kB
>>                            Buffers: shared hit=3991
>>                            ->  Seq Scan on collection_data
>> (cost=0.00..10020.40 rows=602940 width=8) (actual time=0.006..146.447
>> rows=603066 loops=1)
>>                                  Buffers: shared hit=3991
>>  Total runtime: 27854.200 ms
>
>
> I also tried including the MIN( a1.context_key ) in the first select
> statement as you had written it, but upon doing that it became necessary to
> add a GROUP BY clause, and doing that changed the final number of rows
> selected:
>>
>> ERROR:  column "a1.ancestor_key" must appear in the GROUP BY clause or be
>> used in an aggregate function
>> LINE 4:       min( a1.context_key ), ancestor_key
>
>                                      ^
>
> Including the LIMIT 1 at the end of the first select statement gave a syntax
> error that I couldn't seem to get past, so I think it might be invalid:
>>
>> ERROR:  syntax error at or near "UNION"
>> LINE 8:     UNION (
>>             ^
>
>
> So I landed on the version that I posted above, which seems to select the
> same set in all of the cases that I tried.
>
> Anyway, thanks again for taking a stab at helping, I do appreciate it. If
> you have any other ideas that might be of help I'd certainly be happy to
> hear them.
>
> Take care,
>      /Stefan
>
>
>
>
> On Thu, Mar 20, 2014 at 11:02 PM, Ilya Kosmodemiansky
> <ilya.kosmodemiansky@postgresql-consulting.com> wrote:
>>
>> Hi Stefan!
>>
>> Probably you need to rewrite your query like this (check it first):
>>
>> with RECURSIVE qq(cont_key, anc_key) as
>> (
>> select min(a1.context_key), ancestor_key from virtual_ancestors a1
>>  union select
>>   (SELECT
>>     a1.context_key, ancestor_key
>>   FROM
>>     virtual_ancestors a1 where context_key > cont_key order by
>> context_key limit 1) from qq where cont_key is not null
>> )
>> select a1.cont_key
>>  from qq a1, collection_data, virtual_ancestors a2
>> WHERE
>>     a1.anc_key =  collection_data.context_key
>>     AND collection_data.collection_context_key = a2.context_key
>>     AND a2.ancestor_key = ?
>>
>> best regards,
>> Ilya
>>
>> On Fri, Mar 21, 2014 at 12:56 AM, Stefan Amshey <sramshey@gmail.com>
>> wrote:
>> > We have a slow performing query that we are trying to improve, and it
>> > appears to be performing a sequential scan at a point where it should be
>> > utilizing an index. Can anyone tell me why postgres is opting to do it
>> > this
>> > way?
>> >
>> > The original query is as follows:
>> >
>> > SELECT DISTINCT
>> >     a1.context_key
>> > FROM
>> >     virtual_ancestors a1, collection_data, virtual_ancestors a2
>> > WHERE
>> >     a1.ancestor_key =  collection_data.context_key
>> >     AND collection_data.collection_context_key = a2.context_key
>> >     AND a2.ancestor_key = ?
>> >
>> > The key relationships should all using indexed columns, but the query
>> > plan
>> > that postgres comes up with ends up performing a sequential scan on the
>> > collection_data table (in this case about 602k rows) where we would have
>> > expected it to utilize the index:
>> >
>> >  HashAggregate  (cost=60905.73..60935.73 rows=3000 width=4) (actual
>> > time=3366.165..3367.354 rows=3492 loops=1)
>> >    Buffers: shared hit=16291 read=1222
>> >    ->  Nested Loop  (cost=17546.26..60898.23 rows=3000 width=4) (actual
>> > time=438.332..3357.918 rows=13037 loops=1)
>> >          Buffers: shared hit=16291 read=1222
>> >          ->  Hash Join  (cost=17546.26..25100.94 rows=98 width=4)
>> > (actual
>> > time=408.554..415.767 rows=2092 loops=1)
>> >                Hash Cond: (a2.context_key =
>> > collection_data.collection_context_key)
>> >                Buffers: shared hit=4850 read=3
>> >                ->  Index Only Scan using virtual_ancestors_pkey on
>> > virtual_ancestors a2  (cost=0.00..233.32 rows=270 width=4) (actual
>> > time=8.532..10.703 rows=1960 loops=1)
>> >                      Index Cond: (ancestor_key = 1072173)
>> >                      Heap Fetches: 896
>> >                      Buffers: shared hit=859 read=3
>> >                ->  Hash  (cost=10015.56..10015.56 rows=602456 width=8)
>> > (actual time=399.708..399.708 rows=602570 loops=1)
>> >                      Buckets: 65536  Batches: 1  Memory Usage: 23538kB
>> >                      Buffers: shared hit=3991
>> > ######## sequential scan occurs here ##########
>> >                      ->  Seq Scan on collection_data
>> > (cost=0.00..10015.56
>> > rows=602456 width=8) (actual time=0.013..163.509 rows=602570 loops=1)
>> >                            Buffers: shared hit=3991
>> >          ->  Index Only Scan using virtual_ancestors_pkey on
>> > virtual_ancestors a1  (cost=0.00..360.70 rows=458 width=8) (actual
>> > time=1.339..1.403 rows=6 loops=2092)
>> >                Index Cond: (ancestor_key = collection_data.context_key)
>> >                Heap Fetches: 7067
>> >                Buffers: shared hit=11441 read=1219
>> >  Total runtime: 3373.058 ms
>> >
>> >
>> > The table definitions are as follows:
>> >
>> >   Table "public.virtual_ancestors"
>> >     Column    |   Type   | Modifiers
>> > --------------+----------+-----------
>> >  ancestor_key | integer  | not null
>> >  context_key  | integer  | not null
>> >  degree       | smallint | not null
>> > Indexes:
>> >     "virtual_ancestors_pkey" PRIMARY KEY, btree (ancestor_key,
>> > context_key)
>> >     "virtual_context_key_idx" btree (context_key)
>> > Foreign-key constraints:
>> >     "virtual_ancestors_ancestor_key_fkey" FOREIGN KEY (ancestor_key)
>> > REFERENCES contexts(context_key)
>> >     "virtual_ancestors_context_key_fkey" FOREIGN KEY (context_key)
>> > REFERENCES contexts(context_key)
>> >
>> >              Table "public.collection_data"
>> >          Column              |         Type         | Modifiers
>> > ------------------------+----------------------+-----------
>> >  collection_context_key | integer              | not null
>> >  context_key                | integer              | not null
>> >  type                            | character varying(1) | not null
>> >  source                        | character varying(1) | not null
>> > Indexes:
>> >     "collection_data_context_key_idx" btree (context_key)
>> >     "collection_data_context_key_index" btree (collection_context_key)
>> > CLUSTER
>> > Foreign-key constraints:
>> >     "collection_data_collection_context_key_fkey" FOREIGN KEY
>> > (collection_context_key) REFERENCES contexts(context_key) ON DELETE
>> > CASCADE
>> >     "collection_data_context_key_fkey" FOREIGN KEY (context_key)
>> > REFERENCES
>> > contexts(context_key) ON DELETE CASCADE
>> >
>> > Can anyone suggest a way that we can get postgres to use the
>> > collection_data_context_key_index properly? I thought that it might be
>> > related to the fact that collection_data_context_key_index is a
>> > CLUSTERED
>> > index, but we did some basic experimentation that seems to indicate
>> > otherwise, i.e. the bad plan persists despite re-clustering the index.
>> >
>> > We are using PostgreSQL 9.2.5 on x86_64-unknown-linux-gnu, compiled by
>> > gcc
>> > (Debian 4.4.5-8) 4.4.5, 64-bit
>> >
>> > Interestingly, on an instance running PostgreSQL 9.2.4 on
>> > x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.4.5-8) 4.4.5, 64-bit
>> > where I copied the 2 tables over to a temporary database, the plan comes
>> > out
>> > differently:
>> >
>> >  HashAggregate  (cost=39692.03..39739.98 rows=4795 width=4) (actual
>> > time=73.285..75.141 rows=3486 loops=1)
>> >    Buffers: shared hit=22458
>> >    ->  Nested Loop  (cost=0.00..39680.05 rows=4795 width=4) (actual
>> > time=0.077..63.116 rows=13007 loops=1)
>> >          Buffers: shared hit=22458
>> >          ->  Nested Loop  (cost=0.00..32823.38 rows=164 width=4) (actual
>> > time=0.056..17.685 rows=2084 loops=1)
>> >                Buffers: shared hit=7529
>> >                ->  Index Only Scan using virtual_ancestors_pkey on
>> > virtual_ancestors a2  (cost=0.00..1220.85 rows=396 width=4) (actual
>> > time=0.025..2.732 rows=1954 loops=1)
>> >                      Index Cond: (ancestor_key = 1072173)
>> >                      Heap Fetches: 1954
>> >                      Buffers: shared hit=1397
>> > ######## Note the index scan here - this is what it SHOULD be doing
>> > ##############
>> >                ->  Index Scan using collection_data_context_key_index on
>> > collection_data  (cost=0.00..79.24 rows=56 width=8) (actual
>> > time=0.004..0.005 rows=1 loops=1954)
>> >                      Index Cond: (collection_context_key =
>> > a2.context_key)
>> >                      Buffers: shared hit=6132
>> >          ->  Index Only Scan using virtual_ancestors_pkey on
>> > virtual_ancestors a1  (cost=0.00..35.40 rows=641 width=8) (actual
>> > time=0.007..0.015 rows=6 loops=2084)
>> >                Index Cond: (ancestor_key = collection_data.context_key)
>> >                Heap Fetches: 13007
>> >                Buffers: shared hit=14929
>> >  Total runtime: 76.431 ms
>> >
>> > Why can't I get the Postgres 9.2.5 instance to use the optimal plan?
>> >
>> > Thanks in advance!
>> >      /Stefan
>> >
>> > --
>> > -
>> > Stefan Amshey
>>
>>
>>
>> --
>> Ilya Kosmodemiansky,
>>
>> PostgreSQL-Consulting.com
>> tel. +14084142500
>> cell. +4915144336040
>> ik@postgresql-consulting.com
>
>
>
>
> --
> -
> Stefan Amshey



--
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com