Re: Very slow query (3-4mn) on a table with 25millions rows - Mailing list pgsql-performance

From Abadie Lana
Subject Re: Very slow query (3-4mn) on a table with 25millions rows
Date
Msg-id E544BB9A64ABD24DA201745FD316D94551126530@XCH2.iter.org
Whole thread Raw
In response to Re: Very slow query (3-4mn) on a table with 25millions rows  (Félix GERZAGUET <felix.gerzaguet@gmail.com>)
List pgsql-performance

Hello Felix

Thanks indeed the new query is much faster…The query itself is complicated to explain basically you can view it as graph and want to make sure that there is no dependencies if I remove a set of points….

 

explain analyze with filtered_s as ( select s.attvalue  from functionalvarattributes s, tags t, variableattributetypes vat where t.id=s.tag_id and t.status!='Internal' and vat.fieldtype in ('DBF_INLINK','DBF_OUTLINK','DBF_FWDLINK')  and vat.id=s.atttype_id and split_part(split_part(s.attvalue,' ',1),'.',1) in ( select e.name from functionalvariables e, usertemplatevariable ut where e.usertemplatevar_id=ut.id and ut.usertempl_id=15) ) select s.attvalue from filtered_s s except select s.attvalue from filtered_s s , usertemplvarattribute utva, usertemplatevariable utv where utv.id=utva.usertempvariable_fk and  utv.usertempl_id=15;

                                                                                                QUERY PLAN                            

                                                                    

---------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------

HashSetOp Except  (cost=904251.31..2013436.93 rows=200 width=516) (actual time=40007.482..40007.482 rows=0 loops=1)

   CTE filtered_s

     ->  Hash Join  (cost=171506.51..904251.31 rows=310110 width=8) (actual time=13986.554..40005.687 rows=2 loops=1)

           Hash Cond: (split_part(split_part((s_2.attvalue)::text, ' '::text, 1), '.'::text, 1) = (e.name)::text)

           ->  Hash Join  (cost=193.91..726311.49 rows=310110 width=8) (actual time=2.675..30633.916 rows=308287 loops=1)

                 Hash Cond: (s_2.tag_id = t.id)

                 ->  Hash Join  (cost=188.03..716937.71 rows=1671149 width=16) (actual time=2.518..30249.987 rows=651155 loops=1)

                       Hash Cond: (s_2.atttype_id = vat.id)

                       ->  Seq Scan on functionalvarattributes s_2  (cost=0.00..604679.32 rows=25429032 width=24) (actual time=0.005..1

9229.473 rows=25429808 loops=1)

                       ->  Hash  (cost=183.18..183.18 rows=388 width=8) (actual time=2.433..2.433 rows=388 loops=1)

                             Buckets: 1024  Batches: 1  Memory Usage: 16kB

                             ->  Seq Scan on variableattributetypes vat  (cost=0.00..183.18 rows=388 width=8) (actual time=0.010..2.171

rows=388 loops=1)

                                   Filter: ((fieldtype)::text = ANY ('{DBF_INLINK,DBF_OUTLINK,DBF_FWDLINK}'::text[]))

                                   Rows Removed by Filter: 5516

                 ->  Hash  (cost=5.43..5.43 rows=36 width=8) (actual time=0.147..0.147 rows=36 loops=1)

                       Buckets: 1024  Batches: 1  Memory Usage: 2kB

                       ->  Seq Scan on tags t  (cost=0.00..5.43 rows=36 width=8) (actual time=0.015..0.119 rows=36 loops=1)

                             Filter: ((status)::text <> 'Internal'::text)

                             Rows Removed by Filter: 158

           ->  Hash  (cost=171251.03..171251.03 rows=4926 width=24) (actual time=8939.073..8939.073 rows=16 loops=1)

                 Buckets: 1024  Batches: 1  Memory Usage: 1kB

                 ->  HashAggregate  (cost=171201.77..171251.03 rows=4926 width=24) (actual time=8939.039..8939.058 rows=16 loops=1)

                       ->  Hash Join  (cost=8.95..171189.45 rows=4926 width=24) (actual time=3188.453..8938.943 rows=48 loops=1)

                             Hash Cond: (e.usertemplatevar_id = ut.id)

                             ->  Seq Scan on functionalvariables e  (cost=0.00..155513.72 rows=4164672 width=32) (actual time=0.004..65

54.351 rows=4164350 loops=1)

                             ->  Hash  (cost=8.75..8.75 rows=16 width=8) (actual time=0.042..0.042 rows=16 loops=1)

                                   Buckets: 1024  Batches: 1  Memory Usage: 1kB

                                   ->  Index Scan using usertemp_utv_idx on usertemplatevariable ut  (cost=0.29..8.75 rows=16 width=8)

(actual time=0.015..0.029 rows=16 loops=1)

                                         Index Cond: (usertempl_id = 15)

   ->  Append  (cost=0.00..999159.97 rows=44010259 width=516) (actual time=13986.564..40007.199 rows=320 loops=1)

         ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..9303.30 rows=310110 width=516) (actual time=13986.563..40005.703 rows=2 loops=1

)

               ->  CTE Scan on filtered_s s  (cost=0.00..6202.20 rows=310110 width=516) (actual time=13986.561..40005.699 rows=2 loops=

1)

         ->  Subquery Scan on "*SELECT* 2"  (cost=0.70..989856.67 rows=43700149 width=516) (actual time=0.071..1.242 rows=318 loops=1)

               ->  Nested Loop  (cost=0.70..552855.18 rows=43700149 width=516) (actual time=0.069..0.941 rows=318 loops=1)

                     ->  CTE Scan on filtered_s s_1  (cost=0.00..6202.20 rows=310110 width=516) (actual time=0.003..0.005 rows=2 loops=

1)

                     ->  Materialize  (cost=0.70..84.46 rows=141 width=0) (actual time=0.032..0.331 rows=159 loops=2)

                           ->  Nested Loop  (cost=0.70..83.75 rows=141 width=0) (actual time=0.053..0.426 rows=159 loops=1)

                                 ->  Index Scan using usertemp_utv_idx on usertemplatevariable utv  (cost=0.29..8.75 rows=16 width=8) (

actual time=0.030..0.052 rows=16 loops=1)

                                       Index Cond: (usertempl_id = 15)

                                ->  Index Only Scan using usertemplvarattribute_atttypeid_key on usertemplvarattribute utva  (cost=0.4

2..4.60 rows=9 width=8) (actual time=0.005..0.011 rows=10 loops=16)

                                       Index Cond: (usertempvariable_fk = utv.id)

                                       Heap Fetches: 0

Total runtime: 40007.716 ms

 

 

Lana

From: Félix GERZAGUET [mailto:felix.gerzaguet@gmail.com]
Sent: 27 July 2016 11:16
To: Abadie Lana
Cc: Martín Marqués; Tom Lane; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very slow query (3-4mn) on a table with 25millions rows

 

Hello Lana,

 

On Wed, Jul 27, 2016 at 8:03 AM, Abadie Lana <Lana.Abadie@iter.org> wrote:

Here the result of explain (analyse, buffer). Thanks for your help and let me know if you need more information.

 

I noticed 3 things in your query:

1. In the second part (after the except), the 2 tables utva and utv are not joined against the others table. Is there a missing join somewhere ?


Let that snipset:

select s.attvalue
  from functionalvarattributes s
     , tags t
     , variableattributetypes vat
 where t.id=s.tag_id
   and t.status!='Internal'
   and vat.fieldtype in ('DBF_INLINK','DBF_OUTLINK','DBF_FWDLINK')
   and vat.id=s.atttype_id
   and split_part(split_part(s.attvalue,' ',1),'.',1) in ( select e.name
                                                             from functionalvariables e
                                                                , usertemplatevariable ut
                                                            where e.usertemplatevar_id=ut.id
                                                              and ut.usertempl_id=15
                                                           )

be called A

Let that snipset:

select *
  from usertemplvarattribute utva
     , usertemplatevariable utv
  where utv.id=utva.usertempvariable_fk
    and utv.usertempl_id=15

be called B

Then you query is:

A

except

A CROSS JOIN B

If B is not the empty set, than the above query is guaranteed to always have 0 row.

 

2. Assuming your query is right (even if I failed to understand its point), we could only do the A snipset once instead of twice using a with clause as in:

with filtered_s as (
select s.attvalue
  from functionalvarattributes s
     , tags t
     , variableattributetypes vat
 where t.id=s.tag_id
   and t.status!='Internal'
   and vat.fieldtype in ('DBF_INLINK','DBF_OUTLINK','DBF_FWDLINK')
   and vat.id=s.atttype_id
   and split_part(split_part(s.attvalue,' ',1),'.',1) in ( select e.name
                                                             from functionalvariables e
                                                                , usertemplatevariable ut
                                                            where e.usertemplatevar_id=ut.id
                                                              and ut.usertempl_id=15
                                                           )
)
select s.attvalue
  from filtered_s s
except
select s.attvalue
  from filtered_s s
     , usertemplvarattribute utva
     , usertemplatevariable utv
  where utv.id=utva.usertempvariable_fk
    and utv.usertempl_id=15
;

This rewritten query should run about 2x. faster.

3. The planner believe that the e.name subselect will give 4926 rows (instead of 16 in reality), due to this wrong estimate it will consider the vat_funcvaratt_multi_idx index as not usefull. I don't know how to give the planner more accurate info ...


--

Félix

 

pgsql-performance by date:

Previous
From: Félix GERZAGUET
Date:
Subject: Re: Very slow query (3-4mn) on a table with 25millions rows
Next
From: Abadie Lana
Date:
Subject: Re: Very slow query (3-4mn) on a table with 25millions rows