Re: Query optimization - Mailing list pgsql-general

From Jorge Arevalo
Subject Re: Query optimization
Date
Msg-id CAMhtMNOoGMB3fSMdjjYf2ukq=DmhDp4eHsnM8H_cdfATCjTRyQ@mail.gmail.com
Whole thread Raw
In response to Re: Query optimization  (David G Johnston <david.g.johnston@gmail.com>)
Responses Re: Query optimization
Re: Query optimization
List pgsql-general
Hello David, many thanks for your responses,

Sorry for not providing the content of the fill_table3_function, but it just executes 3 insert queries in 3 different tables. And I've checked the time consuming operation is in this query (by the way, there was a little mistake in the name of the fields of the inner select, I've corrected it)

SELECT value1,value2,value3,value4,
value5, hstore(ARRAY['field9', 'field10', 'field11', 'field12', 'field13', 'field14'], ARRAY[field9, field10, field11, field12, field13, field14) as metadata, value7, (select array((select row(f1, f2) from table2 p where p.f3 = field7))) as values_array FROM table1

This is the result of EXPLAIN ANALYZE

                                                                   QUERY PLAN                                                                   
-------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using table1_pkey on table1  (cost=67846.38..395773.45 rows=8419127 width=88) (actual time=7122.704..22670.680 rows=8419127 loops=1)
   InitPlan 2 (returns $1)
     ->  Result  (cost=67846.29..67846.29 rows=1 width=0) (actual time=7009.063..7009.065 rows=1 loops=1)
           InitPlan 1 (returns $0)
             ->  Seq Scan on table2 p  (cost=0.00..67846.29 rows=12689 width=20) (actual time=14.971..5069.840 rows=2537787 loops=1)
                   Filter: (f3 = field7)

So, there's a sequential scan over table2 (in the query to get values_array), instead of a index scan. Could it be because the SELECT returns more than approximately 5-10% of all rows in the table? (I've heard that, under those conditions, a sequential scan is faster than index scan, because the amount of I/O operations required for each row)

Anyway, if I understood well, I should try:

- Avoiding that inner query by using a JOIN instead
- Return a composite type instead of an array

Am I right? What kind of additional context information would you need?

Many thanks!!

--

Jorge




On Wed, Oct 29, 2014 at 2:54 AM, David G Johnston <david.g.johnston@gmail.com> wrote:
Jorge Arévalo-2 wrote
> (SELECT value1,value2,value3,value4,value5, hstore(ARRAY['field9',
> 'field10', 'field11', 'field12', 'field13', 'field14'], ARRAY[field9,
> field10, field11, field12, field13, field14) as metadata, value7, (select
> array((select row(f1, f2) from table2 p where p.field7 = field7))) as
> values_array FROM table1)

You might try seeing whether:

FROM table1 JOIN (
SELECT field7, array_agg(row(f1, f2)) AS values_array FROM table2 GROUP BY
field7
) tbl2_agg USING (field7)

helps...

I'm also dubious (though this isn't necessarily a performance issue) of:

array[...] AS metadata

Without context I would say this would be better as a composite type instead
of an array.  You may find it useful to use named composite types elsewhere
too...

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/Query-optimization-tp5824739p5824746.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Jorge Arevalo
Freelance developer

http://about.me/jorgeas80

pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: Appending new data to existing field of Json data type
Next
From: David Johnston
Date:
Subject: Re: Query optimization