Query optimization - Mailing list pgsql-general

From Jorge Arevalo
Subject Query optimization
Date
Msg-id CAMhtMNNTDmcTXZ8yjyxtaXvaYYaW04_tnqXRqajH=Kz1Dswmtw@mail.gmail.com
Responses Re: Query optimization  (David G Johnston <david.g.johnston@gmail.com>)
List pgsql-general
Hello,

I'm trying to optimize a query that takes too much time. This is what I have

table1(field1, field2... field14): contains about 8.5 million rows
table2(f1, f2, f3): contains about 2.5 million rows
table3: is empty, and must be filled with data coming from table1 and table2

To fill table3, I'm using a query that looks like this:

WITH records_to_insert(field1, field2, field3, field4, field5, field6_hstore, field7, field8_array) AS (
    SELECT value1, value2, value3, value4, value5,
           hstore(ARRAY['field9', 'field10', 'field11', 'field12', 'field13', 'field14'],
                  ARRAY[field9, field10, field11, field12, field13, field14]) AS metadata,
           value7,
           (SELECT array((SELECT row(f1, f2) FROM table2 p WHERE p.field7 = t.field7))) AS values_array
    FROM table1 t
)
SELECT fill_table3_function(field1, field2, field3, field4, field5, field6_hstore, field7, field8_array)
FROM records_to_insert;

So, I first generate a common table expression, records_to_insert, using data from table1 and table2, and then call a function, fill_table3_function, to insert the values into table3 (the function does more things besides the insert, which is why I call a function instead of just running an INSERT query). There are indexes created on all the columns that need them.
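(As an example of the kind of index involved, the correlated subquery that builds values_array benefits from one like the following; the table and column names match the ones used in the query above, and the index name is just illustrative.)

```sql
-- Supports the per-row lookup "WHERE p.field7 = t.field7" on table2.
CREATE INDEX idx_table2_field7 ON table2 (field7);
```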

I'm having problems because the query takes so long that the server returns a timeout error.

I think the bottleneck is the array built for the last column of the CTE, and maybe the hstore too (I'm not sure how heavy that operation is). The first thing I tried was to limit the query that builds the CTE, using LIMIT .. OFFSET after 'FROM table1', and loop over table1 (the big one, with 8.5 million rows) in batches. But I still get timeout errors.
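For reference, one iteration of that batching loop looks roughly like this (the batch size of 100000 and the ORDER BY column are illustrative choices, not what I necessarily used):

```sql
-- One iteration of the loop: build the CTE over a slice of table1 only.
-- OFFSET is advanced by the batch size on each iteration. A stable
-- ORDER BY is needed for LIMIT/OFFSET paging; without it, rows can be
-- skipped or repeated between batches.
WITH records_to_insert(field1, field2, field3, field4, field5,
                       field6_hstore, field7, field8_array) AS (
    SELECT value1, value2, value3, value4, value5,
           hstore(ARRAY['field9', 'field10', 'field11', 'field12', 'field13', 'field14'],
                  ARRAY[field9, field10, field11, field12, field13, field14]) AS metadata,
           value7,
           (SELECT array((SELECT row(f1, f2) FROM table2 p WHERE p.field7 = t.field7))) AS values_array
    FROM table1 t
    ORDER BY t.field1
    LIMIT 100000 OFFSET 0   -- OFFSET grows by 100000 per iteration
)
SELECT fill_table3_function(field1, field2, field3, field4, field5,
                            field6_hstore, field7, field8_array)
FROM records_to_insert;
```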

The problem is that I don't have access to the Postgres configuration, so I can't increase the timeout for user queries. And anyway, I don't think increasing the timeout is a real solution (it would just make the server suffer for longer).

So, is there anything obviously wrong with my query? Any changes to make it faster?

Many thanks in advance, and best regards,


--
Jorge Arevalo
Freelance developer

http://about.me/jorgeas80
