On 07/08/2012 17:00, Jeff Janes wrote:
> On Mon, Aug 6, 2012 at 8:08 AM, Ioannis Anagnostopoulos
> <ioannis@anatec.com> wrote:
>> Hi, my query is very simple:
>>
>> select
>> msg_id,
>> msg_type,
>> ship_pos_messages.pos_georef1,
>> ship_pos_messages.pos_georef2,
>> ship_pos_messages.pos_georef3,
>> ship_pos_messages.pos_georef4,
>> obj_id,
>> ship_speed,
>> ship_heading,
>> ship_course,
>> pos_point
>> from
>> feed_all_y2012m08.ship_pos_messages
>> where
>> extract('day' from msg_date_rec) = 1
>> AND msg_id = any(ARRAY[7294724,14174174,22254408]);
>>
>> The msg_id is the pkey on the ship_pos_messages table, and in this
>> example it is working fast because it uses the pkey (primary key index)
>> to make the selection. The explain analyze follows:
> ...
>> I think this is a pretty good plan and quite quick given the size of the
>> table (88 million rows at present). However, in real life the msg_id
>> parameter I search for is not an array of 3 ids but of 300,000 or more.
>> It is then that the planner abandons this plan and falls back to a
>> sequential scan. Is there any way around this? Or is this the best I can
>> get?
> What happens if you set "enable_seqscan=off" and run the query with
> the very large list? (This is an experiment, not a recommendation for
> production use)
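
For the record, the experiment would look something like the sketch below
(enable_seqscan is a session-local setting, so it does not affect other
connections, and it only discourages sequential scans rather than
forbidding them):

    SET enable_seqscan = off;
    EXPLAIN ANALYZE
    SELECT msg_id, ...  -- same query as above, with the full id list
    FROM feed_all_y2012m08.ship_pos_messages
    WHERE extract('day' from msg_date_rec) = 1
      AND msg_id = any(ARRAY[...]);  -- the 300,000-element array
    RESET enable_seqscan;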
>
>
> Cheers,
>
> Jeff
As Tom said, the actual question is not valid. Seq scans are not bad in
themselves; we just need to understand the way around them instead of
forcing them off. In my case, the problem was the ARRAY parameter (which,
in any case, is not a great vehicle for that much data). By converting it
into a temporary table and performing an inner join in the query (after
analysing the temp table), you get a nice Hash Join (or a Merge Join if
you don't analyse the temp table).
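
A minimal sketch of that approach (the temp table name msg_ids_tmp is just
illustrative; in practice the ids would be loaded via COPY or batched
INSERTs from the application rather than a literal array):

    -- hold the ids in a temp table instead of a huge ARRAY parameter
    CREATE TEMP TABLE msg_ids_tmp (msg_id integer PRIMARY KEY);
    INSERT INTO msg_ids_tmp
        SELECT unnest(ARRAY[7294724,14174174,22254408]);  -- 300,000 ids in real life
    ANALYZE msg_ids_tmp;  -- give the planner real row counts

    SELECT p.msg_id, p.msg_type,
           p.pos_georef1, p.pos_georef2, p.pos_georef3, p.pos_georef4,
           p.obj_id, p.ship_speed, p.ship_heading, p.ship_course, p.pos_point
    FROM feed_all_y2012m08.ship_pos_messages p
    JOIN msg_ids_tmp t ON t.msg_id = p.msg_id
    WHERE extract('day' from p.msg_date_rec) = 1;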
Cheers, Yiannis