Re: A very long running query.... - Mailing list pgsql-performance

From Marc Mamin
Subject Re: A very long running query....
Date
Msg-id C4DAC901169B624F933534A26ED7DF310F96B02D@JENMAIL01.ad.intershop.net
Whole thread Raw
In response to A very long running query....  (Ioannis Anagnostopoulos <ioannis@anatec.com>)
Responses Re: A very long running query....
List pgsql-performance


Hello,
isn't the first test superfluous here ?

>       where extract('day' from message_copies.msg_date_rec) = 17
>       and date_trunc('day',    message_copies.msg_date_rec) = '2012-07-17'


> Here is the index:
>
> CREATE INDEX idx_message_copies_wk2_date_src_pos_partial
>   ON feed_all_y2012m07.message_copies_wk2
>   USING btree
>   (date_trunc('day'::text, msg_date_rec),
>   src_id,
>   (((pos_georef1::text || pos_georef2::text) || pos_georef3::text) || pos_georef4::text))
> TABLESPACE archive
>   WHERE (((pos_georef1::text || pos_georef2::text) || pos_georef3::text) || pos_georef4::text) IS NOT NULL
>   OR NOT (((pos_georef1::text || pos_georef2::text) || pos_georef3::text) || pos_georef4::text) = ''::text;


the georef test can be simplified using coalesce:

>  and     (message_copies.pos_georef1 || message_copies.pos_georef2 || message_copies.pos_georef3 || message_copies.pos_georef4) <> ''
>  and not (message_copies.pos_georef1 || message_copies.pos_georef2 || message_copies.pos_georef3 || message_copies.pos_georef4) is null
  =>
  and coaesce (
    (message_copies.pos_georef1 || message_copies.pos_georef2 || message_copies.pos_georef3 || message_copies.pos_georef4),
     '') <> ''
 
In order to avoid this test at query time you might add a boolean column   message_copies.pos.has_georef,
and keep it up to date  with a before insert or update trigger. This will allow to shorten your index definition and simplify the planner task a little bit.
Moreover it will fasten your query in cases when the index don't get used.

As Tom already mentioned it, it may make sense not to concatenate the georef within the index, but keep them separated, or even keep them in different indexes.
Which is the best depend on the other queries running against this table
 
HTH,

Marc Mamin
 


-----Original Message-----
From: pgsql-performance-owner@postgresql.org on behalf of Ioannis Anagnostopoulos
Sent: Sat 7/21/2012 1:56 AM
To: Tom Lane
Cc: Claudio Freire; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] A very long running query....

On 21/07/2012 00:10, Tom Lane wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>> Looking at this:
>> "                    ->  Index Scan using
>> idx_message_copies_wk2_date_src_pos_partial on message_copies_wk2
>> message_copies  (cost=0.00..19057.93 rows=52 width=32) (actual
>> time=62.124..5486270.845 rows=387524 loops=1)"
>> "                          Index Cond: ((date_trunc('day'::text,
>> msg_date_rec) = '2012-07-17 00:00:00'::timestamp without time zone)
>> AND (src_id = 1))"
>> "                          Filter: ((date_part('day'::text,
>> msg_date_rec) = 17::double precision) AND (NOT (((((pos_georef1)::text
>> || (pos_georef2)::text) || (pos_georef3)::text) ||
>> (pos_georef4)::text) IS NULL)) AND (((((pos_georef1)::text ||
>> (pos_georef2)::text) || (pos_georef3)::text) || (pos_georef4)::text)
>> <> ''::text))"
> I think the real problem is that the planner has no hope of doing
> anything very accurate with such an unwieldy filter condition.  I'd look
> at ways of making the filter conditions simpler, perhaps by recasting
> the data representation.  In particular, that's a horridly bad way of
> asking whether some columns are empty, which I gather is the intent.
> If you really want to do it just like that, creating an index on the
> concatenation expression would guide ANALYZE to collect some stats about
> it, but it would probably be a lot more efficient to put together an AND
> or OR of tests on the individual columns.
>
>                       regards, tom lane
So what you suggest is to forget all together the concatenation of the
georef1/2/3/4 and instead alter my query with something like:

georef1 is not null and not georeg1 = ''....etc for georef2 3 and 4

That would require to alter my index and have the four georef columns
separately in it and not as a concatenation and so on for the partial
index part. And a final thing, you seem to imply that the indexes are
used by the analyser to collect statistics even if they are not used. So
an index serves not only as a way to speed up targeted queries but also
to provide better statistics to the analyzer?

Kind Regards
Yiannis

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


pgsql-performance by date:

Previous
From: Craig Ringer
Date:
Subject: Re: query overhead
Next
From: Ioannis Anagnostopoulos
Date:
Subject: Re: A very long running query....