Re: less than 2 sec for response - possible? - Mailing list pgsql-performance

From: Pujol Mathieu
Subject: Re: less than 2 sec for response - possible?
Msg-id: 1e4aa8a7-94da-d880-837d-df40b97aaad6@realfusio.com
In response to: Re: less than 2 sec for response - possible? (trafdev <trafdev@mail.ru>)
List: pgsql-performance
Hello

Have you solved your problem?

Could it be conversion overhead from 'timestamp without time zone' to
'date'? In that case, I don't know whether the planner stores the
constants as date or timestamp.
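
(If the column is actually of type date, one quick check is to compare it
against date literals instead of timestamp literals, so no cross-type
comparison is involved. A minimal sketch, assuming the column types seen
in the plans below, with most of the sums elided:)

SELECT sum(c_filt_click_cap_ip) AS clicks_from_ip,  -- other sums elided
       subid, sid
FROM stats.feed_sub
WHERE aid = 3
  AND date >= DATE '2016-06-01'  -- date literals, no timestamp cast
  AND date <= DATE '2016-06-30'
GROUP BY subid, sid;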

Mathieu Pujol

On 02/07/2016 at 04:48, trafdev wrote:
> Thanks Tom.
>
> I've created index on aid, date:
>
> create index aaa on stats.feed_sub(aid,date);
>
> and simplified the query (dropped gran, since it's equal for all rows anyway):
>
> SELECT
>   sum(stats.feed_sub.c_filt_click_cap_ip) AS clicks_from_ip,
>   sum(stats.feed_sub.c_filt_doubleclick)  AS clicks_on_target,
>   sum(stats.feed_sub.c_filt_delay_clicks) AS ip_click_period,
>   sum(stats.feed_sub.c_filt_fast_click)   AS fast_click,
>   sum(stats.feed_sub.c_filt_ip_mismatch)  AS ip_mismatch,
>   sum(stats.feed_sub.c_filt_lng_mismatch) AS lng_mismatch,
>   sum(stats.feed_sub.c_filt_ref_mismatch) AS ref_mismatch,
>   sum(stats.feed_sub.c_filt_ua_mismatch)  AS ua_mismatch,
>   sum(stats.feed_sub.c_filt_url_expired)  AS url_expired,
>   stats.feed_sub.subid                    AS stats_feed_sub_subid,
>   stats.feed_sub.sid                      AS stats_feed_sub_sid
> FROM stats.feed_sub
> WHERE stats.feed_sub.date >= '2016-06-01' :: TIMESTAMP AND
>       stats.feed_sub.date <= '2016-06-30' :: TIMESTAMP AND
>       stats.feed_sub.aid = 3
> GROUP BY
>   stats.feed_sub.subid, stats.feed_sub.sid;
>
> All data is in the cache and it still takes almost 5 seconds to complete:
>
> QUERY PLAN
> HashAggregate  (cost=792450.42..803727.24 rows=346979 width=86)
> (actual time=4742.145..4882.468 rows=126533 loops=1)
> "  Group Key: subid, sid"
>   Buffers: shared hit=1350371
>   ->  Index Scan using aaa on feed_sub  (cost=0.43..697031.39
> rows=3469783 width=86) (actual time=0.026..1655.394 rows=3588376 loops=1)
>         Index Cond: ((aid = 3) AND (date >= '2016-06-01
> 00:00:00'::timestamp without time zone) AND (date <= '2016-06-30
> 00:00:00'::timestamp without time zone))
>         Buffers: shared hit=1350371
> Planning time: 0.159 ms
> Execution time: 4899.934 ms
>
> It's better, but still far from the "<2 secs" goal.
>
> Any thoughts?
>
>
> On 07/01/16 18:23, Tom Lane wrote:
>> trafdev <trafdev@mail.ru> writes:
>>> CREATE INDEX ix_feed_sub_date
>>>    ON stats.feed_sub
>>>    USING brin
>>>    (date);
>>
>>> CREATE UNIQUE INDEX ixu_feed_sub
>>>    ON stats.feed_sub
>>>    USING btree
>>>    (date, gran, aid, pid, sid, fid, subid COLLATE
>>> pg_catalog."default");
>>
>>> HashAggregate (cost=901171.72..912354.97 rows=344100 width=86) (actual
>>> time=7207.825..7335.473 rows=126044 loops=1)
>>> "  Group Key: subid, sid"
>>>    Buffers: shared hit=3635804
>>>    ->  Index Scan using ixu_feed_sub on feed_sub (cost=0.56..806544.38
>>> rows=3440994 width=86) (actual time=0.020..3650.208 rows=3578344
>>> loops=1)
>>>          Index Cond: ((date >= '2016-06-01 00:00:00'::timestamp without
>>> time zone) AND (date <= '2016-06-30 00:00:00'::timestamp without time
>>> zone) AND (gran = '1 day'::interval) AND (aid = 3))
>>>          Buffers: shared hit=3635804
>>> Planning time: 0.150 ms
>>> Execution time: 7352.009 ms
>>
>> Neither of those indexes is terribly well designed for this query.
>> A btree index on (aid, gran, date) or (gran, aid, date) would work
>> much better.  See
>>
>> https://www.postgresql.org/docs/9.5/static/indexes-multicolumn.html
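
(For concreteness, a minimal sketch of that suggestion; the index name
here is made up:)

CREATE INDEX ix_feed_sub_aid_gran_date
    ON stats.feed_sub (aid, gran, date);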
>>
>> You could rearrange the column order in that giant unique index
>> and get some of the benefit.  But if you're desperate to optimize
>> this particular query, an index not bearing so many irrelevant columns
>> would probably be better for it.
>>
>> An alternative way of thinking would be to create an index with those
>> three leading columns and then all of the other columns used by this
>> query as later columns.  That would be an even larger index, but it
>> would
>> allow an index-only scan, which might be quite a lot faster. The fact
>> that you seem to be hitting about one page for each row retrieved says
>> that the data you need is pretty badly scattered, so constructing an
>> index
>> that concentrates everything you need into one range of the index might
>> be the ticket.
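
(A sketch of that covering-index idea. PostgreSQL 9.5 has no INCLUDE
clause, so the extra columns have to be trailing key columns; the column
list is taken from the query above, the index name is made up, and the
table has to be vacuumed recently enough for an index-only scan to be
chosen:)

CREATE INDEX ix_feed_sub_covering
    ON stats.feed_sub (aid, gran, date, subid, sid,
                       c_filt_click_cap_ip, c_filt_doubleclick,
                       c_filt_delay_clicks, c_filt_fast_click,
                       c_filt_ip_mismatch, c_filt_lng_mismatch,
                       c_filt_ref_mismatch, c_filt_ua_mismatch,
                       c_filt_url_expired);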
>>
>> Either of these additional-index ideas is going to penalize table
>> insertions/updates, so keep an eye on that end of the performance
>> question too.
>>
>>             regards, tom lane
>>
>
>


