Re: less than 2 sec for response - possible? - Mailing list pgsql-performance
From | Pujol Mathieu |
---|---|
Subject | Re: less than 2 sec for response - possible? |
Date | |
Msg-id | 1e4aa8a7-94da-d880-837d-df40b97aaad6@realfusio.com Whole thread Raw |
In response to | Re: less than 2 sec for response - possible? (trafdev <trafdev@mail.ru>) |
List | pgsql-performance |
Hello Have you solved your problem ? Could it be a conversion overhead from 'timestamp without time zone' to 'date' ? In this case, I don't know if planer store constants as date or timestamp. Mathieu Pujol Le 02/07/2016 à 04:48, trafdev a écrit : > Thanks Tom. > > I've created index on aid, date: > > create index aaa on stats.feed_sub(aid,date); > > and simplified a query (dropped gran as it's equal for all rows anyway): > > SELECT > sum(stats.feed_sub.c_filt_click_cap_ip) AS clicks_from_ip, > sum(stats.feed_sub.c_filt_doubleclick) AS clicks_on_target, > sum(stats.feed_sub.c_filt_delay_clicks) AS ip_click_period, > sum(stats.feed_sub.c_filt_fast_click) AS fast_click, > sum(stats.feed_sub.c_filt_ip_mismatch) AS ip_mismatch, > sum(stats.feed_sub.c_filt_lng_mismatch) AS lng_mismatch, > sum(stats.feed_sub.c_filt_ref_mismatch) AS ref_mismatch, > sum(stats.feed_sub.c_filt_ua_mismatch) AS ua_mismatch, > sum(stats.feed_sub.c_filt_url_expired) AS url_expired, > stats.feed_sub.subid AS stats_feed_sub_subid, > stats.feed_sub.sid AS stats_feed_sub_sid > FROM stats.feed_sub > WHERE stats.feed_sub.date >= '2016-06-01' :: TIMESTAMP AND > stats.feed_sub.date <= '2016-06-30' :: TIMESTAMP AND > stats.feed_sub.aid = 3 > GROUP BY > stats.feed_sub.subid, stats.feed_sub.sid; > > All data is in the cache and it still takes almost 5 seconds to complete: > > QUERY PLAN > HashAggregate (cost=792450.42..803727.24 rows=346979 width=86) > (actual time=4742.145..4882.468 rows=126533 loops=1) > " Group Key: subid, sid" > Buffers: shared hit=1350371 > -> Index Scan using aaa on feed_sub (cost=0.43..697031.39 > rows=3469783 width=86) (actual time=0.026..1655.394 rows=3588376 loops=1) > Index Cond: ((aid = 3) AND (date >= '2016-06-01 > 00:00:00'::timestamp without time zone) AND (date <= '2016-06-30 > 00:00:00'::timestamp without time zone)) > Buffers: shared hit=1350371 > Planning time: 0.159 ms > Execution time: 4899.934 ms > > It's better, but still is far from "<2 secs" goal. > > Any thoughts? > > > On 07/01/16 18:23, Tom Lane wrote: >> trafdev <trafdev@mail.ru> writes: >>> CREATE INDEX ix_feed_sub_date >>> ON stats.feed_sub >>> USING brin >>> (date); >> >>> CREATE UNIQUE INDEX ixu_feed_sub >>> ON stats.feed_sub >>> USING btree >>> (date, gran, aid, pid, sid, fid, subid COLLATE >>> pg_catalog."default"); >> >>> HashAggregate (cost=901171.72..912354.97 rows=344100 width=86) (actual >>> time=7207.825..7335.473 rows=126044 loops=1) >>> " Group Key: subid, sid" >>> Buffers: shared hit=3635804 >>> -> Index Scan using ixu_feed_sub on feed_sub (cost=0.56..806544.38 >>> rows=3440994 width=86) (actual time=0.020..3650.208 rows=3578344 >>> loops=1) >>> Index Cond: ((date >= '2016-06-01 00:00:00'::timestamp without >>> time zone) AND (date <= '2016-06-30 00:00:00'::timestamp without time >>> zone) AND (gran = '1 day'::interval) AND (aid = 3)) >>> Buffers: shared hit=3635804 >>> Planning time: 0.150 ms >>> Execution time: 7352.009 ms >> >> Neither of those indexes is terribly well designed for this query. >> A btree index on (aid, gran, date) or (gran, aid, date) would work >> much better. See >> >> https://www.postgresql.org/docs/9.5/static/indexes-multicolumn.html >> >> You could rearrange the column order in that giant unique index >> and get some of the benefit. But if you're desperate to optimize >> this particular query, an index not bearing so many irrelevant columns >> would probably be better for it. >> >> An alternative way of thinking would be to create an index with those >> three leading columns and then all of the other columns used by this >> query as later columns. That would be an even larger index, but it >> would >> allow an index-only scan, which might be quite a lot faster. The fact >> that you seem to be hitting about one page for each row retrieved says >> that the data you need is pretty badly scattered, so constructing an >> index >> that concentrates everything you need into one range of the index might >> be the ticket. >> >> Either of these additional-index ideas is going to penalize table >> insertions/updates, so keep an eye on that end of the performance >> question too. >> >> regards, tom lane >> > >
pgsql-performance by date: