Re: less than 2 sec for response - possible? - Mailing list pgsql-performance
From | trafdev
Subject | Re: less than 2 sec for response - possible?
Date |
Msg-id | 0461b3e8-cb38-0ded-6e8c-48e6dc545ea6@mail.ru
In response to | Re: less than 2 sec for response - possible? (Torsten Zuehlsdorff <mailinglists@toco-domains.de>)
Responses | Re: less than 2 sec for response - possible?
List | pgsql-performance
Hi, yes, I've tried that in the past; it makes no difference at all:

With TIMESTAMP cast:

QUERY PLAN
HashAggregate  (cost=1405666.90..1416585.93 rows=335970 width=86) (actual time=4794.585..4923.062 rows=126533 loops=1)
  Group Key: subid, sid
  Buffers: shared hit=1486949
  ->  Index Scan using ix_feed_sub_aid_date on feed_sub  (cost=0.44..1313275.32 rows=3359694 width=86) (actual time=0.020..1736.005 rows=3588376 loops=1)
        Index Cond: ((aid = 3) AND (date >= '2016-06-01 00:00:00'::timestamp without time zone) AND (date <= '2016-06-30 00:00:00'::timestamp without time zone))
        Buffers: shared hit=1486949
Planning time: 0.158 ms
Execution time: 4939.965 ms

Without TIMESTAMP cast:

QUERY PLAN
HashAggregate  (cost=1405666.90..1416585.93 rows=335970 width=86) (actual time=4797.272..4924.015 rows=126533 loops=1)
  Group Key: subid, sid
  Buffers: shared hit=1486949
  ->  Index Scan using ix_feed_sub_aid_date on feed_sub  (cost=0.44..1313275.32 rows=3359694 width=86) (actual time=0.019..1783.104 rows=3588376 loops=1)
        Index Cond: ((aid = 3) AND (date >= '2016-06-01'::date) AND (date <= '2016-06-30'::date))
        Buffers: shared hit=1486949
Planning time: 0.164 ms
Execution time: 4941.259 ms

I need to be sure whether this is a physical limitation of PostgreSQL (i.e. that even with all data in memory, fetching/joining 1.5 mln rows can't be done in less than 2-3 seconds) and that there is no way to improve it.
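If it really is, the fallback I would consider is pre-aggregation. Purely as a rough sketch (the rollup name, the dropped dimensions and the refresh schedule below are illustrative assumptions, nothing like this exists yet):

CREATE MATERIALIZED VIEW stats.feed_sub_daily_rollup AS
SELECT date, aid, sid, subid,
       sum(c_filt_click_cap_ip) AS clicks_from_ip,
       sum(c_filt_doubleclick)  AS clicks_on_target,
       sum(c_filt_delay_clicks) AS ip_click_period,
       sum(c_filt_fast_click)   AS fast_click,
       sum(c_filt_ip_mismatch)  AS ip_mismatch,
       sum(c_filt_lng_mismatch) AS lng_mismatch,
       sum(c_filt_ref_mismatch) AS ref_mismatch,
       sum(c_filt_ua_mismatch)  AS ua_mismatch,
       sum(c_filt_url_expired)  AS url_expired
FROM stats.feed_sub
WHERE gran = '1 day'            -- keep only the daily grain, drop pid/fid
GROUP BY date, aid, sid, subid;

CREATE INDEX ON stats.feed_sub_daily_rollup (aid, date);

-- re-run from cron after each load; report queries read the rollup instead:
REFRESH MATERIALIZED VIEW stats.feed_sub_daily_rollup;

The totals query would then only re-sum the rollup for the selected range:

SELECT
  sum(clicks_from_ip)   AS clicks_from_ip,
  sum(clicks_on_target) AS clicks_on_target,
  sum(ip_click_period)  AS ip_click_period,
  sum(fast_click)       AS fast_click,
  sum(ip_mismatch)      AS ip_mismatch,
  sum(lng_mismatch)     AS lng_mismatch,
  sum(ref_mismatch)     AS ref_mismatch,
  sum(ua_mismatch)      AS ua_mismatch,
  sum(url_expired)      AS url_expired,
  subid, sid
FROM stats.feed_sub_daily_rollup
WHERE aid = 3
  AND date >= '2016-06-01' AND date <= '2016-06-30'
GROUP BY subid, sid;

Whether that buys anything depends on how much the pid/fid fan-out inflates feed_sub; if the daily (aid, sid, subid) cardinality is close to the base table's, a rollup changes little.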
On 07/05/16 04:39, Torsten Zuehlsdorff wrote:
>
> On 02.07.2016 02:54, trafdev wrote:
> > Hi.
> >
> > I'm trying to build an OLAP-oriented DB based on PostgreSQL.
> >
> > User works with a paginated report in the web-browser. Interface allows
> > to fetch data for a custom date-range selection,
> > display individual rows (20-50 per page) and totals (for entire
> > selection, even not visible on the current page) and sorting by any
> > column.
> >
> > The main goal is to deliver results of the basic SELECT queries to the
> > end-user in less than 2 seconds.
> >
> > I was able to achieve that except for one table (the biggest one).
> >
> > It consists of multiple dimensions (date, gran, aid, pid, sid, fid,
> > subid) and metrics (see below).
> > User can filter by any dimension and sort by any metric.
> >
> > Here is a CREATE script for this table:
> >
> > CREATE TABLE stats.feed_sub
> > (
> >   date date NOT NULL,
> >   gran interval NOT NULL,
> >   aid smallint NOT NULL,
> >   pid smallint NOT NULL,
> >   sid smallint NOT NULL,
> >   fid smallint NOT NULL,
> >   subid text NOT NULL,
> >   rev_est_pub real NOT NULL,
> >   rev_est_feed real NOT NULL,
> >   rev_raw real NOT NULL,
> >   c_total bigint NOT NULL,
> >   c_passed bigint NOT NULL,
> >   q_total bigint NOT NULL,
> >   q_passed bigint NOT NULL,
> >   q_filt_geo bigint NOT NULL,
> >   q_filt_browser bigint NOT NULL,
> >   q_filt_os bigint NOT NULL,
> >   q_filt_ip bigint NOT NULL,
> >   q_filt_subid bigint NOT NULL,
> >   q_filt_pause bigint NOT NULL,
> >   q_filt_click_cap_ip bigint NOT NULL,
> >   q_filt_query_cap bigint NOT NULL,
> >   q_filt_click_cap bigint NOT NULL,
> >   q_filt_rev_cap bigint NOT NULL,
> >   q_filt_erpm_floor bigint NOT NULL,
> >   c_filt_click_cap_ip bigint NOT NULL,
> >   c_filt_doubleclick bigint NOT NULL,
> >   c_filt_url_expired bigint NOT NULL,
> >   c_filt_fast_click bigint NOT NULL,
> >   c_filt_delay_clicks bigint NOT NULL,
> >   c_filt_ip_mismatch bigint NOT NULL,
> >   c_filt_ref_mismatch bigint NOT NULL,
> >   c_filt_lng_mismatch bigint NOT NULL,
> >   c_filt_ua_mismatch bigint NOT NULL,
> >   res_impr bigint NOT NULL,
> >   rev_ver_pub real,
> >   rev_ver_feed real,
> >   c_ver bigint,
> >   q_filt_ref bigint NOT NULL
> > )
> > WITH (
> >   OIDS=FALSE
> > );
> >
> > CREATE INDEX ix_feed_sub_date
> >   ON stats.feed_sub
> >   USING brin
> >   (date);
> >
> > CREATE UNIQUE INDEX ixu_feed_sub
> >   ON stats.feed_sub
> >   USING btree
> >   (date, gran, aid, pid, sid, fid, subid COLLATE pg_catalog."default");
> >
> > Here is some sizing info (https://wiki.postgresql.org/wiki/Disk_Usage):
> >
> > relation,size
> > stats.feed_sub,5644 MB
> > stats.ixu_feed_sub,1594 MB
> >
> > row_estimate
> > 15865627
> >
> > Here is the typical query (for the totals page):
> >
> > SELECT
> >   sum(stats.feed_sub.c_filt_click_cap_ip) AS clicks_from_ip,
> >   sum(stats.feed_sub.c_filt_doubleclick)  AS clicks_on_target,
> >   sum(stats.feed_sub.c_filt_delay_clicks) AS ip_click_period,
> >   sum(stats.feed_sub.c_filt_fast_click)   AS fast_click,
> >   sum(stats.feed_sub.c_filt_ip_mismatch)  AS ip_mismatch,
> >   sum(stats.feed_sub.c_filt_lng_mismatch) AS lng_mismatch,
> >   sum(stats.feed_sub.c_filt_ref_mismatch) AS ref_mismatch,
> >   sum(stats.feed_sub.c_filt_ua_mismatch)  AS ua_mismatch,
> >   sum(stats.feed_sub.c_filt_url_expired)  AS url_expired,
> >   stats.feed_sub.subid AS stats_feed_sub_subid,
> >   stats.feed_sub.sid   AS stats_feed_sub_sid
> > FROM stats.feed_sub
> > WHERE stats.feed_sub.date >= '2016-06-01' :: TIMESTAMP AND
> >       stats.feed_sub.date <= '2016-06-30' :: TIMESTAMP AND
> >       stats.feed_sub.gran = '1 day'
> >       AND stats.feed_sub.aid = 3
> > GROUP BY
> >   stats.feed_sub.subid, stats.feed_sub.sid;
>
> You cast every date to a timestamp. Why? You can adjust the index to:
>
> CREATE UNIQUE INDEX ixu_feed_sub
>   ON stats.feed_sub
>   USING btree
>   (date::timestamp, gran, aid, pid, sid, fid, subid COLLATE
>   pg_catalog."default");
>
> But since I see no need for the cast at all (maybe I missed it), try it
> without!
>
> Greetings,
> Torsten
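One side note on the quoted index suggestion, purely for reference: to index the cast expression itself, PostgreSQL needs an extra pair of parentheses around the expression (as quoted, CREATE INDEX raises a syntax error). It would have to look something like this, where the index name is just a placeholder:

CREATE UNIQUE INDEX ixu_feed_sub_ts
  ON stats.feed_sub
  USING btree
  ((date::timestamp), gran, aid, pid, sid, fid, subid COLLATE pg_catalog."default");

But as the two plans above show, casting the literals makes no difference here anyway, since the column is already of type date.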