Re: Querying 19million records very slowly - Mailing list pgsql-performance
From | Paul Ramsey |
---|---|
Subject | Re: Querying 19million records very slowly |
Date | |
Msg-id | BF4268C3-E328-11D9-B899-000393D33C2E@refractions.net Whole thread Raw |
In response to | Re: Querying 19million records very slowly (Kjell Tore Fossbakk <kjelltore@gmail.com>) |
List | pgsql-performance |
OK, so the planner is in fact making a mistake (I think). Try turning down your random_page_cost a little. It defaults at 4.0, see if 2.0 works "right". (Careful, move these things around too much for one query, you will wreck others.) 4.0 is a little large for almost all modern hardware, so see if moving it down a little makes things somewhat smarter. P On Wednesday, June 22, 2005, at 12:45 AM, Kjell Tore Fossbakk wrote: > database=> set enable_seqscan to on; > SET > Time: 0.34 ms > > > > database=> explain analyze select count(*) from test where p1=53; > QUERY PLAN > ----------------------------------------------------------------------- > ------------------------------------------------ > Aggregate (cost=522824.50..522824.50 rows=1 width=0) (actual > time=56380.72..56380.72 rows=1 loops=1) > -> Seq Scan on test (cost=0.00..517383.30 rows=2176479 width=0) > (actual time=9.61..47677.48 rows=2220746 loops=1) > Filter: (p1 = 53) > Total runtime: 56380.79 msec > (4 rows) > > Time: 56381.40 ms > > > > database=> explain analyze select count(*) from test where p1=53 and > time > now() - interval '24 hours' ; > QUERY PLAN > ----------------------------------------------------------------------- > ------------------------------------------------- > Aggregate (cost=661969.01..661969.01 rows=1 width=0) (actual > time=45787.02..45787.02 rows=1 loops=1) > -> Seq Scan on test (cost=0.00..660155.28 rows=725493 width=0) > (actual time=37799.32..45613.58 rows=42424 loops=1) > Filter: ((p1 = 53) AND ("time" > (now() - '1 day'::interval))) > Total runtime: 45787.09 msec > (4 rows) > > Time: 45787.79 ms > > > > database=> explain analyze select date_trunc('hour', time),count(*) as > total from test where p1=53 and time>now()-interval '24 hours' group > by date_trunc order by date_trunc; > QUERY > PLAN > ----------------------------------------------------------------------- > ------------------------------------------------------------- > Aggregate (cost=755116.97..760558.17 rows=72549 width=8) (actual > time=46040.63..46717.61 rows=23 loops=1) > -> Group (cost=755116.97..758744.44 rows=725493 width=8) (actual > time=46022.06..46548.84 rows=42407 loops=1) > -> Sort (cost=755116.97..756930.70 rows=725493 width=8) > (actual time=46022.04..46198.94 rows=42407 loops=1) > Sort Key: date_trunc('hour'::text, "time") > -> Seq Scan on test (cost=0.00..660155.28 rows=725493 > width=8) (actual time=37784.91..45690.88 rows=42407 loops=1) > Filter: ((p1 = 53) AND ("time" > (now() - '1 > day'::interval))) > Total runtime: 46718.43 msec > (7 rows) > > Time: 46719.44 ms > > > > database=> create index test_time_p1_idx on test(time,p1) ; > CREATE INDEX > Time: 178926.02 ms > > database=> vacuum analyze test ; > VACUUM > Time: 73058.33 ms > > database=> \d test > Table "public.test" > Column | Type | Modifiers > -------------+--------------------------+----------- > time | timestamp with time zone | > source | inet | > destination | inet | > p1 | integer | > p2 | integer | > > > > database=> \di > public | test_time_idx | index | database | test > public | test_source_idx | index | database | test > public | test_destination_idx | index | database | test > public | test_p1_idx | index | database | test > public | test_p2_idx | index | database | test > public | test_time_p1_idx | index | database | test > > > > database=> set enable_seqscan to off ; > SET > Time: 0.28 ms > > > > database=> explain analyze select date_trunc('hour', time),count(*) as > total from test where p1=53 and time>now()-interval '24 hours' group > by date_trunc order by date_trunc; > Aggregate (cost=2315252.66..2320767.17 rows=73527 width=8) (actual > time=2081.15..2720.44 rows=23 loops=1) > -> Group (cost=2315252.66..2318929.00 rows=735268 width=8) > (actual time=2079.76..2564.22 rows=41366 loops=1) > -> Sort (cost=2315252.66..2317090.83 rows=735268 width=8) > (actual time=2079.74..2243.32 rows=41366 loops=1) > Sort Key: date_trunc('hour'::text, "time") > -> Index Scan using test_time_p1_idx on test > (cost=0.00..2218878.46 rows=735268 width=8) (actual > time=29.50..1774.52 rows=41366 loops=1) > Index Cond: (("time" > (now() - '1 > day'::interval)) AND (p1 = 53)) > Total runtime: 2735.42 msec > > Time: 2736.48 ms > > > > database=> explain analyze select date_trunc('hour', time),count(*) as > total from test where p1=80 and time>now()-interval '24 hours' group > by date_trunc order by date_trunc; > Aggregate (cost=1071732.15..1074305.59 rows=34313 width=8) (actual > time=6353.93..7321.99 rows=22 loops=1) > -> Group (cost=1071732.15..1073447.77 rows=343125 width=8) > (actual time=6323.76..7078.10 rows=64267 loops=1) > -> Sort (cost=1071732.15..1072589.96 rows=343125 width=8) > (actual time=6323.75..6579.42 rows=64267 loops=1) > Sort Key: date_trunc('hour'::text, "time") > -> Index Scan using test_time_p1_idx on test > (cost=0.00..1035479.58 rows=343125 width=8) (actual time=0.20..5858.67 > rows=64267 loops=1) > Index Cond: (("time" > (now() - '1 > day'::interval)) AND (p1 = 80)) > Total runtime: 7322.82 msec > > Time: 7323.90 ms > > > > database=> explain analyze select date_trunc('hour', time),count(*) as > total from test where p1=139 and time>now()-interval '24 hours' group > by date_trunc order by date_trunc; > Aggregate (cost=701562.34..703250.12 rows=22504 width=8) (actual > time=2448.41..3033.80 rows=22 loops=1) > -> Group (cost=701562.34..702687.53 rows=225037 width=8) (actual > time=2417.39..2884.25 rows=36637 loops=1) > -> Sort (cost=701562.34..702124.94 rows=225037 width=8) > (actual time=2417.38..2574.19 rows=36637 loops=1) > Sort Key: date_trunc('hour'::text, "time") > -> Index Scan using test_time_p1_idx on test > (cost=0.00..679115.34 rows=225037 width=8) (actual time=8.47..2156.18 > rows=36637 loops=1) > Index Cond: (("time" > (now() - '1 > day'::interval)) AND (p1 = 139)) > Total runtime: 3034.57 msec > > Time: 3035.70 ms > > > > Now, this query gives me all the hours in a day, with the count of all > p1=53 for each hour. Pg uses 46.7 seconds to run with seqscan, while > 2.7 seconds indexing on (time,p1). I think I turned "set > enable_seqscan to on;" again, and then the planner used seqscan, and > not index. > - Why does Pg not see the benefits of using index? > - and how can i tune the optimisation fields in postgresql.conf to > help him? > > So now my PG uses a reasonable amout of time on these queries (with > enable_seqscan turned off) > > The next place which seems to slow my queries, is probably my > connection to PHP. I got a bash script running in cron on my server > (freebsd 4.11), which runs php on a php file. To force PG to not use > seqscans, I have modifies the postgresql.conf: > > .. > enable_seqscan = false > enable_indexscan = true > .. > effective_cache_size = 10000 > random_page_cost = 2 > .. > > I save the file, type 'pg_crl reload' then enter 'psql database'. > > database=> show enable_seqscan ; > enable_seqscan > ---------------- > on > (1 row) > > > argus=> show effective_cache_size ; > effective_cache_size > ---------------------- > 1000 > (1 row) > > I have used the manual pages on postgresql, postmaster, and so on, but > I cant find anywhere to specify which config file Pg is to use. I'm > not entirely sure if he uses the one im editing > (/usr/local/etc/postgresql.conf). > > Any hints, tips or help is most appreciated! > > Kjell Tore. > > > > > > On 6/21/05, PFC <lists@boutiquenumerique.com> wrote: >> >> use CURRENT_TIME which is a constant instead of now() which is not >> considered constant... (I think) >> >> Paul Ramsey Refractions Research Email: pramsey@refractions.net Phone: (250) 885-0632
pgsql-performance by date: