Re: Why is PostgreSQL so slow on Windows ( Postgres 8.3.7) version - Mailing list pgsql-performance
From: PFC
Subject: Re: Why is PostgreSQL so slow on Windows ( Postgres 8.3.7) version
Msg-id: op.ux1x15arcigqcu@soyouz
In response to: Why is PostgreSQL so slow on Windows ( Postgres 8.3.7) version (parimala <parimala@zohocorp.com>)
List: pgsql-performance
> some of our queries to the database taking long time to return the
> results.
> fsync: off (even we tested this parameter is on, we observed the same
> slowness)

	If your queries take a long time to return results, I suppose you are
talking about SELECTs.
	fsync = off will not make SELECTs faster (only INSERTs, UPDATEs and
DELETEs), and it is not worth it anyway, since you risk data loss.
	synchronous_commit = off gives about the same speed advantage as
fsync = off, but with no risk of data corruption (at worst you lose the
last few committed transactions after a crash), so it is much better!

> We have 300k rows in PolledData Table. In each STATSDATA table, we have
> almost 12 to 13 million rows.

	OK. So you insert 13 million rows per day? That is about 150 rows per
second.

> Every one minute interval, we insert data into to STATSDATA table.

	I assume you are making one INSERT INTO statsdata VALUES
(...... 150 values .....) per interval, and not 150 separate INSERTs,
yes?

> First Query :
> SELECT COUNT(*) FROM ( SELECT ID, PolledData.AGENT, NAME, INSTANCE,
> TTIME, VAL FROM PolledData, STATSDATA8_21_2009 WHERE
> ( ( PolledData.ID=STATSDATA8_21_2009.POLLID) AND ( ( TTIME >=
> 1250838027454) AND ( TTIME <=1250838079654) ) ) ) t1;

* You could rewrite it as:

SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL
FROM PolledData
JOIN STATSDATA8_21_2009 ON ( PolledData.ID = STATSDATA8_21_2009.POLLID )
WHERE TTIME BETWEEN ... AND ...

- It is exactly the same query, but much easier to read.

* Some ANALYZE-ing of your tables would be useful, since the estimates
from the planner look suspiciously different from reality.
- ANALYZE is fast; you can run it often if you INSERT rows all the time.

* You are joining on POLLID, which is a NUMERIC in one table and a
BIGINT in the other table.
- Is there any reason for this type difference?
- Could you use BIGINT in both tables?
- BIGINT is faster than NUMERIC and uses less space.
- Type conversions use CPU cycles too.

* Should StatsData.POLLID have a foreign key REFERENCES PolledData.ID ?
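	Putting the join-key advice together, the change might look like this
(a sketch only: the table and column names are taken from your query,
but the constraint name is made up and the exact DDL depends on your
schema; changing a column type rewrites the table, so do it off-peak):

```sql
-- Sketch: make both sides of the join the same, cheaper, type,
-- assuming POLLID is currently NUMERIC and PolledData.ID is BIGINT.
ALTER TABLE STATSDATA8_21_2009
    ALTER COLUMN POLLID TYPE BIGINT;

-- Optionally declare the relationship; with the FK guaranteeing that
-- every POLLID exists in PolledData, a plain row count of the stats
-- table no longer needs the JOIN at all.
ALTER TABLE STATSDATA8_21_2009
    ADD CONSTRAINT statsdata8_21_2009_pollid_fk
    FOREIGN KEY (POLLID) REFERENCES PolledData (ID);
```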
- This won't make the query faster, but if you know all rows in
StatsData reference rows in PolledData (because of the FK constraint)
and you want a count(*) like above, you don't need the JOIN at all.

* TTIME >= 1250838027454 AND TTIME <= 1250838079654
- TTIME should be a TIMESTAMP (with or without time zone) or a BIGINT,
but certainly not a NUMERIC.
- An index on StatsData.TTIME would be useful: it would avoid the Seq
Scan, replacing it with a Bitmap Scan, which is much faster.

* work_mem
- Since you have few connections, you could increase work_mem.

> Second Query :

	Same as the first query.

> Third Query
> SELECT COUNT(*) FROM ( SELECT ID, PolledData.AGENT, NAME, INSTANCE,
> TTIME, VAL FROM PolledData, STATSDATA8_21_2009 WHERE
> ( ( PolledData.ID=STATSDATA8_21_2009.POLLID) AND ( ( TTIME >=
> 1250838027454) AND ( TTIME <=1250838027454) ) )
> union all
> SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL FROM
> PolledData, STATSDATA8_20_2009 WHERE
> ( ( PolledData.ID=STATSDATA8_20_2009.POLLID) AND ( ( TTIME >=
> 1250767134601) AND ( TTIME <= 1250767134601) ) ) ) t1;

	Basically this is, again, exactly the same query as above, but run two
times and UNION ALL'ed.

* You could rewrite it like this (union the two stats tables first, then
join once):

SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL
FROM (
    SELECT ... FROM STATSDATA8_21_2009 WHERE TTIME BETWEEN ... AND ...
    UNION ALL
    SELECT ... FROM STATSDATA8_20_2009 WHERE TTIME BETWEEN ... AND ...
) AS stats
JOIN PolledData ON ( PolledData.ID = stats.POLLID )

* If TTIME is the current time, and you insert data as it comes, data in
the StatsData tables is probably already ordered on TTIME.
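	The TTIME index and the work_mem increase suggested above can be
sketched as follows (the index name is made up, and the work_mem value
is only an illustration; tune it to your RAM and connection count):

```sql
-- Hypothetical index name; range queries on TTIME can then use a
-- Bitmap Index Scan instead of a full Seq Scan.
CREATE INDEX statsdata8_21_2009_ttime_idx
    ON STATSDATA8_21_2009 (TTIME);

-- Per-session increase (value is just an example):
SET work_mem = '64MB';
-- Or set it globally in postgresql.conf:  work_mem = 64MB
```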
- If that is not the case, once a table is filled and becomes read-only,
consider a CLUSTER on the index you created on TTIME.
- It will make range queries on TTIME much faster.

* Query plan

Seq Scan on statsdata8_21_2009 (cost=0.00..70574.88 rows=1 width=32)
(actual time=0.047..29066.227 rows=227 loops=1)
Seq Scan on statsdata8_20_2009 (cost=0.00..382519.60 rows=1 width=32)
(actual time=3136.008..93985.540 rows=1 loops=1)

	Postgres thinks there is 1 row in those tables... that is probably not
the case! The first one actually returns 227 rows, so the plan chosen is
a catastrophe.

	I was a bit intrigued by your query, so I made a little test...

BEGIN;
CREATE TABLE test( x INT, y INT );
INSERT INTO test (SELECT n,n FROM generate_series( 1,1000000 ) AS n );
CREATE INDEX test_x ON test( x );
CREATE INDEX test_y ON test( y );
COMMIT;
ANALYZE test;

test=> EXPLAIN ANALYZE SELECT * FROM test a JOIN test b ON (b.x=a.x)
       WHERE a.x BETWEEN 0 AND 10000;
                                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=480.53..23759.14 rows=10406 width=16) (actual time=15.614..1085.085 rows=10000 loops=1)
   Hash Cond: (b.x = a.x)
   ->  Seq Scan on test b  (cost=0.00..14424.76 rows=999976 width=8) (actual time=0.013..477.516 rows=1000000 loops=1)
   ->  Hash  (cost=350.46..350.46 rows=10406 width=8) (actual time=15.581..15.581 rows=10000 loops=1)
         ->  Index Scan using test_x on test a  (cost=0.00..350.46 rows=10406 width=8) (actual time=0.062..8.537 rows=10000 loops=1)
               Index Cond: ((x >= 0) AND (x <= 10000))
 Total runtime: 1088.462 ms
(7 rows)

test=> set enable_seqscan TO 0;
SET
test=> EXPLAIN ANALYZE SELECT * FROM test a JOIN test b ON (b.x=a.x)
       WHERE a.x BETWEEN 0 AND 10000;
                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..30671.03 rows=10406 width=16) (actual time=0.075..85.897 rows=10000 loops=1)
   ->  Index Scan using test_x on test a  (cost=0.00..350.46 rows=10406 width=8) (actual time=0.066..8.377 rows=10000 loops=1)
         Index Cond: ((x >= 0) AND (x <= 10000))
   ->  Index Scan using test_x on test b  (cost=0.00..2.90 rows=1 width=8) (actual time=0.005..0.006 rows=1 loops=10000)
         Index Cond: (b.x = a.x)
 Total runtime: 90.160 ms
(6 rows)

test=> set enable_nestloop TO 0;
SET
test=> EXPLAIN ANALYZE SELECT * FROM test a JOIN test b ON (b.x=a.x)
       WHERE a.x BETWEEN 0 AND 10000;
                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..31200.45 rows=10406 width=16) (actual time=0.081..35.735 rows=10000 loops=1)
   Merge Cond: (a.x = b.x)
   ->  Index Scan using test_x on test a  (cost=0.00..350.46 rows=10406 width=8) (actual time=0.059..8.093 rows=10000 loops=1)
         Index Cond: ((x >= 0) AND (x <= 10000))
   ->  Index Scan using test_x on test b  (cost=0.00..28219.98 rows=999976 width=8) (actual time=0.016..7.494 rows=10001 loops=1)
 Total runtime: 40.013 ms
(6 rows)

	I wonder why it doesn't choose the merge join at first...
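	The CLUSTER advice above, applied to a finished (read-only) daily
table, would look roughly like this; the index name is hypothetical, and
note that CLUSTER takes an exclusive lock and rewrites the table, so run
it only once the day's table stops receiving inserts (the USING form is
available from 8.3):

```sql
-- Rewrite the table physically in TTIME order, then refresh stats.
CLUSTER STATSDATA8_20_2009 USING statsdata8_20_2009_ttime_idx;
ANALYZE STATSDATA8_20_2009;
```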