Re: How to optimize monstrous query, sorts instead of - Mailing list pgsql-performance
From:            Michael Mattox
Subject:         Re: How to optimize monstrous query, sorts instead of
Date:
Msg-id:          CJEBLDCHAADCLAGIGCOOGEADCLAA.michael.mattox@verideon.com
In response to:  Re: How to optimize monstrous query, sorts instead of (Rod Taylor <rbt@rbt.ca>)
List:            pgsql-performance
> Shared buffers is probably too high. How much memory in this machine?
> Is there anything else running aside from PostgreSQL? What does top say
> about cached / buffered data (number)

I was using the 25% of RAM guideline posted recently. The machine has 1.5 GB, but it also runs a couple of other Java applications, including Tomcat.

  1:56pm  up 6 days,  2:58,  6 users,  load average: 2.60, 2.07, 1.78
193 processes: 191 sleeping, 2 running, 0 zombie, 0 stopped
CPU0 states: 14.0% user,  9.0% system,  0.0% nice, 75.1% idle
CPU1 states: 31.0% user,  0.1% system,  0.0% nice, 67.0% idle
CPU2 states:  5.0% user,  0.1% system,  0.1% nice, 93.0% idle
CPU3 states:  0.0% user,  0.1% system,  0.1% nice, 98.0% idle
Mem:  1547572K av, 1537848K used,    9724K free,       0K shrd,   25104K buff
Swap: 1044216K av,   51352K used,  992864K free                 1245460K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
10184 veriguar   9   0 94760  83M  2612 S    36.5  5.5  13:29 java
 8990 postgres  17   0 54864  53M 53096 R    11.5  3.5   0:00 postmaster
 8988 veriguar  11   0  1164 1164   836 R     1.9  0.0   0:00 top
10161 veriguar  13   5 69504  60M  2600 S N   0.9  3.9  13:11 java
10206 veriguar  13   5 27952  23M  2580 S N   0.9  1.5   7:21 java
10699 postgres   9   0 31656  30M 30396 S     0.9  2.0   0:02 postmaster

             total       used       free     shared    buffers     cached
Mem:       1547572    1532024      15548          0      23820    1239024
-/+ buffers/cache:     269180    1278392
Swap:      1044216      51368     992848

> I see you reduced the random_page_cost to 1.5. Why did you do this (how
> is your disk subsystem configured)?

Someone suggested I lower it to 1.5 or 1.0; I'm not sure what the reasoning was. The disks are both 10,000 RPM SCSI. My data directory is on one disk by itself, and pg_xlog is on the other disk along with the operating system and everything else. I was told it's best to have them on separate disks, but my system only has two disks and the one with the operating system isn't big enough to hold my database, so the database has to go on the second disk; if pg_xlog is to be separate, it has to share a disk with the OS and the Java apps.

> Standard questions, did you VACUUM? Regularly? Want to try again and
> send us the output from VACUUM VERBOSE?

I vacuum the monitor table every 5 minutes, and I do a vacuum full analyze every night at midnight (cron job). I just did a vacuum verbose; the output is attached.

> Sounds like you created a ton of test data, then removed a bunch? Did
> you REINDEX that table?

I haven't deleted any of the data; I've been continuously adding new data. I added about 6 million rows at once, and they all had the same date. Since then the application has been stress testing for about 2 weeks, so there are now 7693057 rows in monitorstatusx and monitorstatusitemx, as well as the necessary rows for the join tables.

> During normal use, what is your query spread like? Mostly selects with
> some inserts? Any updates or deletes? How often to updates or deletes
> come in, and how many rows do they effect?

There is a query on monitorx by datex every 10 seconds (monitors are updated every 5 minutes, so every 10 seconds I fetch the monitors that are due for an update). Each monitor is then saved with its status field modified, and a new status item is inserted. This happens every 5 minutes. There are 8-16 monitors being run in parallel, although typically it's 8 or fewer. This is the main application. The reporting application does a few queries, but nothing major except the query that is the subject of this email. It's the reporting app that is slow due to this one big query.
Finally, the web app executes the same query as the reporting app, except there is a lot less data to be returned since it's only for the current day.

> >            ->  Index Scan using monitorstatusxmonitori on monitorstatusx ms  (cost=0.00..5996.18 rows=163 width=84) (actual time=4383.38..14936.39 rows=225 loops=1)
> >                  Index Cond: ("outer".jdoidx = ms.monitorx)
> >                  Filter: ((datex >= '2003-07-01 00:00:00'::timestamp without time zone) AND (datex <= '2003-07-01 23:59:59'::timestamp without time zone))
>
> The above index scan is taking a vast majority of the time (nearly 15
> seconds of the 16 second total -- stop thinking about sorts!).. What
> happened to the index on monitorx and datex?

I just did

    reindex table monitorstatusx;

which didn't help; in fact, query times went up. I then did

    create index monitorstatus_monitor_date_i on monitorstatusx (monitorx, datex);

and this seemed to help a little:

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1133.13..1133.38 rows=98 width=827) (actual time=9754.06..9754.25 rows=226 loops=1)
   Sort Key: ms.datex
   ->  Nested Loop  (cost=0.00..1129.90 rows=98 width=827) (actual time=50.81..9753.17 rows=226 loops=1)
         ->  Nested Loop  (cost=0.00..833.47 rows=98 width=162) (actual time=50.74..7149.28 rows=226 loops=1)
               ->  Nested Loop  (cost=0.00..537.04 rows=98 width=146) (actual time=50.67..4774.45 rows=226 loops=1)
                     ->  Nested Loop  (cost=0.00..240.44 rows=98 width=130) (actual time=50.61..1515.10 rows=226 loops=1)
                           ->  Index Scan using monitorx_id_index on monitorx mx  (cost=0.00..3.45 rows=1 width=46) (actual time=0.09..0.11 rows=1 loops=1)
                                 Index Cond: (idx = 'M-TEST_170-TEST_00_10560857890510173779233'::character varying)
                           ->  Index Scan using monitorstatus_monitor_date_i on monitorstatusx ms  (cost=0.00..234.93 rows=165 width=84) (actual time=50.51..1513.21 rows=226 loops=1)
                                 Index Cond: (("outer".jdoidx = ms.monitorx) AND (ms.datex >= '2003-07-01 00:00:00'::timestamp without time zone) AND (ms.datex <= '2003-07-01 23:59:59'::timestamp without time zone))
                     ->  Index Scan using monitorstatus_stjdoidb742c9b3i on monitorstatus_statusitemsx mssisx  (cost=0.00..3.01 rows=1 width=16) (actual time=14.40..14.41 rows=1 loops=226)
                           Index Cond: ("outer".jdoidx = mssisx.jdoidx)
               ->  Index Scan using monitorstatusitejdoid7db0befci on monitorstatusitemlistd8ea58a5x litem  (cost=0.00..3.01 rows=1 width=16) (actual time=10.49..10.49 rows=1 loops=226)
                     Index Cond: ("outer".statusitemsx = litem.jdoidx)
         ->  Index Scan using monitorstatusitemx_pkey on monitorstatusitemx msi  (cost=0.00..3.01 rows=1 width=665) (actual time=11.50..11.50 rows=1 loops=226)
               Index Cond: ("outer".statusitemlistx = msi.jdoidx)
 Total runtime: 9754.64 msec
(17 rows)

Before, I'm guessing the index on (monitorx, datex) didn't do much because all the data had the same date. But now that I have over 2 weeks of real data, it makes a difference.

Thanks,
Michael
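In case it helps anyone reading the plan, the part of the query that the new (monitorx, datex) index serves boils down to roughly this. The join and date conditions and the sort key are taken from the plan above; the select list is a guess, and the real query also joins through monitorstatus_statusitemsx, monitorstatusitemlistd8ea58a5x, and monitorstatusitemx:

    -- Rough shape of the slow part of the reporting query (illustrative only).
    -- The WHERE/ORDER BY clauses come from the index conditions and sort key
    -- in the plan; the select list is not the real one.
    SELECT ms.*
    FROM monitorx mx
    JOIN monitorstatusx ms ON ms.monitorx = mx.jdoidx
    WHERE mx.idx = 'M-TEST_170-TEST_00_10560857890510173779233'
      AND ms.datex >= '2003-07-01 00:00:00'
      AND ms.datex <= '2003-07-01 23:59:59'
    ORDER BY ms.datex;

The difference is that the new index handles both the join condition (monitorx) and the date range (datex) in a single index condition, instead of scanning all of a monitor's rows on monitorx alone and filtering out the dates afterwards, which is what the old plan was doing.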
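For completeness, the maintenance jobs mentioned above amount to the following SQL (assuming the monitor table is monitorx; the exact cron/psql invocations are omitted):

    -- Every 5 minutes, against the frequently updated monitor table:
    VACUUM monitorx;

    -- Nightly at midnight, from cron, against the whole database:
    VACUUM FULL ANALYZE;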