Thread: Hash join on int takes 8..114 seconds
Query below seems to use indexes everywhere in the most optimal way.
dokumnr column is of type int.

Speed of this query varies wildly:

In live db the fastest response I have got is 8 seconds.
Re-running the same query after 10 seconds may take 60 seconds.
Re-running it again after 10 seconds may take 114 seconds.

Any idea how to speed it up?

Is it possible to optimize it, will upgrading to 8.3.5 help, or should I
add more RAM, disk or CPU speed?

The real query contains a column list instead of the sum(1) used in the
test below.

Andrus.

explain analyze
SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
LEFT JOIN artliik USING (grupp, liik)
WHERE rid.toode = 'X05' AND dok.kuupaev >= '2008-09-01'

Longest response time:

Aggregate  (cost=234278.53..234278.54 rows=1 width=0) (actual time=114479.933..114479.936 rows=1 loops=1)
  ->  Hash Left Join  (cost=52111.20..234218.21 rows=24126 width=0) (actual time=100435.523..114403.293 rows=20588 loops=1)
        Hash Cond: (("outer".grupp = "inner".grupp) AND ("outer".liik = "inner".liik))
        ->  Nested Loop  (cost=52103.94..233735.35 rows=24126 width=19) (actual time=100405.258..114207.387 rows=20588 loops=1)
              ->  Index Scan using toode_pkey on toode  (cost=0.00..6.01 rows=1 width=43) (actual time=18.312..18.325 rows=1 loops=1)
                    Index Cond: ('X05'::bpchar = toode)
              ->  Hash Join  (cost=52103.94..233488.08 rows=24126 width=24) (actual time=100386.921..114037.986 rows=20588 loops=1)
                    Hash Cond: ("outer".dokumnr = "inner".dokumnr)
                    ->  Bitmap Heap Scan on rid  (cost=4127.51..175020.84 rows=317003 width=28) (actual time=11119.932..76225.918 rows=277294 loops=1)
                          Recheck Cond: (toode = 'X05'::bpchar)
                          ->  Bitmap Index Scan on rid_toode_idx  (cost=0.00..4127.51 rows=317003 width=0) (actual time=11105.807..11105.807 rows=280599 loops=1)
                                Index Cond: (toode = 'X05'::bpchar)
                    ->  Hash  (cost=47376.82..47376.82 rows=93444 width=4) (actual time=35082.427..35082.427 rows=105202 loops=1)
                          ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..47376.82 rows=93444 width=4) (actual time=42.110..34586.331 rows=105202 loops=1)
                                Index Cond: (kuupaev >= '2008-09-01'::date)
        ->  Hash  (cost=6.84..6.84 rows=84 width=19) (actual time=30.220..30.220 rows=84 loops=1)
              ->  Seq Scan on artliik  (cost=0.00..6.84 rows=84 width=19) (actual time=20.104..29.845 rows=84 loops=1)
Total runtime: 114480.373 ms

Same query in other runs:

Aggregate  (cost=234278.53..234278.54 rows=1 width=0) (actual time=62164.496..62164.500 rows=1 loops=1)
  ->  Hash Left Join  (cost=52111.20..234218.21 rows=24126 width=0) (actual time=46988.005..62088.379 rows=20588 loops=1)
        Hash Cond: (("outer".grupp = "inner".grupp) AND ("outer".liik = "inner".liik))
        ->  Nested Loop  (cost=52103.94..233735.35 rows=24126 width=19) (actual time=46957.750..61893.613 rows=20588 loops=1)
              ->  Index Scan using toode_pkey on toode  (cost=0.00..6.01 rows=1 width=43) (actual time=146.530..146.543 rows=1 loops=1)
                    Index Cond: ('X05'::bpchar = toode)
              ->  Hash Join  (cost=52103.94..233488.08 rows=24126 width=24) (actual time=46811.194..61595.560 rows=20588 loops=1)
                    Hash Cond: ("outer".dokumnr = "inner".dokumnr)
                    ->  Bitmap Heap Scan on rid  (cost=4127.51..175020.84 rows=317003 width=28) (actual time=1870.209..55864.237 rows=277294 loops=1)
                          Recheck Cond: (toode = 'X05'::bpchar)
                          ->  Bitmap Index Scan on rid_toode_idx  (cost=0.00..4127.51 rows=317003 width=0) (actual time=1863.713..1863.713 rows=280599 loops=1)
                                Index Cond: (toode = 'X05'::bpchar)
                    ->  Hash  (cost=47376.82..47376.82 rows=93444 width=4) (actual time=1650.823..1650.823 rows=105202 loops=1)
                          ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..47376.82 rows=93444 width=4) (actual time=0.091..1190.962 rows=105202 loops=1)
                                Index Cond: (kuupaev >= '2008-09-01'::date)
        ->  Hash  (cost=6.84..6.84 rows=84 width=19) (actual time=30.210..30.210 rows=84 loops=1)
              ->  Seq Scan on artliik  (cost=0.00..6.84 rows=84 width=19) (actual time=20.069..29.836 rows=84 loops=1)
Total runtime: 62164.789 ms

Aggregate  (cost=234278.53..234278.54 rows=1 width=0) (actual time=40185.499..40185.503 rows=1 loops=1)
  ->  Hash Left Join  (cost=52111.20..234218.21 rows=24126 width=0) (actual time=32646.761..40109.470 rows=20585 loops=1)
        Hash Cond: (("outer".grupp = "inner".grupp) AND ("outer".liik = "inner".liik))
        ->  Nested Loop  (cost=52103.94..233735.35 rows=24126 width=19) (actual time=32645.933..39944.242 rows=20585 loops=1)
              ->  Index Scan using toode_pkey on toode  (cost=0.00..6.01 rows=1 width=43) (actual time=0.072..0.085 rows=1 loops=1)
                    Index Cond: ('X05'::bpchar = toode)
              ->  Hash Join  (cost=52103.94..233488.08 rows=24126 width=24) (actual time=32645.839..39793.180 rows=20585 loops=1)
                    Hash Cond: ("outer".dokumnr = "inner".dokumnr)
                    ->  Bitmap Heap Scan on rid  (cost=4127.51..175020.84 rows=317003 width=28) (actual time=1823.542..36318.419 rows=277291 loops=1)
                          Recheck Cond: (toode = 'X05'::bpchar)
                          ->  Bitmap Index Scan on rid_toode_idx  (cost=0.00..4127.51 rows=317003 width=0) (actual time=1817.053..1817.053 rows=280596 loops=1)
                                Index Cond: (toode = 'X05'::bpchar)
                    ->  Hash  (cost=47376.82..47376.82 rows=93444 width=4) (actual time=1242.785..1242.785 rows=105195 loops=1)
                          ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..47376.82 rows=93444 width=4) (actual time=0.088..788.399 rows=105195 loops=1)
                                Index Cond: (kuupaev >= '2008-09-01'::date)
        ->  Hash  (cost=6.84..6.84 rows=84 width=19) (actual time=0.786..0.786 rows=84 loops=1)
              ->  Seq Scan on artliik  (cost=0.00..6.84 rows=84 width=19) (actual time=0.019..0.419 rows=84 loops=1)
Total runtime: 40186.102 ms

Aggregate  (cost=234278.53..234278.54 rows=1 width=0) (actual time=29650.398..29650.402 rows=1 loops=1)
  ->  Hash Left Join  (cost=52111.20..234218.21 rows=24126 width=0) (actual time=23513.713..29569.448 rows=20588 loops=1)
        Hash Cond: (("outer".grupp = "inner".grupp) AND ("outer".liik = "inner".liik))
        ->  Nested Loop  (cost=52103.94..233735.35 rows=24126 width=19) (actual time=23512.808..29388.712 rows=20588 loops=1)
              ->  Index Scan using toode_pkey on toode  (cost=0.00..6.01 rows=1 width=43) (actual time=61.813..61.825 rows=1 loops=1)
                    Index Cond: ('X05'::bpchar = toode)
              ->  Hash Join  (cost=52103.94..233488.08 rows=24126 width=24) (actual time=23450.970..29163.774 rows=20588 loops=1)
                    Hash Cond: ("outer".dokumnr = "inner".dokumnr)
                    ->  Bitmap Heap Scan on rid  (cost=4127.51..175020.84 rows=317003 width=28) (actual time=2098.630..24362.104 rows=277294 loops=1)
                          Recheck Cond: (toode = 'X05'::bpchar)
                          ->  Bitmap Index Scan on rid_toode_idx  (cost=0.00..4127.51 rows=317003 width=0) (actual time=2086.772..2086.772 rows=280599 loops=1)
                                Index Cond: (toode = 'X05'::bpchar)
                    ->  Hash  (cost=47376.82..47376.82 rows=93444 width=4) (actual time=2860.088..2860.088 rows=105202 loops=1)
                          ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..47376.82 rows=93444 width=4) (actual time=0.088..2365.063 rows=105202 loops=1)
                                Index Cond: (kuupaev >= '2008-09-01'::date)
        ->  Hash  (cost=6.84..6.84 rows=84 width=19) (actual time=0.861..0.861 rows=84 loops=1)
              ->  Seq Scan on artliik  (cost=0.00..6.84 rows=84 width=19) (actual time=0.039..0.458 rows=84 loops=1)
Total runtime: 29650.696 ms

Aggregate  (cost=234278.53..234278.54 rows=1 width=0) (actual time=11131.392..11131.396 rows=1 loops=1)
  ->  Hash Left Join  (cost=52111.20..234218.21 rows=24126 width=0) (actual time=9179.703..11043.906 rows=20588 loops=1)
        Hash Cond: (("outer".grupp = "inner".grupp) AND ("outer".liik = "inner".liik))
        ->  Nested Loop  (cost=52103.94..233735.35 rows=24126 width=19) (actual time=9178.827..10858.383 rows=20588 loops=1)
              ->  Index Scan using toode_pkey on toode  (cost=0.00..6.01 rows=1 width=43) (actual time=10.252..10.264 rows=1 loops=1)
                    Index Cond: ('X05'::bpchar = toode)
              ->  Hash Join  (cost=52103.94..233488.08 rows=24126 width=24) (actual time=9168.552..10675.424 rows=20588 loops=1)
                    Hash Cond: ("outer".dokumnr = "inner".dokumnr)
                    ->  Bitmap Heap Scan on rid  (cost=4127.51..175020.84 rows=317003 width=28) (actual time=2129.814..7152.618 rows=277294 loops=1)
                          Recheck Cond: (toode = 'X05'::bpchar)
                          ->  Bitmap Index Scan on rid_toode_idx  (cost=0.00..4127.51 rows=317003 width=0) (actual time=2123.223..2123.223 rows=280599 loops=1)
                                Index Cond: (toode = 'X05'::bpchar)
                    ->  Hash  (cost=47376.82..47376.82 rows=93444 width=4) (actual time=1414.254..1414.254 rows=105202 loops=1)
                          ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..47376.82 rows=93444 width=4) (actual time=0.092..895.533 rows=105202 loops=1)
                                Index Cond: (kuupaev >= '2008-09-01'::date)
        ->  Hash  (cost=6.84..6.84 rows=84 width=19) (actual time=0.833..0.833 rows=84 loops=1)
              ->  Seq Scan on artliik  (cost=0.00..6.84 rows=84 width=19) (actual time=0.043..0.465 rows=84 loops=1)
Total runtime: 11131.694 ms

Environment: 3-8 concurrent users

PostgreSQL 8.1.4 on i686-pc-linux-gnu, compiled by GCC i686-pc-linux-gnu-gcc
(GCC) 3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)

# free
             total       used       free     shared    buffers     cached
Mem:       2075828    2008228      67600          0          0    1904552
-/+ buffers/cache:     103676    1972152
Swap:      3911816         76    3911740
> Query below seems to use indexes everywhere in the most optimal way.
> dokumnr column is of type int.
>
> Speed of this query varies wildly:
>
> In live db the fastest response I have got is 8 seconds.
> Re-running the same query after 10 seconds may take 60 seconds.
> Re-running it again after 10 seconds may take 114 seconds.
>
> Any idea how to speed it up?
>
> Is it possible to optimize it, will upgrading to 8.3.5 help, or should I
> add more RAM, disk or CPU speed?
>
> The real query contains a column list instead of the sum(1) used in the
> test below.
>
> Andrus.

Just a question: what are you doing with the 20,000 result rows?
> Just a question: what are you doing with the 20,000 result rows?

Those rows represent the monthly sales data of one item. They are used as
follows:

1. Detailed sales report for a month. This report can be browsed on screen
for monthly sales and ordering analysis.

2. Total reports. In those reports, sum(sales) and sum(quantity) are used
to get the total sales per day, week, month, and time for an item, and the
resulting rows are summed.

Andrus.
Andrus wrote:
> Query below seems to use indexes everywhere in the most optimal way.
> dokumnr column is of type int.
>
> Speed of this query varies wildly:
>
> In live db the fastest response I have got is 8 seconds.
> Re-running the same query after 10 seconds may take 60 seconds.
> Re-running it again after 10 seconds may take 114 seconds.
>
> Any idea how to speed it up?
>
> Is it possible to optimize it, will upgrading to 8.3.5 help, or should I
> add more RAM, disk or CPU speed?

At a quick glance, the plans look the same to me. The overall costs are
certainly identical. That means whatever is affecting the query times, it
isn't the query plan.

> Aggregate (cost=234278.53..234278.54 rows=1 width=0) (actual
> time=62164.496..62164.500 rows=1 loops=1)
> Total runtime: 62164.789 ms
>
> Aggregate (cost=234278.53..234278.54 rows=1 width=0) (actual
> time=40185.499..40185.503 rows=1 loops=1)
> Total runtime: 40186.102 ms
>
> Aggregate (cost=234278.53..234278.54 rows=1 width=0) (actual
> time=29650.398..29650.402 rows=1 loops=1)
> Total runtime: 29650.696 ms
>
> Aggregate (cost=234278.53..234278.54 rows=1 width=0) (actual
> time=11131.392..11131.396 rows=1 loops=1)
> Total runtime: 11131.694 ms

So - what other activity is happening on this machine? Either other
queries are taking up noticeable resources, or some other process is (it
might be disk activity from checkpointing, logging, or some other
application).

-- 
Richard Huxton
Archonet Ltd
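One quick way to see whether other queries are running at the moment of a
slow run is pg_stat_activity; a minimal sketch for 8.1 (current_query is
only populated when stats_command_string = on in postgresql.conf):

-- List what every backend is running right now (on 8.1 the pid column
-- is called procpid).
SELECT procpid, usename, query_start, current_query
FROM pg_stat_activity
ORDER BY query_start;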
Richard,

> At a quick glance, the plans look the same to me. The overall costs are
> certainly identical. That means whatever is affecting the query times,
> it isn't the query plan.
>
> So - what other activity is happening on this machine? Either other
> queries are taking up noticeable resources, or some other process is (it
> might be disk activity from checkpointing, logging, or some other
> application).

Thank you.
This is a dedicated server running only PostgreSQL, which serves approx
6 points of sale at this time.

Maybe those other clients make queries which invalidate a lot of data
loaded into the server cache. The next time, the server must read it again
from disk, which causes those performance differences.

top output is currently:

top - 13:13:10 up 22 days, 18:25,  1 user,  load average: 0.19, 0.12, 0.19
Tasks:  53 total,   2 running,  51 sleeping,   0 stopped,   0 zombie
Cpu(s): 13.7% us,  2.0% sy,  0.0% ni, 78.3% id,  6.0% wa,  0.0% hi,  0.0% si
Mem:   2075828k total,  2022808k used,    53020k free,        0k buffers
Swap:  3911816k total,       88k used,  3911728k free,  1908536k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5382 postgres  15   0  144m  43m  40m S 15.0  2.2   0:00.45 postmaster
 5358 postgres  15   0  152m  87m  75m S  0.3  4.3   0:00.97 postmaster
    1 root      16   0  1480  508  444 S  0.0  0.0   0:01.35 init
    2 root      34  19     0    0    0 S  0.0  0.0   0:00.01 ksoftirqd/0
    3 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 events/0
    4 root      10  -5     0    0    0 S  0.0  0.0   0:00.42 khelper
    5 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kthread
    7 root      10  -5     0    0    0 S  0.0  0.0   2:03.91 kblockd/0
    8 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kacpid
  115 root      13  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0
  114 root      15   0     0    0    0 S  0.0  0.0   8:49.67 kswapd0
  116 root      10  -5     0    0    0 S  0.0  0.0   0:10.32 xfslogd/0
  117 root      10  -5     0    0    0 S  0.0  0.0   0:39.96 xfsdatad/0
  706 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kseriod
  723 root      13  -5     0    0    0 S  0.0  0.0   0:00.00 kpsmoused
  738 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 ata/0
  740 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_0
  741 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_1
  742 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_2
  743 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_3
  762 root      10  -5     0    0    0 S  0.0  0.0   0:17.54 xfsbufd
  763 root      10  -5     0    0    0 S  0.0  0.0   0:00.68 xfssyncd
  963 root      16  -4  1712  528  336 S  0.0  0.0   0:00.24 udevd
 6677 root      15   0  1728  572  400 S  0.0  0.0   0:04.99 syslog-ng
 7128 postgres  16   0  140m  10m 9900 S  0.0  0.5   0:05.60 postmaster

A few seconds later:

top - 13:14:01 up 22 days, 18:26,  1 user,  load average: 1.72, 0.53, 0.32
Tasks:  52 total,   2 running,  50 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.3% us,  3.0% sy,  0.0% ni,  0.0% id, 91.0% wa,  0.0% hi,  0.7% si
Mem:   2075828k total,  2022692k used,    53136k free,        0k buffers
Swap:  3911816k total,       88k used,  3911728k free,  1905028k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1179 postgres  18   0  155m 136m 122m D  6.7  6.7   1:32.52 postmaster
 4748 postgres  15   0  145m 126m 122m D  1.3  6.2   0:14.38 postmaster
 5358 postgres  16   0  160m  98m  81m D  0.7  4.9   0:01.21 postmaster
    1 root      16   0  1480  508  444 S  0.0  0.0   0:01.35 init
    2 root      34  19     0    0    0 S  0.0  0.0   0:00.01 ksoftirqd/0
    3 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 events/0
    4 root      10  -5     0    0    0 S  0.0  0.0   0:00.42 khelper
    5 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kthread
    7 root      10  -5     0    0    0 S  0.0  0.0   2:03.97 kblockd/0
    8 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kacpid
  115 root      13  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0
  114 root      15   0     0    0    0 S  0.0  0.0   8:49.79 kswapd0
  116 root      10  -5     0    0    0 S  0.0  0.0   0:10.32 xfslogd/0
  117 root      10  -5     0    0    0 S  0.0  0.0   0:39.96 xfsdatad/0
  706 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kseriod
  723 root      13  -5     0    0    0 S  0.0  0.0   0:00.00 kpsmoused
  738 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 ata/0
  740 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_0
  741 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_1
  742 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_2
  743 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_3
  762 root      10  -5     0    0    0 S  0.0  0.0   0:17.54 xfsbufd
  763 root      10  -5     0    0    0 S  0.0  0.0   0:00.68 xfssyncd
  963 root      16  -4  1712  528  336 S  0.0  0.0   0:00.24 udevd
 6677 root      15   0  1728  572  400 S  0.0  0.0   0:04.99 syslog-ng

Andrus.
Andrus wrote:
> Richard,
>
>> At a quick glance, the plans look the same to me. The overall costs are
>> certainly identical. That means whatever is affecting the query times,
>> it isn't the query plan.
>>
>> So - what other activity is happening on this machine? Either other
>> queries are taking up noticeable resources, or some other process is (it
>> might be disk activity from checkpointing, logging, or some other
>> application).
>
> Thank you.
> This is a dedicated server running only PostgreSQL, which serves approx
> 6 points of sale at this time.
>
> Maybe those other clients make queries which invalidate a lot of data
> loaded into the server cache. The next time, the server must read it
> again from disk, which causes those performance differences.

In addition to "top" below, you'll probably find "vmstat 5" useful.

> top output is currently:
>
> top - 13:13:10 up 22 days, 18:25,  1 user,  load average: 0.19, 0.12, 0.19
> Tasks:  53 total,   2 running,  51 sleeping,   0 stopped,   0 zombie
> Cpu(s): 13.7% us,  2.0% sy,  0.0% ni, 78.3% id,  6.0% wa,  0.0% hi,
>  0.0% si
> Mem:   2075828k total,  2022808k used,    53020k free,        0k buffers
> Swap:  3911816k total,       88k used,  3911728k free,  1908536k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  5382 postgres  15   0  144m  43m  40m S 15.0  2.2   0:00.45 postmaster
>  5358 postgres  15   0  152m  87m  75m S  0.3  4.3   0:00.97 postmaster
>     1 root      16   0  1480  508  444 S  0.0  0.0   0:01.35 init

Looks pretty quiet.

> A few seconds later:
>
> top - 13:14:01 up 22 days, 18:26,  1 user,  load average: 1.72, 0.53, 0.32
> Tasks:  52 total,   2 running,  50 sleeping,   0 stopped,   0 zombie
> Cpu(s):  5.3% us,  3.0% sy,  0.0% ni,  0.0% id, 91.0% wa,  0.0% hi,
>  0.7% si
> Mem:   2075828k total,  2022692k used,    53136k free,        0k buffers
> Swap:  3911816k total,       88k used,  3911728k free,  1905028k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  1179 postgres  18   0  155m 136m 122m D  6.7  6.7   1:32.52 postmaster
>  4748 postgres  15   0  145m 126m 122m D  1.3  6.2   0:14.38 postmaster
>  5358 postgres  16   0  160m  98m  81m D  0.7  4.9   0:01.21 postmaster
>     1 root      16   0  1480  508  444 S  0.0  0.0   0:01.35 init

Here you're stuck waiting for disks (91.0% wa). Check out vmstat and
iostat to see what's happening.

-- 
Richard Huxton
Archonet Ltd
OK, so vmstat says you are IO-bound; this seems logical if the same plan
has widely varying timings...

Let's look at the usual suspects:

- how many dead rows in your tables? are your tables data, or bloat?
  (check vacuum verbose, etc)
- what's the size of the dataset relative to the RAM?

Now let's look more closely at the query:

explain analyze
SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
LEFT JOIN artliik USING (grupp, liik)
WHERE rid.toode = 'X05' AND dok.kuupaev >= '2008-09-01'

OK, so artliik is a very small table (84 rows):

Seq Scan on artliik  (cost=0.00..6.84 rows=84 width=19) (actual time=20.104..29.845 rows=84 loops=1)

I presume doing the query without artliik changes nothing to the runtime,
yes?

Let's look at the main part of the query:

FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode = 'X05' AND dok.kuupaev >= '2008-09-01'

Postgres's plan is logical. It starts by joining rid and dok since your
WHERE is on those:

->  Hash Join  (cost=52103.94..233488.08 rows=24126 width=24) (actual time=100386.921..114037.986 rows=20588 loops=1)
      Hash Cond: ("outer".dokumnr = "inner".dokumnr)
      ->  Bitmap Heap Scan on rid  (cost=4127.51..175020.84 rows=317003 width=28) (actual time=11119.932..76225.918 rows=277294 loops=1)
            Recheck Cond: (toode = 'X05'::bpchar)
            ->  Bitmap Index Scan on rid_toode_idx  (cost=0.00..4127.51 rows=317003 width=0) (actual time=11105.807..11105.807 rows=280599 loops=1)
                  Index Cond: (toode = 'X05'::bpchar)
      ->  Hash  (cost=47376.82..47376.82 rows=93444 width=4) (actual time=35082.427..35082.427 rows=105202 loops=1)
            ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..47376.82 rows=93444 width=4) (actual time=42.110..34586.331 rows=105202 loops=1)
                  Index Cond: (kuupaev >= '2008-09-01'::date)

Your problem here is that, no matter what, postgres will have to examine
- all rows where dok.kuupaev >= '2008-09-01',
- and all rows where rid.toode = 'X05'.

If you use dok.kuupaev >= '2007-09-01' (note: 2007) it will probably have
to scan many, many more rows.

If you perform this query often you could CLUSTER rid on (toode) and dok
on (kuupaev), but this can screw other queries. What is the meaning of the
columns?

To make this type of query faster I would tend to think about:

- materialized views
- denormalization (i.e. adding a column in one of your tables and a
  multicolumn index)
- materialized summary tables (i.e. summary of sales for last month, for
  instance)
Richard,

> In addition to "top" below, you'll probably find "vmstat 5" useful.

Thank you. During this query run (65 sec), vmstat 5 shows big values in
the bi, cs, and wa columns:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff   cache  si  so    bi    bo    in    cs us sy  id wa
 1  1     88  51444      0 1854404   0   0    17     6    15    13  5  1  83 10
 0  1     88  52140      0 1854304   0   0  3626    95   562   784 15 38   0 47
 0  1     92  51608      0 1855668   0   0 14116   103  1382  2294  4  8   0 88
 0  1     92  51620      0 1857256   0   1 15258    31  1210  1975  4  8   0 88
 0  2     92  50448      0 1859188   0   0 13118    19  1227  1982  3  7   0 90
 0  1     92  51272      0 1859088   0   0  7691    53   828  1284 14  4   0 82
 0  1     92  52472      0 1858792   0   0 10691     9   758   968  3  7   0 89
 0  2     92  51240      0 1858596   0   0  8204  7407   717  1063  2  5   0 93
 0  1     92  51648      0 1860388   0   0 20622   121  1118  2229 12  9   0 79
 2  1     92  50564      0 1861396   0   0 20994  3277   969  1681 15  8   0 76
 1  0     92  52180      0 1860192   0   0 18542    36   802  1276 36 12   0 51
 0  0     92  91872      0 1820948   0   1 15285    47   588   774  9 12  32 47
 0  0     92  91904      0 1820948   0   0     0     4   251    18  0  0 100  0
 0  0     92  92044      0 1820948   0   0     0     0   250    17  0  0 100  0
 0  0     92  91668      0 1821156   0   0    27    93   272    66  5  0  92  3
 0  0     92  91668      0 1821156   0   0     0    64   260    38  0  0 100  0
 0  0     92  91636      0 1821156   0   0     0   226   277    71  0  0 100  0
 0  0     92  91676      0 1821156   0   0     0    26   255    23  0  0 100  0

> Here you're stuck waiting for disks (91.0% wa). Check out vmstat and
> iostat to see what's happening.

Typing iostat returns

bash: iostat: command not found

It seems that it is not installed on this Gentoo box.

Andrus.
PFC,

thank you.

> OK, so vmstat says you are IO-bound; this seems logical if the same plan
> has widely varying timings...
>
> Let's look at the usual suspects:
>
> - how many dead rows in your tables? are your tables data, or bloat?
>   (check vacuum verbose, etc)

set search_path to firma2,public;
vacuum verbose dok; vacuum verbose rid

INFO:  vacuuming "firma2.dok"
INFO:  index "dok_pkey" now contains 1235086 row versions in 9454 pages
DETAIL:  100 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.16s/0.38u sec elapsed 0.77 sec.
INFO:  index "dok_dokumnr_idx" now contains 1235086 row versions in 9454 pages
DETAIL:  100 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.14s/0.40u sec elapsed 0.78 sec.
INFO:  index "dok_klient_idx" now contains 1235086 row versions in 18147 pages
DETAIL:  887 index row versions were removed.
3265 index pages have been deleted, 3033 are currently reusable.
CPU 0.36s/0.46u sec elapsed 31.87 sec.
INFO:  index "dok_krdokumnr_idx" now contains 1235086 row versions in 11387 pages
DETAIL:  119436 index row versions were removed.
1716 index pages have been deleted, 1582 are currently reusable.
CPU 0.47s/0.55u sec elapsed 63.38 sec.
INFO:  index "dok_kuupaev_idx" now contains 1235101 row versions in 10766 pages
DETAIL:  119436 index row versions were removed.
659 index pages have been deleted, 625 are currently reusable.
CPU 0.62s/0.53u sec elapsed 40.20 sec.
INFO:  index "dok_tasudok_idx" now contains 1235104 row versions in 31348 pages
DETAIL:  119436 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.18s/1.08u sec elapsed 118.97 sec.
INFO:  index "dok_tasudok_unique_idx" now contains 99 row versions in 97 pages
DETAIL:  98 index row versions were removed.
80 index pages have been deleted, 80 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.48 sec.
INFO:  index "dok_tasumata_idx" now contains 1235116 row versions in 11663 pages
DETAIL:  119436 index row versions were removed.
5340 index pages have been deleted, 5131 are currently reusable.
CPU 0.43s/0.56u sec elapsed 53.96 sec.
INFO:  index "dok_tellimus_idx" now contains 1235122 row versions in 11442 pages
DETAIL:  119436 index row versions were removed.
1704 index pages have been deleted, 1569 are currently reusable.
CPU 0.45s/0.59u sec elapsed 76.91 sec.
INFO:  index "dok_yksus_pattern_idx" now contains 1235143 row versions in 5549 pages
DETAIL:  119436 index row versions were removed.
529 index pages have been deleted, 129 are currently reusable.
CPU 0.19s/0.46u sec elapsed 2.72 sec.
INFO:  index "dok_doktyyp" now contains 1235143 row versions in 3899 pages
DETAIL:  119436 index row versions were removed.
188 index pages have been deleted, 13 are currently reusable.
CPU 0.14s/0.44u sec elapsed 1.40 sec.
INFO:  index "dok_sihtyksus_pattern_idx" now contains 1235143 row versions in 5353 pages
DETAIL:  119436 index row versions were removed.
286 index pages have been deleted, 5 are currently reusable.
CPU 0.13s/0.45u sec elapsed 3.01 sec.
INFO:  "dok": removed 119436 row versions in 13707 pages
DETAIL:  CPU 0.80s/0.37u sec elapsed 14.15 sec.
INFO:  "dok": found 119436 removable, 1235085 nonremovable row versions in 171641 pages
DETAIL:  2 dead row versions cannot be removed yet.
There were 1834279 unused item pointers.
0 pages are entirely empty.
CPU 6.56s/6.88u sec elapsed 450.54 sec.
INFO:  vacuuming "pg_toast.pg_toast_40595"
INFO:  index "pg_toast_40595_index" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_40595": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "firma2.rid"
INFO:  index "rid_pkey" now contains 3275197 row versions in 13959 pages
DETAIL:  38331 index row versions were removed.
262 index pages have been deleted, 262 are currently reusable.
CPU 0.42s/1.05u sec elapsed 58.56 sec.
INFO:  index "rid_dokumnr_idx" now contains 3275200 row versions in 14125 pages
DETAIL:  38331 index row versions were removed.
572 index pages have been deleted, 571 are currently reusable.
CPU 0.49s/1.14u sec elapsed 71.57 sec.
INFO:  index "rid_inpdokumnr_idx" now contains 3275200 row versions in 15103 pages
DETAIL:  38331 index row versions were removed.
579 index pages have been deleted, 579 are currently reusable.
CPU 0.66s/1.03u sec elapsed 68.38 sec.
INFO:  index "rid_toode_idx" now contains 3275224 row versions in 31094 pages
DETAIL:  38331 index row versions were removed.
2290 index pages have been deleted, 2290 are currently reusable.
CPU 1.39s/1.58u sec elapsed 333.82 sec.
INFO:  index "rid_rtellimus_idx" now contains 3275230 row versions in 7390 pages
DETAIL:  18591 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.18s/0.66u sec elapsed 1.78 sec.
INFO:  index "rid_toode_pattern_idx" now contains 3275230 row versions in 16310 pages
DETAIL:  17800 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.44s/1.04u sec elapsed 6.55 sec.
INFO:  "rid": removed 38331 row versions in 3090 pages
DETAIL:  CPU 0.20s/0.10u sec elapsed 5.49 sec.
INFO:  "rid": found 38331 removable, 3275189 nonremovable row versions in 165282 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 1878923 unused item pointers.
0 pages are entirely empty.
CPU 5.06s/7.27u sec elapsed 607.59 sec.

Query returned successfully with no result in 1058319 ms.

> - what's the size of the dataset relative to the RAM?

Db size is 7417 MB.
Relevant table sizes, in descending order by size:

 1  40595     dok                          2345 MB
 2  1214      pg_shdepend                  2259 MB
 3  40926     rid                          2057 MB
 6  1232      pg_shdepend_depender_index    795 MB
 7  1233      pg_shdepend_reference_index   438 MB
 8  44286     dok_tasudok_idx               245 MB
 9  44299     rid_toode_idx                 243 MB
10  44283     dok_klient_idx                142 MB
11  19103791  rid_toode_pattern_idx         127 MB
14  44298     rid_inpdokumnr_idx            118 MB
15  44297     rid_dokumnr_idx               110 MB
16  43573     rid_pkey                      109 MB
18  44288     dok_tasumata_idx               91 MB
19  44289     dok_tellimus_idx               89 MB
20  44284     dok_krdokumnr_idx              89 MB
21  44285     dok_kuupaev_idx                84 MB
23  43479     dok_pkey                       74 MB
24  44282     dok_dokumnr_idx                74 MB
25  19076304  rid_rtellimus_idx              58 MB
26  18663923  dok_yksus_pattern_idx          43 MB
27  18801591  dok_sihtyksus_pattern_idx      42 MB
29  18774881  dok_doktyyp                    30 MB
46  40967     toode                          13 MB

Server is an HP Proliant DL320 G3
http://h18000.www1.hp.com/products/quickspecs/12169_ca/12169_ca.HTML
CPU is a 2.93GHz Celeron with 256kb cache.

Server has 2 GB RAM.
It has a SATA RAID 0,1 integrated controller (1.5Gbps) and SAMSUNG HD160JJ
mirrored disks.
> Now let's look more closely at the query:
>
> explain analyze
> SELECT sum(1)
> FROM dok JOIN rid USING (dokumnr)
> JOIN toode USING (toode)
> LEFT JOIN artliik USING (grupp, liik)
> WHERE rid.toode = 'X05' AND dok.kuupaev >= '2008-09-01'
>
> I presume doing the query without artliik changes nothing to the runtime,
> yes?

Yes. After removing artliik from the join I got response times of 65 and
50 seconds, so this does not make a difference.

> Your problem here is that, no matter what, postgres will have to examine
> - all rows where dok.kuupaev >= '2008-09-01',
> - and all rows where rid.toode = 'X05'.
> If you use dok.kuupaev >= '2007-09-01' (note: 2007) it will probably have
> to scan many, many more rows.

Probably yes, since then it reads one more year of sales data.

> If you perform this query often you could CLUSTER rid on (toode) and dok
> on (kuupaev), but this can screw other queries.

Some reports are by sales date (dok.kuupaev) and customers; CLUSTERing rid
on (toode) slows them down. Also, autovacuum cannot do clustering.

> What is the meaning of the columns?

This is typical sales data:

-- Receipt headers:
DOK ( dokumnr SERIAL PRIMARY KEY,
      kuupaev DATE -- sales date
)

-- Receipt details
RID ( dokumnr INT,
      toode CHAR(20), -- item code
      CONSTRAINT rid_dokumnr_fkey FOREIGN KEY (dokumnr)
          REFERENCES dok (dokumnr),
      CONSTRAINT rid_toode_fkey FOREIGN KEY (toode)
          REFERENCES firma2.toode (toode)
)

-- Products
TOODE (
      toode CHAR(20) PRIMARY KEY
)

> To make this type of query faster I would tend to think about:
>
> - denormalization (i.e. adding a column in one of your tables and a
>   multicolumn index)

For this query it is possible to duplicate the kuupaev column into the rid
table. However, most of the time seems to go to scanning the rid table, so
I suspect that this will help.

> - materialized views
> - materialized summary tables (i.e. summary of sales for last month, for
>   instance)

There are about 1000 items and the reports are different.

Andrus.
Just the most important points:

1) The "dok" table contains 1235086 row versions in 171641 pages (with 8kB
pages this means about 1.4GB of data), but there are 1834279 unused item
pointers (i.e. about 60% of the space is wasted).

2) The "rid" table contains 3275189 rows in 165282 pages (with 8kB pages
this means about 1.3GB of data), but there are 1878923 unused item
pointers (i.e. about 30% of the space is wasted).

3) Don't forget to execute analyze after vacuuming (or vacuum analyze).

4) I'm not sure about the sizes reported by you (for example 2.3GB vs
1.5GB for the "dok" table) - the difference seems too large to me.

Anyway, the amount of wasted space seems significant to me - I'd try to
solve this first. Either by VACUUM FULL or by CLUSTER. The CLUSTER will
lock the table exclusively, but the results may be better (when sorting by
a well chosen index). Don't forget to run ANALYZE afterwards.

Several other things to consider:

1) Regarding the toode column - why are you using CHAR(20) when the values
are actually shorter? This may significantly increase the amount of space
required.

2) I've noticed the CPU used is a Celeron, which may negatively affect the
speed of hash computation. I'd try to replace it by something faster -
say, an INTEGER as an artificial primary key of the "toode" table, used as
a FK in the other tables. This might improve the "Bitmap Heap Scan on rid"
part, but yes - it's just a minor improvement compared to the "Hash Join"
part of the query.

Materialized views seem like a good idea to me, but maybe I'm not seeing
something. What do you mean by "reports are different"? If there are a lot
of rows for a given product / day, then creating an aggregated table with
(product code / day) as a primary key is quite simple. It may require a
lot of disk space, but it'll remove the hash join overhead. But if the
queries are very different, then it may be difficult to build such
materialized view(s).

regards
Tomas
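A sketch of what point 2 would look like in practice (all new names here
are invented for illustration; a real migration would also have to touch
every query that joins on toode):

-- Add an artificial integer key alongside the natural CHAR(20) key.
CREATE SEQUENCE toode_id_seq;
ALTER TABLE toode ADD COLUMN toode_id INTEGER;
UPDATE toode SET toode_id = nextval('toode_id_seq');
ALTER TABLE toode ALTER COLUMN toode_id SET NOT NULL;
ALTER TABLE toode ADD CONSTRAINT toode_toode_id_key UNIQUE (toode_id);

-- Backfill a matching column in rid and index it, so joins and filters
-- can hash and compare 4-byte integers instead of padded CHAR(20).
ALTER TABLE rid ADD COLUMN toode_id INTEGER;
UPDATE rid SET toode_id = t.toode_id FROM toode t WHERE rid.toode = t.toode;
CREATE INDEX rid_toode_id_idx ON rid (toode_id);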
Andrus wrote:
>> - what's the size of the dataset relative to the RAM?
>
> Db size is 7417 MB.
> Relevant table sizes, in descending order by size:
>
>  1  40595  dok                          2345 MB
>  2  1214   pg_shdepend                  2259 MB
>  6  1232   pg_shdepend_depender_index    795 MB
>  7  1233   pg_shdepend_reference_index   438 MB

These three are highly suspicious. They track dependencies between system
objects (so you can't drop function F because trigger T depends on it).

http://www.postgresql.org/docs/8.3/static/catalog-pg-shdepend.html

You've got 3.5GB of data there, which is a *lot* of dependencies.

Try "SELECT count(*) FROM pg_shdepend".

If it's not a million rows, then the table is bloated. Try (as postgres or
some other db superuser) a "vacuum full pg_shdepend" and a "reindex
pg_shdepend".

If it is a million rows, you'll need to find out why. Do you have a lot of
temporary tables that aren't being dropped, or something similar?

-- 
Richard Huxton
Archonet Ltd
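A rough way to confirm the bloat without counting all the rows is to
compare the planner's statistics for the table and its indexes (a sketch;
reltuples and relpages are only estimates from the last VACUUM/ANALYZE):

SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname IN ('pg_shdepend',
                  'pg_shdepend_depender_index',
                  'pg_shdepend_reference_index');
-- A few thousand live rows spread over hundreds of thousands of 8kB
-- pages means nearly all of the space is dead.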
Richard,

Thank you.

> Try "SELECT count(*) FROM pg_shdepend".

This query returns 3625 and takes 35 seconds to run.

> If it's not a million rows, then the table is bloated. Try (as postgres
> or some other db superuser) a "vacuum full pg_shdepend" and a "reindex
> pg_shdepend".

vacuum full verbose pg_shdepend

INFO:  vacuuming "pg_catalog.pg_shdepend"
INFO:  "pg_shdepend": found 16103561 removable, 3629 nonremovable row versions in 131425 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 49 to 49 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 1009387632 bytes.
131363 pages are or will become empty, including 0 at the end of the table.
131425 pages containing 1009387632 free bytes are potential move destinations.
CPU 2.12s/1.69u sec elapsed 52.66 sec.
INFO:  index "pg_shdepend_depender_index" now contains 3629 row versions in 101794 pages
DETAIL:  16103561 index row versions were removed.
101311 index pages have been deleted, 20000 are currently reusable.
CPU 20.12s/14.52u sec elapsed 220.66 sec.

After 400 seconds of running I got phone calls that the server was not
responding to other clients, so I was forced to cancel the "vacuum full
verbose pg_shdepend" command.

How can I run it so that other users can use the database at the same
time?

> If it is a million rows, you'll need to find out why. Do you have a lot
> of temporary tables that aren't being dropped, or something similar?

The application creates temporary tables in many places. Every sales
operation probably creates some temporary tables.

Should I change something in the configuration, change the application
(only a single POS application is used to access this db), or is the only
solution to manually run

vacuum full pg_shdepend;
reindex table pg_shdepend;

periodically?

How can I vacuum full pg_shdepend automatically so that other users can
work at the same time?

Hopefully this table size does not affect query speed.

Andrus.
> How can I vacuum full pg_shdepend automatically so that other users can
> work at the same time?

Your table is horribly bloated. You must use VACUUM FULL + REINDEX (as
superuser) on it; unfortunately, it is blocking, so you should wait for
Sunday night to do this, when no one will notice.

Meanwhile, you can always VACUUM it (as superuser) and REINDEX it.

And while you're at it, VACUUM FULL + REINDEX the entire database.

To avoid such annoyances in the future, you should ensure that autovacuum
runs properly; you should investigate this. If you use a cron'ed VACUUM
that does not run as superuser, then it will not be able to VACUUM the
system catalogs, and the problem will come back.
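The Sunday-night maintenance described above would look something like
this (run as superuser; both statements take an exclusive lock on the
table for their whole duration):

VACUUM FULL VERBOSE pg_shdepend;
REINDEX TABLE pg_shdepend;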
> Server has 2 GB RAM.
> It has a SATA RAID 0,1 integrated controller (1.5Gbps) and SAMSUNG
> HD160JJ mirrored disks.

You could perhaps run a little check on the performance of the RAID: is it
better than Linux software RAID? Does it leverage NCQ appropriately when
running queries in parallel?

> -- Receipt headers:
> DOK ( dokumnr SERIAL PRIMARY KEY,
>       kuupaev DATE -- sales date
> )
> -- Receipt details
> RID ( dokumnr INT,
>       toode CHAR(20), -- item code
>       CONSTRAINT rid_dokumnr_fkey FOREIGN KEY (dokumnr)
>           REFERENCES dok (dokumnr),
>       CONSTRAINT rid_toode_fkey FOREIGN KEY (toode)
>           REFERENCES firma2.toode (toode)
> )
> -- Products
> TOODE (
>       toode CHAR(20) PRIMARY KEY
> )

OK, so pretty straightforward:

dok <-(dokumnr)-> rid <-(toode)-> toode

toode.toode should really be an INT though.

>> explain analyze
>> SELECT sum(1)
>> FROM dok JOIN rid USING (dokumnr)
>> JOIN toode USING (toode)
>> LEFT JOIN artliik USING (grupp, liik)
>> WHERE rid.toode = 'X05' AND dok.kuupaev >= '2008-09-01'

By the way, note that the presence of the toode table in the query above
is not required at all, unless you use columns of toode in your
aggregates.

Let's play with that; after all, it's Friday night.

BEGIN;

CREATE TABLE orders (order_id INTEGER NOT NULL, order_date DATE NOT NULL);
CREATE TABLE products (product_id INTEGER NOT NULL, product_name TEXT NOT NULL);
CREATE TABLE orders_products (order_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL, padding1 TEXT, padding2 TEXT);

INSERT INTO products
SELECT n, 'product number ' || n::TEXT FROM generate_series(1,40000) AS n;

INSERT INTO orders
SELECT n, '2000-01-01'::date + (n/1000 * '1 DAY'::interval)
FROM generate_series(1,1000000) AS n;

SET work_mem TO '1GB';

INSERT INTO orders_products
SELECT a, b, 'aibaifbaurgbyioubyfazierugybfoaybofauez',
       'hfohbdsqbhjhqsvdfiuazvfgiurvgazrhbazboifhaoifh'
FROM (SELECT DISTINCT (1+(n/10))::INTEGER AS a,
             (1+(random()*39999))::INTEGER AS b
      FROM generate_series(1,9999999) AS n) AS x;

DELETE FROM orders_products
WHERE product_id NOT IN (SELECT product_id FROM products);
DELETE FROM orders_products
WHERE order_id NOT IN (SELECT order_id FROM orders);

ALTER TABLE orders ADD PRIMARY KEY (order_id);
ALTER TABLE products ADD PRIMARY KEY (product_id);
ALTER TABLE orders_products ADD PRIMARY KEY (order_id, product_id);
ALTER TABLE orders_products ADD FOREIGN KEY (product_id)
    REFERENCES products (product_id) ON DELETE CASCADE;
ALTER TABLE orders_products ADD FOREIGN KEY (order_id)
    REFERENCES orders (order_id) ON DELETE CASCADE;

CREATE INDEX orders_date ON orders (order_date);

COMMIT;

SET work_mem TO DEFAULT;
ANALYZE;

With the following query:

EXPLAIN ANALYZE
SELECT sum(1)
FROM orders
JOIN orders_products USING (order_id)
JOIN products USING (product_id)
WHERE orders.order_date BETWEEN '2000-01-01' AND '2000-02-01'
AND products.product_id = 12345;

I get the following results:

orders_products has a PK index on (order_id, product_id). I dropped it.
No index on orders_products:
=> Big seq scan (16 seconds)

Index on orders_products (product_id):

Aggregate  (cost=2227.22..2227.23 rows=1 width=0) (actual time=108.204..108.205 rows=1 loops=1)
  ->  Nested Loop  (cost=1312.30..2227.20 rows=7 width=0) (actual time=105.929..108.191 rows=6 loops=1)
        ->  Index Scan using products_pkey on products  (cost=0.00..8.27 rows=1 width=4) (actual time=0.010..0.014 rows=1 loops=1)
              Index Cond: (product_id = 12345)
        ->  Hash Join  (cost=1312.30..2218.85 rows=7 width=4) (actual time=105.914..108.167 rows=6 loops=1)
              Hash Cond: (orders_products.order_id = orders.order_id)
              ->  Bitmap Heap Scan on orders_products  (cost=6.93..910.80 rows=232 width=8) (actual time=0.194..2.175 rows=246 loops=1)
                    Recheck Cond: (product_id = 12345)
                    ->  Bitmap Index Scan on orders_products_product_id  (cost=0.00..6.87 rows=232 width=0) (actual time=0.129..0.129 rows=246 loops=1)
                          Index Cond: (product_id = 12345)
              ->  Hash  (cost=949.98..949.98 rows=28432 width=4) (actual time=105.696..105.696 rows=31999 loops=1)
                    ->  Index Scan using orders_date on orders  (cost=0.00..949.98 rows=28432 width=4) (actual time=0.059..64.443 rows=31999 loops=1)
                          Index Cond: ((order_date >= '2000-01-01'::date) AND (order_date <= '2000-02-01'::date))
Total runtime: 108.357 ms

(don't trust this timing, it's a bit cached; this is the same plan as you
get)

Index on orders_products (product_id) and orders_products (order_id):
=> Same plan

Note that in this case, a smarter planner would use the new index to
perform a BitmapAnd before hitting the heap to get the rows.

Index on (order_id, product_id) plus orders_products (product_id), and
index on (order_id, product_id) alone:
=> Different plan, slower (especially in the second case).

If an "order_date" column is added to the "orders_products" table to make
it into some kind of materialized view:

CREATE TABLE orders_products2 AS
SELECT orders.order_id, orders.order_date, product_id
FROM orders JOIN orders_products USING (order_id);

And an index is created on (product_id, order_date), we get this:

Aggregate  (cost=100.44..100.45 rows=1 width=0) (actual time=0.176..0.177 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..100.42 rows=7 width=0) (actual time=0.083..0.168 rows=6 loops=1)
        ->  Index Scan using products_pkey on products  (cost=0.00..8.27 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=1)
              Index Cond: (product_id = 12345)
        ->  Nested Loop  (cost=0.00..92.08 rows=7 width=4) (actual time=0.068..0.147 rows=6 loops=1)
              ->  Index Scan using orders_products2_pid_date on orders_products2  (cost=0.00..33.50 rows=7 width=8) (actual time=0.053..0.076 rows=6 loops=1)
                    Index Cond: ((product_id = 12345) AND (order_date >= '2000-01-01'::date) AND (order_date <= '2000-02-01'::date))
              ->  Index Scan using orders_pkey on orders  (cost=0.00..8.36 rows=1 width=4) (actual time=0.008..0.009 rows=1 loops=6)
                    Index Cond: (orders.order_id = orders_products2.order_id)
Total runtime: 0.246 ms

An index on (order_date, product_id) produces the same effect; the index
scan is slower, but the heap scan uses the same amount of IO.

Two indexes, (order_date) and (product_id), strangely do not produce a
BitmapAnd; instead a plan with more IO is chosen.

>>> - denormalization (i.e. adding a column in one of your tables and a
>>> multicolumn index)
>
> For this query it is possible to duplicate the kuupaev column into the
> rid table. However, most of the time seems to go to scanning the rid
> table, so I suspect that this will help.

Yes, most of the time goes to scanning the rid table, and this is the time
that should be reduced.
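The transcript does not show the statement that created the index used in
the orders_products2 plan above, but from the index name and the
surrounding description it was presumably:

CREATE INDEX orders_products2_pid_date
    ON orders_products2 (product_id, order_date);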
Adding a date column in "rid" would allow you to create a multicolumn
index on rid (dokumnr, date) which would massively speed up the particular
query above. If you don't create a multicolumn index, this denormalization
is useless. Basically, instead of scanning all rows in "rid" where
toode = 'X05', only the rows in the date range of interest would be
visited.

>>> - materialized views
>>> - materialized summary tables (i.e. summary of sales for last month,
>>> for instance)
>
> There are about 1000 items and the reports are different.

It all depends on what you put in your summary table...
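A sketch of that denormalization against Andrus's schema (column and index
names assumed; for this particular query the useful multicolumn index is
on (toode, kuupaev)):

-- Copy the sales date down into the detail table...
ALTER TABLE rid ADD COLUMN kuupaev DATE;
UPDATE rid SET kuupaev = dok.kuupaev
FROM dok WHERE rid.dokumnr = dok.dokumnr;

-- ...and index item code + date together, so the scan only touches the
-- requested date range for the requested item.
CREATE INDEX rid_toode_kuupaev_idx ON rid (toode, kuupaev);

-- The application (or a trigger) must keep rid.kuupaev in sync with
-- dok.kuupaev from here on.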
Tomas,

Thank you.

> Just the most important points:
>
> 1) The "dok" table contains 1235086 row versions in 171641 pages (with
> 8kB pages this means about 1.4GB of data), but there are 1834279 unused
> item pointers (i.e. about 60% of the space is wasted).
>
> 2) The "rid" table contains 3275189 rows in 165282 pages (with 8kB pages
> this means about 1.3GB of data), but there are 1878923 unused item
> pointers (i.e. about 30% of the space is wasted).
>
> 3) Don't forget to execute analyze after vacuuming (or vacuum analyze).

autovacuum is running. So if I understand properly, I must run

VACUUM FULL ANALYZE dok;
VACUUM FULL ANALYZE rid;

Those commands would probably cause the server to stop responding to other
clients, like vacuum full pg_shdepend did.

Should vacuum_cost_delay = 2000 allow other users to work while those
commands are running?

> 4) I'm not sure about the sizes reported by you (for example 2.3GB vs
> 1.5GB for the "dok" table) - the difference seems too large to me.

I used pg_total_relation_size(), so the 2.3 GB includes indexes as well:

 8  44286     dok_tasudok_idx            245 MB
10  44283     dok_klient_idx             142 MB
18  44288     dok_tasumata_idx            91 MB
19  44289     dok_tellimus_idx            89 MB
20  44284     dok_krdokumnr_idx           89 MB
21  44285     dok_kuupaev_idx             84 MB
22  43531     makse_pkey                  77 MB
23  43479     dok_pkey                    74 MB
24  44282     dok_dokumnr_idx             74 MB
26  18663923  dok_yksus_pattern_idx       43 MB
27  18801591  dok_sihtyksus_pattern_idx   42 MB

> Anyway, the amount of wasted space seems significant to me - I'd try to
> solve this first. Either by VACUUM FULL or by CLUSTER. The CLUSTER will
> lock the table exclusively, but the results may be better (when sorting
> by a well chosen index). Don't forget to run ANALYZE afterwards.

How can I invoke those commands so that other clients can continue
working? I'm using 8.1.4. Log files show that autovacuum is running.

I'm planning the following solution:

1. Set vacuum_cost_delay = 2000.

2. Run the following commands periodically, in this order:

VACUUM FULL;
VACUUM FULL pg_shdepend;
CLUSTER rid_toode_idx ON rid;
CLUSTER dok_kuupaev_idx ON dok;
REINDEX DATABASE mydb;
REINDEX SYSTEM mydb;
ANALYZE;

Are all those commands required, or can something be left out?

> Several other things to consider:
>
> 1) Regarding the toode column - why are you using CHAR(20) when the
> values are actually shorter? This may significantly increase the amount
> of space required.

There may be some products whose codes are up to 20 characters long.
PostgreSQL does not store trailing spaces in the db, so this does *not*
affect space.

> 2) I've noticed the CPU used is a Celeron, which may negatively affect
> the speed of hash computation. I'd try to replace it by something faster
> - say, an INTEGER as an artificial primary key of the "toode" table,
> used as a FK in the other tables. This might improve the "Bitmap Heap
> Scan on rid" part, but yes - it's just a minor improvement compared to
> the "Hash Join" part of the query.

The natural key toode CHAR(20) is used widely in different queries.
Replacing it with an INT surrogate key requires a major application
rewrite. Should I add surrogate INT key columns to the toode and rid
tables and measure the test query speed in that case?

> Materialized views seem like a good idea to me, but maybe I'm not seeing
> something. What do you mean by "reports are different"? If there are a
> lot of rows for a given product / day, then creating an aggregated table
> with (product code / day) as a primary key is quite simple. It may
> require a lot of disk space, but it'll remove the hash join overhead.
> But if the queries are very different, then it may be difficult to build
> such materialized view(s).

The log file shows that mostly only queries of this shape are slow:

SELECT ...
FROM dok JOIN rid USING (dokumnr)
JOIN ProductId USING (ProductId)
WHERE rid.ProductId LIKE :p1 || '%' AND dok.SaleDate>=:p2

:p1 and :p2 are parameters, different for different queries.

dok contains several years of data. :p2 usually covers only the few previous months, or at most about a year back.
The SELECT column list contains a fixed list of known columns from all tables.

How can I create an index or materialized view to optimize these types of queries ?

Andrus.
> The log file shows that mostly only queries of this shape are slow:
>
> SELECT ...
> FROM dok JOIN rid USING (dokumnr)
> JOIN ProductId USING (ProductId)
> WHERE rid.ProductId LIKE :p1 || '%' AND dok.SaleDate>=:p2
>
> :p1 and :p2 are parameters, different for different queries.
>
> dok contains several years of data. :p2 usually covers only the few
> previous months, or at most about a year back.
> The SELECT column list contains a fixed list of known columns from all tables.
>
> How can I create an index or materialized view to optimize these types of
> queries ?

I would remove some granularity, for instance create a summary table (materialized view) by month :

- date (contains the first day of the month)
- product_id
- total quantity, total price sold in given month

You get the idea.

If your products belong to categories, and you make queries on all the products in a category, it could be worth making a summary table for categories also.
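A sketch of such a monthly summary table; the quantity and price column names on rid are assumptions here, since the real column names are not given in the thread:

CREATE TABLE sales_by_month AS
SELECT date_trunc('month', dok.SaleDate)::date AS month,
       rid.ProductId,
       sum(rid.quantity)             AS total_qty,
       sum(rid.quantity * rid.price) AS total_value
FROM dok JOIN rid USING (dokumnr)
GROUP BY 1, 2;

CREATE UNIQUE INDEX sales_by_month_pk ON sales_by_month (ProductId, month);
ANALYZE sales_by_month;

-- refreshed periodically, e.g. nightly: TRUNCATE and re-run the INSERT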
>> How to vacuum full pg_shdepend automatically so that other users can
>> work at same time ?
>
> Your table is horribly bloated.
> You must use VACUUM FULL + REINDEX (as superuser) on it, however
> unfortunately, it is blocking.
> Therefore, you should wait for sunday night to do this, when noone will
> notice.

Shops are closed late at night only for a short time, including Sunday night.
That window may be shorter than the time required to complete the VACUUM command.

I discovered the vacuum_cost_delay=2000 option. Will this remove the blocking issue and allow vacuum full to work ?

> Meanwhile, you can always VACUUM it (as superuser) and REINDEX it.

I expect that autovacuum does this automatically.

> And while you're at it, VACUUM FULL + reindex the entire database.
> To avoid such annoyances in the future, you should ensure that autovacuum
> runs properly ; you should investigate this. If you use a cron'ed VACUUM
> that does not run as superuser, then it will not be able to VACUUM the
> system catalogs, and the problem will come back.

autovacuum is turned on in the postgresql.conf file.
The log file shows a lot of messages every day saying the databases are vacuumed. I assume it is running as user postgres.

I don't understand how autovacuum can avoid this: it does not perform vacuum full, so pg_shdepend and my tables become bloated again and again.

Andrus.
On Friday 21 November 2008, "Andrus" <kobruleht2@hot.ee> wrote:
> Those commands will probably cause the server to stop responding to other
> clients, like vacuum full pg_shdepend did.
>
> Should vacuum_cost_delay = 2000 allow other users to work when running
> those commands ?

Any vacuum full or cluster will lock out other clients. A high vacuum_cost_delay will just make the vacuum run slower.

-- Corporations will ingest natural resources and defecate garbage until all resources are depleted, debt can no longer be repaid and our money becomes worthless - Jay Hanson
Andrus wrote: > I discovered vacuum_cost_delay=2000 option. Will this remove blocking > issue and allow vacuum full to work ? No. Are you really using vacuum_cost_delay=2000? If so, therein lies your problem. That's a silly value to use for that variable. Useful values are in the 20-40 range probably, or maybe 10-100 being extremely generous. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
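For reference, the value is in milliseconds, so a setting in the range Alvaro suggests would look like this in postgresql.conf:

vacuum_cost_delay = 20    # ms to sleep when the vacuum cost limit is reached; 0 disables the delay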
Alvaro,

> Are you really using vacuum_cost_delay=2000? If so, therein lies your
> problem. That's a silly value to use for that variable. Useful values
> are in the 20-40 range probably, or maybe 10-100 being extremely
> generous.

Thank you.
My 8.1.4 postgresql.conf does not contain such an option, so vacuum_cost_delay is probably off.
Since the docs do not recommend any value, I had planned to use 2000.

Will a value of 30 allow other clients to work when VACUUM FULL is running ?

Uncommented relevant values in the postgresql.conf file are:

shared_buffers = 15000
work_mem = 512
maintenance_work_mem = 131072
fsync = on
effective_cache_size = 70000
log_min_duration_statement = 30000

Andrus.
Andrus wrote:
> Will a value of 30 allow other clients to work when VACUUM FULL is running ?

1. vacuum_cost_delay does not affect vacuum full
2. vacuum full is always blocking, regardless of settings

So I gather you're not doing any vacuuming, eh?

-- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
PFC <lists@peufeu.com> writes: > Index on orders_products( product_id ) and orders_products( order_id ): > => Same plan > Note that in this case, a smarter planner would use the new index to > perform a BitmapAnd before hitting the heap to get the rows. Considering that the query has no constraint on orders_products.order_id, I'm not sure what you think the extra index is supposed to be used *for*. (Well, we could put orders as the outside of a nestloop and then we'd have such a constraint, but with 30000 orders rows to process that plan would lose big.) (And yes, the planner did consider such a plan along the way. See choose_bitmap_and.) regards, tom lane
Alvaro,

> 1. vacuum_cost_delay does not affect vacuum full
> 2. vacuum full is always blocking, regardless of settings

So the only way is to disable other database access while vacuum full is running.

> So I gather you're not doing any vacuuming, eh?

Log files are full of messages like those below every day, so I hope vacuum is running well, isn't it ?

Andrus.

2008-11-19 00:00:48 EET 11728 1 LOG: autovacuum: processing database "postgres"
2008-11-19 00:01:48 EET 11729 1 LOG: autovacuum: processing database "mydb1"
2008-11-19 00:02:48 EET 11730 1 LOG: autovacuum: processing database "emydb1"
2008-11-19 00:03:48 EET 11731 1 LOG: autovacuum: processing database "template1"
2008-11-19 00:04:48 EET 11732 1 LOG: autovacuum: processing database "testmydb1"
2008-11-19 00:05:48 EET 11733 1 LOG: autovacuum: processing database "mydb3"
2008-11-19 00:06:48 EET 11734 1 LOG: autovacuum: processing database "postgres"
2008-11-19 00:07:48 EET 11735 1 LOG: autovacuum: processing database "mydb1"
2008-11-19 00:08:48 EET 11736 1 LOG: autovacuum: processing database "emydb1"
2008-11-19 00:09:48 EET 11737 1 LOG: autovacuum: processing database "template1"
2008-11-19 00:10:48 EET 11750 1 LOG: autovacuum: processing database "testmydb1"
2008-11-19 00:11:48 EET 11751 1 LOG: autovacuum: processing database "mydb3"
2008-11-19 00:12:48 EET 11752 1 LOG: autovacuum: processing database "postgres"
2008-11-19 00:13:48 EET 11753 1 LOG: autovacuum: processing database "mydb1"
2008-11-19 00:14:48 EET 11754 1 LOG: autovacuum: processing database "emydb1"
2008-11-19 00:15:48 EET 11755 1 LOG: autovacuum: processing database "template1"
2008-11-19 00:16:48 EET 11756 1 LOG: autovacuum: processing database "testmydb1"
2008-11-19 00:17:48 EET 11757 1 LOG: autovacuum: processing database "mydb3"
...
Andrus wrote:
>> So I gather you're not doing any vacuuming, eh?
>
> Log files are full of messages like those below every day,
> so I hope vacuum is running well, isn't it ?

This does not really mean that autovacuum has done anything in the databases. If the times are consistently separated by 1 min, then it's possible that it always exits without doing anything. In such an old release we didn't have any decent logging mechanism in autovacuum :-( You can change log_min_messages to debug2 to see if it actually does anything or not.

I suggest you connect to the problem database (and then to all others, just to be sure) and run "vacuum" (no full).

-- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
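That is, set log_min_messages = debug2 in postgresql.conf (and reload), then connect to each database and run a plain, non-blocking vacuum:

VACUUM VERBOSE;   -- lazy vacuum: records free space in the FSM without locking out other clients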
> Thank you.
> My 8.1.4 postgresql.conf does not contain such an option, so
> vacuum_cost_delay is probably off.
> Since the docs do not recommend any value, I had planned to use 2000.
>
> Will a value of 30 allow other clients to work when VACUUM FULL is running ?

No. As someone already noted, VACUUM FULL is blocking anyway (and does not use this value at all).

> Uncommented relevant values in the postgresql.conf file are:
>
> shared_buffers = 15000
> work_mem = 512

I'd consider increasing this value a little - 0.5 MB seems too low to me (but not necessarily).

> maintenance_work_mem = 131072
> fsync = on
> effective_cache_size = 70000

Well, your server has 2GB of RAM and usually it's recommended to set this value to about 60-70% of your RAM, so using 540MB (about 25%) seems quite low. Anyway, this is just a hint to PostgreSQL; it does not increase memory consumption - it's just an estimate of how much data is cached by the kernel.

Anyway, I don't expect these values to have a significant effect on the issue discussed in this thread.

regards
Tomas
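Translated into 8.1 units (work_mem in kB, effective_cache_size in 8kB pages), that advice would look roughly like the following in postgresql.conf - the exact numbers here are illustrative, not tuned:

work_mem = 4096                # ~4 MB per sort/hash operation, instead of 0.5 MB
effective_cache_size = 160000  # ~1.25 GB, i.e. roughly 60% of 2 GB RAM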
> 2. Run the following commands periodically in this order:
>
> VACUUM FULL;
> vacuum full pg_shdepend;
> CLUSTER rid on (toode);
> CLUSTER dok on (kuupaev);
> REINDEX DATABASE mydb;
> REINDEX SYSTEM mydb;
> ANALYZE;
>
> Are all those commands required, or can something be left out ?

Running CLUSTER after VACUUM FULL is just a waste of time. In my experience CLUSTER is actually faster in the case of such heavily bloated tables - I think this is caused by the fact that it creates the indexes from scratch instead of updating them (as VACUUM FULL does). So CLUSTER effectively performs a REINDEX as well, and I'd just run

CLUSTER rid ON rid_pkey;
CLUSTER dok ON dok_pkey;
ANALYZE rid;
ANALYZE dok;

Clustering by other indexes might give better performance; using primary keys is just a safe guess here. This should improve the performance of your query, and it seems these two tables are the most bloated ones. I wouldn't do the same maintenance on the other tables now - it's just a waste of time.

>> Several other things to consider:
>>
>> 1) Regarding the toode column - why are you using CHAR(20) when the values
>> are actually shorter? This may significantly increase the amount of space
>> required.
>
> There may be some products whose codes may be up to 20 characters.
> PostgreSQL does not hold trailing spaces in the db, so this does *not*
> affect space.

OK, I hadn't realized this. You're right.

>> 2) I've noticed the CPU used is Celeron, which may negatively affect the
>> speed of hash computation. I'd try to replace it by something faster - say
>> INTEGER as an artificial primary key of the "toode" table and using it as
>> a FK in other tables. This might improve the "Bitmap Heap Scan on rid"
>> part, but yes - it's just a minor improvement compared to the "Hash Join"
>> part of the query.
>
> The natural key Toode CHAR(20) is used widely in different queries.
> Replacing it with an INT surrogate key requires a major application rewrite.
>
> Should I add surrogate INT key columns to the toode and rid tables and
> measure test query speed in that case?

Test it. Create tables with fake data, and compare the performance with and without the surrogate keys. Using a simple data type instead of text gave me a huge performance boost. For example, one of my colleagues used VARCHAR(15) to store IP addresses and then used them to join tables (and suffered from the poor performance). Replacing them with INET improved the speed by several orders of magnitude.

>> Materialized views seem like a good idea to me, but maybe I'm not seeing
>> something. What do you mean by "reports are different"? If there is a lot
>> of rows for a given product / day, then creating an aggregated table with
>> (product code / day) as a primary key is quite simple. It may require a
>> lot of disk space, but it'll remove the hash join overhead. But if the
>> queries are very different, then it may be difficult to build such
>> materialized view(s).
>
> The log file shows that mostly only queries of this shape are slow:
>
> SELECT ...
> FROM dok JOIN rid USING (dokumnr)
> JOIN ProductId USING (ProductId)
> WHERE rid.ProductId LIKE :p1 || '%' AND dok.SaleDate>=:p2
>
> :p1 and :p2 are parameters, different for different queries.
>
> dok contains several years of data. :p2 usually covers only the few
> previous months, or at most about a year back.
> The SELECT column list contains a fixed list of known columns from all tables.
>
> How can I create an index or materialized view to optimize these types of
> queries ?
Well, difficult to answer without detailed information about the queries you want to run, aggregated values, etc. Materialized views are a world of their own, and the best solution depends on (for example):

1) what aggregated values you are interested in (additive values are the most primitive ones, while VARIANCE etc. make it difficult)

2) do you need current data, or is it OK that today's data are not available till midnight (for example)?

Another thing you have to consider is whether you want to create a materialized view with final or intermediate data and then compute the final data somehow (for example monthly totals from daily totals).

The most primitive (but often sufficient) solution is recreating the materialized view periodically (for example every midnight). In your case it'd mean running something like

CREATE TABLE materialized_view AS SELECT ... your query here ...
GROUP BY productId, saleDate

This gives you daily totals for each product - the clients can then run another query to compute the final data.

But of course, if you need to maintain 'current' data, you may create a set of triggers to update the materialized view, either after each modification or (more sophisticated) when it's needed.

See for example a great presentation from this year's PGCon:

http://www.pgcon.org/2008/schedule/events/69.en.html

regards
Tomas
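A minimal sketch of the trigger-maintained variant, assuming a detail table order_rows with sale_date and quantity columns (all names here are hypothetical, and concurrent inserts of the same key could race - good enough for an experiment, not for production):

CREATE TABLE sales_daily (
    product_id char(20) NOT NULL,
    sale_date  date     NOT NULL,
    total_qty  numeric  NOT NULL DEFAULT 0,
    PRIMARY KEY (product_id, sale_date)
);

CREATE OR REPLACE FUNCTION sales_daily_add() RETURNS trigger AS $$
BEGIN
    -- try to bump the existing daily total first
    UPDATE sales_daily
       SET total_qty = total_qty + NEW.quantity
     WHERE product_id = NEW.product_id AND sale_date = NEW.sale_date;
    IF NOT FOUND THEN
        INSERT INTO sales_daily (product_id, sale_date, total_qty)
        VALUES (NEW.product_id, NEW.sale_date, NEW.quantity);
    END IF;
    RETURN NULL;  -- return value of an AFTER trigger is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sales_daily_trg
    AFTER INSERT ON order_rows
    FOR EACH ROW EXECUTE PROCEDURE sales_daily_add();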
> If it's not a million rows, then the table is bloated. Try (as postgres
> or some other db superuser) "vacuum full pg_shdepend" and a "reindex
> pg_shdepend".

reindex table pg_shdepend causes the error

ERROR: shared table "pg_shdepend" can only be reindexed in stand-alone mode

vacuum full verbose pg_shdepend seems to work, but the indexes are still bloated.
How to remove the index bloat ? Sizes after vacuum full are below; the pg_shdepend size of 1234 MB includes its index sizes, so the indexes are almost entirely bloat.

4 1214 pg_catalog.pg_shdepend 1234 MB
6 1232 pg_catalog.pg_shdepend_depender_index 795 MB
7 1233 pg_catalog.pg_shdepend_reference_index 439 MB

Andrus.

vacuum full verbose pg_shdepend;

INFO: vacuuming "pg_catalog.pg_shdepend"
INFO: "pg_shdepend": found 254 removable, 3625 nonremovable row versions in 131517 pages
DETAIL: 0 dead row versions cannot be removed yet.
Nonremovable row versions range from 49 to 49 bytes long.
There were 16115259 unused item pointers.
Total free space (including removable row versions) is 1010091872 bytes.
131456 pages are or will become empty, including 8 at the end of the table.
131509 pages containing 1010029072 free bytes are potential move destinations.
CPU 2.08s/0.92u sec elapsed 63.51 sec.
INFO: index "pg_shdepend_depender_index" now contains 3625 row versions in 101794 pages
DETAIL: 254 index row versions were removed.
101611 index pages have been deleted, 20000 are currently reusable.
CPU 0.87s/0.28u sec elapsed 25.44 sec.
INFO: index "pg_shdepend_reference_index" now contains 3625 row versions in 56139 pages
DETAIL: 254 index row versions were removed.
56076 index pages have been deleted, 20000 are currently reusable.
CPU 0.51s/0.15u sec elapsed 23.10 sec.
INFO: "pg_shdepend": moved 1518 row versions, truncated 131517 to 25 pages
DETAIL: CPU 5.26s/2.39u sec elapsed 89.93 sec.
INFO: index "pg_shdepend_depender_index" now contains 3625 row versions in 101794 pages
DETAIL: 1518 index row versions were removed.
101609 index pages have been deleted, 20000 are currently reusable.
CPU 0.94s/0.28u sec elapsed 24.61 sec.
INFO: index "pg_shdepend_reference_index" now contains 3625 row versions in 56139 pages
DETAIL: 1518 index row versions were removed.
56088 index pages have been deleted, 20000 are currently reusable.
CPU 0.54s/0.14u sec elapsed 21.11 sec.

Query returned successfully with no result in 253356 ms
If autovacuum was working and your tables still got very bloated, it may be because your free space map is not configured large enough. What is your value for max_fsm_pages? The effect of having max_fsm_pages or max_fsm_relations too small is bloating of tables and indexes.

Setting it larger than needed will not use a lot of memory. Since you had over 130000 free pages in just one table in your previous message, make sure it is set to at least 150000. This may be overkill with regular vacuum or autovacuum; however, it's much better to have this too large than too small.

Your server has 2GB of RAM? You should make sure your shared_buffers is between 100MB and 400MB if this is a dedicated server, or more in some cases.

If you can, plan to migrate to 8.3 (or 8.4 early next year). Vacuum, bloating, and the configuration related to these have improved a great deal since 8.1.

Although your indexes remain bloated, the fact that they have been vacuumed, combined with a large enough value set in max_fsm_pages, means that they should not get any larger. I am not sure how to force these to be smaller. The larger size, after a vacuum and with a large enough max_fsm_pages value, will also not cause a performance problem. It will waste some disk space and be a bit slower to access, but only very slightly.
On Fri, 21 Nov 2008 21:07:02 +0100, Tom Lane <tgl@sss.pgh.pa.us> wrote: > PFC <lists@peufeu.com> writes: >> Index on orders_products( product_id ) and orders_products( order_id ): >> => Same plan > >> Note that in this case, a smarter planner would use the new index to >> perform a BitmapAnd before hitting the heap to get the rows. > > Considering that the query has no constraint on > orders_products.order_id, I'm not sure what you think the extra index is > supposed to be used *for*. > > (Well, we could put orders as the outside of a nestloop and then we'd > have such a constraint, but with 30000 orders rows to process that plan > would lose big.) > > (And yes, the planner did consider such a plan along the way. > See choose_bitmap_and.) > > regards, tom lane I think I didn't express myself correctly... Here the indexes are small (therefore well cached) but the orders_products table is large (and not cached). To reproduce this, I put this table on a crummy slow external USB drive. Between each of the following queries, pg was stopped, the USB drive unmounted, remounted, and pg restarted, to purge orders_products table out of all caches. I also modified the statistical distribution (see init script at bottom of message). EXPLAIN ANALYZE SELECT count(*) FROM orders JOIN orders_products USING (order_id) WHERE orders.order_date BETWEEN '2000-01-01' AND '2000-02-01' AND orders_products.product_id = 2345; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=5431.93..5431.94 rows=1 width=0) (actual time=5176.382..5176.382 rows=1 loops=1) -> Hash Join (cost=1575.13..5431.84 rows=35 width=0) (actual time=62.634..5176.332 rows=36 loops=1) Hash Cond: (orders_products.order_id = orders.order_id) -> Bitmap Heap Scan on orders_products (cost=21.27..3864.85 rows=1023 width=4) (actual time=7.041..5118.512 rows=1004 loops=1) Recheck Cond: (product_id = 2345) -> Bitmap Index Scan on orders_products_product_order (cost=0.00..21.02 rows=1023 width=0) (actual time=0.531..0.531 rows=1004 loops=1) Index Cond: (product_id = 2345) -> Hash (cost=1130.58..1130.58 rows=33862 width=4) (actual time=55.526..55.526 rows=31999 loops=1) -> Index Scan using orders_date on orders (cost=0.00..1130.58 rows=33862 width=4) (actual time=0.139..33.466 rows=31999 loops=1) Index Cond: ((order_date >= '2000-01-01'::date) AND (order_date <= '2000-02-01'::date)) Total runtime: 5176.659 ms This is the original query ; what I don't like about it is that it bitmapscans orders_products way too much, because it reads all orders for the specified product, not just orders in the date period we want. However, since Postgres scanned all order_id's corresponding to the date range already, to build the hash, the list of order_ids of interest is known at no extra cost. 
In this case, additionally, correlation is 100% between order_id and date, so I can do :

test=> SELECT max(order_id), min(order_id) FROM orders WHERE order_date BETWEEN '2000-01-01' AND '2000-02-01';
  max  | min
-------+-----
 31999 |   1

And I can add an extra condition to the query, like this :

EXPLAIN ANALYZE SELECT count(*)
FROM orders
JOIN orders_products USING (order_id)
WHERE orders.order_date BETWEEN '2000-01-01' AND '2000-02-01'
AND orders_products.order_id BETWEEN 1 AND 31999
AND orders_products.product_id = 2345;

QUERY PLAN
Aggregate (cost=426.80..426.81 rows=1 width=0) (actual time=179.233..179.233 rows=1 loops=1)
  -> Nested Loop (cost=0.00..426.79 rows=1 width=0) (actual time=6.667..179.190 rows=36 loops=1)
     -> Index Scan using orders_products_product_order on orders_products (cost=0.00..142.11 rows=34 width=4) (actual time=6.559..177.597 rows=36 loops=1)
        Index Cond: ((product_id = 2345) AND (order_id >= 1) AND (order_id <= 31999))
     -> Index Scan using orders_pkey on orders (cost=0.00..8.36 rows=1 width=4) (actual time=0.039..0.041 rows=1 loops=36)
        Index Cond: (orders.order_id = orders_products.order_id)
        Filter: ((orders.order_date >= '2000-01-01'::date) AND (orders.order_date <= '2000-02-01'::date))
Total runtime: 179.392 ms

This is with no cache on the orders_products table. About 30X faster. Interestingly, when everything is cached, it's even faster (about 100X)...

The plan I was thinking about was not a nested loop with 30K loops... this would be bad as you said. It would have been something like this :

- There is an index on (product_id, order_id)
- Build the hash from orders table (can't avoid it)

-> Hash
   -> Index Scan using orders_date on orders
      Index Cond: ((order_date >= '2000-01-01'::date) AND (order_date <= '2000-02-01'::date))

- A slightly twisted bitmap scan form :

-> Bitmap Heap Scan on orders_products
   Recheck Cond: (product_id = 2345) AND order_id IN (hash created above))
   -> Bitmap Index Scan on orders_products_product_order
      Index Cond: (product_id = 2345 AND order_id IN (hash created above))

The Bitmap Index Scan sees the order_ids in the index it is scanning... they could be checked before checking the visibility in the heap for the big table.
Test script:

BEGIN;

CREATE TABLE orders (order_id INTEGER NOT NULL, order_date DATE NOT NULL);
CREATE TABLE products (product_id INTEGER NOT NULL, product_name TEXT NOT NULL);
CREATE TABLE orders_products (order_id INTEGER NOT NULL, product_id INTEGER NOT NULL,
    padding1 TEXT, padding2 TEXT) TABLESPACE usb;

INSERT INTO products SELECT n, 'product number ' || n::TEXT FROM generate_series(1,10001) AS n;

INSERT INTO orders SELECT n, '2000-01-01'::date + (n/1000 * '1 DAY'::interval) FROM generate_series(1,1000000) AS n;

SET work_mem TO '1GB';

INSERT INTO orders_products SELECT
    a, b, 'aibaifbaurgbyioubyfazierugybfoaybofauez', 'hfohbdsqbhjhqsvdfiuazvfgiurvgazrhbazboifhaoifh'
FROM (SELECT DISTINCT (1+(n/10))::INTEGER AS a, (1+(random()*10000))::INTEGER AS b
      FROM generate_series( 1,9999999 ) AS n) AS x;

DELETE FROM orders_products WHERE product_id NOT IN (SELECT product_id FROM products);
DELETE FROM orders_products WHERE order_id NOT IN (SELECT order_id FROM orders);

ALTER TABLE orders ADD PRIMARY KEY (order_id);
ALTER TABLE products ADD PRIMARY KEY (product_id);
ALTER TABLE orders_products ADD PRIMARY KEY (order_id,product_id);
ALTER TABLE orders_products ADD FOREIGN KEY (product_id) REFERENCES products( product_id ) ON DELETE CASCADE;
ALTER TABLE orders_products ADD FOREIGN KEY (order_id) REFERENCES orders( order_id ) ON DELETE CASCADE;

CREATE INDEX orders_date ON orders( order_date );
CREATE INDEX orders_products_product_order ON orders_products( product_id, order_id );

COMMIT;
SET work_mem TO DEFAULT;
ANALYZE;
> You could perhaps run a little check on the performance of the RAID, is
> it better than linux software RAID ?
> Does it leverage NCQ appropriately when running queries in parallel ?

I was told that this RAID is software RAID. I have no idea what to check.
This HP server was installed 3 years ago and has never been a high-performance server.

>>> explain analyze
>>> SELECT sum(1)
>>> FROM dok JOIN rid USING (dokumnr)
>>> JOIN toode USING (toode)
>>> LEFT JOIN artliik using(grupp,liik)
>>> WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01'
>
> By the way, note that the presence of the toode table in the query above
> is not required at all, unless you use columns of toode in your
> aggregates.

In the real query, the SELECT column list contains data from the sales table dok (sale date and time) and the sales detail table rid (quantity, price).
The WHERE clause may contain additional filters from the product table (product category, supplier).

> Let's play with that, after all, it's friday night.

Thank you very much for the great sample.
I tried to create a test case from this to match the production db:

1.2 million orders
3.5 million order details
13400 products with char(20) primary keys, mostly containing EAN-13 codes
last 3 years of data
every order has usually 1..3 detail lines
same product can appear multiple times in an order
products are queried by start of code

This sample does not distribute products randomly between orders.
How to change this so that every order contains 3 (or 1..6 ) random products?
I tried to use the random row sample from

http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks-i

but in this case a constant product is always returned. It seems that a query containing random() is evaluated only once.

Andrus.

begin;

CREATE TEMP TABLE orders (order_id INTEGER NOT NULL, order_date DATE NOT NULL);
CREATE TEMP TABLE products (product_id CHAR(20) NOT NULL, product_name char(70) NOT NULL, quantity numeric(12,2) default 1);
CREATE TEMP TABLE orders_products (order_id INTEGER NOT NULL, product_id CHAR(20),
    padding1 char(70), id serial, price numeric(12,2) default 1);

INSERT INTO products SELECT (n*power( 10,13))::INT8::CHAR(20), 'product number ' || n::TEXT FROM generate_series(0,13410) AS n;

INSERT INTO orders SELECT n, '2005-01-01'::date + (4000.0 * n/3500000.0 * '1 DAY'::interval) FROM generate_series(0,3500000/3) AS n;

SET work_mem TO 2097151; -- 1048576;

INSERT INTO orders_products SELECT generate_series/3 AS order_id,
    ( ((generate_series/3500000.0)*13410.0)::int*power( 10,13))::INT8::CHAR(20)
FROM generate_series(1,3500000) WHERE generate_series/3>0;

ALTER TABLE orders ADD PRIMARY KEY (order_id);
ALTER TABLE products ADD PRIMARY KEY (product_id);
ALTER TABLE orders_products ADD PRIMARY KEY (id);
ALTER TABLE orders_products ADD FOREIGN KEY (product_id) REFERENCES products(product_id);
ALTER TABLE orders_products ADD FOREIGN KEY (order_id) REFERENCES orders(order_id) ON DELETE CASCADE;

CREATE INDEX orders_date ON orders( order_date );
CREATE INDEX order_product_pattern_idx ON orders_products( product_id bpchar_pattern_ops );

COMMIT;
SET work_mem TO DEFAULT;
ANALYZE;

SELECT sum(quantity*price)
FROM orders
JOIN orders_products USING (order_id)
JOIN products USING (product_id)
WHERE orders.order_date>='2008-01-17' and orders_products.product_id like '130%'
> Thank you very much for the great sample.
> I tried to create a test case from this to match the production db:
>
> 1.2 million orders
> 3.5 million order details
> 13400 products with char(20) primary keys, mostly containing EAN-13 codes
> last 3 years of data
> every order has usually 1..3 detail lines
> same product can appear multiple times in an order
> products are queried by start of code
>
> This sample does not distribute products randomly between orders.
> How to change this so that every order contains 3 (or 1..6 ) random
> products?
> I tried to use the random row sample from
> http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks-i
>
> but in this case a constant product is always returned. It seems that a
> query containing random() is evaluated only once.

You could try writing a plpgsql function which would generate the data set. Or you could use your existing data set.

By the way, a simple way to de-bloat your big table without blocking would be this :

- stop all inserts and updates
- begin
- create table new like old table
- insert into new select * from old (order by perhaps)
- create indexes
- rename new into old
- commit

If this is just a reporting database where you insert a batch of new data every day, for instance, that's very easy to do. If it's OLTP, then, no.
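A sketch of such a generator function: the loop forces a fresh random() evaluation per order, which avoids the constant-product problem, but as noted in the next message it is slow for millions of rows (table and column names are those of the test script above):

CREATE OR REPLACE FUNCTION fill_orders_products() RETURNS void AS $$
DECLARE
    o integer;
    n integer;
BEGIN
    FOR o IN SELECT order_id FROM orders LOOP
        n := 1 + (random() * 5)::int;          -- roughly 1..6 detail lines per order
        INSERT INTO orders_products (order_id, product_id)
        SELECT o, product_id
        FROM products
        ORDER BY random()                      -- pick random products for this order
        LIMIT n;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

SELECT fill_orders_products();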
> You could try writing a plpgsql function which would generate the data
> set.
> Or you could use your existing data set.

Creating 3.5 mln rows using a stored proc is probably slow.
It would probably be better and faster to use some random() and generate_series() trick. In that case others could try it too, and dataset generation would be faster.

> By the way, a simple way to de-bloat your big table without blocking
> would be this :
>
> - stop all inserts and updates
> - begin
> - create table new like old table
> - insert into new select * from old (order by perhaps)
> - create indexes
> - rename new into old
> - commit
>
> If this is just a reporting database where you insert a batch of new data
> every day, for instance, that's very easy to do. If it's OLTP, then, no.

Those are the orders and order_products tables.

I ran vacuum full analyze verbose last night.
Now the database has a size of 4832 MB, including 1 GB of bloated pg_shdepend indexes.
I added max_fsm_pages = 150000 and rebooted.

The query below and other queries are still too slow:

set search_path to firma2,public;
explain analyze SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01'

"Aggregate (cost=181795.13..181795.14 rows=1 width=0) (actual time=23678.265..23678.268 rows=1 loops=1)"
" -> Nested Loop (cost=73999.44..181733.74 rows=24555 width=0) (actual time=18459.230..23598.956 rows=21476 loops=1)"
" -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 width=24) (actual time=0.134..0.145 rows=1 loops=1)"
" Index Cond: ('X05'::bpchar = toode)"
" -> Hash Join (cost=73999.44..181482.18 rows=24555 width=24) (actual time=18459.076..23441.098 rows=21476 loops=1)"
" Hash Cond: ("outer".dokumnr = "inner".dokumnr)"
" -> Bitmap Heap Scan on rid (cost=4082.88..101779.03 rows=270252 width=28) (actual time=9337.782..12720.365 rows=278182 loops=1)"
" Recheck Cond: (toode = 'X05'::bpchar)"
" -> Bitmap Index Scan on rid_toode_idx (cost=0.00..4082.88 rows=270252 width=0) (actual time=9330.634..9330.634 rows=278183 loops=1)"
" Index Cond: (toode = 'X05'::bpchar)"
" -> Hash (cost=69195.13..69195.13 rows=112573 width=4) (actual time=8894.465..8894.465 rows=109890 loops=1)"
" -> Bitmap Heap Scan on dok (cost=1492.00..69195.13 rows=112573 width=4) (actual time=1618.763..8404.847 rows=109890 loops=1)"
" Recheck Cond: (kuupaev >= '2008-09-01'::date)"
" -> Bitmap Index Scan on dok_kuupaev_idx (cost=0.00..1492.00 rows=112573 width=0) (actual time=1612.177..1612.177 rows=110484 loops=1)"
" Index Cond: (kuupaev >= '2008-09-01'::date)"
"Total runtime: 23678.790 ms"

Here is a list of untried recommendations from this thread:

1. CLUSTER rid ON rid_toode_pkey; CLUSTER dok ON dok_kuupaev_idx - in 8.1.4 this form of CLUSTER causes a syntax error, and I have no idea what syntax to use. Risky to try in the prod server. Requires creating a randomly distributed product_id test case to measure the difference.

2. Change the CHAR(20) product index to an int index by adding an update trigger. Risky to try in the prod server. Requires creating a randomly distributed product_id test case to measure the difference.

3. Denormalization of the sale date into the order_products table by adding an update trigger. Risky to try in the prod server. Requires creating a randomly distributed product_id test case to measure the difference.

4. Check on the performance of the RAID: does it leverage NCQ appropriately when running queries in parallel? No idea how.

5. Materialized views. I need date granularity, so it must be possible to sum only one day's sales.
http://www.pgcon.org/2008/schedule/events/69.en.html

Seems to be a major application rewrite; no idea how.

Approaches which probably will not change performance:

6. Upgrade to 8.4 or to 8.3.5

7. Run the server in standalone mode and recover the 1 GB of bloated pg_shdepend indexes.

8. Tune some conf file parameters:

> work_mem = 512
I'd consider increasing this value a little - 0.5 MB seems too low to me (but not necessarily).

> effective_cache_size = 70000
Well, your server has 2GB of RAM and usually it's recommended to set this value to about 60-70% of your RAM, so using 540MB (about 25%) seems quite low.

Data size is nearly the same as RAM size. It is an unpleasant surprise that queries take such a long time.

What should I do next?

Andrus.

1 40926 firma2.rid 1737 MB
2 40595 firma2.dok 1632 MB
3 1214 pg_catalog.pg_shdepend 1235 MB
4 1232 pg_catalog.pg_shdepend_depender_index 795 MB
7 1233 pg_catalog.pg_shdepend_reference_index 439 MB
8 44299 firma2.rid_toode_idx 298 MB
9 44286 firma2.dok_tasudok_idx 245 MB
10 19103791 firma2.rid_toode_pattern_idx 202 MB
11 44283 firma2.dok_klient_idx 160 MB
12 44298 firma2.rid_inpdokumnr_idx 148 MB
13 44297 firma2.rid_dokumnr_idx 132 MB
14 43573 firma2.rid_pkey 130 MB
17 40556 pg_toast.pg_toast_40552 112 MB
18 44288 firma2.dok_tasumata_idx 103 MB
19 44289 firma2.dok_tellimus_idx 101 MB
20 44284 firma2.dok_krdokumnr_idx 101 MB
21 44285 firma2.dok_kuupaev_idx 94 MB
22 19076304 firma2.rid_rtellimus_idx 90 MB
24 44282 firma2.dok_dokumnr_idx 74 MB
25 43479 firma2.dok_pkey 74 MB
26 18663923 firma2.dok_yksus_pattern_idx 65 MB
27 18801591 firma2.dok_sihtyksus_pattern_idx 64 MB
32 18774881 firma2.dok_doktyyp 47 MB

output from vacuum full:

INFO: free space map contains 14353 pages in 314 relations
DETAIL: A total of 20000 page slots are in use (including overhead).
89664 page slots are required to track all free space.
Current limits are: 20000 page slots, 1000 relations, using 182 KB.
NOTICE: number of page slots needed (89664) exceeds max_fsm_pages (20000)
HINT: Consider increasing the configuration parameter "max_fsm_pages" to a value over 89664.

Query returned successfully with no result in 10513335 ms.
> Risky to try in the prod server. Requires creating a randomly distributed
> product_id test case to measure the difference.
>
> What should I do next?

I guess you have backups - take them, restore the database on a different machine (preferably with the same / similar hw config) and tune the queries on it.

After restoring, all the tables / indexes will be 'clean' (not bloated), so you'll see if performing VACUUM FULL / CLUSTER is the right solution or if you have to change the application internals.

Sure, the times will be slightly different, but the performance problems should remain the same.

regards
Tomas
> Approaches which probably will not change performance:
> 6. Upgrade to 8.4 or to 8.3.5

Both of these will improve performance a little, even with the same query plan and same data. I would expect about a 10% improvement for 8.3.x on most memory-bound select queries. 8.4 won't be out for a few months.

> 7. Run the server in standalone mode and recover the 1 GB of bloated pg_shdepend indexes.
> 8. Tune some conf file parameters:
> > work_mem = 512
> I'd consider increasing this value a little - 0.5 MB seems too low to me
> (but not necessarily).

This is very easy to try. You can change work_mem for just a single session, and this can in some cases help performance quite a bit, and in others not at all.

I would not recommend having it lower than at least 4MB on a server like that unless you have a lot of concurrently active queries / connections.

To try it, simply use the SET command. To try out 32MB, just do:

SET work_mem = '32MB';

and the value will be changed locally for that session only. See if it affects your test query or not.

http://www.postgresql.org/docs/8.3/interactive/sql-set.html

> > effective_cache_size = 70000
> Well, your server has 2GB of RAM and usually it's recommended to set
> this value to about 60-70% of your RAM, so using 540MB (about 25%) seems
> quite low.
> Data size is nearly the same as RAM size. It is an unpleasant surprise
> that queries take such a long time.
> What should I do next?

First, demonstrate that it is all or mostly in memory -- use iostat or other tools to ensure that there is not much disk activity during the query. If your system doesn't have iostat installed, it should be installed. It is a very useful tool.

If it is all cached in memory, you may want to ensure that your shared_buffers is a reasonable size so that there is less shuffling of data from the kernel to postgres and back. Generally, shared_buffers works best between 5% and 25% of system memory.

If it is completely CPU bound then the work done for the query has to be reduced by altering the plan to a more optimal one or making the work it has to do at each step easier. Most of the ideas in this thread revolve around those things.

Based on the time it took to do the vacuum, I suspect your disk subsystem is a bit slow. If it can be determined that there is much disk I/O in your use cases, there are generally several things that can be done to tune Linux I/O. The main ones in my experience are the 'readahead' value for each disk, which helps sequential reads significantly, and trying out the Linux 'deadline' scheduler and comparing it to the more commonly used 'cfq' scheduler. If the system is configured with the anticipatory scheduler, absolutely switch to cfq or deadline, as the anticipatory scheduler will perform horribly poorly for a database.

> Andrus.
> I guess you have backups - take them, restore the database on a different
> machine (preferably with the same / similar hw config) and tune the
> queries on it.
>
> After restoring, all the tables / indexes will be 'clean' (not bloated), so
> you'll see if performing VACUUM FULL / CLUSTER is the right solution or if
> you have to change the application internals.
>
> Sure, the times will be slightly different, but the performance problems
> should remain the same.

My test computer has PostgreSQL 8.3, 4 GB RAM, SSD disks, and an Intel X2 Extreme CPU, so it is much faster than this prod server. No idea how to emulate the prod environment.

I can create a new db in the prod server as a copy of the old one, but this can be done late at night only.
Where can I find a script which clones a database inside the server? Something like

CREATE DATABASE newdb AS SELECT * FROM olddb;

It would be more convenient to run a db cloning script from the pgAdmin command window.
The only way I found is to use SSH with pg_dump/pg_restore. This requires SSH access to the server and opening the SSH port to the public internet.

Or probably try to run the CLUSTER command in the prod server. Hopefully clustering by product id will not slow things down too much.

Andrus.
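For what it's worth, a server-side clone is possible via the TEMPLATE clause of CREATE DATABASE (the source database must have no active connections while it runs), and the 8.1 CLUSTER syntax puts the index name first - a sketch using index names from earlier in the thread:

CREATE DATABASE newdb TEMPLATE olddb;

-- 8.1 syntax is CLUSTER indexname ON tablename:
CLUSTER rid_toode_idx ON rid;
CLUSTER dok_kuupaev_idx ON dok;
ANALYZE rid;
ANALYZE dok;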
Scott, thank you. >> > work_mem = 512 >This is very easy to try. You can change work_mem for just a single >session, and this can in some cases help performance quite a bit, and in >others not at all. >I would not recommend having it lower than at least 4MB on a server like >that unless you have a lot of concurrently active queries / connections. >To try it, simply use the SET command. To try out 32MB, just do: >SET work_mem = '32MB'; 8.1.4 requires int value. SET work_mem = 33554432; causes: ERROR: 33554432 is outside the valid range for parameter "work_mem" (64 .. 2097151) So max allowed value seems to be 2 MB I tested it when this server was idle by running both queries several times after VACUUM FULL was running SET work_mem = 2097151; set search_path to firma2,public; explain analyze SELECT sum(1) FROM dok JOIN rid USING (dokumnr) JOIN toode USING (toode) WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01' "Aggregate (cost=177291.36..177291.37 rows=1 width=0) (actual time=5153.856..5153.859 rows=1 loops=1)" " -> Nested Loop (cost=73607.45..177229.96 rows=24561 width=0) (actual time=1395.935..5071.247 rows=21541 loops=1)" " -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 width=24) (actual time=0.078..0.087 rows=1 loops=1)" " Index Cond: ('X05'::bpchar = toode)" " -> Hash Join (cost=73607.45..176978.33 rows=24561 width=24) (actual time=1395.836..4911.425 rows=21541 loops=1)" " Hash Cond: ("outer".dokumnr = "inner".dokumnr)" " -> Bitmap Heap Scan on rid (cost=4083.10..101802.05 rows=270316 width=28) (actual time=238.578..2367.189 rows=278247 loops=1)" " Recheck Cond: (toode = 'X05'::bpchar)" " -> Bitmap Index Scan on rid_toode_idx (cost=0.00..4083.10 rows=270316 width=0) (actual time=150.868..150.868 rows=278248 loops=1)" " Index Cond: (toode = 'X05'::bpchar)" " -> Hash (cost=69242.72..69242.72 rows=112651 width=4) (actual time=1146.081..1146.081 rows=110170 loops=1)" " -> Bitmap Heap Scan on dok (cost=1492.28..69242.72 rows=112651 width=4) (actual time=46.210..696.803 rows=110170 loops=1)" " Recheck Cond: (kuupaev >= '2008-09-01'::date)" " -> Bitmap Index Scan on dok_kuupaev_idx (cost=0.00..1492.28 rows=112651 width=0) (actual time=33.938..33.938 rows=110232 loops=1)" " Index Cond: (kuupaev >= '2008-09-01'::date)" "Total runtime: 5154.911 ms" SET work_mem to default; set search_path to firma2,public; explain analyze SELECT sum(1) FROM dok JOIN rid USING (dokumnr) JOIN toode USING (toode) WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01' "Aggregate (cost=181869.36..181869.37 rows=1 width=0) (actual time=7807.867..7807.871 rows=1 loops=1)" " -> Nested Loop (cost=74048.45..181807.96 rows=24561 width=0) (actual time=2607.429..7728.138 rows=21541 loops=1)" " -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 width=24) (actual time=0.079..0.091 rows=1 loops=1)" " Index Cond: ('X05'::bpchar = toode)" " -> Hash Join (cost=74048.45..181556.33 rows=24561 width=24) (actual time=2607.332..7569.612 rows=21541 loops=1)" " Hash Cond: ("outer".dokumnr = "inner".dokumnr)" " -> Bitmap Heap Scan on rid (cost=4083.10..101802.05 rows=270316 width=28) (actual time=1147.071..4528.845 rows=278247 loops=1)" " Recheck Cond: (toode = 'X05'::bpchar)" " -> Bitmap Index Scan on rid_toode_idx (cost=0.00..4083.10 rows=270316 width=0) (actual time=1140.337..1140.337 rows=278248 loops=1)" " Index Cond: (toode = 'X05'::bpchar)" " -> Hash (cost=69242.72..69242.72 rows=112651 width=4) (actual time=1240.988..1240.988 rows=110170 loops=1)" " -> Bitmap Heap Scan on dok (cost=1492.28..69242.72 
rows=112651 width=4) (actual time=68.053..769.448 rows=110170 loops=1)"
" Recheck Cond: (kuupaev >= '2008-09-01'::date)"
" -> Bitmap Index Scan on dok_kuupaev_idx (cost=0.00..1492.28 rows=112651 width=0) (actual time=61.358..61.358 rows=110232 loops=1)"
" Index Cond: (kuupaev >= '2008-09-01'::date)"
"Total runtime: 7808.174 ms"

In both cases vmstat 2 shows only cpu activity when the queries are running:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b  swpd   free   buff   cache   si  so  bi   bo   in  cs  us sy  id wa
...
 0  0  232  123692  0  1891044  0  0  0    0  252  17   0  0  100  0
 0  0  232  123692  0  1891044  0  0  0    0  252  17   0  0  100  0
 0  0  232  123684  0  1891044  0  0  0  162  254  22   0  0  100  0
 0  0  232  123684  0  1891044  0  0  0    0  252  18   0  0  100  0
 1  0  232  123056  0  1891444  0  0  0   13  254  21  62  5   34  0  <---- start of slower query
 1  0  232  102968  0  1911060  0  0  0   16  252  18  26 75    0  0
 1  0  232   77424  0  1936996  0  0  0    0  252  18  37 63    0  0
 1  0  232   71464  0  1941928  0  0  0   73  260  34  38 62    0  0
 0  0  232  123420  0  1891044  0  0  0   32  257  31   8 15   77  0  <-------- end of slower query
 0  0  232  123420  0  1891044  0  0  0   25  255  24   0  0  100  0
 0  0  232  123420  0  1891044  0  0  0   28  255  27   0  0  100  0

Is it safe to set

work_mem = 2097151

in the postgresql.conf file ?

> First, demonstrate that it is all or mostly in memory -- use iostat or
> other tools to ensure that there is not much disk activity during the
> query. If your system doesn't have iostat installed, it should be
> installed. It is a very useful tool.

# iostat
bash: iostat: command not found
# locate iostat
/usr/src/linux-2.6.16-gentoo-r9/Documentation/iostats.txt

I have little experience with Linux; no idea how to install or compile iostat on this system.

> If it is all cached in memory, you may want to ensure that your
> shared_buffers is a reasonable size so that there is less shuffling of data
> from the kernel to postgres and back. Generally, shared_buffers works best
> between 5% and 25% of system memory.

currently shared_buffers = 15000

> If it is completely CPU bound then the work done for the query has to be
> reduced by altering the plan to a more optimal one or making the work it
> has to do at each step easier. Most of the ideas in this thread revolve
> around those things.

When running on a loaded server, even after VACUUM FULL, response time with the original work_mem is longer, probably because blocks must be fetched from disk.

> Based on the time it took to do the vacuum, I suspect your disk subsystem
> is a bit slow. If it can be determined that there is much disk I/O in your
> use cases, there are generally several things that can be done to tune
> Linux I/O. The main ones in my experience are the 'readahead' value for
> each disk, which helps sequential reads significantly, and trying out the
> Linux 'deadline' scheduler and comparing it to the more commonly used 'cfq'
> scheduler. If the system is configured with the anticipatory scheduler,
> absolutely switch to cfq or deadline, as the anticipatory scheduler will
> perform horribly poorly for a database.

This is a 3-year-old cheap server; no idea what to configure.
There is also another similar server which has 1.2 GB more usable memory. No idea if it is worth switching to it. After a few years the sales data will exceed even that additional memory.

Andrus.
> My test computer has PostgreSQL 8.3, 4 GB RAM, SSD disks, and an Intel X2
> Extreme CPU, so it is much faster than this prod server.
> No idea how to emulate the prod environment.
> I can create a new db in the prod server as a copy of the old one, but this
> can be done late at night only.

Well, a faster but comparable system may not be a problem - the query might run 10 times faster, but it will still be slow (say 40 seconds instead of 8 minutes). What is a problem is a different I/O system - SSD instead of traditional drives in this case. I have no direct experience with SSD yet, but AFAIK the seek time is much better compared to regular drives (say 0.1ms instead of 10ms, that is 100 times faster). So can't you just put an old SATA drive into the test machine?

> Where can I find a script which clones a database inside the server?
> Something like
>
> CREATE DATABASE newdb AS SELECT * FROM olddb;
>
> It would be more convenient to run a db cloning script from the pgAdmin
> command window.
> The only way I found is to use SSH with pg_dump/pg_restore. This requires
> SSH access to the server and opening the SSH port to the public internet.

Yes, using pg_dump | pg_restore is probably the way to clone a database. But it will slow down the system, as it has to do a lot of I/O (and as that seems to be a bottleneck already, I don't think this is a good idea).

> Or probably try to run the CLUSTER command in the prod server. Hopefully
> clustering by product id will not slow things down too much.

As already noted, the CLUSTER command takes an exclusive lock on the table. So this is not an operation you'd want to run on a production server during working hours...

Tomas
> Scott, > > thank you. > >>> > work_mem = 512 >> This is very easy to try. You can change work_mem for just a single >> session, and this can in some cases help performance quite a bit, and in >> others not at all. >> I would not recommend having it lower than at least 4MB on a server like >> that unless you have a lot of concurrently active queries / connections. >> To try it, simply use the SET command. To try out 32MB, just do: >> SET work_mem = '32MB'; > > 8.1.4 requires int value. > SET work_mem = 33554432; > causes: > ERROR: 33554432 is outside the valid range for parameter "work_mem" (64 .. > 2097151) > > So max allowed value seems to be 2 MB > I tested it when this server was idle by running both queries several times > after VACUUM FULL was running > > SET work_mem = 2097151; No, not really. The work_mem value is specified in kilobytes, so you've set it to 2GB :-) > set search_path to firma2,public; > explain analyze > SELECT sum(1) > FROM dok JOIN rid USING (dokumnr) > JOIN toode USING (toode) > WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01' > "Aggregate (cost=177291.36..177291.37 rows=1 width=0) (actual > time=5153.856..5153.859 rows=1 loops=1)" > " -> Nested Loop (cost=73607.45..177229.96 rows=24561 width=0) (actual > time=1395.935..5071.247 rows=21541 loops=1)" > " -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 > width=24) (actual time=0.078..0.087 rows=1 loops=1)" > " Index Cond: ('X05'::bpchar = toode)" > " -> Hash Join (cost=73607.45..176978.33 rows=24561 width=24) > (actual time=1395.836..4911.425 rows=21541 loops=1)" > " Hash Cond: ("outer".dokumnr = "inner".dokumnr)" > " -> Bitmap Heap Scan on rid (cost=4083.10..101802.05 > rows=270316 width=28) (actual time=238.578..2367.189 rows=278247 loops=1)" > " Recheck Cond: (toode = 'X05'::bpchar)" > " -> Bitmap Index Scan on rid_toode_idx > (cost=0.00..4083.10 rows=270316 width=0) (actual time=150.868..150.868 > rows=278248 loops=1)" > " Index Cond: (toode = 'X05'::bpchar)" > " -> Hash (cost=69242.72..69242.72 rows=112651 width=4) > (actual time=1146.081..1146.081 rows=110170 loops=1)" > " -> Bitmap Heap Scan on dok (cost=1492.28..69242.72 > rows=112651 width=4) (actual time=46.210..696.803 rows=110170 loops=1)" > " Recheck Cond: (kuupaev >= '2008-09-01'::date)" > " -> Bitmap Index Scan on dok_kuupaev_idx > (cost=0.00..1492.28 rows=112651 width=0) (actual time=33.938..33.938 > rows=110232 loops=1)" > " Index Cond: (kuupaev >= > '2008-09-01'::date)" > "Total runtime: 5154.911 ms" > > > SET work_mem to default; > set search_path to firma2,public; > explain analyze > SELECT sum(1) > FROM dok JOIN rid USING (dokumnr) > JOIN toode USING (toode) > WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01' > "Aggregate (cost=181869.36..181869.37 rows=1 width=0) (actual > time=7807.867..7807.871 rows=1 loops=1)" > " -> Nested Loop (cost=74048.45..181807.96 rows=24561 width=0) (actual > time=2607.429..7728.138 rows=21541 loops=1)" > " -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 > width=24) (actual time=0.079..0.091 rows=1 loops=1)" > " Index Cond: ('X05'::bpchar = toode)" > " -> Hash Join (cost=74048.45..181556.33 rows=24561 width=24) > (actual time=2607.332..7569.612 rows=21541 loops=1)" > " Hash Cond: ("outer".dokumnr = "inner".dokumnr)" > " -> Bitmap Heap Scan on rid (cost=4083.10..101802.05 > rows=270316 width=28) (actual time=1147.071..4528.845 rows=278247 loops=1)" > " Recheck Cond: (toode = 'X05'::bpchar)" > " -> Bitmap Index Scan on rid_toode_idx > (cost=0.00..4083.10 rows=270316 width=0) 
> (actual time=1140.337..1140.337 rows=278248 loops=1)"
> " Index Cond: (toode = 'X05'::bpchar)"
> " -> Hash (cost=69242.72..69242.72 rows=112651 width=4) (actual time=1240.988..1240.988 rows=110170 loops=1)"
> " -> Bitmap Heap Scan on dok (cost=1492.28..69242.72 rows=112651 width=4) (actual time=68.053..769.448 rows=110170 loops=1)"
> " Recheck Cond: (kuupaev >= '2008-09-01'::date)"
> " -> Bitmap Index Scan on dok_kuupaev_idx (cost=0.00..1492.28 rows=112651 width=0) (actual time=61.358..61.358 rows=110232 loops=1)"
> " Index Cond: (kuupaev >= '2008-09-01'::date)"
> "Total runtime: 7808.174 ms"
>
> In both cases vmstat 2 shows only cpu activity when the queries are running:
>
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> ...
> 0 0 232 123692 0 1891044 0 0 0 0 252 17 0 0 100 0
> 0 0 232 123692 0 1891044 0 0 0 0 252 17 0 0 100 0
> 0 0 232 123684 0 1891044 0 0 0 162 254 22 0 0 100 0
> 0 0 232 123684 0 1891044 0 0 0 0 252 18 0 0 100 0
> 1 0 232 123056 0 1891444 0 0 0 13 254 21 62 5 34 0 <---- start of slower query
> 1 0 232 102968 0 1911060 0 0 0 16 252 18 26 75 0 0
> 1 0 232 77424 0 1936996 0 0 0 0 252 18 37 63 0 0
> 1 0 232 71464 0 1941928 0 0 0 73 260 34 38 62 0 0
> 0 0 232 123420 0 1891044 0 0 0 32 257 31 8 15 77 0 <-------- end of slower query
> 0 0 232 123420 0 1891044 0 0 0 25 255 24 0 0 100 0
> 0 0 232 123420 0 1891044 0 0 0 28 255 27 0 0 100 0

Well, this parameter specifies how much memory may be used for in-memory sorting and hash tables. If more memory is needed, the sort / hash is performed on disk. I guess the difference in speed is not huge, so in this case the hash table (used for the join) probably fits into the 1024kB (which is the default value).

> Is it safe to set
>
> work_mem = 2097151
>
> in the postgresql.conf file ?

Each sort in a query uses a separate area in memory (up to work_mem). So if you have 10 sessions running a query with 2 sorts, that may occupy 10 * 2 * work_mem of memory.

Let's suppose you set a reasonable value (say 8096) instead of 2GB. That gives about 160MB.

Anyway this depends - if you have a lot of slow queries caused by on-disk sorts / hash tables, use a higher value. Otherwise leave it as it is.

>> First, demonstrate that it is all or mostly in memory -- use iostat or
>> other tools to ensure that there is not much disk activity during the
>> query. If your system doesn't have iostat installed, it should be
>> installed. It is a very useful tool.
>
> # iostat
> bash: iostat: command not found
> # locate iostat
> /usr/src/linux-2.6.16-gentoo-r9/Documentation/iostats.txt
>
> I have little experience with Linux; no idea how to install or compile
> iostat on this system.
>
>> If it is all cached in memory, you may want to ensure that your
>> shared_buffers is a reasonable size so that there is less shuffling of
>> data from the kernel to postgres and back. Generally, shared_buffers works
>> best between 5% and 25% of system memory.
>
> currently shared_buffers = 15000

That's 120MB, i.e. about 6% of the memory. Might be a little higher, but seems reasonable.

>> If it is completely CPU bound then the work done for the query has to be
>> reduced by altering the plan to a more optimal one or making the work it
>> has to do at each step easier. Most of the ideas in this thread revolve
>> around those things.
>> First, demonstrate that it is all or mostly in memory -- use iostat or
>> other tools to ensure that there is not much disk activity during the
>> query. If your system doesn't have iostat installed, it should be
>> installed. It is a very useful tool.
>
> # iostat
> bash: iostat: command not found
> # locate iostat
> /usr/src/linux-2.6.16-gentoo-r9/Documentation/iostats.txt
>
> I have little experience with Linux. No idea how to install or compile
> iostat on this system.

>> If it is all cached in memory, you may want to ensure that your
>> shared_buffers is a reasonable size so that there is less shuffling of
>> data from the kernel to postgres and back. Generally, shared_buffers
>> works best between 5% and 25% of system memory.
>
> currently shared_buffers = 15000

That's 120MB, i.e. about 6% of the memory. Might be a little higher, but it seems reasonable.

>> If it is completely CPU bound then the work done for the query has to be
>> reduced by altering the plan to a more optimal one or making the work it
>> has to do at each step easier. Most of the ideas in this thread revolve
>> around those things.
>
> When running on a loaded server, even after VACUUM FULL, the response time
> with the original work_mem is longer, probably because blocks must be
> fetched from disk.

>> Based on the time it took to do the vacuum, I suspect your disk subsystem
>> is a bit slow. If it can be determined that there is much disk I/O in
>> your use cases, there are generally several things that can be done to
>> tune Linux I/O. The main ones in my experience are the 'readahead' value
>> for each disk, which helps sequential reads significantly, and trying out
>> the Linux 'deadline' scheduler and comparing it to the more commonly used
>> 'cfq' scheduler. If the system is configured with the anticipatory
>> scheduler, absolutely switch to cfq or deadline, as the anticipatory
>> scheduler will perform horribly poorly for a database.
>
> This is a 3 year old cheap server.
> No idea what to configure.

Well, how could we know that if you don't? Anyway, the options mentioned by Scott Carey are Linux kernel settings, so it shouldn't be a problem to change them.

> There is also another similar server which has 1.2 GB more usable memory.
> No idea whether it is worth switching to it.
> After some years the sales data will exceed that extra memory anyway.

Given that the performance issues are caused by bloated tables and / or a slow I/O subsystem, moving to a similar system won't help, I guess.

regards
Tomas
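For reference, Scott's 5%-25% guideline above can be translated into shared_buffers values (counted in 8 kB buffers on a stock build) for this 2 GB machine; the figures are illustrative arithmetic, not recommendations from the thread:

SELECT (0.05 * 2 * 1024 * 1024 / 8)::int AS buffers_at_5pct,   -- about 13107
       (0.25 * 2 * 1024 * 1024 / 8)::int AS buffers_at_25pct;  -- about 65536

By that yardstick the current 15000 sits near the low end of the range.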
Tomas,

> Let's suppose you set a reasonable value (say 8096) instead of 2GB. That
> gives about 160MB.
> Anyway this depends - if you have a lot of slow queries caused by on-disk
> sorts / hash tables, use a higher value. Otherwise leave it as it is.

Probably the product orders table is frequently joined with the product
table. Currently the conf file had

work_mem = 512

I changed it to

work_mem = 8096

>>> If it is all cached in memory, you may want to ensure that your
>>> shared_buffers is a reasonable size so that there is less shuffling of
>>> data from the kernel to postgres and back. Generally, shared_buffers
>>> works best between 5% and 25% of system memory.
>>
>> currently shared_buffers = 15000
>
> That's 120MB, i.e. about 6% of the memory. Might be a little higher, but
> it seems reasonable.

I changed it to 20000.

> Given that the performance issues are caused by bloated tables and / or a
> slow I/O subsystem, moving to a similar system won't help, I guess.

I have run VACUUM FULL ANALYZE VERBOSE and set

max_fsm_pages = 150000

So there is no bloat any more except in the pg_shdepend indexes, which
should not affect query speed.

Andrus.
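Whether the edited settings actually took effect can be confirmed from any session; note that in 8.1 work_mem applies after a config reload, while shared_buffers and max_fsm_pages need a full server restart:

SHOW work_mem;        -- expect 8096
SHOW shared_buffers;  -- expect 20000
SHOW max_fsm_pages;   -- expect 150000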
>> Given that the performance issues are caused by bloated tables and / or a
>> slow I/O subsystem, moving to a similar system won't help, I guess.
>
> I have run VACUUM FULL ANALYZE VERBOSE and set
>
> max_fsm_pages = 150000
>
> So there is no bloat any more except in the pg_shdepend indexes, which
> should not affect query speed.

OK, what was the number of unused item pointers in the VACUUM output?

Is the query performance still the same as when the tables were bloated?

What are the outputs of iostat/vmstat/dstat/top when running the query?

regards
Tomas
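A per-table run keeps that output manageable; a sketch for the two large tables in this thread (the figure Tomas asks about appears in the DETAIL lines, e.g. "There were N unused item pointers."):

VACUUM VERBOSE rid;  -- the product orders table
VACUUM VERBOSE dok;  -- the documents table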
Tomas,

> OK, what was the number of unused item pointers in the VACUUM output?

I posted it in this thread:

VACUUM FULL ANALYZE VERBOSE;
...
INFO: free space map contains 14353 pages in 314 relations
DETAIL: A total of 20000 page slots are in use (including overhead).
89664 page slots are required to track all free space.
Current limits are: 20000 page slots, 1000 relations, using 182 KB.
NOTICE: number of page slots needed (89664) exceeds max_fsm_pages (20000)
HINT: Consider increasing the configuration parameter "max_fsm_pages" to a
value over 89664.

Query returned successfully with no result in 10513335 ms (roughly three hours).

> Is the query performance still the same as when the tables were bloated?

It seems to be the same. However, after the previous message I changed
other queries yesterday so that they no longer scan the whole product
orders (rid) table. Now there is only a little disk activity every 5-300
seconds, which does not seem to affect those query results. So the issue
in this thread has gone away.

Now this query runs in a constant 8 seconds:

explain analyze
SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode = 'X05' AND dok.kuupaev>='2008-09-01'

"Aggregate (cost=182210.77..182210.78 rows=1 width=0) (actual time=8031.600..8031.604 rows=1 loops=1)"
"  -> Nested Loop (cost=74226.74..182149.27 rows=24598 width=0) (actual time=2602.474..7948.121 rows=21711 loops=1)"
"        -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 width=24) (actual time=0.077..0.089 rows=1 loops=1)"
"              Index Cond: ('X05'::bpchar = toode)"
"        -> Hash Join (cost=74226.74..181897.28 rows=24598 width=24) (actual time=2602.378..7785.315 rows=21711 loops=1)"
"              Hash Cond: ("outer".dokumnr = "inner".dokumnr)"
"              -> Bitmap Heap Scan on rid (cost=4084.54..101951.60 rows=270725 width=28) (actual time=1129.925..4686.601 rows=278417 loops=1)"
"                    Recheck Cond: (toode = 'X05'::bpchar)"
"                    -> Bitmap Index Scan on rid_toode_idx (cost=0.00..4084.54 rows=270725 width=0) (actual time=1123.202..1123.202 rows=278426 loops=1)"
"                          Index Cond: (toode = 'X05'::bpchar)"
"              -> Hash (cost=69419.29..69419.29 rows=112766 width=4) (actual time=1251.496..1251.496 rows=111088 loops=1)"
"                    -> Bitmap Heap Scan on dok (cost=1492.68..69419.29 rows=112766 width=4) (actual time=70.837..776.249 rows=111088 loops=1)"
"                          Recheck Cond: (kuupaev >= '2008-09-01'::date)"
"                          -> Bitmap Index Scan on dok_kuupaev_idx (cost=0.00..1492.68 rows=112766 width=0) (actual time=64.177..64.177 rows=111343 loops=1)"
"                                Index Cond: (kuupaev >= '2008-09-01'::date)"
"Total runtime: 8031.905 ms"

Interestingly, using LIKE is 80 times faster:

explain analyze
SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode like 'X05' AND dok.kuupaev>='2008-09-01'

"Aggregate (cost=88178.69..88178.70 rows=1 width=0) (actual time=115.335..115.339 rows=1 loops=1)"
"  -> Hash Join (cost=71136.22..88117.36 rows=24532 width=0) (actual time=115.322..115.322 rows=0 loops=1)"
"        Hash Cond: ("outer".toode = "inner".toode)"
"        -> Hash Join (cost=70163.36..86253.20 rows=24598 width=24) (actual time=0.046..0.046 rows=0 loops=1)"
"              Hash Cond: ("outer".dokumnr = "inner".dokumnr)"
"              -> Bitmap Heap Scan on rid (cost=21.16..6307.52 rows=270725 width=28) (actual time=0.037..0.037 rows=0 loops=1)"
"                    Filter: (toode ~~ 'X05'::text)"
"                    -> Bitmap Index Scan on rid_toode_pattern_idx (cost=0.00..21.16 rows=1760 width=0) (actual time=0.028..0.028 rows=0 loops=1)"
"                          Index Cond: (toode ~=~ 'X05'::bpchar)"
"              -> Hash (cost=69419.29..69419.29 rows=112766 width=4) (never executed)"
"                    -> Bitmap Heap Scan on dok (cost=1492.68..69419.29 rows=112766 width=4) (never executed)"
"                          Recheck Cond: (kuupaev >= '2008-09-01'::date)"
"                          -> Bitmap Index Scan on dok_kuupaev_idx (cost=0.00..1492.68 rows=112766 width=0) (never executed)"
"                                Index Cond: (kuupaev >= '2008-09-01'::date)"
"        -> Hash (cost=853.29..853.29 rows=13429 width=24) (actual time=114.757..114.757 rows=13412 loops=1)"
"              -> Seq Scan on toode (cost=0.00..853.29 rows=13429 width=24) (actual time=0.014..58.319 rows=13412 loops=1)"
"Total runtime: 115.505 ms"
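The fast plan above uses rid_toode_pattern_idx, a pattern-ops index. Since toode appears to be char(n) (note the ::bpchar casts in the plans), such an index would presumably have been created along these lines; this is a reconstruction, not the thread's actual DDL:

CREATE INDEX rid_toode_pattern_idx ON rid (toode bpchar_pattern_ops);
ANALYZE rid;  -- refresh statistics so the planner can cost the new index sensibly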
I also posted a test script in another thread which shows that LIKE is a
magnitude faster than the equality check.

rid.toode = 'X05' and rid.toode like 'X05' are exactly the same condition,
and there are indexes for both. So I don't understand why the results are
so different.

In another sample, which I posted in the thread "Increasing pattern index
query speed", LIKE is 4 times slower:

SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode = '99000010'
AND dok.kuupaev BETWEEN '2008-11-21' AND '2008-11-21'
AND dok.yksus LIKE 'ORISSAARE%'

"Aggregate (cost=43.09..43.10 rows=1 width=0) (actual time=12674.675..12674.679 rows=1 loops=1)"
"  -> Nested Loop (cost=29.57..43.08 rows=1 width=0) (actual time=2002.045..12673.645 rows=19 loops=1)"
"        -> Nested Loop (cost=29.57..37.06 rows=1 width=24) (actual time=2001.922..12672.344 rows=19 loops=1)"
"              -> Index Scan using dok_kuupaev_idx on dok (cost=0.00..3.47 rows=1 width=4) (actual time=342.812..9810.627 rows=319 loops=1)"
"                    Index Cond: ((kuupaev >= '2008-11-21'::date) AND (kuupaev <= '2008-11-21'::date))"
"                    Filter: (yksus ~~ 'ORISSAARE%'::text)"
"              -> Bitmap Heap Scan on rid (cost=29.57..33.58 rows=1 width=28) (actual time=8.948..8.949 rows=0 loops=319)"
"                    Recheck Cond: (("outer".dokumnr = rid.dokumnr) AND (rid.toode = '99000010'::bpchar))"
"                    -> BitmapAnd (cost=29.57..29.57 rows=1 width=0) (actual time=8.930..8.930 rows=0 loops=319)"
"                          -> Bitmap Index Scan on rid_dokumnr_idx (cost=0.00..2.52 rows=149 width=0) (actual time=0.273..0.273 rows=2 loops=319)"
"                                Index Cond: ("outer".dokumnr = rid.dokumnr)"
"                          -> Bitmap Index Scan on rid_toode_idx (cost=0.00..26.79 rows=1941 width=0) (actual time=8.596..8.596 rows=15236 loops=319)"
"                                Index Cond: (toode = '99000010'::bpchar)"
"        -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 width=24) (actual time=0.043..0.048 rows=1 loops=19)"
"              Index Cond: ('99000010'::bpchar = toode)"
"Total runtime: 12675.191 ms"

explain analyze
SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode like '99000010%'
AND dok.kuupaev BETWEEN '2008-11-21' AND '2008-11-21'
AND dok.yksus LIKE 'ORISSAARE%'

"Aggregate (cost=17.99..18.00 rows=1 width=0) (actual time=46465.609..46465.613 rows=1 loops=1)"
"  -> Nested Loop (cost=0.00..17.98 rows=1 width=0) (actual time=3033.947..46465.462 rows=19 loops=1)"
"        -> Nested Loop (cost=0.00..11.96 rows=1 width=24) (actual time=3033.890..46464.310 rows=19 loops=1)"
"              Join Filter: ("outer".dokumnr = "inner".dokumnr)"
"              -> Index Scan using dok_kuupaev_idx on dok (cost=0.00..5.93 rows=1 width=4) (actual time=0.044..5.419 rows=319 loops=1)"
"                    Index Cond: ((kuupaev >= '2008-11-21'::date) AND (kuupaev <= '2008-11-21'::date))"
"                    Filter: (yksus ~~ 'ORISSAARE%'::text)"
"              -> Index Scan using rid_toode_pattern_idx on rid (cost=0.00..6.01 rows=1 width=28) (actual time=0.025..88.046 rows=15322 loops=319)"
"                    Index Cond: ((toode ~>=~ '99000010'::bpchar) AND (toode ~<~ '99000011'::bpchar))"
"                    Filter: (toode ~~ '99000010%'::text)"
"        -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 width=24) (actual time=0.034..0.039 rows=1 loops=19)"
"              Index Cond: ("outer".toode = toode.toode)"
"Total runtime: 46465.833 ms"
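Both slow plans hinge on a bad row estimate: the planner expects rows=1 from the toode conditions but gets back 15236 and 15322 rows per loop, so it keeps choosing nested loops around those scans. A standard experiment for such misestimates (not something tried in the thread) is to raise the statistics target for the skewed column and re-analyze:

ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 1000;  -- 1000 is the maximum in 8.1
ANALYZE rid;  -- rebuild the column statistics at the higher target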
> What are the outputs of iostat/vmstat/dstat/top when running the query?

iostat and dstat are not installed on this Gentoo box.

For the last LIKE query, which consistently takes 46 seconds, vmstat 5 shows:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b  swpd   free   buff   cache   si  so  bi  bo  in  cs  us sy id wa
 0  0  216 124780      0 1899236    0   0  10  13  277  78   3  1 95  0
 0  0  216 124656      0 1899248    0   0   2   8  267  59   2  0 98  0
 1  0  216 124664      0 1899244    0   0   0   7  266  62   2  0 97  0
 0  0  216 124788      0 1899248    0   0   0   8  273  73   3  1 96  0
 0  0  216 124656      0 1899252    0   0   0   2  265  54   2  0 97  0
 1  0  216 124656      0 1899252    0   0   0  22  268  63  14 39 48  0  <---- start of query
 1  0  216 124656      0 1899252    0   0   0  62  267  61  25 75  0  0
 1  0  216 124656      0 1899252    0   0   0   2  266  55  28 72  0  0
 1  0  216 124664      0 1899256    0   0   0   5  265  56  26 74  0  0
 1  0  216 124788      0 1899256    0   0   0  10  271  67  25 75  0  0
 1  0  216 124664      0 1899256    0   0   0   3  265  54  25 75  0  0
 1  0  216 124160      0 1899260    0   0   0   1  265  57  28 72  0  0
 1  0  216 125020      0 1899260    0   0   0  21  272  60  28 72  0  0
 1  0  216 124020      0 1899264    0   0   0   0  271  73  29 71  0  0
 0  0  216 124260      0 1899268    0   0   0   3  266  61  19 59 22  0  <---- end of query
 1  0  216 125268      0 1899260    0   0   0   9  268  59   2  0 97  0
 0  0  216 124912      0 1899268    0   0   0   5  270  59   3  0 96  0
 0  0  216 124656      0 1899272    0   0   0   5  267  64   2  0 98  0
 0  0  216 124664      0 1899272    0   0   0   0  263  50   2  0 98  0

top shows activity from a single postmaster process:

top - 23:07:49 up 27 days, 4:20, 1 user, load average: 0.25, 0.22, 0.12
Tasks: 50 total, 3 running, 47 sleeping, 0 stopped, 0 zombie
Cpu(s): 29.3% us, 70.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 2075828k total, 1951604k used, 124224k free, 0k buffers
Swap: 3911816k total, 216k used, 3911600k free, 1899348k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
29687 postgres  25   0  144m 124m 121m R 96.3  6.2  8:18.04 postmaster
 8147 root      16   0  4812 1628 1316 S  1.0  0.1  0:00.03 sshd
 8141 root      15   0  5080 1892 1528 S  0.7  0.1  0:00.02 sshd
 8145 sshd      15   0  4816 1220  912 S  0.3  0.1  0:00.01 sshd
 8151 sshd      15   0  4812 1120  816 S  0.3  0.1  0:00.01 sshd
    1 root      16   0  1480  508  444 S  0.0  0.0  0:01.89 init
    2 root      34  19     0    0    0 S  0.0  0.0  0:00.01 ksoftirqd/0
    3 root      10  -5     0    0    0 S  0.0  0.0  0:00.00 events/0
    4 root      10  -5     0    0    0 S  0.0  0.0  0:00.63 khelper
    5 root      10  -5     0    0    0 S  0.0  0.0  0:00.00 kthread
    7 root      10  -5     0    0    0 S  0.0  0.0  2:21.73 kblockd/0
    8 root      20  -5     0    0    0 S  0.0  0.0  0:00.00 kacpid
  115 root      13  -5     0    0    0 S  0.0  0.0  0:00.00 aio/0
  114 root      15   0     0    0    0 S  0.0  0.0  9:21.41 kswapd0
  116 root      10  -5     0    0    0 S  0.0  0.0  0:12.06 xfslogd/0
  117 root      10  -5     0    0    0 S  0.0  0.0  1:36.43 xfsdatad/0
  706 root      10  -5     0    0    0 S  0.0  0.0  0:00.00 kseriod
  723 root      13  -5     0    0    0 S  0.0  0.0  0:00.00 kpsmoused
  738 root      11  -5     0    0    0 S  0.0  0.0  0:00.00 ata/0
  740 root      11  -5     0    0    0 S  0.0  0.0  0:00.00 scsi_eh_0
  741 root      11  -5     0    0    0 S  0.0  0.0  0:00.00 scsi_eh_1
  742 root      11  -5     0    0    0 S  0.0  0.0  0:00.00 scsi_eh_2
  743 root      11  -5     0    0    0 S  0.0  0.0  0:00.00 scsi_eh_3
  762 root      10  -5     0    0    0 S  0.0  0.0  0:18.64 xfsbufd
  763 root      10  -5     0    0    0 S  0.0  0.0  0:00.79 xfssyncd
  963 root      16  -4  1712  528  336 S  0.0  0.0  0:00.24 udevd
 6677 root      15   0  1728  572  400 S  0.0  0.0  0:10.14 syslog-ng
 7183 root      15   0  3472  828  672 S  0.0  0.0  0:08.50 sshd
 7222 root      16   0  1736  672  556 S  0.0  0.0  0:00.03 cron
 7237 root      16   0  1620  712  608 S  0.0  0.0  0:00.00 agetty
 7238 root      17   0  1616  712  608 S  0.0  0.0  0:00.00 agetty
 7239 root      16   0  1616  712  608 S  0.0  0.0  0:00.00 agetty
 7240 root      16   0  1616  708  608 S  0.0  0.0  0:00.00 agetty
 7241 root      16   0  1616  708  608 S  0.0  0.0  0:00.00 agetty
31873 root      15   0  1616  712  608 S  0.0  0.0  0:00.00 agetty
14908 postgres  16   0  141m  10m 9936 S  0.0  0.5  0:01.44 postmaster
14915 postgres  16   0  8468 1360  896 S  0.0  0.1  0:00.36 postmaster

Andrus.
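To tie the busy PID from the top output (29687 here) back to the SQL it is running, 8.1's statistics view can be queried from a second session; this assumes stats_command_string is enabled, otherwise current_query is not populated:

SELECT procpid, usename, query_start, current_query
FROM pg_stat_activity
WHERE procpid = 29687;  -- the backend top reports at ~96% CPU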