Re: Hash join on int takes 8..114 seconds - Mailing list pgsql-performance
From | Tomas Vondra |
---|---|
Subject | Re: Hash join on int takes 8..114 seconds |
Date | |
Msg-id | 4929E9C4.1010706@fuzzy.cz Whole thread Raw |
In response to | Re: Hash join on int takes 8..114 seconds ("Andrus" <kobruleht2@hot.ee>) |
Responses |
Re: Hash join on int takes 8..114 seconds
|
List | pgsql-performance |
> Scott, > > thank you. > >>> > work_mem = 512 >> This is very easy to try. You can change work_mem for just a single >> session, and this can in some cases help performance quite a bit, and in >> others not at all. >> I would not recommend having it lower than at least 4MB on a server like >> that unless you have a lot of concurrently active queries / connections. >> To try it, simply use the SET command. To try out 32MB, just do: >> SET work_mem = '32MB'; > > 8.1.4 requires int value. > SET work_mem = 33554432; > causes: > ERROR: 33554432 is outside the valid range for parameter "work_mem" (64 .. > 2097151) > > So max allowed value seems to be 2 MB > I tested it when this server was idle by running both queries several times > after VACUUM FULL was running > > SET work_mem = 2097151; No, not really. The work_mem value is specified in kilobytes, so you've set it to 2GB :-) > set search_path to firma2,public; > explain analyze > SELECT sum(1) > FROM dok JOIN rid USING (dokumnr) > JOIN toode USING (toode) > WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01' > "Aggregate (cost=177291.36..177291.37 rows=1 width=0) (actual > time=5153.856..5153.859 rows=1 loops=1)" > " -> Nested Loop (cost=73607.45..177229.96 rows=24561 width=0) (actual > time=1395.935..5071.247 rows=21541 loops=1)" > " -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 > width=24) (actual time=0.078..0.087 rows=1 loops=1)" > " Index Cond: ('X05'::bpchar = toode)" > " -> Hash Join (cost=73607.45..176978.33 rows=24561 width=24) > (actual time=1395.836..4911.425 rows=21541 loops=1)" > " Hash Cond: ("outer".dokumnr = "inner".dokumnr)" > " -> Bitmap Heap Scan on rid (cost=4083.10..101802.05 > rows=270316 width=28) (actual time=238.578..2367.189 rows=278247 loops=1)" > " Recheck Cond: (toode = 'X05'::bpchar)" > " -> Bitmap Index Scan on rid_toode_idx > (cost=0.00..4083.10 rows=270316 width=0) (actual time=150.868..150.868 > rows=278248 loops=1)" > " Index Cond: (toode = 'X05'::bpchar)" > " -> Hash (cost=69242.72..69242.72 rows=112651 width=4) > (actual time=1146.081..1146.081 rows=110170 loops=1)" > " -> Bitmap Heap Scan on dok (cost=1492.28..69242.72 > rows=112651 width=4) (actual time=46.210..696.803 rows=110170 loops=1)" > " Recheck Cond: (kuupaev >= '2008-09-01'::date)" > " -> Bitmap Index Scan on dok_kuupaev_idx > (cost=0.00..1492.28 rows=112651 width=0) (actual time=33.938..33.938 > rows=110232 loops=1)" > " Index Cond: (kuupaev >= > '2008-09-01'::date)" > "Total runtime: 5154.911 ms" > > > SET work_mem to default; > set search_path to firma2,public; > explain analyze > SELECT sum(1) > FROM dok JOIN rid USING (dokumnr) > JOIN toode USING (toode) > WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01' > "Aggregate (cost=181869.36..181869.37 rows=1 width=0) (actual > time=7807.867..7807.871 rows=1 loops=1)" > " -> Nested Loop (cost=74048.45..181807.96 rows=24561 width=0) (actual > time=2607.429..7728.138 rows=21541 loops=1)" > " -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 > width=24) (actual time=0.079..0.091 rows=1 loops=1)" > " Index Cond: ('X05'::bpchar = toode)" > " -> Hash Join (cost=74048.45..181556.33 rows=24561 width=24) > (actual time=2607.332..7569.612 rows=21541 loops=1)" > " Hash Cond: ("outer".dokumnr = "inner".dokumnr)" > " -> Bitmap Heap Scan on rid (cost=4083.10..101802.05 > rows=270316 width=28) (actual time=1147.071..4528.845 rows=278247 loops=1)" > " Recheck Cond: (toode = 'X05'::bpchar)" > " -> Bitmap Index Scan on rid_toode_idx > (cost=0.00..4083.10 rows=270316 width=0) (actual time=1140.337..1140.337 > rows=278248 loops=1)" > " Index Cond: (toode = 'X05'::bpchar)" > " -> Hash (cost=69242.72..69242.72 rows=112651 width=4) > (actual time=1240.988..1240.988 rows=110170 loops=1)" > " -> Bitmap Heap Scan on dok (cost=1492.28..69242.72 > rows=112651 width=4) (actual time=68.053..769.448 rows=110170 loops=1)" > " Recheck Cond: (kuupaev >= '2008-09-01'::date)" > " -> Bitmap Index Scan on dok_kuupaev_idx > (cost=0.00..1492.28 rows=112651 width=0) (actual time=61.358..61.358 > rows=110232 loops=1)" > " Index Cond: (kuupaev >= > '2008-09-01'::date)" > "Total runtime: 7808.174 ms" > > In both cases vmstat 2 shows only cpu activity when queries are running: > > procs -----------memory---------- ---swap-- -----io---- --system-- > ----cpu---- > r b swpd free buff cache si so bi bo in cs us sy id > wa > ... > 0 0 232 123692 0 1891044 0 0 0 0 252 17 0 0 > 100 0 > 0 0 232 123692 0 1891044 0 0 0 0 252 17 0 0 > 100 0 > 0 0 232 123684 0 1891044 0 0 0 162 254 22 0 0 > 100 0 > 0 0 232 123684 0 1891044 0 0 0 0 252 18 0 0 > 100 0 > 1 0 232 123056 0 1891444 0 0 0 13 254 21 62 5 34 > 0 <---- start of slower query > 1 0 232 102968 0 1911060 0 0 0 16 252 18 26 75 0 > 0 > 1 0 232 77424 0 1936996 0 0 0 0 252 18 37 63 0 > 0 > 1 0 232 71464 0 1941928 0 0 0 73 260 34 38 62 0 > 0 > 0 0 232 123420 0 1891044 0 0 0 32 257 31 8 15 77 > 0 <-------- end of slower query > 0 0 232 123420 0 1891044 0 0 0 25 255 24 0 0 > 100 0 > 0 0 232 123420 0 1891044 0 0 0 28 255 27 0 0 > 100 0 > Well, this parameter specifies how much memory may be used for in-memory sorting and hash tables. If more memory is needed, the sort / hash table will be performed on-disk. I guess the difference in speed is not huge, so in this case the hash table (used for join) probably fits into the 1024kB (which is a default value). > Is it safe to set > > work_mem = 2097151 > > in postgresql.conf file ? Each sort in a query uses a separate area in a memory (up to work_mem). So if you have 10 sessions running a query with 2 sorts, that may occupy 10 * 2 * work_mem of memory. Let's suppose you set a reasonable value (say 8096) instead of 2GB. That gives about 160MB. Anyway this depends - if you have a lot of slow queries caused by on-disk sorts / hash tables, use a higher value. Otherwise leave it as it is. >> First, demonstrate that it is all or mostly in memory -- use iostat or >> other tools to ensure that there is not much disk activity during the >> query. If your system doesn't have iostat installed, it should be >> installed. It is a very useful tool. > > # iostat > bash: iostat: command not found > # locate iostat > /usr/src/linux-2.6.16-gentoo-r9/Documentation/iostats.txt > > I have few experience in Linux. No idea how to install or compile iostat in > this system. > >> If it is all cached in memory, you may want to ensure that your >> shared_buffers is a reasonalbe size so that there is less shuffling of >> data >> from the kernel to postgres and back. Generally, shared_buffers works >> best >> between 5% and 25% of system memory. > > currently shared_buffers = 15000 That's 120MB, i.e. about 6% of the memory. Might be a little bit higher, but seems reasonable. >> If it is completely CPU bound then the work done for the query has to be >> reduced by altering the plan to a more optimal one or making the work it >> has to do at each step easier. Most of the ideas in this thread revolve >> around those things. > > When running on loaded server even after VACUUM FULL, response time for > original work_mem is longer probably because it must fetch blocks from > disk. > >> Based on the time it took to do the vacuum, I suspect your disk subsystem >> is a bit slow. If it can be determined that there is much disk I/O in >> your >> use cases, there are generally several things that can be done to tune >> Linux I/O. The main ones in my experience are the 'readahead' value for >> each disk which helps sequential reads significantly, and trying out the >> linux 'deadline' scheduler and comparing it to the more commonly used >> 'cfq' >> scheduler. If the system is configured with the anticipatory scheduler, >> absolutely switch to cfq or deadline as the anticipatory scheduler will >> perform horribly poorly for a database. > > This is 3 year old cheap server. > No idea what to config. Well, how could we know that if you don't? Anyway the options mentioned by Scott Carey are related to linux kernel, so it shouldn't be a problem to change. > There is also other similar server which as 1.2 GB more usable memory. > No idea is it worth to switch into it. > After some years sales data will still exceed this more memory. Given the fact that the performance issues are caused by bloated tables and / or slow I/O subsystem, moving to a similar system won't help I guess. regards Tomas
pgsql-performance by date: