Thread: indexes in partitioned tables - again
I know this problem crops up all the time and I have read what I could find, but I'm still not finding an answer to my problem. This is all postgres 8.3. Yes, I've enabled constraint_exclusion. Yes, there are indexes on the partitions, not just on the parent.

I've got a table with 1-month partitions. As it happens, I've only got 2 partitions at the moment, one with 12 million rows and the other with 5 million. I only discovered all of the caveats surrounding indexes and partitioned tables when I executed a very simple query, saw that it took far too long to run, and started looking at what the query planner did.

In this case, I simply want the set of distinct values for a particular column, across all partitions. The set of distinct values is very small (3) and there is an index on the column, so I'd expect an index scan to return the 3 values almost instantaneously. It turns out that when I query the partitions directly, the planner does an index scan. When I query the parent table, I get full table scans instead of merged output from n index scans. Even worse, instead of getting the distinct values from each partition and merging those, it merges each partition in its entirety and then sorts and uniques, which is pretty much the pathological execution order.

I'll give the queries, then the schema, then the various explain outputs.

(parent table, 30 seconds)
select distinct probe_type_num from day_scale_radar_performance_fact;

(partition, 6 seconds)
select distinct probe_type_num from day_scale_radar_performance_fact_20100301_0000;

(partition, 1 second)
select distinct probe_type_num from day_scale_radar_performance_fact_20100401_0000;

(manual union, 7 seconds)
select distinct probe_type_num from
  (select distinct probe_type_num from day_scale_radar_performance_fact_20100301_0000
   UNION
   select distinct probe_type_num from day_scale_radar_performance_fact_20100401_0000) t2;

In part, I'm surprised that the index scan takes as long as it does, since I'd think an index would be able to return the set of keys relatively quickly. But that's a secondary issue.
Parent table:

cedexis_v2=# \d day_scale_radar_performance_fact
        Table "perf_reporting.day_scale_radar_performance_fact"
           Column           |            Type             | Modifiers
----------------------------+-----------------------------+-----------
 count                      | bigint                      | not null
 total_ms                   | bigint                      | not null
 time                       | timestamp without time zone | not null
 market_num                 | integer                     | not null
 country_num                | integer                     | not null
 autosys_num                | integer                     | not null
 provider_owner_zone_id     | integer                     | not null
 provider_owner_customer_id | integer                     | not null
 provider_id                | integer                     | not null
 probe_type_num             | integer                     | not null
Indexes:
    "temp1_probe_type_num" btree (probe_type_num)

Partition:

cedexis_v2=# \d day_scale_radar_performance_fact_20100301_0000
  Table "perf_reporting.day_scale_radar_performance_fact_20100301_0000"
           Column           |            Type             | Modifiers
----------------------------+-----------------------------+-----------
 count                      | bigint                      | not null
 total_ms                   | bigint                      | not null
 time                       | timestamp without time zone | not null
 market_num                 | integer                     | not null
 country_num                | integer                     | not null
 autosys_num                | integer                     | not null
 provider_owner_zone_id     | integer                     | not null
 provider_owner_customer_id | integer                     | not null
 provider_id                | integer                     | not null
 probe_type_num             | integer                     | not null
Indexes:
    "day_scale_radar_performance_fact_20100301_0000_asn" btree (autosys_num)
    "day_scale_radar_performance_fact_20100301_0000_cty" btree (country_num)
    "day_scale_radar_performance_fact_20100301_0000_mkt" btree (market_num)
    "day_scale_radar_performance_fact_20100301_0000_p" btree (provider_id)
    "day_scale_radar_performance_fact_20100301_0000_poc" btree (provider_owner_customer_id)
    "day_scale_radar_performance_fact_20100301_0000_poz" btree (provider_owner_zone_id)
    "day_scale_radar_performance_fact_20100301_0000_pt" btree (probe_type_num)
    "day_scale_radar_performance_fact_20100301_0000_time" btree ("time")
Check constraints:
    "day_scale_radar_performance_fact_20100301_0000_time_check" CHECK ("time" >= '2010-03-01 00:00:00'::timestamp without time zone AND "time" < '2010-04-01 00:00:00'::timestamp without time zone)
Inherits: day_scale_radar_performance_fact

I also tried creating an index on the relevant column in the parent table, but it had no effect, either way.
You can see it in the table description above.

cedexis_v2=# explain select distinct probe_type_num from day_scale_radar_performance_fact;
                                  QUERY PLAN
-------------------------------------------------------------------------------------------
 Unique  (cost=1864962.35..1926416.31 rows=200 width=4)
   ->  Sort  (cost=1864962.35..1895689.33 rows=12290793 width=4)
         Sort Key: perf_reporting.day_scale_radar_performance_fact.probe_type_num
         ->  Result  (cost=0.00..249616.93 rows=12290793 width=4)
               ->  Append  (cost=0.00..249616.93 rows=12290793 width=4)
                     ->  Seq Scan on day_scale_radar_performance_fact  (cost=0.00..19.90 rows=990 width=4)
                     ->  Seq Scan on day_scale_radar_performance_fact_20100401_0000 day_scale_radar_performance_fact  (cost=0.00..31388.01 rows=1545501 width=4)
                     ->  Seq Scan on day_scale_radar_performance_fact_20100301_0000 day_scale_radar_performance_fact  (cost=0.00..218209.02 rows=10744302 width=4)

cedexis_v2=# explain select distinct probe_type_num from day_scale_radar_performance_fact_20100301_0000;
                                  QUERY PLAN
-------------------------------------------------------------------------------------------
 Unique  (cost=0.00..684328.92 rows=3 width=4)
   ->  Index Scan using day_scale_radar_performance_fact_20100301_0000_pt on day_scale_radar_performance_fact_20100301_0000  (cost=0.00..657468.16 rows=10744302 width=4)

And this is a lot closer to what I would hope the query planner would do:

cedexis_v2=# explain select distinct probe_type_num from (select distinct probe_type_num from day_scale_radar_performance_fact_20100401_0000 union select distinct probe_type_num from day_scale_radar_performance_fact_20100301_0000) t2;
                                  QUERY PLAN
-------------------------------------------------------------------------------------------
 Unique  (cost=781113.73..781113.84 rows=6 width=4)
   ->  Unique  (cost=781113.73..781113.76 rows=6 width=4)
         ->  Sort  (cost=781113.73..781113.75 rows=6 width=4)
               Sort Key: day_scale_radar_performance_fact_20100401_0000.probe_type_num
               ->  Append  (cost=0.00..781113.66 rows=6 width=4)
                     ->  Unique  (cost=0.00..96784.68 rows=3 width=4)
                           ->  Index Scan using day_scale_radar_performance_fact_20100401_0000_pt on day_scale_radar_performance_fact_20100401_0000  (cost=0.00..92920.93 rows=1545501 width=4)
                     ->  Unique  (cost=0.00..684328.92 rows=3 width=4)
                           ->  Index Scan using day_scale_radar_performance_fact_20100301_0000_pt on day_scale_radar_performance_fact_20100301_0000  (cost=0.00..657468.16 rows=10744302 width=4)
On Tue, Apr 6, 2010 at 5:37 PM, Samuel Gendler <sgendler@ideasculptor.com> wrote:
> In part, I'm surprised that the index scan takes as long as it does,
> since I'd think an index would be able to return the set of keys
> relatively quickly. But that's a secondary issue.

We don't actually have a facility built into the index-scan machinery to scan for distinct keys. It's doing a full scan of the index and then unique-ifying the results afterward. It produces the right answers, but it's definitely not as fast as it could be.

The planner is not as smart about partitioned tables as it could be, either. A scan of the partitioned tables is implemented as an append node with one member per partition, and the planner isn't very good at pushing things down through append nodes.

...Robert
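For anyone hitting the same limitation, one workaround (not something suggested in the thread) is to emulate a "loose index scan" by hand, stepping from each distinct key to the next instead of reading every row. The sketch below assumes the probe_type_num index shown above and uses a recursive CTE, so it needs 8.4 or later; on 8.3 the per-partition UNION in the original post is about the best available.

    -- Sketch of a hand-rolled loose index scan over one partition (requires 8.4+):
    WITH RECURSIVE distinct_probe_types AS (
        (SELECT probe_type_num
           FROM day_scale_radar_performance_fact_20100301_0000
          ORDER BY probe_type_num LIMIT 1)
        UNION ALL
        SELECT (SELECT f.probe_type_num
                  FROM day_scale_radar_performance_fact_20100301_0000 f
                 WHERE f.probe_type_num > d.probe_type_num
                 ORDER BY f.probe_type_num LIMIT 1)
          FROM distinct_probe_types d
         WHERE d.probe_type_num IS NOT NULL
    )
    SELECT probe_type_num
      FROM distinct_probe_types
     WHERE probe_type_num IS NOT NULL;

Each recursion step is a single index probe, so with only three distinct values this touches a handful of index pages instead of ten-plus million rows.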
Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in the CPU load that lasts just a short time -- it jumps to 10-20 CPU loads. Today it hit 100 CPU loads. Sometimes days go by with no spike events. During these spikes, the system is completely unresponsive (you can't even login via ssh).

I managed to capture one such event using top(1) with the "batch" option as a background process. See output below - it shows 19 active postgres processes, but I think it missed the bulk of the spike.

For some reason, every postgres backend suddenly decides (is told?) to do something. When this happens, the system becomes unusable for anywhere from ten seconds to a minute or so, depending on how much web traffic stacks up behind this event.

We have two servers, one offline and one public, and they both do this, so it's not caused by actual web traffic (and the Apache logs don't show any HTTP activity correlated with the spikes).

I thought based on other posts that this might be a background-writer problem, but it's not I/O, it's all CPU as far as I can tell.

Any ideas where I can look to find what's triggering this?

8 CPUs, 8 GB memory
8-disk RAID10 (10k SATA)
Postgres 8.3.0
Fedora 8, kernel is 2.6.24.4-64.fc8

Diffs from original postgres.conf:

max_connections = 1000
shared_buffers = 2000MB
work_mem = 256MB
max_fsm_pages = 16000000
max_fsm_relations = 625000
synchronous_commit = off
wal_buffers = 256kB
checkpoint_segments = 30
effective_cache_size = 4GB
escape_string_warning = off

Thanks,
Craig

top - 11:24:59 up 81 days, 20:27, 4 users, load average: 0.98, 0.83, 0.92
Tasks: 366 total, 20 running, 346 sleeping, 0 stopped, 0 zombie
Cpu(s): 30.6%us, 1.5%sy, 0.0%ni, 66.3%id, 1.5%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8194800k total, 8118688k used, 76112k free, 36k buffers
Swap: 2031608k total, 169348k used, 1862260k free, 7313232k cached

  PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
18972 postgres 20  0 2514m  11m 8752 R   11  0.1 0:00.35 postmaster
10618 postgres 20  0 2514m  12m 9456 R    9  0.2 0:00.54 postmaster
10636 postgres 20  0 2514m  11m 9192 R    9  0.1 0:00.45 postmaster
25903 postgres 20  0 2514m  11m 8784 R    9  0.1 0:00.21 postmaster
10626 postgres 20  0 2514m  11m 8716 R    6  0.1 0:00.45 postmaster
10645 postgres 20  0 2514m  12m 9352 R    6  0.2 0:00.42 postmaster
10647 postgres 20  0 2514m  11m 9172 R    6  0.1 0:00.51 postmaster
18502 postgres 20  0 2514m  11m 9016 R    6  0.1 0:00.23 postmaster
10641 postgres 20  0 2514m  12m 9296 R    5  0.2 0:00.36 postmaster
10051 postgres 20  0 2514m  13m  10m R    4  0.2 0:00.70 postmaster
10622 postgres 20  0 2514m  12m 9216 R    4  0.2 0:00.39 postmaster
10640 postgres 20  0 2514m  11m 8592 R    4  0.1 0:00.52 postmaster
18497 postgres 20  0 2514m  11m 8804 R    4  0.1 0:00.25 postmaster
18498 postgres 20  0 2514m  11m 8804 R    4  0.1 0:00.22 postmaster
10341 postgres 20  0 2514m  13m   9m R    2  0.2 0:00.57 postmaster
10619 postgres 20  0 2514m  12m 9336 R    1  0.2 0:00.38 postmaster
15687 postgres 20  0 2321m  35m  35m R    0  0.4 8:36.12 postmaster
On Wed, 2010-04-07 at 14:37 -0700, Craig James wrote:
> Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in the CPU load that lasts just a short time -- it jumps to 10-20 CPU loads. Today it hit 100 CPU loads. Sometimes days go by with no spike events. During these spikes, the system is completely unresponsive (you can't even login via ssh).
>
> I managed to capture one such event using top(1) with the "batch" option as a background process. See output below - it shows 19 active postgres processes, but I think it missed the bulk of the spike.

What does iostat 5 say during the jump?

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.
On 4/7/10 2:40 PM, Joshua D. Drake wrote:
> On Wed, 2010-04-07 at 14:37 -0700, Craig James wrote:
>> Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in the CPU load that lasts just a short time -- it jumps to 10-20 CPU loads. Today it hit 100 CPU loads. Sometimes days go by with no spike events. During these spikes, the system is completely unresponsive (you can't even login via ssh).
>>
>> I managed to capture one such event using top(1) with the "batch" option as a background process. See output below - it shows 19 active postgres processes, but I think it missed the bulk of the spike.
>
> What does iostat 5 say during the jump?

It's very hard to say ... I'll have to start a background job to watch for a day or so. While it's happening, you can't login, and any open windows become unresponsive. I'll probably have to run it at high priority using nice(1) to get any data at all during the event.

Would vmstat be informative?

Thanks,
Craig
Craig James wrote:
> I managed to capture one such event using top(1) with the "batch"
> option as a background process. See output below

You should add "-c" to your batch top capture, then you'll be able to see what the individual postmaster processes are actually doing when things get stuck.

> max_connections = 1000
> shared_buffers = 2000MB
> work_mem = 256MB

> Mem: 8194800k total, 8118688k used, 76112k free, 36k buffers
> Swap: 2031608k total, 169348k used, 1862260k free, 7313232k cached

These settings appear way too high for a server with 8GB of RAM. I'm not sure if max_connections is too large, or if it's work_mem that's too big, but one or both of them may need to be tuned way down from where they are now to get your memory usage under control. Your server might be running out of RAM during the periods where it becomes unresponsive--that could be the system paging stuff out to swap, which isn't necessarily a high user of I/O but it will block things. Not having any memory used for buffers is never a good sign.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
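To make that concern concrete, here is a rough worst-case calculation plus some purely illustrative replacement values (the replacement numbers are assumptions for this sketch, not recommendations made anywhere in the thread):

    # Each backend may use up to work_mem per sort or hash, so the quoted settings
    # allow roughly 1000 connections x 256MB = ~256GB of work memory, on top of
    # 2GB of shared_buffers -- on an 8GB machine. Hence the swap suspicion.
    # Illustrative, more conservative postgresql.conf values:
    max_connections = 200          # or keep it higher and front it with a connection pool
    work_mem = 8MB                 # can still be raised per session for big reports
    shared_buffers = 2000MB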
Craig James <craig_james@emolecules.com> writes:
> Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in the CPU load that lasts just a short time -- it jumps to 10-20 CPU loads. Today it hit 100 CPU loads. Sometimes days go by with no spike events. During these spikes, the system is completely unresponsive (you can't even login via ssh).

> I managed to capture one such event using top(1) with the "batch" option as a background process. See output below - it shows 19 active postgres processes, but I think it missed the bulk of the spike.

> Any ideas where I can look to find what's triggering this?

> Postgres 8.3.0
  ^^^^^

If it's really 8.3.0, try updating to 8.3.something-recent. We've fixed a whole lot of bugs since then.

I have a suspicion that this might be an sinval overrun scenario, in which case you'd need to update to 8.4 to get a real fix. But updating in the 8.3 branch would be cheap and easy.

If it is sinval overrun, it would presumably be triggered by a whole lot of catalog changes being made at approximately the same time. Can you correlate the spikes with anything like that?

			regards, tom lane
On Wed, 2010-04-07 at 14:45 -0700, Craig James wrote:
> On 4/7/10 2:40 PM, Joshua D. Drake wrote:
>> On Wed, 2010-04-07 at 14:37 -0700, Craig James wrote:
>>> Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in the CPU load that lasts just a short time -- it jumps to 10-20 CPU loads. Today it hit 100 CPU loads. Sometimes days go by with no spike events. During these spikes, the system is completely unresponsive (you can't even login via ssh).
>>>
>>> I managed to capture one such event using top(1) with the "batch" option as a background process. See output below - it shows 19 active postgres processes, but I think it missed the bulk of the spike.
>>
>> What does iostat 5 say during the jump?
>
> It's very hard to say ... I'll have to start a background job to watch for a day or so. While it's happening, you can't login, and any open windows become unresponsive. I'll probably have to run it at high priority using nice(1) to get any data at all during the event.

Do you have sar running? Say a sar -A?

> Would vmstat be informative?

Yes.

My guess is that it is not CPU, it is IO and your CPU usage is all WAIT on IO.

To have your CPUs so flooded that they are the cause of an inability to log in is pretty suspect.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.
On Wed, Apr 7, 2010 at 2:37 PM, Craig James <craig_james@emolecules.com> wrote:
> Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in the CPU load that lasts just a short time -- it jumps to 10-20 CPU loads. Today it hit 100 CPU loads. Sometimes days go by with no spike events. During these spikes, the system is completely unresponsive (you can't even login via ssh).

You need to find out what all those Postgres processes are doing. You might try enabling update_process_title and then using ps to figure out what each instance is using. Otherwise, you might try enabling logging of commands that take a certain amount of time to run (see log_min_duration_statement).

> I managed to capture one such event using top(1) with the "batch" option as a background process. See output below - it shows 19 active postgres processes, but I think it missed the bulk of the spike.

Looks like it. The system doesn't appear to be overloaded at all at that point.

> 8 CPUs, 8 GB memory
> 8-disk RAID10 (10k SATA)
> Postgres 8.3.0

Should definitely update to the latest 8.3.10 - 8.3 has a LOT of known bugs.

> Fedora 8, kernel is 2.6.24.4-64.fc8

Wow, that is very old, too.

> Diffs from original postgres.conf:
>
> max_connections = 1000
> shared_buffers = 2000MB
> work_mem = 256MB

work_mem is way too high for 1000 connections and 8GB ram. You could simply be starting up too many postgres processes and overwhelming the machine. Either significantly reduce max_connections or work_mem.

> max_fsm_pages = 16000000
> max_fsm_relations = 625000
> synchronous_commit = off

You are playing with fire here. You should never turn this off unless you do not care if your data becomes irrecoverably corrupted.

> top - 11:24:59 up 81 days, 20:27, 4 users, load average: 0.98, 0.83, 0.92
> Tasks: 366 total, 20 running, 346 sleeping, 0 stopped, 0 zombie
> Cpu(s): 30.6%us, 1.5%sy, 0.0%ni, 66.3%id, 1.5%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 8194800k total, 8118688k used, 76112k free, 36k buffers
> Swap: 2031608k total, 169348k used, 1862260k free, 7313232k cached

System load looks very much OK given that you have 8 CPUs.

> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 18972 postgres 20 0 2514m 11m 8752 R 11 0.1 0:00.35 postmaster
> 10618 postgres 20 0 2514m 12m 9456 R 9 0.2 0:00.54 postmaster
> 10636 postgres 20 0 2514m 11m 9192 R 9 0.1 0:00.45 postmaster
> 25903 postgres 20 0 2514m 11m 8784 R 9 0.1 0:00.21 postmaster
> 10626 postgres 20 0 2514m 11m 8716 R 6 0.1 0:00.45 postmaster
> 10645 postgres 20 0 2514m 12m 9352 R 6 0.2 0:00.42 postmaster
> 10647 postgres 20 0 2514m 11m 9172 R 6 0.1 0:00.51 postmaster
> 18502 postgres 20 0 2514m 11m 9016 R 6 0.1 0:00.23 postmaster
> 10641 postgres 20 0 2514m 12m 9296 R 5 0.2 0:00.36 postmaster
> 10051 postgres 20 0 2514m 13m 10m R 4 0.2 0:00.70 postmaster
> 10622 postgres 20 0 2514m 12m 9216 R 4 0.2 0:00.39 postmaster
> 10640 postgres 20 0 2514m 11m 8592 R 4 0.1 0:00.52 postmaster
> 18497 postgres 20 0 2514m 11m 8804 R 4 0.1 0:00.25 postmaster
> 18498 postgres 20 0 2514m 11m 8804 R 4 0.1 0:00.22 postmaster
> 10341 postgres 20 0 2514m 13m 9m R 2 0.2 0:00.57 postmaster
> 10619 postgres 20 0 2514m 12m 9336 R 1 0.2 0:00.38 postmaster
> 15687 postgres 20 0 2321m 35m 35m R 0 0.4 8:36.12 postmaster

Judging by the amount of CPU time each postmaster has accumulated, they are all fairly new processes. How many pg processes of the ~350 currently running are there?

-Dave
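Both settings David mentions exist in 8.3 and take effect on a reload; a minimal postgresql.conf sketch (the 250 ms threshold is an arbitrary example, not a value from the thread):

    update_process_title = on           # lets "ps" / "top -c" show each backend's current command
    log_min_duration_statement = 250    # log any statement that runs longer than 250 ms
    # then reload: pg_ctl reload   (or from psql: SELECT pg_reload_conf();)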
On 4/7/10 3:36 PM, Joshua D. Drake wrote:
> On Wed, 2010-04-07 at 14:45 -0700, Craig James wrote:
>> On 4/7/10 2:40 PM, Joshua D. Drake wrote:
>>> On Wed, 2010-04-07 at 14:37 -0700, Craig James wrote:
>>>> Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in the CPU load that lasts just a short time -- it jumps to 10-20 CPU loads. Today it hit 100 CPU loads. Sometimes days go by with no spike events. During these spikes, the system is completely unresponsive (you can't even login via ssh).
>>>>
>>>> I managed to capture one such event using top(1) with the "batch" option as a background process. See output below - it shows 19 active postgres processes, but I think it missed the bulk of the spike.
>>>
>>> What does iostat 5 say during the jump?
>>
>> It's very hard to say ... I'll have to start a background job to watch for a day or so. While it's happening, you can't login, and any open windows become unresponsive. I'll probably have to run it at high priority using nice(1) to get any data at all during the event.
>
> Do you have sar running? Say a sar -A?

No, I don't have it installed. I'll have a look. At first glance it looks like a combination of what I can get with "top -b" and vmstat, but with a single program.

> My guess is that it is not CPU, it is IO and your CPU usage is all WAIT
> on IO.
>
> To have your CPUs so flooded that they are the cause of an inability to
> log in is pretty suspect.

I thought so too, except that I can't login during the flood. If the CPUs were all doing iowaits, logging in should be easy.

Greg's suggestion that shared_buffers and work_mem are too big for an 8 GB system fits these symptoms -- if it's having a swap storm, login is effectively impossible.

Craig
On 4/7/10 2:59 PM, Tom Lane wrote:
> Craig James<craig_james@emolecules.com> writes:
>> Most of the time Postgres runs nicely, but two or three times a day we get a huge spike in the CPU load that lasts just a short time -- it jumps to 10-20 CPU loads. Today it hit 100 CPU loads. Sometimes days go by with no spike events. During these spikes, the system is completely unresponsive (you can't even login via ssh).
>> I managed to capture one such event using top(1) with the "batch" option as a background process. See output below - it shows 19 active postgres processes, but I think it missed the bulk of the spike.
>
>> Any ideas where I can look to find what's triggering this?
>
>> Postgres 8.3.0
>   ^^^^^
>
> If it's really 8.3.0, try updating to 8.3.something-recent. We've fixed
> a whole lot of bugs since then.

Good advice, I've been meaning to do this, maybe this will be a kick in the pants to motivate me.

> I have a suspicion that this might be an sinval overrun scenario, in
> which case you'd need to update to 8.4 to get a real fix. But updating
> in the 8.3 branch would be cheap and easy.
>
> If it is sinval overrun, it would presumably be triggered by a whole lot
> of catalog changes being made at approximately the same time. Can you
> correlate the spikes with anything like that?

Not that I know of. Just regular web traffic. On the backup server these events happen occasionally even when there is little or no web traffic, and nobody logged in doing maintenance.
>> ...Can you >> correlate the spikes with anything like that? > > Not that I know of. Just regular web traffic. On the backup server > these events happen occasionally even when there is little or no web > traffic, and nobody logged in doing maintenance. What, if anything, are you logging in the PostgreSQL logs? Anything interesting, there? Cheers, Steve
On Wed, Apr 7, 2010 at 3:57 PM, Craig James <craig_james@emolecules.com> wrote:
> On 4/7/10 3:36 PM, Joshua D. Drake wrote:
>> My guess is that it is not CPU, it is IO and your CPU usage is all WAIT
>> on IO.
>>
>> To have your CPUs so flooded that they are the cause of an inability to
>> log in is pretty suspect.
>
> I thought so too, except that I can't login during the flood. If the CPUs
> were all doing iowaits, logging in should be easy.

No - logging in with high iowait is much harder than with high CPU time, because of the latency of disk access.

> Greg's suggestion that shared_buffers and work_mem are too big for an 8 GB
> system fits these symptoms -- if it's having a swap storm, login is
> effectively impossible.

A swap storm effectively puts the machine into very high iowait time.

-Dave
Craig James <craig_james@emolecules.com> writes:
> On 4/7/10 3:36 PM, Joshua D. Drake wrote:
>> To have your CPUs so flooded that they are the cause of an inability to
>> log in is pretty suspect.

> I thought so too, except that I can't login during the flood. If the CPUs were all doing iowaits, logging in should be easy.

> Greg's suggestion that shared_buffers and work_mem are too big for an 8 GB system fits these symptoms -- if it's having a swap storm, login is effectively impossible.

Yeah, but there is also the question of what's causing all the backends to try to run at the same time. Oversubscribed memory could well be the direct cause of the machine getting driven into the ground, but there's something else going on here too IMO.

Anyway I concur with the advice to lower shared_buffers, and run fewer backends if possible, to see if that ameliorates the problem.

			regards, tom lane
On Wed, Apr 7, 2010 at 6:56 PM, David Rees <drees76@gmail.com> wrote: >> max_fsm_pages = 16000000 >> max_fsm_relations = 625000 >> synchronous_commit = off > > You are playing with fire here. You should never turn this off unless > you do not care if your data becomes irrecoverably corrupted. That is not correct. Turning off synchronous_commit is sensible if you don't mind losing the last few transactions on a crash. What will corrupt your database is if you turn off fsync. ...Robert
On 4/7/10 5:47 PM, Robert Haas wrote:
> On Wed, Apr 7, 2010 at 6:56 PM, David Rees<drees76@gmail.com> wrote:
>>> max_fsm_pages = 16000000
>>> max_fsm_relations = 625000
>>> synchronous_commit = off
>>
>> You are playing with fire here. You should never turn this off unless
>> you do not care if your data becomes irrecoverably corrupted.
>
> That is not correct. Turning off synchronous_commit is sensible if
> you don't mind losing the last few transactions on a crash. What will
> corrupt your database is if you turn off fsync.

A bit off the original topic, but ...

I set it this way because I was advised that with a battery-backed RAID controller, this was a safe setting. Is that not the case?

Craig
On Wed, Apr 7, 2010 at 7:06 PM, Craig James <craig_james@emolecules.com> wrote: > On 4/7/10 5:47 PM, Robert Haas wrote: >> On Wed, Apr 7, 2010 at 6:56 PM, David Rees<drees76@gmail.com> wrote: >>>> synchronous_commit = off >>> >>> You are playing with fire here. You should never turn this off unless >>> you do not care if your data becomes irrecoverably corrupted. >> >> That is not correct. Turning off synchronous_commit is sensible if >> you don't mind losing the last few transactions on a crash. What will >> corrupt your database is if you turn off fsync. Whoops, you're right. > A bit off the original topic, but ... > > I set it this way because I was advised that with a battery-backed RAID > controller, this was a safe setting. Is that not the case? Robert has it right - with synchronous_commit off, your database will always be consistent, but you may lose transactions in the event of a crash. Doesn't matter if you have a BBU or not - all the BBU does is give the controller the ability to acknowledge a write without the data actually having been written to disk. According to the documentation, with synchronous_commit off and a default wal_writer_delay of 200ms, it's possible to lose up to a maximum of 600ms of data you thought were written to disk. -Dave
On Wed, Apr 7, 2010 at 10:50 PM, David Rees <drees76@gmail.com> wrote: > On Wed, Apr 7, 2010 at 7:06 PM, Craig James <craig_james@emolecules.com> wrote: >> On 4/7/10 5:47 PM, Robert Haas wrote: >>> On Wed, Apr 7, 2010 at 6:56 PM, David Rees<drees76@gmail.com> wrote: >>>>> synchronous_commit = off >>>> >>>> You are playing with fire here. You should never turn this off unless >>>> you do not care if your data becomes irrecoverably corrupted. >>> >>> That is not correct. Turning off synchronous_commit is sensible if >>> you don't mind losing the last few transactions on a crash. What will >>> corrupt your database is if you turn off fsync. > > Whoops, you're right. > >> A bit off the original topic, but ... >> >> I set it this way because I was advised that with a battery-backed RAID >> controller, this was a safe setting. Is that not the case? > > Robert has it right - with synchronous_commit off, your database will > always be consistent, but you may lose transactions in the event of a > crash. > > Doesn't matter if you have a BBU or not - all the BBU does is give the > controller the ability to acknowledge a write without the data > actually having been written to disk. > > According to the documentation, with synchronous_commit off and a > default wal_writer_delay of 200ms, it's possible to lose up to a > maximum of 600ms of data you thought were written to disk. So, IOW, if you're running a social networking web site and your database is full of status updates sent by teenagers to other teenagers, you might judge that turning off synchronous_commit is a reasonable thing to do, if you need the performance. If you're running a bank and your database is full of information on wire transfers sent and received, not so much. ...Robert
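One detail worth adding here, from the documentation rather than from anyone in the thread: synchronous_commit can be overridden per transaction, so the trade-off doesn't have to be all-or-nothing. A sketch, with a made-up table name:

    -- cluster-wide default stays synchronous_commit = off for the chatty traffic,
    -- but this particular transaction is flushed to disk before COMMIT returns:
    BEGIN;
    SET LOCAL synchronous_commit TO on;
    INSERT INTO wire_transfers (id, amount) VALUES (42, 100.00);  -- hypothetical table
    COMMIT;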
David Rees wrote: > You need to find out what all those Postgres processes are doing. You > might try enabling update_process_title and then using ps to figure > out what each instance is using. That's what the addition of "-c" to top I suggested does on Linux; it shows the updated process titles where the command line is in the default config. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
On Wednesday 07 April 2010, Craig James <craig_james@emolecules.com> wrote: > I thought so too, except that I can't login during the flood. If the > CPUs were all doing iowaits, logging in should be easy. Busying out the drives is about the most reliable way to make logging in very slow (especially, but not only, if it's due to swapping).
Craig James wrote:
> On 4/7/10 5:47 PM, Robert Haas wrote:
>> On Wed, Apr 7, 2010 at 6:56 PM, David Rees<drees76@gmail.com> wrote:
>>>> max_fsm_pages = 16000000
>>>> max_fsm_relations = 625000
>>>> synchronous_commit = off
>>>
>>> You are playing with fire here. You should never turn this off unless
>>> you do not care if your data becomes irrecoverably corrupted.
>>
>> That is not correct. Turning off synchronous_commit is sensible if
>> you don't mind losing the last few transactions on a crash. What will
>> corrupt your database is if you turn off fsync.
>
> A bit off the original topic, but ...
>
> I set it this way because I was advised that with a battery-backed
> RAID controller, this was a safe setting. Is that not the case?

To get good performance, you can either get a battery-backed RAID controller or risk losing a few transactions with synchronous_commit = off. If you already have a battery-backed RAID controller, there is little benefit to turning synchronous_commit off, and some major downsides (possible data loss).

--
Bruce Momjian <bruce@momjian.us>   http://momjian.us
EnterpriseDB                       http://enterprisedb.com
I'm reviving this question because I never figured it out. To summarize: At random intervals anywhere from a few times per hour to once or twice a day, we see a huge spike in CPU load that essentially brings the system to a halt for up to a minute or two. Previous answers focused on "what is it doing", i.e. is it really Postgres or something else?

Now the question has narrowed down to this: what could trigger EVERY postgres backend to do something at the same time? See the attached output from "top -b", which shows what is happening during one of the CPU spikes.

A little background about our system. We have roughly 100 FastCGI clients connected at all times that are called on to generate images from data in the database. Even though there are a lot of these, they don't do much. They sit there most of the time, then they spew out a couple dozen GIF images in about one second as a user gets a new page of data. Each GIF image requires fetching a single row using a single indexed column, so it's a trivial amount of work for Postgres.

We also have the "heavy lift" application that does the search. Typically one or two of these is running at a time, and takes from a fraction of a second to a few minutes to complete. In this particular instance, immediately before this spike, the CPU load was only at about 10% -- a couple users poking around with easy queries.

So what is it that will cause every single Postgres backend to come to life at the same moment, when there's no real load on the server? Maybe if a backend crashes? Some other problem? There's nothing in the server log.

Thanks,
Craig

top - 12:15:09 up 81 days, 21:18, 4 users, load average: 0.38, 0.38, 0.73
Tasks: 374 total, 95 running, 279 sleeping, 0 stopped, 0 zombie
Cpu(s): 62.5%us, 2.2%sy, 0.0%ni, 34.9%id, 0.2%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 8194800k total, 7948928k used, 245872k free, 36k buffers
Swap: 2031608k total, 161136k used, 1870472k free, 7129744k cached

  PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
22120 postgres 20  0 2514m  17m  13m R   11  0.2 0:01.02 postmaster
18497 postgres 20  0 2514m  11m 8832 R    6  0.1 0:00.62 postmaster
22962 postgres 20  0 2514m  12m 9548 R    6  0.2 0:00.22 postmaster
24002 postgres 20  0 2514m  11m 8804 R    6  0.1 0:00.15 postmaster
25900 postgres 20  0 2514m  11m 8824 R    6  0.1 0:00.55 postmaster
 8941 postgres 20  0 2324m 6172 4676 R    5  0.1 0:00.32 postmaster
10622 postgres 20  0 2514m  12m 9444 R    5  0.2 0:00.79 postmaster
14021 postgres 20  0 2514m  11m 8548 R    5  0.1 0:00.28 postmaster
14075 postgres 20  0 2514m  11m 8672 R    5  0.1 0:00.27 postmaster
14423 postgres 20  0 2514m  11m 8572 R    5  0.1 0:00.29 postmaster
18896 postgres 20  0 2324m 5644 4204 R    5  0.1 0:00.11 postmaster
18897 postgres 20  0 2514m  12m 9800 R    5  0.2 0:00.27 postmaster
18928 postgres 20  0 2514m  11m 8792 R    5  0.1 0:00.18 postmaster
18973 postgres 20  0 2514m  11m 8792 R    5  0.1 0:00.70 postmaster
22049 postgres 20  0 2514m  17m  14m R    5  0.2 0:01.11 postmaster
22050 postgres 20  0 2514m  16m  13m R    5  0.2 0:01.06 postmaster
22843 postgres 20  0 2514m  12m 9328 R    5  0.2 0:00.20 postmaster
24202 postgres 20  0 2324m 5560 4120 R    5  0.1 0:00.07 postmaster
24388 postgres 20  0 2514m  12m 9380 R    5  0.2 0:00.16 postmaster
25903 postgres 20  0 2514m  11m 8828 R    5  0.1 0:00.55 postmaster
28362 postgres 20  0 2514m  11m 8952 R    5  0.1 0:00.48 postmaster
 5667 postgres 20  0 2324m 6752 5588 R    4  0.1 0:08.93 postmaster
 7531 postgres 20  0 2324m 5452 4008 R    4  0.1 0:03.21 postmaster
 9219 postgres 20  0 2514m  11m 8476 R    4  0.1 0:00.89 postmaster
 9820 postgres 20  0 2514m  12m 9.9m R    4  0.2 0:00.92 postmaster
10050 postgres 20  0 2324m 6172 4676 R    4  0.1 0:00.31 postmaster
10645 postgres 20  0 2514m  12m 9512 R    4  0.2 0:00.72 postmaster
14582 postgres 20  0 2514m  25m  21m R    4  0.3 0:02.10 postmaster
18502 postgres 20  0 2514m  11m 9040 R    4  0.1 0:00.64 postmaster
18972 postgres 20  0 2514m  11m 8792 R    4  0.1 0:00.76 postmaster
18975 postgres 20  0 2514m  11m 8904 R    4  0.1 0:00.63 postmaster
19496 postgres 20  0 2514m  14m  11m R    4  0.2 0:00.44 postmaster
22121 postgres 20  0 2514m  16m  13m R    4  0.2 0:00.81 postmaster
24340 postgres 20  0 2514m  12m 9424 R    4  0.2 0:00.15 postmaster
24483 postgres 20  0 2324m 6008 4536 R    4  0.1 0:00.21 postmaster
25668 postgres 20  0 2514m  16m  13m R    4  0.2 0:00.91 postmaster
26382 postgres 20  0 2514m  11m 8996 R    4  0.1 0:00.50 postmaster
28363 postgres 20  0 2514m  11m 8908 R    4  0.1 0:00.34 postmaster
 9754 postgres 20  0 2514m  11m 8752 R    3  0.1 0:00.29 postmaster
16113 postgres 20  0 2514m  17m  14m R    3  0.2 0:01.10 postmaster
18498 postgres 20  0 2514m  11m 8844 R    3  0.1 0:00.63 postmaster
18500 postgres 20  0 2514m  11m 8812 R    3  0.1 0:00.66 postmaster
22116 postgres 20  0 2514m  17m  13m R    3  0.2 0:01.05 postmaster
22287 postgres 20  0 2324m 6072 4596 R    3  0.1 0:00.24 postmaster
22425 postgres 20  0 2514m  17m  14m R    3  0.2 0:01.02 postmaster
22827 postgres 20  0 2514m  13m  10m R    3  0.2 0:00.43 postmaster
23285 postgres 20  0 2514m  13m  10m R    3  0.2 0:00.40 postmaster
24384 postgres 20  0 2514m  12m 9300 R    3  0.2 0:00.15 postmaster
30501 postgres 20  0 2514m  11m 9012 R    3  0.1 0:00.47 postmaster
 5665 postgres 20  0 2324m 6528 5396 R    2  0.1 0:08.71 postmaster
 5671 postgres 20  0 2324m 6720 5596 R    2  0.1 0:08.73 postmaster
 7428 postgres 20  0 2324m 6176 4928 R    2  0.1 0:07.37 postmaster
 7431 postgres 20  0 2324m 6140 4920 R    2  0.1 0:07.40 postmaster
 7433 postgres 20  0 2324m 6372 4924 R    2  0.1 0:07.29 postmaster
 7525 postgres 20  0 2324m 5468 4024 R    2  0.1 0:03.36 postmaster
 7530 postgres 20  0 2324m 5452 4008 R    2  0.1 0:03.40 postmaster
 7532 postgres 20  0 2324m 5440 3996 R    2  0.1 0:03.23 postmaster
 7533 postgres 20  0 2324m 5484 4040 R    2  0.1 0:03.25 postmaster
 8944 postgres 20  0 2514m  26m  23m R    2  0.3 0:02.16 postmaster
 8946 postgres 20  0 2514m  26m  22m R    2  0.3 0:02.06 postmaster
 9821 postgres 20  0 2514m  12m 9948 R    2  0.2 0:00.93 postmaster
10051 postgres 20  0 2514m  13m  10m R    2  0.2 0:01.03 postmaster
10226 postgres 20  0 2514m  27m  23m R    2  0.3 0:02.24 postmaster
10626 postgres 20  0 2514m  12m 9212 R    2  0.1 0:00.83 postmaster
14580 postgres 20  0 2324m 6120 4632 R    2  0.1 0:00.27 postmaster
16112 postgres 20  0 2514m  18m  14m R    2  0.2 0:01.18 postmaster
19450 postgres 20  0 2324m 6108 4620 R    2  0.1 0:00.22 postmaster
22289 postgres 20  0 2514m  22m  19m R    2  0.3 0:01.66 postmaster
 5663 postgres 20  0 2324m 6700 5576 R    1  0.1 0:08.23 postmaster
 7526 postgres 20  0 2324m 5444 4000 R    1  0.1 0:03.44 postmaster
 7528 postgres 20  0 2324m 5444 4000 R    1  0.1 0:03.44 postmaster
 7529 postgres 20  0 2324m 5420 3976 R    1  0.1 0:03.04 postmaster
 8888 postgres 20  0 2514m  25m  22m R    1  0.3 0:02.01 postmaster
 9622 postgres 20  0 2514m  13m  10m R    1  0.2 0:01.08 postmaster
 9625 postgres 20  0 2514m  13m  10m R    1  0.2 0:01.00 postmaster
14686 postgres 20  0 2324m 6116 4628 R    1  0.1 0:00.30 postmaster
14687 postgres 20  0 2514m  24m  21m R    1  0.3 0:01.95 postmaster
16111 postgres 20  0 2514m  17m  14m R    1  0.2 0:01.01 postmaster
16854 postgres 20  0 2324m 5468 4024 R    1  0.1 0:03.31 postmaster
 5664 postgres 20  0 2324m 6740 5584 R    0  0.1 0:08.45 postmaster
 5666 postgres 20  0 2324m 6744 5584 R    0  0.1 0:08.70 postmaster
 5668 postgres 20  0 2324m 6720 5588 R    0  0.1 0:08.58 postmaster
 5670 postgres 20  0 2324m 6748 5584 R    0  0.1 0:08.99 postmaster
 5672 postgres 20  0 2324m 6764 5596 R    0  0.1 0:08.30 postmaster
 7429 postgres 20  0 2324m 6000 4760 R    0  0.1 0:07.41 postmaster
 7430 postgres 20  0 2324m 6080 4928 R    0  0.1 0:07.09 postmaster
 7463 postgres 20  0 2324m 6412 4928 R    0  0.1 0:07.14 postmaster
 7538 postgres 20  0 2324m 5472 4028 R    0  0.1 0:03.42 postmaster
 8887 postgres 20  0 2324m 6184 4680 R    0  0.1 0:00.23 postmaster
 8942 postgres 20  0 2514m  26m  22m R    0  0.3 0:01.97 postmaster
10636 postgres 20  0 2514m  12m 9380 R    0  0.2 0:00.75 postmaster
10640 postgres 20  0 2514m  11m 9148 R    0  0.1 0:00.75 postmaster
15687 postgres 20  0 2321m  35m  35m R    0  0.4 8:38.38 postmaster
On Thu, 2010-06-24 at 17:50 -0700, Craig James wrote:
> I'm reviving this question because I never figured it out. To summarize: At random intervals anywhere from a few times per hour to once or twice a day, we see a huge spike in CPU load that essentially brings the system to a halt for up to a minute or two. Previous answers focused on "what is it doing", i.e. is it really Postgres or something else?
>
> Now the question has narrowed down to this: what could trigger EVERY postgres backend to do something at the same time? See the attached output from "top -b", which shows what is happening during one of the CPU spikes.

checkpoint causing IO Wait. What does sar say about these times?

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
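One way to test the checkpoint theory without having to catch a spike live (a standard 8.3 setting, not something spelled out in the thread) is to log every checkpoint and line the timestamps up against the spikes afterwards:

    # postgresql.conf
    log_checkpoints = on               # logs checkpoint start/complete with write and sync timings
    log_line_prefix = '%t [%p] '       # timestamps so log lines can be matched against the spikes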
Craig James wrote: > Now the question has narrowed down to this: what could trigger EVERY > postgres backend to do something at the same time? See the attached > output from "top -b", which shows what is happening during one of the > CPU spikes. By the way: you probably want "top -b -c", which will actually show you what each client is doing via inspecting what it's set its command line to. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
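A possible way to run that capture continuously so the next spike is caught automatically (the flag values here are arbitrary):

    # batch mode, with command lines, one sample every 5 seconds for about an hour
    top -b -c -d 5 -n 720 > /tmp/top-capture.log 2>&1 &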
Craig James <craig_james@emolecules.com> writes:
> So what is it that will cause every single Postgres backend to come to life at the same moment, when there's no real load on the server? Maybe if a backend crashes? Some other problem?

sinval queue overflow comes to mind ... although that really shouldn't happen if there's "no real load" on the server. What PG version is this?

Also, the pg_stat_activity view contents when this happens would probably be more useful to look at than "top" output.

			regards, tom lane
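For reference, a query that pulls the relevant pg_stat_activity columns, using the 8.3-era column names (procpid and current_query were renamed in later releases):

    SELECT procpid, usename, waiting,
           now() - query_start AS runtime,
           current_query
      FROM pg_stat_activity
     ORDER BY query_start;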
On 6/24/10 9:04 PM, Tom Lane wrote:
> Craig James<craig_james@emolecules.com> writes:
>> So what is it that will cause every single Postgres backend to come to life at the same moment, when there's no real load on the server? Maybe if a backend crashes? Some other problem?
>
> sinval queue overflow comes to mind ... although that really shouldn't
> happen if there's "no real load" on the server. What PG version is
> this?

8.3.10. Upgraded based on your advice when I first asked this question.

> Also, the pg_stat_activity view contents when this happens would
> probably be more useful to look at than "top" output.

I'll try. It's hard to discover anything because the whole machine is overwhelmed when this happens. The only way I got the top(1) output was by running it high priority as root using nice(1). I can't do that with a Postgres backend, but I'll see what I can do.

Craig
Craig James <craig_james@emolecules.com> writes: > On 6/24/10 9:04 PM, Tom Lane wrote: >> sinval queue overflow comes to mind ... although that really shouldn't >> happen if there's "no real load" on the server. What PG version is >> this? > 8.3.10. Upgraded based on your advice when I first asked this question. Any chance of going to 8.4? If this is what I suspect, you really need this 8.4 fix: http://archives.postgresql.org/pgsql-committers/2008-06/msg00227.php which eliminated the thundering-herd behavior that previous releases exhibit when the sinval queue overflows. If you're stuck on 8.3 then you are going to have to modify your application's behavior to eliminate sinval overflows. If the overall system load isn't high then I would have to guess that the problem is some individual sessions sitting "idle in transaction" for long periods, long enough that a number of DDL operations happen elsewhere. You could also consider throwing memory at the problem by raising the sinval queue size. That'd require a custom build since it's not exposed as a configurable parameter, but it'd be a one-line patch I think. Or you could look at using connection pooling so you don't have quite so many backends ... regards, tom lane
Dear Craig,

also check for the possibility of installing sysstat on your system.
It goes a long way in collecting the system stats. You may
consider increasing the frequency of data collection by
changing the interval of the cron job manually in /etc/cron.d/ --
normally it's */10; you may make it */2 for the time being.
The software automatically maintains historical records
of data for 1 month.
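For what it's worth, the cron entry being referred to typically looks like the sketch below on Fedora/RHEL-style systems; paths differ between distributions, so treat this as an example only:

    # /etc/cron.d/sysstat
    */2 * * * * root /usr/lib64/sa/sa1 1 1     # collect a sample every 2 minutes (default is */10)
    53 23 * * * root /usr/lib64/sa/sa2 -A      # nightly summary; browse history with "sar -f /var/log/sa/saDD"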
On 6/25/10 7:47 AM, Tom Lane wrote:
> Craig James<craig_james@emolecules.com> writes:
>> On 6/24/10 9:04 PM, Tom Lane wrote:
>>> sinval queue overflow comes to mind ... although that really shouldn't
>>> happen if there's "no real load" on the server. What PG version is
>>> this?
>
>> 8.3.10. Upgraded based on your advice when I first asked this question.
>
> Any chance of going to 8.4? If this is what I suspect, you really need
> this 8.4 fix:
> http://archives.postgresql.org/pgsql-committers/2008-06/msg00227.php
> which eliminated the thundering-herd behavior that previous releases
> exhibit when the sinval queue overflows.

Yes, there is a chance of upgrading to 8.4.4. I just bought a new server and it has 8.4.4 on it, but it won't be online for a while so I can't compare yet. This may motivate me to upgrade the current servers to 8.4.4 too. I was pleased to see that 8.4 has a new upgrade-in-place feature that means we don't have to dump/restore. That really helps a lot.

A question about 8.4.4: I've been having problems with bloat. I thought I'd adjusted the FSM parameters correctly based on advice I got here, but apparently not. 8.4.4 has removed the configurable FSM parameters completely, which is very cool. But ... if I upgrade a bloated database using the upgrade-in-place feature, will 8.4.4 recover the bloat and return it to the OS, or do I still have to recover the space manually (like vacuum-full/reindex, or cluster, or copy/drop a table)?

> Or you could look at using connection pooling so you don't have quite
> so many backends ...

I always just assumed that lots of backends would be harmless if each one was doing very little. If I understand your explanation, it sounds like that's not entirely true in pre-8.4.4 releases due to the sinval queue problems.

Thanks,
Craig
Craig James <craig_james@emolecules.com> wrote:

> I always just assumed that lots of backends would be harmless
> if each one was doing very little.

Even if each is doing very little, if a large number of them happen to make a request at the same time, you can have problems. This is exactly where a connection pool can massively improve both throughput and response time. If you can arrange it, you want a connection pool which will put a limit on active database transactions and queue requests to start a new transaction until one of the pending ones finishes.

-Kevin
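As one concrete example of that kind of pool -- a tool choice for this sketch, not something anyone in the thread said they run -- a transaction-mode pgbouncer in front of the ~100 FastCGI clients might be configured like this; names and limits are placeholders:

    ; pgbouncer.ini (sketch)
    [databases]
    appdb = host=127.0.0.1 port=5432 dbname=appdb

    [pgbouncer]
    listen_addr = 127.0.0.1
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    pool_mode = transaction        ; queue new transactions rather than piling up backends
    default_pool_size = 20         ; at most 20 real server connections per database/user pair
    max_client_conn = 500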
Craig James wrote: > if I upgrade a bloated database using the upgrade-in-place feature, > will 8.4.4 recover the bloat and return it to the OS, or do I still > have to recover the space manually (like vacuum-full/reindex, or > cluster, or copy/drop a table)? There's no way for an upgrade in place to do anything about bloat. The changes in 8.4 reduce the potential sources for new bloat (like running out of a FSM pages), and the overhead of running VACUUM drops some due to things like the "Partial VACUUM" changes. But existing bloated tables and indexes are moved forward to the new version without any change. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
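For the manual recovery route Greg mentions, a sketch of the usual options against a hypothetical bloated table (the table and index names here are made up):

    -- rewrite the table and rebuild all of its indexes in one pass (takes an exclusive lock):
    CLUSTER my_bloated_table USING my_bloated_table_pkey;

    -- or, without reordering rows:
    VACUUM FULL my_bloated_table;
    REINDEX TABLE my_bloated_table;

    -- rough before/after check:
    SELECT pg_size_pretty(pg_total_relation_size('my_bloated_table'));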
Craig James <craig_james@emolecules.com> writes: > On 6/25/10 7:47 AM, Tom Lane wrote: >> Any chance of going to 8.4? If this is what I suspect, you really need >> this 8.4 fix: >> http://archives.postgresql.org/pgsql-committers/2008-06/msg00227.php >> which eliminated the thundering-herd behavior that previous releases >> exhibit when the sinval queue overflows. > Yes, there is a chance of upgrading to 8.4.4. I just bought a new server and it has 8.4.4 on it, but it won't be onlinefor a while so I can't compare yet. This may motivate me to upgrade the current servers to 8.4.4 too. I was pleasedto see that 8.4 has a new upgrade-in-place feature that means we don't have to dump/restore. That really helps alot. I wouldn't put a lot of faith in pg_migrator for an 8.3 to 8.4 conversion ... it might work, but test it on a copy of your DB first. Possibly it'll actually be recommendable in 9.0. > A question about 8.4.4: I've been having problems with bloat. I thought I'd adjusted the FSM parameters correctly basedon advice I got here, but apparently not. 8.4.4 has removed the configurable FSM parameters completely, which is verycool. But ... if I upgrade a bloated database using the upgrade-in-place feature, will 8.4.4 recover the bloat and returnit to the OS, or do I still have to recover the space manually (like vacuum-full/reindex, or cluster, or copy/dropa table)? No, an in-place upgrade to 8.4 isn't magically going to fix that. This might actually be sufficient reason to stick with the tried&true dump and reload method, since you're going to have to do something fairly expensive anyway to clean out the bloat. regards, tom lane
On 6/25/10 9:41 AM, Kevin Grittner wrote:
> Craig James<craig_james@emolecules.com> wrote:
>
>> I always just assumed that lots of backends would be harmless
>> if each one was doing very little.
>
> Even if each is doing very little, if a large number of them happen
> to make a request at the same time, you can have problems. This is
> exactly where a connection pool can massively improve both
> throughput and response time. If you can arrange it, you want a
> connection pool which will put a limit on active database
> transactions and queue requests to start a new transaction until one
> of the pending ones finishes.

No, that doesn't seem to be the case. There is no external activity that triggers this huge spike in usage. It even happens to our backup server when only one of us is using it to do a single query. This problem seems to be triggered by Postgres itself, not by anything external.

Per Tom's suggestion, I think upgrading to 8.4.4 is the answer. I'll learn more when our new hardware comes into use with a shiny new 8.4.4 installation.

Craig
I've got a new server and want to make sure it's running well. Are these pretty decent numbers?

8 cores (2x4 Intel Nehalem 2 GHz)
12 GB memory
12 x 7200 SATA 500 GB disks
3WARE 9650SE-12ML RAID controller with BBU
WAL on ext2, 2 disks: RAID1 500GB, blocksize=4096
Database on ext4, 8 disks: RAID10 2TB, stripe size 64K, blocksize=4096
Ubuntu 10.04 LTS (Lucid)
Postgres 8.4.4

pgbench -i -s 100 -U test

pgbench -c 5 -t 20000 -U test
tps = 4903
pgbench -c 10 -t 10000 -U test
tps = 4070
pgbench -c20 -t 5000 -U test
tps = 5789
pgbench -c30 -t 3333 -U test
tps = 6961
pgbench -c40 -t 2500 -U test
tps = 2945

Thanks,
Craig
On Fri, Jun 25, 2010 at 2:53 PM, Craig James <craig_james@emolecules.com> wrote: > I've got a new server and want to make sure it's running well. Are these > pretty decent numbers? > > 8 cores (2x4 Intel Nehalem 2 GHz) > 12 GB memory > 12 x 7200 SATA 500 GB disks > 3WARE 9650SE-12ML RAID controller with BBU > WAL on ext2, 2 disks: RAID1 500GB, blocksize=4096 > Database on ext4, 8 disks: RAID10 2TB, stripe size 64K, blocksize=4096 > Ubuntu 10.04 LTS (Lucid) > Postgres 8.4.4 > > pgbench -i -s 100 -U test > pgbench -c 5 -t 20000 -U test > tps = 4903 > pgbench -c 10 -t 10000 -U test > tps = 4070 > pgbench -c20 -t 5000 -U test > tps = 5789 > pgbench -c30 -t 3333 -U test > tps = 6961 > pgbench -c40 -t 2500 -U test > tps = 2945 Numbers are okay, but you likely need much longer tests to see how they average out with the bgwriter / checkpoints happening, and keep track of your IO numbers to see where your dips are. I usually run pgbench runs, once they seem to get decent numbers, for several hours non-stop. Sometimes days during burn in. Note that running pgbench on a machine other than the actual db is often a good idea so you're not measuring how fast pgbench can run in contention with your own database.
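A sketch of the kind of longer, remote run being suggested (8.4's pgbench accepts -T for a fixed duration; the host name is a placeholder):

    # from a separate client machine, 30 clients for an hour
    pgbench -h db-server -U test -c 30 -T 3600
    # meanwhile, on the server, keep an eye on the disks
    iostat -x 5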
Craig James wrote: > I've got a new server and want to make sure it's running well. Any changes to the postgresql.conf file? Generally you need at least a moderate shared_buffers (1GB or so at a minimum) and checkpoint_segments (32 or higher) in order for the standard pgbench test to give good results. > pgbench -c20 -t 5000 -U test > tps = 5789 > pgbench -c30 -t 3333 -U test > tps = 6961 > pgbench -c40 -t 2500 -U test > tps = 2945 General numbers are OK, the major drop going from 30 to 40 clients is larger than it should be. I'd suggest running the 40 client count one again to see if that's consistent. If it is, that may just be pgbench itself running into a problem. It doesn't handle high client counts very well unless you use the 9.0 version that supports multiple pgbench workers with the "-j" option. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
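For reference, the multi-worker form Greg refers to (the -j option appears in pgbench as of 9.0; a newer pgbench can still be pointed at an 8.4 server):

    # 4 pgbench worker threads driving 40 client connections for 10 minutes
    pgbench -h db-server -U test -c 40 -j 4 -T 600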
On 6/25/10 12:03 PM, Greg Smith wrote:
> Craig James wrote:
>> I've got a new server and want to make sure it's running well.
>
> Any changes to the postgresql.conf file? Generally you need at least a
> moderate shared_buffers (1GB or so at a minimum) and checkpoint_segments
> (32 or higher) in order for the standard pgbench test to give good results.

max_connections = 500
shared_buffers = 1000MB
work_mem = 128MB
synchronous_commit = off
full_page_writes = off
wal_buffers = 256kB
checkpoint_segments = 30
effective_cache_size = 4GB

For fun I ran it with the installation defaults, and it never got above 1475 TPS.

>> pgbench -c20 -t 5000 -U test
>> tps = 5789
>> pgbench -c30 -t 3333 -U test
>> tps = 6961
>> pgbench -c40 -t 2500 -U test
>> tps = 2945
>
> General numbers are OK, the major drop going from 30 to 40 clients is
> larger than it should be. I'd suggest running the 40 client count one
> again to see if that's consistent.

It is consistent. When I run pgbench from a different server, I get this:

pgbench -c40 -t 2500 -U test
tps = 7999

pgbench -c100 -t 1000 -U test
tps = 6693

Craig
On Mon, Jun 28, 2010 at 1:12 PM, Craig James <craig_james@emolecules.com> wrote:
> On 6/25/10 12:03 PM, Greg Smith wrote:
>> Craig James wrote:
>>> I've got a new server and want to make sure it's running well.
>>
>> Any changes to the postgresql.conf file? Generally you need at least a
>> moderate shared_buffers (1GB or so at a minimum) and checkpoint_segments
>> (32 or higher) in order for the standard pgbench test to give good
>> results.
>
> max_connections = 500
> shared_buffers = 1000MB
> work_mem = 128MB
> synchronous_commit = off
> full_page_writes = off
> wal_buffers = 256kB
> checkpoint_segments = 30
> effective_cache_size = 4GB
>
> For fun I ran it with the installation defaults, and it never got above
> 1475 TPS.
>
>>> pgbench -c20 -t 5000 -U test
>>> tps = 5789
>>> pgbench -c30 -t 3333 -U test
>>> tps = 6961
>>> pgbench -c40 -t 2500 -U test
>>> tps = 2945
>>
>> General numbers are OK, the major drop going from 30 to 40 clients is
>> larger than it should be. I'd suggest running the 40 client count one
>> again to see if that's consistent.
>
> It is consistent. When I run pgbench from a different server, I get this:
>
> pgbench -c40 -t 2500 -U test
> tps = 7999
>
> pgbench -c100 -t 1000 -U test
> tps = 6693

6k tps over eight 7200 rpm disks is quite good, IMO, and the synchronous_commit setting is what makes it possible. Building a server that could handle that much was insanely expensive just a few years ago, and this is relatively cheap storage. That works out to about 21 million transactions an hour, or roughly half a billion transactions a day (!). Running this kind of load 24x7 on Postgres 7.x would have been an enormous headache. How quickly we forget! :-)

Your 'real' write rate of 1475 tps, spread over the 4 disks doing the actual writing, gives you roughly 370 tps per device. Not bad at all -- the RAID controller is doing a good job (a raw drive might manage 200 or so). I bet performance will be somewhat worse with a higher scaling factor (say, 500), because there is less locality of writes -- something to consider if you expect your database to get really big.

merlin
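[Editor's note] To put the scaling-factor point in concrete terms, you can compare the on-disk size of the pgbench data set against the 12 GB of RAM: at scale 100, pgbench_accounts holds 10 million rows (very roughly 1-1.5 GB), while scale 500 would be about five times that, so dirty writes get scattered across far more blocks. A hedged sketch, assuming the pgbench tables live in a database also named "test" (the commands above only show the user name):

-- Rough check of how much of the pgbench data set can stay cached in RAM.
SELECT pg_size_pretty(pg_database_size('test'))              AS whole_db,
       pg_size_pretty(pg_relation_size('pgbench_accounts'))  AS accounts_table;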
Craig James wrote:
> synchronous_commit = off
> full_page_writes = off

I don't have any numbers handy on how much turning synchronous_commit and full_page_writes off improves performance on a system with a battery-backed write cache. Your numbers are therefore a bit inflated relative to similar systems doing a regular sync commit -- just something to keep in mind when comparing against other people's results. Also, as a general comment, increasing work_mem and effective_cache_size doesn't actually do anything for the built-in pgbench test results.

>> General numbers are OK, the major drop going from 30 to 40 clients is
>> larger than it should be. I'd suggest running the 40 client count one
>> again to see if that's consistent.
>
> It is consistent. When I run pgbench from a different server, I get
> this:
>
> pgbench -c40 -t 2500 -U test
> tps = 7999
>
> pgbench -c100 -t 1000 -U test
> tps = 6693

Looks like you're just running into the limitations of the old pgbench code, which fails to keep up with high client counts when run on the same system as the server. Nothing to be concerned about -- the fact that the drop is only small when the pgbench client is remote says there isn't actually a server problem here. With that sorted out, your system looks in the normal range for the sort of hardware you're using. I'm always concerned about the potential reliability issues that come with async commit and turning off full page writes, though, so you might want to re-test with those turned on and see if you can live with the results.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
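[Editor's note] If you want to run that comparison without editing postgresql.conf between every run, synchronous_commit can be switched back on for just the benchmark database; full_page_writes is a server-wide setting and has to be changed in postgresql.conf followed by a reload. A small illustrative sketch, again assuming the benchmark database is named "test":

-- Re-enable durable commits for the benchmark database only (new sessions
-- pick this up), then re-run pgbench.
ALTER DATABASE test SET synchronous_commit = on;

-- full_page_writes cannot be set per database; flip it to 'on' in
-- postgresql.conf, then reload the server configuration:
SELECT pg_reload_conf();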
I can query either my PARENT table joined to PRICES, or my VERSION table joined to PRICES, and get an answer in 30-40 msec. But when I put the two together, it jumps to 4 seconds. What am I missing here? I figured this query would be nearly instantaneous. The VERSION.ISOSMILES and PARENT.ISOSMILES columns both have unique indexes, but instead of using those indexes, it's doing a full-table scan of both tables, even though there can't possibly be more than one match in each table. I guess I could rewrite this as a UNION of the two subqueries, but that seems contrived. This is PG 8.3.10 on Linux.

Thanks,
Craig

=> explain analyze select p.price, p.amount, p.units, s.catalogue_id, vn.version_id
-> from plus p join sample s
-> on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id)
-> join version vn on (s.version_id = vn.version_id) join parent pn
-> on (s.parent_id = pn.parent_id)
-> where vn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'
-> or pn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'
-> order by price;

 Sort  (cost=71922.00..71922.00 rows=1 width=19) (actual time=4337.114..4337.122 rows=10 loops=1)
   Sort Key: p.price
   Sort Method:  quicksort  Memory: 25kB
   ->  Nested Loop  (cost=18407.53..71921.99 rows=1 width=19) (actual time=1122.685..4337.028 rows=10 loops=1)
         ->  Hash Join  (cost=18407.53..71903.71 rows=4 width=20) (actual time=1122.624..4336.682 rows=7 loops=1)
               Hash Cond: (s.version_id = vn.version_id)
               Join Filter: ((vn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'::text) OR (pn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'::text))
               ->  Hash Join  (cost=8807.15..44470.73 rows=620264 width=54) (actual time=431.501..2541.329 rows=620264 loops=1)
                     Hash Cond: (s.parent_id = pn.parent_id)
                     ->  Seq Scan on sample s  (cost=0.00..21707.64 rows=620264 width=24) (actual time=0.008..471.340 rows=620264 loops=1)
                     ->  Hash  (cost=5335.40..5335.40 rows=277740 width=38) (actual time=431.166..431.166 rows=277740 loops=1)
                           ->  Seq Scan on parent pn  (cost=0.00..5335.40 rows=277740 width=38) (actual time=0.012..195.822 rows=277740 loops=1)
               ->  Hash  (cost=5884.06..5884.06 rows=297306 width=38) (actual time=467.267..467.267 rows=297306 loops=1)
                     ->  Seq Scan on version vn  (cost=0.00..5884.06 rows=297306 width=38) (actual time=0.017..215.285 rows=297306 loops=1)
         ->  Index Scan using i_plus_compound_id on plus p  (cost=0.00..4.51 rows=4 width=26) (actual time=0.039..0.041 rows=1 loops=7)
               Index Cond: ((p.supplier_id = s.supplier_id) AND (p.compound_id = s.compound_id))
 Total runtime: 4344.222 ms
(17 rows)

If I only query the VERSION table, it's very fast:

x=> explain analyze select p.price, p.amount, p.units, s.catalogue_id, vn.version_id
-> from plus p
-> join sample s on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id)
-> join version vn on (s.version_id = vn.version_id)
-> where vn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1' order by price;

 Sort  (cost=45.73..45.74 rows=1 width=19) (actual time=32.438..32.448 rows=10 loops=1)
   Sort Key: p.price
   Sort Method:  quicksort  Memory: 25kB
   ->  Nested Loop  (cost=0.00..45.72 rows=1 width=19) (actual time=32.309..32.411 rows=10 loops=1)
         ->  Nested Loop  (cost=0.00..36.58 rows=2 width=20) (actual time=32.295..32.319 rows=7 loops=1)
               ->  Index Scan using i_version_isosmiles on version vn  (cost=0.00..8.39 rows=1 width=4) (actual time=32.280..32.281 rows=1 loops=1)
                     Index Cond: (isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'::text)
               ->  Index Scan using i_sample_version_id on sample s  (cost=0.00..28.12 rows=6 width=20) (actual time=0.011..0.024 rows=7 loops=1)
                     Index Cond: (s.version_id = vn.version_id)
         ->  Index Scan using i_plus_compound_id on plus p  (cost=0.00..4.51 rows=4 width=26) (actual time=0.010..0.011 rows=1 loops=7)
               Index Cond: ((p.supplier_id = s.supplier_id) AND (p.compound_id = s.compound_id))
 Total runtime: 32.528 ms
(12 rows)

Same good performance if I only query the PARENT table:

x=> explain analyze select p.price, p.amount, p.units, s.catalogue_id, pn.parent_id
    from plus p
    join sample s on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id)
    join parent pn on (s.parent_id = pn.parent_id)
    where pn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'
    order by price;

                                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=57.73..57.74 rows=1 width=19) (actual time=43.564..43.564 rows=10 loops=1)
   Sort Key: p.price
   Sort Method:  quicksort  Memory: 25kB
   ->  Nested Loop  (cost=0.00..57.72 rows=1 width=19) (actual time=43.429..43.537 rows=10 loops=1)
         ->  Nested Loop  (cost=0.00..48.58 rows=2 width=20) (actual time=43.407..43.430 rows=7 loops=1)
               ->  Index Scan using i_parent_isosmiles on parent pn  (cost=0.00..8.38 rows=1 width=4) (actual time=27.342..27.343 rows=1 loops=1)
                     Index Cond: (isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'::text)
               ->  Index Scan using i_sample_parent_id on sample s  (cost=0.00..40.09 rows=9 width=20) (actual time=16.057..16.070 rows=7 loops=1)
                     Index Cond: (s.parent_id = pn.parent_id)
         ->  Index Scan using i_plus_compound_id on plus p  (cost=0.00..4.51 rows=4 width=26) (actual time=0.010..0.011 rows=1 loops=7)
               Index Cond: ((p.supplier_id = s.supplier_id) AND (p.compound_id = s.compound_id))
 Total runtime: 43.628 ms

x=> \d version
         Table "x.version"
   Column   |  Type   | Modifiers
------------+---------+-----------
 version_id | integer | not null
 parent_id  | integer | not null
 isosmiles  | text    | not null
 coord_2d   | text    |
Indexes:
    "version_pkey" PRIMARY KEY, btree (version_id)
    "i_version_isosmiles" UNIQUE, btree (isosmiles)
    "i_version_parent_id" btree (parent_id)
Foreign-key constraints:
    "fk_parent" FOREIGN KEY (parent_id) REFERENCES parent(parent_id) ON DELETE CASCADE

x=> \d parent
        Table "x.parent"
  Column   |  Type   | Modifiers
-----------+---------+-----------
 parent_id | integer | not null
 isosmiles | text    | not null
 coord_2d  | text    |
Indexes:
    "parent_pkey" PRIMARY KEY, btree (parent_id)
    "i_parent_isosmiles" UNIQUE, btree (isosmiles)

=> \d sample
                              Table "reaxys.sample"
       Column       |  Type   |                      Modifiers
--------------------+---------+-----------------------------------------------------
 sample_id          | integer | not null default nextval('sample_id_seq'::regclass)
 sample_id_src      | integer |
 parent_id          | integer | not null
 version_id         | integer | not null
 supplier_id        | integer | not null
 catalogue_id       | integer | not null
 catalogue_issue_id | integer | not null
 load_id            | integer | not null
 load_file_id       | integer |
 compound_id        | text    | not null
 cas_number         | text    |
 purity             | text    |
 chemical_name      | text    |
 url                | text    |
 price_code         | text    |
 comment            | text    |
 salt_comment       | text    |
Indexes:
    "sample_pkey" PRIMARY KEY, btree (sample_id)
    "i_sample_casno" btree (cas_number)
    "i_sample_catalogue_id" btree (catalogue_id)
    "i_sample_catalogue_issue_id" btree (catalogue_issue_id)
    "i_sample_chem_name" btree (chemical_name)
    "i_sample_compound_id" btree (compound_id)
    "i_sample_load_id" btree (load_id)
    "i_sample_parent_id" btree (parent_id)
    "i_sample_sample_id_src" btree (sample_id_src)
    "i_sample_supplier_id" btree (supplier_id)
    "i_sample_version_id" btree (version_id)
Foreign-key constraints:
    "fk_item" FOREIGN KEY (version_id) REFERENCES version(version_id) ON DELETE CASCADE
On Thu, Aug 5, 2010 at 2:34 PM, Craig James <craig_james@emolecules.com> wrote:
> => explain analyze select p.price, p.amount, p.units, s.catalogue_id, vn.version_id
> -> from plus p join sample s
> -> on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id)
> -> join version vn on (s.version_id = vn.version_id) join parent pn
> -> on (s.parent_id = pn.parent_id)
> -> where vn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'
> -> or pn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'
> -> order by price;

Well, you can't evaluate the WHERE clause here until you've joined {s vn pn}.

> If I only query the VERSION table, it's very fast:
>
> x=> explain analyze select p.price, p.amount, p.units, s.catalogue_id, vn.version_id
> -> from plus p
> -> join sample s on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id)
> -> join version vn on (s.version_id = vn.version_id)
> -> where vn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1' order by price;

But here you can push the WHERE clause all the way down to the vn table and evaluate it right at the get-go, which is pretty much exactly what is happening. In the first case, you have to join all 297,306 vn rows against s, because they could be interesting if the other half of the WHERE clause turns out to hold. In the second case, you can throw away 297,305 of those 297,306 rows before doing anything else, because there's no possibility that they can ever be interesting.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
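[Editor's note] The UNION rewrite Craig mentions is the usual workaround for an OR that spans two joined tables: each arm gets its own plan, so each can start from its unique isosmiles index. A minimal sketch of that idea -- note the second arm takes version_id from sample instead of joining version, and the UNION also removes duplicates between the arms; this has not been run against the actual schema:

select price, amount, units, catalogue_id, version_id
from (
      -- Arm 1: driven by the unique index on version.isosmiles.
      select p.price, p.amount, p.units, s.catalogue_id, vn.version_id
      from plus p
      join sample s on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id)
      join version vn on (s.version_id = vn.version_id)
      where vn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'
    union
      -- Arm 2: driven by the unique index on parent.isosmiles.
      select p.price, p.amount, p.units, s.catalogue_id, s.version_id
      from plus p
      join sample s on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id)
      join parent pn on (s.parent_id = pn.parent_id)
      where pn.isosmiles = 'Fc1ncccc1B1OC(C)(C)C(C)(C)O1'
) t
order by price;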