Thread: Maximum number of tables per database and slowness
What is the current maximum number of tables per database? Also, does having more tables slow down performance in any way?
Benjamin Arai wrote:
> What is the current maximum number of tables per database? Also, does
> having more tables slow down performance in any way?

For most cases, the answer is no. However, once you get near six-figure table counts, pg_catalog ends up being pretty massive. The problem is that the query planner must check pg_catalog for every query to see what indexes are available, what the statistics and value distributions are, etc., in order to build the optimal plan. At some point, a really large pg_catalog can begin to bog down your system. See the performance list for a recent discussion about this very issue.
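As a rough way to gauge this, something along the lines of the sketch below shows how many relations the planner has to consider and which system catalogs have grown largest. It only uses pg_class columns that exist in any release; relpages and reltuples are estimates from the last VACUUM/ANALYZE, not exact figures:

    -- how many tables and indexes exist in this database
    SELECT count(*) FROM pg_class WHERE relkind IN ('r', 'i');

    -- the ten largest system catalogs, by estimated page count
    SELECT relname, relpages, reltuples
      FROM pg_class
     WHERE relnamespace = (SELECT oid FROM pg_namespace
                            WHERE nspname = 'pg_catalog')
     ORDER BY relpages DESC
     LIMIT 10;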
My issue actually stems from the fact that I cannot finish large weekly updates fast enough to meet a weekend window before the following work week. I am currently using a machine with RAID 1, 4GB RAM, and dual Opterons. I could go to RAID 0+1, but the performance increase is only about 20% from the benchmarks I have read. ANY suggestions would be greatly appreciated.
On Sat, Dec 10, 2005 at 03:22:47PM -0800, Benjamin Arai wrote:
> My issue actually stems from the fact that I cannot finish large weekly
> updates fast enough to meet a weekend window before the following work
> week. I am currently using a machine with RAID 1, 4GB RAM, and dual
> Opterons. I could go to RAID 0+1, but the performance increase is only
> about 20% from the benchmarks I have read. ANY suggestions would be
> greatly appreciated.

Since you have given us no information to work with, you're not going to get any specific recommendations either. If you can tell us the things you do most commonly and are most performance sensitive (exact commands, timings, versions and settings please), we may be able to help.

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
To be more specific, there are two types of commands that are run on the system. There are application commands that do all different types of joins and so on, but for the most part they are fast enough to meet user expectations. On the other hand there is a weekly update (this is the problem) that updates all of the modified records for a bunch of financial data such as closes and so forth. For the most part they are records of the type name, date, value. The update currently takes almost two days. The update does deletions, insertions, and updates depending on what has happened during the previous week.

For the most part the updates are simple one-liners. I currently commit in large batches to increase performance, but it still takes a while, as stated above. From evaluating the computer's performance during an update, the system is thrashing both memory and disk. I am currently using PostgreSQL 8.0.3.

Example command: "UPDATE data where name=x and date=y;".

Martijn van Oosterhout wrote:
> On Sat, Dec 10, 2005 at 03:22:47PM -0800, Benjamin Arai wrote:
>> My issue actually stems from the fact that I cannot finish large weekly
>> updates fast enough to meet a weekend window before the following work
>> week. I am currently using a machine with RAID 1, 4GB RAM, and dual
>> Opterons. I could go to RAID 0+1, but the performance increase is only
>> about 20% from the benchmarks I have read. ANY suggestions would be
>> greatly appreciated.
>
> Since you have given us no information to work with, you're not going to
> get any specific recommendations either. If you can tell us the things
> you do most commonly and are most performance sensitive (exact commands,
> timings, versions and settings please), we may be able to help.
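For reference, a batched version of that kind of weekly load might look roughly like the sketch below. The table name, column names, and values are only placeholders taken from the description above, and an index on (name, date) is assumed so each one-liner is an index lookup rather than a full table scan:

    -- assumed supporting index; without it every one-liner scans the whole table
    CREATE INDEX data_name_date_idx ON data (name, date);

    BEGIN;
    UPDATE data SET value = 101.25 WHERE name = 'ACME' AND date = '2005-12-09';
    UPDATE data SET value = 47.80  WHERE name = 'XYZ'  AND date = '2005-12-09';
    DELETE FROM data WHERE name = 'OLDCO' AND date = '2005-12-09';
    INSERT INTO data (name, date, value) VALUES ('NEWCO', '2005-12-09', 12.50);
    -- ... several thousand more one-liners ...
    COMMIT;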
On 12/10/05, Benjamin Arai <barai@cs.ucr.edu> wrote:
> To be more specific, there are two types of commands that are run on
> the system. There are application commands that do all different types
> of joins and so on, but for the most part they are fast enough to meet
> user expectations. On the other hand there is a weekly update (this is
> the problem) that updates all of the modified records for a bunch of
> financial data such as closes and so forth. For the most part they are
> records of the type name, date, value. The update currently takes
> almost two days. The update does deletions, insertions, and updates
> depending on what has happened during the previous week.
>
> For the most part the updates are simple one-liners. I currently commit
> in large batches to increase performance, but it still takes a while,
> as stated above. From evaluating the computer's performance during an
> update, the system is thrashing both memory and disk. I am currently
> using PostgreSQL 8.0.3.
>
> Example command: "UPDATE data where name=x and date=y;".

Try using VACUUM or VACUUM FULL after those weekly updates...

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)
On Saturday 10 December 2005 19:28, Jaime Casanova wrote:
> On 12/10/05, Benjamin Arai <barai@cs.ucr.edu> wrote:
>> To be more specific, there are two types of commands that are run on
>> the system. There are application commands that do all different types
>> of joins and so on, but for the most part they are fast enough to meet
>> user expectations. On the other hand there is a weekly update (this is
>> the problem) that updates all of the modified records for a bunch of
>> financial data such as closes and so forth. For the most part they are
>> records of the type name, date, value. The update currently takes
>> almost two days. The update does deletions, insertions, and updates
>> depending on what has happened during the previous week.
>>
>> For the most part the updates are simple one-liners. I currently commit
>> in large batches to increase performance, but it still takes a while,
>> as stated above. From evaluating the computer's performance during an
>> update, the system is thrashing both memory and disk. I am currently
>> using PostgreSQL 8.0.3.
>>
>> Example command: "UPDATE data where name=x and date=y;".
>
> Try using VACUUM or VACUUM FULL after those weekly updates...

Once you've updated a majority of the rows in a large table, if you're still doing further work on that table that involves complex conditionals, you'll probably notice a slowdown, so you might even think about doing a vacuum during the updates if you can swing it.

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
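Concretely, that could be as simple as the commands below, run between batches or once the weekly load finishes. VACUUM FULL reclaims more space but takes an exclusive lock on the table, so plain VACUUM ANALYZE is usually the better fit inside a load window (the table name is again illustrative):

    -- lightweight: marks dead row versions for reuse and refreshes statistics
    VACUUM ANALYZE data;

    -- heavyweight: compacts the table on disk, but locks it exclusively
    VACUUM FULL data;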
William Yu <wyu@talisys.com> writes:
> Benjamin Arai wrote:
>> What is the current maximum number of tables per database? Also, does
>> having more tables slow down performance in any way?
>
> For most cases, the answer is no. However, once you get near six-figure
> table counts, pg_catalog ends up being pretty massive.

You also have to think about the performance implications of having tens of thousands of files in your database directory. While some newer filesystems aren't fazed by that particularly, a lot of 'em bog down on lookups when there are more than a few thousand entries in a directory.

regards, tom lane
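A quick way to estimate how many on-disk files a database will have is to count the matching pg_class entries, since each table, index, and TOAST table gets at least one file under the database directory:

    -- rough count of data files in this database's directory
    -- (relations over 1GB are split into segments, so the real number can be higher)
    SELECT count(*)
      FROM pg_class
     WHERE relkind IN ('r', 'i', 't');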
Benjamin Arai wrote:
> For the most part the updates are simple one-liners. I currently commit
> in large batches to increase performance, but it still takes a while,
> as stated above. From evaluating the computer's performance during an
> update, the system is thrashing both memory and disk. I am currently
> using PostgreSQL 8.0.3.
>
> Example command: "UPDATE data where name=x and date=y;".

Before you start throwing the baby out with the bathwater by totally revamping your DB architecture, try some simple debugging first to see why these queries take a long time. Use EXPLAIN ANALYZE, test vacuuming/analyzing mid-update, and fiddle with postgresql.conf parameters (the WAL/commit settings especially). Try committing with different numbers of statements per transaction -- the optimal number won't be the same across all development tools.

My own experience is that periodic vacuuming and analyzing are very much needed for batches of small update commands. For our batch processing, autovacuum plus 1K-10K commit batches did the trick in keeping performance up.
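For the EXPLAIN ANALYZE part, a minimal sketch (with the same illustrative names as above) could look like this. Note that EXPLAIN ANALYZE actually executes the statement, so wrapping it in a transaction and rolling back keeps the test from modifying data:

    BEGIN;
    EXPLAIN ANALYZE
    UPDATE data SET value = 101.25 WHERE name = 'ACME' AND date = '2005-12-09';
    ROLLBACK;  -- EXPLAIN ANALYZE really runs the UPDATE, so undo it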
Please start new threads to ask unrelated questions, rather than replying to an existing thread. Replying to an existing thread makes the archives less useful, and may keep people from seeing your question.

On Sat, Dec 10, 2005 at 15:37:01 -0800, Benjamin Arai <barai@cs.ucr.edu> wrote:
> To be more specific, there are two types of commands that are run on
> the system. There are application commands that do all different types
> of joins and so on, but for the most part they are fast enough to meet
> user expectations. On the other hand there is a weekly update (this is
> the problem) that updates all of the modified records for a bunch of
> financial data such as closes and so forth. For the most part they are
> records of the type name, date, value. The update currently takes
> almost two days. The update does deletions, insertions, and updates
> depending on what has happened during the previous week.
>
> For the most part the updates are simple one-liners. I currently commit
> in large batches to increase performance, but it still takes a while,
> as stated above. From evaluating the computer's performance during an
> update, the system is thrashing both memory and disk. I am currently
> using PostgreSQL 8.0.3.
>
> Example command: "UPDATE data where name=x and date=y;".

Another possibility is that foreign key checks may be slowing things down. You may need to create some indexes on referencing tables if you are doing deletes (or updates which change the key) in referenced tables. Finding out more about which queries are slow and what plans they are using is a good first step.
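For example, if a hypothetical table "positions" had a foreign key referencing data (name, date), every DELETE on data would have to look for matching rows in positions, and without an index on the referencing columns each of those checks is a sequential scan:

    -- hypothetical referencing table; the index on (name, date) is what makes
    -- the foreign key check cheap when rows in "data" are deleted or re-keyed
    CREATE INDEX positions_name_date_idx ON positions (name, date);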
On Dec 10, 2005, at 6:37 PM, Benjamin Arai wrote:
> For the most part the updates are simple one-liners. I currently commit
> in large batches to increase performance, but it still takes a while,
> as stated above. From evaluating the computer's performance during an
> update, the system is thrashing both memory and disk. I am currently
> using PostgreSQL 8.0.3.

Then buy faster disks. My current favorite is to use U320 15kRPM disks on a dual-channel RAID controller, with half the disks on one channel and half on the other, mirroring them across channels and then striping down the mirrors (i.e., RAID 10). I use no fewer than 6 disks (RAID 10) for data and 2 for pg_log in a RAID 1.