Thread: RE: [GENERAL] Benchmarks
In his very insightful post last week, Mike Mascari pointed out that, on
tables with heavy inserts/updates, it was much faster to drop the index,
vacuum analyze, and then rebuild the index.  Maybe vacuum has a specific
inefficiency in what Mike coined "defragment"ing indexes.

[Snip]

8. Not running VACUUM - PostgreSQL won't use indexes, or won't optimize
correctly, unless the record count and dispersion estimates are up-to-date.
People have reported problems with running vacuum while under heavy load.
We haven't seen it, but we run vacuum each night at 4:05 a.m.  However, if
you perform a LARGE number of INSERTs/UPDATEs, it is better for you to do
the following:

DROP INDEX index_on_heavily_used_table;
VACUUM ANALYZE;
CREATE INDEX index_on_heavily_used_table;

Because VACUUM will sit there and, row by row, essentially "defragment"
your indexes, which can take damn near forever for any number of updates
or deletes greater than, say, 30,000 rows.

[Snip]

-----Original Message-----
From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
Sent: Thursday, January 06, 2000 10:14 AM
To: Dustin Sallings
Cc: The Hermit Hacker; pgsql-general@hub.org
Subject: Re: [GENERAL] Benchmarks

> Untrue, vacuum is *extremely* important for updating statistics.
> If you have a lot of data in a table and you have never vacuumed, you
> might as well not have any indices.  It'd be nice if you could separate
> the stat update from the storage reclaim.  Actually, it'd be nice if you
> could reuse storage, so that an actual vacuum wouldn't be necessary unless
> you just wanted to free up disk space you might end up using again anyway.
>
> The vacuum also doesn't seem to be very efficient.  In one of my
> databases, a vacuum could take in excess of 24 hours, while I've written a
> small SQL script that does a select/rename and an insert into ... select from
> that will do the same job in about ten minutes.  This is a database that
> cannot lock for more than a few minutes.

This is serious.  Why would an INSERT / RENAME be so much faster?  Are
we that bad with VACUUM?

--
Bruce Momjian                        |  http://www.op.net/~candle
maillist@candle.pha.pa.us            |  (610) 853-3000
+  If your life is a hard drive,     |  830 Blythe Avenue
+  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Thu, 6 Jan 2000, Culberson, Philip wrote:

	This is a considerable amount faster.  I never thought about the
indices getting hit here.  Thanks a lot.

# In his very insightful post last week, Mike Mascari pointed out that, on
# tables with heavy inserts/updates, it was much faster to drop the index,
# vacuum analyze, and then rebuild the index.  Maybe vacuum has a specific
# inefficiency in what Mike coined "defragment"ing indexes.
#
# [Snip]
#
# 8. Not running VACUUM - PostgreSQL won't use indexes, or won't optimize
# correctly, unless the record count and dispersion estimates are up-to-date.
# People have reported problems with running vacuum while under heavy load.
# We haven't seen it, but we run vacuum each night at 4:05 a.m.  However, if
# you perform a LARGE number of INSERTs/UPDATEs, it is better for you to do
# the following:
#
# DROP INDEX index_on_heavily_used_table;
# VACUUM ANALYZE;
# CREATE INDEX index_on_heavily_used_table;
#
# Because VACUUM will sit there and, row by row, essentially "defragment"
# your indexes, which can take damn near forever for any number of updates
# or deletes greater than, say, 30,000 rows.
#
# [Snip]
#
# -----Original Message-----
# From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
# Sent: Thursday, January 06, 2000 10:14 AM
# To: Dustin Sallings
# Cc: The Hermit Hacker; pgsql-general@hub.org
# Subject: Re: [GENERAL] Benchmarks
#
# > Untrue, vacuum is *extremely* important for updating statistics.
# > If you have a lot of data in a table and you have never vacuumed, you
# > might as well not have any indices.  It'd be nice if you could separate
# > the stat update from the storage reclaim.  Actually, it'd be nice if you
# > could reuse storage, so that an actual vacuum wouldn't be necessary unless
# > you just wanted to free up disk space you might end up using again anyway.
# >
# > The vacuum also doesn't seem to be very efficient.  In one of my
# > databases, a vacuum could take in excess of 24 hours, while I've written a
# > small SQL script that does a select/rename and an insert into ... select from
# > that will do the same job in about ten minutes.  This is a database that
# > cannot lock for more than a few minutes.
#
# This is serious.  Why would an INSERT / RENAME be so much faster?  Are
# we that bad with VACUUM?
#
# --
# Bruce Momjian                        |  http://www.op.net/~candle
# maillist@candle.pha.pa.us            |  (610) 853-3000
# +  If your life is a hard drive,     |  830 Blythe Avenue
# +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

--
SA, beyond.com                          My girlfriend asked me which one I
                                        like better.
pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
|    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________
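[A hedged reconstruction of the kind of select/rename script Dustin describes; his actual script is not shown in the thread, and the table, column, and index names below are made up.]

CREATE TABLE hits_new (stamp datetime, url text);   -- empty copy with the same schema
INSERT INTO hits_new SELECT * FROM hits;             -- copies only the live rows
DROP TABLE hits;                                      -- discards the dead space left by old updates/deletes
ALTER TABLE hits_new RENAME TO hits;                  -- swap the fresh copy in
CREATE INDEX hits_stamp_idx ON hits (stamp);          -- rebuild indexes against the compact table
VACUUM ANALYZE hits;                                  -- refresh planner statistics

[The table is unavailable between the DROP and the RENAME, but that window is far shorter than a full VACUUM of a large, heavily-updated table, which fits the "cannot lock for more than a few minutes" constraint above.]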
> Subsidiary question : why is mySQL excluded from RDBMS
> comparison on postgres www site ?

Maybe because it's much faster ;)
Sure, MySQL doesn't support transactions, rollbacks, ...

Maybe this question will sound a bit naive, but why don't we have the
choice to send queries to PostgreSQL as transactional or not?  It's
probably not meaningful to say that a single query is transactional or
not, but what about a global parameter at the server level?  Forgive me
again for the naivety of this question; it may mean having two completely
different engines, and it would already have been done if it were
possible ...

I've compared both engines and MySQL is much faster.  However, I'll need
transactions to ensure reliability for the database updates.  I've
thought of using PostgreSQL for updates and MySQL for selects, the
database being dumped from PostgreSQL and reloaded into MySQL every
night, probably with specific queries and scripts rather than a dump, to
get a MySQL-compliant dump file.  Does anyone have experience with a
similar solution?

Alain
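[One possible shape for the nightly hand-off Alain describes, as a sketch only; the table name and file path are invented, and NULL handling, embedded tabs, and date formats would all need checking.]

-- PostgreSQL side: dump each table as tab-delimited text.  COPY TO writes a
-- server-side file and needs superuser rights; \copy in psql does the same from the client.
COPY orders TO '/tmp/orders.txt';

-- MySQL side: throw away yesterday's read-only copy and bulk-load the fresh dump.
-- Both defaults are tab-delimited rows with \N for NULL, which line up reasonably well.
DELETE FROM orders;
LOAD DATA INFILE '/tmp/orders.txt' INTO TABLE orders;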
On Fri, 7 Jan 2000, Alain TESIO wrote:

> I've compared both engines and MySQL is much faster.

Alain, have you run PostgreSQL with the -F option and compared them then?
I'd venture to say that if you did so, you'd find PostgreSQL a bit more
desirable on the speed factor.  Just remember, when you do so, you lose
some recovery possibilities, as you're relying on the OS to sync data
from memory to disc.

----------------------------------------------------------------
Chad Walstrom                         mailto:chewie@wookimus.net
a.k.a ^chewie, gunnarr                http://wookimus.net/~chewie
Gnupg = B4AB D627 9CBD 687E 7A31  1950 0CC7 0B18 206C 5AFD
----------------------------------------------------------------
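[For reference, one way the no-fsync mode is typically switched on for a 6.5-era server started by hand; this is only a sketch, and the data directory path is just an example.]

# -o hands its argument string to every backend the postmaster spawns, and -F
# tells each backend to skip fsync(), leaving the OS to decide when dirty
# buffers actually reach disk; that is where the weaker recovery guarantee comes from.
postmaster -o "-F" -D /usr/local/pgsql/data > server.log 2>&1 &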
Alain TESIO wrote:

> > Subsidiary question : why is mySQL excluded from RDBMS
> > comparison on postgres www site ?

I believe it's fairly stated: mySQL is not an RDBMS, so it is not listed.
A similar question: why is Filemaker excluded?  FoxPro?  They all have a
speedy, flat-file structure, glued together in such a way that
relational-like interactions are available.

> Maybe because it's much faster ;)

The speed of not having some relational feature that you desperately
need, but which is not available, is approximately zero. :-)

I like it for simple structures, simple db's, but as soon as you're
trying to manage 30-40 tables, mySQL starts to get in its own way.  For
running SQL against a flat db?  It's much better than PostgreSQL.  200
tables?  Don't even think about it; get an RDBMS.

> Sure, MySQL doesn't support transactions, rollbacks, ...

Foreign keys...

> I've compared both engines and MySQL is much faster.

Yup.  Different design goals.

> However, I'll need transactions to ensure reliability
> for the database updates.  I've thought of using PostgreSQL
> for updates and MySQL for selects, the database being
> dumped from PostgreSQL and reloaded into MySQL every
> night, probably with specific queries and scripts rather
> than a dump, to get a MySQL-compliant dump file.
> Does anyone have experience with a similar solution?

We are in the midst of nightly dumps of mySQL -> PostgreSQL, in order to
gradually migrate to a more robust solution... however, it seems like
your proposed idea _could_ work, with some decent scripting.  Fast for
simple scans, too... basically, you'd need text dumps, which are then
re-wrapped into mySQL-friendly insert (or whatever) statements.

HTH,
-Bop
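[A rough sketch of the "re-wrap a text dump into insert statements" step Bop mentions; the table, its three columns, and the file names are hypothetical, and quoting of embedded quote characters and NULLs is glossed over.]

# Turn each tab-delimited row into an INSERT statement (\047 is awk's octal escape for a single quote):
awk -F'\t' '{ printf "INSERT INTO customers VALUES (%s, \047%s\047, \047%s\047);\n", $1, $2, $3 }' customers.txt > customers.sql

# Then feed the generated statements to MySQL:
mysql readonly_db < customers.sql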
> Alain, have you run PostgreSQL with the -F option and compared them then?
> I'd venture to say that if you did so, you'd find PostgreSQL a bit more
> desirable on the speed factor.  Just remember, when you do so, you lose
> some recovery possibilities, as you're relying on the OS to sync data
> from memory to disc.

Hello,

I've often experienced crashes with the -F option; the script which fills
my database can never run completely.  If any developer is interested, I
have a dump file (about 2 MB gzipped, produced directly from pg_dump)
which crashes on a CREATE INDEX with the -F option set and works fine
without it.

My configuration is: Linux RedHat 6.0, kernel 2.2-15, PostgreSQL 6.5.3,
64 MB RAM.

Alain
"Alain TESIO" <tesio@easynet.fr> writes: > I've often experienced crashes with the -F option, the script which fills my > database can never run completely. If any developper is interested, I have > a dump file (about 2 Mo gzipped, directly produced from pg_dump) which > is crashing on a create index with the option -F set, and is working fine > without it. Really!? Wow, I can hardly guess what's causing that, but it sure deserves looking into. Send me your dump, or give me an FTP or web pointer if that's easier. regards, tom lane