Thread: PostgreSQL slow after VACUUM
Hi all.

I have the following:

- Mandrake Linux 9.1
- PostgreSQL 7.3.2 MDK5

There is one DB and one DB user. The DB is cleared and loaded with
data of the same volume each month (monthly report). The volume is not
small and the load usually takes 3 hours. Loading is done with SQL
files which use transactions, 10,000 SQL statements per transaction.

A couple of days ago the disk became full, since we were not doing
VACUUM on the DB at all. So I deleted all records from the 3 tables
the DB has and performed "VACUUM FULL ANALYZE". This reclaimed the
space.

My problem is that the load is now taking (by my estimate) 20 times
longer!

Anything I could do to find out what's going on? There is nothing in
the logs that I can see.

Nix.
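For reference, a load file of the kind described above would have roughly this shape (the table and column names here are hypothetical; the original post does not name them):

```sql
-- Sketch of one batch in the monthly load file: each transaction
-- wraps 10,000 statements to cut per-statement commit overhead.
BEGIN;
INSERT INTO monthly_report (id, item, amount) VALUES (1, 'foo', 10.50);
INSERT INTO monthly_report (id, item, amount) VALUES (2, 'bar', 22.00);
-- ... 9,998 more INSERTs ...
COMMIT;
```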
From: "Nikola Milutinovic" <Nikola.Milutinovic@ev.co.yu>

> There is one DB and one DB user. The DB is cleared and loaded with the
> data of same volume each month (monthly report). The volume is not small
> and it usually takes 3 hours to load. Loading is done with SQL files
> which use transactions, 10,000 SQL statements per transaction.
>
> A couple of days ago, disk became full, since we were not doing VACUUM
> on the DB at all. So, I deleted all records from the 3 tables the DB has
> and performed "VACUUM FULL ANALYZE". This reclaimed the space.
>
> My problem is that the load is now taking (to my estimate) 20 times more!

It sounds from your description that you are not analyzing the tables
after the batch loads. An ANALYZE on an empty table effectively
destroys all statistics.

gnari
Hi Nix,

The problem is that while doing the VACUUM FULL ANALYZE the table was
empty. It therefore gathered statistics for a situation which no
longer exists once you fill up the table. In an empty or small table,
it is normal to do sequential scans, which you most of the time don't
want in a large table.

My suggestion is to VACUUM (FULL) the table after you've deleted the
data. Then fill up the table and do an ANALYZE when you're done
filling it. That way, the analysis of the data will be much more
accurate. My guess is it'll use indexes much sooner and be much
faster. Have a look at the output of EXPLAIN ANALYZE yourstatement;
before doing ANALYZE and after.

Best regards,

Arjen

On 26-11-2004 7:35, Nikola Milutinovic wrote:
> Hi all.
>
> I have the following:
>
> - Mandrake Linux 9.1
> - PostgreSQL 7.3.2 MDK5
>
> There is one DB and one DB user. The DB is cleared and loaded with the
> data of same volume each month (monthly report). The volume is not small
> and it usually takes 3 hours to load. Loading is done with SQL files
> which use transactions, 10,000 SQL statements per transaction.
>
> A couple of days ago, disk became full, since we were not doing VACUUM
> on the DB at all. So, I deleted all records from the 3 tables the DB has
> and performed "VACUUM FULL ANALYZE". This reclaimed the space.
>
> My problem is that the load is now taking (to my estimate) 20 times more!
>
> Anything I could do to find out what's going on? There is nothing in the
> logs that I can see.
>
> Nix.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match
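The suggested order of operations can be sketched in SQL like this (a minimal sketch, assuming a hypothetical table `monthly_report`):

```sql
-- After the monthly report is finished:
DELETE FROM monthly_report;
VACUUM FULL monthly_report;  -- reclaim disk space; do NOT analyze here

-- ... run the multi-hour load ...

-- Only once the table is full again, gather statistics:
ANALYZE monthly_report;      -- stats now reflect the real data

-- Compare the plan for your query before and after the ANALYZE:
EXPLAIN ANALYZE SELECT count(*) FROM monthly_report;
```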
Arjen van der Meijden <acmmailing@vulcanus.its.tudelft.nl> writes:

> My suggestion is to VACUUM (FULL) the table after you've deleted the data.
> Then fill up the table and do an ANALYZE when you're done filling it.

For now you might be able to improve matters by doing an ANALYZE even
while it's busy doing the import. Depending on how the import is coded
it may or may not have an immediate effect.

VACUUM and ANALYZE (and VACUUM ANALYZE) are fairly quick and don't
block access to the tables, though they can slow it down somewhat. You
probably want to be running them frequently so you don't have to
resort to VACUUM FULL.

-- 
greg
It seems this issue has been mentioned several times lately... I'd
propose something to do to avoid it:

* When TRUNCATE'ing a table:
  - the row count is reset to 0 (of course!)
  - however, the column stats are kept, on the basis that the data
    which will be inserted later in the table will most likely have at
    least some correlation to the data that was in it before.

Of course, VACUUM ANALYZE will update the statistics with real ones
when it is run. This would make TRUNCATE behave more like DELETE
FROM...

* When INSERT'ing into a table:

The OP's problem was that the query is planned as if the table was
almost empty (hence seq scan); well, it is in fact empty, but many
rows are inserted. Of course, the query can't be replanned in the
middle of its execution (although that'd be a nice feature to add to
the wishlist for postgres version 2020)...

When planning for inserts, the planner currently uses the estimated
number of rows in the target table, taken from the statistics.
However, at this time, the planner has an estimation of how many rows
it's going to insert:

- INSERT ... SELECT, or CREATE TABLE ... AS SELECT: the SELECT from
  which the rows will come is planned, so there should be an estimated
  number of rows readily available.
- COPY FROM...: the number of rows could be estimated by peeking at
  the first N rows in the file (not on COPY FROM stdin though), or an
  additional parameter to COPY could be given.

Then, the INSERT can be planned as if the target table contained
(estimated number of rows in the target table) + (estimated number of
rows to be inserted), and it will choose a good plan for populating
empty tables...

What do you think?

> The problem is, that while doing the vacuum full ANALYZE the table was
> empty. It therefore gathered statistics of a situation which isn't there
> anymore when you fill up the table. In an empty or small table, it is
> normal to do sequential scans. Which you most of the time don't want in
> a large table.
> My suggestion is to VACUUM (FULL) the table after you've deleted the
> data. Then fill up the table and do an ANALYZE when you're done filling
> it.
>
> That way, the analysis of the data will be much more accurate. My guess
> is, it'll use indexes much sooner and be much faster. Have a look at the
> output of EXPLAIN ANALYZE yourstatement; before doing ANALYZE and after.
>
> Best regards,
>
> Arjen
On Fri, Nov 26, 2004 at 02:00:48PM +0100, Pierre-Frédéric Caillaud wrote:
> It seems this issue has been mentioned several times lately... I'd
> propose something to do to avoid it:
>
> * When TRUNCATE'ing a table:
>   - the row count is reset to 0 (of course!)
>   - however, the column stats are kept, on the basis that the data
>     which will be inserted later in the table will most likely have at

Currently, TRUNCATE doesn't affect the statistics, nor does it set the
row count to zero. Which means that when new data is inserted it will
plan as if the table were still full. So no change required here.

> * When INSERT'ing into a table:
> The OP's problem was that the query is planned as if the table was
> almost empty (hence seq scan); well, it is in fact empty, but many
> rows are inserted. Of course, the query can't be replanned in the
> middle of its execution (although that'd be a nice feature to add to
> the wishlist for postgres version 2020)...

The size of the table you are inserting into is irrelevant to the
planner. All that matters is where the data is coming from. Think
about it: UPDATE, DELETE and INSERT are just fancy wrappers around
SELECT that do something special with the rows that are finally
selected. Which one you use has very little effect on the plan finally
used. I guess as a special case, DELETE can optimise the fact that no
data need be returned, only a list of rows...

The original user's problem stemmed from the fact they were running
ANALYZE on an empty table; *that* was killing the statistics. Stop
doing that and the statistics will remain as if the table was full.
The VACUUM will set the row count back to zero, but that's it.

Seems PostgreSQL is already doing what you suggest anyway...

-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
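One way to see what the planner currently believes about a table is to inspect the `pg_class` row count estimate and the per-column statistics in `pg_stats` before and after an ANALYZE. A minimal sketch, again assuming a hypothetical table `monthly_report`:

```sql
-- Row and page estimates the planner uses, updated by VACUUM/ANALYZE:
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname = 'monthly_report';

-- Per-column statistics gathered by the last ANALYZE:
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'monthly_report';
```

If you run these right after an ANALYZE on the emptied table, you can see the destroyed statistics that lead to the bad plans.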
Nikola Milutinovic <Nikola.Milutinovic@ev.co.yu> writes:

> - PostgreSQL 7.3.2 MDK5
> ...
> A couple of days ago, disk became full, since we were not doing VACUUM
> on the DB at all. So, I deleted all records from the 3 tables the DB has
> and performed "VACUUM FULL ANALYZE". This reclaimed the space.

The subsequent discussion pointed out that you probably shouldn't have
ANALYZEd right at that point, but I didn't see anyone suggest that you
should have done TRUNCATEs rather than delete-all/VACUUM FULL. The
TRUNCATE way is a good deal faster, and it will also eliminate index
bloat, while VACUUM FULL won't.

BTW, 7.3.2 has several known serious bugs; I'd recommend an update to
7.3.8, if not upgrading to 7.4.*.

			regards, tom lane
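The TRUNCATE approach sketched out (hypothetical table name again; note that in releases of this vintage TRUNCATE cannot be rolled back, so keep it outside the load transactions):

```sql
-- Instead of DELETE + VACUUM FULL, empty each table with TRUNCATE,
-- which reclaims both table and index space immediately:
TRUNCATE TABLE monthly_report;

-- ... reload the data ...

-- Then gather fresh statistics on the full table:
ANALYZE monthly_report;
```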