Thread: Adding more space, and a vacuum question.
Hello. We have two problems (which may actually be related...)
1. We are running at over 90% disk capacity on one of our servers - a report/data warehouse system. We have run out of disk space several times. Now we need to make some file-archived data available in the database to support our legal team, which means adding two huge tables to the database. The only solution I see is to add more space by means of another tablespace. The two tables are static - after loading them and creating indexes they will not be changed.
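For concreteness, the tablespace route would look something like this - a sketch only; the path, tablespace name, table definition and index are all illustrative:

```sql
-- Create a tablespace on the new volume (the directory must already
-- exist and be owned by the postgres user):
CREATE TABLESPACE archive_space LOCATION '/mnt/archive/pgdata';

-- Load the static archive tables directly into it:
CREATE TABLE billing__archive_legal (
    msg_id  bigint,
    sent_at timestamp,
    details text
) TABLESPACE archive_space;

-- Create the indexes there too, so nothing lands on the full disk:
CREATE INDEX billing__archive_legal_sent_at_idx
    ON billing__archive_legal (sent_at) TABLESPACE archive_space;
```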
The machine has no additional room for internal disks. It is a recent purchase and not likely to be replaced any time soon. Now, my position is that the best solution would be to add an external hard disk, via USB/firewire, and use it for the archive tables. My sysadmin, on the other hand, wants to mount a storage machine remotely and use it for the extra tablespace, as the storage machine is more reliable hardware. I think that remote mounted volumes are not a proper device for a database, as the network is subject to load and I've run into frozen mounts in both NFS and SMB in the past. Never mind being slower.
Which solution would you advise and which one of us is right?
2. That database has a few really huge tables. I think they are not being automatically vacuumed properly. In the past few days I've noticed a vacuum process on one of them which has been running since January 14th. Unfortunately, it never finished, because we were informed of a scheduled power down in our building yesterday, and had to shut down the machine. The questions are:
a. Is it normal for vacuum processes to take two weeks?
b. What happens if the vacuum process is stopped? Are the tuples partially recovered, or are they only recovered if the process completes properly?
c. Is there anything I can do to make vacuums shorter?
d. After restarting the server, all the data in pg_stat_user_tables seem to have been reset. What does this mean and how does this affect vacuum scheduling?
Thank you in advance,
Herouth
On 01/29/2011 05:12 AM, Herouth Maoz wrote:

> The machine has no additional room for internal disks. It is a recent purchase and not likely to be replaced any time soon.

Newly acquired or not, it sounds like it isn't sized correctly for the load and needs an upgrade if it can't be shifted into a more suitable role and replaced.

> Now, my position is that the best solution would be to add an external hard disk, via USB/firewire

eSATA? Via a PCI or PCIe add-in SATA controller if there's no existing eSATA.

FireWire is usable for a database. USB is too ... kind of. Performance will be poor because of the high-latency, CPU-heavy non-DMA access done by the USB stack. For something read-only, that might be OK.

> and use it for the archive tables. My sysadmin, on the other hand, wants to mount a storage machine remotely and use it for the extra tablespace, as the storage machine is more reliable hardware.

If you have iSCSI or ATA-over-Ethernet disk volumes you can mount, that might be a good idea. I'd personally avoid NFS or SMB. That said, again, if it's read-only you might be fine.

> I think that remote mounted volumes are not a proper device for a database, as the network is subject to load and I've run into frozen mounts in both NFS and SMB in the past. Never mind being slower.

Remote *file-level* shares aren't great for databases, IMO.

> a. Is it normal for vacuum processes to take two weeks?

Define "really huge" and describe the hardware; without numbers it's hard to know. What version of Pg are you using? Was it a standalone VACUUM or was it an autovacuum worker?

> b. What happens if the vacuum process is stopped? Are the tuples partially recovered, or are they only recovered if the process completes properly?

I *think* tuples become available progressively, but I'm not certain of that.

> c. Is there anything I can do to make vacuums shorter?

Do it much more often. Use Pg 8.4 or later, with the visibility map.

--
Craig Ringer
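On 8.4 and later, "much more often" can be expressed per table as storage parameters - a sketch, with an illustrative table name and values (on 8.3 the equivalent knobs live in the pg_autovacuum catalog instead):

```sql
-- Make autovacuum visit a big, frequently-churned table after ~1% of
-- its rows are dead, instead of the default 20%, while throttling the
-- extra I/O it generates:
ALTER TABLE billing__archive SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay   = 10
);
```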
On 29/01/11 13:57, Craig Ringer wrote:

> On 01/29/2011 05:12 AM, Herouth Maoz wrote:
>> The machine has no additional room for internal disks. It is a recent purchase and not likely to be replaced any time soon.
>
> Newly acquired or not, it sounds like it isn't sized correctly for the load and needs an upgrade if it can't be shifted into a more suitable role and replaced.

Sigh. Budget considerations, you know.

>> Now, my position is that the best solution would be to add an external hard disk, via USB/firewire
>
> eSATA? Via a PCI or PCIe add-in SATA controller if there's no existing eSATA.

Oh, yes, I forgot about eSATA. I basically meant a real local connection rather than a network one.

> FireWire is usable for a database. USB is too ... kind of. Performance will be poor because of the high-latency, CPU-heavy non-DMA access done by the USB stack. For something read-only, that might be OK.
>
>> and use it for the archive tables. My sysadmin, on the other hand, wants to mount a storage machine remotely and use it for the extra tablespace, as the storage machine is more reliable hardware.
>
> If you have iSCSI or ATA-over-Ethernet disk volumes you can mount, that might be a good idea. I'd personally avoid NFS or SMB.

OK.

> That said, again, if it's read-only you might be fine.

The question is: if the read-only tablespace gets stuck/frozen, what happens to the read-write part of the database, which absolutely must stay in good, responsive working order?

>> a. Is it normal for vacuum processes to take two weeks?
>
> Define "really huge" and describe the hardware; without numbers it's hard to know. What version of Pg are you using?

Pg 8.3.11. The tables have more than 200,000,000 records. About the hardware, I'm not entirely in the loop, but it has two dual-core Intel Xeon 5130 CPUs, 4G of memory, and its system disk (111G) is separate from the database disk (825G). The disks are hardware RAID, but I'm not sure which level, and I think they are 10,000 RPM, but I could be wrong.

> Was it a standalone VACUUM or was it an autovacuum worker?

Autovacuum worker.

TIA,
Herouth
On 28 Jan 2011, at 22:12, Herouth Maoz wrote:

> 2. That database has a few really huge tables. I think they are not being automatically vacuumed properly. In the past few days I've noticed a vacuum process on one of them which has been running since January 14th. Unfortunately, it never finished, because we were informed of a scheduled power down in our building yesterday, and had to shut down the machine. The questions are:
>
> a. Is it normal for vacuum processes to take two weeks?

For a 200M record table that's definitely on the long side. It was probably waiting on a lock held by another transaction. In most cases that means some transaction was kept open for that whole duration.

If that transaction came into existence by accident, then vacuum should be fine now that the server has restarted - that transaction is gone now. You may want to keep an eye out for long-running transactions, though; they're usually a programming error - sometimes they're deliberate, but it's still a bad idea from the database's point of view.
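One way to keep that eye out - a sketch of a monitoring query against 8.3's pg_stat_activity (the one-hour threshold is illustrative; later versions renamed procpid to pid and current_query to query):

```sql
-- List transactions that have been open for more than an hour,
-- oldest first:
SELECT procpid,
       usename,
       now() - xact_start AS xact_age,
       current_query
FROM pg_stat_activity
WHERE xact_start < now() - interval '1 hour'
ORDER BY xact_start;
```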
In older PG versions autovacuum could get stuck like that on large tables. It keeps starting over trying to vacuum that same table, but never reaches the end of it. Since it's only a single worker process (in those versions), it also will never vacuum any tables beyond the table it got stuck on.
If you don't delete or update tuples a lot, then the tables are probably just that big. If you do delete/update them regularly, see whether a plain VACUUM frees enough space (it probably won't), and if not, schedule a VACUUM FULL and a REINDEX for a time when the database isn't too busy. Both are quite heavy operations that take exclusive locks on the things they touch (tables, indexes).
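A sketch of that maintenance window, using the archive table name mentioned elsewhere in this thread:

```sql
-- Heavy, exclusively-locking maintenance; run only while the
-- database is otherwise quiet.
VACUUM FULL VERBOSE billing__archive;
REINDEX TABLE billing__archive;
-- Refresh planner statistics afterwards:
ANALYZE billing__archive;
```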
> d. After restarting the server, all the data in pg_stat_user_tables seem to have been reset. What does this mean and how does this affect vacuum scheduling?

I recall reading somewhere that that's normal; probably on this list. It probably contained an explanation too.
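Either way, pg_stat_user_tables is also the place to check whether (auto)vacuum is actually reaching each table - a sketch:

```sql
-- For each table: when it was last vacuumed, and how many dead
-- tuples the stats collector currently thinks it holds.
SELECT relname,
       last_vacuum,
       last_autovacuum,
       n_live_tup,
       n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```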
On Sun, Jan 30, 2011 at 04:56:29PM +0200, Herouth Maoz wrote:

> Unless my eyes were deceiving me, this was not the case. Sure, there have been heavy transactions during that time (e.g. the daily backup of the database, the daily inserts into other tables, which take a long time, and a few selects which I haven't been able to find an optimal index for). But this is the query I use to see these processes (run from a superuser):
>
> SELECT usename, procpid, query_start, client_addr, client_port, current_query, waiting
> FROM pg_stat_activity
> WHERE query_start < now() - interval '3 seconds'
> AND xact_start IS NOT NULL
> ORDER BY xact_start;
>
> Any long transactions should be caught by it, but most of the time, all I see are vacuum workers.

Well, what's your I/O on the disk? Have you tuned vacuum? Maybe you're just saturating the ability of the table to be vacuumed, or else vacuum is being told to back off?

> Yes, I do delete many tuples from that table. My mode of usage is like this: I have a small table called billing which receives new data every night. I want to keep that table small so that those nightly updates don't take an overly long time, because all data (several such tables) has to be ready in the database by the next morning. Therefore, once a week on the weekend, I move a week's worth of data to billing__archive (the table we are discussing), and delete a week's worth from its end. Now, the indexes on that table would make this impossible to do within the weekend, so what I do is drop all the indexes before I do the inserts, then recreate them, and then do the deletes.

Without looking at the details of your database, I have to say that the above sounds to me like more work than letting the system handle this itself. I have a suspicion that what you really want to do is trickle out the changes rather than doing things in big batches this way.

> If a vacuum takes me several days (let alone over a week!) then a VACUUM FULL is out of the question. VACUUM FULL locks the table completely, and that table is essential to our customer care. If push comes to shove, I think I'd rather dump that table, drop it, and restore it over the weekend, which I believe will be faster than a VACUUM FULL.

Yes, I think so too. And given the current state of affairs, that's a good bet. Whatever the situation, I suspect things are too far gone to be worth trying to get through a vacuum.

> One other important question: a tuple marked by VACUUM as reusable (not VACUUM FULL, which returns the space to the operating system) - can its space ever be used by another table, or can it only be used for new inserts into the same table?

It's managed by postgres, but given your churn rate on these tables I'd be tempted to set a fillfactor with a lot of room, and let the tables be "big" (i.e. with a lot of empty space) so that their long-term storage footprint is stable.

A

--
Andrew Sullivan
ajs@crankycanuck.ca
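A sketch of that fillfactor suggestion (the value 70 is illustrative; it only affects newly written pages, so it pays off after the next full rewrite of the table):

```sql
-- Leave ~30% free space in each new page so updates can reuse it
-- in place and the table's on-disk footprint stays stable:
ALTER TABLE billing__archive SET (fillfactor = 70);
```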
OK, so you're pre-8.4, which means you have the max_fsm settings to play with. Have you seen any messages in the logs about the free space map (fsm)? If your install didn't have a big enough fsm to keep track of deleted tuples, you'd face massive table bloat that a regular vacuum couldn't fix.
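A postgresql.conf sketch for 8.3 (these parameters were removed in 8.4, which manages free space automatically; the values are illustrative - the tail of a database-wide VACUUM VERBOSE reports how many pages and relations the FSM actually needs):

```
# postgresql.conf (8.3) - both settings allocate shared memory,
# so changing them requires a server restart.
max_fsm_pages     = 2000000   # must cover all dead pages across the cluster
max_fsm_relations = 2000
```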
You also don't have the visibility map, which means that (auto)vacuum can't skip bits of the tables it knows don't need vacuuming. Your vacuums will be slower.
Autovacuum improved significantly in both 8.4 and 9.0; consider an upgrade.
On 01/31/2011 12:14 AM, Herouth Maoz wrote:

> On 30/01/2011, at 12:27, Craig Ringer wrote:
>> OK, so you're pre-8.4, which means you have the max_fsm settings to play with. Have you seen any messages in the logs about the free space map (fsm)? If your install didn't have a big enough fsm to keep track of deleted tuples, you'd face massive table bloat that a regular vacuum couldn't fix.
>
> Ouch. You're absolutely right. There are messages about max_fsm_pages in the postgres log. It's currently set to 153600. According to the documentation, I can increase it up to 200000. Will that even help? How do I find out how many I need to set it to?

I think the logs suggest what to set. I haven't used 8.3 in ages and don't remember well. Increasing it won't help after the fact, though. You almost certainly have badly bloated tables, and fixing that will be interesting in your current low-disk-space situation.

VACUUM FULL would work - but it will exclusively lock the table being vacuumed for *ages*, so nothing else can do any work, not even reads. CLUSTER will do the same, and while it's much faster, it requires enough free disk space to store a complete copy of the still-valid parts of the table while the bloated original is still on disk. You may have to look into some of the lockless fake-vacuum-full approaches.

I think table bloat identification and management is one of the worst problems PostgreSQL has remaining. It's too hard, out of the box, to discover bloat developing, and it's too disruptive to fix it if and when it does happen. The automatic free space map management in 8.4, and the ongoing autovacuum improvements, help reduce the chances of bloat happening, but it's still a pain to monitor for and a pain to fix when it does happen.
For approaches to possibly fixing your problem, see:

http://www.depesz.com/index.php/2010/10/17/reduce-bloat-of-table-without-longexclusive-locks/
http://blog.endpoint.com/2010/09/reducing-bloat-without-locking.html

--
Craig Ringer
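A sketch of the CLUSTER route mentioned above (8.3 syntax; the index name is illustrative - any index on the table works, and remember it needs free disk space for a full copy of the live rows while it runs):

```sql
-- Rewrites the table in index order, leaving the bloat behind;
-- takes an exclusive lock for the duration.
CLUSTER billing__archive USING billing__archive_pkey;
-- CLUSTER doesn't update planner statistics, so follow up with:
ANALYZE billing__archive;
```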
On 30 Jan 2011, at 17:14, Herouth Maoz wrote:

> On 30/01/2011, at 12:27, Craig Ringer wrote:
>> OK, so you're pre-8.4, which means you have the max_fsm settings to play with. Have you seen any messages in the logs about the free space map (fsm)? If your install didn't have a big enough fsm to keep track of deleted tuples, you'd face massive table bloat that a regular vacuum couldn't fix.
>
> Ouch. You're absolutely right. There are messages about max_fsm_pages in the postgres log. It's currently set to 153600. According to the documentation, I can increase it up to 200000. Will that even help? How do I find out how many I need to set it to?
>
>> You also don't have the visibility map, which means that (auto)vacuum can't skip bits of the tables it knows don't need vacuuming. Your vacuums will be slower.
>>
>> Autovacuum improved significantly in both 8.4 and 9.0; consider an upgrade.
>
> I will consider it. Thank you.

I'm not trying to push an upgrade on you, but if restoring a dump is one of your better options to get rid of table bloat anyway, now seems a good time. What I'd do is:

- Find a test system with enough disk space
- Install PG 8.4 on it
- Create a dump of your database, using this 8.4 version of pg_dump
- Restore it to the test system and check everything's alright
- If it is, pull the plug on that 8.3 database - you'll probably want to create a fresh dump here if the system is still in production
- Install an 8.4 and use the previously made dump to restore it
- Don't forget to run ANALYZE right after if it's going to be used right away

You will want to check the release notes. One of the things that sometimes bites people is that 8.4 has tighter type-conversion restrictions. Some implicit type-casts that used to work don't anymore, unless made explicit.

Alban Hertroys

--
If you can't see the forest for the trees, cut the trees and you'll see there is no forest.
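The dump/restore steps above, sketched as shell commands - host names, install paths and the database name are illustrative; the key point is that it's the *8.4* pg_dump talking to the 8.3 server:

```shell
# Dump the 8.3 database using the 8.4 client tools, custom format:
/usr/lib/postgresql/8.4/bin/pg_dump -h oldhost -Fc reports -f reports.dump

# Create the target database on the 8.4 server and restore into it:
createdb -h newhost reports
/usr/lib/postgresql/8.4/bin/pg_restore -h newhost -d reports reports.dump

# Refresh planner statistics before letting users back in:
psql -h newhost -d reports -c 'ANALYZE;'
```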