Thread: Big copy slowdown
I'm not sure if this is the right forum for this question, so I'll start here and escalate as necessary. One of the things I do a lot is copy large tables from one database to another. I'd started life using pgdump, but this requires I save the whole table into an intermediate file (or play games with pipes). Rather than doing either, I instead wrote a quick script which creates a cursor at the source, and starts schlepping data from one to other. For "small" tables (say, a million rows or less), this works just fine, but when I try to copy a bigger table (tens of millions of rows), the performance continually drops off, with the copy becoming slower and slower. On 8.3-beta1, I see the memory utilization skyrocketing, specifically the non-shared residential space of the postmaster daemon, driving the whole system deep into swap (sooner or later), and bringing things to a screeching halt. But on 8.1.4, I don't see memory utilization problem- the memory utilization of a daemon pegs out at just a little larger than the shared buffers size (i.e. what I'd expect)- but I still see the slowdown. It is definately the copy that is the problem- I've tried a variation of my script with the copy commented out (just throwing the data away) to test if it's the cursor- and I get 20K+ rows/second.from the cursor. I've also tried breaking the copy up- finishing the current copy command and starting a new one every million rows or so, and that doesn't help. I've also fiddled with transactions, and rate-limiting the copy, and neither of those help. Is this a bug in postgres? If not, as I'm assume, what should I be doing to make this work fast? Brian
Brian Hurt <bhurt@janestcapital.com> writes: > Is this a bug in postgres? Well, if you'd provide enough info for someone else to reproduce it, we could have a look. regards, tom lane
Tom Lane wrote:
Hello- I just wanted to close the book on this problem. I first noticed the problem when long copies kept slowing down. It turns out it wasn't a problem with Postgresql at all, as I recreated it with bonnie++ (which explains the long silence on this issue). After doing a lot of sustained I/O, we see I/O wait times climb until we're getting virtually no I/O performance at all, and it doesn't matter if it's Postgresql or bonnie++ doing the I/O. iostat would report that we'd be doing only 2MB/sec, and we'd be seeing 90%+ iowait percentages in atop or top. So our solution is to fix out I/O subsystem. We're replacing the expensive TLA SAN storage device with a low end Fibre Channel raid (which we were going to do anyways, as even at it's best, the TLA SAN wasn't impressive), upgrading from an old 2.4 linux kernel to a newer 2.6 kernel, changing the I/O Scheduler to deadline, and upgrading the server hardware. Some combination of the above solves the problem.
Hopefully no one has been lying awake worrying about this problem :-), but I wanted to close it out, and to leave a permanent record in the archives for the next person who has this problem.
Brian
Brian Hurt <bhurt@janestcapital.com> writes:Is this a bug in postgres?Well, if you'd provide enough info for someone else to reproduce it, we could have a look.
Hello- I just wanted to close the book on this problem. I first noticed the problem when long copies kept slowing down. It turns out it wasn't a problem with Postgresql at all, as I recreated it with bonnie++ (which explains the long silence on this issue). After doing a lot of sustained I/O, we see I/O wait times climb until we're getting virtually no I/O performance at all, and it doesn't matter if it's Postgresql or bonnie++ doing the I/O. iostat would report that we'd be doing only 2MB/sec, and we'd be seeing 90%+ iowait percentages in atop or top. So our solution is to fix out I/O subsystem. We're replacing the expensive TLA SAN storage device with a low end Fibre Channel raid (which we were going to do anyways, as even at it's best, the TLA SAN wasn't impressive), upgrading from an old 2.4 linux kernel to a newer 2.6 kernel, changing the I/O Scheduler to deadline, and upgrading the server hardware. Some combination of the above solves the problem.
Hopefully no one has been lying awake worrying about this problem :-), but I wanted to close it out, and to leave a permanent record in the archives for the next person who has this problem.
Brian