Thread: postgres 8.4, COPY, and high concurrency
I was working on a data warehousing project where a fair number of files could be COPY'd more or less directly into tables. I have a somewhat nice machine to work with, and I ran on 75% of the cores I have (75% of 32 is 24).
Performance was pretty bad. With 24 processes going, each backend (in COPY) spent 98% of it's time in semop (as identified by strace). I tried larger and smaller shared buffers, all sorts of other tweaks, until I tried reducing the number of concurrent processes from 24 to 4.
Disk I/O went up (on average) at least 10X and strace reports that the top system calls are write (61%), recvfrom (25%), and lseek (14%) - pretty reasonable IMO.
Given that each COPY is into it's own, newly-made table with no indices or foreign keys, etc, I would have expected the interaction among the backends to be minimal, but that doesn't appear to be the case. What is the likely cause of the semops?
I can't really try a newer version of postgres at this time (perhaps soon).
I'm using PG 8.4.13 on ScientificLinux 6.2 (x86_64), and the CPU is a 32 core Xeon E5-2680 @ 2.7 GHz.
--
Jon
Performance was pretty bad. With 24 processes going, each backend (in COPY) spent 98% of it's time in semop (as identified by strace). I tried larger and smaller shared buffers, all sorts of other tweaks, until I tried reducing the number of concurrent processes from 24 to 4.
Disk I/O went up (on average) at least 10X and strace reports that the top system calls are write (61%), recvfrom (25%), and lseek (14%) - pretty reasonable IMO.
Given that each COPY is into it's own, newly-made table with no indices or foreign keys, etc, I would have expected the interaction among the backends to be minimal, but that doesn't appear to be the case. What is the likely cause of the semops?
I can't really try a newer version of postgres at this time (perhaps soon).
I'm using PG 8.4.13 on ScientificLinux 6.2 (x86_64), and the CPU is a 32 core Xeon E5-2680 @ 2.7 GHz.
--
Jon
On 13.11.2012 21:13, Jon Nelson wrote: > I was working on a data warehousing project where a fair number of files > could be COPY'd more or less directly into tables. I have a somewhat nice > machine to work with, and I ran on 75% of the cores I have (75% of 32 is > 24). > > Performance was pretty bad. With 24 processes going, each backend (in COPY) > spent 98% of it's time in semop (as identified by strace). I tried larger > and smaller shared buffers, all sorts of other tweaks, until I tried > reducing the number of concurrent processes from 24 to 4. > > Disk I/O went up (on average) at least 10X and strace reports that the top > system calls are write (61%), recvfrom (25%), and lseek (14%) - pretty > reasonable IMO. > > Given that each COPY is into it's own, newly-made table with no indices or > foreign keys, etc, I would have expected the interaction among the backends > to be minimal, but that doesn't appear to be the case. What is the likely > cause of the semops? I'd guess it's lock contention on WALInsertLock. That means, the system is experiencing lock contention on generating WAL records for the insertions. If that theory is correct, you ought to get a big gain if you have wal_level=minimal, and you create or truncate the table in the same transaction with the COPY. That allows the system to skip WAL-logging the COPY. Or you could upgrade to 9.2. The WAL-logging of bulk COPY was optimized in 9.2, it should help precisely the scenario you're facing. - Heikki
On Tue, Nov 13, 2012 at 11:13 AM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote: > I was working on a data warehousing project where a fair number of files > could be COPY'd more or less directly into tables. I have a somewhat nice > machine to work with, and I ran on 75% of the cores I have (75% of 32 is > 24). > > Performance was pretty bad. With 24 processes going, each backend (in COPY) > spent 98% of it's time in semop (as identified by strace). They are probably fighting over the right to insert records into the WAL stream. This has been improved in 9.2 > Given that each COPY is into it's own, newly-made table with no indices or > foreign keys, etc, I would have expected the interaction among the backends > to be minimal, but that doesn't appear to be the case. On newer versions if you set wal_level to minimal and archive_mode to off, then these operations would bypass WAL entirely. I can't figure out if there is a corresponding optimization in 8.4, though. Cheers, Jeff
On Tue, Nov 13, 2012 at 1:27 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
wal_level doesn't exist for 8.4, but I have archive_mode = "off" and I am creating the table in the same transaction as the COPY.
Unfortunately, that's what I was expecting.
On 13.11.2012 21:13, Jon Nelson wrote:I'd guess it's lock contention on WALInsertLock. That means, the system is experiencing lock contention on generating WAL records for the insertions. If that theory is correct, you ought to get a big gain if you have wal_level=minimal, and you create or truncate the table in the same transaction with the COPY. That allows the system to skip WAL-logging the COPY.I was working on a data warehousing project where a fair number of files
could be COPY'd more or less directly into tables. I have a somewhat nice
machine to work with, and I ran on 75% of the cores I have (75% of 32 is
24).
Performance was pretty bad. With 24 processes going, each backend (in COPY)
spent 98% of it's time in semop (as identified by strace). I tried larger
and smaller shared buffers, all sorts of other tweaks, until I tried
reducing the number of concurrent processes from 24 to 4.
Disk I/O went up (on average) at least 10X and strace reports that the top
system calls are write (61%), recvfrom (25%), and lseek (14%) - pretty
reasonable IMO.
Given that each COPY is into it's own, newly-made table with no indices or
foreign keys, etc, I would have expected the interaction among the backends
to be minimal, but that doesn't appear to be the case. What is the likely
cause of the semops?
wal_level doesn't exist for 8.4, but I have archive_mode = "off" and I am creating the table in the same transaction as the COPY.
Or you could upgrade to 9.2. The WAL-logging of bulk COPY was optimized in 9.2, it should help precisely the scenario you're facing.
Unfortunately, that's what I was expecting.
--
Jon
On Tue, Nov 13, 2012 at 12:03 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote: > On Tue, Nov 13, 2012 at 1:27 PM, Heikki Linnakangas > <hlinnakangas@vmware.com> wrote: >> >> On 13.11.2012 21:13, Jon Nelson wrote: >>> >> >> I'd guess it's lock contention on WALInsertLock. That means, the system is >> experiencing lock contention on generating WAL records for the insertions. >> If that theory is correct, you ought to get a big gain if you have >> wal_level=minimal, and you create or truncate the table in the same >> transaction with the COPY. That allows the system to skip WAL-logging the >> COPY. > > > wal_level doesn't exist for 8.4, but I have archive_mode = "off" and I am > creating the table in the same transaction as the COPY. That should work to bypass WAL. Can you directly verify whether you are generating lots of WAL (look at the churn in pg_xlog) during those loads? Maybe your contention is someplace else. Since they must all be using different tables, I don't think it would be the relation extension lock. Maybe buffer mapping lock or freelist lock? Cheers, Jeff
On Tue, Nov 13, 2012 at 2:43 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > On Tue, Nov 13, 2012 at 12:03 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote: >> On Tue, Nov 13, 2012 at 1:27 PM, Heikki Linnakangas >> <hlinnakangas@vmware.com> wrote: >>> >>> On 13.11.2012 21:13, Jon Nelson wrote: >>>> >>> >>> I'd guess it's lock contention on WALInsertLock. That means, the system is >>> experiencing lock contention on generating WAL records for the insertions. >>> If that theory is correct, you ought to get a big gain if you have >>> wal_level=minimal, and you create or truncate the table in the same >>> transaction with the COPY. That allows the system to skip WAL-logging the >>> COPY. >> >> >> wal_level doesn't exist for 8.4, but I have archive_mode = "off" and I am >> creating the table in the same transaction as the COPY. > > > That should work to bypass WAL. Can you directly verify whether you > are generating lots of WAL (look at the churn in pg_xlog) during those > loads? > > Maybe your contention is someplace else. Since they must all be using > different tables, I don't think it would be the relation extension > lock. Maybe buffer mapping lock or freelist lock? I had moved on to a different approach to importing the data which does not work concurrently. However, I went back and tried to re-create the situation and - at least a naive attempt failed. I'll give it a few more tries -- I was creating two tables using CREATE TABLE <unique name> LIKE (some other table INCLUDING <everything>). Then I would copy the data in, add some constraints (FK constraints but only within these two tables) and then finally (for each table) issue an ALTER TABLE <unique name> INHERIT <some other table>. To be clear, however, everything bogged down in the COPY stage which was immediately following the table creation. I'll note that my naive test showed almost no unexpected overhead at all, so it's clearly not representative of the problem I encountered. -- Jon
On Tue, Nov 13, 2012 at 7:10 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote: > I had moved on to a different approach to importing the data which > does not work concurrently. However, I went back and tried to > re-create the situation and - at least a naive attempt failed. I'll > give it a few more tries -- I was creating two tables using CREATE > TABLE <unique name> LIKE (some other table INCLUDING <everything>). > Then I would copy the data in, add some constraints (FK constraints > but only within these two tables) and then finally (for each table) > issue an ALTER TABLE <unique name> INHERIT <some other table>. To be > clear, however, everything bogged down in the COPY stage which was > immediately following the table creation. > > I'll note that my naive test showed almost no unexpected overhead at > all, so it's clearly not representative of the problem I encountered. I'm still unable to replicate the problem, but I can show I wasn't crazy, either. The average time to perform one of these COPY operations when things are working is in the 15-45 second range. I had configured PG to log any statement that look longer than 3 seconds, so I got a bunch of those in the logs. I have since reconfigured to log *everything*. Anyway, when things were going badly, COPY would take anywhere from 814 seconds to just under 1400 seconds for the exact same files. UPDATE: I have been able to replicate the issue. The parent table (the one referenced in the LIKE portion of the CREATE TABLE statement) had three indices. Now that I've been able to replicate the issue, are there tests that I can perform that would be useful to people? I will also try to build a stand-alone test. -- Jon
On Wed, Nov 14, 2012 at 6:41 AM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote: > > UPDATE: I have been able to replicate the issue. The parent table (the > one referenced in the LIKE portion of the CREATE TABLE statement) had > three indices. > > Now that I've been able to replicate the issue, are there tests that I > can perform that would be useful to people? > I will also try to build a stand-alone test. While the WAL is suppressed for the table inserts, it is not suppressed for the index inserts, and the index WAL traffic is enough to lead to contention. I don't know why that is the case, it seems like the same method that allows us to bypass WAL for the table would work for the indices as well. Maybe it is just that no one bothered to implement it. After all, building the index after the copy will be even more efficient than building it before but by-passing WAL. But it does seem like the docs could at least be clarified here. Cheers, Jeff
On Wed, Nov 14, 2012 at 1:01 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > On Wed, Nov 14, 2012 at 6:41 AM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote: >> >> UPDATE: I have been able to replicate the issue. The parent table (the >> one referenced in the LIKE portion of the CREATE TABLE statement) had >> three indices. >> >> Now that I've been able to replicate the issue, are there tests that I >> can perform that would be useful to people? >> I will also try to build a stand-alone test. > > While the WAL is suppressed for the table inserts, it is not > suppressed for the index inserts, and the index WAL traffic is enough > to lead to contention. Aha! > I don't know why that is the case, it seems like the same method that > allows us to bypass WAL for the table would work for the indices as > well. Maybe it is just that no one bothered to implement it. After > all, building the index after the copy will be even more efficient > than building it before but by-passing WAL. > But it does seem like the docs could at least be clarified here. In general, then, would it be safe to say that concurrent (parallel) index creation may be a source of significant WAL contention? I was planning on taking advantage of this due to modern, beefy boxes with 10's of CPUs all just sitting there bored. -- Jon
On Wed, Nov 14, 2012 at 12:04 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote: > On Wed, Nov 14, 2012 at 1:01 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >> >> While the WAL is suppressed for the table inserts, it is not >> suppressed for the index inserts, and the index WAL traffic is enough >> to lead to contention. > > Aha! > >> I don't know why that is the case, it seems like the same method that >> allows us to bypass WAL for the table would work for the indices as >> well. Maybe it is just that no one bothered to implement it. After >> all, building the index after the copy will be even more efficient >> than building it before but by-passing WAL. > >> But it does seem like the docs could at least be clarified here. > > In general, then, would it be safe to say that concurrent (parallel) > index creation may be a source of significant WAL contention? No, that shouldn't lead to WAL contention. The creation of an index on an already-populated table bypasses most WAL when you are not using archiving. It is the maintenance of an already existing index that generates WAL. "begin; truncate; copy; create index" generates little WAL. "begin; truncate; create index; copy" generates a lot of WAL, and is slower for other reason as well. Cheers, Jeff
If you are inserting a lot of data into the same table, table extension locks are a problem, and will be extended in only8k increments which if you have a lot of clients hitting/expanding the same table you are going to have a lot of overhead. -----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Jeff Janes Sent: Wednesday, November 14, 2012 3:26 PM To: Jon Nelson Cc: Heikki Linnakangas; pgsql-performance@postgresql.org Subject: Re: [PERFORM] postgres 8.4, COPY, and high concurrency On Wed, Nov 14, 2012 at 12:04 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote: > On Wed, Nov 14, 2012 at 1:01 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >> >> While the WAL is suppressed for the table inserts, it is not >> suppressed for the index inserts, and the index WAL traffic is enough >> to lead to contention. > > Aha! > >> I don't know why that is the case, it seems like the same method that >> allows us to bypass WAL for the table would work for the indices as >> well. Maybe it is just that no one bothered to implement it. After >> all, building the index after the copy will be even more efficient >> than building it before but by-passing WAL. > >> But it does seem like the docs could at least be clarified here. > > In general, then, would it be safe to say that concurrent (parallel) > index creation may be a source of significant WAL contention? No, that shouldn't lead to WAL contention. The creation of an index on an already-populated table bypasses most WAL whenyou are not using archiving. It is the maintenance of an already existing index that generates WAL. "begin; truncate; copy; create index" generates little WAL. "begin; truncate; create index; copy" generates a lot of WAL, and is slower for other reason as well. Cheers, Jeff -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance This email is confidential and subject to important disclaimers and conditions including on offers for the purchase or sale of securities, accuracy and completeness of information, viruses, confidentiality, legal privilege, and legal entity disclaimers, available at http://www.jpmorgan.com/pages/disclosures/email.