Thread: Postgres INSERT performance and scalability
Let's say that I want to do INSERT SELECT of 1,000 items into a table. The table has some ints, varchars, TEXT and BLOB fields.
Would the time that it takes, differ a great deal, depending on whether the table has only 100,000 or 5,000,000 records?
Thanks
i
On Mon, Sep 19, 2011 at 4:11 PM, Igor Chudov <ichudov@gmail.com> wrote:
> Let's say that I want to do INSERT SELECT of 1,000 items into a table. The
> table has some ints, varchars, TEXT and BLOB fields.
> Would the time that it takes, differ a great deal, depending on whether the
> table has only 100,000 or 5,000,000 records?

Depends. Got any indexes? The more indexes you have to update the
slower it's gonna be. You can test this, it's easy to create test
rows like so:

insert into test select generate_series(1,10000);

etc. There's lots of examples floating around on how to do that. So,
make yourself a couple of tables and test it.
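[Editor's note: the generate_series test above can be fleshed out into a self-contained comparison matching the original question's sizes. This is only a sketch; the table and column names are made up for illustration.]

```sql
-- Hypothetical test tables; names and column types are illustrative only.
CREATE TABLE small_tbl (id int PRIMARY KEY, note text);
CREATE TABLE big_tbl   (id int PRIMARY KEY, note text);

-- Pre-populate one table with far more rows than the other.
INSERT INTO small_tbl SELECT g, 'row ' || g FROM generate_series(1, 100000) g;
INSERT INTO big_tbl   SELECT g, 'row ' || g FROM generate_series(1, 5000000) g;

-- With \timing on in psql, compare the same 1,000-row bulk insert
-- into the small table and into the large one.
INSERT INTO small_tbl SELECT g, 'new ' || g FROM generate_series(100001, 101000) g;
INSERT INTO big_tbl   SELECT g, 'new ' || g FROM generate_series(5000001, 5001000) g;
```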
Igor,

* Igor Chudov (ichudov@gmail.com) wrote:
> Would the time that it takes, differ a great deal, depending on whether the
> table has only 100,000 or 5,000,000 records?

Yes, because PostgreSQL is going to copy the data. If you don't need or
want it to be copied, just use a view. I've never heard of any
relational database implementing 'copy on write' type semantics, if
that's what you're asking about. Databases, unlike applications with
code in memory that's constantly copied, are typically focused around
minimizing duplication of data (since it all has to end up on disk at
some point). Not much point in having the overhead of COW for that kind
of environment, I wouldn't think.

	Thanks,

		Stephen
On Mon, Sep 19, 2011 at 6:32 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Mon, Sep 19, 2011 at 4:11 PM, Igor Chudov <ichudov@gmail.com> wrote:
>> Let's say that I want to do INSERT SELECT of 1,000 items into a table. The
>> table has some ints, varchars, TEXT and BLOB fields.
>> Would the time that it takes, differ a great deal, depending on whether the
>> table has only 100,000 or 5,000,000 records?
>
> Depends. Got any indexes? The more indexes you have to update the
> slower it's gonna be. You can test this, it's easy to create test
> rows like so:
>
> insert into test select generate_series(1,10000);
>
> etc. There's lots of examples floating around on how to do that. So,
> make yourself a couple of tables and test it.
Well, my question is, rather, whether the time to do a bulk INSERT of N records into a large table, would take substantially longer than a bulk insert of N records into a small table. In other words, does the populating time grow as the table gets more and more rows?
i
* Igor Chudov (ichudov@gmail.com) wrote:
> Well, my question is, rather, whether the time to do a bulk INSERT of N
> records into a large table, would take substantially longer than a bulk
> insert of N records into a small table. In other words, does the populating
> time grow as the table gets more and more rows?

Oh, in that regard, the answer would generally be 'no'. PostgreSQL
maintains a structure known as the 'free space map', where it keeps
track of where there is free space to insert data into a table. As
someone else mentioned, if there are a lot of indexes then it's possible
that the increased depth in the index due to the larger number of tuples
might mean the larger table is slower, but I don't think it'd make a
huge difference, to be honest... Are you seeing that behavior? There's
nothing like testing it to see exactly what happens, of course..

	Thanks,

		Stephen
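[Editor's note: the index effect mentioned above is easy to demonstrate directly. A sketch, with invented table names; compare the two bulk loads with \timing on in psql.]

```sql
-- Two identical tables, one with secondary indexes, one without.
CREATE TABLE t_noidx (a int, b text, c timestamptz);
CREATE TABLE t_idx   (a int, b text, c timestamptz);
CREATE INDEX ON t_idx (a);
CREATE INDEX ON t_idx (b);
CREATE INDEX ON t_idx (c);

-- The indexed table must update three btrees per row on top of the
-- heap insert, so the second load should be measurably slower.
INSERT INTO t_noidx SELECT g, md5(g::text), now() FROM generate_series(1, 100000) g;
INSERT INTO t_idx   SELECT g, md5(g::text), now() FROM generate_series(1, 100000) g;
```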
On Mon, Sep 19, 2011 at 7:53 PM, Stephen Frost <sfrost@snowman.net> wrote:
> Igor,
>
> * Igor Chudov (ichudov@gmail.com) wrote:
>> Would the time that it takes, differ a great deal, depending on whether the
>> table has only 100,000 or 5,000,000 records?
>
> Yes, because PostgreSQL is going to copy the data. If you don't need or
> want it to be copied, just use a view. I've never heard of any
> relational database implementing 'copy on write' type semantics, if
> that's what you're asking about. Databases, unlike applications with
> code in memory that's constantly copied, are typically focused around
> minimizing duplication of data (since it all has to end up on disk at
> some point). Not much point in having the overhead of COW for that kind
> of environment, I wouldn't think.

Isn't the WAL basically COW?

--
Jon
* Jon Nelson (jnelson+pgsql@jamponi.net) wrote:
> Isn't the WAL basically COW?

eh..? No.. The WAL is used to record what changes are made to the
various files in the database; it certainly isn't any kind of
"copy-on-write" system, where we wait until a change is made to data
before copying it. If you INSERT .. SELECT, you're going to get the
real data in the WAL, and also in the heap of the new table.

	Thanks,

		Stephen
On 09/20/2011 09:21 AM, Jon Nelson wrote:
> Isn't the WAL basically COW?

Nope, it's a lot more like a filesystem journal - though it includes all
data, not just metadata like filesystem journals usually do.

Now, you could argue that PostgreSQL uses a copy-on-write-like system to
maintain row versions, so that already-running statements (or
SERIALIZABLE transactions) don't see data from the future, and to
maintain rollback data for uncommitted transactions. It's the *new*
data that gets written to the WAL, though, not the old data. (OK, if
you have full_page_writes enabled you might get a mix of old and new
data written to WAL, but that's an implementation detail.)

--
Craig Ringer
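[Editor's note: the row-versioning behavior described above can be observed via the hidden xmin system column, which records the inserting transaction's id. A sketch; the table name is made up.]

```sql
-- Each UPDATE writes a *new* row version; the old one remains in the
-- heap (invisible to new snapshots) until VACUUM reclaims it.
CREATE TABLE mvcc_demo (id int, val text);
INSERT INTO mvcc_demo VALUES (1, 'original');

SELECT xmin, * FROM mvcc_demo;   -- note the inserting transaction id

UPDATE mvcc_demo SET val = 'changed' WHERE id = 1;

SELECT xmin, * FROM mvcc_demo;   -- xmin differs: this is a new row version
```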