Thread: Postgres INSERT performance and scalability

Postgres INSERT performance and scalability

From
Igor Chudov
Date:
Let's say that I want to do INSERT SELECT of 1,000 items into a table. The table has some ints, varchars, TEXT and BLOB fields. 

Would the time that it takes, differ a great deal, depending on whether the table has only 100,000 or 5,000,000 records?

Thanks

i

Re: Postgres INSERT performance and scalability

From
Scott Marlowe
Date:
On Mon, Sep 19, 2011 at 4:11 PM, Igor Chudov <ichudov@gmail.com> wrote:
> Let's say that I want to do INSERT SELECT of 1,000 items into a table. The
> table has some ints, varchars, TEXT and BLOB fields.
> Would the time that it takes, differ a great deal, depending on whether the
> table has only 100,000 or 5,000,000 records?

Depends.  Got any indexes?  The more indexes you have to update the
slower it's gonna be.  You can test this, it's easy to create test
rows like so:

insert into test select generate_series(1,10000);

etc.  There's lots of examples floating around on how to do that.  So,
make yourself a couple of tables and test it.
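A sketch of such a test (table and column names here are made up for illustration) might look like this in psql:

```sql
-- Two identically-structured tables, one small and one large.
CREATE TABLE test_small (id int, val text);
CREATE TABLE test_big   (id int, val text);

INSERT INTO test_small SELECT g, 'row ' || g FROM generate_series(1, 100000) g;
INSERT INTO test_big   SELECT g, 'row ' || g FROM generate_series(1, 5000000) g;

-- Optional: add indexes to see their effect on insert speed.
CREATE INDEX ON test_small (id);
CREATE INDEX ON test_big (id);

-- \timing makes psql report elapsed time for each statement.
\timing on
INSERT INTO test_small SELECT g, 'x' FROM generate_series(1, 1000) g;
INSERT INTO test_big   SELECT g, 'x' FROM generate_series(1, 1000) g;
```

Comparing the last two timings, with and without the indexes, shows how much of the cost comes from index maintenance versus table size.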

Re: Postgres INSERT performance and scalability

From
Stephen Frost
Date:
Igor,

* Igor Chudov (ichudov@gmail.com) wrote:
> Would the time that it takes, differ a great deal, depending on whether the
> table has only 100,000 or 5,000,000 records?

Yes, because PostgreSQL is going to copy the data.  If you don't need or
want it to be copied, just use a view.  I've never heard of any
relational database implementing 'copy on write' type semantics, if
that's what you're asking about.  Databases, unlike applications with
code in memory that's constantly copied, are typically focused around
minimizing duplication of data (since it all has to end up on disk at
some point).  Not much point in having the overhead of COW for that kind
of environment, I wouldn't think.
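For instance, a view exposes the same rows without duplicating any data (the table and column names below are purely illustrative):

```sql
-- A view stores no data of its own; queries against it
-- read the underlying table at execution time.
CREATE VIEW recent_items AS
    SELECT * FROM items
    WHERE created_at > now() - interval '30 days';
```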

    Thanks,

        Stephen


Re: Postgres INSERT performance and scalability

From
Igor Chudov
Date:


On Mon, Sep 19, 2011 at 6:32 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Mon, Sep 19, 2011 at 4:11 PM, Igor Chudov <ichudov@gmail.com> wrote:
> Let's say that I want to do INSERT SELECT of 1,000 items into a table. The
> table has some ints, varchars, TEXT and BLOB fields.
> Would the time that it takes, differ a great deal, depending on whether the
> table has only 100,000 or 5,000,000 records?

Depends.  Got any indexes?  The more indexes you have to update the
slower it's gonna be.  You can test this, it's easy to create test
rows like so:

insert into test select generate_series(1,10000);

etc.  There's lots of examples floating around on how to do that.  So,
make yourself a couple of tables and test it.

Well, my question is, rather, whether the time to do a bulk INSERT of N records into a large table would be substantially longer than a bulk insert of N records into a small table. In other words, does the populating time grow as the table gets more and more rows?

i

Re: Postgres INSERT performance and scalability

From
Stephen Frost
Date:
* Igor Chudov (ichudov@gmail.com) wrote:
> Well, my question is, rather, whether the time to do a bulk INSERT of N
> records into a large table, would take substantially longer than a bulk
> insert of N records into a small table. In other words, does the populating
> time grow as the table gets more and more rows?

Oh, in that regard, the answer would generally be 'no'.  PostgreSQL
maintains a table known as the 'free space map', where it keeps track of
where there is 'free space' to insert data into a table.  As someone
else mentioned, if there's a lot of indexes then it's possible that the
increased depth in the index due to the larger number of tuples might
mean the larger table is slower, but I don't think it'd make a huge
difference, to be honest...

Are you seeing that behavior?  There's nothing like testing it to see
exactly what happens, of course..
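One self-contained way to test it in psql (table and column names invented for the sketch) is to time the same small insert before and after growing the table:

```sql
CREATE TABLE t (id int, payload text);
CREATE INDEX ON t (id);

\timing on
-- Insert 1,000 rows while the table is small...
INSERT INTO t SELECT g, repeat('x', 100) FROM generate_series(1, 1000) g;

-- ...grow the table to a few million rows...
INSERT INTO t SELECT g, repeat('x', 100) FROM generate_series(1, 5000000) g;

-- ...then insert another 1,000 rows and compare the two timings.
INSERT INTO t SELECT g, repeat('x', 100) FROM generate_series(1, 1000) g;
```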

    Thanks,

        Stephen


Re: Postgres INSERT performance and scalability

From
Jon Nelson
Date:
On Mon, Sep 19, 2011 at 7:53 PM, Stephen Frost <sfrost@snowman.net> wrote:
> Igor,
>
> * Igor Chudov (ichudov@gmail.com) wrote:
>> Would the time that it takes, differ a great deal, depending on whether the
>> table has only 100,000 or 5,000,000 records?
>
> Yes, because PostgreSQL is going to copy the data.  If you don't need or
> want it to be copied, just use a view.  I've never heard of any
> relational database implementing 'copy on write' type semantics, if
> that's what you're asking about.  Databases, unlike applications with
> code in memory that's constantly copied, are typically focused around
> minimizing duplication of data (since it all has to end up on disk at
> some point).  Not much point in having the overhead of COW for that kind
> of environment, I wouldn't think.

Isn't the WAL basically COW?

--
Jon

Re: Postgres INSERT performance and scalability

From
Stephen Frost
Date:
* Jon Nelson (jnelson+pgsql@jamponi.net) wrote:
> Isn't the WAL basically COW?

eh..?  No..  The WAL is used to record what changes are made to the
various files in the database; it certainly isn't any kind of
"copy-on-write" system, where we wait until a change is made to data
before copying it..

If you INSERT .. SELECT, you're going to get the real data in the WAL,
and also in the heap of the new table..

    Thanks,

        Stephen


Re: Postgres INSERT performance and scalability

From
Craig Ringer
Date:
On 09/20/2011 09:21 AM, Jon Nelson wrote:
> Isn't the WAL basically COW?

Nope, it's a lot more like a filesystem journal - though it includes all
data, not just metadata like filesystem journals usually do.

Now, you could argue that PostgreSQL uses a copy-on-write like system to
maintain row versions so that already-running statements (or
SERIALIZABLE transactions) don't see data from the future and to
maintain rollback data for uncommitted transactions. It's the *new* data
that gets written to the WAL, though, not the old data.

(OK, if you have full_page_writes enabled you might get a mix of old and
new data written to WAL, but that's an implementation detail).

--
Craig Ringer