Thread: inserting, index and no index - speed
I just noticed that inserting 10000 tuples into an indexed table took exactly the same amount of time as inserting 10000 tuples into a non-indexed table (194 seconds). Why is this? In MySQL, the same insert into an indexed table takes about 50% longer.

Thanks

Daniel Akerud
zilch@home.se writes:
> I just noticed that inserting 10000 tuples into an indexed table took
> exactly the same amount of time as inserting 10000 tuples into a
> non-indexed table (194 seconds). Why is this? In MySQL, the same insert
> into an indexed table takes about 50% longer.

Surprises me too.  Which PG version, and what are the test conditions exactly?  (Table and index declarations; is the table empty initially; how is the backend being driven, and what commands are issued exactly?  How many shared buffers, platform, etc.)

Under PG 7.1, it's possible that your test caused no actual I/O except to the WAL log ... but I'd still think that the volume of WAL I/O would be greater when writing an index.

			regards, tom lane
The test script that sets up the tables is the following:

---

/* Cleanup */

DROP SEQUENCE index_with_id_seq;
DROP SEQUENCE index_without_id_seq;

DROP INDEX name_index;

DROP TABLE index_with;
DROP TABLE index_without;

/* Create a table with an index */

CREATE TABLE index_with (
	id SERIAL,
	name TEXT
);

CREATE INDEX name_index ON index_with(name);

/* Create a table without an index */

CREATE TABLE index_without (
	id SERIAL,
	name TEXT
);

---

This is run just before the test; then I have a little C++ program that inserts N rows into the tables and measures how long it takes.

A DELETE FROM (on both tables) followed by a VACUUM is also run before each test run (which consists of regular INSERT statements).

Am I doing anything wrong?

The postmaster (7.1.2) is run with the current Debian testing/unstable standard options.

Daniel Akerud

> Surprises me too.  Which PG version, and what are the test conditions
> exactly?  (Table and index declarations; is the table empty initially;
> how is the backend being driven, and what commands are issued exactly?
> How many shared buffers, platform, etc.)
>
> Under PG 7.1, it's possible that your test caused no actual I/O except
> to the WAL log ... but I'd still think that the volume of WAL I/O
> would be greater when writing an index.
>
> 			regards, tom lane
Yes, actually ... I forgot to say that: VACUUM & VACUUM ANALYZE are run before each test run.

Thanks

Daniel Akerud

> Did you VACUUM ANALYZE as well, after you created the tables/indexes?
I just re-ran the application to confirm that it really is like that. Using the test environment previously described, I got the following output:

Database vacuumed
pg: Trying 1000 inserts with indexing on...   Time taken: 24 seconds
pg: Trying 1000 inserts with indexing off...  Time taken: 22 seconds
Database vacuumed
pg: Trying 10000 inserts with indexing on...  Time taken: 220 seconds
pg: Trying 10000 inserts with indexing off... Time taken: 220 seconds

Daniel Akerud
zilch@home.se writes:
> then I have a little C++ program that inserts N rows into the tables and
> measures how long it takes.

Which is done how, exactly?

			regards, tom lane
> Which is done how, exactly?
>
> 			regards, tom lane

---

// The table is cleared and vacuumed first:
//   DELETE FROM index_with;
//   VACUUM
//   VACUUM ANALYZE

timeAnalyzer.startClock();
for (int i = 0; i < nrIterations; i++) {
    // data[i] is one random 8-character word
    sprintf(sqlStatement, "INSERT INTO index_with (name) VALUES ('%s')", data[i]);
    try {
        db->runCommand(sqlStatement);   // thin wrapper that sends the SQL to the backend
    } catch (Exception &e) {
        cout << "Exception caught: " << e.getException() << endl;
    }
}
timeAnalyzer.stopClock();
// output of elapsed time

---

data[i] is data created with 'pwgen 8 100000 > data'. Then the same again, but for index_without instead.

Could it be that the first 10000 rows inserted make the database so much slower that the next 10000 inserts, into a different table in the same database, also slow down?

Daniel Akerud
> Could it be that the first 10000 rows inserted make the database so much
> slower that the next 10000 inserts, into a different table in the same
> database, also slow down?

It wasn't. The database is now cleared, VACUUMed, and VACUUM ANALYZEd before both tests (the table WITH and the table WITHOUT an index). Still the same results: 195/194 seconds.

Daniel Akerud
zilch@home.se writes:
> CREATE TABLE index_with (
> 	id SERIAL,
> 	name TEXT
> );
> CREATE INDEX name_index ON index_with(name);

> CREATE TABLE index_without (
> 	id SERIAL,
> 	name TEXT
> );

Actually, what you are comparing here is a table with two indexes to a table with one index.  Moreover, both of them incur a sequence nextval() operation for each insert.  So it's not two files updated versus one, it's four versus three.

Also, given the small size of these tables, it's likely that most of the updates occur in in-memory disk buffers.  If you are running with fsync on, nearly all the actual I/O per insert will be the write and fsync of the WAL log.  The time required for that is not going to be very sensitive to the amount of data written, as long as it's much less than one disk block per transaction, which will be true in both these cases.  You end up writing one block to the log per transaction anyway.

You might try running the ten thousand inserts as a single transaction (do "begin" and "end" around them).  It'd also be educational to try it with fsync disabled, or with id declared as plain int not serial.

			regards, tom lane
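Concretely, the batched run Tom suggests just wraps the whole statement stream in one transaction. A minimal sketch against the test tables above (the literal names stand in for the generated pwgen words):

---

BEGIN;
INSERT INTO index_with (name) VALUES ('aaaaaaaa');
INSERT INTO index_with (name) VALUES ('bbbbbbbb');
-- ... the remaining inserts ...
COMMIT;

---

In the C++ driver this amounts to issuing BEGIN before the loop and COMMIT after it, so the commit overhead is paid once rather than once per row.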
Thanks Tom, really appreciate it!

Daniel Akerud
> You might try running the ten thousand inserts as a single transaction
> (do "begin" and "end" around them).

A HUGE difference (I also removed the id SERIAL column completely, keeping only name):

Database vacuumed
pg: Trying 25000 inserts on index_with...     Time taken: 12 seconds
Database vacuumed
pg: Trying 25000 inserts on index_without...  Time taken: 12 seconds  <-- might be because of fsync!? (fsync was also on for the previous run)
Database vacuumed
pg: Trying 30000 inserts on index_with...     Time taken: 15 seconds
Database vacuumed
pg: Trying 30000 inserts on index_without...  Time taken: 12 seconds
Database vacuumed
pg: Trying 35000 inserts on index_with...     Time taken: 21 seconds
Database vacuumed
pg: Trying 35000 inserts on index_without...  Time taken: 14 seconds

I can't believe what a difference that made. How can putting the inserts in a transaction make them faster? I thought that would make them slower. It's now something like a 100th of the time.

Daniel Akerud
zilch@home.se writes:
> I can't believe what a difference that made. How can putting the inserts
> in a transaction make them faster? I thought that would make them slower.

Everything is always a transaction in Postgres.  If you don't say begin/end, then there's an implicit begin and end around each individual query.  So your first set of tests was paying transaction commit overhead for each insert.

			regards, tom lane
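In other words, under the implicit-transaction rule each bare INSERT in the unbatched run behaves as if it had been written like this (a sketch of the equivalence, not anything the server literally rewrites):

---

INSERT INTO index_with (name) VALUES ('aaaaaaaa');

-- is executed as:

BEGIN;
INSERT INTO index_with (name) VALUES ('aaaaaaaa');
COMMIT;  -- per-statement commit: the WAL write and fsync are paid here

---

so the 10000-row run pays for 10000 commits unless the inserts are batched.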
>>>>> "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes: TL> Everything is always a transaction in Postgres. If you don't say TL> begin/end, then there's an implicit begin and end around each individual TL> query. So your first set of tests were paying transaction commit TL> overhead for each insert. This doesn't seem to hold exactly for INSERTs involving sequences as default values. Even if the insert fails for some other constraint, the sequence is incremented. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
On 10 Jun 2001, Vivek Khera wrote:

> This doesn't seem to hold exactly for INSERTs involving sequences as
> default values.  Even if the insert fails for some other constraint,
> the sequence is incremented.

No, that's exactly how it is supposed to work, to guarantee that you will never get the same value from two separate calls to nextval.

-alex
>>>>> "AP" == Alex Pilosov <alex@pilosoft.com> writes: TL> Everything is always a transaction in Postgres. If you don't say TL> begin/end, then there's an implicit begin and end around each individual >> >> This doesn't seem to hold exactly for INSERTs involving sequences as >> default values. Even if the insert fails for some other constraint, >> the sequence is incremented. AP> No, that's exactly how it is supposed to work, to guarantee that you will AP> never get same value from two separate calls to nextval. Even if your transaction fails? That seems to counter the definition of a transaction that aborts; the state of the database is different than before. Or am I really thinking wrongly about what an aborted transaction should leave behind?
On Sun, 10 Jun 2001, Vivek Khera wrote:

> Even if your transaction fails?  That seems to run counter to the
> definition of an aborted transaction: the state of the database is
> different from before.

Yes, except for the sequences. Consider this example: transactions A, B, C and sequence S.

in A: S.nextval = 1
in B: S.nextval = 2
in C: S.nextval = 3

Transaction B then aborts, while A and C succeed. By your logic, the next value of S should now be 2, but to keep that kind of state you would need a table listing the 'currently unused values'. That, when your sequence gets into the millions, is a colossal waste of space.

If you want "the next id not currently used in my table", use max(id)+1; if you want "give me a non-repeating number", use a sequence. There are also concurrency implications with max(id): only one transaction at a time can use it without the danger of repeating IDs.

-alex
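The non-transactional behavior is easy to see from psql. A minimal sketch (the sequence name s is made up for the example):

---

CREATE SEQUENCE s;

BEGIN;
SELECT nextval('s');   -- returns 1
ROLLBACK;

SELECT nextval('s');   -- returns 2: the increment survived the aborted transaction

---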