Thread: inserting, index and no index - speed
I just noticed that inserting 10000 tuples into an indexed table took exactly the same amount of time as inserting 10000 tuples into a non-indexed table (194 seconds). Why is this? In MySQL, the same insert into an indexed table takes about 50% longer.

Thanks

Daniel Akerud
zilch@home.se writes:
> I just noticed that inserting 10000 tuples into an indexed table took
> exactly the same amount of time as inserting 10000 tuples into a
> non-indexed table (194 seconds). Why is this? In MySQL, the same insert
> into an indexed table takes about 50% longer.

Surprises me too.  Which PG version, and what are the test conditions exactly?  (Table and index declarations; is the table empty initially; how is the backend being driven, and what commands are issued exactly?  How many shared buffers, platform, etc.)

Under PG 7.1, it's possible that your test caused no actual I/O except to the WAL log ... but I'd still think that the volume of WAL I/O would be greater when writing an index.

			regards, tom lane
The test script that sets up the tables is the following:

---

/* Cleanup */

DROP SEQUENCE index_with_id_seq;
DROP SEQUENCE index_without_id_seq;

DROP INDEX name_index;

DROP TABLE index_with;
DROP TABLE index_without;

/* Create a table with an index */

CREATE TABLE index_with (
	id SERIAL,
	name TEXT
);

CREATE INDEX name_index ON index_with(name);

/* Create a table without an index */

CREATE TABLE index_without (
	id SERIAL,
	name TEXT
);

---

This is run just before the test; then I have a little C++ program that inserts N rows into the tables and measures how long it takes.

A DELETE FROM (on both tables) followed by a VACUUM is also run before each test run (which consists of regular INSERT statements).

Am I doing anything wrong?

The postmaster (7.1.2) is run with the current Debian testing/unstable standard options.

Daniel Akerud

> Surprises me too.  Which PG version, and what are the test conditions
> exactly?  (Table and index declarations; is the table empty initially;
> how is the backend being driven, and what commands are issued exactly?
> How many shared buffers, platform, etc.)
>
> Under PG 7.1, it's possible that your test caused no actual I/O except
> to the WAL log ... but I'd still think that the volume of WAL I/O
> would be greater when writing an index.
>
> 			regards, tom lane
Yes, actually ... I forgot to say that: VACUUM & VACUUM ANALYZE are run before each test run.

Thanks

Daniel Akerud

> Did you VACUUM ANALYZE as well, after you created the tables/indexes?
I just re-ran the application to confirm that it really is like that. Using the test environment previously described, I got the following output:

Database vacuumed
pg: Trying 1000 inserts with indexing on...   Time taken: 24 seconds
pg: Trying 1000 inserts with indexing off...  Time taken: 22 seconds
Database vacuumed
pg: Trying 10000 inserts with indexing on...  Time taken: 220 seconds
pg: Trying 10000 inserts with indexing off... Time taken: 220 seconds

Daniel Akerud
zilch@home.se writes:
> then I have a little C++ program that inserts N rows into the tables and
> measures how long it takes.

Which is done how, exactly?

			regards, tom lane
> Which is done how, exactly?
>
> 			regards, tom lane

---

// The table is cleared and vacuumed first:
//   DELETE FROM index_with;
//   VACUUM
//   VACUUM ANALYZE

timeAnalyzer.startClock();
for (int i = 0; i < nrIterations; i++) {
    // data[i] is one random 8-character word
    sprintf(sqlStatement, "INSERT INTO index_with (name) VALUES ('%s')", data[i]);
    try {
        db->runCommand(sqlStatement);   // thin wrapper that sends the SQL to the backend
    } catch (Exception &e) {
        cout << "Exception caught: " << e.getException() << endl;
    }
}
timeAnalyzer.stopClock();
// output of elapsed time

---

data[i] is data created with 'pwgen 8 100000 > data'. Then the same again, but for index_without instead.

Could it be that the first 10000 rows inserted make the database so much slower that the next 10000 inserts, into a different table in the same database, also slow down?

Daniel Akerud
> Could it be that the first 10000 rows inserted make the database so much
> slower that the next 10000 inserts, into a different table in the same
> database, also slow down?

It wasn't. The database is now cleared, VACUUMed, and VACUUM ANALYZEd before both tests (the table WITH and the table WITHOUT an index). Still the same results: 195/194 seconds.

Daniel Akerud
zilch@home.se writes:
> CREATE TABLE index_with (
> 	id SERIAL,
> 	name TEXT
> );
> CREATE INDEX name_index ON index_with(name);

> CREATE TABLE index_without (
> 	id SERIAL,
> 	name TEXT
> );

Actually, what you are comparing here is a table with two indexes to a table with one index.  Moreover, both of them incur a sequence nextval() operation for each insert.  So it's not two files updated versus one, it's four versus three.

Also, given the small size of these tables, it's likely that most of the updates occur in in-memory disk buffers.  If you are running with fsync on, nearly all the actual I/O per insert will be the write and fsync of the WAL log.  The time required for that is not going to be very sensitive to the amount of data written, as long as it's much less than one disk block per transaction, which will be true in both these cases.  You end up writing one block to the log per transaction anyway.

You might try running the ten thousand inserts as a single transaction (do "begin" and "end" around them).  It'd also be educational to try it with fsync disabled, or with id declared as plain int not serial.

			regards, tom lane
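Concretely, the batched run Tom suggests just wraps the whole statement stream in one transaction. A minimal sketch against the test tables above (the literal names stand in for the generated pwgen words):

---

BEGIN;
INSERT INTO index_with (name) VALUES ('aaaaaaaa');
INSERT INTO index_with (name) VALUES ('bbbbbbbb');
-- ... the remaining inserts ...
COMMIT;

---

In the C++ driver this amounts to issuing BEGIN before the loop and COMMIT after it, so the commit overhead is paid once rather than once per row.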
Thanks Tom, really appreciate it!

Daniel Akerud
> You might try running the ten thousand inserts as a single transaction
> (do "begin" and "end" around them).

A HUGE difference (I also removed the id SERIAL column completely, keeping only name):

Database vacuumed
pg: Trying 25000 inserts on index_with...     Time taken: 12 seconds
Database vacuumed
pg: Trying 25000 inserts on index_without...  Time taken: 12 seconds  <-- might be because of fsync!? (fsync was also on for the previous run)
Database vacuumed
pg: Trying 30000 inserts on index_with...     Time taken: 15 seconds
Database vacuumed
pg: Trying 30000 inserts on index_without...  Time taken: 12 seconds
Database vacuumed
pg: Trying 35000 inserts on index_with...     Time taken: 21 seconds
Database vacuumed
pg: Trying 35000 inserts on index_without...  Time taken: 14 seconds

I can't believe what a difference that made. How can putting the inserts in a transaction make them faster? I thought that would make them slower. It's now something like a 100th of the time.

Daniel Akerud
zilch@home.se writes:
> I can't believe what a difference that made. How can putting the inserts
> in a transaction make them faster? I thought that would make them slower.

Everything is always a transaction in Postgres.  If you don't say begin/end, then there's an implicit begin and end around each individual query.  So your first set of tests was paying transaction commit overhead for each insert.

			regards, tom lane
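In other words, under the implicit-transaction rule each bare INSERT in the unbatched run behaves as if it had been written like this (a sketch of the equivalence, not anything the server literally rewrites):

---

INSERT INTO index_with (name) VALUES ('aaaaaaaa');

-- is executed as:

BEGIN;
INSERT INTO index_with (name) VALUES ('aaaaaaaa');
COMMIT;  -- per-statement commit: the WAL write and fsync are paid here

---

so the 10000-row run pays for 10000 commits unless the inserts are batched.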
>>>>> "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes: TL> Everything is always a transaction in Postgres. If you don't say TL> begin/end, then there's an implicit begin and end around each individual TL> query. So your first set of tests were paying transaction commit TL> overhead for each insert. This doesn't seem to hold exactly for INSERTs involving sequences as default values. Even if the insert fails for some other constraint, the sequence is incremented. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
On 10 Jun 2001, Vivek Khera wrote:

> This doesn't seem to hold exactly for INSERTs involving sequences as
> default values.  Even if the insert fails for some other constraint,
> the sequence is incremented.

No, that's exactly how it is supposed to work, to guarantee that you will never get the same value from two separate calls to nextval.

-alex
>>>>> "AP" == Alex Pilosov <alex@pilosoft.com> writes: TL> Everything is always a transaction in Postgres. If you don't say TL> begin/end, then there's an implicit begin and end around each individual >> >> This doesn't seem to hold exactly for INSERTs involving sequences as >> default values. Even if the insert fails for some other constraint, >> the sequence is incremented. AP> No, that's exactly how it is supposed to work, to guarantee that you will AP> never get same value from two separate calls to nextval. Even if your transaction fails? That seems to counter the definition of a transaction that aborts; the state of the database is different than before. Or am I really thinking wrongly about what an aborted transaction should leave behind?
On Sun, 10 Jun 2001, Vivek Khera wrote:

> Even if your transaction fails?  That seems to run counter to the
> definition of an aborted transaction: the state of the database is
> different from before.

Yes, except for the sequences. Consider this example: transactions A, B, C and sequence S.

in A: S.nextval = 1
in B: S.nextval = 2
in C: S.nextval = 3

Transaction B then aborts, while A and C succeed. By your logic, the next value of S should now be 2, but to keep that kind of state you would need a table listing the 'currently unused values'. That, when your sequence gets into the millions, is a colossal waste of space.

If you want "the next id not currently used in my table", use max(id)+1; if you want "give me a non-repeating number", use a sequence. There are also concurrency implications with max(id): only one transaction at a time can use it without the danger of repeating IDs.

-alex
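The non-transactional behavior is easy to see from psql. A minimal sketch (the sequence name s is made up for the example):

---

CREATE SEQUENCE s;

BEGIN;
SELECT nextval('s');   -- returns 1
ROLLBACK;

SELECT nextval('s');   -- returns 2: the increment survived the aborted transaction

---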