Thread: non-WAL btree?
Guys, I was wondering if there is a btree indexing implementation that is not WAL-logged. I'm loading data in bulks, and index logging is an unnecessary overhead for me (easier to rebuild on crash). Thanks! best regards, Alex Vinogradovs
On Fri, 01 Aug 2008 12:41:12 -0700 Alex Vinogradovs <AVinogradovs@Clearpathnet.com> wrote: > Guys, > > I was wondering if there is a btree indexing implementation that > is not WAL-logged. I'm loading data in bulks, and index logging > is an unnecessary overhead for me (easier to rebuild on crash). Drop the index during load? > > Thanks! > > best regards, > Alex Vinogradovs > -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
By loading in bulks, I mean I load some 40-50 thousand rows at once into a table that already has some millions. Index rebuild on that table after each 50k inserts will be even less efficient ;) Alex. On Fri, 2008-08-01 at 12:57 -0700, Joshua Drake wrote: > On Fri, 01 Aug 2008 12:41:12 -0700 > Alex Vinogradovs <AVinogradovs@Clearpathnet.com> wrote: > > > Guys, > > > > I was wondering if there is a btree indexing implementation that > > is not WAL-logged. I'm loading data in bulks, and index logging > > is an unnecessary overhead for me (easier to rebuild on crash). > > Drop the index during load? > > > > > Thanks! > > > > best regards, > > Alex Vinogradovs > > > >
On Fri, Aug 1, 2008 at 4:07 PM, Alex Vinogradovs <AVinogradovs@clearpathnet.com> wrote: > By loading in bulks, I mean I load some 40-50 thousand > rows at once into a table that already has some millions. > Index rebuild on that table after each 50k inserts will > be even less efficient ;) How many indexes do you have on this...? I do this pretty regularly (actually, I do 4k batches with COPY, 4-10 concurrent batches every 10 seconds, for 2-3 days at a time) and, having testing dropping indices, nothing to do with the index has a particularly strong performance impact. That said, a significant increase in checkpoint segments was required to get good performance out of the above use case. If you haven't tried that, I'd say that's a good place to start. What makes you think it's specifically index WAL work, though? -- - David T. Wilson david.t.wilson@gmail.com
On Fri, 01 Aug 2008 13:07:18 -0700 Alex Vinogradovs <AVinogradovs@Clearpathnet.com> wrote: > By loading in bulks, I mean I load some 40-50 thousand > rows at once into a table that already has some millions. > Index rebuild on that table after each 50k inserts will > be even less efficient ;) Fair enough, to answer your question, "no". We don't have any non wal table tables or index types. Sincerely, Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On Fri, Aug 1, 2008 at 4:16 PM, David Wilson <david.t.wilson@gmail.com> wrote: > On Fri, Aug 1, 2008 at 4:07 PM, Alex Vinogradovs > <AVinogradovs@clearpathnet.com> wrote: >> By loading in bulks, I mean I load some 40-50 thousand >> rows at once into a table that already has some millions. >> Index rebuild on that table after each 50k inserts will >> be even less efficient ;) > > How many indexes do you have on this...? I do this pretty regularly > (actually, I do 4k batches with COPY, 4-10 concurrent batches every 10 > seconds, for 2-3 days at a time) and, having testing dropping indices, > nothing to do with the index has a particularly strong performance > impact. Sorry, as I hit send, I realized I should clarify this: I do my bulk loads with the indexes active; I don't rebuild them (as they're necessary during the batch calculations). Dropping the indexes and using test data didn't show a significant performance improvement over leaving the indexes enabled. -- - David T. Wilson david.t.wilson@gmail.com
Isn't hash indexing implementation non-WAL ? Alex. On Fri, 2008-08-01 at 13:16 -0700, Joshua Drake wrote: > On Fri, 01 Aug 2008 13:07:18 -0700 > Alex Vinogradovs <AVinogradovs@Clearpathnet.com> wrote: > > > By loading in bulks, I mean I load some 40-50 thousand > > rows at once into a table that already has some millions. > > Index rebuild on that table after each 50k inserts will > > be even less efficient ;) > > Fair enough, to answer your question, "no". We don't have any non wal > table tables or index types. > > Sincerely, > > Joshua D. Drake
It's not that I expect a lot of improvement by having non-WAL indexing, it just sounds logical to me to have that, since index can be re-created fast enough during recovery, and it would reduce my IO to some extent. Alex. > Sorry, as I hit send, I realized I should clarify this: I do my bulk > loads with the indexes active; I don't rebuild them (as they're > necessary during the batch calculations). Dropping the indexes and > using test data didn't show a significant performance improvement over > leaving the indexes enabled. >
On 4:36 pm 08/01/08 Alex Vinogradovs <AVinogradovs@Clearpathnet.com> wrote: > It's not that I expect a lot of improvement by having non-WAL > indexing Have you tried using a fill factor less than 90%? That is on my list of things to test, but have not done yet.. In particular you need to find a balance where the speed gained justifies the increase in size of the index.. and the potential slowdowns because of larger indexes.
On Fri, Aug 1, 2008 at 3:32 PM, Alex Vinogradovs <AVinogradovs@clearpathnet.com> wrote: > Isn't hash indexing implementation non-WAL ? > yes, but that's because no one thinks is worth the effort of making them WAL logged while they keep slower than btree... -- regards, Jaime Casanova Soporte y capacitación de PostgreSQL Guayaquil - Ecuador Cel. (593) 87171157
On Fri, Aug 1, 2008 at 3:36 PM, Alex Vinogradovs <AVinogradovs@clearpathnet.com> wrote: > It's not that I expect a lot of improvement by having non-WAL > indexing, it just sounds logical to me to have that, since > index can be re-created fast enough during recovery, and why you think that? if they are non WAL logged the only way to re-create them after a recovery is with a REINDEX... dropping the index and create after the bulk is just the same, i think... -- regards, Jaime Casanova Soporte y capacitación de PostgreSQL Guayaquil - Ecuador Cel. (593) 87171157
It's all about number of repetions. If say I load my table with 50k every minute, and run reindex every minute, how long do you think it would take by end of the day, when my table (it's daily partition actually) is at maximum capacity ? And database may actually never crash, and I won't have to run reindex at all ;) Btw, SELECT INTO is also a non-WAL operation when archiving is disabled, or am I missing something ? Alex. On Fri, 2008-08-01 at 16:43 -0500, Jaime Casanova wrote: > On Fri, Aug 1, 2008 at 3:36 PM, Alex Vinogradovs > <AVinogradovs@clearpathnet.com> wrote: > > It's not that I expect a lot of improvement by having non-WAL > > indexing, it just sounds logical to me to have that, since > > index can be re-created fast enough during recovery, > > and why you think that? if they are non WAL logged the only way to > re-create them after a recovery is with a REINDEX... dropping the > index and create after the bulk is just the same, i think... >
Jaime Casanova wrote: > On Fri, Aug 1, 2008 at 3:36 PM, Alex Vinogradovs > <AVinogradovs@clearpathnet.com> wrote: >> It's not that I expect a lot of improvement by having non-WAL >> indexing, it just sounds logical to me to have that, since >> index can be re-created fast enough during recovery, > > and why you think that? if they are non WAL logged the only way to > re-create them after a recovery is with a REINDEX... dropping the > index and create after the bulk is just the same, i think... They don't all have to be non-WAL, first off; it could be optional per index. Second, non-WAL would provide a benefit in the case the OP mentioned, and the only time it would be a detriment is in the event of a fault. Reindexing of non-WAL indexes could be automatic during recovery. Non-WAL indexing is an option I would almost certainly take advantage of if it existed. -Glen
On Fri, Aug 1, 2008 at 4:49 PM, Alex Vinogradovs <AVinogradovs@clearpathnet.com> wrote: > It's all about number of repetions. If say I load my table > with 50k every minute, and run reindex every minute, how > long do you think it would take by end of the day, when > my table (it's daily partition actually) is at maximum > capacity ? And database may actually never crash, and > I won't have to run reindex at all ;) > maybe http://www.postgresql.org/docs/8.3/static/wal-async-commit.html is what you need... begin; set local synchronous_commit to off; insert... insert... ... commit; or set synchronous_commit to off; copy command; set synchronous_commit to on; -- regards, Jaime Casanova Soporte y capacitación de PostgreSQL Guayaquil - Ecuador Cel. (593) 87171157