Thread: temporary indexes?
I couldn't find any mention of this on the archives... Have the project maintainers ever considered extending CREATE INDEX to support "temporary" indexes like CREATE TEMPORARYTABLE? When creating temporary tables for analytics/reporting, I've noticed that I often need to create (then drop) indexes on regulartables. Temporary indexes seemed like a natural fit here, so i was wondering if there was any reason why they'renot supported (other than no one wanted it!)
On 10/21/2015 11:43 AM, Jonathan Vanasco wrote: > I couldn't find any mention of this on the archives... > > Have the project maintainers ever considered extending CREATE INDEX to support "temporary" indexes like CREATE TEMPORARYTABLE? > > When creating temporary tables for analytics/reporting, I've noticed that I often need to create (then drop) indexes onregular tables. Temporary indexes seemed like a natural fit here, so i was wondering if there was any reason why they'renot supported (other than no one wanted it!) Something like this?: aklaver@test=> create temporary table temp_test(id int, fld_1 varchar); CREATE TABLE aklaver@test=> create index temp_idx on temp_test(fld_1); CREATE INDEX aklaver@test=> \d temp_test Table "pg_temp_2.temp_test" Column | Type | Modifiers --------+-------------------+----------- id | integer | fld_1 | character varying | Indexes: "temp_idx" btree (fld_1) > -- Adrian Klaver adrian.klaver@aklaver.com
On Wed, Oct 21, 2015 at 11:50 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 10/21/2015 11:43 AM, Jonathan Vanasco wrote:I couldn't find any mention of this on the archives...
Have the project maintainers ever considered extending CREATE INDEX to support "temporary" indexes like CREATE TEMPORARY TABLE?
When creating temporary tables for analytics/reporting, I've noticed that I often need to create (then drop) indexes on regular tables. Temporary indexes seemed like a natural fit here, so i was wondering if there was any reason why they're not supported (other than no one wanted it!)
Something like this?:
aklaver@test=> create temporary table temp_test(id int, fld_1 varchar);
CREATE TABLE
aklaver@test=> create index temp_idx on temp_test(fld_1);
CREATE INDEX
I think he means more like:
create temporary table temp_test(id int, fld_1 varchar);
create temporary index on permanent_table (fld_1);
select something from temp_test join permanent_table using (fld_1) where a=b;
select something_else from temp_test join permanent_table using (fld_1) where c=d;
Cheers,
Jeff
On Oct 21, 2015, at 2:59 PM, Jeff Janes wrote: > I think he means more like: > > create temporary table temp_test(id int, fld_1 varchar); > create temporary index on permanent_table (fld_1); > > select something from temp_test join permanent_table using (fld_1) where a=b; > select something_else from temp_test join permanent_table using (fld_1) where c=d; Yes. That's exactly what I mean: A "temporary index" on a "permanent table" , which expires in the same manner of a "temporarytable".
What Adrian is saying is that there is no need for "temporary" indexes. You can create the idxs on a temp table and they get dropped when you drop the table.
Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
-------- Original message --------
From: Adrian Klaver <adrian.klaver@aklaver.com>
Date: 10/21/2015 14:50 (GMT-05:00)
To: Jonathan Vanasco <postgres@2xlp.com>, PostgreSQL mailing lists <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] temporary indexes?
> I couldn't find any mention of this on the archives...
>
> Have the project maintainers ever considered extending CREATE INDEX to support "temporary" indexes like CREATE TEMPORARY TABLE?
>
> When creating temporary tables for analytics/reporting, I've noticed that I often need to create (then drop) indexes on regular tables. Temporary indexes seemed like a natural fit here, so i was wondering if there was any reason why they're not supported (other than no one wanted it!)
Something like this?:
aklaver@test=> create temporary table temp_test(id int, fld_1 varchar);
CREATE TABLE
aklaver@test=> create index temp_idx on temp_test(fld_1);
CREATE INDEX
aklaver@test=> \d temp_test
Table "pg_temp_2.temp_test"
Column | Type | Modifiers
--------+-------------------+-----------
id | integer |
fld_1 | character varying |
Indexes:
"temp_idx" btree (fld_1)
>
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 10/21/2015 12:27 PM, Jonathan Vanasco wrote: > > On Oct 21, 2015, at 2:59 PM, Jeff Janes wrote: >> I think he means more like: >> >> create temporary table temp_test(id int, fld_1 varchar); >> create temporary index on permanent_table (fld_1); >> >> select something from temp_test join permanent_table using (fld_1) where a=b; >> select something_else from temp_test join permanent_table using (fld_1) where c=d; > > Yes. That's exactly what I mean: A "temporary index" on a "permanent table" , which expires in the same manner of a "temporarytable". I misunderstood then. The only thing I can think of is to wrap in a transaction, though that presents other issues with open transactions and/or errors in the transaction. > > > > -- Adrian Klaver adrian.klaver@aklaver.com
On Oct 21, 2015, at 3:42 PM, Adrian Klaver wrote: > I misunderstood then. The only thing I can think of is to wrap in a transaction, though that presents other issues withopen transactions and/or errors in the transaction. I just explicitly drop. The convenience of an auto-drop would be a nice backup. Transactions and table-locking issues are probably why temporary indexes don't exist.
On 10/21/2015 01:28 PM, Jonathan Vanasco wrote: > > On Oct 21, 2015, at 3:42 PM, Adrian Klaver wrote: > >> I misunderstood then. The only thing I can think of is to wrap in a transaction, though that presents other issues withopen transactions and/or errors in the transaction. > > I just explicitly drop. The convenience of an auto-drop would be a nice backup. > > Transactions and table-locking issues are probably why temporary indexes don't exist. > On later versions there is CREATE INDEX CONCURRENTLY which alleviates locking issues at the expense of time. I would think the greater issue is the time and overhead of building an index for a table of any size would eat into 'temporary'. Seems if you are joining temporary tables against permanent tables on a regular basis it would pay just to keep the indexes on the permanent tables and pay the expense over a longer period of time. -- Adrian Klaver adrian.klaver@aklaver.com
Jonathan Vanasco <postgres@2xlp.com> wrote: > I couldn't find any mention of this on the archives... > > Have the project maintainers ever considered extending CREATE INDEX to > support "temporary" indexes like CREATE TEMPORARY TABLE? Not sure if you mean something like this: http://www.depesz.com/2015/09/07/hypothetical-indexes/ > When creating temporary tables for analytics/reporting, I've noticed > that I often need to create (then drop) indexes on regular tables. Than you have to create this index. But the creation of an index is expensive, maybe it's cheaper do the nalytics/reporting without an index. And yes: you can create an indexe concurrently. Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
On 21-10-15 20:43, Jonathan Vanasco wrote: > I couldn't find any mention of this on the archives... > > Have the project maintainers ever considered extending CREATE INDEX to support "temporary" indexes like CREATE TEMPORARYTABLE? > > When creating temporary tables for analytics/reporting, I've noticed that I often need to create (then drop) indexes onregular tables. Temporary indexes seemed like a natural fit here, so i was wondering if there was any reason why they'renot supported (other than no one wanted it!) > Do you mean creating a temporary index on a non-temporary table to speed up the queries that fills the temporary table?
On 10/21/15 3:28 PM, Jonathan Vanasco wrote: > > On Oct 21, 2015, at 3:42 PM, Adrian Klaver wrote: > >> I misunderstood then. The only thing I can think of is to wrap in a transaction, though that presents other issues withopen transactions and/or errors in the transaction. > > I just explicitly drop. The convenience of an auto-drop would be a nice backup. > > Transactions and table-locking issues are probably why temporary indexes don't exist. I think it's more that no one has proposed it until now. It probably wouldn't be terribly hard to add them... the biggest issue would probably be changing the buffer management code so it didn't assume that a temporary relation went into temporary buffers. Other than that, presumably they'd work the same as temp tables. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
Jim Nasby <Jim.Nasby@BlueTreble.com> writes: > On 10/21/15 3:28 PM, Jonathan Vanasco wrote: >> Transactions and table-locking issues are probably why temporary indexes don't exist. > I think it's more that no one has proposed it until now. It probably > wouldn't be terribly hard to add them... the biggest issue would > probably be changing the buffer management code so it didn't assume that > a temporary relation went into temporary buffers. Uh, why would you do that? You'd be throwing away one of the principal performance advantages of temp tables. regards, tom lane
On 10/22/15 12:36 PM, Tom Lane wrote: > Jim Nasby <Jim.Nasby@BlueTreble.com> writes: >> On 10/21/15 3:28 PM, Jonathan Vanasco wrote: >>> Transactions and table-locking issues are probably why temporary indexes don't exist. > >> I think it's more that no one has proposed it until now. It probably >> wouldn't be terribly hard to add them... the biggest issue would >> probably be changing the buffer management code so it didn't assume that >> a temporary relation went into temporary buffers. > > Uh, why would you do that? You'd be throwing away one of the principal > performance advantages of temp tables. This would be for temporary *indexes* on permanent tables. Actually, it depends on what behavior you'd expect from a temporary index. If it was only going to exist for the duration of a REPEATABLE READ transaction it wouldn't care about concurrent DML on the table, so the index could use temp buffers and the index creation could take shortcuts as well, since it'd only need to index tuples that satisfy that transaction's snapshot. OTOH, if you had anything looser than that the index would need to operate the same as a regular index, so all other backends would need to update it. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
Jim Nasby <Jim.Nasby@BlueTreble.com> writes: > On 10/22/15 12:36 PM, Tom Lane wrote: >> Uh, why would you do that? You'd be throwing away one of the principal >> performance advantages of temp tables. > Actually, it depends on what behavior you'd expect from a temporary > index. If it was only going to exist for the duration of a REPEATABLE > READ transaction it wouldn't care about concurrent DML on the table, so > the index could use temp buffers and the index creation could take > shortcuts as well, since it'd only need to index tuples that satisfy > that transaction's snapshot. > OTOH, if you had anything looser than that the index would need to > operate the same as a regular index, so all other backends would need to > update it. Hmm, good point. Still, that means that such indexes would be considerably more expensive than true temp indexes, because of the concurrency and locking requirements, which would be just like regular indexes. AFAICS it would be better to think of them as unlogged indexes, because suppressing WAL logging is all you could get out of it. FWIW, I don't find much attraction in the idea of building an index for use by a single query. There basically isn't any scenario where that's going to beat running a plan that doesn't require the index. The value of an index is generally to avoid a whole-table scan and/or a sort, but you'll necessarily pay those costs to make the index. regards, tom lane
On Oct 22, 2015, at 2:08 PM, Tom Lane wrote: > FWIW, I don't find much attraction in the idea of building an index for > use by a single query. There basically isn't any scenario where that's > going to beat running a plan that doesn't require the index. The value of > an index is generally to avoid a whole-table scan and/or a sort, but > you'll necessarily pay those costs to make the index. On Oct 22, 2015, at 8:17 AM, vincent elschot wrote: > Do you mean creating a temporary index on a non-temporary table to speed up the queries that fills the temporary table? One of the use-cases is speeding up inserts on create, but another is for periodic analytics routines (which we handle withexplicit create/drop index commands. In one example of our analytics routines, we end up needing to create/drop about 15 indexes to optimize 45 queries. Thisspeeds up the execution by 1000% and minimizes RAM usage. We don't keep the indexes active, because we only need themfor analytics and the overhead of managing them during high write periods during the day is noticeable. Creating anddropping these indexes on-demand gives us all the benefit with none of the drawbacks.
On 10/22/15 3:15 PM, Jonathan Vanasco wrote: > On Oct 22, 2015, at 8:17 AM, vincent elschot wrote: >> Do you mean creating a temporary index on a non-temporary table to speed up the queries that fills the temporary table? > > One of the use-cases is speeding up inserts on create, but another is for periodic analytics routines (which we handlewith explicit create/drop index commands. > > In one example of our analytics routines, we end up needing to create/drop about 15 indexes to optimize 45 queries. Thisspeeds up the execution by 1000% and minimizes RAM usage. We don't keep the indexes active, because we only need themfor analytics and the overhead of managing them during high write periods during the day is noticeable. Creating anddropping these indexes on-demand gives us all the benefit with none of the drawbacks. What % of execution time is spent creating those indexes? Or is that factored into the 1000%? Also, could your analysis queries be run in a REPEATABLE READ transaction (meaning that once the transaction starts it doesn't get any new data)? If it could then the temp indexes could be static, which would mean no update overhead. Tom's point about unlogged indexes is a good one, I hadn't thought about that. Something else to think about here is catalog churn. So the ideal case for an index that couldn't be static would be an unlogged index that you could invalidate and rebuild at will. When you're done with it you invalidate it and maintenance costs go to essentially 0. Rebuilds and maintenance are cheaper because they're not WAL logged. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
On Oct 22, 2015, at 5:04 PM, Jim Nasby wrote: > > What % of execution time is spent creating those indexes? Or is that factored into the 1000%? Also, could your analysisqueries be run in a REPEATABLE READ transaction (meaning that once the transaction starts it doesn't get any newdata)? If it could then the temp indexes could be static, which would mean no update overhead. Running without the indexes would take over an hour to execute the scripts, and totally jams the machine (we got 30minutesin once, and had to kill it). That's because of millions of rows used in joins and sequential scans. Building all the indexes takes 30 seconds; most SQL commands then run only against the indexes (some of which are partial)and the entire suite finishes in about 3 minutes. If the indexes stay active during the day, there seems to be a 2-3% drop in write performance. This is on a webapp, sowe're just happier shifting the index work from peak hours to offpeak hours. It means we can delay spinning up anotherapplication server a bit longer. I'll definitely look into your suggestions the next time I hit this code.