Thread: Indexes on Large Tables
I am running a dual athlon 1800 with an gig of ram. I am running postgres 7.2. I have a table that hs 15 million rows. I have a query that has one columns and the select cluase and one column in the where clause with no joining or sub-queries. If I do not index the column from the where clause the query returns the 150,000 rows in 4 and a half minutes. If in do index the column from the where clause using a btree the 150,000 rows return in 11 and a half minutes. Any insight would be greatly appreciated. Donny L. Drummonds
On Mon, 3 Feb 2003, Donny Drummonds wrote: > I am running a dual athlon 1800 with an gig of ram. I am running postgres > 7.2. I have a table that hs 15 million rows. I have a query that has one > columns and the select cluase and one column in the where clause with no > joining or sub-queries. If I do not index the column from the where clause > the query returns the 150,000 rows in 4 and a half minutes. If in do index > the column from the where clause using a btree the 150,000 rows return in 11 > and a half minutes. > > Any insight would be greatly appreciated. Hmm, have you analyzed recently? In any case, the table definition, query and explain analyze outpute would be useful to see.
On Mon, 3 Feb 2003, Donny Drummonds wrote: > If I do not index the column from the where clause the query returns > the 150,000 rows in 4 and a half minutes. If in do index the column > from the where clause using a btree the 150,000 rows return in 11 and > a half minutes. > > Any insight would be greatly appreciated. Well, I'm not sure if this is the insight you're looking for, but.... The reason it takes longer if you use an index is that you change from sequential I/O (which is relatively fast) to random I/O (which is relatively slow). With the table scan (reading the entire table in whatever order it's in on the disk) you're reading several times as much data, but you're not doing head seeks all over the place to move the head to the place where the next bit of data to be read is. Obviously, in this case, even though an index was available, the planner was wrong to chose to use it rather than just read the entire table. That is, as someone else mentioned, likely due to bad statistics: the planner thought you were going to select a very small part of the table, rather than ten percent of it (which is a pretty large fraction, for these purposes). Try doing an ANALYZE. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC
----- Original Message ----- From: "Curt Sampson" <cjs@cynic.net> To: "Donny Drummonds" <donny@cypresstg.com> Cc: <pgsql-admin@postgresql.org> Sent: Friday, February 07, 2003 5:13 AM Subject: Re: [ADMIN] Indexes on Large Tables > On Mon, 3 Feb 2003, Donny Drummonds wrote: > > > If I do not index the column from the where clause the query returns > > the 150,000 rows in 4 and a half minutes. If in do index the column > > from the where clause using a btree the 150,000 rows return in 11 and > > a half minutes. > > > > Any insight would be greatly appreciated. > > Well, I'm not sure if this is the insight you're looking for, but.... > > The reason it takes longer if you use an index is that you change > from sequential I/O (which is relatively fast) to random I/O (which > is relatively slow). With the table scan (reading the entire table in > whatever order it's in on the disk) you're reading several times as much > data, but you're not doing head seeks all over the place to move the > head to the place where the next bit of data to be read is. > > Obviously, in this case, even though an index was available, the planner > was wrong to chose to use it rather than just read the entire table. > That is, as someone else mentioned, likely due to bad statistics: the > planner thought you were going to select a very small part of the table, > rather than ten percent of it (which is a pretty large fraction, for > these purposes). Try doing an ANALYZE. > > cjs If the index that you use in the WHERE clause is the most common method that you are accessing the table then I recommend that you periodically run the cluster command: CLUSTER indexname ON tablename When a table is clustered, it is physically reordered on disk based on the index information. Donald