Thread: Slow queries on big table
I have a two column table with over 160 million rows in it.  As the size
of the table grows, queries on this table get exponentially slower.  I am
using version 8.1.5 32-bit on Red Hat Enterprise Linux 3.  The hardware
is an Intel 3 GHz Xeon with 4 GB RAM, and 6 disks in a RAID 5
configuration.  For current testing I am running a single database
connection with no other applications running on the machine, and swap
is not being used at all.

Here is the table definition:

mdsdb=# \d backup_location
     Table "public.backup_location"
  Column   |  Type   | Modifiers
-----------+---------+-----------
 record_id | bigint  | not null
 backup_id | integer | not null
Indexes:
    "backup_location_pkey" PRIMARY KEY, btree (record_id, backup_id)
    "backup_location_rid" btree (record_id)
Foreign-key constraints:
    "backup_location_bfk" FOREIGN KEY (backup_id) REFERENCES
backups(backup_id) ON DELETE CASCADE

Here is the table size:

mdsdb=# select count(*) from backup_location;
   count
-----------
 162101296
(1 row)

And here is a simple query on this table that takes nearly 20 minutes to
return fewer than 3000 rows.  I ran an ANALYZE immediately before I ran
this query:

mdsdb=# explain analyze select record_id from backup_location where
backup_id = 1070;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using backup_location_pkey on backup_location
   (cost=0.00..1475268.53 rows=412394 width=8) (actual
   time=3318.057..1196723.915 rows=2752 loops=1)
   Index Cond: (backup_id = 1070)
 Total runtime: 1196725.617 ms
(3 rows)

Obviously at this point the application is not usable.  If possible we
would like to grow this table to the 3-5 billion row range, but I don't
know if that is realistic.

Any guidance would be greatly appreciated.

Thanks,
Ed
Tyrrill, Ed wrote:
> I have a two column table with over 160 million rows in it.  As the size
> of the table grows, queries on this table get exponentially slower.  I am
> using version 8.1.5 32-bit on Red Hat Enterprise Linux 3.  The hardware
> is an Intel 3 GHz Xeon with 4 GB RAM, and 6 disks in a RAID 5
> configuration.
[snip]
> Indexes:
>     "backup_location_pkey" PRIMARY KEY, btree (record_id, backup_id)
>     "backup_location_rid" btree (record_id)
[snip]
> And here is a simple query on this table that takes nearly 20 minutes to
> return fewer than 3000 rows.  I ran an analyze immediately before I ran
> this query:
>
> mdsdb=# explain analyze select record_id from backup_location where
> backup_id = 1070;
>
>  Index Scan using backup_location_pkey on backup_location
>    (cost=0.00..1475268.53 rows=412394 width=8) (actual
>    time=3318.057..1196723.915 rows=2752 loops=1)
>    Index Cond: (backup_id = 1070)
>  Total runtime: 1196725.617 ms

I've got a few points.  Firstly, is your data amenable to partitioning?
If so, that might be a big winner.

Secondly, it might be more efficient for the planner to choose the
backup_location_rid index than the combination primary key index.
You can test this theory with this cool pg trick:

begin;
alter table backup_location drop constraint backup_location_pkey;
explain analyze select ...;
rollback;

to see if it's faster.

> Obviously at this point the application is not usable.  If possible we
> would like to grow this table to the 3-5 billion row range, but I don't
> know if that is realistic.
>
> Any guidance would be greatly appreciated.

Without knowing more about your usage patterns, it's hard to say.  But
partitioning seems like your best choice at the moment.
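[Editor's note: on 8.1, "partitioning" means table inheritance plus
constraint exclusion.  A minimal sketch of what Scott is suggesting,
with made-up child-table names and backup_id ranges purely for
illustration:]

```sql
-- Hypothetical inheritance-based partitioning by backup_id range
-- (the names and boundaries here are invented for the example).
CREATE TABLE backup_location_p1 (
    CHECK (backup_id >= 0 AND backup_id < 1000)
) INHERITS (backup_location);

CREATE TABLE backup_location_p2 (
    CHECK (backup_id >= 1000 AND backup_id < 2000)
) INHERITS (backup_location);

-- Each child needs its own indexes:
CREATE INDEX backup_location_p1_bid ON backup_location_p1 (backup_id);
CREATE INDEX backup_location_p2_bid ON backup_location_p2 (backup_id);

-- With constraint exclusion enabled, a query constrained on backup_id
-- only scans the matching child table.
SET constraint_exclusion = on;
```

New rows must also be routed to the right child (via rules or triggers
in this era of Postgres), so this is only worthwhile if the data really
clusters by backup_id.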
"Tyrrill, Ed" <tyrrill_ed@emc.com> writes:
>  Index Scan using backup_location_pkey on backup_location
>    (cost=0.00..1475268.53 rows=412394 width=8) (actual
>    time=3318.057..1196723.915 rows=2752 loops=1)
>    Index Cond: (backup_id = 1070)
>  Total runtime: 1196725.617 ms

If we take that at face value, it says the indexscan is requiring 434
msec per actual row fetched.  Which is just not very credible; the worst
case should be about one disk seek per row fetched.  So there's
something going on that doesn't meet the eye.

What I'm wondering about is whether the table is heavily updated and
seldom vacuumed, leading to lots and lots of dead tuples being fetched
and then rejected (hence they'd not show in the actual-rows count).

The other thing that seems pretty odd is that it's not using a bitmap
scan --- for such a large estimated rowcount I'd have expected a bitmap
scan, not a plain indexscan.  What do you get from EXPLAIN ANALYZE if
you force a bitmap scan?  (Set enable_indexscan off, and enable_seqscan
too if you have to.)

			regards, tom lane
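[Editor's note: Tom's experiment, spelled out as a sketch.  The SET
commands are session-local planner toggles and revert when the session
ends:]

```sql
-- Disable plain index scans so the planner considers a bitmap scan;
-- disable seqscans too only if it falls back to one.
SET enable_indexscan = off;
-- SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT record_id FROM backup_location WHERE backup_id = 1070;

-- Restore the defaults afterwards:
RESET enable_indexscan;
-- RESET enable_seqscan;
```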
Tyrrill, Ed wrote:
> mdsdb=# \d backup_location
>      Table "public.backup_location"
>   Column   |  Type   | Modifiers
> -----------+---------+-----------
>  record_id | bigint  | not null
>  backup_id | integer | not null
> Indexes:
>     "backup_location_pkey" PRIMARY KEY, btree (record_id, backup_id)
>     "backup_location_rid" btree (record_id)
> Foreign-key constraints:
>     "backup_location_bfk" FOREIGN KEY (backup_id) REFERENCES
> backups(backup_id) ON DELETE CASCADE

[snip]

> mdsdb=# explain analyze select record_id from backup_location where
> backup_id = 1070;
>
>  Index Scan using backup_location_pkey on backup_location
>    (cost=0.00..1475268.53 rows=412394 width=8) (actual
>    time=3318.057..1196723.915 rows=2752 loops=1)
>    Index Cond: (backup_id = 1070)
>  Total runtime: 1196725.617 ms
> (3 rows)

The "backup_location_rid" index on your table is not necessary.  The
primary key index on (record_id, backup_id) can be used by Postgres
even if the query is only constrained by record_id.  See
http://www.postgresql.org/docs/8.2/interactive/indexes-multicolumn.html
for details.

The explain plan indicates that your query is filtered on backup_id,
but is using the primary key index on (record_id, backup_id).  Based on
the table definition, you do not have any good index for filtering on
backup_id.  The explain plan also seems way off, as I would expect a
sequential scan to be used without a good index for backup_id.  Did
you disable sequential scans before running this query?  Have you
altered any other configuration or planner parameters?

As "backup_location_rid" is not necessary, I would recommend dropping
that index and creating a new one on just backup_id.  This should be a
net wash on space, and the new index should make for a straight index
scan for the query you presented.
Don't forget to analyze after changing the indexes.

Hope this helps.

Andrew
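[Editor's note: Andrew's recommendation, sketched end to end.  The new
index name is an assumption; Ed later reports creating one called
backup_location_bid, so that name is used here:]

```sql
-- Drop the redundant single-column record_id index (the primary key
-- on (record_id, backup_id) already serves record_id lookups) and
-- replace it with an index that supports filtering on backup_id.
DROP INDEX backup_location_rid;
CREATE INDEX backup_location_bid ON backup_location (backup_id);

-- Refresh planner statistics so the new index is costed correctly.
ANALYZE backup_location;
```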
Scott Marlowe <smarlowe@g2switchworks.com> writes:
> Secondly, it might be more efficient for the planner to choose the
> backup_location_rid index than the combination primary key index.

Oh, I'm an idiot; I didn't notice the way the index was set up.  Yeah,
that index pretty well sucks for a query on backup_id --- it has to
scan the entire index, since there's no constraint on the leading
column.  So that's where the time is going.

This combination of indexes:

> Indexes:
>     "backup_location_pkey" PRIMARY KEY, btree (record_id, backup_id)
>     "backup_location_rid" btree (record_id)

is really just silly.  You should have the pkey and then an index on
backup_id alone.  See the discussion of multiple indexes in the fine
manual:
http://www.postgresql.org/docs/8.2/static/indexes-multicolumn.html
http://www.postgresql.org/docs/8.2/static/indexes-bitmap-scans.html

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Oh, I'm an idiot; I didn't notice the way the index was set up.
> Yeah, that index pretty well sucks for a query on backup_id ---
> it has to scan the entire index, since there's no constraint on the
> leading column.  So that's where the time is going.
>
> You should have the pkey and then an index on backup_id alone.

Thanks for the help guys!  That was my problem.  I actually need the
backup_location_rid index for a different query so I am going to keep
it.  Here is the result with the new index:

mdsdb=# explain analyze select record_id from backup_location where
backup_id = 1070;

                           QUERY PLAN
----------------------------------------------------------------
 Index Scan using backup_location_bid on backup_location
   (cost=0.00..9573.07 rows=415897 width=8) (actual
   time=0.106..3.486 rows=2752 loops=1)
   Index Cond: (backup_id = 1070)
 Total runtime: 4.951 ms
(3 rows)
On Fri, May 18, 2007 at 02:22:52PM -0700, Tyrrill, Ed wrote:
>  Total runtime: 4.951 ms

Going from 1197 seconds to 5 milliseconds.  That's some sort of record
in a while, I think :-)

/* Steinar */
--
Homepage: http://www.sesse.net/
"Tyrrill, Ed" <tyrrill_ed@emc.com> writes:
> Thanks for the help guys!  That was my problem.  I actually need the
> backup_location_rid index for a different query so I am going to keep
> it.

Well, you don't really *need* it; the two-column index on (record_id,
backup_id) will serve perfectly well for queries on its leading column
alone.  It'll be physically bigger and hence slightly slower to scan
than a single-column index; but unless the table is almost completely
read-only, the update overhead of maintaining all three indexes is
probably going to cost more than you can save with it.  Try that other
query with and without backup_location_rid and see how much you're
really saving.

> Index Scan using backup_location_bid on backup_location
>   (cost=0.00..9573.07 rows=415897 width=8) (actual
>   time=0.106..3.486 rows=2752 loops=1)
>   Index Cond: (backup_id = 1070)
> Total runtime: 4.951 ms

That's more like it ;-)

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Well, you don't really *need* it; the two-column index on (record_id,
> backup_id) will serve perfectly well for queries on its leading column
> alone.  It'll be physically bigger and hence slightly slower to scan
> than a single-column index; but unless the table is almost completely
> read-only, the update overhead of maintaining all three indexes is
> probably going to cost more than you can save with it.  Try that other
> query with and without backup_location_rid and see how much you're
> really saving.

Well, the query that got me to add backup_location_rid took 105 minutes
using only the primary key index.  After I added backup_location_rid
the query was down to about 45 minutes.  Still not very good, and I am
still fiddling around with it.  The query is:

mdsdb=# explain analyze select backupobjects.record_id
        from backupobjects left outer join backup_location
        using (record_id) where backup_id is null;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Merge Left Join  (cost=0.00..21408455.06 rows=11790970 width=8)
   (actual time=2784967.410..2784967.410 rows=0 loops=1)
   Merge Cond: ("outer".record_id = "inner".record_id)
   Filter: ("inner".backup_id IS NULL)
   ->  Index Scan using backupobjects_pkey on backupobjects
       (cost=0.00..443484.31 rows=11790970 width=8) (actual
       time=0.073..47865.957 rows=11805996 loops=1)
   ->  Index Scan using backup_location_rid on backup_location
       (cost=0.00..20411495.21 rows=162435366 width=12) (actual
       time=0.110..2608485.437 rows=162426837 loops=1)
 Total runtime: 2784991.612 ms
(6 rows)

It is of course the same backup_location, but backupobjects is:

mdsdb=# \d backupobjects
            Table "public.backupobjects"
     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 record_id      | bigint                      | not null
 dir_record_id  | integer                     |
 name           | text                        |
 extension      | character varying(64)       |
 hash           | character(40)               |
 mtime          | timestamp without time zone |
 size           | bigint                      |
 user_id        | integer                     |
 group_id       | integer                     |
 meta_data_hash | character(40)               |
Indexes:
    "backupobjects_pkey" PRIMARY KEY, btree (record_id)
    "backupobjects_meta_data_hash_key" UNIQUE, btree (meta_data_hash)
    "backupobjects_extension" btree (extension)
    "backupobjects_hash" btree (hash)
    "backupobjects_mtime" btree (mtime)
    "backupobjects_size" btree (size)

record_id in backupobjects has a many to many relationship to record_id
in backup_location.

Ed
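[Editor's note: an alternative formulation worth benchmarking for this
kind of anti-join, not suggested in the thread itself.  The two queries
are logically equivalent, but the planner may produce different plans
for them; note that planners of this era did not always optimize NOT
EXISTS subqueries well, so measure both:]

```sql
-- Find backupobjects rows with no matching backup_location row,
-- written as NOT EXISTS instead of LEFT JOIN ... IS NULL.
SELECT b.record_id
FROM backupobjects b
WHERE NOT EXISTS (
    SELECT 1
    FROM backup_location l
    WHERE l.record_id = b.record_id
);
```

Either way, the query must visit the full backup_location_rid index (or
the leading column of the primary key), so with 162 million rows this
will never be cheap; it is a candidate for batching or for maintaining
a materialized "unbacked-up" set instead.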