Thread: Very long time to execute and Update, suggestions?
Hi all, I have a table with a little over 200,000 columns in it that I need to update using a regular expression. I admit that though I am not a beginner and postgres, I am also far from an expert. :p I tried to create an Index that would optimize the UPDATE but I may have made an error in how I built it. Here is the table structure, the index I tried to create and an 'EXPLAIN ANALYZE' of the UPDATE (though I am still just now learning how to use 'EXPLAIN'). tle-bu=> \d file_info_3 Table "public.file_info_3" Column | Type | Modifiers -----------------+----------------------+----------------------------------------- file_group_name | text | not null file_group_uid | bigint | not null file_mod_time | bigint | not null file_name | text | not null file_parent_dir | text | not null file_perm | text | not null file_size | bigint | not null file_type | character varying(2) | not null default 'f'::character varying file_user_name | text | not null file_user_uid | bigint | not null file_backup | boolean | not null default true file_display | boolean | not null default false file_restore | boolean | not null default false Indexes: "file_info_3_display_idx" btree (file_type, file_parent_dir, file_name) Here is the EXPLAIN: tle-bu=> EXPLAIN ANALYZE UPDATE file_info_3 SET file_backup='f' WHERE file_parent_dir~'^/home' OR (file_parent_dir='/' AND file_name='home'); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Seq Scan on file_info_3 (cost=0.00..7770.00 rows=1006 width=206) (actual time=1050.813..5648.462 rows=67835 loops=1) Filter: ((file_parent_dir ~ '^/home'::text) OR ((file_parent_dir = '/'::text) AND (file_name = 'home'::text))) Total runtime: 68498.898 ms (3 rows) I thought that it would have used the index because 'file_parent_dir' and 'file_name' are in the index but is I am reading the "EXPLAIN" output right it isn't but is instead doing a sequencial scan. If that is the case, how would I best built the index? Should I have just used the 'file_parent_dir' and 'file_name'? Thanks all!! Madison -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Madison Kelly (Digimer) TLE-BU, The Linux Experience; Back Up http://tle-bu.thelinuxexperience.com -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Philip Hallstrom wrote: > I'm not sure about this which is why I'm replying off list, but your > index is on file_type, file_parent_dir, and file_name and you're query > is on file_parent_dir and file_name. > > I seem to remember reading that that the index will only get used if the > columns in the where clause "match up" "in order". > > That is um... if you have an index on columns a and b and a where clause > of "b = 1" it woin't use the index since the index "looks like" > > a, b > a, b > a, b > etc... > > Does that make any sense? Not sure if that's right or not, but easy > enough to remove the "file_type" from your index and try it. > > post back to the list if that's it. > > -philip Thanks for the reply! I have played around a little more and have created a few different test Indexes and it looks like it is the regex that is causing it to do the sequential scan. If I remove the regex and create a 'file_parent_dir', 'file_name' index it will use it. If I create an Index just for 'file_parent_dir' and change my UPDATE to just look for the regex '... WHERE file_parent_dir~'^/<dir>'...' it will still do the sequential scan anyway. So I need to either find an Index that will work with regexes or re-write my code to update each subdirectory separately and use simpler UPDATE statement for each. Thanks again! Madison PS - I cc'ed the list to follow up on what I found out so far. (Hi list!) -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Madison Kelly (Digimer) TLE-BU, The Linux Experience; Back Up http://tle-bu.thelinuxexperience.com -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> So I need to either find an Index that will work with regexes or > re-write my code to update each subdirectory separately and use simpler > UPDATE statement for each. Why don't you use a LTREE type to model your directory tree ? It's been designed specifically for this purpose and has indexed regular expression search. http://www.sai.msu.su/~megera/postgres/gist/ltree/ http://www.sai.msu.su/~megera/postgres/gist/