Thread: Very long time to execute and Update, suggestions?

Very long time to execute and Update, suggestions?

From
Madison Kelly
Date:
Hi all,

   I have a table with a little over 200,000 columns in it that I need
to update using a regular expression. I admit that though I am not a
beginner and postgres, I am also far from an expert. :p

   I tried to create an Index that would optimize the UPDATE but I may
have made an error in how I built it. Here is the table structure, the
index I tried to create and an 'EXPLAIN ANALYZE' of the UPDATE (though I
  am still just now learning how to use 'EXPLAIN').

tle-bu=> \d file_info_3
                             Table "public.file_info_3"
      Column      |         Type         |                Modifiers
-----------------+----------------------+-----------------------------------------
  file_group_name | text                 | not null
  file_group_uid  | bigint               | not null
  file_mod_time   | bigint               | not null
  file_name       | text                 | not null
  file_parent_dir | text                 | not null
  file_perm       | text                 | not null
  file_size       | bigint               | not null
  file_type       | character varying(2) | not null default
'f'::character varying
  file_user_name  | text                 | not null
  file_user_uid   | bigint               | not null
  file_backup     | boolean              | not null default true
  file_display    | boolean              | not null default false
  file_restore    | boolean              | not null default false
Indexes:
     "file_info_3_display_idx" btree (file_type, file_parent_dir, file_name)

   Here is the EXPLAIN:

tle-bu=> EXPLAIN ANALYZE UPDATE file_info_3 SET file_backup='f' WHERE
file_parent_dir~'^/home' OR (file_parent_dir='/' AND file_name='home');
                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
  Seq Scan on file_info_3  (cost=0.00..7770.00 rows=1006 width=206)
(actual time=1050.813..5648.462 rows=67835 loops=1)
    Filter: ((file_parent_dir ~ '^/home'::text) OR ((file_parent_dir =
'/'::text) AND (file_name = 'home'::text)))
  Total runtime: 68498.898 ms
(3 rows)

   I thought that it would have used the index because 'file_parent_dir'
and 'file_name' are in the index but is I am reading the "EXPLAIN"
output right it isn't but is instead doing a sequencial scan. If that is
the case, how would I best built the index? Should I have just used the
'file_parent_dir' and 'file_name'?

   Thanks all!!

Madison

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Madison Kelly (Digimer)
TLE-BU, The Linux Experience; Back Up
http://tle-bu.thelinuxexperience.com
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Re: Very long time to execute and Update, suggestions?

From
Madison Kelly
Date:
Philip Hallstrom wrote:
> I'm not sure about this which is why I'm replying off list, but your
> index is on file_type, file_parent_dir, and file_name and you're query
> is on file_parent_dir and file_name.
>
> I seem to remember reading that that the index will only get used if the
> columns in the where clause "match up" "in order".
>
> That is um... if you have an index on columns a and b and a where clause
> of "b = 1" it woin't use the index since the index "looks like"
>
> a, b
> a, b
> a, b
> etc...
>
> Does that make any sense?  Not sure if that's right or not, but easy
> enough to remove the "file_type" from your index and try it.
>
> post back to the list if that's it.
>
> -philip

Thanks for the reply!

   I have played around a little more and have created a few different
test Indexes and it looks like it is the regex that is causing it to do
the sequential scan. If I remove the regex and create a
'file_parent_dir', 'file_name' index it will use it. If I create an
Index just for 'file_parent_dir' and change my UPDATE to just look for
the regex '... WHERE file_parent_dir~'^/<dir>'...' it will still do the
sequential scan anyway.

   So I need to either find an Index that will work with regexes or
re-write my code to update each subdirectory separately and use simpler
UPDATE statement for each.

   Thanks again!

Madison

PS - I cc'ed the list to follow up on what I found out so far. (Hi list!)


--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Madison Kelly (Digimer)
TLE-BU, The Linux Experience; Back Up
http://tle-bu.thelinuxexperience.com
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Re: Very long time to execute and Update, suggestions?

From
PFC
Date:
>    So I need to either find an Index that will work with regexes or
> re-write my code to update each subdirectory separately and use simpler
> UPDATE statement for each.

    Why don't you use a LTREE type to model your directory tree ? It's been
designed specifically for this purpose and has indexed regular expression
search.

http://www.sai.msu.su/~megera/postgres/gist/ltree/
http://www.sai.msu.su/~megera/postgres/gist/