Re: tsearch_core patch: permissions and security issues - Mailing list pgsql-hackers

From Tom Lane
Subject Re: tsearch_core patch: permissions and security issues
Msg-id 19307.1181850561@sss.pgh.pa.us
In response to Re: tsearch_core patch: permissions and security issues  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-hackers
Oleg Bartunov <oleg@sai.msu.su> writes:
> You're correct. But we can't defend users from all possible errors. 
> Other side, that we need somehow to help user to identify what fts 
> configuration was used to produce tsvector. For example, comment on
> tsvector column would be useful, but we don't know how to do this
> automatically.

Yeah, I was wondering about that too.  The only way we could relax the
superuser, you-better-know-what-you're-doing restriction on changing
configurations would be if we had a way to identify which tsvector
columns needed to be updated.  Right now that's pretty hard to find out
because the references to configurations are buried in the bodies of
trigger functions.  That whole trigger-function business is not the
nicest part of tsearch2, either ... it'd be better if we could automate
tsvector maintenance more.

One thing I was thinking about is that rather than storing a physical
tsvector column, people might index a "virtual" column using functional
indexes:
    create index ... (to_tsvector('english', big_text_col))

which could be queried
    select ... where to_tsvector('english', big_text_col) @@ tsquery

Assuming that the index is lossy, the index condition would have to be
rechecked, so to_tsvector() would have to be recomputed, but only at the
rows identified as candidate matches by the index.  The I/O savings from
eliminating the heap's tsvector column might counterbalance the extra
CPU for recomputing tsvectors.  Or not, but in any case this is
attractive because it doesn't need any handmade maintenance support like
a trigger --- the regular index maintenance code does it all.
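A minimal sketch of the functional-index idea, spelled out in full. The
table name docs, its id column, the index name, and the query terms are
hypothetical; only big_text_col and the to_tsvector() call come from the
text above. It also assumes a server where two-argument to_tsvector() and
to_tsquery() are available as built-in functions and where GiST can index
a tsvector expression.

```sql
-- Hypothetical table: only big_text_col is taken from the message.
CREATE TABLE docs (
    id           serial PRIMARY KEY,
    big_text_col text
);

-- Index the "virtual" tsvector column. No tsvector is stored in the heap
-- and no trigger is needed; ordinary index maintenance keeps it current.
CREATE INDEX docs_fts_idx ON docs
    USING gist (to_tsvector('english', big_text_col));

-- The WHERE clause repeats the indexed expression exactly, which is what
-- lets the planner consider docs_fts_idx for this query.
SELECT id
FROM docs
WHERE to_tsvector('english', big_text_col)
      @@ to_tsquery('english', 'index & maintenance');
```

The configuration name has to be written out in both the index definition
and the query; an expression that differs from the indexed one will not be
matched against the index.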

It strikes me that we could play the same kind of game we played to make
nextval() references to sequences be recognized as dependencies on
sequences.  Invent a "regconfig" OID type that's just like regclass
except it handles OIDs of ts_config entries instead of pg_class entries,
and make the first argument of to_tsvector be one of those:
    create index ... (to_tsvector('english'::regconfig, big_text_col))

Now dependency.c can be taught to recognize the regconfig Const as
depending on the referenced ts_config entry, and voila we have a
pg_depend entry showing that the index depends on the configuration.
What we actually do about it is another question, but this at least
gets the knowledge into the system.
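As a sketch of what the recorded dependency could look like from SQL: the
example below assumes a server that actually provides a regconfig type and
keeps configurations in a catalog named pg_ts_config (released PostgreSQL
uses that name rather than ts_config as written above). The table docs_cfg
and the index name are hypothetical.

```sql
-- Hypothetical table and index; the regconfig constant is the key part.
CREATE TABLE docs_cfg (
    id           serial PRIMARY KEY,
    big_text_col text
);

CREATE INDEX docs_cfg_fts_idx ON docs_cfg
    USING gist (to_tsvector('english'::regconfig, big_text_col));

-- List relations (here, indexes) that pg_depend records as depending on
-- a text search configuration, together with the configuration's name.
SELECT d.objid::regclass     AS dependent_relation,
       d.refobjid::regconfig AS configuration
FROM pg_depend d
WHERE d.classid    = 'pg_class'::regclass
  AND d.refclassid = 'pg_ts_config'::regclass;
```

A query of this shape is one way to answer "which objects need attention
if this configuration changes", which is the knowledge the paragraph above
wants the system to have.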

[ thinks some more... ]  If we revived the GENERATED AS patch,
you could imagine computing tsvector columns via "GENERATED AS
to_tsvector('english'::regconfig, big_text_col)" instead of a
trigger, and then again you've got the dependency exposed where
the system can see it.  I don't wanna try to do that for 8.3,
but it might be a good path to pursue in future, instead of assuming
that triggers will be the way forevermore.
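For illustration only, here is roughly what a generated tsvector column
could look like. The message does not settle on a syntax, so the spelling
below (GENERATED ALWAYS AS (...) STORED, as accepted by later PostgreSQL
releases) is an assumption, as are the table name docs_generated, the
column name big_text_tsv, and the index name.

```sql
-- Hypothetical table: big_text_tsv is computed by the system from
-- big_text_col, so no trigger function hides the configuration name.
CREATE TABLE docs_generated (
    id           serial PRIMARY KEY,
    big_text_col text,
    big_text_tsv tsvector
        GENERATED ALWAYS AS
            (to_tsvector('english'::regconfig, big_text_col)) STORED
);

-- Index the stored column directly.
CREATE INDEX docs_generated_fts_idx ON docs_generated
    USING gin (big_text_tsv);

-- Queries use the stored column, so to_tsvector() is not recomputed at
-- query time.
SELECT id
FROM docs_generated
WHERE big_text_tsv @@ to_tsquery('english', 'trigger');
```

Compared with the functional index, this keeps the tsvector in the heap
(more I/O, less CPU at query time) while still exposing the configuration
reference in the column definition rather than in a trigger body.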

Thoughts?
        regards, tom lane

