Thread: Triggers and Full Text Search *
More than a year ago, I implemented full text search on one of my sites. From the beginning, there was one problem (or at least, what I perceive to be a problem): when I use a script to insert many documents at once, they do *not* get indexed in fts. If a document is created or inserted one at a time, fts indexes immediately. The workaround I came up with was just to open each of those script inserted documents and then close them. As soon as they are opened, they get indexed.
I assume this has to do with the trigger, which is set to BEFORE, and which I carefully followed from the blog post that I got the code from. I wrote to that author at the time, but he was of no help. My thought was that the trigger was not firing, and thus the documents were not getting indexed, because until the document was actually there, there was nothing to index. Therefore, I thought a simple switch from BEFORE to AFTER would solve my problem. However, in the example in the official docs, BEFORE is used as well, so I abandoned that idea and decided to post this question.
Another solution I had in mind was to simply include an additional step in my insert script to sleep for one second, during which the current document would be opened, and hopefully indexed, and then closed, so the script could go on to the next document. Note my insert script is in Python and goes through Django. This is not a ‘pure’ postgresql operation, if that matters.
My questions are:
Does this sleep / open / close / proceed idea seem like a workable solution?
Is there a better workaround?
At first blush, I would think the speed of insertion would not be an issue for any trigger - it would seem to defeat the purpose - but am I wrong about that?
I am not (yet) posting the trigger code because this post is long already, and if your answers are 1) yes, 2) no and 3) triggers often work / fail like this, then there’s no point and we can wrap this up. But if not, I will happily post what I have. Thank you.
[...]
I am not (yet) posting the trigger code because this post is long already, and if your answers are 1) yes, 2) no and 3) triggers often work / fail like this, then there’s no point and we can wrap this up. But if not, I will happily post what I have. Thank you.
Andreas Joseph Krogh
- Ericson Smith
På tirsdag 21. april 2020 kl. 19:24:10, skrev Malik Rumi <malik.a.rumi@gmail.com>:[...]
I am not (yet) posting the trigger code because this post is long already, and if your answers are 1) yes, 2) no and 3) triggers often work / fail like this, then there’s no point and we can wrap this up. But if not, I will happily post what I have. Thank you.
This is too much prose for the regular programmer, show us the code, and point out what doesn't work for you, then we can help:-)--
Andreas Joseph Krogh
On 4/21/20 11:04 AM, Ericson Smith wrote: > I think COPY bypasses the triggers. No: https://www.postgresql.org/docs/12/sql-copy.html "COPY FROM will invoke any triggers and check constraints on the destination table. However, it will not invoke rules." > > Best Regards > - Ericson Smith > +1 876-375-9857 (whatsapp) > +1 646-483-3420 (sms) > > > > On Wed, Apr 22, 2020 at 12:32 AM Andreas Joseph Krogh > <andreas@visena.com <mailto:andreas@visena.com>> wrote: > > På tirsdag 21. april 2020 kl. 19:24:10, skrev Malik Rumi > <malik.a.rumi@gmail.com <mailto:malik.a.rumi@gmail.com>>: > > [...] > > I am not (yet) posting the trigger code because this post is > long already, and if your answers are 1) yes, 2) no and 3) > triggers often work / fail like this, then there’s no point and > we can wrap this up. But if not, I will happily post what I > have. Thank you. > > This is too much prose for the regular programmer, show us the code, > and point out what doesn't work for you, then we can help:-) > -- > Andreas Joseph Krogh > -- Adrian Klaver adrian.klaver@aklaver.com
SELECT setweight(to_tsvector(NEW.title), 'A') ||
setweight(to_tsvector(NEW.content), 'B') ||
setweight(to_tsvector(NEW.category), 'D') ||
setweight(to_tsvector(COALESCE(string_agg(tag.tag, ', '), '')), 'C')
INTO NEW.search_vector
FROM ktab_entry AS entry
LEFT JOIN ktab_entry_tags AS entry_tags ON entry_tags.entry_id = entry.id
LEFT JOIN ktab_tag AS tag ON tag.id = entry_tags.tag_id
WHERE entry.id = NEW.id
GROUP BY entry.id, category;
RETURN NEW;
END;
UPDATE ktab_entry SET id = id WHERE id IN (
SELECT entry_id FROM ktab_entry_tags WHERE tag_id = NEW.id
);
RETURN NEW;
END;
IF (TG_OP = 'DELETE') THEN
UPDATE ktab_entry SET id = id WHERE id = OLD.entry_id;
RETURN OLD;
ELSE
UPDATE ktab_entry SET id = id WHERE id = NEW.entry_id;
RETURN NEW;
END IF;
END;
SELECT setweight(to_tsvector(NEW.title), 'A') ||
setweight(to_tsvector(NEW.content), 'B') ||
setweight(to_tsvector(NEW.category), 'D') ||
setweight(to_tsvector(COALESCE(string_agg(tag.tag, ', '), '')), 'C')
INTO NEW.search_vector
FROM ktab_entry AS entry
LEFT JOIN ktab_entry_tags AS entry_tags ON entry_tags.entry_id = entry.id
LEFT JOIN ktab_tag AS tag ON tag.id = entry_tags.tag_id
WHERE entry.id = NEW.id
GROUP BY entry.id, category;
RETURN NEW;
END;
IF (TG_OP = 'DELETE') THEN
UPDATE ktab_entry SET id = id WHERE id = OLD.entry_id;
RETURN OLD;
ELSE
UPDATE ktab_entry SET id = id WHERE id = NEW.entry_id;
RETURN NEW;
END IF;
END;
I think COPY bypasses the triggers.Best Regards
- Ericson Smith+1 876-375-9857 (whatsapp)+1 646-483-3420 (sms)On Wed, Apr 22, 2020 at 12:32 AM Andreas Joseph Krogh <andreas@visena.com> wrote:På tirsdag 21. april 2020 kl. 19:24:10, skrev Malik Rumi <malik.a.rumi@gmail.com>:[...]
I am not (yet) posting the trigger code because this post is long already, and if your answers are 1) yes, 2) no and 3) triggers often work / fail like this, then there’s no point and we can wrap this up. But if not, I will happily post what I have. Thank you.
This is too much prose for the regular programmer, show us the code, and point out what doesn't work for you, then we can help:-)--
Andreas Joseph Krogh
@Ericson,Forgive me for seeming dense, but how does COPY help or hurt here?@Andreas,I had to laugh at your reference to "prose". Would you believe I am actually a published playwright? Long before I started coding, of course. Old habits die hard.....entry_search_vector_triggerBEGIN
SELECT setweight(to_tsvector(NEW.title), 'A') ||
setweight(to_tsvector(NEW.content), 'B') ||
setweight(to_tsvector(NEW.category), 'D') ||
setweight(to_tsvector(COALESCE(string_agg(tag.tag, ', '), '')), 'C')
INTO NEW.search_vector
FROM ktab_entry AS entry
LEFT JOIN ktab_entry_tags AS entry_tags ON entry_tags.entry_id = entry.id
LEFT JOIN ktab_tag AS tag ON tag.id = entry_tags.tag_id
WHERE entry.id = NEW.id
GROUP BY entry.id, category;
RETURN NEW;
END;
tag_search_vector_triggerBEGIN
UPDATE ktab_entry SET id = id WHERE id IN (
SELECT entry_id FROM ktab_entry_tags WHERE tag_id = NEW.id
);
RETURN NEW;
END;
tags_search_vector_triggerBEGIN
IF (TG_OP = 'DELETE') THEN
UPDATE ktab_entry SET id = id WHERE id = OLD.entry_id;
RETURN OLD;
ELSE
UPDATE ktab_entry SET id = id WHERE id = NEW.entry_id;
RETURN NEW;
END IF;
END;
search_vector_updateBEGIN
SELECT setweight(to_tsvector(NEW.title), 'A') ||
setweight(to_tsvector(NEW.content), 'B') ||
setweight(to_tsvector(NEW.category), 'D') ||
setweight(to_tsvector(COALESCE(string_agg(tag.tag, ', '), '')), 'C')
INTO NEW.search_vector
FROM ktab_entry AS entry
LEFT JOIN ktab_entry_tags AS entry_tags ON entry_tags.entry_id = entry.id
LEFT JOIN ktab_tag AS tag ON tag.id = entry_tags.tag_id
WHERE entry.id = NEW.id
GROUP BY entry.id, category;
RETURN NEW;
END;
search_vector_update (tags)BEGIN
IF (TG_OP = 'DELETE') THEN
UPDATE ktab_entry SET id = id WHERE id = OLD.entry_id;
RETURN OLD;
ELSE
UPDATE ktab_entry SET id = id WHERE id = NEW.entry_id;
RETURN NEW;
END IF;
END;
Thank you!“None of you has faith until he loves for his brother or his neighbor what he loves for himself.”On Tue, Apr 21, 2020 at 1:05 PM Ericson Smith <esconsult1@gmail.com> wrote:I think COPY bypasses the triggers.Best Regards
- Ericson Smith+1 876-375-9857 (whatsapp)+1 646-483-3420 (sms)On Wed, Apr 22, 2020 at 12:32 AM Andreas Joseph Krogh <andreas@visena.com> wrote:På tirsdag 21. april 2020 kl. 19:24:10, skrev Malik Rumi <malik.a.rumi@gmail.com>:[...]
I am not (yet) posting the trigger code because this post is long already, and if your answers are 1) yes, 2) no and 3) triggers often work / fail like this, then there’s no point and we can wrap this up. But if not, I will happily post what I have. Thank you.
This is too much prose for the regular programmer, show us the code, and point out what doesn't work for you, then we can help:-)--
Andreas Joseph Krogh
On 4/21/20 11:21 AM, Malik Rumi wrote: > @Ericson, > Forgive me for seeming dense, but how does COPY help or hurt here? > > @Andreas, > I had to laugh at your reference to "prose". Would you believe I am > actually a published playwright? Long before I started coding, of > course. Old habits die hard..... The script code via Python/Django/psycopg2 would be helpful as my suspicion is that you are seeing the effects of open transactions. > > entry_search_vector_trigger > BEGIN > SELECT setweight(to_tsvector(NEW.title), 'A') || > setweight(to_tsvector(NEW.content), 'B') || > setweight(to_tsvector(NEW.category), 'D') || > setweight(to_tsvector(COALESCE(string_agg(tag.tag, > ', '), '')), 'C') > INTO NEW.search_vector > FROM ktab_entry AS entry > LEFT JOIN ktab_entry_tags AS entry_tags ON > entry_tags.entry_id = entry.id <http://entry.id> > LEFT JOIN ktab_tag AS tag ON tag.id <http://tag.id> = > entry_tags.tag_id > WHERE entry.id <http://entry.id> = NEW.id > GROUP BY entry.id <http://entry.id>, category; > RETURN NEW; > END; > > tag_search_vector_trigger > BEGIN > UPDATE ktab_entry SET id = id WHERE id IN ( > SELECT entry_id FROM ktab_entry_tags WHERE tag_id = NEW.id > ); > RETURN NEW; > END; > > tags_search_vector_trigger > BEGIN > IF (TG_OP = 'DELETE') THEN > UPDATE ktab_entry SET id = id WHERE id = OLD.entry_id; > RETURN OLD; > ELSE > UPDATE ktab_entry SET id = id WHERE id = NEW.entry_id; > RETURN NEW; > END IF; > END; > > search_vector_update > BEGIN > SELECT setweight(to_tsvector(NEW.title), 'A') || > setweight(to_tsvector(NEW.content), 'B') || > setweight(to_tsvector(NEW.category), 'D') || > setweight(to_tsvector(COALESCE(string_agg(tag.tag, > ', '), '')), 'C') > INTO NEW.search_vector > FROM ktab_entry AS entry > LEFT JOIN ktab_entry_tags AS entry_tags ON > entry_tags.entry_id = entry.id <http://entry.id> > LEFT JOIN ktab_tag AS tag ON tag.id <http://tag.id> = > entry_tags.tag_id > WHERE entry.id <http://entry.id> = NEW.id > GROUP BY entry.id <http://entry.id>, category; > RETURN NEW; > END; > > search_vector_update (tags) > BEGIN > IF (TG_OP = 'DELETE') THEN > UPDATE ktab_entry SET id = id WHERE id = OLD.entry_id; > RETURN OLD; > ELSE > UPDATE ktab_entry SET id = id WHERE id = NEW.entry_id; > RETURN NEW; > END IF; > END; > > Thank you! > > > > */“None of you has faith until he loves for his brother or his neighbor > what he loves for himself.”/* > > > On Tue, Apr 21, 2020 at 1:05 PM Ericson Smith <esconsult1@gmail.com > <mailto:esconsult1@gmail.com>> wrote: > > I think COPY bypasses the triggers. > > Best Regards > - Ericson Smith > +1 876-375-9857 (whatsapp) > +1 646-483-3420 (sms) > > > > On Wed, Apr 22, 2020 at 12:32 AM Andreas Joseph Krogh > <andreas@visena.com <mailto:andreas@visena.com>> wrote: > > På tirsdag 21. april 2020 kl. 19:24:10, skrev Malik Rumi > <malik.a.rumi@gmail.com <mailto:malik.a.rumi@gmail.com>>: > > [...] > > I am not (yet) posting the trigger code because this post is > long already, and if your answers are 1) yes, 2) no and 3) > triggers often work / fail like this, then there’s no point > and we can wrap this up. But if not, I will happily post > what I have. Thank you. > > This is too much prose for the regular programmer, show us the > code, and point out what doesn't work for you, then we can help:-) > -- > Andreas Joseph Krogh > -- Adrian Klaver adrian.klaver@aklaver.com
On Tue, 2020-04-21 at 12:24 -0500, Malik Rumi wrote: > More than a year ago, I implemented full text search on one of my sites. > From the beginning, there was one problem (or at least, what I perceive > to be a problem): when I use a script to insert many documents at once, > they do *not* get indexed in fts. If a document is created or inserted > one at a time, fts indexes immediately. The workaround I came up with > was just to open each of those script inserted documents and then close > them. As soon as they are opened, they get indexed. A trigger will fire and update the index immediately. That opening and closing you are talking about does not sound like a database activity. Rather, it sounds like your software is delaying the actual insert into the database, which would of course explain why you cannot find it in the index. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
migrations.RunSQL('''CREATE OR REPLACE FUNCTION entry_search_vector_trigger() RETURNS trigger AS $$
BEGIN
SELECT setweight(to_tsvector(NEW.title), 'A') ||
setweight(to_tsvector(NEW.content), 'B') ||
setweight(to_tsvector(NEW.category), 'D') ||
setweight(to_tsvector(COALESCE(string_agg(tags.tag, ', '), '')), 'C')
INTO NEW.search_vector
FROM ktab_entry AS entry
LEFT JOIN ktab_entry_tags AS entry_tags ON entry_tags.entry_id = entry.id
LEFT JOIN ktab_tags AS tag ON tag.id = entry_tags.tag_id
WHERE entry.id = NEW.id
GROUP BY entry.id, category;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER search_vector_update BEFORE INSERT OR UPDATE ON ktab_entry
FOR EACH ROW EXECUTE PROCEDURE entry_search_vector_trigger();''')
# Trigger after ktab.Author is updated
'''
Since I don't have author, and besides, his author was a separate table - SKIP
CREATE OR REPLACE FUNCTION author_search_vector_trigger() RETURNS trigger AS $$
BEGIN
UPDATE ktab_entry SET id = id WHERE author_id = NEW.id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER search_vector_update AFTER INSERT OR UPDATE ON ktab_author
FOR EACH ROW EXECUTE PROCEDURE author_search_vector_trigger();
'''
# Trigger after ktab.Entry.tags are added, deleted from a entry
migrations.RunSQL('''CREATE OR REPLACE FUNCTION tags_search_vector_trigger() RETURNS trigger AS $$
BEGIN
IF (TG_OP = 'DELETE') THEN
UPDATE ktab_entry SET id = id WHERE id = OLD.entry_id;
RETURN OLD;
ELSE
UPDATE ktab_entry SET id = id WHERE id = NEW.entry_id;
RETURN NEW;
END IF;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER search_vector_update AFTER INSERT OR UPDATE OR DELETE ON ktab_entry_tags
FOR EACH ROW EXECUTE PROCEDURE tags_search_vector_trigger();
''')
# Trigger after ktab.Tag is updated
migrations.RunSQL('''CREATE OR REPLACE FUNCTION tag_search_vector_trigger() RETURNS trigger AS $$
BEGIN
UPDATE ktab_entry SET id = id WHERE id IN (
SELECT entry_id FROM ktab_entry_tags WHERE tag_id = NEW.id
);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER search_vector_update AFTER UPDATE ON ktab_tag
FOR EACH ROW EXECUTE PROCEDURE tag_search_vector_trigger();
My apologies - I did not look closely at the manual. Many many years ago (6.xx days I had a similar problem and leapt to answer).Could you post your CREATE TRIGGER statements as well?On Wed, Apr 22, 2020 at 1:21 AM Malik Rumi <malik.a.rumi@gmail.com> wrote:@Ericson,Forgive me for seeming dense, but how does COPY help or hurt here?@Andreas,I had to laugh at your reference to "prose". Would you believe I am actually a published playwright? Long before I started coding, of course. Old habits die hard.....entry_search_vector_triggerBEGIN
SELECT setweight(to_tsvector(NEW.title), 'A') ||
setweight(to_tsvector(NEW.content), 'B') ||
setweight(to_tsvector(NEW.category), 'D') ||
setweight(to_tsvector(COALESCE(string_agg(tag.tag, ', '), '')), 'C')
INTO NEW.search_vector
FROM ktab_entry AS entry
LEFT JOIN ktab_entry_tags AS entry_tags ON entry_tags.entry_id = entry.id
LEFT JOIN ktab_tag AS tag ON tag.id = entry_tags.tag_id
WHERE entry.id = NEW.id
GROUP BY entry.id, category;
RETURN NEW;
END;
tag_search_vector_triggerBEGIN
UPDATE ktab_entry SET id = id WHERE id IN (
SELECT entry_id FROM ktab_entry_tags WHERE tag_id = NEW.id
);
RETURN NEW;
END;
tags_search_vector_triggerBEGIN
IF (TG_OP = 'DELETE') THEN
UPDATE ktab_entry SET id = id WHERE id = OLD.entry_id;
RETURN OLD;
ELSE
UPDATE ktab_entry SET id = id WHERE id = NEW.entry_id;
RETURN NEW;
END IF;
END;
search_vector_updateBEGIN
SELECT setweight(to_tsvector(NEW.title), 'A') ||
setweight(to_tsvector(NEW.content), 'B') ||
setweight(to_tsvector(NEW.category), 'D') ||
setweight(to_tsvector(COALESCE(string_agg(tag.tag, ', '), '')), 'C')
INTO NEW.search_vector
FROM ktab_entry AS entry
LEFT JOIN ktab_entry_tags AS entry_tags ON entry_tags.entry_id = entry.id
LEFT JOIN ktab_tag AS tag ON tag.id = entry_tags.tag_id
WHERE entry.id = NEW.id
GROUP BY entry.id, category;
RETURN NEW;
END;
search_vector_update (tags)BEGIN
IF (TG_OP = 'DELETE') THEN
UPDATE ktab_entry SET id = id WHERE id = OLD.entry_id;
RETURN OLD;
ELSE
UPDATE ktab_entry SET id = id WHERE id = NEW.entry_id;
RETURN NEW;
END IF;
END;
Thank you!“None of you has faith until he loves for his brother or his neighbor what he loves for himself.”On Tue, Apr 21, 2020 at 1:05 PM Ericson Smith <esconsult1@gmail.com> wrote:I think COPY bypasses the triggers.Best Regards
- Ericson Smith+1 876-375-9857 (whatsapp)+1 646-483-3420 (sms)On Wed, Apr 22, 2020 at 12:32 AM Andreas Joseph Krogh <andreas@visena.com> wrote:På tirsdag 21. april 2020 kl. 19:24:10, skrev Malik Rumi <malik.a.rumi@gmail.com>:[...]
I am not (yet) posting the trigger code because this post is long already, and if your answers are 1) yes, 2) no and 3) triggers often work / fail like this, then there’s no point and we can wrap this up. But if not, I will happily post what I have. Thank you.
This is too much prose for the regular programmer, show us the code, and point out what doesn't work for you, then we can help:-)--
Andreas Joseph Krogh