Thread: tsearch2 patch status report

tsearch2 patch status report

From
Tom Lane
Date:
I've applied version 0.58 of the patch with a lot of further
editorializing.  I feel fairly confident now in the code that interfaces
between tsearch and the rest of the system, but a lot of the
lowest-level "guts" of tsearch (mainly in src/backend/tsearch/*.c and
src/backend/utils/adt/ts*.c) left my eyes glazing over.  Perhaps someone
else can make an extra review pass over that stuff.

I am quite confident that this commit broke the MSVC build, which seems
to need to know individually about each shared library ... Magnus,
can you do something about that?  We'll see what other portability
problems emerge from the buildfarm.

The main thing that is lacking at the moment is documentation.  The
stuff Bruce has been working on will be good introductory material,
but we've got basically zip in reference material.  I'll do some work
on that over the next couple of days, but there's probably room for
more hands.

Also, we need to decide what to do with contrib/tsearch2, which is
currently DOA because of conflicts with the new core code.  We could
either rip it out entirely, or try to convert it into a compatibility
package.  In view of the renamings of functions we agreed to do, I
think there is some scope for a compatibility package, but I have no
time to work on that.

This is, by a wide margin, the largest single patch ever to hit the
Postgres CVS tree.  Congratulations to Oleg and Teodor on seeing
it through!
        regards, tom lane


Re: tsearch2 patch status report

From
Andrew Dunstan
Date:

Tom Lane wrote:
> I am quite confident that this commit broke the MSVC build, which seems
> to need to know individually about each shared library ... Magnus,
> can you do something about that?  We'll see what other portability
> problems emerge from the buildfarm.
>
>
>   

You broke my shiny new MinGW and Cygwin buildfarm members too :-) (MSVC 
is on the way - I'll try to have it up in a few days).

cheers

andrew






Re: tsearch2 patch status report

From
"Joshua D. Drake"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Tom Lane wrote:

> Also, we need to decide what to do with contrib/tsearch2, which is
> currently DOA because of conflicts with the new core code.  We could
> either rip it out entirely, or try to convert it into a compatibility
> package.  In view of the renamings of functions we agreed to do, I
> think there is some scope for a compatibility package, but I have no
> time to work on that.

I saw we leave it as a stub with a readme pointing to the docs.

Sincerely,

Joshua D. Drake

> 
> This is, by a wide margin, the largest single patch ever to hit the
> Postgres CVS tree.  Congratulations to Oleg and Teodor on seeing
> it through!
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>                http://archives.postgresql.org
> 


- --
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/        UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGyk80ATb/zqfZUUQRAhbQAJwP95u10LTZ/apiUELtT2GthIZHfQCdGxDh
JRfdszL69TQOBD/6hlVZLuA=
=h9E9
-----END PGP SIGNATURE-----


Re: tsearch2 patch status report

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> You broke my shiny new MinGW and Cygwin buildfarm members too :-)

Yeah, I was just looking at that.  I seem to recall that the fu000001.o(.idata$3+0xc): undefined reference to
`libpostgres_a_iname'
bleat is a symptom of a reference to a variable that isn't marked
DLLIMPORT ... but CurrentMemoryContext certainly is, so there's not
anything here sufficient to fix it.  I trust someone with access to
a Windows build environment will dig into that.
        regards, tom lane


Re: tsearch2 patch status report

From
Bruce Momjian
Date:
Tom Lane wrote:
> The main thing that is lacking at the moment is documentation.  The
> stuff Bruce has been working on will be good introductory material,
> but we've got basically zip in reference material.  I'll do some work
> on that over the next couple of days, but there's probably room for
> more hands.

Oleg and Teodor did provide reference documentation.  You can see the
SGML here:
http://momjian.us/expire/textsearch/SGML/ref/

The SQL commands were in a state of flux so I haven't worked on them
yet.  I can start now.

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: tsearch2 patch status report

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Oleg and Teodor did provide reference documentation.  You can see the
> SGML here:
>     http://momjian.us/expire/textsearch/SGML/ref/
> The SQL commands were in a state of flux so I haven't worked on them
> yet.  I can start now.

OK.  I whacked around the command syntax a bit in order to cut down the
number of keywords the grammar needed to know about.  (Every new keyword
creates a distributed cost in the size and speed of the parser, so we
shouldn't create 'em when we don't have to.)  So I guess I'm on the hook
to get the command syntax reference pages done while the reality is
fresh in my mind.  Will get on it tomorrow.

The other areas that need work include datatype and function reference
documentation --- how do you want to attack that?  Should we create new
<sect1> sections in those chapters?
        regards, tom lane


Re: tsearch2 patch status report

From
Magnus Hagander
Date:
On Mon, Aug 20, 2007 at 10:50:49PM -0400, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > You broke my shiny new MinGW and Cygwin buildfarm members too :-)
> 
> Yeah, I was just looking at that.  I seem to recall that the 
>     fu000001.o(.idata$3+0xc): undefined reference to `libpostgres_a_iname'
> bleat is a symptom of a reference to a variable that isn't marked
> DLLIMPORT ... but CurrentMemoryContext certainly is, so there's not
> anything here sufficient to fix it.  I trust someone with access to
> a Windows build environment will dig into that.

Mingw fixed, I think. I think that will also fix cygwin, but less sure
there..

//Magnus


Re: tsearch2 patch status report

From
Magnus Hagander
Date:
On Mon, Aug 20, 2007 at 09:23:25PM -0400, Tom Lane wrote:
> I've applied version 0.58 of the patch with a lot of further
> editorializing.  I feel fairly confident now in the code that interfaces
> between tsearch and the rest of the system, but a lot of the
> lowest-level "guts" of tsearch (mainly in src/backend/tsearch/*.c and
> src/backend/utils/adt/ts*.c) left my eyes glazing over.  Perhaps someone
> else can make an extra review pass over that stuff.
> 
> I am quite confident that this commit broke the MSVC build, which seems
> to need to know individually about each shared library ... Magnus,
> can you do something about that?  We'll see what other portability
> problems emerge from the buildfarm.

Looking at that now.

I get  awarning from the following line in bison generated parse.h:
#define TEXT 577


Because TEXT is a macro that's used in the windows headers. (A redefinition
warning).

Any chance to change that, or is it coming out of the syntax itself? (bison
newbie here, as you know :-P)


> Also, we need to decide what to do with contrib/tsearch2, which is
> currently DOA because of conflicts with the new core code.  We could
> either rip it out entirely, or try to convert it into a compatibility
> package.  In view of the renamings of functions we agreed to do, I
> think there is some scope for a compatibility package, but I have no
> time to work on that.

OTOH, if we do it as a compat package, we need to set a firm end-date on
it, so we don't have to maintain it forever. Given the issues always at
hand for doing such an upgrade, my vote is actually for ripping it out
completely and take the migration pain once and then be done with it.

> This is, by a wide margin, the largest single patch ever to hit the
> Postgres CVS tree.  Congratulations to Oleg and Teodor on seeing
> it through!

Yes, congratulations indeed. Great to see this in!

//Magnus


Re: tsearch2 patch status report

From
Alvaro Herrera
Date:
Magnus Hagander wrote:
> On Mon, Aug 20, 2007 at 09:23:25PM -0400, Tom Lane wrote:

> I get  awarning from the following line in bison generated parse.h:
> #define TEXT 577

That's easy, change the terminal to TEXT_P on gram.y and keywords.c.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: tsearch2 patch status report

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Magnus Hagander wrote:
>> I get  awarning from the following line in bison generated parse.h:
>> #define TEXT 577

> That's easy, change the terminal to TEXT_P on gram.y and keywords.c.

Yeah, I was wondering if we'd have to do that with any of the new
keywords.  Will fix, if you didn't already.
        regards, tom lane


Re: tsearch2 patch status report

From
Magnus Hagander
Date:
On Tue, Aug 21, 2007 at 10:59:52AM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Magnus Hagander wrote:
> >> I get  awarning from the following line in bison generated parse.h:
> >> #define TEXT 577
> 
> > That's easy, change the terminal to TEXT_P on gram.y and keywords.c.
> 
> Yeah, I was wondering if we'd have to do that with any of the new
> keywords.  Will fix, if you didn't already.

Nope, not yet since it's just a warning, so please do.

//Magnus


Re: tsearch2 patch status report

From
Oleg Bartunov
Date:
On Mon, 20 Aug 2007, Tom Lane wrote:

> I've applied version 0.58 of the patch with a lot of further
> editorializing.  I feel fairly confident now in the code that interfaces

Great ! Just checked and most things after trivial changes are working !
We need to summarize changes and provide upgraide guide.

> The main thing that is lacking at the moment is documentation.  The
> stuff Bruce has been working on will be good introductory material,
> but we've got basically zip in reference material.  I'll do some work
> on that over the next couple of days, but there's probably room for
> more hands.

We have SQL reference and I'm sure Bruce should have it too. Bruce,
let me know when you do last changes, so I could read docs again

>
> Also, we need to decide what to do with contrib/tsearch2, which is
> currently DOA because of conflicts with the new core code.  We could
> either rip it out entirely, or try to convert it into a compatibility
> package.  In view of the renamings of functions we agreed to do, I
> think there is some scope for a compatibility package, but I have no
> time to work on that.

Probably, we could leave tsearch2 in contrib as a compat module and 
explicitly define 8.4 will be the last release with tsearch2 support.

>
> This is, by a wide margin, the largest single patch ever to hit the
> Postgres CVS tree.  Congratulations to Oleg and Teodor on seeing
> it through!

It was really hard task for all of us. It's pity I don't drink at all, but
Teodor will drink the health of "text search in PostgreSQL" today !
Tom, as I know,  already drain out bottle of vodka, so Bruce, you need to 
join us !


    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Re: tsearch2 patch status report

From
Devrim GÜNDÜZ
Date:
Hi,

On Tue, 2007-08-21 at 20:04 +0400, Oleg Bartunov wrote:

> It was really hard task for all of us. It's pity I don't drink at all,
> but Teodor will drink the health of "text search in PostgreSQL"
> today .

Haha :) I'd drink Vodka today, with some of my friends. (I'll drink
Absolut, and I know you guys do not like it ;) )

Thanks for the good work guys.

Cheers,
--
Devrim GÜNDÜZ
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/



Re: tsearch2 patch status report

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> I've applied version 0.58 of the patch with a lot of further
> editorializing.  I feel fairly confident now in the code that interfaces
> between tsearch and the rest of the system, but a lot of the
> lowest-level "guts" of tsearch (mainly in src/backend/tsearch/*.c and
> src/backend/utils/adt/ts*.c) left my eyes glazing over.  Perhaps someone
> else can make an extra review pass over that stuff.

I'm skimming through tsearch code, trying to understand it. I'd like to
see more comments, at least function-comments explaining what each
function does, what the arguments are etc. I can try to add them as I go
as well, and send a patch.

The memory management of the init functions looks weird. In spell.c,
there's this piece of code:

> /*
>  * during initialization dictionary requires a lot
>  * of memory, so it will use temporary context
>  */
> static MemoryContext tmpCtx = NULL;
> 
> #define tmpalloc(sz)  MemoryContextAlloc(tmpCtx, (sz))
> #define tmpalloc0(sz)  MemoryContextAllocZero(tmpCtx, (sz))
> 
> static void
> checkTmpCtx(void)
> {
>     if (CurrentMemoryContext->firstchild == NULL)
>     {
>         tmpCtx = AllocSetContextCreate(CurrentMemoryContext,
>                                        "Ispell dictionary init context",
>                                        ALLOCSET_DEFAULT_MINSIZE,
>                                        ALLOCSET_DEFAULT_INITSIZE,
>                                        ALLOCSET_DEFAULT_MAXSIZE);
>     }
>     else
>         tmpCtx = CurrentMemoryContext->firstchild;
> }


checkTmpCtx is called by all the initialization functions in spell.c. I
believe it's assumed that if firstchild exists, it's a previously
allocated "init context". But there isn't actually any guarantee that
the CurrentMemoryContext doesn't already have a child context, in which
case we would use whatever the first child context happens to be. And at
least dispell_init calls
MemoryContextDeleteChildren(CurrentMemoryContext), again with no
guarantee that there isn't other unrelated child contexts. I think
dispell_init should create the new context before calling the functions
in spell.c, and destroy it at the end. I can submit a patch, unless I'm
missing something.

More comments as I get further...

PS. Nice to see tsearch in core!

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: tsearch2 patch status report

From
Tom Lane
Date:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> I'm skimming through tsearch code, trying to understand it. I'd like to
> see more comments, at least function-comments explaining what each
> function does, what the arguments are etc. I can try to add them as I go
> as well, and send a patch.

Yes, the patch is not well commented.  I improved matters in the
interface code that I worked over (eg tsearchcmds.c), but ran out of
patience for trying to comment the "guts" code.  Please send a patch
for any improvements you make.

> The memory management of the init functions looks weird. In spell.c,
> there's this piece of code:

I saw that and didn't care for it much, but didn't look into exactly
what would be needed to get rid of it.  (I did clean up another place
that had a *global* magic context pointer ... that looked like trouble
waiting to happen.)  If you make the outside caller create the context,
does it need to be passed explicitly or can it just be
CurrentMemoryContext when they are called?
        regards, tom lane


Re: tsearch2 patch status report

From
Heikki Linnakangas
Date:
Tom Lane wrote:
>> The memory management of the init functions looks weird. In spell.c,
>> there's this piece of code:
> 
> I saw that and didn't care for it much, but didn't look into exactly
> what would be needed to get rid of it.  (I did clean up another place
> that had a *global* magic context pointer ... that looked like trouble
> waiting to happen.)  If you make the outside caller create the context,
> does it need to be passed explicitly or can it just be
> CurrentMemoryContext when they are called?

I believe CurrentMemoryContext would work just fine. I'll put together a
patch.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: tsearch2 patch status report

From
Chris Browne
Date:
tgl@sss.pgh.pa.us (Tom Lane) writes:
> The main thing that is lacking at the moment is documentation.  The
> stuff Bruce has been working on will be good introductory material,
> but we've got basically zip in reference material.  I'll do some work
> on that over the next couple of days, but there's probably room for
> more hands.

If someone has either text or HTML formatted material that seems
plausible, I'd be happy to help get it into DocBook form.
-- 
"cbbrowne","@","acm.org"
http://cbbrowne.com/info/linuxxian.html
"Although  Unix is  more reliable,  NT may  become more  reliable with
time"  --   Ron  Redman,  deputy  technical  director   of  the  Fleet
Introduction Division of the Aegis Program Executive Office, US Navy.


Re: tsearch2 patch status report

From
Bruce Momjian
Date:
Oleg Bartunov wrote:
> On Mon, 20 Aug 2007, Tom Lane wrote:
> 
> > I've applied version 0.58 of the patch with a lot of further
> > editorializing.  I feel fairly confident now in the code that interfaces
> 
> Great ! Just checked and most things after trivial changes are working !
> We need to summarize changes and provide upgraide guide.
> 
> > The main thing that is lacking at the moment is documentation.  The
> > stuff Bruce has been working on will be good introductory material,
> > but we've got basically zip in reference material.  I'll do some work
> > on that over the next couple of days, but there's probably room for
> > more hands.
> 
> We have SQL reference and I'm sure Bruce should have it too. Bruce,
> let me know when you do last changes, so I could read docs again

OK, I am unclear how the main documentation is going to change.  I will
look over the reference pages and get them online soon and we can all
see them.

> > This is, by a wide margin, the largest single patch ever to hit the
> > Postgres CVS tree.  Congratulations to Oleg and Teodor on seeing
> > it through!
> 
> It was really hard task for all of us. It's pity I don't drink at all, but
> Teodor will drink the health of "text search in PostgreSQL" today !
> Tom, as I know,  already drain out bottle of vodka, so Bruce, you need to 
> join us !

Shame I don't drink either.

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: tsearch2 patch status report

From
"Chad Wagner"
Date:
Just a heads up, not sure if you guys are aware of it.  But one of the Makefile's (src/backend/tsearch/Makefile) added
bythis patch breaks the "build out of source tree" feature of autoconf/automake.  The problem is pretty
straightforward,and after adding $(srcdir) everything seems to be fine. <br /><br />Index:
src/backend/tsearch/Makefile<br/>===================================================================<br />RCS file:
/u01/mirrors/cvs/pgsql/pgsql/src/backend/tsearch/Makefile,v<br/>retrieving revision 1.1<br />diff -p -c - r1.1
Makefile<br/>*** src/backend/tsearch/Makefile        21 Aug 2007 01:11:18 -0000      1.1<br />---
src/backend/tsearch/Makefile       21 Aug 2007 23:58:37 -0000<br />*************** depend dep:<br />*** 31,37 ****<br
/> .PHONY: install-data <br />  install-data: $(DICTFILES) installdirs<br />        for i in $(DICTFILES); \<br
/>!              do $(INSTALL_DATA) $$i '$(DESTDIR)$(datadir)/$(DICTDIR)/'$$i; \<br />        done<br />  <br /> 
installdirs:<br/>--- 31,37 ---- <br />  .PHONY: install-data<br />  install-data: $(DICTFILES) installdirs<br />       
fori in $(DICTFILES); \<br />!               do $(INSTALL_DATA) $(srcdir)/$$i '$(DESTDIR)$(datadir)/$(DICTDIR)/'$$i;
\<br/>        done <br />  <br />  installdirs:<br /><br /> 

Re: tsearch2 patch status report

From
"Merlin Moncure"
Date:
On 8/21/07, Magnus Hagander <magnus@hagander.net> wrote:
> OTOH, if we do it as a compat package, we need to set a firm end-date on
> it, so we don't have to maintain it forever. Given the issues always at
> hand for doing such an upgrade, my vote is actually for ripping it out
> completely and take the migration pain once and then be done with it.

I would suggest making a pgfoundry project...that's what was done with
userlocks.  I'm pretty certain no one besides me has ever used the
wrappers I created...a lot more people use tsearch2 than userlocks
though.

merlin


Re: tsearch2 patch status report

From
Oleg Bartunov
Date:
On Tue, 21 Aug 2007, Chris Browne wrote:

> tgl@sss.pgh.pa.us (Tom Lane) writes:
>> The main thing that is lacking at the moment is documentation.  The
>> stuff Bruce has been working on will be good introductory material,
>> but we've got basically zip in reference material.  I'll do some work
>> on that over the next couple of days, but there's probably room for
>> more hands.
>
> If someone has either text or HTML formatted material that seems
> plausible, I'd be happy to help get it into DocBook form.
>

Documentation existed for months on DocBook form. I even got an offer to 
translate it to chinese while patch was evaluated. Bruce did a lot of
editorial work and now he maintain cvs version.

What we really need - is a summary of changes and upgrade guide.
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Re: tsearch2 patch status report

From
Tom Lane
Date:
"Chad Wagner" <chad.wagner@gmail.com> writes:
> Just a heads up, not sure if you guys are aware of it.  But one of the
> Makefile's (src/backend/tsearch/Makefile) added by this patch breaks the
> "build out of source tree" feature of autoconf/automake.  The problem is
> pretty straightforward, and after adding $(srcdir) everything seems to be
> fine.

Applied, thanks.  (Hm, I thought we had some buildfarm machines testing
VPATH builds these days?  Guess not ...)
        regards, tom lane


Re: tsearch2 patch status report

From
Cédric Villemain
Date:
Le mardi 21 août 2007, Oleg Bartunov a écrit :
> On Mon, 20 Aug 2007, Tom Lane wrote:
> > I've applied version 0.58 of the patch with a lot of further
> > editorializing.  I feel fairly confident now in the code that interfaces
>
> Great ! Just checked and most things after trivial changes are working !
> We need to summarize changes and provide upgraide guide.

Congratulations Teodor and Oleg !

>
> > Also, we need to decide what to do with contrib/tsearch2, which is
> > currently DOA because of conflicts with the new core code.  We could
> > either rip it out entirely, or try to convert it into a compatibility
> > package.  In view of the renamings of functions we agreed to do, I
> > think there is some scope for a compatibility package, but I have no
> > time to work on that.
>
> Probably, we could leave tsearch2 in contrib as a compat module and
> explicitly define 8.4 will be the last release with tsearch2 support.

Are you going to keep tools to compile stemmer from tsearch2 ? (I think about
those files :
http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/tsearch2/gendict/config.sh )

Or just let people use this method :
http://momjian.us/expire/textsearch/HTML/textsearch-parser-example.html ?


--
Cédric Villemain
Administrateur de Base de Données
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org

Re: tsearch2 patch status report

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Applied, thanks.  (Hm, I thought we had some buildfarm machines testing
> VPATH builds these days?  Guess not ...)
>
>
>   

I have switched dungbeetle to use vpath. It's a one line config file 
change, and it builds every hour (3 a day for stable banches, remainder 
for HEAD) if there are changes on the relevant branch.

cheers

andrew


Re: tsearch2 patch status report

From
Ron Mayer
Date:
Merlin Moncure wrote:
> On 8/21/07, Magnus Hagander <magnus@hagander.net> wrote:
>> OTOH, if we do it as a compat package, we need to set a firm end-date on
>> it, so we don't have to maintain it forever. ....
> 
> I would suggest making a pgfoundry project...that's what was done with
> userlocks.  I'm pretty certain no one besides me has ever used the
> wrappers I created...a lot more people use tsearch2 than userlocks
> though.
> 

Hmm..  In that case I'd think people should ask if anyone would use
the tsearch2 compatibility layer before even doing pgfoundry.
Speaking for myself, I expect migrating to the core text search
APIs as something we'd do as part of our 8.3 migration even if
such a compatibility layer existed.