Thread: I want to search my project source code

From
Matthew Wilson
Date:
I have a lot of code -- millions of lines at this point, written
over the last 5 years.  Everything is in a bunch of nested folders.

At least once a week, I want to find some code that uses a few modules,
so I have to launch a find + grep at the top of the tree and then wait
for it to finish.

I wonder if I could store our source code in a PostgreSQL table and
then use full-text search to index it.  Then I hope I could run a query
asking for all files that use modules X, Y, and Z.

I'm looking for something sort of like the locate utility, except that
instead of building a quickly-searchable list of file names, I want to
be able to search file contents also.


Matt
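What Matt describes is an inverted index over file contents: walk the tree once, record which files contain which tokens, and answer "files using X, Y, and Z" as a set intersection instead of re-running find + grep. Below is a minimal Python sketch under stated assumptions. The helper names `build_index` and `files_using`, the extension list, and the rule that a "module name" is any identifier-like token are all hypothetical. It matches plain tokens and does not understand import syntax, which is exactly the limitation Tom raises next.

```python
import os
import re
from collections import defaultdict

# Assumption: module names look like identifiers, so tokenize on this pattern.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def build_index(root, extensions=(".py", ".c", ".h", ".php")):
    """Walk `root` once and map each token to the set of files containing it."""
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):  # hypothetical extension filter
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                # A set so each file is recorded once per distinct token.
                for token in set(TOKEN_RE.findall(f.read())):
                    index[token].add(path)
    return index

def files_using(index, *tokens):
    """Return the files that mention every one of `tokens` (set intersection)."""
    sets = [index.get(t, set()) for t in tokens]  # .get avoids adding keys
    if not sets:
        return set()
    return set.intersection(*sets)
```

Usage would be `idx = build_index("/path/to/src")` followed by `files_using(idx, "X", "Y", "Z")`. The expensive tree walk happens once; to get locate-like behaviour across sessions, the index would still need to be saved to disk and rebuilt periodically, which this sketch leaves out.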

Re: I want to search my project source code

From
Tom Lane
Date:
Matthew Wilson <matt@tplus1.com> writes:
> At least once a week, I want to find some code that uses a few modules,
> so I have to launch a find + grep at the top of the tree and then wait
> for it to finish.

Personally I use glimpse for this.  It's a bit old and creaky but it
performs wonders.  There might be something better out there by now.

I wouldn't recommend trying to use a standard FTS to index code:
code is not a natural language and the kinds of searches you usually
want to perform are a lot different.  As an example, I glimpse for
"foo" when looking for references to a function foo, but "^foo"
when seeking its definition (this relies on the coding conventions
about function layout, of course).  An FTS doesn't think start-of-line
is significant so it can't do that.

            regards, tom lane
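Tom's reference-versus-definition distinction can be reproduced with any line-oriented regex search. The minimal Python sketch below assumes the layout convention he alludes to, where the return type sits on its own line so a defined function's name starts in column 0. The sample C text and the `find_lines` helper are hypothetical, and this is plain `re`, not glimpse's own query syntax.

```python
import re

def find_lines(source, pattern):
    """Return 1-based line numbers whose text matches `pattern`."""
    regex = re.compile(pattern)
    # Searching line by line means '^' anchors to the start of each line.
    return [n for n, line in enumerate(source.splitlines(), start=1)
            if regex.search(line)]

# Hypothetical C file: return type on its own line, name in column 0.
SAMPLE = """\
static int
foo(int x)
{
    return x + 1;
}

int
bar(void)
{
    return foo(41);
}
"""

references = find_lines(SAMPLE, r"\bfoo\b")   # every mention: [2, 10]
definitions = find_lines(SAMPLE, r"^foo\b")   # column-0 only: [2]
```

A word-based full-text index would store `foo` as the same token on both lines and lose the column information that separates them.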

Re: I want to search my project source code

From
Oleg Bartunov
Date:
openfts.sf.net is the tool for you.  It even has example scripts for
indexing and searching a file system.

Oleg

On Sat, 27 Oct 2007, Matthew Wilson wrote:

> I have a lot of code -- millions of lines at this point, written
> over the last 5 years.  Everything is in a bunch of nested folders.
>
> At least once a week, I want to find some code that uses a few modules,
> so I have to launch a find + grep at the top of the tree and then wait
> for it to finish.
>
> I wonder if I could store our source code in a postgresql table and
> then use full text searching to index.  Then I hope I could run a query
> where I ask for all files that use modules X, Y, and Z.
>
> I'm looking for something sort of like the locate utility, except that
> instead of building a quickly-searchable list of file names, I want to
> be able to search file contents also.
>
>
> Matt

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: I want to search my project source code

From
Guy Rouillier
Date:
Matthew Wilson wrote:
> I have a lot of code -- millions of lines at this point, written
> over the last 5 years.  Everything is in a bunch of nested folders.
>
> At least once a week, I want to find some code that uses a few modules,
> so I have to launch a find + grep at the top of the tree and then wait
> for it to finish.
>
> I wonder if I could store our source code in a postgresql table and
> then use full text searching to index.  Then I hope I could run a query
> where I ask for all files that use modules X, Y, and Z.

DBMSs are great tools for the right job, but IMO this is not the right
job.  I can't see how a database engine, with all its transactional
overhead and many other layers, will ever beat a simple grep
performance-wise.  I've used Eclipse for refactoring, but having done it
once, I'm sticking with grep.

--
Guy Rouillier

Re: I want to search my project source code

From
Perry Smith
Date:
On Oct 28, 2007, at 12:59 AM, Guy Rouillier wrote:

> Matthew Wilson wrote:
>> I have a lot of code -- millions of lines at this point, written
>> over the last 5 years.  Everything is in a bunch of nested folders.
>> At least once a week, I want to find some code that uses a few
>> modules, so I have to launch a find + grep at the top of the tree
>> and then wait for it to finish.
>> I wonder if I could store our source code in a postgresql table and
>> then use full text searching to index.  Then I hope I could run a
>> query where I ask for all files that use modules X, Y, and Z.
>
> DBMSs are great tools for the right job, but IMO this is not the
> right job.  I can't see how a database engine, with all its
> transactional overhead and many other layers, will ever beat a
> simple grep performance-wise.  I've used Eclipse for refactoring,
> but having done it once, I'm sticking with grep.

This is exactly what cscope is good for.

http://cscope.sourceforge.net/

I've used it since the early '90s.  I do level 3 support for really
big companies.  If you are an Emacs fan, it's hooked into that as well.

You want to use the -q option, which builds an inverted index for
faster symbol lookup.  If it is a million lines of code, it's going to
take a while.  It pseudo-parses the code (some tricky constructs will
confuse it) and builds a very simple database file.  I think it uses
Berkeley DB for that file.  After that, finding all the occurrences of
foo takes a few seconds.

If you want to find just definitions (e.g., where foo is defined),
then use ctags or etags.  Exuberant Ctags is available here:

http://ctags.sourceforge.net/

Perry Smith ( pedz@easesoftware.com )
Ease Software, Inc. ( http://www.easesoftware.com )

Low cost SATA Disk Systems for IBMs p5, pSeries, and RS/6000 AIX systems



Re: I want to search my project source code

From
"Martin Gainty"
Date:
Perry-

Does cscope support PHP?

Thanks for the link
M--
----- Original Message -----
From: "Perry Smith" <pedz@easesoftware.com>
To: "Guy Rouillier" <guyr-ml1@burntmail.com>
Cc: <pgsql-general@postgresql.org>
Sent: Sunday, October 28, 2007 10:25 AM
Subject: Re: [GENERAL] I want to search my project source code

Re: I want to search my project source code

From
Perry Smith
Date:
On Oct 28, 2000, at 9:41 AM, Martin Gainty wrote:

> Perry-
>
> Does cscope support PHP?

I don't think so.  Exuberant Ctags supports a lot of languages, but it
does not do references (I think) -- just definitions.


Re: I want to search my project source code

From
Alvaro Herrera
Date:
Tom Lane wrote:

> I wouldn't recommend trying to use a standard FTS to index code:
> code is not a natural language and the kinds of searches you usually
> want to perform are a lot different.  As an example, I glimpse for
> "foo" when looking for references to a function foo, but "^foo"
> when seeking its definition (this relies on the coding conventions
> about function layout, of course).  An FTS doesn't think start-of-line
> is significant so it can't do that.

+1.  The nice thing about a tool that understands code is that you can
query it in ways that make sense for code.  For example, I can search for
"all files that include foo.h", "all callers of function bar", or "all
occurrences of the symbol baz".  I use cscope for this, which integrates
nicely into my text editor (vim), and others have told me they use
KScope, which puts it inside a nice GUI window, if you care about such
things.

--
Alvaro Herrera                  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"I would rather have GNU than GNOT."  (ccchips, lwn.net/Articles/37595/)
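To show why a code-aware query differs from a word search, here is a rough approximation of Alvaro's first example, "all files that include foo.h", written in Python. It is a minimal sketch under stated assumptions: the helpers `includes_of` and `files_including` are hypothetical, only `.c`/`.h` files are scanned, and header names are compared as exact strings. Unlike cscope, it is a regex over `#include` lines and does not parse the code, so it ignores conditional compilation and macro-generated includes.

```python
import os
import re

# Matches lines such as  #include "foo.h"  or  #  include <sys/foo.h>
INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*[<"]([^>"]+)[>"]',
                        re.MULTILINE)

def includes_of(source):
    """Return the header names that `source` #includes, in order."""
    return INCLUDE_RE.findall(source)

def files_including(root, header):
    """Return sorted paths of .c/.h files under `root` that #include `header`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith((".c", ".h")):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    # Exact string match: "sys/foo.h" is not "foo.h" here.
                    if header in includes_of(f.read()):
                        hits.append(path)
    return sorted(hits)
```

A plain grep for `foo.h` would also hit comments and string literals. Anchoring on the preprocessor directive is a small step toward the "understands code" behaviour Alvaro describes, though a real tool like cscope goes much further.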