Thread: searchable book database

searchable book database

From
Miguel Vaz
Date:

Hi,

I need to build a database of books: several books on specific subjects that need to be searchable.

Is it viable to store the complete book text in a database and search inside it? Or should I consider keeping only the metadata (name, author, filename, etc.) in the DB, keeping the book files on the HD, and using some sort of search algorithm on the files? If you agree with the second option, what would you suggest for text file searching? It's for a web project, so how could I go about doing this (PHP, Python...)?

Thanks.

MV

Re: searchable book database

From
Sandeep Srinivasa
Date:
If you never need to return the complete book text to a user (that is, you need the book text only to build your search indexes), then keep the text in files and use Apache Solr to index it.
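
A minimal Python sketch of what that could look like from the web side, assuming a Solr version whose update handler accepts JSON and a core named `books` on localhost. The host, core name, field names and both helper functions are illustrative assumptions, not anything from this thread. It only builds the requests, so it can be read and run without a Solr server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed Solr location and core name; adjust to your installation.
SOLR_BASE = "http://localhost:8983/solr/books"


def build_index_request(book_id, title, author, filename, text):
    """Build (but do not send) a POST that adds one book to the index."""
    # Field names are assumptions and must match your Solr schema.
    doc = {"id": book_id, "title": title, "author": author,
           "filename": filename, "text": text}
    return Request(
        SOLR_BASE + "/update?commit=true",
        data=json.dumps([doc]).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_search_url(terms, rows=10):
    """Build a select URL that asks only for metadata, not the full text."""
    # Note: terms are URL-encoded but not escaped for Solr query syntax.
    params = {"q": "text:(%s)" % terms,
              "fl": "id,title,author,filename",
              "rows": rows,
              "wt": "json"}
    return SOLR_BASE + "/select?" + urlencode(params)
```

Passing the request to `urllib.request.urlopen` would send it. Because the search asks only for the metadata fields, the book text itself can stay on disk, as suggested above.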

regards
Sandeep

Re: searchable book database

From
Dann Corbit
Date:

CLucene is one possibility:

http://sourceforge.net/projects/clucene/

Since you are asking in the PostgreSQL group, why not use the built-in full text search:

http://www.postgresql.org/docs/8.4/static/textsearch.html
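
A rough sketch of the built-in route in Python, assuming a DB-API connection (for example one opened with psycopg2). The `books` table, its columns, the index name and the `search_books` helper are made up for illustration; the SQL functions (`to_tsvector`, `plainto_tsquery`, `ts_rank`, `ts_headline`) are the ones described in the documentation linked above.

```python
# Hypothetical schema: metadata plus the full text in one table.
CREATE_SQL = """
CREATE TABLE books (
    id       serial PRIMARY KEY,
    title    text NOT NULL,
    author   text,
    filename text,
    body     text
);
CREATE INDEX books_body_fts ON books
    USING gin (to_tsvector('english', body));
"""

# Rank matches and return a highlighted snippet instead of the whole book.
SEARCH_SQL = """
SELECT id, title, ts_headline('english', body, q) AS snippet
FROM books, plainto_tsquery('english', %s) AS q
WHERE to_tsvector('english', body) @@ q
ORDER BY ts_rank(to_tsvector('english', body), q) DESC
LIMIT %s
"""


def search_books(conn, words, limit=10):
    """Run a full text search on any DB-API connection to the database."""
    cur = conn.cursor()
    try:
        # Parameters are passed separately, so user input is never
        # interpolated into the SQL string.
        cur.execute(SEARCH_SQL, (words, limit))
        return cur.fetchall()
    finally:
        cur.close()
```

One caveat: a tsvector has a size limit (about 1 MB), so a very long book may need to be stored as one row per chapter or section rather than as a single `body` value.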

 

 

From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Sandeep Srinivasa
Sent: Thursday, August 19, 2010 10:11 PM
To: Miguel Vaz
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] searchable book database

If you never need to return the complete book text to a user (that is, you need the book text only to build your search indexes), then keep the text in files and use Apache Solr to index it.

regards
Sandeep

Re: searchable book database

From
Eduardo
Date:
On Thu, 19 Aug 2010 20:35:50 +0100
Miguel Vaz <pagongski@gmail.com> wrote:

> Hi,
>
> I need to make a database of books. Several specific subject books
> that are to be searchable.
>
> Is it viable to have the complete book text on a database and search
> inside it? Or should I consider keeping only its metadata (name,
> author, filename, etc) on the DB, keep the book file on the HD and
> use some sort of search algorithm on the file? If you agree on the
> second option, what would you guys suggest for text file searching?
> It's for a web project, so how could I go about doing this? (PHP,
> python...)
>
> Thanks.
>
> MV

I don't know if that's what you need, but you can set up a DocMGR
site. Check it out at
http://wiki.docmgr.org/index.php/DocMGR_-_Document_Management and see
if it fits your needs.

HTH

Re: searchable book database

From
Miguel Vaz
Date:

Thank you all for your replies. I had already read about Lucene in general and eventually learned that it can be used with Zend Framework, but it seems there's a lot more out there.

I'll plan for the second option: keep the books as files and build some search/index/hash/super-power-ninja engine to do all the hard work behind the scenes and deliver only the pretty bits to the users.
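
For a feel of what such an engine does at its core, here is a toy inverted index in Python. It assumes the books are plain UTF-8 `.txt` files in one directory; the function names are invented for this sketch. It is only an illustration of the idea; Solr, Lucene, Sphinx and tsearch do far more (stemming, ranking, incremental updates).

```python
import re
from collections import defaultdict
from pathlib import Path

WORD_RE = re.compile(r"\w+")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return [w.lower() for w in WORD_RE.findall(text)]


def build_index(book_dir):
    """Map each word to the set of file names that contain it."""
    index = defaultdict(set)
    for path in sorted(Path(book_dir).glob("*.txt")):
        for word in tokenize(path.read_text(encoding="utf-8")):
            index[word].add(path.name)
    return index


def search(index, query):
    """Return the files that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

The database would then hold only the metadata for each file name returned by `search`, which is the split described in the original question.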

This won't be merely a search-and-find project: it will search, find, analyse/process the results, and then display the analysis.

Apache Solr... nice one, it seems very interesting. It also has an API, which may let me plug it into the Flex side of the interface.

Again, thank you all for the great information.

MV


On Fri, Aug 20, 2010 at 12:09 PM, Eduardo <emorras@xroff.net> wrote:
On Thu, 19 Aug 2010 20:35:50 +0100
Miguel Vaz <pagongski@gmail.com> wrote:

> Hi,
>
> I need to make a database of books. Several specific subject books
> that are to be searchable.
>
> Is it viable to have the complete book text on a database and search
> inside it? Or should I consider keeping only its metadata (name,
> author, filename, etc) on the DB, keep the book file on the HD and
> use some sort of search algorithm on the file? If you agree on the
> second option, what would you guys suggest for text file searching?
> It's for a web project, so how could I go about doing this? (PHP,
> python...)
>
> Thanks.
>
> MV

I don't know if that's what you need, but you can set up a DocMGR
site. Check it out at
http://wiki.docmgr.org/index.php/DocMGR_-_Document_Management and see
if it fits your needs.

HTH

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: searchable book database

From
Filip Rembiałkowski
Date:
You have plenty of other FTS options: Postgres has built-in FTS (tsearch), and if you need something more lightweight than Solr, you can use Sphinx.



2010/8/20 Miguel Vaz <pagongski@gmail.com>

Thank you all for your replies. I had already read about Lucene in general and eventually learned that it can be used with Zend Framework, but it seems there's a lot more out there.

I'll plan for the second option: keep the books as files and build some search/index/hash/super-power-ninja engine to do all the hard work behind the scenes and deliver only the pretty bits to the users.

This won't be merely a search-and-find project: it will search, find, analyse/process the results, and then display the analysis.

Apache Solr... nice one, it seems very interesting. It also has an API, which may let me plug it into the Flex side of the interface.

Again, thank you all for the great information.

MV


--
Filip Rembiałkowski
JID,mailto:filip.rembialkowski@gmail.com
http://filip.rembialkowski.net/