tsearch2 dictionary for statute cites - Mailing list pgsql-general

From	Kevin Grittner
Subject	tsearch2 dictionary for statute cites
Date	March 10, 2009 21:47:43
Msg-id	49B69999.EE98.0025.0@wicourts.gov
Responses	Re: tsearch2 dictionary for statute cites
List	pgsql-general

I broached this topic last year[1], but the project got tabled until
now, so I raise it again.  We want to run tsearch2 searches (under
8.3.recent) against text extracted from character-based PDF files.
That text will contain legal terms and statute cites.  It's clear
enough how to create a dictionary to gracefully handle the legal
terms, but I'm less sure about the statute cites.

I got one response[2], which mentioned a prefix search in the 8.4
release, and provided a link to a Perl regular expression based
dictionary.  I'm wondering if anyone has feedback on either of these
techniques, and whether they might work for our needs.  I'm not sure I
adequately described our needs, so I'll fill that out a little more.

People are likely to search for statute cites, which tend to have a
hierarchical form.  I'm not sure the prefix approach will work for
this.  For example, there is a section 939.64 in the state statutes
dealing with commission of a crime while wearing a bulletproof
garment.  If someone searches for that, they should find subsections
like 939.64(1) or 939.64(2) but not different sections which start
with the same characters like 939.641 (the section on concealing
identity) or 939.645 (the section on hate crimes).  A search for
chapter 939 should return any of the above.
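
One possible way to get this behavior, sketched in Python purely to
illustrate the idea: index each cite together with every ancestor in
its hierarchy, then match on exact tokens rather than on prefixes.
This is not a tsearch2 dictionary; the function name expand_cite and
both regular expressions are hypothetical, and the cite format
(chapter, optional section, optional parenthesized subdivisions) is an
assumption based on the examples above.

```python
import re

# Assumed cite shape: chapter, optional ".section", then zero or more
# parenthesized subdivisions such as (1) or (2)(a).
CITE_RE = re.compile(r"^(\d+)(?:\.(\d+))?((?:\([0-9A-Za-z]+\))*)$")
SUBDIV_RE = re.compile(r"\([0-9A-Za-z]+\)")


def expand_cite(cite):
    """Return the cite and each of its ancestors, most general first."""
    match = CITE_RE.match(cite.strip())
    if match is None:
        return []  # not a statute cite in the assumed format
    chapter, section, subdivisions = match.groups()
    tokens = [chapter]
    if section is None:
        return tokens  # a bare chapter reference such as "939"
    current = f"{chapter}.{section}"
    tokens.append(current)
    # Add one token per nesting level: 939.64(2), then 939.64(2)(a), ...
    for part in SUBDIV_RE.findall(subdivisions):
        current += part
        tokens.append(current)
    return tokens


print(expand_cite("939.64(1)"))  # ['939', '939.64', '939.64(1)']
print(expand_cite("939.641"))    # ['939', '939.641']
```

With this expansion, an exact-token search for 939.64 finds a document
citing 939.64(1) but not one citing only 939.641, and a search for 939
finds both.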

Of course, we want someone to be able to search on 939.64, 939.641,
and 939.645 and get documents which reference all of the above (i.e.,
to look for a document referring to a hate crime committed while
concealing identity and wearing a bulletproof garment).
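
Assuming each document has been reduced to a set of exact cite tokens,
the "all of the above" requirement amounts to a subset test.  The
sketch below is in Python for illustration only; matches_all and the
sample token sets are made up and are not part of tsearch2.

```python
def matches_all(document_tokens, query_cites):
    """True when every queried cite is among the document's tokens."""
    return set(query_cites) <= set(document_tokens)


# Hypothetical token sets for two documents.
doc_a = {"939", "939.64", "939.64(1)", "939.641", "939.645"}
doc_b = {"939", "939.641", "939.645"}

query = ["939.64", "939.641", "939.645"]
print(matches_all(doc_a, query))  # True
print(matches_all(doc_b, query))  # False: 939.641 does not satisfy 939.64
```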

Suggestions welcome on how to handle this user requirement.

-Kevin

[1] http://archives.postgresql.org/pgsql-admin/2008-06/msg00033.php
[2] http://archives.postgresql.org/pgsql-admin/2008-06/msg00034.php
