Re: Re: [GENERAL] Text search parser's treatment of URLs and emails - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Re: [GENERAL] Text search parser's treatment of URLs and emails
Msg-id 10038.1286926290@sss.pgh.pa.us
In response to Re: [GENERAL] Text search parser's treatment of URLs and emails  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Re: [GENERAL] Text search parser's treatment of URLs and emails
List pgsql-hackers
Bruce Momjian <bruce@momjian.us> writes:
> [ sent to hackers where it belongs ]
> Thom Brown wrote:
>> It could be me being picky, but I don't regard parameters or page
>> fragments as part of the URL path.

> Wow, that is a tough one.  On the one hand, it seems nice to be able to
> split stuff out more, but on the other hand we would be making url_path
> less useful because people would need to piece things together to get
> the old behavior.  In fact to piece things together we would need to add
> '?' and '#' optionally, which seems kind of hard.  Perhaps we should
> keep url_path unchanged and add file_path that has your suggestion. 

This seems much of a piece with the existing proposal to allow
individual "words" of a URL to be reported separately:
https://commitfest.postgresql.org/action/patch_view?id=378

As I said in that thread, this could be done in a backwards-compatible
way using the tsearch parser's existing ability to report multiple
overlapping tokens out of the same piece of text.  But I'd like to see
one unified proposal and patch for this and Sushant's patch, not
independent hacks changing the behavior in the same area.
        regards, tom lane
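
[Illustration, not part of the original message.] The backwards-compatible
idea above, reporting several overlapping tokens for the same piece of
text, can be sketched roughly as follows. This is only a toy model in
Python built on urllib.parse, not the tsearch parser and not any proposed
patch. The names file_path, query and fragment are hypothetical token
types chosen for the example; url, host and url_path are the type names
used in the thread, and the assumption here (taken from Thom's complaint)
is that url_path keeps the '?' and '#' parts. The whole input string is
reported as the url token, which may not match how the real parser
handles a leading protocol.

```python
from urllib.parse import urlsplit


def overlapping_url_tokens(text):
    """Return (token_type, token_text) pairs for one URL-like string.

    Several tokens cover the same characters: url_path keeps its old
    meaning, while the hypothetical file_path/query/fragment tokens
    expose the finer-grained pieces alongside it.
    """
    # urlsplit only recognizes a host when a scheme or '//' is present,
    # so prepend '//' to bare inputs such as 'example.com/a/b'.
    parts = urlsplit(text if "//" in text else "//" + text)

    tokens = [("url", text), ("host", parts.netloc)]

    # Unchanged-behavior token: path plus any '?query' and '#fragment'.
    url_path = parts.path
    if parts.query:
        url_path += "?" + parts.query
    if parts.fragment:
        url_path += "#" + parts.fragment
    if url_path:
        tokens.append(("url_path", url_path))

    # Hypothetical additional tokens that overlap url_path.
    if parts.path:
        tokens.append(("file_path", parts.path))
    if parts.query:
        tokens.append(("query", parts.query))
    if parts.fragment:
        tokens.append(("fragment", parts.fragment))
    return tokens


if __name__ == "__main__":
    for kind, value in overlapping_url_tokens(
        "example.com/docs/index.html?lang=en#top"
    ):
        print(f"{kind:10} {value}")
```

Because url_path is still emitted whole, nobody has to glue '?' and '#'
back together to get the old behavior, and a text search configuration
could presumably choose to index or ignore the extra token types.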

