Re: How to use full-text search URL parser to filter query results bydomain name? - Mailing list pgsql-general

From Jess Wren
Subject Re: How to use full-text search URL parser to filter query results bydomain name?
Date
Msg-id a438ad2b-4eb2-3631-b1a6-d92ee085817e@interference.cc
Whole thread Raw
In response to Re: How to use full-text search URL parser to filter query results by domain name?  (hamann.w@t-online.de)
List pgsql-general
On 4/6/19 11:42 PM, hamann.w@t-online.de wrote:
> Hi,
>
> I have no real idea about solving the complete problem, and would probably try
> something with a temp table first.
> For extracting the hostname from a url you could use
>
> select regex_replace('https?://(.*=)/.*', '\\1', url)
>
> instead of the fulltext parser
>
> Best regards
> Wolfgang

Thanks Wolfgang, I understand that I could implement a function using
regex for this, or just create an extra column/table to store the
hostname data. But there are other parts of the application where I'll
need to extract URL path, others where i'll want to extract scheme, etc.
Since postgres has builtin capabilities for parsing URLs to do alll of
this, I'd rather just use the builtin functions instead of writing them
myself using regex or having to generate a temp table each time I do a
lookup.

So although I'm aware that there are a variety of ways to extract
hostname (right now I'm just doing it in Python), I'm really most
interested in understanding how to use the builtin Postgres URL parsers
to extract host, url path, etc and how to appropriately create indexes
based on them. The documentation for the URL parser is very sparse, and
I can't find much info online either.




pgsql-general by date:

Previous
From: senor
Date:
Subject: Re: pg_upgrade --jobs
Next
From: senor
Date:
Subject: Re: pg_upgrade --jobs