Thread: DISTINCT to get distinct *substrings*?

DISTINCT to get distinct *substrings*?

From
Christoph Pingel
Date:
Hello to the list,

here's an SQL question, I hope it's not off topic. From a list of URLs I want to get only the distinct values of the *web sites* these URLs belong to, that is everything before and including the 3rd slash, and I think this should be possible within the DB. I would like to say something like

SELECT substring(attribute from '^http://[^/]*/') from pg_atp where attribute like 'http://%'

(which works) but get only the distinct values. SELECT DISTINCT ON substring.. doesn't work. Probably I haven't understood the semantics of the DISTINCT keyword. Can anybody help?

thanks in advance
Christoph

Re: DISTINCT to get distinct *substrings*?

From
"Nikolay Samokhvalov"
Date:
SELECT DISTINCT substring(attribute from '^http://[^/]*/') from pg_atp
where attribute like 'http://%';

w/o DISTINCT there should be duplicates (if any)

don't use "DISTINCT ON" at all, it's evil :-) (why?
http://chernowiki.ru/index.php?node=38#A13)

On 8/8/06, Christoph Pingel <ch.pingel@web.de> wrote:
>
> Hello to the list,
>
> here's an SQL question, I hope it's not off topic. From a list of URLs I
> want to get only the distinct values of the *web sites* these URLs belong
> to, that is everything before and including the 3rd slash, and I think this
> should be possible within the DB. I would like to say something like
>
> SELECT substring(attribute from '^http://[^/]*/') from pg_atp where
> attribute like 'http://%'
>
> (which works) but get only the distinct values. SELECT DISTINCT ON
> substring.. doesn't work. Probably I haven't understood the semantics of the
> DISTINCT keyword. Can anybody help?
>
> thanks in advance
> Christoph
>
>


--
Best regards,
Nikolay

Re: DISTINCT to get distinct *substrings*?

From
Ben
Date:
DISTINCT ON is extremely useful when you know what you're doing. It's
postgres' version of oracle's first_value analytical function, and when
you need it, nothing else really suffices.

On Tue, 8 Aug 2006, Nikolay Samokhvalov wrote:

> SELECT DISTINCT substring(attribute from '^http://[^/]*/') from pg_atp
> where attribute like 'http://%';
>
> w/o DISTINCT there should be duplicates (if any)
>
> don't use "DISTINCT ON" at all, it's evil :-) (why?
> http://chernowiki.ru/index.php?node=38#A13)
>
> On 8/8/06, Christoph Pingel <ch.pingel@web.de> wrote:
>>
>> Hello to the list,
>>
>> here's an SQL question, I hope it's not off topic. From a list of URLs I
>> want to get only the distinct values of the *web sites* these URLs belong
>> to, that is everything before and including the 3rd slash, and I think this
>> should be possible within the DB. I would like to say something like
>>
>> SELECT substring(attribute from '^http://[^/]*/') from pg_atp where
>> attribute like 'http://%'
>>
>> (which works) but get only the distinct values. SELECT DISTINCT ON
>> substring.. doesn't work. Probably I haven't understood the semantics of
>> the
>> DISTINCT keyword. Can anybody help?
>>
>> thanks in advance
>> Christoph
>>
>>
>
>
> --
> Best regards,
> Nikolay
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>      subscribe-nomail command to majordomo@postgresql.org so that your
>      message can get through to the mailing list cleanly
>

Re: DISTINCT to get distinct *substrings*?

From
Christoph Pingel
Date:
Am 08.08.2006 um 19:49 schrieb Nikolay Samokhvalov:

> SELECT DISTINCT substring(attribute from '^http://[^/]*/') from pg_atp
> where attribute like 'http://%';
>
> w/o DISTINCT there should be duplicates (if any)
>
> don't use "DISTINCT ON" at all, it's evil :-) (why?
> http://chernowiki.ru/index.php?node=38#A13)

Thanks for the good advice! From reading this, it seems to be a
*really* bad thing. And I didn't get it from the official
documentation. :-)

ok, SELECT DISTINCT works, and it seems that the results are ordered
(by the substring) - is this the default behaviour or just by chance
(and probably version dependent)?

best regards,
Christoph

Re: DISTINCT to get distinct *substrings*?

From
Christoph Pingel
Date:
Thanks for the input, I think I get this now. In my case, the query

SELECT DISTINCT ON (substring(attribute from '^http://[^/]*/')) attribute from pg_atp where attribute like 'http://%' 

doesn't get me just the root of the URL, but the whole URL - but only for the first row for each individual root. While 

SELECT DISTINCT substring(attribute from '^http://[^/]*/') from pg_atp where attribute like 'http://%' 

does what I first intended - get a list of all (distinct) root URLs. Wieder was gelernt. (Learnt something again. :-)

best regards,
Christoph



Am 08.08.2006 um 20:36 schrieb Ben:

DISTINCT ON is extremely useful when you know what you're doing. It's postgres' version of oracle's first_value analytical function, and when you need it, nothing else really suffices.

On Tue, 8 Aug 2006, Nikolay Samokhvalov wrote:

SELECT DISTINCT substring(attribute from '^http://[^/]*/') from pg_atp
where attribute like 'http://%';

w/o DISTINCT there should be duplicates (if any)

don't use "DISTINCT ON" at all, it's evil :-) (why?

On 8/8/06, Christoph Pingel <ch.pingel@web.de> wrote:
Hello to the list,
here's an SQL question, I hope it's not off topic. From a list of URLs I
want to get only the distinct values of the *web sites* these URLs belong
to, that is everything before and including the 3rd slash, and I think this
should be possible within the DB. I would like to say something like
SELECT substring(attribute from '^http://[^/]*/') from pg_atp where
attribute like 'http://%'
(which works) but get only the distinct values. SELECT DISTINCT ON
substring.. doesn't work. Probably I haven't understood the semantics of the
DISTINCT keyword. Can anybody help?
thanks in advance
Christoph


-- 
Best regards,
Nikolay

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
     subscribe-nomail command to majordomo@postgresql.org so that your
     message can get through to the mailing list cleanly


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Re: DISTINCT to get distinct *substrings*?

From
Tom Lane
Date:
Christoph Pingel <ch.pingel@web.de> writes:
> Am 08.08.2006 um 19:49 schrieb Nikolay Samokhvalov:
>> don't use "DISTINCT ON" at all, it's evil :-) (why?
>> http://chernowiki.ru/index.php?node=38#A13

> Thanks for the good advice! From reading this, it seems to be a
> *really* bad thing. And I didn't get it from the official
> documentation. :-)

That page is complaining about DISTINCT ON as it was defined in 1999.
It's a lot harder to shoot yourself in the foot now:

regression=# select distinct on (ten) hundred from tenk1 order by unique2;
ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions

I don't deny that it's nonstandard and pretty ugly, but sometimes it's
just really hard to solve a problem any other way.

            regards, tom lane