Thread: Grouping Too Closely

Grouping Too Closely

From
"Thomas F. O'Connell"
Date:
I have a table that looks like this:

CREATE TABLE my_table (
    pkey serial PRIMARY KEY,
    fkey int NOT NULL REFERENCES my_other_table( pkey ),
    uid int NOT NULL REFERENCES user( pkey ),
    seq1 int,
    seq2 int
);

Basically, for each fkey that exists in my_table, there is a sequence represented by seq1, which covers every record corresponding to a given fkey. Then there is a subset of records covered by seq2, which increments over the course of a given fkey, but might span multiple records.

E.g.,

pkey | fkey | uid | seq1 | seq2
---------------------------------------
1        | 1    | 1    | 1        | 1

2        | 1    | 2    | 2        | 1

...


What I'd like to be able to do is select all records corresponding to the minimum value of seq1 for each value of seq2 corresponding to a given fkey (with a lower bound on the value of seq2).


My first attempt looked like this:


SELECT fkey, uid, seq2

FROM my_table

WHERE seq2 > 2

GROUP BY fkey, seq2, uid, seq1

HAVING seq1  = min( seq1 )


but this groups too closely to return the desired results.


My next attempt looked like this (where I use the shorthand for min in the subquery):


SELECT fkey, uid, seq2

FROM my_table AS mt1
WHERE mt1.seq2 > 2
AND ( mt1.uid, hh1.seq1 ) IN (
        SELECT mt2.player_id, mt2.order_no
        FROM my_table AS mt2
        WHERE mt2.fkey = mt1.fkey
        AND mt2.seq2 = mt1.seq2
        GROUP BY mt2.seq1, mt2.uid
        ORDER BY mt2.seq1 ASC
        LIMIT 1
)
GROUP BY mt1.holdem_game_id, mt1.holdem_round_type_id, mt1.player_id


This seems like it works, but it is abominably slow, running on the order of days across 1.5 million rows rather than the seconds (or preferably milliseconds) I'd prefer.


I have this annoying feeling that I'm overlooking a reasonably efficient in-between query.


--

Thomas F. O'Connell

Co-Founder, Information Architect

Sitening, LLC


Strategic Open Source: Open Your i™


http://www.sitening.com/

110 30th Avenue North, Suite 6

Nashville, TN 37203-6320

615-260-0005


Re: Grouping Too Closely

From
"Russell Simpkins"
Date:
I'm not sure if this is the best thing to do in all occasions, but I have 
found a great speed increase using unions over group by.

select fkey, uid, seq2 from mytable where seq2 > 2 and seq1 = ( select 
min(seq1) from mytable);
union
select fkey, uid, seq2 from mytable where seq2 > 2 and seq1 = ( select 
min(seq1) from mytable);
order by fkey, uid, seq2;

the union clause with remove your duplicates for you as you were doing with 
your group by.

using min on large tables can cause problems. you may want to do your select 
min(seq1) from mytable or even have a trigger function after insert/update 
that checks the new value against the current lowest stored in another 
table.

not sure if this helps, but i hope it does.

russ




Re: Grouping Too Closely

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> What I'd like to be able to do is select all records corresponding to
> the minimum value of seq1 for each value of seq2 corresponding to a
> given fkey (with a lower bound on the value of seq2).

I'm not sure how uid figures in, but would this do what you want?:

SELECT fkey, uid, seq2, min(seq1)
FROM my_table
WHERE seq2 > 2
GROUP BY fkey, uid, seq2
ORDER BY 1,2,3;

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200506250019
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iD8DBQFCvNuWvJuQZxSWSsgRAhPZAJ9ssJx1/9x1Ngr9i/fThpKkDwMq1wCg5TW0
fKHUuoUBrXx+0/5hRegpXF4=
=obPq
-----END PGP SIGNATURE-----




Re: Grouping Too Closely

From
"Thomas F. O'Connell"
Date:
This doesn't give me quite what I'm looking for because I need there
to be only one of each possible value of seq2 to be returned for each
value of fkey.

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Jun 24, 2005, at 11:22 PM, Greg Sabino Mullane wrote:

>> What I'd like to be able to do is select all records corresponding to
>> the minimum value of seq1 for each value of seq2 corresponding to a
>> given fkey (with a lower bound on the value of seq2).
>
> I'm not sure how uid figures in, but would this do what you want?:
>
> SELECT fkey, uid, seq2, min(seq1)
> FROM my_table
> WHERE seq2 > 2
> GROUP BY fkey, uid, seq2
> ORDER BY 1,2,3;


Re: Grouping Too Closely

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> This doesn't give me quite what I'm looking for because I need there
> to be only one of each possible value of seq2 to be returned for each
> value of fkey.

Then perhaps just:

SELECT fkey, seq2, min(seq1)
FROM my_table
WHERE seq2 > 2
GROUP BY fkey, seq2
ORDER BY 1,2,3;

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200506250237
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-----BEGIN PGP SIGNATURE-----

iD8DBQFCvPwJvJuQZxSWSsgRAtcHAKDzl67Va8ABP4qyNpvFtWDpjmT/iwCg3D5J
kJMkDaELNZREh5+7OyJ/FSU=
=SChT
-----END PGP SIGNATURE-----




assorted problems with intarray and other GiST contribs.

From
PFC
Date:
Hello !
I'm using postgresql 8.0.1-r2 on gentoo linux.So, here are the problems :
************************************* int_array_aggregate crashes

SELECT int_array_aggregate(id) FROM (SELECT id FROM shop.products LIMIT X)  
as foo;
This one works fine if X <= 512 and crashes postgres if X > 512.ie. if the aggregate accumulates more than 512 values
itcrashes.
 
On another example query :

SELECT int_array_aggregate(product_id),int_array_aggregate(category_id)FROM (SELECT * FROM shop.products_to_categories
LIMITN) as foo;
 
OK if N <= 8, crashes if N > 9
************************************* integer[] intersection is slow
int[] & int[]   - returns intersection of arrays
This is a useful function but it is very slow.Computing the intersection of two int[] of length 100 takes about 40
ms.inPython for instance, computing a similar intersection takes about 0.1  
 
millisecond, including building both arrays, converting them to sets, and  
computing the set intersection.Maybe some information can be extracted from the Python souce
code.*************************************ltree
 
First of all congratulations for this extremely useful datatype and  
assorted indexes, it really rocks and makes SQL tree handling from  
nightmare to fun.
It would be nice to have functions to :- know the length of a ltree (ie. number of elements)- access it like an array
(ie.get element N).This is to be able to iterate over the elements and fetch each of them to  
 
get the list of rows which make up a path upto a given object in my tree.
Also a function returning the comon prefix between two ltrees would be  
really useful.

Thanks and sorry to come bother you !Regards,PF Caillaud




Re: assorted problems with intarray and other GiST contribs.

From
Oleg Bartunov
Date:
On Sun, 26 Jun 2005, PFC wrote:

>
>     Hello !
>
>     I'm using postgresql 8.0.1-r2 on gentoo linux.
>     So, here are the problems :
>
>     ************************************
>     * int_array_aggregate crashes

I don't remember this function :)

>     * ltree
>
>     First of all congratulations for this extremely useful datatype and 
> assorted indexes, it really rocks and makes SQL tree handling from nightmare 
> to fun.
>
>     It would be nice to have functions to :
>     - know the length of a ltree (ie. number of elements)
       int4 nlevel(ltree) - returns level of the node.

>     - access it like an array (ie. get element N).
    subltree, subpath

>     This is to be able to iterate over the elements and fetch each of 
> them to get the list of rows which make up a path upto a given object in my 
> tree.
>
>     Also a function returning the comon prefix between two ltrees would 
> be really useful.
    ltree lca(ltree,ltree,...) (up to 8 arguments)    ltree lca(ltree[])    Returns Lowest Common Ancestor (lca)


Have you read README.ltree ?

>
>
>     Thanks and sorry to come bother you !
>     Regards,
>     PF Caillaud
>
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: assorted problems with intarray and other GiST contribs.

From
PFC
Date:
Thanks for the quick response !

>>     * int_array_aggregate crashes
>
> I don't remember this function :)
You mean it's not part of intarray contrib ?I thought so !Sorry !Then who's responsible for maintenance of it ?

>         int4 nlevel(ltree) - returns level of the node.
I had missed this one.The summer heat has burnt by brain it seems ;)
>
>>     - access it like an array (ie. get element N).
>
>      subltree, subpath
I know, but I was hoping for something like :myltree[pos]
instead of the usable, but a bit huge, and probably slower :ltree2text(subltree(myltree, pos-1,pos ));

>>     Also a function returning the comon prefix between two ltrees would be  
>> really useful.
>
>      ltree lca(ltree,ltree,...) (up to 8 arguments)
>      ltree lca(ltree[])
>      Returns Lowest Common Ancestor (lca)
I know, too, but :

test=> SELECT lca( '1.2.3'::ltree, '1.2.4'::ltree ); lca
----- 1.2
(1 ligne)

test=> SELECT lca( '1.2.3'::ltree, '1.2'::ltree ); lca
----- 1
(1 ligne)
In the case of the 'longest common prefix' I'd need the second SELECT to  
return the same as the first. This is to determine a 'parent' path that  
includes a bunch of other paths, in that case, if the parent is part of  
the original set, I get the parent's parent instead of what I want.

> Have you read README.ltree ?
Yes ! But I had missed nlevel() to my shame.
thanks !



Re: assorted problems with intarray and other GiST contribs.

From
Michael Fuhr
Date:
On Sun, Jun 26, 2005 at 06:48:42PM +0200, PFC wrote:
> 
> SELECT int_array_aggregate(id) FROM (SELECT id FROM shop.products LIMIT X)  
> as foo;
> 
>     This one works fine if X <= 512 and crashes postgres if X > 512.
>     ie. if the aggregate accumulates more than 512 values it crashes.

I don't get a crash in 8.0.3.  I do see some items regarding
contrib/intagg in the 8.0.3 and 8.0.2 Release Notes, so maybe this
problem has already been fixed.  Do you have an 8.0.3 installation
you could test?

> SELECT int_array_aggregate(product_id),
>     int_array_aggregate(category_id)
> FROM (SELECT * FROM shop.products_to_categories LIMIT N) as foo;
> 
>     OK if N <= 8, crashes if N > 9

Works for me in 8.0.3.

>     int[] & int[]   - returns intersection of arrays
> 
>     This is a useful function but it is very slow.
>     Computing the intersection of two int[] of length 100 takes about 40 
>     ms.

How are you timing the operation?  What kind of hardware are you
using?  I have a 500MHz Pentium III that computes the intersection
of two 100-element arrays in about 0.5ms; in 40ms it can do arrays
of about 1100 elements.  I'm measuring this with a PL/pgSQL function
that does the following:
   t1 := timeofday();   c := a & b;   t2 := timeofday();   dt := t2 - t1;

>     It would be nice to have functions to :
>     - know the length of a ltree (ie. number of elements)

Does nlevel() not do what you want?

>     - access it like an array (ie. get element N).

Does subpath() or subltree() not do what you want?

>     Also a function returning the comon prefix between two ltrees would 
>     be  really useful.

Does lca() not do what you want?

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/