Thread: Grouping Too Closely
2 | 1 | 2 | 2 | 1
...
What I'd like to be able to do is select all records corresponding to the minimum value of seq1 for each value of seq2 corresponding to a given fkey (with a lower bound on the value of seq2).
My first attempt looked like this:
SELECT fkey, uid, seq2
FROM my_table
WHERE seq2 > 2
GROUP BY fkey, seq2, uid, seq1
HAVING seq1 = min( seq1 )
but this groups too closely to return the desired results.
My next attempt looked like this (where I use the shorthand for min in the subquery):
SELECT fkey, uid, seq2
This seems like it works, but it is abominably slow, running on the order of days across 1.5 million rows rather than the seconds (or preferably milliseconds) I'd prefer.
I have this annoying feeling that I'm overlooking a reasonably efficient in-between query.
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Strategic Open Source: Open Your i™
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
I'm not sure if this is the best thing to do in all occasions, but I have found a great speed increase using unions over group by. select fkey, uid, seq2 from mytable where seq2 > 2 and seq1 = ( select min(seq1) from mytable); union select fkey, uid, seq2 from mytable where seq2 > 2 and seq1 = ( select min(seq1) from mytable); order by fkey, uid, seq2; the union clause with remove your duplicates for you as you were doing with your group by. using min on large tables can cause problems. you may want to do your select min(seq1) from mytable or even have a trigger function after insert/update that checks the new value against the current lowest stored in another table. not sure if this helps, but i hope it does. russ
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > What I'd like to be able to do is select all records corresponding to > the minimum value of seq1 for each value of seq2 corresponding to a > given fkey (with a lower bound on the value of seq2). I'm not sure how uid figures in, but would this do what you want?: SELECT fkey, uid, seq2, min(seq1) FROM my_table WHERE seq2 > 2 GROUP BY fkey, uid, seq2 ORDER BY 1,2,3; - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200506250019 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iD8DBQFCvNuWvJuQZxSWSsgRAhPZAJ9ssJx1/9x1Ngr9i/fThpKkDwMq1wCg5TW0 fKHUuoUBrXx+0/5hRegpXF4= =obPq -----END PGP SIGNATURE-----
This doesn't give me quite what I'm looking for because I need there to be only one of each possible value of seq2 to be returned for each value of fkey. -- Thomas F. O'Connell Co-Founder, Information Architect Sitening, LLC Strategic Open Source: Open Your i™ http://www.sitening.com/ 110 30th Avenue North, Suite 6 Nashville, TN 37203-6320 615-260-0005 On Jun 24, 2005, at 11:22 PM, Greg Sabino Mullane wrote: >> What I'd like to be able to do is select all records corresponding to >> the minimum value of seq1 for each value of seq2 corresponding to a >> given fkey (with a lower bound on the value of seq2). > > I'm not sure how uid figures in, but would this do what you want?: > > SELECT fkey, uid, seq2, min(seq1) > FROM my_table > WHERE seq2 > 2 > GROUP BY fkey, uid, seq2 > ORDER BY 1,2,3;
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > This doesn't give me quite what I'm looking for because I need there > to be only one of each possible value of seq2 to be returned for each > value of fkey. Then perhaps just: SELECT fkey, seq2, min(seq1) FROM my_table WHERE seq2 > 2 GROUP BY fkey, seq2 ORDER BY 1,2,3; - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200506250237 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iD8DBQFCvPwJvJuQZxSWSsgRAtcHAKDzl67Va8ABP4qyNpvFtWDpjmT/iwCg3D5J kJMkDaELNZREh5+7OyJ/FSU= =SChT -----END PGP SIGNATURE-----
Hello ! I'm using postgresql 8.0.1-r2 on gentoo linux.So, here are the problems : ************************************* int_array_aggregate crashes SELECT int_array_aggregate(id) FROM (SELECT id FROM shop.products LIMIT X) as foo; This one works fine if X <= 512 and crashes postgres if X > 512.ie. if the aggregate accumulates more than 512 values itcrashes. On another example query : SELECT int_array_aggregate(product_id),int_array_aggregate(category_id)FROM (SELECT * FROM shop.products_to_categories LIMITN) as foo; OK if N <= 8, crashes if N > 9 ************************************* integer[] intersection is slow int[] & int[] - returns intersection of arrays This is a useful function but it is very slow.Computing the intersection of two int[] of length 100 takes about 40 ms.inPython for instance, computing a similar intersection takes about 0.1 millisecond, including building both arrays, converting them to sets, and computing the set intersection.Maybe some information can be extracted from the Python souce code.*************************************ltree First of all congratulations for this extremely useful datatype and assorted indexes, it really rocks and makes SQL tree handling from nightmare to fun. It would be nice to have functions to :- know the length of a ltree (ie. number of elements)- access it like an array (ie.get element N).This is to be able to iterate over the elements and fetch each of them to get the list of rows which make up a path upto a given object in my tree. Also a function returning the comon prefix between two ltrees would be really useful. Thanks and sorry to come bother you !Regards,PF Caillaud
On Sun, 26 Jun 2005, PFC wrote: > > Hello ! > > I'm using postgresql 8.0.1-r2 on gentoo linux. > So, here are the problems : > > ************************************ > * int_array_aggregate crashes I don't remember this function :) > * ltree > > First of all congratulations for this extremely useful datatype and > assorted indexes, it really rocks and makes SQL tree handling from nightmare > to fun. > > It would be nice to have functions to : > - know the length of a ltree (ie. number of elements) int4 nlevel(ltree) - returns level of the node. > - access it like an array (ie. get element N). subltree, subpath > This is to be able to iterate over the elements and fetch each of > them to get the list of rows which make up a path upto a given object in my > tree. > > Also a function returning the comon prefix between two ltrees would > be really useful. ltree lca(ltree,ltree,...) (up to 8 arguments) ltree lca(ltree[]) Returns Lowest Common Ancestor (lca) Have you read README.ltree ? > > > Thanks and sorry to come bother you ! > Regards, > PF Caillaud > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Thanks for the quick response ! >> * int_array_aggregate crashes > > I don't remember this function :) You mean it's not part of intarray contrib ?I thought so !Sorry !Then who's responsible for maintenance of it ? > int4 nlevel(ltree) - returns level of the node. I had missed this one.The summer heat has burnt by brain it seems ;) > >> - access it like an array (ie. get element N). > > subltree, subpath I know, but I was hoping for something like :myltree[pos] instead of the usable, but a bit huge, and probably slower :ltree2text(subltree(myltree, pos-1,pos )); >> Also a function returning the comon prefix between two ltrees would be >> really useful. > > ltree lca(ltree,ltree,...) (up to 8 arguments) > ltree lca(ltree[]) > Returns Lowest Common Ancestor (lca) I know, too, but : test=> SELECT lca( '1.2.3'::ltree, '1.2.4'::ltree ); lca ----- 1.2 (1 ligne) test=> SELECT lca( '1.2.3'::ltree, '1.2'::ltree ); lca ----- 1 (1 ligne) In the case of the 'longest common prefix' I'd need the second SELECT to return the same as the first. This is to determine a 'parent' path that includes a bunch of other paths, in that case, if the parent is part of the original set, I get the parent's parent instead of what I want. > Have you read README.ltree ? Yes ! But I had missed nlevel() to my shame. thanks !
On Sun, Jun 26, 2005 at 06:48:42PM +0200, PFC wrote: > > SELECT int_array_aggregate(id) FROM (SELECT id FROM shop.products LIMIT X) > as foo; > > This one works fine if X <= 512 and crashes postgres if X > 512. > ie. if the aggregate accumulates more than 512 values it crashes. I don't get a crash in 8.0.3. I do see some items regarding contrib/intagg in the 8.0.3 and 8.0.2 Release Notes, so maybe this problem has already been fixed. Do you have an 8.0.3 installation you could test? > SELECT int_array_aggregate(product_id), > int_array_aggregate(category_id) > FROM (SELECT * FROM shop.products_to_categories LIMIT N) as foo; > > OK if N <= 8, crashes if N > 9 Works for me in 8.0.3. > int[] & int[] - returns intersection of arrays > > This is a useful function but it is very slow. > Computing the intersection of two int[] of length 100 takes about 40 > ms. How are you timing the operation? What kind of hardware are you using? I have a 500MHz Pentium III that computes the intersection of two 100-element arrays in about 0.5ms; in 40ms it can do arrays of about 1100 elements. I'm measuring this with a PL/pgSQL function that does the following: t1 := timeofday(); c := a & b; t2 := timeofday(); dt := t2 - t1; > It would be nice to have functions to : > - know the length of a ltree (ie. number of elements) Does nlevel() not do what you want? > - access it like an array (ie. get element N). Does subpath() or subltree() not do what you want? > Also a function returning the comon prefix between two ltrees would > be really useful. Does lca() not do what you want? -- Michael Fuhr http://www.fuhr.org/~mfuhr/