Thread: getting the ranks out of items with SHARED
Hi, in postgresql you have several possibilites to get the rank of items. A thread earlier this year shows correlated subqueries (not very performant) and other tricks and techniques to solve the ranking problem: http://archives.postgresql.org/pgsql-general/2005-05/msg00157.php The possibility to use a SHARED variable in plperl can be another nice way to get a rank of items. ( good example showing SHARED in use is at http://www.varlena.com/varlena/GeneralBits/114.php) So i wrote two functions for my purpose: CREATE OR REPLACE function ranking(int4) RETURNS int4 LANGUAGE 'plperl' AS $$ my %this; $this{'punkte'} = shift; $this{'ranking'} = $_SHARED{'prev'}{'ranking'}; # defaults to 0 $this{'count'} = $_SHARED{'prev'}{'count'} + 1; # defaults to 1 $this{'ranking'} = $this{'count'} unless $this{'punkte'} == $_SHARED{'prev'}{'punkte'}; $_SHARED{'prev'} = \%this; return $this{'ranking'}; $$; CREATE OR REPLACE FUNCTION reset_ranking() RETURNS void LANGUAGE 'plperl' AS $$ $_SHARED{'prev'} = undef; $$; Nice Thing: the function drops rankings which other ranking solutions in the given thread can't. Like this: rank | points ------------- 1 | 10 2 | 9 2 | 9 4 | 8 5 | 7 It drops rank 3 because we have to entries for second rank. It would be even nice if you can write a genric ranking() function which takes anyarray as an argument, but as far as i know you can't pass an "anyarray" to a plperl function, right? Now i can do the following in plpsql Function which updates a caching table for me and it works fine: PERFORM reset_ranking(); CREATE TEMP TABLE ranking AS SELECT *, ranking(r1.gc_gesamtpunkte) AS rank FROM ( SELECT mg_name, gc_gesamtpunkte FROM temp_gc ORDER BY gc_gesamtpunkte DESC, mg_name ASC ) AS r1 ; EXECUTE ' UPDATE temp_gc SET gc_rank = ranking.rank FROM ranking WHERE temp_gc.mg_name = ranking.mg_name; '; DROP TABLE ranking; Problems arrise when you try to do the select and update step together without any temporary table in between: PERFORM reset_ranking(); UPDATE temp_gc SET gc_rank = ranking.rank FROM ( SELECT *, ranking(r1.gc_gesamtpunkte) AS rank FROM ( SELECT mg_name, gc_gesamtpunkte FROM temp_gc ORDER BY gc_gesamtpunkte DESC, mg_name ASC ) AS r1 ) AS ranking WHERE temp_gc.mg_name = ranking.mg_name; '; I have a guess, what happens here: The order of the subselect statement is dropped by the optimizer because the optimizer doesn't see the "side-effect" of the ranking function. that's ok because using such functions isn't SQLish, i guess. Is there a way to FORCE the optimizer to keep things orders like the sql statement author wanted it? kind regards, janning
Janning Vygen <vygen@gmx.de> writes: > I have a guess, what happens here: The order of the subselect statement is > dropped by the optimizer because the optimizer doesn't see the "side-effect" > of the ranking function. That guess is wrong. I think the problem is that you are trying to update multiple rows in the same statement, which would require a "reset ranking" between each row, which this approach doesn't provide for. The whole thing looks mighty fragile in other ways; anything involving a single global variable isn't going to work nicely in very many cases. Consider casting your solution as an aggregate instead... regards, tom lane
When I was trying to connect my databse with jdbc, I got the following error message: org.postgresql.util.PSQLException: Connection rejected: FATAL: no pg_hba.conf entry for host "mydomain", user "xxxx", database "myDB", SSL off. When I run in dos console "psql myDB..." it works fine. My jdbc code is at a different at different domain from the database machine. Thanks, __________________________________ Do you Yahoo!? Yahoo! Mail - Find what you need with new enhanced search. http://info.mail.yahoo.com/mail_250
Tony Smith wrote: > When I was trying to connect my databse with jdbc, I > got the following error message: > > org.postgresql.util.PSQLException: Connection > rejected: FATAL: no pg_hba.conf entry for host > "mydomain", user "xxxx", database "myDB", SSL off. > You need to setup your pg_hba.conf to allow remote connections to the database for the user. http://www.postgresql.org/docs/7.4/static/client-authentication.html Sincerely, Joshua D. Drake > When I run in dos console "psql myDB..." it works > fine. > > My jdbc code is at a different at different domain > from the database machine. > > Thanks, > > > > > __________________________________ > Do you Yahoo!? > Yahoo! Mail - Find what you need with new enhanced search. > http://info.mail.yahoo.com/mail_250 > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org -- Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240 PostgreSQL Replication, Consulting, Custom Programming, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
Hi, Am Mittwoch, 13. Juli 2005 00:03 schrieb Tom Lane: > Janning Vygen <vygen@gmx.de> writes: > > I have a guess, what happens here: The order of the subselect statement > > is dropped by the optimizer because the optimizer doesn't see the > > "side-effect" of the ranking function. > > That guess is wrong. ah, and i already thought to be already on a higher level of understanding postgresql... > I think the problem is that you are trying to update multiple rows in > the same statement, which would require a "reset ranking" between each > row, which this approach doesn't provide for. no thats not the point, i guess (which might be wrong again) but i still don't understand why it doesn't work: this is my important query named *Q* := SELECT *, ranking(r1.gc_gesamtpunkte, r1.gc_gesamtsiege) AS rank FROM ( SELECT mg_name, gc_gesamtpunkte, gc_gesamtsiege FROM temp_gc ORDER BY gc_gesamtpunkte DESC, gc_gesamtsiege DESC, mg_name ASC ) AS r1 this way it works: CREATE TEMP TABLE ranking AS *Q*; EXECUTE 'UPDATE temp_gc SET gc_rank = ranking.rank FROM ranking WHERE temp_gc.mg_name = ranking.mg_name;'; and this way it doesn't: UPDATE temp_gc SET gc_rank = ranking.rank FROM (*Q*) ranking WHERE temp_gc.mg_name = ranking.mg_name; ; i want to update multiple rows, but the all data in table temp_gc doesnt need a reset of the ranking. > The whole thing looks mighty fragile in other ways; anything involving a > single global variable isn't going to work nicely in very many cases. > Consider casting your solution as an aggregate instead... I know that this is not the best solution but it is the fastest. A corrolated subquery with aggregates takes ages in opposite to the ranking function solution. But by the time of writing i have a new problem with my solution posted today with subject "strange error with temp table: pg_type_typname_nsp_index" kind regards, janning
Janning Vygen <vygen@gmx.de> writes: > this way it works: > CREATE TEMP TABLE ranking AS *Q*; > EXECUTE 'UPDATE temp_gc SET gc_rank = ranking.rank > FROM ranking WHERE temp_gc.mg_name = ranking.mg_name;'; > and this way it doesn't: > UPDATE temp_gc > SET gc_rank = ranking.rank > FROM (*Q*) > ranking > WHERE temp_gc.mg_name = ranking.mg_name; It's difficult to be sure without looking at EXPLAIN output, but I would guess that the second query is being done with a plan that involves multiple scans of "*Q*", and that's confusing your function. regards, tom lane
Janning Vygen <vygen@planwerk6.de> writes: > and this is the combined statement: > UPDATE temp_gc > SET gc_rank = ranking.rank > FROM ( > SELECT > *, > ranking(r1.gc_gesamtpunkte, r1.gc_gesamtsiege) AS rank > FROM ( > SELECT > mg_name, > gc_gesamtpunkte, > gc_gesamtsiege > FROM temp_gc > ORDER BY gc_gesamtpunkte DESC, gc_gesamtsiege DESC, mg_name ASC > ) AS r1 > ) AS ranking > WHERE temp_gc.mg_name = ranking.mg_name; > to me it looks like the call to ranking() is moved from the subquery plan > "upwards". but i really can't interpret this explain output regarding to > "where" the ranking funcion is called. Yeah, I think you are exactly right. The EXPLAIN output doesn't show targetlist contents (EXPLAIN VERBOSE would, but not very readably :-(), but the small difference in the estimated costs seems to indicate that the function evaluation is not happening at the level of the "subquery scan" plan node in the combined plan. This is consistent with the planner's behavior in general. It will honor the "order by" in the sense that the output of the "r1" subquery is delivered to the upper plan level in that order, but it feels no compunction about flattening the "ranking" subquery into the outer query, and after that you cannot be certain about the order in which the executions of the ranking() function happen. What you need is to prevent the flattening of the intermediate subquery. The current favorite technique is to insert a dummy OFFSET clause: UPDATE temp_gc SET gc_rank = ranking.rank FROM ( SELECT *, ranking(r1.gc_gesamtpunkte, r1.gc_gesamtsiege) AS rank FROM ( SELECT mg_name, gc_gesamtpunkte, gc_gesamtsiege FROM temp_gc ORDER BY gc_gesamtpunkte DESC, gc_gesamtsiege DESC, mg_name ASC ) AS r1 OFFSET 0 ) AS ranking WHERE temp_gc.mg_name = ranking.mg_name; although you could also do it by inserting an ORDER BY at that level. BTW, there's been some discussion of preventing flattening of subqueries whose SELECT lists contain volatile functions. If we did that then declaring ranking() as volatile would be enough to avoid the problem. I've been hesitant to make the change because I'm concerned about the probable loss of optimization in cases where the function is labeled volatile by default, merely because the author didn't think about how to mark it. And in any case "volatile" doesn't really describe the issue with your function... > my problem is that getting a rank out of items is very expensive with > aggregate functions, I'm unconvinced --- I don't see a reason that you can't implement it exactly the same way as an aggregate. The only difference is that the state value is an aggregate state value instead of a global variable. There might be some extra palloc overhead, but nothing worse. regards, tom lane
Am Mittwoch, 13. Juli 2005 15:35 schrieb Tom Lane: > Janning Vygen <vygen@gmx.de> writes: > > this way it works: > > > > CREATE TEMP TABLE ranking AS *Q*; > > EXECUTE 'UPDATE temp_gc SET gc_rank = ranking.rank > > FROM ranking WHERE temp_gc.mg_name = ranking.mg_name;'; > > > > and this way it doesn't: > > > > UPDATE temp_gc > > SET gc_rank = ranking.rank > > FROM (*Q*) > > ranking > > WHERE temp_gc.mg_name = ranking.mg_name; > > It's difficult to be sure without looking at EXPLAIN output, but I would > guess that the second query is being done with a plan that involves > multiple scans of "*Q*", and that's confusing your function. > > regards, tom lane here you are. both versions with explain output first version creates temp table (explain no 1) and updates afterwards (explain no. 2). second version combines both (explain no.3 ) [whats the best way to post explain output? My mailclient wraps the output. i hope it is still readable] no 1 ***** first create temp table ***** QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Subquery Scan r1 (cost=58.54..70.99 rows=830 width=44) (actual time=0.186..0.403 rows=7 loops=1) -> Sort (cost=58.54..60.62 rows=830 width=44) (actual time=0.106..0.137 rows=7 loops=1) Sort Key: gc_gesamtpunkte, gc_gesamtsiege, mg_name -> Seq Scan on temp_gc (cost=0.00..18.30 rows=830 width=44) (actual time=0.015..0.052 rows=7 loops=1) Total runtime: 0.470 ms (5 Zeilen) no 2 **** update statement references temp table **** QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Merge Join (cost=127.70..193.49 rows=4109 width=70) (actual time=0.221..0.404 rows=7 loops=1) Merge Cond: ("outer".mg_name = "inner".mg_name) -> Sort (cost=58.54..60.62 rows=830 width=66) (actual time=0.110..0.142 rows=7 loops=1) Sort Key: temp_gc.mg_name -> Seq Scan on temp_gc (cost=0.00..18.30 rows=830 width=66) (actual time=0.013..0.055 rows=7 loops=1) -> Sort (cost=69.16..71.63 rows=990 width=36) (actual time=0.089..0.119 rows=7 loops=1) Sort Key: ranking.mg_name -> Seq Scan on ranking (cost=0.00..19.90 rows=990 width=36) (actual time=0.006..0.042 rows=7 loops=1) Total runtime: 0.525 ms (9 Zeilen) no 3 **** combined update statement **** QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=167.70..232.14 rows=3445 width=78) (actual time=0.455..0.774 rows=7 loops=1) Merge Cond: ("outer".mg_name = "inner".mg_name) -> Sort (cost=58.54..60.62 rows=830 width=66) (actual time=0.111..0.142 rows=7 loops=1) Sort Key: temp_gc.mg_name -> Seq Scan on temp_gc (cost=0.00..18.30 rows=830 width=66) (actual time=0.016..0.057 rows=7 loops=1) -> Sort (cost=109.16..111.23 rows=830 width=44) (actual time=0.248..0.281 rows=7 loops=1) Sort Key: r1.mg_name -> Subquery Scan r1 (cost=58.54..68.92 rows=830 width=44) (actual time=0.102..0.201 rows=7 loops=1) -> Sort (cost=58.54..60.62 rows=830 width=44) (actual time=0.092..0.125 rows=7 loops=1) Sort Key: gc_gesamtpunkte, gc_gesamtsiege, mg_name -> Seq Scan on temp_gc (cost=0.00..18.30 rows=830 width=44) (actual time=0.008..0.045 rows=7 loops=1) Total runtime: 0.886 ms (12 Zeilen) and this is the combined statement: UPDATE temp_gc SET gc_rank = ranking.rank FROM ( SELECT *, ranking(r1.gc_gesamtpunkte, r1.gc_gesamtsiege) AS rank FROM ( SELECT mg_name, gc_gesamtpunkte, gc_gesamtsiege FROM temp_gc ORDER BY gc_gesamtpunkte DESC, gc_gesamtsiege DESC, mg_name ASC ) AS r1 ) AS ranking WHERE temp_gc.mg_name = ranking.mg_name; to me it looks like the call to ranking() is moved from the subquery plan "upwards". but i really can't interpret this explain output regarding to "where" the ranking funcion is called. my problem is that getting a rank out of items is very expensive with aggregate functions, so i try to do a trick here which is not very relational indeed and this of course can lead to trouble. Maybe i have to rethink the whole stuff. (because i get some temp table errors anyway as mentioned in the other thread) Kind regards, Janning Vygen