Re: plperl (7.5) - Mailing list pgsql-hackers
From | Oleg Bartunov |
---|---|
Subject | Re: plperl (7.5) |
Date | |
Msg-id | Pine.GSO.4.58.0407121034070.21749@ra.sai.msu.su Whole thread Raw |
In response to | Re: plperl (7.5) (elein <elein@varlena.com>) |
List | pgsql-hackers |
Several years ago me and Teodor have proposed partial sort for top-k ranking result. We have even submitted a very crude patch, but it was rejected. We use partial sorting extensively in our external application and found it's very useful. There are many papers recently published about top-k problem in database theory, but I don't know if any database has such optimization. Oleg On Sun, 11 Jul 2004, elein wrote: > I'll try these in plperl when I get a chance, but there is still > a scalability problem. > > There was some work done at UCB a while back (post-postgres) > to try to speed up queries by making the first n rows available > quickly. This is based on the googlish idea that people want > results fast and sometimes only want to see the first few results. > > (I saw the return n rows fast before I heard of the google algorithm, > by some years, though.) (It may have been part of hellerstein, et al's > estimated aggregrate capabilities. It was around that time anyway.) > > If any function is going to return a gazillion rows > there ought to be a way of doing that block by block. > > When a query uses a cursor, is the entire result set always > materialized? If so, perhaps this is just the way it is > for postgres. If not, in the future perhaps there can be a > way to do this. I do not know that part of the source, however. > And I suspect any sorted query would need to be materialized > anyway. > > The other piece of the scalability/speed problem is copying > the data from the materialized result set into the return > structure. Theoretically this requires twice the memory of > the result set. Is that actually true in reality? > > Obviously these comments are not for 7.5 nor are they > make or break issues. > > --elein > > > On Sun, Jul 11, 2004 at 11:17:19AM -0400, Tom Lane wrote: > > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > > On Sat, Jul 10, 2004 at 09:18:28PM -0700, elein wrote: > > >> The new plperl returns sets by having > > >> the function return an array. > > > > > I think RETURN NEXT does the same thing anyway ... they just store > > > tuples in a Tuplestore and then the whole thing is returned. So the > > > function actually doesn't return until the whole function is done. > > > > However, it's likely that the tuplestore infrastructure can deal > > comfortably with sets far larger than a Perl array can. (For one thing, > > it will swap tuples out to a temp file on disk once the set size exceeds > > work_mem.) I think elein's concern is justified, unless someone can > > produce a test case showing that plperl actually performs OK with a > > large result set. > > > > As a simple test for plpgsql's speed with such things, I tried > > > > create function seq(int) returns setof int as ' > > begin > > for i in 1..$1 loop > > return next i; > > end loop; > > return; > > end' language plpgsql; > > > > regression=# \timing > > Timing is on. > > regression=# select count(*) from seq(100000); > > count > > -------- > > 100000 > > (1 row) > > > > Time: 396.524 ms > > regression=# select count(*) from seq(1000000); > > count > > --------- > > 1000000 > > (1 row) > > > > Time: 3615.115 ms > > regression=# select count(*) from seq(10000000); > > count > > ---------- > > 10000000 > > (1 row) > > > > Time: 40356.972 ms > > > > My Perl is too rusty to immediately whip out the equivalent incantation > > in plperl; would someone like to compare the timings on their own machine? > > > > regards, tom lane > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 7: don't forget to increase your free space map settings > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
pgsql-hackers by date: