Re: plperl (7.5) - Mailing list pgsql-hackers

From Oleg Bartunov
Subject Re: plperl (7.5)
Date
Msg-id Pine.GSO.4.58.0407121034070.21749@ra.sai.msu.su
Whole thread Raw
In response to Re: plperl (7.5)  (elein <elein@varlena.com>)
List pgsql-hackers
Several years ago me and Teodor have proposed partial sort for
top-k ranking result. We have even submitted a very crude patch, but it was
rejected. We use partial sorting extensively in our external application
and found it's very useful. There are many papers recently published about
top-k problem in database theory, but I don't know if any database has
such optimization.
Oleg
On Sun, 11 Jul 2004, elein wrote:

> I'll try these in plperl when I get a chance, but there is still
> a scalability problem.
>
> There was some work done at UCB a while back (post-postgres)
> to try to speed up queries by making the first n rows available
> quickly.  This is based on the googlish idea that people want
> results fast and sometimes only want to see the first few results.
>
> (I saw the return n rows fast before I heard of the google algorithm,
> by some years, though.) (It may have been part of hellerstein, et al's
> estimated aggregrate capabilities. It was around that time anyway.)
>
> If any function is going to return a gazillion rows
> there ought to be a way of doing that block by block.
>
> When a query uses a cursor, is the entire result set always
> materialized?  If so, perhaps this is just the way it is
> for postgres.  If not, in the future perhaps there can be a
> way to do this.  I do not know that part of the source, however.
> And I suspect any sorted query would need to be materialized
> anyway.
>
> The other piece of the scalability/speed problem is copying
> the data from the materialized result set into the return
> structure. Theoretically this requires twice the memory of
> the result set.  Is that actually true in reality?
>
> Obviously these comments are not for 7.5 nor are they
> make or break issues.
>
> --elein
>
>
> On Sun, Jul 11, 2004 at 11:17:19AM -0400, Tom Lane wrote:
> > Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > > On Sat, Jul 10, 2004 at 09:18:28PM -0700, elein wrote:
> > >> The new plperl returns sets by having
> > >> the function return an array.
> >
> > > I think RETURN NEXT does the same thing anyway ... they just store
> > > tuples in a Tuplestore and then the whole thing is returned.  So the
> > > function actually doesn't return until the whole function is done.
> >
> > However, it's likely that the tuplestore infrastructure can deal
> > comfortably with sets far larger than a Perl array can.  (For one thing,
> > it will swap tuples out to a temp file on disk once the set size exceeds
> > work_mem.)  I think elein's concern is justified, unless someone can
> > produce a test case showing that plperl actually performs OK with a
> > large result set.
> >
> > As a simple test for plpgsql's speed with such things, I tried
> >
> > create function seq(int) returns setof int as '
> > begin
> >   for i in 1..$1 loop
> >     return next i;
> >   end loop;
> > return;
> > end' language plpgsql;
> >
> > regression=# \timing
> > Timing is on.
> > regression=# select count(*) from seq(100000);
> >  count
> > --------
> >  100000
> > (1 row)
> >
> > Time: 396.524 ms
> > regression=# select count(*) from seq(1000000);
> >   count
> > ---------
> >  1000000
> > (1 row)
> >
> > Time: 3615.115 ms
> > regression=# select count(*) from seq(10000000);
> >   count
> > ----------
> >  10000000
> > (1 row)
> >
> > Time: 40356.972 ms
> >
> > My Perl is too rusty to immediately whip out the equivalent incantation
> > in plperl; would someone like to compare the timings on their own machine?
> >
> >         regards, tom lane
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 7: don't forget to increase your free space map settings
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: Bug: psql misquotes constraints
Next
From: Christopher Kings-Lynne
Date:
Subject: Re: Bug: psql misquotes constraints