Re: suggestions to improve postgresql suitability for - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: suggestions to improve postgresql suitability for
Date
Msg-id Pine.LNX.4.56.0307241443090.15688@sablons.ensmp.fr
Whole thread Raw
List pgsql-hackers
Hello,

> > FOR tuple IN
> >   SELECT area, type, month, amount FROM client, invoice WHERE id=client
> > LOOP
> >   UPDATE tmp SET amount=amount+tuple.amount, count=count+1
> >     WHERE area=tuple.area AND type=tuple.type AND month=tuple.month
> > END LOOP;
> > ...
> >
> > It is very SLOOOOOOOOOOOWWWWWWWWW...
>
> It will be, first you're doing the same join that generates the large
> result set you were complaining about in the plain SQL example

Sure. I need to pay for the join from the application point of view.

I was 'complaining' about the 'external sort', not the join which is
necessary, and on the fact that the join result is stored as a temporary
very big table rather than being processed/returned on the fly.

That was really my point, although maybe not so clear because of my poor
English: return data on the fly if possible, and skip sort if possible.

> and then you're looping over it generating a delete/insert for every
> tuple in that result set.

Sure. This test was to avoid the sort. I need the join.

> > IV) Basic client side (JDBC, DBI, libpq)
> > ----------------------------------------
> >
> > Then I wrote the same stuff on the client side in java with JDBC, perl
> > with DBI and C with libpq, by browsing the above SELECT in a simple
> > loop and aggregating the data directly in the language. In all 3
> > cases, the process attempts to allocate the full result of the client
> > and invoice join in memory... a **very** bad idea indeed!
>
> But what about doing that in the server?

Sure. That was my first idea, and is what I did in the previous two
examples with SQL and PL/pgSQL?

> I thought it necessary for the result set to be generated before any
> data can be returned, in the general case and in your grouped by example
> specifically.

You're right for the group by if it is not computed on the fly.

I'm not that sure about the general case.

Say you have a subselect, it is an implementation choice (and maybe a
simplification) to first store the result of the subselect and then apply
the second select on it. One can think of having the subselect 'piped'
into the select so as to avoid the intermediate storage requirements.

In my example, I want to browse the result of one select, but I really
don't think that it is useful to store the result of the select on disk
before sending the results to the client. It just see that as an
implementation and optimisation issue.

Most client/serveur API are designed to allow this type of 'on the fly'
implementation, as for example you cannot have the number of rows returned
by a SELECT in Java/JDBC or Perl/DBI (short by scanning the result and
counting the rows), although you can have this information in libpq...

> The latter is only because if you're not using the hash aggregates then
> the sort is required and that of course requires all the result data to
> be known.

Sure for the sort. But no sort, no need.

> > (2) the PL/pgSQL interpreter would be a great deal faster.
>
> It did what you told it to do.

I told it to avoid the sort, not to be slow;-)

Thanks for all your comments,

-- 
Fabien.


pgsql-hackers by date:

Previous
From: Patrick Welche
Date:
Subject: Re: cvs problem
Next
From: Fabien COELHO
Date:
Subject: Re: suggestions to improve postgresql suitability for