Thread: Maybe a strange question, but: "How long may a cursor live?"

Maybe a strange question, but: "How long may a cursor live?"

From
Christian Fritze
Date:
Hello everybody...

I'm working on a web based application (using gnuJSP / JDBC)
that needs to do queries like

SELECT count(textattrib1), text_cat(textattrib1 || ' '), func1(textattrib2)
FROM table1 WHERE textattrib2 >= 'foo' GROUP BY textattrib2 ORDER BY
textattrib2 ASC LIMIT somelimit;

with table1 holding roughly 80000 rows (and probably growing some
day); textattrib2 is indexed, of course.

Unfortunately the query above is intolerably slow (up to 30 seconds
or so; 2 seconds would be OK...).

Just declaring a cursor for

SELECT count(textattrib1), text_cat(textattrib1 || ' '), func1(textattrib2)
FROM table1 GROUP BY textattrib2 ORDER BY textattrib2 ASC;

didn't help much: fetching my entries from such a cursor is very
fast indeed, but the *first* access to the cursor is still much
too slow.
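
For reference, my cursor experiment looks roughly like the JDBC
sketch below (connection details are placeholders, error handling
omitted):

    import java.sql.*;

    public class CursorDemo {
        public static void main(String[] args) throws SQLException {
            Connection con = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mydb", "user", "password");
            // a cursor only lives inside a transaction -- hence the
            // question in the subject line
            con.setAutoCommit(false);
            Statement st = con.createStatement();
            st.execute("DECLARE entries CURSOR FOR "
                    + "SELECT count(textattrib1), text_cat(textattrib1 || ' '), "
                    + "func1(textattrib2) FROM table1 "
                    + "GROUP BY textattrib2 ORDER BY textattrib2 ASC");
            // DECLARE returns at once; the *first* FETCH pays for the
            // aggregation and sort, later FETCHes are cheap
            ResultSet rs = st.executeQuery("FETCH 10 FROM entries");
            while (rs.next()) {
                System.out.println(rs.getString(2));
            }
            rs.close();
            st.execute("CLOSE entries");
            con.commit(); // the cursor dies with the transaction
            con.close();
        }
    }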

So I'm wondering about alternatives. Maybe somebody can comment
on these:

1. pooling cursors

   I've been playing with the idea of writing a bean that lives
   in the JSP application scope (and gets started with Apache).
   That bean would declare a pool of cursors for the query shown
   above, and each client (read: session scope bean...) could
   request one for its session. A (very) crude prototype of this
   is running right now (a rough sketch follows below), and yes:
   it is *very* fast. I'm worried however about the effects of
   having a number of cursors/transactions hanging around for a
   potentially very long time. Will temporary files eat my disk
   space away in a few hours? What about RAM and CPU resources?
   Restarting Apache and Postgres (and that cursor pool with them)
   every night and having a few minutes of downtime would not be
   a big problem here... but still I'm feeling quite uncomfortable
   with the whole affair.
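
   Roughly, the prototype bean looks like this sketch (class name
   and connection details are made up):

      import java.sql.*;
      import java.util.*;

      // hypothetical application-scope bean: a pool of connections,
      // each holding an open transaction with the cursor already declared
      public class CursorPool {
          private final Deque<Connection> pool = new ArrayDeque<Connection>();

          public CursorPool(int size) throws SQLException {
              for (int i = 0; i < size; i++) {
                  Connection con = DriverManager.getConnection(
                          "jdbc:postgresql://localhost/mydb", "user", "password");
                  con.setAutoCommit(false); // keep the transaction open ...
                  Statement st = con.createStatement();
                  st.execute("DECLARE entries CURSOR FOR "
                          + "SELECT count(textattrib1), text_cat(textattrib1 || ' '), "
                          + "func1(textattrib2) FROM table1 "
                          + "GROUP BY textattrib2 ORDER BY textattrib2 ASC");
                  st.close();
                  pool.addLast(con); // ... so the cursor stays alive
              }
          }

          // a session scope bean borrows a connection whose cursor is warm
          public synchronized Connection acquire() {
              return pool.removeFirst();
          }

          public synchronized void release(Connection con) {
              pool.addLast(con);
          }
      }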


2. splitting up table1 by substr(textattrib2, 1, 2)

   This would result in smaller (faster) tables table1_aa, table1_ab,...
   BUT: in case table1_fo contains fewer than somelimit entries
   satisfying the query, I want to union those entries with entries
   from table1_fp,... until I reach somelimit. So I would have to
   store information somewhere about which table comes next in the
   chain (a rough sketch of that lookup follows below). Taking into
   account that entries are not distributed evenly with respect to
   substr(textattrib2, 1, 2), and that I don't want too many entries
   in each table (to keep performance high)... I guess I would pretty
   soon end up implementing trees (and tree traversals). I must
   confess that I'm somewhat shying away from that work ;-)
   (This would probably be easier if PL/pgSQL allowed 'computed'
   table names?)
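
   The chained lookup itself would be simple enough on the Java
   side; a sketch (table naming and the prefix successor are
   assumptions, and the aggregates are left out for brevity):

      import java.sql.*;
      import java.util.*;

      public class ChainedLookup {
          // walk table1_fo, table1_fp, ... until 'limit' rows are found
          static List<String> lookup(Connection con, String start, int limit)
                  throws SQLException {
              List<String> rows = new ArrayList<String>();
              String prefix = start.substring(0, 2);
              while (prefix != null && rows.size() < limit) {
                  String sql = "SELECT textattrib2 FROM table1_" + prefix
                          + " WHERE textattrib2 >= ?"
                          + " ORDER BY textattrib2 ASC LIMIT ?";
                  PreparedStatement ps = con.prepareStatement(sql);
                  ps.setString(1, start);
                  ps.setInt(2, limit - rows.size());
                  ResultSet rs = ps.executeQuery();
                  while (rs.next()) {
                      rows.add(rs.getString(1));
                  }
                  rs.close();
                  ps.close();
                  prefix = nextPrefix(prefix); // "fo" -> "fp" -> ... -> null
              }
              return rows;
          }

          // naive successor over "aa".."zz"; a chain table could replace
          // this once the split is no longer uniform
          static String nextPrefix(String p) {
              char c0 = p.charAt(0), c1 = p.charAt(1);
              if (c1 < 'z') return "" + c0 + (char) (c1 + 1);
              if (c0 < 'z') return "" + (char) (c0 + 1) + 'a';
              return null;
          }
      }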


3. a *good* solution I've not found yet?

   Maybe I'm just making this too complicated?
   Any other ideas for this problem?




Thanks in advance...
Christian Fritze




--
"The sky above the port was the color of television,
 tuned to a dead channel."
                                         -- W.G. --



Re: Maybe a strange question, but: "How long may a cursor live?"

From
Tim Kientzle
Date:
A couple of ideas for you:

Experiment with doing the GROUP BY within your code.
Depending on a number of factors, it's sometimes faster.

Experiment with doing the ORDER BY within your code.
I've seen several cases where pulling the data into
memory and sorting there was much, much faster than
having the database do the sorting.

Experiment with moving the text_cat and func1 operations
into your Java code (rather than asking the DB to do them).
This may or may not help, depending on a number of factors.
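
All three experiments share the same shape; a minimal sketch
(the aggregation and sort done in Java, column names taken from
the query above):

    import java.sql.*;
    import java.util.*;

    public class InMemoryGroupBy {
        // GROUP BY + ORDER BY + text_cat done in Java: one sequential
        // scan of table1, aggregated into a sorted map
        static SortedMap<String, StringBuilder> load(Connection con)
                throws SQLException {
            SortedMap<String, StringBuilder> groups =
                    new TreeMap<String, StringBuilder>();
            Statement st = con.createStatement();
            ResultSet rs = st.executeQuery(
                    "SELECT textattrib2, textattrib1 FROM table1");
            while (rs.next()) {
                String key = rs.getString(1);
                StringBuilder sb = groups.get(key);
                if (sb == null) {
                    sb = new StringBuilder();
                    groups.put(key, sb); // TreeMap keeps keys ordered,
                }                        // which replaces the ORDER BY
                sb.append(rs.getString(2)).append(' ');
            }
            rs.close();
            st.close();
            return groups;
        }
    }

(count(textattrib1) and func1 would be tracked the same way
alongside the concatenation.)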

The startup cost you're seeing is basically PG generating
all of the results, applying the GROUP BY and ORDER BY, and
tucking the final answer away somewhere temporarily.  The
cursor then just steps through that temporary result.

You'd probably gain a lot by building such a temporary
table explicitly.  That is, create a new "summary" table
and periodically run the SELECT below, putting the results
into that summary table.  (You can even use a separate
program outside of your web application that's run from 'cron'.)
That way, you avoid the overhead of summarizing 80,000 results
on every query.  I've used this approach to provide rapid access
to web site logs (tens of millions of records summarized down
to a few thousand entries).
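
A cron-driven rebuild might look like this sketch (the name
table1_summary is made up, and DROP TABLE IF EXISTS needs a
reasonably recent PostgreSQL):

    import java.sql.*;

    public class RebuildSummary {
        public static void main(String[] args) throws SQLException {
            Connection con = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mydb", "user", "password");
            con.setAutoCommit(false);
            Statement st = con.createStatement();
            // drop and recreate in one transaction: readers see the
            // swap only at commit
            st.execute("DROP TABLE IF EXISTS table1_summary");
            st.execute("CREATE TABLE table1_summary AS "
                    + "SELECT count(textattrib1) AS cnt, "
                    + "text_cat(textattrib1 || ' ') AS cats, "
                    + "func1(textattrib2) AS f, textattrib2 "
                    + "FROM table1 GROUP BY textattrib2");
            st.close();
            con.commit();
            con.close();
        }
    }

The web application then runs the original WHERE/ORDER BY/LIMIT
query against table1_summary, which only has to touch a few
thousand rows.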

You mention that table1 has 80,000 rows, but not how many
rows remain after the aggregation (that is, how many
distinct values of textattrib2 there are).  For high-performance
web applications, I've just been storing full data results in
memory.  You've talked about using a bean to keep a DB cursor
around; why not just store the results?  This looks a lot like
a simple SortedMap from textattrib2 to textattrib1/count pairs.
That's easy to store in memory.  And it's hard to beat the speed.
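
A sketch of that map, with a lookup that mirrors the original
query (class and method names are made up):

    import java.util.*;

    public class SummaryCache {
        // textattrib2 -> aggregated textattrib1/count summary,
        // loaded once at application startup
        private final TreeMap<String, String> cache =
                new TreeMap<String, String>();

        public void put(String key, String summary) {
            cache.put(key, summary);
        }

        // same semantics as:
        //   WHERE textattrib2 >= start ORDER BY textattrib2 ASC LIMIT n
        public List<Map.Entry<String, String>> query(String start, int n) {
            List<Map.Entry<String, String>> out =
                    new ArrayList<Map.Entry<String, String>>();
            Iterator<Map.Entry<String, String>> it =
                    cache.tailMap(start).entrySet().iterator();
            while (it.hasNext() && out.size() < n) {
                out.add(it.next());
            }
            return out;
        }
    }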

            - Tim Kientzle


Christian Fritze <The.Finn@sprawl.de> writes:
>
> I'm working on a web based application (using gnuJSP / JDBC)
> that needs to do queries like
>
> SELECT count(textattrib1), text_cat(textattrib1 || ' '), func1(textattrib2)
> FROM table1 WHERE textattrib2 >= 'foo' GROUP BY textattrib2 ORDER BY
> textattrib2 ASC LIMIT somelimit;
>
> with table1 holding roughly 80000 rows (probably growing some day),
> textattrib2 is indexed of course.
>
> Unfortunately the query above is intolerably slow (up to 30 seconds or
> so, 2 would be OK...).

Re: Re: Maybe a strange question, but: "How long may a cursor live?"

From
Alfred Perlstein
Date:
> Christian Fritze <The.Finn@sprawl.de> writes:
> >
> > I'm working on a web based application (using gnuJSP / JDBC)
> > that needs to do queries like
> >
> > SELECT count(textattrib1), text_cat(textattrib1 || ' '), func1(textattrib2)
> > FROM table1 WHERE textattrib2 >= 'foo' GROUP BY textattrib2 ORDER BY
> > textattrib2 ASC LIMIT somelimit;
> >
> > with table1 holding roughly 80000 rows (probably growing some day),
> > textattrib2 is indexed of course.
> >
> > Unfortunately the query above is intolerably slow (up to 30 seconds or
> > so, 2 would be OK...).

* Tim Kientzle <kientzle@acm.org> [001109 17:21] wrote:
> A couple of ideas for you:
>
> Experiment with doing the GROUP BY within your code.
> Depending on a number of factors, it's sometimes faster.
>
> Experiment with doing the ORDER BY within your code.
> I've seen several cases where pulling the data into
> memory and sorting there was much, much faster than
> having the database do the sorting.

You can increase PostgreSQL's performance by tuning the amount
of shared memory it allocates, as well as how much memory it
will use for "sort buffers":

-B 32768 (~256MB shared segment)
-o "-S 65534"  (increases size of sort buffers, not sure how much though)

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."