Thread: Re: [SQL] OFFSET impact on Performance???

Re: [SQL] OFFSET impact on Performance???

From

"Merlin Moncure"

Date:

24 January 2005, 15:31:17

Alex wrote:
> How do you create a temporary view that has only a small subset of the
> data from the DB init?  (Links to docs are fine - I can read ;).  My
> query isn't all that complex, and my number of records might be from
> 10 to 2k depending on how I implement it.

Well, you can't.  My point was that the traditional query/view approach
is often more appropriate for these cases.

Cursors are really designed to provide an in-transaction working set.
Because of this, they provide the luxury of absolute addressing which is
normally impossible in SQL.

Queries allow for relative addressing, in other words 'fetch me the next
c of x based on y'.  This is a good thing, because it forces the
application developer to consider changes that happen from other users
while browsing a dataset.  Applications that don't use transactions
should not provide any guarantees about the data in between queries like
the number of records matching a certain criteria.  This is a trap that
many developers fall into, especially when coming from flat file
databases that use to allow this.  This puts particularly nasty
constraints on web application developers who are unable to hold a
transaction between page refreshes.  However this just a variant of SQL
developer trap #2, which is that you are not supposed to hold a
transaction open waiting for user input.

In your particular case IMO what you really need is a materialized view.
Currently, it is possible to rig them up in a fashion with plgsql that
may or may not meet your requirements.  Given some careful thought,
mat-views can be used to solve all kinds of nasty performance related
issues (and it all boils down to performance, otherwise we'd all just
use limit/offset).

Merlin

Re: [SQL] OFFSET impact on Performance???

From

Christopher Browne

Date:

27 January 2005, 05:50:33

In an attempt to throw the authorities off his trail, merlin.moncure@rcsonline.com ("Merlin Moncure") transmitted:
> Alex wrote:
>> How do you create a temporary view that has only a small subset of the
>> data from the DB init?  (Links to docs are fine - I can read ;).  My
>> query isn't all that complex, and my number of records might be from
>> 10 to 2k depending on how I implement it.
>
> Well, you can't.  My point was that the traditional query/view
> approach is often more appropriate for these cases.

Actually, you can if you assume you can "temporarily materialize" that
view.

You take the initial query and materialize it into a temporary table
which can then be used to browse "detail."

Thus, suppose you've got a case where the selection criteria draw in
8000 objects/transactions, of which you only want to fit 20/page.

It's ugly and slow to process the 15th page, and you essentially
reprocess the whole set from scratch each time:

  select [details] from [big table] where [criteria]
    order by [something]
    offset 280 limit 20;

Instead, you might start out by doing:

  select [key fields] into temp table my_query
  from [big table] where [criteria];

  create index my_query_idx on my_query(interesting fields);

With 8000 records, the number of pages in the table will correspond
roughly to the number of bytes per record which is probably pretty
small.

Then, you use a join on my_query to pull the bits you want:

  select [big table.details] from [big table],
  [select * from my_query order by [something] offset 280 limit 20]
  where [join criteria between my_query and big table]
  order by [something];

For this to be fast is predicated on my_query being compact, but that
should surely be so.

The big table may be 20 million records; for the result set to be even
vaguely browsable means that my_query ought to be relatively small so
you can pull subsets reasonably efficiently.

This actually has a merit over looking at a dynamic, possibly-changing
big table that you won't unexpectedly see the result set changing
size.

This strikes me as a pretty slick way to handle "data warehouse-style"
browsing...
--
output = ("cbbrowne" "@" "gmail.com")
http://www.ntlug.org/~cbbrowne/oses.html
The first cup of coffee recapitulates phylogeny.