Changed to: how to solve the get next 100 records problem - Mailing list pgsql-sql

From Alain
Subject Changed to: how to solve the get next 100 records problem
Date
Msg-id 428B6263.5050402@pobox.com
Whole thread Raw
In response to Re: Does Postgresql have a similar pseudo-column "ROWNUM" as  (Ragnar Hafstað <gnari@simnet.is>)
Responses Re: Changed to: how to solve the get next 100 records problem  (Ragnar Hafstað <gnari@simnet.is>)
List pgsql-sql
Hi Ragnar (and others),

I found something that is both fast and simple (program side):

Allways assuming that: pkey is a primary key and skey is a sort key, and
there exists an index on (skey,pkey)

first select is
select ... from tab ORDER by skey,pkey LIMIT 100;

subsequent selects are
(select ... from tab WHERE skey=skey_last AND pkey>pkey_last                     ORDER BY skey,pkey LIMIT 100)
UNION
(select ... from tab WHERE skey>skey_last                     ORDER BY skey,pkey LIMIT 100)
ORDER BY skey,pkey LIMIT 100;

The catch is that if the first select would have more than 100 records 
and was limited to 100, the second select's data is completeply 
discarted by the 3rd limit!

The only strange thing is that without the 3rd order by, the order is 
wrong. I didn't expect it because each select is created ordered. Is it 
expected that UNION mixes it all up? (using postgre 7.4.1)

The 3rd order by is not indexed, but it operates in a memory table of no 
more than 200 so it is fast too.

Please comment on this. I tested  and it worked but I really new to sql 
and I feel insecure...

Thanks,
Alain




>>
>>>>[how to solve the get next 100 records problem]
> 
>>BUT, I think that this is close to a final solution, I made some 
>>preliminary test ok. Please tell me what you think about this.
>>
>>Fisrt let's state that I am reading records to put on a screen (in a 
>>Table/Grid). I separated the problem is *3* parts
>>
>>-first select is as above:
>>select ... from tab ORDER by skey,pkey LIMIT 100;
>>
>>-second method for next 100:
>>select ... from tab WHERE skey>=skey_last
>>                     ORDER BY skey,pkey
>>                     LIMIT 100;
>>but here I test for repetitions using pkey and discard them
>>
>>-now if I get all repetitions or the last 100 have the same skey with 
>>the second method, I use
>>select ... from tab WHERE skey=skey_last AND pkey>pkey_last
>>                     ORDER BY skey,pkey
>>                     LIMIT 100;
>>until I get an empty response, then I go back to the second method.
> 
> 
> if your distribution is such that those skeys that have > 100 records
> tend to have a lot more, you might have a higher limit for this case.
> 
> 
> 
>>All queries are extremely fast with 6000000 records and it looks like 
>>the few redundant or empty queries (but very fast) will not be a problem.
>>
>>What is your opinion about this (apart that it is a bit complex :) ??
> 
> 
> 
> looks fine
> 
> gnari
> 
> 
> 
> 


pgsql-sql by date:

Previous
From: Ragnar Hafstað
Date:
Subject: Re: Does Postgresql have a similar pseudo-column "ROWNUM" as
Next
From: KÖPFERL Robert
Date:
Subject: Meaning of ERROR: tuple concurrently updated