Thread: Paged Query
I want to implement a "paged query" feature, where the user can enter in a dialog how many rows he wants to see. After displaying the first page of rows, he can push a button to display the next/previous page.

On the database level I could use "limit" to implement this feature. My problem is that the user is not permitted to view all rows: for every row a permission check is performed, and if permission is granted, the row is added to the list of rows sent to the client.

If, for example, the user has entered a page size of 50 and I use "limit 50" to only fetch 50 records, what should I do if he is only permitted to see 20 of these 50 records? There may be more records he can view. But if I don't use "limit", what happens if the query would return 5,000,000 rows? Would my result set contain 5,000,000 rows, or would the performance of the database go down?

Thanks in advance
Hermann
What language are you using? Usually there is an iterator with a chunked-fetch option (like setFetchSize in Java JDBC). So you pass the query without a limit and then read as many results as you need. Note that the query plan in this case won't be optimized for your limit, and I don't remember whether Postgres has an "optimize for N rows" statement option.
Also, if your statement is ordered by some key, you can use the general paging technique of rerunning the query with a "key > max_prev_value" filter to get the next chunk.
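As a rough illustration of that keyset technique - the documents table, id column, and the values below are invented for the example, not taken from the thread:

-- First page: order by an indexed key and limit to the page size.
SELECT id, title
FROM documents
ORDER BY id
LIMIT 50;

-- Next page: filter on the largest key already shown instead of using
-- OFFSET, so the index can seek straight to the next chunk.
SELECT id, title
FROM documents
WHERE id > 50123          -- max "id" value seen on the previous page
ORDER BY id
LIMIT 50;

Unlike a growing OFFSET, the cost of fetching a page this way does not increase the deeper you go into the result.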
On Wednesday, July 4, 2012, Hermann Matthes <hermann.matthes@web.de> wrote:
> I want to implement a "paged Query" feature, where the user can enter in a dialog how many rows he wants to see. After displaying the first page of rows, he can push a button to display the next/previous page.
> On the database level I could use "limit" to implement this feature. My problem now is that the user is not permitted to view all rows. For every row a permission check is performed and if permission is granted, the row is added to the list of rows sent to the client.
> If for example the user has entered a page size of 50 and I use "limit 50" to only fetch 50 records, what should I do if he is only permitted to see 20 of these 50 records? There may be more records he can view.
> But if I don't use "limit", what happens if the query would return 5,000,000 rows? Would my result set contain 5,000,000 rows or would the performance of the database go down?
>
> Thanks in advance
> Hermann
>
--
Best regards,
Vitalii Tymchyshyn
Hermann Matthes wrote:
> I want to implement a "paged Query" feature, where the user can enter in
> a dialog how many rows he wants to see. After displaying the first page
> of rows, he can push a button to display the next/previous page.
> On the database level I could use "limit" to implement this feature. My
> problem now is that the user is not permitted to view all rows. For
> every row a permission check is performed and if permission is granted,
> the row is added to the list of rows sent to the client.
> If for example the user has entered a page size of 50 and I use "limit
> 50" to only fetch 50 records, what should I do if he is only permitted
> to see 20 of these 50 records? There may be more records he can view.
> But if I don't use "limit", what happens if the query would return
> 5,000,000 rows? Would my result set contain 5,000,000 rows or would the
> performance of the database go down?

Selecting all 5,000,000 rows would consume a lot of memory wherever they are cached. Also, it might lead to bad response times (with an appropriate LIMIT clause, the server can choose a plan that returns the first few rows quickly).

I assume that there is some kind of ORDER BY involved, so that the order of the rows displayed is not random.

I have two ideas:
- Try to integrate the permission check into the query. It might be more efficient, and you could just use LIMIT and OFFSET like you intended.
- Select some more rows than you want to display on one page and perform the permission checks. Stop when you reach the end or have enough rows. Remember the sort key of the last row processed. When the next page is to be displayed, use the remembered sort key value to SELECT the next rows. (See the sketch after this message.)

Yours,
Laurenz Albe
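A minimal sketch of that second idea, under assumed names (a documents table, sort key id, page size 50); the permission check itself stays in the application:

-- Over-fetch past the page size, starting just after the last sort-key
-- value already shown to the user.
SELECT id, title
FROM documents
WHERE id > 50123          -- remembered sort key of the last row processed
ORDER BY id
LIMIT 100;                -- fetch extra, since some rows will fail the permission check

-- The application then checks permissions row by row, stops once it has
-- 50 visible rows, and remembers the id of the last row it processed as
-- the starting point for the next page. If the batch runs out before 50
-- visible rows are found, it fetches the next batch the same way.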
On Wed, Jul 4, 2012 at 6:25 AM, Hermann Matthes <hermann.matthes@web.de> wrote:
> I want to implement a "paged Query" feature, where the user can enter in a dialog how many rows he wants to see. After displaying the first page of rows, he can push a button to display the next/previous page.
> On the database level I could use "limit" to implement this feature. My problem now is that the user is not permitted to view all rows. For every row a permission check is performed and if permission is granted, the row is added to the list of rows sent to the client.
> If for example the user has entered a page size of 50 and I use "limit 50" to only fetch 50 records, what should I do if he is only permitted to see 20 of these 50 records? There may be more records he can view.
> But if I don't use "limit", what happens if the query would return 5,000,000 rows? Would my result set contain 5,000,000 rows or would the performance of the database go down?
Sounds like your permission check is not implemented in the database. If it were, those records would be excluded and the OFFSET-LIMIT combo would be your solution. It also appears that you have access to the application. If so, I would recommend implementing the permission check in the database. Much cleaner from a query & pagination standpoint.
An alternative is to have the application complicate the query with the appropriate permission logic excluding the unviewable records from the final ORDER BY-OFFSET-LIMIT. This will give you an accurate page count.
IMHO, the worst alternative is to select your max page size, exclude rows the user cannot see, rinse and repeat until you have your records-per-page limit. Whatever you're ordering on will serve as the page marker. The issue with this solution is that you may not have an accurate page count.
Luck.
-Greg
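Picking up Greg's second alternative - folding the permission logic into the query itself so that LIMIT/OFFSET only ever see visible rows - a hedged sketch with made-up table and column names (documents, document_permissions, user id 42):

SELECT d.id, d.title
FROM documents d
JOIN document_permissions p
  ON p.document_id = d.id
 AND p.user_id = 42
 AND p.can_read          -- only rows the user may see survive the join
ORDER BY d.id
LIMIT 50 OFFSET 100;     -- page 3 at 50 rows per page

Because the filtering happens before LIMIT/OFFSET, a plain count of the joined rows also gives the accurate page count Greg mentions.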
Hi Hermann,
Well,
It's not clear how you get the rows for the user without paging. If it is some query like:

SELECT columns FROM table WHERE UserHasPermission(rowPK, userid)

then paging would be:

SELECT columns FROM table WHERE UserHasPermission(rowPK, userid) LIMIT NoOfRecords OFFSET page*NoOfRecords
Kind Regards,
Misa
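For concreteness, a usage sketch of that pattern - the table, the columns, and the user id 42 are placeholders, and UserHasPermission stands in for whatever check function the application really uses:

-- Page 4 (zero-based page index 3) at 50 rows per page: OFFSET 3 * 50 = 150.
SELECT d.id, d.title
FROM documents d
WHERE UserHasPermission(d.id, 42)
ORDER BY d.id             -- a stable ORDER BY keeps pages from shifting between requests
LIMIT 50 OFFSET 150;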
2012/7/4 Hermann Matthes <hermann.matthes@web.de>
> I want to implement a "paged Query" feature, where the user can enter in a dialog how many rows he wants to see. After displaying the first page of rows, he can push a button to display the next/previous page.
> On the database level I could use "limit" to implement this feature. My problem now is that the user is not permitted to view all rows. For every row a permission check is performed and if permission is granted, the row is added to the list of rows sent to the client.
> If for example the user has entered a page size of 50 and I use "limit 50" to only fetch 50 records, what should I do if he is only permitted to see 20 of these 50 records? There may be more records he can view.
> But if I don't use "limit", what happens if the query would return 5,000,000 rows? Would my result set contain 5,000,000 rows or would the performance of the database go down?
>
> Thanks in advance
> Hermann
Use cursors. By far the most flexible. offset/limit have their down sides.
On 07/09/2012 07:55 PM, Gregg Jaskiewicz wrote:
> Use cursors.
> By far the most flexible. offset/limit have their down sides.

Do do cursors.

Keeping a cursor open across user think time has resource costs on the database. It doesn't necessarily require keeping the transaction open (with hold cursors), but it's going to either require a snapshot to be retained or the whole query to be executed by the DB and stored somewhere.

Then the user goes away on a week's holiday and leaves their PC at your "next" button.

All in all, limit/offset have better bounded and defined costs, albeit not very nice ones.

--
Craig Ringer
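For reference, this is roughly what the cursor approach under discussion looks like - the cursor and table names are made up, and the resource caveats above still apply:

BEGIN;
DECLARE page_cur SCROLL CURSOR WITH HOLD FOR
    SELECT id, title FROM documents ORDER BY id;
FETCH FORWARD 50 FROM page_cur;   -- first page
FETCH FORWARD 50 FROM page_cur;   -- next page
MOVE BACKWARD 100 IN page_cur;    -- rewind to before the first row
COMMIT;                           -- WITH HOLD keeps the cursor (and its resources) alive after commit
FETCH FORWARD 50 FROM page_cur;   -- still usable in a later transaction
CLOSE page_cur;                   -- must be closed explicitly, or the materialized result lingers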
On 07/09/2012 07:02 AM, Craig Ringer wrote:
> Do do cursors.

Did you mean "Do not use cursors" here?

> Then the user goes away on a week's holiday and leaves their PC at
> your "next" button.

This exactly. Cursors have limited functionality that isn't directly disruptive to the database in general. At the very least, the transaction ID reservation necessary to preserve a cursor long-term can wreak havoc on your transaction ID wraparound if you have a fairly busy database. I can't think of a single situation where either client caching or LIMIT/OFFSET can't supplant it with better risk levels and costs.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com
On Mon, Jul 9, 2012 at 6:22 AM, Shaun Thomas <sthomas@optionshouse.com> wrote:
A good solution to this general problem is "hitlists." I wrote about this concept before:
http://archives.postgresql.org/pgsql-performance/2010-05/msg00058.php
Craig James (the other Craig)
On Mon, Jul 9, 2012 at 8:16 AM, Craig James <cjames@emolecules.com> wrote:
> A good solution to this general problem is "hitlists." I wrote about this concept before:
> http://archives.postgresql.org/pgsql-performance/2010-05/msg00058.php
I implemented this exact strategy in our product years ago. Our queries were once quite complicated, involving many nested sub-SELECTs and several JOINs per SELECT. The basics of our implementation now consist of:
1. A table tracking all "cache" tables. A cache table is a permanent table holding what was formerly one of the sub-SELECTs. The table includes the MD5 hash of the query used to create the table, the time created, the query type (helps to determine the expiry time), and a comment field to help in debugging.
2. Simple logic checking for the existence of the cache table and creating it if it does not.
3. Using one or many of the named cache tables in the final query using ORDER BY-LIMIT-OFFSET in a CURSOR.
4. One scheduled backend process to clear the "expired" cache tables based on the query type.
The reason for the CURSOR is to execute once to get a tally of records for pagination purposes, then rewind and fetch the right "page".
Highly recommended.
-Greg
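A rough sketch of how such a scheme could look - the registry layout, table names, and cursor name are all invented here, and the OFFSET step is expressed as a MOVE on the cursor:

-- 1. Registry of cache tables.
CREATE TABLE IF NOT EXISTS query_cache_registry (
    cache_table text PRIMARY KEY,            -- e.g. 'cache_' || md5 of the query text
    query_md5   text NOT NULL,
    query_type  text NOT NULL,               -- used to pick an expiry policy
    created_at  timestamptz NOT NULL DEFAULT now(),
    comment     text
);

-- 2. Materialize one former sub-SELECT as a permanent cache table.
CREATE TABLE cache_3f2a AS
SELECT d.id, d.title
FROM documents d
WHERE d.owner_id = 42;

-- 3. Page from the cache table with a scrollable cursor: count once,
--    rewind, then fetch the requested page.
BEGIN;
DECLARE hit_cur SCROLL CURSOR FOR
    SELECT id, title FROM cache_3f2a ORDER BY id;
MOVE FORWARD ALL IN hit_cur;     -- the reported row count is the tally for pagination
MOVE ABSOLUTE 100 IN hit_cur;    -- position just before page 3 (page size 50)
FETCH FORWARD 50 FROM hit_cur;   -- rows 101-150, i.e. the requested page
COMMIT;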
2012/7/9 Gregg Jaskiewicz <gryzman@gmail.com>
> Use cursors.
> By far the most flexible. offset/limit have their down sides.
Well, I am not aware of any downsides of LIMIT/OFFSET that do not exist in the other solutions for paged queries... But I agree there must always be some compromise between flexibility and response time (as long as the user has the impression he is working "immediately", whether the query executes in 1 ms or 1 s is not important...).
The query must be parsed and executed (inside the DB, before it returns results - so this time is unavoidable). Cursors just let you take the (already executed) results one by one from the DB... OK, in the cursor scenario parse and execute are done just once... But execution plans are cached, so I don't see a big downside if the query is executed thousands of times... you will notice in Pg that the second query is much faster than the first one...
So if you need to jump straight from page 1 to page 576 (in situations below 100 pages - 50 rows per page - there is no point discussing performance... you can get all rows from the DB at once and do the "paging" client side, in memory) - I agree the response will be a bit slower in the LIMIT/OFFSET case, but I'm not sure the CURSOR scenario will be enough faster to outweigh the many other limitations of cursors in general... (Personally I have not used them in more than 7 years - I really don't see the need for them today, when hardware has more and more power...)
From my experience users very rarely go to the ending pages anyway... it would be easier for them to sort the data by a field to get those rows onto the very first pages...
Kind Regards,
Misa
On 07/09/2012 01:41 PM, Misa Simic wrote:
> From my experience users very rarely go to the ending pages anyway... it would be easier for them to sort the data by a field to get those rows onto the very first pages...

Yeah, the problem really is that most client code wants to know how many pages there are, even if it only wants one page right now.

cheers

andrew
On Mon, Jul 9, 2012 at 1:46 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> Yeah, the problem really is that most client code wants to know how many pages there are, even if it only wants one page right now.
>
> On 07/09/2012 01:41 PM, Misa Simic wrote:
>> From my experience users very rarely go to the ending pages anyway... it would be easier for them to sort the data by a field to get those rows onto the very first pages...
FWIW, I wrote a little about getting the numbered results along with the total result count in one query [1]. The suggestions in the comments to use a CTE provided even better performance.
[1] http://gurjeet-tech.blogspot.com/2011/02/pagination-of-results-in-postgres.html
Best regards,
--
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
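A hedged sketch of the results-plus-total-in-one-query idea using a window function; the table and column names are made up, and the linked post and its comments also discuss a CTE variant:

-- The window count is evaluated before LIMIT/OFFSET, so every returned
-- row carries the total number of matching rows.
SELECT id, title,
       count(*) OVER () AS total_rows
FROM documents
WHERE owner_id = 42
ORDER BY id
LIMIT 50 OFFSET 100;     -- page 3 at 50 rows per page

The client gets the page and the page count in a single round trip, at the cost of the server computing the full result set.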
Hi Andrew,

Sure... We are sending data in JSON to clients:

{
  total: 6784,
  data: [50 rows for desired page]
}

SELECT count(*) FROM table - without WHERE, without joins - used to have bad performance... However, in a real scenario we have never had the case without joins and WHERE... Join columns are always indexed, and we always try to put indexes on the columns that will mostly be used in the WHERE of the usual queries... So far we haven't met performance problems...

But to be honest with you, the total in our responses is very rarely bigger than 10k, and mainly it is less than 1k... which is a really small number these days (even when the tables have a few million rows, the restrictions always reduce the "desired" total to less than 1000...).

When users want to work on something on an everyday basis, they usually want "immediately" the things that are just for them... draft things they worked on in the last few days, things assigned just to them, etc. When they need to go through some process once a month and performance is "slow", they usually don't bother... The everyday tasks are what is important and what we care about having good performance for...

In the very rare cases when we know performance must be slow for many reasons, we are lying :) - we return the first page (hopefully with the data the user is looking for) and return 1000 as the total... We return the result to the user, and asynchronously run CalculateTotalForThisCaseAndCache it... On the next request for the same thing (but another page), if the calculation is done, we return the results from the cache (with the real total number)... But that is really on a very exceptional basis rather than a regular one...

Cheers

Misa

Sent from my Windows Phone

On 09/07/2012 19:47, Andrew Dunstan wrote:
> On 07/09/2012 01:41 PM, Misa Simic wrote:
>> From my experience users very rarely go to the ending pages anyway... it would be easier for them to sort the data by a field to get those rows onto the very first pages...
>
> Yeah, the problem really is that most client code wants to know how many pages there are, even if it only wants one page right now.
>
> cheers
>
> andrew
On 07/09/2012 09:22 PM, Shaun Thomas wrote:
> On 07/09/2012 07:02 AM, Craig Ringer wrote:
>> Do do cursors.
>
> Did you mean "Do not use cursors" here?

Oops. "So do cursors".

>> Then the user goes away on a week's holiday and leaves their PC at
>> your "next" button.
>
> This exactly. Cursors have limited functionality that isn't directly
> disruptive to the database in general. At the very least, the
> transaction ID reservation necessary to preserve a cursor long-term
> can wreak havoc on your transaction ID wraparound if you have a fairly
> busy database. I can't think of a single situation where either client
> caching or LIMIT/OFFSET can't supplant it with better risk levels and
> costs.

My ideal is a cursor with a timeout. If I could use a cursor but know that the DB would automatically expire the cursor and any associated resources after a certain inactivity period (_not_ total life, inactivity), that'd be great. Or, for that matter, a cursor the DB could expire when it began to get in the way.

I'm surprised more of the numerous tools that use LIMIT and OFFSET don't instead use cursors that they hold for a short time, then drop if there's no further activity and re-create next time there's interaction from the user. ORMs that tend to use big joins would particularly benefit from doing this.

I suspect the reason is that many tools - esp. ORMs, web frameworks, etc. - try to be portable between DBs, and cursors are a high-quirk-density area in SQL RDBMSs, not to mention unsupported by some DBs.

Pity, though. There's nothing wrong with using a cursor so long as you don't hang onto it over user think-time without also setting a timeout of some kind to destroy it in the background.

--
Craig Ringer
On 07/10/2012 06:24 AM, Misa Simic wrote:
> Hi Andrew,
>
> Sure... We are sending data in JSON to clients:
>
> {
>   total: 6784,
>   data: [50 rows for desired page]
> }
>
> SELECT count(*) FROM table - without WHERE, without joins - used to have
> bad performance... However, in a real scenario we have never had the case
> without joins and WHERE... Join columns are always indexed, and we
> always try to put indexes on the columns that will mostly be used in the
> WHERE of the usual queries...

When/if you do need a count of a single table without any filters, a common trick is to use the table statistics to return an approximation. If your autovacuum is running regularly it's usually a very good approximation, too.

Sounds like this hack may become unnecessary in 9.2, though.

--
Craig Ringer
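A minimal sketch of that statistics trick, assuming a hypothetical table name; reltuples is only as fresh as the last VACUUM/ANALYZE, so treat it as an estimate:

-- Approximate row count from the planner statistics; cheap even on huge tables.
SELECT reltuples::bigint AS approximate_row_count
FROM pg_class
WHERE relname = 'documents';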
On Mon, Jul 9, 2012 at 4:50 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
> When/if you do need a count of a single table without any filters, a common
> trick is to use the table statistics to return an approximation. If your
> autovacuum is running regularly it's usually a very good approximation, too.
>
> Sounds like this hack may become unnecessary in 9.2, though.

Index-only scans in 9.2 are nice, but they don't fundamentally change this type of thing.

Cheers,
Jeff
On Monday, July 9, 2012, Misa Simic <misa.simic@gmail.com> wrote:
>
>
> 2012/7/9 Gregg Jaskiewicz <gryzman@gmail.com>
>>
>> Use cursors.
>> By far the most flexible. offset/limit have their down sides.
>
>
> Well, I am not aware of any downsides of LIMIT/OFFSET that do not exist in the other solutions for paged queries...
'where key > last_value order by key limit N' performs much better than a large OFFSET.
--
Best regards,
Vitalii Tymchyshyn