Thread: Re: Why LIMIT and OFFSET are commutative

Re: Why LIMIT and OFFSET are commutative

From
Lew
Date:
Andrus wrote:
> I found that
>
> SELECT * FROM foo
> ORDER BY bar
> OFFSET n
> LIMIT m
>
> and
>
> SELECT * FROM foo
> ORDER BY bar
> LIMIT m
> OFFSET n
>
> produce always same results.
>
> Why ?
> OFFSET and LIMIT operations are NOT commutative in general.

Under what interpretation would the results differ?

<http://www.postgresql.org/docs/8.2/interactive/queries-limit.html>
> If a limit count is given, no more than that many rows will be returned
> (but possibly less, if the query itself yields less rows).
...
> OFFSET says to skip that many rows before beginning to return rows.

Why would the position of either clause matter, assuming the clause is in a
legal position?

In both your examples, the LIMIT is 'm', so you will get 'm' rows.

In both your examples, the OFFSET is 'n', so you will skip 'n' rows before
returning those 'm' rows.

I see no inconsistency.

Also, neither LIMIT nor OFFSET is a binary operator, so the term "commutative"
has to be understood metaphorically at best.  What exactly do you mean by
"commutative"?

--
Lew
This post contained three requests for answers.

Re: Why LIMIT and OFFSET are commutative

From
Lew
Date:
Andrus wrote:
>> Under what interpretation would the results differ?
>
> Results must differ for easy creation of LinQ-PostgreSQL driver.
> If results are always the same , PostgreSQL should not allow to use both
> order of clauses.

Um, no, and your original post had nothing to do with LINQ.

My answer was only about SQL.

Your LINQ problems have to be solved in LINQ.

> Nicholas explains:
>
>    Assuming the ordering is the same on each of them (because Skip and Take
> make no sense without ordering, LINQ to SQL will create an order for you,
> which irritates me to no end, but that's a separate thread), they will
> produce different results.

Irrelevant.  The SQL definitions of these clauses is the SQL definition.  LINQ
is a separate issue.

Again, the *SQL* definition says, quoting the PG docs at
<http://www.postgresql.org/docs/8.2/interactive/queries-limit.html>
> If both OFFSET and LIMIT appear, then OFFSET rows are skipped
> *before starting to count the LIMIT rows that are returned.*
(emphasis added)

That's SQL, my friend.  OFFSET first, then LIMIT.  Irrespective of the order
in the query statement.  It is what it is.  SQL doesn't depend on LINQ for its
semantics.

--
Lew

Re: Why LIMIT and OFFSET are commutative

From
Lew
Date:
Andrus wrote:
> Why Postgres does not throw error when SELECT ... LIMIT ... OFFSET is used ?
> That's not sql and should cause error.

Good point, it is not standard SQL.  Postgres varies from the standard in
several ways, some of them useful.  (INSERT ... RETURNING is a useful one.)
No SQL RDBMS follows the standard, AFAIK.  LIMIT / OFFSET is a common
enhancement, but being non-standard, appears in different forms in different
dialects.

However, the LIMIT / OFFSET idiom *is* Postgres SQL, it *is* documented and
therefore it should *not* cause an error when used.  That would just be whacky.

Given that the idiom is documented and does work in Postgres's particular
dialect of SQL, it then must work exactly as described in Postgres's
documentation of its particular dialect of SQL.  And, hey, presto!  It does.

I agree that they should document the extra flexibility it provides in the
order of its clauses.  I find it hard to assess such flexibility as a bad
thing; I tend to appreciate it.

They do hint at it in one place - Section VI, Chapter I, _SELECT_, the LIMIT
clause,
<http://www.postgresql.org/docs/8.2/interactive/sql-select.html#SQL-LIMIT>
>  The LIMIT clause consists of two independent sub-clauses:
>
> LIMIT { /count/ | ALL }
> OFFSET /start/
>
> /count/ specifies the maximum number of rows to return,
> while /start/ specifies the number of rows to skip
> before starting to return rows. When both are specified,
> /start/ rows are skipped before starting to count the
> /count/ rows to be returned.

The word "independent" indicates at least the either may appear without the
other, and implies that the order of the clauses doesn't matter.

Note also that the behavior of the clause is *exactly* as documented.  You
really have to stop resisting that, now, and accept it.

--
Lew

Re: Why LIMIT and OFFSET are commutative

From
"Andrus"
Date:
> Under what interpretation would the results differ?

Results must differ for easy creation of LinQ-PostgreSQL driver.
If results are always the same , PostgreSQL should not allow to use both
order of clauses.

Nicholas explains:

   Assuming the ordering is the same on each of them (because Skip and Take
make no sense without ordering, LINQ to SQL will create an order for you,
which irritates me to no end, but that's a separate thread), they will
produce different results.

    Say your query will produce the ordered set {1, 2, 3}.  Let n = 1, m =
2.

    The first query:

var query = query.Skip(n).Take(m);

converted to SELECT ... OFFSET n LIMIT m

    Will return the ordered set {2, 3}, while the second query:

var query = query.Take(m).Skip(n);

converted to SELECT ... LIMIT m OFFSET n

    Will return the ordered set {2}.

    The reason for this is that in the first query, the Skip method skips
one element, then takes the remaining two, while in the second query, the
first two elements are taken, and then the first one is skipped.

> <http://www.postgresql.org/docs/8.2/interactive/queries-limit.html>
>> If a limit count is given, no more than that many rows will be returned
>> (but possibly less, if the query itself yields less rows).
> ...
>> OFFSET says to skip that many rows before beginning to return rows.
>
> Why would the position of either clause matter, assuming the clause is in
> a legal position?
> In both your examples, the LIMIT is 'm', so you will get 'm' rows.
>
> In both your examples, the OFFSET is 'n', so you will skip 'n' rows before
> returning those 'm' rows.
>
> I see no inconsistency.

Different results - the first gives results (zero-based) n to n+m-1.
The second gives results 0 to Min(n-1, m-1).

> Also, neither LIMIT nor OFFSET is a binary operator, so the term
> "commutative" has to be understood metaphorically at best.  What exactly
> do you mean by "commutative"?

I meant result should depend on the order of OFFSET and LIMIT clauses are
present in SELECT clause.

SELECT ... OFFSET n LIMIT m
SELECT ... LIMIT m OFFSET n

should return different results in sime cases.

Filtering and ordering are effectively orthogonal. LIMIT and OFFSET are
clearly *not* orthogonal.

Otherwise I see no way to implement efficient LinQ-PostgreSQL driver.


Andrus.




Re: Why LIMIT and OFFSET are commutative

From
"Andrus"
Date:
> That's SQL, my friend.  OFFSET first, then LIMIT.  Irrespective of the
> order
> in the query statement.  It is what it is.  SQL doesn't depend on LINQ for
> its semantics.

SQL requires strong order of all cases. Postgres syntax help about OFFSET /
LIMIT  also
describes only single order.

Actually Postgres allows order of OFFSET / LIMIT clauses swapped without any
error.

Why Postgres does not throw error when SELECT ... LIMIT ... OFFSET is used ?
That's not sql and should cause error.

Andrus.



Re: Why LIMIT and OFFSET are commutative

From
Gregory Stark
Date:
"Andrus" <kobruleht2@hot.ee> writes:

>> Under what interpretation would the results differ?
>
> Results must differ for easy creation of LinQ-PostgreSQL driver.
> If results are always the same , PostgreSQL should not allow to use both
> order of clauses.
>
> Nicholas explains:
>
>    Assuming the ordering is the same on each of them (because Skip and Take
> make no sense without ordering, LINQ to SQL will create an order for you,
> which irritates me to no end, but that's a separate thread), they will
> produce different results.
>
>     Say your query will produce the ordered set {1, 2, 3}.  Let n = 1, m =
> 2.
>
>     The first query:
>
> var query = query.Skip(n).Take(m);
>
> converted to SELECT ... OFFSET n LIMIT m
>
>     Will return the ordered set {2, 3}, while the second query:
>
> var query = query.Take(m).Skip(n);
>
> converted to SELECT ... LIMIT m OFFSET n

You should use subqueries if you want to do that. Take() and Skip() can take a
query and rewrite it as:

SELECT * FROM (old query) LIMIT n
or
SELECT * FROM (old query) OFFSET n

So you'll end up with a query like

SELECT * FROM (SELECT * FROM (query) LIMIT n) OFFSET n

or vice versa.

Or alternatively do the arithmetic. If there's already an offset in the query
structure when Skip() is called then add that amount to the offset. I'm
assuming your methods are called on some kind of object which can store
arbitrary state and not simply on a query string.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

Re: Why LIMIT and OFFSET are commutative

From
Erik Jones
Date:
On Nov 26, 2007, at 5:29 AM, Andrus wrote:

>> Under what interpretation would the results differ?
>
> Results must differ for easy creation of LinQ-PostgreSQL driver.
> If results are always the same , PostgreSQL should not allow to use
> both
> order of clauses.
>
> Nicholas explains:
>
>    Assuming the ordering is the same on each of them (because Skip
> and Take
> make no sense without ordering, LINQ to SQL will create an order
> for you,
> which irritates me to no end, but that's a separate thread), they will
> produce different results.
>
>     Say your query will produce the ordered set {1, 2, 3}.  Let n =
> 1, m =
> 2.
>
>     The first query:
>
> var query = query.Skip(n).Take(m);
>
> converted to SELECT ... OFFSET n LIMIT m
>
>     Will return the ordered set {2, 3}, while the second query:
>
> var query = query.Take(m).Skip(n);
>
> converted to SELECT ... LIMIT m OFFSET n
>
>     Will return the ordered set {2}.
>
>     The reason for this is that in the first query, the Skip method
> skips
> one element, then takes the remaining two, while in the second
> query, the
> first two elements are taken, and then the first one is skipped.

Nice.  Yet another example of an Object-Relational impedance
mismatch.  SQL is declarative, not procedural.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



Re: Why LIMIT and OFFSET are commutative

From
Marco Colombo
Date:
Andrus wrote:
>> Under what interpretation would the results differ?
>
> Results must differ for easy creation of LinQ-PostgreSQL driver.
> If results are always the same , PostgreSQL should not allow to use both
> order of clauses.
>
> Nicholas explains:
>
>    Assuming the ordering is the same on each of them (because Skip and Take
> make no sense without ordering, LINQ to SQL will create an order for you,
> which irritates me to no end, but that's a separate thread), they will
> produce different results.
>
>     Say your query will produce the ordered set {1, 2, 3}.  Let n = 1, m =
> 2.
>
>     The first query:
>
> var query = query.Skip(n).Take(m);
>
> converted to SELECT ... OFFSET n LIMIT m
>
>     Will return the ordered set {2, 3}, while the second query:
>
> var query = query.Take(m).Skip(n);
>
> converted to SELECT ... LIMIT m OFFSET n
>
>     Will return the ordered set {2}.
>
>     The reason for this is that in the first query, the Skip method skips
> one element, then takes the remaining two, while in the second query, the
> first two elements are taken, and then the first one is skipped.

This semantics implies subqueries.

In SQL LIMIT and OFFSET refer to the whole query (that's why in PG you
can swap them). If you want to think OO, both are _attributes_ for the
query object, whose default values OFFSET 1 and LIMIT *inf*.

They are not _operators_ on the query result, as you seem to imply. The
Take() and Skip() should not be real methods, they should just set
internal instance variables.

(Pardon my pythonic syntax - I know zero of LinQ)

 q = Query("SELECT ...") # create a new query object
 q.limit = 2         # object properties
 q.offset = 1
 result = q.execute()     # perform the query

using wrapper methods:

 q = Query("SELECT ...") # create a new query object
 q.limit(2)
 q.offset(1)
 result = q.execute()

Methods allow a more pythonic way (the same you use):

 result = Query("SELECT ...").offset(1).limit(2).execute()

which matches quite closely the SQL syntax, if you have those methods
return "self".

@Erik Jones
There's no real object-relational impedance mismatch here.
The above is pure OO, yet:

 result = Query("SELECT ...").limit(2).offset(1).execute()
                              ^ methods swapped
is perfectly equivalent.

Setting properties on an instance object is "commutative" in general,
unless the set_property operation has side-effects, which I wouldn't
call good programming style.

The OP just maps LIMIT and OFFSET into operations ("OFFSET and LIMIT
_operations_ are NOT commutative in general") instead of object
properties. Once you do the correct mapping, objects behave like queries
in SQL.

As Gregory Stark pointed out, if you want LIMIT and OFFSET to work as
"operators", you need to nest queries.

Let's drop the idea there's SQL behind the scene, and let's think of a
more abstract DB model:

q = Query("SELECT ...").execute()
# this executes the query, and returs an object you can perform other
queries on

q = q.limit(2).execute()
# again, the query is executed, and the result set is in turn querable

q = q.offset(1).execute()

Now, we can just make the execute() method implied. With this new
semantics, we have:

q1 = Query("SELECT ...")
q1 = q.limit(2)
q1 = q.offset(1)

q2 = Query("SELECT ...")
q2 = q.offset(1)
q2 = q.limit(2)

and the results differ. I think that's what the OP meant.

But we needed to drop the idea of SQL behind the scene because there's
no way in SQL to directly query the result set from a previous query.
Here there's "impedance mismatch". The closest thing is subqueries, but
you don't store intermediate results anywhere, like we do above with
objects.

One could implement the above by executing the query every time, but
that's a nightmare for performance on big tables. The only way to have
decent performance is to do "lazy" execution of the query, and use
subqueries as Gregory suggested.

Please note that there are ORM tools that do that. SQLAlchemy even
allows you to build a query (much of the above is valid sqlalchemy) and
then treat it as a list, even using array slices which would generate
the convenient LIMIT/OFFSET clauses automagically:

q = session.query(Table)
q1 = q[1:5]   # adds LIMIT 4 OFFSET 2

of course the query is actually executed only when you start using the
results.

.TM.