Thread: Support for Limit in Update, Insert...

Support for Limit in Update, Insert...

From
"Cristian Prieto"
Date:
Will there be any future support for LIMIT in UPDATE/INSERT queries? So you could do something like
 
update table1 set col1=value1 limit 1000;
 
would update just the first 1000 rows in the table. I've been playing a little with the SPI, and I see that the SPI already supports limiting the number of changes or elements in a query (the SPI_execute function).
 
Any comments?
 
Thanks a lot

Re: Support for Limit in Update, Insert...

From
Tom Lane
Date:
"Cristian Prieto" <cristian@clickdiario.com> writes:
> Would be any future support for limit in update/insert queries? so you
> could do something like
> update table1 set col1=value1 limit 1000;
> would update just the first 1000 rows in the table.

That seems like a spectacularly bad idea, considering that you could
have no guarantees about *which* 1000 rows get updated.

            regards, tom lane

Re: Support for Limit in Update, Insert...

From
"Cristian Prieto"
Date:
Maybe the first 1000 rows based on the primary index?

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Cristian Prieto" <cristian@clickdiario.com>
Cc: <pgsql-general@postgresql.org>
Sent: Thursday, September 08, 2005 6:05 PM
Subject: Re: [GENERAL] Support for Limit in Update, Insert...


> "Cristian Prieto" <cristian@clickdiario.com> writes:
>> Would be any future support for limit in update/insert queries? so you
>> could do something like
>> update table1 set col1=value1 limit 1000;
>> would update just the first 1000 rows in the table.
>
> That seems like a spectacularly bad idea, considering that you could
> have no guarantees about *which* 1000 rows get updated.
>
> regards, tom lane
>

Re: Support for Limit in Update, Insert...

From
Alvaro Herrera
Date:
On Thu, Sep 08, 2005 at 07:19:34PM -0600, Cristian Prieto wrote:
> Maybe the first 1000 rows based in the primary index????

No, that's not a satisfactory answer, because (1) there may be no primary
key at all, and (2) even if there is one, its index may not be used in
executing the update.  There may be other reasons I'm forgetting right now.

This has been discussed before, and rejected.  Please see the archives.

--
Alvaro Herrera -- Valdivia, Chile         Architect, www.EnterpriseDB.com
"Java is clearly an example of a money oriented programming"  (A. Stepanov)

Re: Support for Limit in Update, Insert...

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> This has been discussed before, and rejected.  Please see the archives.

For SELECT, both LIMIT and OFFSET are only well-defined in the presence
of an ORDER BY clause.  (One could argue that we should reject them when
no ORDER BY, but given that the database isn't getting changed as a side
effect, that's probably too anal-retentive.  When the database *is*
going to be changed, however, I for one like well-defined results.)

If this proposal included adding an ORDER BY to UPDATE/DELETE, then it
would at least be logically consistent.  I have not seen the use-case
for it though.  In any case you can usually get the equivalent result
with something like

    UPDATE foo SET ...
    WHERE pkey IN (SELECT pkey FROM foo ORDER BY ... LIMIT ...);

            regards, tom lane
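Tom's subselect workaround is easy to try outside PostgreSQL as well. Here is a minimal, self-contained sketch using Python's stdlib sqlite3 as a stand-in for any SQL backend that allows a LIMIT inside a subquery (the table name `foo`, the columns, and the row values are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (pkey INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO foo (pkey, val) VALUES (?, ?)",
                 [(i, "old") for i in range(1, 11)])

# Update only the first 3 rows ordered by pkey. The ORDER BY in the
# subselect makes the "which rows?" question well-defined, which a bare
# LIMIT on UPDATE would not.
conn.execute("""
    UPDATE foo SET val = 'new'
    WHERE pkey IN (SELECT pkey FROM foo ORDER BY pkey LIMIT 3)
""")

updated = [r[0] for r in conn.execute(
    "SELECT pkey FROM foo WHERE val = 'new' ORDER BY pkey")]
print(updated)  # [1, 2, 3]
```

Because the subselect carries an ORDER BY, exactly which rows get updated is deterministic, which is the well-definedness property Tom is asking for.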

Re: Support for Limit in Update, Insert...

From
Csaba Nagy
Date:
Well, I do have a use case for it.

Context:

We have data coming in from web requests, which must be handled quickly, so
we just insert it into temporary tables without any verification. The rows
are then processed periodically by a background task, which itself handles
only a chunk at a time to avoid long-running queries and the socket
timeouts that can come with them. To identify a chunk, we use a "chunkid"
field in those temporary tables, which is initially NULL. When a chunk is
selected for processing, we update its chunkid field with the next value of
a sequence, and all further processing uses a WHERE clause that selects
only records with that chunkid.

Use case:

We want to set the chunkid for just 1000 rows, and we genuinely don't care
which ones. The point is only to mark a chunk of data uniquely; every row
will be processed eventually anyway.

Of course, right now we do it with something similar to what you proposed,
using a subselect with a LIMIT clause; I wonder whether a simple UPDATE
with LIMIT would be faster?

Cheers,
Csaba.
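For concreteness, the chunk-marking scheme described above can be sketched as follows. This is a hypothetical illustration using Python's stdlib sqlite3: the table and column names merely follow the description, and a plain counter stands in for the database sequence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE incoming (id INTEGER PRIMARY KEY, payload TEXT, chunkid INTEGER)")
conn.executemany("INSERT INTO incoming (payload) VALUES (?)",
                 [("req%d" % i,) for i in range(2500)])

next_chunkid = 0  # stand-in for a database sequence


def claim_chunk(size=1000):
    """Mark up to `size` unprocessed rows with a fresh chunkid.

    Which rows get picked does not matter; they will all be
    processed eventually."""
    global next_chunkid
    next_chunkid += 1
    conn.execute("""
        UPDATE incoming SET chunkid = ?
        WHERE id IN (SELECT id FROM incoming WHERE chunkid IS NULL LIMIT ?)
    """, (next_chunkid, size))
    return next_chunkid


cid = claim_chunk()
n = conn.execute("SELECT count(*) FROM incoming WHERE chunkid = ?",
                 (cid,)).fetchone()[0]
print(n)  # 1000
```

In a real multi-worker setup the sequence, rather than a counter, is what guarantees each claimed chunk gets a unique id.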

On Fri, 2005-09-09 at 04:49, Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > This has been discussed before, and rejected.  Please see the archives.
>
> For SELECT, both LIMIT and OFFSET are only well-defined in the presence
> of an ORDER BY clause.  (One could argue that we should reject them when
> no ORDER BY, but given that the database isn't getting changed as a side
> effect, that's probably too anal-retentive.  When the database *is*
> going to be changed, however, I for one like well-defined results.)
>
> If this proposal included adding an ORDER BY to UPDATE/DELETE, then it
> would at least be logically consistent.  I have not seen the use-case
> for it though.  In any case you can usually get the equivalent result
> with something like
>
>     UPDATE foo SET ...
>     WHERE pkey IN (SELECT pkey FROM foo ORDER BY ... LIMIT ...);
>
>             regards, tom lane
>


Re: Support for Limit in Update, Insert...

From
"Jim C. Nasby"
Date:
On Thu, Sep 08, 2005 at 10:49:25PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > This has been discussed before, and rejected.  Please see the archives.
>
> For SELECT, both LIMIT and OFFSET are only well-defined in the presence
> of an ORDER BY clause.  (One could argue that we should reject them when
> no ORDER BY, but given that the database isn't getting changed as a side
> effect, that's probably too anal-retentive.  When the database *is*
> going to be changed, however, I for one like well-defined results.)
>
> If this proposal included adding an ORDER BY to UPDATE/DELETE, then it
> would at least be logically consistent.  I have not seen the use-case
> for it though.  In any case you can usually get the equivalent result
> with something like
>
>     UPDATE foo SET ...
>     WHERE pkey IN (SELECT pkey FROM foo ORDER BY ... LIMIT ...);

BTW, this is a case where using ctid would make sense, though you can't:

decibel=# update rrs set parent=parent+1 where ctid in (select ctid from
rrs order by rrs_id limit 1);
ERROR:  could not identify an ordering operator for type tid
HINT:  Use an explicit ordering operator or modify the query.
decibel=#

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Support for Limit in Update, Insert...

From
"Jim C. Nasby"
Date:
On Fri, Sep 09, 2005 at 06:42:10PM -0500, Jim C. Nasby wrote:
> On Thu, Sep 08, 2005 at 10:49:25PM -0400, Tom Lane wrote:
> > Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > > This has been discussed before, and rejected.  Please see the archives.
> >
> > For SELECT, both LIMIT and OFFSET are only well-defined in the presence
> > of an ORDER BY clause.  (One could argue that we should reject them when
> > no ORDER BY, but given that the database isn't getting changed as a side
> > effect, that's probably too anal-retentive.  When the database *is*
> > going to be changed, however, I for one like well-defined results.)
> >
> > If this proposal included adding an ORDER BY to UPDATE/DELETE, then it
> > would at least be logically consistent.  I have not seen the use-case
> > for it though.  In any case you can usually get the equivalent result
> > with something like
> >
> >     UPDATE foo SET ...
> >     WHERE pkey IN (SELECT pkey FROM foo ORDER BY ... LIMIT ...);
>
> BTW, this is a case where using ctid would make sense, though you can't:
>
> decibel=# update rrs set parent=parent+1 where ctid in (select ctid from
> rrs order by rrs_id limit 1);
> ERROR:  could not identify an ordering operator for type tid
> HINT:  Use an explicit ordering operator or modify the query.
> decibel=#

Actually, after trying this, curiosity took hold:
(Note that it's not actually safe to use ctid like this)

decibel=# explain analyze select * from rrs where ctid='(0,3)';
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Tid Scan on rrs  (cost=0.00..4.01 rows=1 width=66) (actual time=0.072..0.076 rows=1 loops=1)
   Filter: (ctid = '(0,3)'::tid)
 Total runtime: 0.265 ms
(3 rows)

decibel=#

Shouldn't there be an access method that goes directly to the specified
ctid instead of doing a seqscan? Even on a small table it seems this
would be faster than a seqscan.
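The difference between the two access strategies can be shown with a toy model. This is pure Python with a made-up page layout, not PostgreSQL's actual heap format; it only illustrates the idea of addressing a tuple by (block, offset) versus scanning everything and filtering.

```python
# Toy model of heap access: a "heap" is a list of pages (blocks), each
# holding a few tuples. A ctid is a (block, offset) pair, as in PostgreSQL.
heap = [["row%d_%d" % (b, o) for o in range(4)] for b in range(100)]


def seqscan_fetch(heap, ctid):
    """What a filtered seqscan does: visit every tuple, keep the match."""
    for b, page in enumerate(heap):
        for o, tup in enumerate(page):
            if (b, o) == ctid:
                return tup
    return None


def tidscan_fetch(heap, ctid):
    """What a direct tid lookup does: jump straight to the addressed slot."""
    block, offset = ctid
    return heap[block][offset]


print(seqscan_fetch(heap, (0, 3)) == tidscan_fetch(heap, (0, 3)))  # True
```

Both return the same tuple, but the direct lookup touches one page where the scan touches all of them.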
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461