Thread: functional index not used, looping simpler query just faster

functional index not used, looping simpler query just faster

From

Ivan Sergio Borgonovo

Date:

10 July 2008, 06:40:54

I've this:

CREATE TABLE catalog_brands
(
  brandid serial NOT NULL,
  "name" character varying(64) NOT NULL,
  delivery smallint NOT NULL DEFAULT (24 * 15),
  deliverymessage character varying(64),
  brandtypeid integer,
  brandgroupid integer,
  CONSTRAINT catalog_brands_pkey PRIMARY KEY (brandid),
  CONSTRAINT catalog_brands_brandgroupid_fkey FOREIGN KEY
(brandgroupid) REFERENCES catalog_brandgroup (brandgroupid) MATCH
SIMPLE ON UPDATE NO ACTION ON DELETE SET NULL,
  CONSTRAINT catalog_brands_brandtypeid_fkey FOREIGN KEY
(brandtypeid) REFERENCES catalog_brandtype (brandtypeid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE SET NULL
);

CREATE INDEX catalog_brands_name_index
  ON catalog_brands
  USING btree
  (upper(name::text));

CREATE TABLE catalog_items
(
  itemid bigint NOT NULL,
  brand integer NOT NULL,
  name character varying(256) NOT NULL,
/* snip */
  datainserimento timestamp without time zone,
  dapub smallint,
  CONSTRAINT catalog_items_statusid_fkey FOREIGN KEY (statusid)
      REFERENCES catalog_item_status (statusid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE INDEX catalog_items_brands_index
  ON catalog_items
  USING btree
  (upper(brands::text));

CREATE UNIQUE INDEX catalog_items_itemsid_index
  ON catalog_items
  USING btree
  (itemid);
ALTER TABLE catalog_items CLUSTER ON catalog_items_itemsid_index;

catalog_items contains ~ 650K records
catalog_brands 44 records

Now I try this:

explain select i1.brands, i1.name, i1.dataPub, i1.datainserimento
  from catalog_items i1
  inner join catalog_brands b1 on upper(i1.brands)=upper(b1.name)
  where i1.ItemID in (
  select i2.ItemID from catalog_items i2
    inner join catalog_brands b2 on upper(i2.brands)=upper(b2.name)
    where i1.brands=i2.brands
    and i2.dataPub>(now() - interval '8 month') and
i2.datainserimento>(now() - interval '6 month') order by
i2.datainserimento desc limit 3);


And I got this:

"Nested Loop  (cost=0.00..6383568361.87 rows=74378 width=82)"
"  ->  Seq Scan on catalog_brands b1  (cost=0.00..1.44 rows=44 width=18)"
"  ->  Index Scan using catalog_items_brands_index on catalog_items i1  (cost=0.00..145081069.53 rows=1690 width=82)"
"        Index Cond: (upper((i1.brands)::text) = upper(("outer".name)::text))"
"        Filter: (subplan)"
"        SubPlan"
"          ->  Limit  (cost=42906.81..42906.82 rows=1 width=16)"
"                ->  Sort  (cost=42906.81..42906.82 rows=1 width=16)"
"                      Sort Key: i2.datainserimento"
"                      ->  Nested Loop  (cost=0.00..42906.80 rows=1 width=16)"
"                            Join Filter: (upper(("outer".brands)::text) = upper(("inner".name)::text))"
"                            ->  Seq Scan on catalog_items i2  (cost=0.00..42904.59 rows=1 width=34)"
"                                  Filter: ((($0)::text = (brands)::text) AND (datapub > (now() - '8 mons'::interval))
AND(datainserimento > (now() - '6 mons'::interval)))" 
"                            ->  Seq Scan on catalog_brands b2  (cost=0.00..1.44 rows=44 width=18)"

I never waited enough to see results from the above... several
minutes over a 2xXeon 4Gb ram.


A simpler
select name, brands from catalog_items where
upper(brands)=upper('LARGEST GROUP') order by datainserimento desc
limit 3;

finishes in few seconds. Iterating over 44 groups does look to be
much faster than the more complicated query.

"Limit  (cost=9503.62..9503.63 rows=3 width=74)"
"  ->  Sort  (cost=9503.62..9512.08 rows=3381 width=74)"
"        Sort Key: datainserimento"
"        ->  Bitmap Heap Scan on catalog_items  (cost=29.84..9305.44 rows=3381 width=74)"
"              Recheck Cond: (upper((brands)::text) = 'CAMBRIDGE UNIVERSITY PRESS'::text)"
"              ->  Bitmap Index Scan on catalog_items_brands_index  (cost=0.00..29.84 rows=3381 width=0)"
"                    Index Cond: (upper((brands)::text) = 'CAMBRIDGE UNIVERSITY PRESS'::text)"

Even
select count(*), i1.brands from catalog_items i1
    inner join catalog_brands b1 on
upper(b1.name)=upper(i1.brands)
    group by i1.brands order by count(*)

takes from few seconds to less than 1 sec.

I could actually loop inside plpgsql but... well I'd like to
understand how things work.


--
Ivan Sergio Borgonovo
http://www.webthatworks.it

Re: functional index not used, looping simpler query just faster

From

Martijn van Oosterhout

Date:

10 July 2008, 06:50:15

On Thu, Jul 10, 2008 at 11:40:40AM +0200, Ivan Sergio Borgonovo wrote:
> I've this:

What's basically killing you is this condition:
>   select i2.ItemID from catalog_items i2
>     inner join catalog_brands b2 on upper(i2.brands)=upper(b2.name)
>     where i1.brands=i2.brands   <*********
>     and i2.dataPub>(now() - interval '8 month') and

Is not indexable. Hence the seqscan, which makes everything slow.
In your "faster" version you test against a condition which *is*
indexable, hence it's faster.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Attachment

signature.asc

expected O^2 looks line K^O, index problem not involved: [was] looping simpler query just faster

From

Ivan Sergio Borgonovo

Date:

10 July 2008, 09:19:42

On Thu, 10 Jul 2008 11:50:01 +0200
Martijn van Oosterhout <kleptog@svana.org> wrote:

> On Thu, Jul 10, 2008 at 11:40:40AM +0200, Ivan Sergio Borgonovo
> wrote:
> > I've this:
>
> What's basically killing you is this condition:
> >   select i2.ItemID from catalog_items i2
> >     inner join catalog_brands b2 on
> > upper(i2.brands)=upper(b2.name) where i1.brands=i2.brands
> > <********* and i2.dataPub>(now() - interval '8 month') and
>
> Is not indexable. Hence the seqscan, which makes everything slow.
> In your "faster" version you test against a condition which *is*
> indexable, hence it's faster.

I changed to

where upper(i1.brands)=upper(i2.brands)

"Nested Loop  (cost=0.00..1393962681.78 rows=74378 width=82)"
"  ->  Seq Scan on catalog_brands b1  (cost=0.00..1.44 rows=44 width=18)"
"  ->  Index Scan using catalog_items_brands_index on catalog_items i1  (cost=0.00..31680940.43 rows=1690 width=82)"
"        Index Cond: (upper((i1.brands)::text) = upper(("outer".name)::text))"
"        Filter: (subplan)"
"        SubPlan"
"          ->  Limit  (cost=9366.40..9366.41 rows=1 width=16)"
"                ->  Sort  (cost=9366.40..9366.41 rows=1 width=16)"
"                      Sort Key: i2.datainserimento"
"                      ->  Nested Loop  (cost=29.84..9366.39 rows=1 width=16)"
"                            ->  Bitmap Heap Scan on catalog_items i2  (cost=29.84..9364.61 rows=1 width=34)"
"                                  Recheck Cond: (upper(($0)::text) = upper((brands)::text))"
"                                  Filter: ((datapub > (now() - '8 mons'::interval)) AND (datainserimento > (now() - '6
mons'::interval)))"
"                                  ->  Bitmap Index Scan on catalog_items_brands_index  (cost=0.00..29.84 rows=3381
width=0)"
"                                        Index Cond: (upper(($0)::text) = upper((brands)::text))"
"                            ->  Seq Scan on catalog_brands b2  (cost=0.00..1.77 rows=1 width=18)"
"                                  Filter: (upper(($0)::text) = upper((name)::text))"

but it still perform badly.
Skipping one of the two
join catalog_brands b1 on upper(i1.brands)=upper(b1.name)
doesn't improve anything...
even skipping some conditions, that I thought would actually make
the query faster, restricting the rows to sort etc...

and i2.dataPub>(now() - interval '8 month') and
i2.datainserimento>(now() - interval '6 month')

didn't improve the speed.

And the sum of times it takes to execute the simpler statement for
each brands even without waiting the end of the above statements is
at least 1 order of magnitude faster than the more complicated query.

catalog_brands is a quite small table so
->  Seq Scan on catalog_brands b2 (cost=0.00..1.77 rows=1 width=18)"
 Filter: (upper(($0)::text) = upper((name)::text))"

shouldn't be a problem

and it seems that even the index is not playing such a big part

since this that doesn't use the index
select name, brands from catalog_items where brands='CAMBRIDGE
UNIVERSITY PRESS' order by datainserimento desc limit 3

takes less than 1 sec.

I'd say that having 44 groups and since the largest takes always
less then 1 sec with the simpler query... there should be something
else wrong with the above query that takes > 3 min.

Infact:
create temp table groupeditor as select i1.ItemID, i1.brands, i1.name, i1.dataPub, i1.datainserimento
  from catalog_items i1
  inner join catalog_brands b1 on upper(i1.brands)=upper(b1.name);
create index groupeditor_brands_idx on groupeditor (brands);
create index groupeditor_ItemID_idx on groupeditor (ItemID);

explain select i1.brands, i1.name, i1.dataPub, i1.datainserimento
  from groupeditor i1 where
  i1.ItemID in (
  select i2.ItemID from groupeditor i2
    where
    i1.brands=i2.brands
    and i2.dataPub>(now() - interval '8 month') and i2.datainserimento>(now() - interval '6 month')
    order by i2.datainserimento desc
  limit 3);

"Seq Scan on groupeditor i1  (cost=0.00..197133363.99 rows=68583 width=1048)"
"  Filter: (subplan)"
"  SubPlan"
"    ->  Limit  (cost=1437.15..1437.16 rows=3 width=16)"
"          ->  Sort  (cost=1437.15..1437.34 rows=76 width=16)"
"                Sort Key: datainserimento"
"                ->  Bitmap Heap Scan on groupeditor i2  (cost=7.40..1434.78 rows=76 width=16)"
"                      Recheck Cond: (($0)::text = (brands)::text)"
"                      Filter: ((datapub > (now() - '8 mons'::interval)) AND (datainserimento > (now() - '6
mons'::interval)))"
"                      ->  Bitmap Index Scan on groupeditor_brands_idx  (cost=0.00..7.40 rows=686 width=0)"
"                            Index Cond: (($0)::text = (brands)::text)"

Creating the temp table takes up 3 sec, creating the indexes 3
sec... and the new query... still forever...

Killing
i2.dataPub>(now() - interval '8 month') and
i2.datainserimento>(now() - interval '6 month')
and moving it in the creation of the temp table made the above run
in... 5 sec roughly... what took most was table and indexes
creation... the query took 61ms.
I could follow a more scientific measurement method (cache etc...)
but still it looks pretty impressive...

even more... dropping the indexes on the temp table even after
restarting the server still make the whole process stay below 5 sec.

As soon as groupeditor get larger (increasing the considered
interval) the query get slower and slower... much more than O^2 and
surely still not as fast as it would be to use several simpler
statements.
It still looks like a good idea to create a temp table so I'll have
to sort over a smaller set... but still I'm puzzled.
The subquery technique still perform awfully compared to the sums of
times taken by simpler queries.

Debian stable, pg 8.1

BTW what's going to happen to the indexes related to a temp table?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

Re: expected O^2 looks line K^O, index problem not involved: [was] looping simpler query just faster

From

Martijn van Oosterhout

Date:

10 July 2008, 10:53:05

On Thu, Jul 10, 2008 at 02:19:30PM +0200, Ivan Sergio Borgonovo wrote:
> On Thu, 10 Jul 2008 11:50:01 +0200
> Martijn van Oosterhout <kleptog@svana.org> wrote:

Hmm, I just studied your query to determine what you're trying to do.
As I understand it:

For each item
  Determine the brand
  Get the top three items for this brand
  If this item is one of them, display it

This is pretty inefficient but I can't see an efficient way to do it
either. I suppose one thing to try would be a multicolumn index on
(brand,datainserimento) to avoid the sort step. Also, the table b1 in
the original query is redundant.

It's the fact that you want the top three items that makes it
difficult, not sure how to deal with that.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Attachment

signature.asc

Re: functional index not used, looping simpler query just faster

From

Tom Lane

Date:

10 July 2008, 11:47:01

Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
> Now I try this:

> explain select i1.brands, i1.name, i1.dataPub, i1.datainserimento
>   from catalog_items i1
>   inner join catalog_brands b1 on upper(i1.brands)=upper(b1.name)
>   where i1.ItemID in (
>   select i2.ItemID from catalog_items i2
>     inner join catalog_brands b2 on upper(i2.brands)=upper(b2.name)
>     where i1.brands=i2.brands
>     and i2.dataPub>(now() - interval '8 month') and
> i2.datainserimento>(now() - interval '6 month') order by
> i2.datainserimento desc limit 3);

This sub-select is non optimizable because you've got an outer reference
in it, which compels re-evaluating it at every row of the outer query.
Try recasting as

explain select i1.brands, i1.name, i1.dataPub, i1.datainserimento
  from catalog_items i1
  inner join catalog_brands b1 on upper(i1.brands)=upper(b1.name)
  where (i1.ItemID, i1.brands) in (
  select i2.ItemID, i2.brands from catalog_items i2
    inner join catalog_brands b2 on upper(i2.brands)=upper(b2.name)
    where
    i2.dataPub>(now() - interval '8 month') and
i2.datainserimento>(now() - interval '6 month') order by
i2.datainserimento desc limit 3);


            regards, tom lane

Re: expected O^2 looks line K^O, index problem not involved: [was] looping simpler query just faster

From

Ivan Sergio Borgonovo

Date:

10 July 2008, 11:56:47

On Thu, 10 Jul 2008 15:52:54 +0200
Martijn van Oosterhout <kleptog@svana.org> wrote:

> On Thu, Jul 10, 2008 at 02:19:30PM +0200, Ivan Sergio Borgonovo
> wrote:
> > On Thu, 10 Jul 2008 11:50:01 +0200
> > Martijn van Oosterhout <kleptog@svana.org> wrote:

> Hmm, I just studied your query to determine what you're trying to
> do. As I understand it:
>
> For each item
>   Determine the brand
>   Get the top three items for this brand
>   If this item is one of them, display it
>
> This is pretty inefficient but I can't see an efficient way to do
> it either. I suppose one thing to try would be a multicolumn index
> on (brand,datainserimento) to avoid the sort step. Also, the table
> b1 in the original query is redundant.

> It's the fact that you want the top three items that makes it
> difficult, not sure how to deal with that.

I'm not concerned about the fact that it is not "easy"... I'm
concerned about the fact that small changes to the query produce
unexpected results in performances.

I'd say that the filter on

and i2.dataPub>(now() - interval '8 month') and
i2.datainserimento>(now() - interval '6 month')

shouldn't play an important role... or at least have the same weight
on performance if used to build up a temporary table or directly in
the subquery.
I thought that indexes were going to play an important role but
still they don't.

I haven't been able to come up with a single statement that can get
the top N row by group in PostgreSQL that can compete with a set of
simpler statements.

There are a lot of examples pointing to a couple of standard
solution on MySQL and MS SQL[1] (the other standard solution uses
HAVING).
I didn't benchmark the same SQL on these 2 other DB but I think I
could assume that if they were performing so badly they wouldn't be
proposed so frequently.

Considering it is pretty trivial to write a stored procedure that
create a temp table, create some indexes on it, loops over groups
and pick up the top N record and that system proved to perform quite
well I'm still curious to know if there is postgresql way that
performs comparably with the just described method.

It would be nice if I could exploit further the fact that I'm going
to sort multiple times the temp table since the kernel of the
function will be something like

for _group in select group from grouptable loop
  select name, group from table where group=_group sort by
    datainserimento limit 3;
  return next;
end loop;
return;

I think building the temp table already sorted could result in some
performance boost but that would depend on implementation details.

[1] excluding the ones that involve non standard functions

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

Re: functional index not used, looping simpler query just faster

From

Ivan Sergio Borgonovo

Date:

10 July 2008, 12:46:14

On Thu, 10 Jul 2008 10:46:53 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
> > Now I try this:
>
> > explain select i1.brands, i1.name, i1.dataPub, i1.datainserimento
> >   from catalog_items i1
> >   inner join catalog_brands b1 on upper(i1.brands)=upper(b1.name)
> >   where i1.ItemID in (
> >   select i2.ItemID from catalog_items i2
> >     inner join catalog_brands b2 on
> > upper(i2.brands)=upper(b2.name) where i1.brands=i2.brands
> >     and i2.dataPub>(now() - interval '8 month') and
> > i2.datainserimento>(now() - interval '6 month') order by
> > i2.datainserimento desc limit 3);
>
> This sub-select is non optimizable because you've got an outer
> reference in it, which compels re-evaluating it at every row of
> the outer query. Try recasting as
>
> explain select i1.brands, i1.name, i1.dataPub, i1.datainserimento
>   from catalog_items i1
>   inner join catalog_brands b1 on upper(i1.brands)=upper(b1.name)
>   where (i1.ItemID, i1.brands) in (
>   select i2.ItemID, i2.brands from catalog_items i2
>     inner join catalog_brands b2 on upper(i2.brands)=upper(b2.name)
>     where
>     i2.dataPub>(now() - interval '8 month') and
> i2.datainserimento>(now() - interval '6 month') order by
> i2.datainserimento desc limit 3);

It's not doing what was doing the previous.

I know the concept of the previous one was correct since once I
placed stuff in a temp I finally got results in a reasonable time.
Yours is returning 3 records and not 3 records for each brands and I
know there are more than 3 record that satisfy the query.

the inner query doesn't have any relationship with the outer... so
it returns 3 records and the outer just pick up the same returned
record.

Were you trying to write something different?

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

{SOLVED?] Re: functional index not used, looping simpler query just faster

From

Ivan Sergio Borgonovo

Date:

10 July 2008, 14:51:46

On Thu, 10 Jul 2008 10:46:53 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> This sub-select is non optimizable because you've got an outer
> reference in it, which compels re-evaluating it at every row of
> the outer query. Try recasting as

> explain select i1.brands, i1.name, i1.dataPub, i1.datainserimento
>   from catalog_items i1
>   inner join catalog_brands b1 on upper(i1.brands)=upper(b1.name)
>   where (i1.ItemID, i1.brands) in (
>   select i2.ItemID, i2.brands from catalog_items i2
>     inner join catalog_brands b2 on upper(i2.brands)=upper(b2.name)
>     where
>     i2.dataPub>(now() - interval '8 month') and
> i2.datainserimento>(now() - interval '6 month') order by
> i2.datainserimento desc limit 3);

I came up with this. I'm still curious to know if this could be done
efficiently with just one query.

create table catalog_topbybrands (
                ItemID bigint not null,
                Code varchar(32) not null,
                Name varchar(256) not null,
                Brands varchar(1024),
                Authors varchar(1024),
                ISBN varchar(100),
                dataInserimento timestamp,
                dataPub timestamp
            );
create table catalog_topbybrands_working (
                ItemID bigint not null,
                Brands varchar(1024),
                dataInserimento timestamp,
                dataPub timestamp
            );


create or replace function TopByBrands()
    returns void
    as
    $$
    declare
     _row catalog_brands%ROWTYPE;
    begin
     truncate table catalog_topbybrands;
     truncate table catalog_topbybrands_working;
     insert into catalog_topbybrands_working
      (ItemID, Brands, dataInserimento, dataPub)
      select i.ItemID, i.Brands, dataInserimento, dataPub from
       catalog_items i
       inner join catalog_brands b on upper(b.Name)=upper(i.Brands)
       where
        i.dataPub>(now() - interval '18 month')
        and i.dataInserimento>(now() - interval '8 month')
        and i.dataPub is not null and i.dataInserimento is not null
        order by i.dataInserimento, i.dataPub;
     for _row in (select * from catalog_brands) loop
      insert into catalog_topbybrands
       (ItemID, Code, Name, Brands, Authors, ISBN, dataInserimento, dataPub)
       select i.ItemID, i.Code, i.Name, i.Brands, i.Authors, i.ISBN, i.dataInserimento, i.dataPub
       from catalog_topbybrands_working w
       join catalog_items i on i.ItemID=w.ItemID
       where upper(w.Brands)=upper(_row.name)
       order by dataInserimento desc,  dataPub desc limit 3;
     end loop;
     return;
    end;
    $$ language plpgsql volatile;

just a working prototype. In fact considering that once filtered by
date etc... the temp table is very small it may perform better
avoiding last join in the last insert.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

Top N within groups?

From

Klint Gore

Date:

11 July 2008, 00:27:46

[was {SOLVED?] Re: [GENERAL] functional index not used, looping simpler
query just faster]
Ivan Sergio Borgonovo wrote:
> I'm still curious to know if this could be done
> efficiently with just one query.
>
[thinking out loud]
Can someone familiar with the source for DISTINCT ON comment on how hard
it would be to add another parameter to return more than one row?

e.g.
To do TOP 1 within an unknown number of groups
   select distinct on (groupid) groupid, identifier, count(*)
   from somequery
   group by groupid, identifier
   order by 1,3 desc,2;

I'm thinking that, for the top 3 within each group, something like
   select distinct on (groupid) FOR 3 groupid, identifier, count(*)
   from somequery
   group by groupid, identifier
   order by 1,3 desc,2;

For Ivan's case, groupid = brand, identifer = item. The where clause
applies the date limits.

klint.

--
Klint Gore
Database Manager
Sheep CRC
A.G.B.U.
University of New England
Armidale NSW 2350

Ph: 02 6773 3789
Fax: 02 6773 3266
EMail: kgore4@une.edu.au

Re: Top N within groups?

From

Martijn van Oosterhout

Date:

11 July 2008, 04:48:07

On Fri, Jul 11, 2008 at 01:24:28PM +1000, Klint Gore wrote:
> [thinking out loud]
> Can someone familiar with the source for DISTINCT ON comment on how hard
> it would be to add another parameter to return more than one row?

From a programming point of view, it wouldn't be too hard. However,
deciding what syntax to use is a another question entirely. I think
your suggestion of FOR isn't good (ambiguous syntax, 'for' could be a
column name), but I can't think of a better one.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Attachment

signature.asc

Re: Top N within groups?

From

hubert depesz lubaczewski

Date:

12 July 2008, 12:07:33

On Fri, Jul 11, 2008 at 01:24:28PM +1000, Klint Gore wrote:
> Can someone familiar with the source for DISTINCT ON comment on how hard
> it would be to add another parameter to return more than one row?

you can make top-n per group quite easily using the technique described
here:
http://www.depesz.com/index.php/2007/08/17/rownum-anyone-cumulative-sum-in-one-query/

best regards,

depesz

Re: Top N within groups?

From

Klint Gore

Date:

14 July 2008, 02:27:39

Martijn van Oosterhout wrote:
> On Fri, Jul 11, 2008 at 01:24:28PM +1000, Klint Gore wrote:
> > [thinking out loud]
> > Can someone familiar with the source for DISTINCT ON comment on how hard
> > it would be to add another parameter to return more than one row?
>
> From a programming point of view, it wouldn't be too hard. However,
> deciding what syntax to use is a another question entirely. I think
> your suggestion of FOR isn't good (ambiguous syntax, 'for' could be a
> column name), but I can't think of a better one.
>
'for' is a reserved key word so can't be a column name.

http://www.postgresql.org/docs/8.3/interactive/sql-keywords-appendix.html

postgres=# create table foo (for int);
ERROR:  syntax error at or near "for"

The worst I think you could get would be
  select distinct on ("for") for 4 "for" from table4 for update;

but even then, I think the parser could work out what you want.

klint.

--
Klint Gore
Database Manager
Sheep CRC
A.G.B.U.
University of New England
Armidale NSW 2350

Ph: 02 6773 3789
Fax: 02 6773 3266
EMail: kgore4@une.edu.au

how to found a variable is in a aggregation or not?

From

Yi Zhao

Date:

15 July 2008, 08:09:24

I want to check a variable is in a aggregattion or not, so I create a
function as below:

create or replace function anytest(val text) returns boolean as $$
begin
    perform 1 where quote_literal(val) in ('hello', 'world', 'test');
    if not found then
        return false;
    else
        return true;
    end if;
end;
$$ language plpgsql;

but when I used, I got the result below, why?

test=# select anytest('world111');
 anytest
---------
 f
(1 row)

test=# select anytest('world');
 anytest
---------
 f
(1 row)


any help is appreciated.

regards,
Zy

Re: how to found a variable is in a aggregation or not?

From

"Pavel Stehule"

Date:

15 July 2008, 08:31:45

Hello

in this case you must not use quoting

postgres=# create or replace function anytest(val text) returns boolean as $$
begin
       perform 1 where val in ('hello', 'world', 'test');
       if not found then
               return false;
       else
               return true;
       end if;
end;
$$ language plpgsql;
CREATE FUNCTION
Time: 3,342 ms
postgres=# select anytest('hello');
 anytest
---------
 t
(1 row)

Time: 42,034 ms
postgres=# select anytest('helloa');
 anytest
---------
 f
(1 row)

Time: 0,468 ms
postgres=#

you have to use quoting only together dynamic sql, etc EXECUTE statement

regards
Pavel Stehule

2008/7/15 Yi Zhao <yi.zhao@alibaba-inc.com>:
> I want to check a variable is in a aggregattion or not, so I create a
> function as below:
>
> create or replace function anytest(val text) returns boolean as $$
> begin
>        perform 1 where quote_literal(val) in ('hello', 'world', 'test');
>        if not found then
>                return false;
>        else
>                return true;
>        end if;
> end;
> $$ language plpgsql;
>
> but when I used, I got the result below, why?
>
> test=# select anytest('world111');
>  anytest
> ---------
>  f
> (1 row)
>
> test=# select anytest('world');
>  anytest
> ---------
>  f
> (1 row)
>
>
> any help is appreciated.
>
> regards,
> Zy
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: how to found a variable is in a aggregation or not?

From

Yi Zhao

Date:

15 July 2008, 10:29:53

it's works,
thanks a lot!

regards,
Yi
On Tue, 2008-07-15 at 13:30 +0200, Pavel Stehule wrote:
> Hello
>
> in this case you must not use quoting
>
> postgres=# create or replace function anytest(val text) returns boolean as $$
> begin
>        perform 1 where val in ('hello', 'world', 'test');
>        if not found then
>                return false;
>        else
>                return true;
>        end if;
> end;
> $$ language plpgsql;
> CREATE FUNCTION
> Time: 3,342 ms
> postgres=# select anytest('hello');
>  anytest
> ---------
>  t
> (1 row)
>
> Time: 42,034 ms
> postgres=# select anytest('helloa');
>  anytest
> ---------
>  f
> (1 row)
>
> Time: 0,468 ms
> postgres=#
>
> you have to use quoting only together dynamic sql, etc EXECUTE statement
>
> regards
> Pavel Stehule
>
> 2008/7/15 Yi Zhao <yi.zhao@alibaba-inc.com>:
> > I want to check a variable is in a aggregattion or not, so I create a
> > function as below:
> >
> > create or replace function anytest(val text) returns boolean as $$
> > begin
> >        perform 1 where quote_literal(val) in ('hello', 'world', 'test');
> >        if not found then
> >                return false;
> >        else
> >                return true;
> >        end if;
> > end;
> > $$ language plpgsql;
> >
> > but when I used, I got the result below, why?
> >
> > test=# select anytest('world111');
> >  anytest
> > ---------
> >  f
> > (1 row)
> >
> > test=# select anytest('world');
> >  anytest
> > ---------
> >  f
> > (1 row)
> >
> >
> > any help is appreciated.
> >
> > regards,
> > Zy
> >
> >
> > --
> > Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-general
> >
>

Re: how to found a variable is in a aggregation or not?

From

"Pavel Stehule"

Date:

15 July 2008, 10:46:59

so this code is little bit ugly

you can write faster code

create or replace function anytest(val text)
returns boolean as $$
begin
  return val in ('hello', 'world','test');
end;
$$ language plpgsql immutable strict;

Pavel

2008/7/15 Yi Zhao <yi.zhao@alibaba-inc.com>:
> I want to check a variable is in a aggregattion or not, so I create a
> function as below:
>
> create or replace function anytest(val text) returns boolean as $$
> begin
>        perform 1 where quote_literal(val) in ('hello', 'world', 'test');
>        if not found then
>                return false;
>        else
>                return true;
>        end if;
> end;
> $$ language plpgsql;
>
> but when I used, I got the result below, why?
>
> test=# select anytest('world111');
>  anytest
> ---------
>  f
> (1 row)
>
> test=# select anytest('world');
>  anytest
> ---------
>  f
> (1 row)
>
>
> any help is appreciated.
>
> regards,
> Zy
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: how to found a variable is in a aggregation or not?

From

Yi Zhao

Date:

15 July 2008, 22:07:06

yes!!!!

It's better obviously,

thanks:D

Yi
On Tue, 2008-07-15 at 15:46 +0200, Pavel Stehule wrote:
> so this code is little bit ugly
>
> you can write faster code
>
> create or replace function anytest(val text)
> returns boolean as $$
> begin
>   return val in ('hello', 'world','test');
> end;
> $$ language plpgsql immutable strict;
>
> Pavel
>
> 2008/7/15 Yi Zhao <yi.zhao@alibaba-inc.com>:
> > I want to check a variable is in a aggregattion or not, so I create a
> > function as below:
> >
> > create or replace function anytest(val text) returns boolean as $$
> > begin
> >        perform 1 where quote_literal(val) in ('hello', 'world', 'test');
> >        if not found then
> >                return false;
> >        else
> >                return true;
> >        end if;
> > end;
> > $$ language plpgsql;
> >
> > but when I used, I got the result below, why?
> >
> > test=# select anytest('world111');
> >  anytest
> > ---------
> >  f
> > (1 row)
> >
> > test=# select anytest('world');
> >  anytest
> > ---------
> >  f
> > (1 row)
> >
> >
> > any help is appreciated.
> >
> > regards,
> > Zy
> >
> >
> > --
> > Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-general
> >
>