Thread: Many joins: monthly summaries S-L--O--W

Many joins: monthly summaries S-L--O--W

From

Michael Glaesmann

Date:

21 October 2003, 09:10:34

Hello all!

I've been working with PostgreSQL now for about a year and have really
enjoyed it. And I've been able to impress the boss with very simple
PHP/PostgreSQL web applications, so that's been great too!

Nearly all I've done has been very straightforward and basic: simple
selects from a couple of different tables joined by where clauses. But
now I'd like to do something a bit more complex: show sales per month,
current inventory, and expected duration of inventory at present sales
rates for each item we have. All in one relation :)

Here are the relevant relations

products (code TEXT PK, name TEXT)

orders (product_code TEXT FK products(id), date DATE, branch_id INTEGER
FK branches(id), qty INTEGER)

inventory (product_code INTEGER FK products(id), date DATE, qty INTEGER)

What I'm trying to make is a table with columns
product_code,
product_name,
jan03_qty,...,
half1_qty,
jul03_qty, ...
sep03_qty,
half2_qty,
total_qty,
inv_qty,
inv_date,
est_inv_qty,
months_remaining

where product_name is products(name), jan03_qty is the quantity of
sales in January, 2003, half1_qty is quantity of sales from January
through June, 2003, inv_qty is the latest inventory data we have,
inv_date is the date of that inventory data, est_inv_qty is the
estimated current inventory based on inv_qty and sales since inv_date,
and months_remaining is an estimate of how many months the estimated
inventory will last at average sales rates (calculated using the
average monthly sales for the previous months).

I've got something that works, but it's *extremely* slow. It takes
about 10 minutes running on a 500MHz G4 Mac with 1GB RAM, running OS
10.2.8 and PostgreSQL 7.3.3. Here's what I've done:

select
    products.code as product_code,
    products.name as product_name,
    jan03.qty as jan03_qty,
    feb03.qty as feb03_qty ...
    inv.qty as inv_qty,
    est_inv.qty as est_inv_qty,
    months_remaining::numeric(8,1)
from products
    left join (
        select products.code as product_code, sum(qty) as qty
        from orders, products
        where products.code = orders.id and
        date between '2003-01-01' and '2003-01-31'
        group by product_code) as
    jan03 on (jan03.product_code = products.code)
    left join (
        select products.code as product_code, sum(qty) as qty
        from orders, products
        where products.code = orders.id and
        date between '2003-02-01' and '2003-02-28'
        group by product_code) as
    feb03 on (feb03.product_code = products.code)
    left join
-- repeat through total_qty
    total on (total.product_code = products.code)
    left join (

-- this is where it get's hairy

        select
                     est_inventory.product_code as product_code,
            est_inventory.qty,
                     monthly_averages.monthly_average,
                     (est_inventory.qty/monthly_average)::numeric(10,1)
                as months_remaining
                 from (
                     select
                             inventory.product_code as product_code,
                inventory.qty - coalesce(orders.qty,0) as qty
                         from (
                                 select product_code, date, qty
                                 from current_inventory_view
                                 ) as
                inventory
                             left outer join (
                                 select orders.product_code as product_code,
                    sum(orders.qty) as qty
                                 from (
                                         select product_code, date, qty
                                         from current_inventory_view
                                         ) as
                    inventory,
                    orders
                                 where
                                     orders.date > inventory.date and
                                     orders.product_code = inventory.product_code
                                 group by orders.product_code
                    ) as
                orders on (inventory.product_code = orders.product_code)
                         ) as
            est_inventory
                     left outer join (
                         select
                             product_code as product_code,
                             sum(qty)/ageinmonths(timestamp '9/30/2003',
timestamp '1/1/2003')                 as monthly_average
                         from orders
                         where date between '1/1/2003' and '9/30/2003'
                         group by product_code
                         ) as
            monthly_averages on
                (est_inventory.product_code = monthly_averages.product_code)
                     where monthly_average > 0
                     ) as
    remaining on (remaining.product_code = products.code)
             left join (
               select distinct product_code, date, qty
                 from current_inventory_view order by date desc
                 ) as
    inventory on (inventory.product_code = products.code)
    ; -- finally

ageinmonths is an sql function that returns a double precision float,
the number of months between $1 and $2, defined as follows
    select
    12 * date_part ('year',age($1, $2)) +
    date_part('month',age($1,$2)) +
    date_part('day'), age($1, $2))/30;

current_inventory_view, showing the most recent inventory qty and date,
is defined as
    SELECT inv.date, inv.product_code, inv.qty FROM inventory inv, (SELECT
max(inventory.date) AS date, inventory.product_code FROM inventory
GROUP BY inventory.product_code) curr_inv WHERE ((inv.date =
curr_inv.date) AND (inv.product_code = curr_inv.product_code));

orders is indexed on date and product_code (and branch_id, but that
shouldn't matter)
inventory is indexed on date
products is unindexed besides the order_pkey index that's
auto-generated.

I'm using left joins because not all items sell each month. Items that
have zero sales would fall out if I used WHERE clauses.

Now, I've done an EXPLAIN on the monstrosity and it comes to just over
3 pages printed at 8 pt on A3 landscape. It's much more than I can take
in. I've looked through the EXPLAIN documentation and am willing to
learn (which is good, because I definitely need to).

I tried a version of this, truncating the 'hairy' part. It definitely
ran faster, taking less than 5 minutes.

Also, this query requires more upkeep than I'd like. I modify it every
time I want to add a new month. It seems like I should be able to use a
custom function so I can just put in the date span I'd like to cover.

I've started to work on a function and have run into some problems (see
separate post: Custom function problems if you're interested )

Any suggestion would be appreciated. It seems like this should be
something that's easy, and I'm approaching it wrong. If I'm completely
on the wrong track, I'd love to know! Also, places I should/could look
for ways to accomplish this, that'd be great.

If you've made it to here, thanks for your perseverance!  : )

Regards,
Michael Glaesemann
grzm myrealbox com

Re: Many joins: monthly summaries S-L--O--W

From

Josh Berkus

Date:

21 October 2003, 13:49:33

Micheal,

> where product_name is products(name), jan03_qty is the quantity of
> sales in January, 2003, half1_qty is quantity of sales from January
> through June, 2003, inv_qty is the latest inventory data we have,
> inv_date is the date of that inventory data, est_inv_qty is the
> estimated current inventory based on inv_qty and sales since inv_date,
> and months_remaining is an estimate of how many months the estimated
> inventory will last at average sales rates (calculated using the
> average monthly sales for the previous months).

Well, I wouldn't recommend your left outer join approach.   You're making the
query do far too much work.   There are a number of different ways to solve
the "crosstab" problem, and the outer join method is only really good for
small data sets.

I'd suggest instead that you use the "aggregate grid" method:

Construct a table like this, called month_xtab:

month_no    jan_ct  feb_ct  mar_ct ....       half_ct       half2_ct
1        1        0        0        1        0
2        0        1        0        1        0
3        0        0        1        1        0
...
12        0        0        0        0        1

Then you can do monthly crosstabs like:

SELECT item, SUM(no_sold * jan_ct) as jan_qty, SUM(no_sold * feb_ct) as
feb_qty .... SUM (no_sold * half_ct) as half_qty, SUM(no_sold) as tot_qty
FROM sales, month_xtab
WHERE (extract(month from sales_date) = month_no and extract(year from
sales_date) = 2003)

This is much, much faster than the outer join method for large numbers of
columns.  For better performance, make an index on extract(month from
sales_date).

This doesn't solve your whole query problem, but it's a good start.

This solution, and other tips, can be found in Joe Celko's "SQL for Smarties"

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Many joins: monthly summaries S-L--O--W

From

Tom Lane

Date:

21 October 2003, 14:28:09

Michael Glaesmann <grzm@myrealbox.com> writes:
> [ a very messy query ]

It does seem like that is a big query with a small query struggling to
get out.  I don't have any immediate suggestions, but I wonder whether
you wouldn't profit by reading something about how to do crosstabs and
statistics in SQL.  Joe Celko's book "SQL For Smarties" is an invaluable
resource for hard problems in SQL (be sure to get the 2nd edition).
Also, if you decide that a crosstab would help, take a look at
the crosstab functions in contrib/tablefunc.  (Celko's book only covers
SQL-standard solutions, not methods that depend on nonstandard features,
so he's at a disadvantage when covering crosstab methods.)

Also, I do have a suggestion for this:

> current_inventory_view, showing the most recent inventory qty and date,
> is defined as
>     SELECT inv.date, inv.product_code, inv.qty FROM inventory inv, (SELECT
> max(inventory.date) AS date, inventory.product_code FROM inventory
> GROUP BY inventory.product_code) curr_inv WHERE ((inv.date =
> curr_inv.date) AND (inv.product_code = curr_inv.product_code));

If you don't mind using a Postgres-specific feature, you should be able
to make this view faster by using DISTINCT ON.  Look at the "weather
reports" example in the SELECT reference page.

            regards, tom lane

Functional index problems. (Was: Many joins: monthly summaries S-L--O--W)

From

Michael Glaesmann

Date:

22 October 2003, 14:08:33

I've implemented the aggregate grid method, building a "xtab" table as
outlined. The orders relation has about 300K records, which isn't that
big from the numbers thrown around on these lists. The basic query
(just the monthly columns, none of the inventory mess) took about 2
minutes. That's going through the 300K records and creating monthly
totals. Something tells me there's room for improvement, so I'll keep
trying. (Adding the inventory joins slows it back down to about 10
minutes, so there's a lot of room for improvement there, but one thing
at a time, right?)

On Wednesday, Oct 22, 2003, at 01:48 Asia/Tokyo, Josh Berkus wrote:

<excerpt>I'd suggest instead that you use the "aggregate grid" method:

</excerpt><<snip/>

<excerpt>This is much, much faster than the outer join method for
large numbers of

columns.  For better performance, make an index on extract(month from

sales_date).

</excerpt>

Searching for ways to improve performance, I tried to create a index
on the extract function, but for some reason I couldn't get it to
work. Following the documentation for CREATE INDEX and EXTRACT, I tried

CREATE INDEX dborders_extract_month_idx ON dborders (EXTRACT(MONTH
from date));

which gave me

ERROR:  parser: parse error at or near "(" at character 61

I also tried

CREATE INDEX dborders_extract_month_idx ON dborders EXTRACT(MONTH from
date);

ERROR:  parser: parse error at or near "EXTRACT" at character 53

and just for good measure, not that I thought it work

CREATE INDEX dborders_extract_month_idx on dborders EXTRACT(MONTH from
(date));

ERROR:  parser: parse error at or near "EXTRACT" at character 53

What am I missing from the CREATE statement? It seems in line with the
documentation for CREATE INDEX:

<fontfamily><param>Courier</param>CREATE [ UNIQUE ] INDEX
<italic>index_name </italic>ON <italic>table </italic>[ USING
<italic>acc_method </italic>] ( <italic>func_name
</italic>(<italic>column </italic>[, ... ]) [ <italic>ops_name
</italic>] )

    [ WHERE <italic>predicate </italic>]</fontfamily>

and the example given in 8.5 Functional Indexes of the User's Guide:

<fontfamily><param>Courier</param>CREATE INDEX test1_lower_col1_idx ON
test1 (lower(col1));

I know this isn't the solution to all my problems, but I am a bit
curious why my CREATE INDEX isn't working.

Further debugging attempts:

Try a different fuction.

CREATE INDEX dborders_date_trunc_idx on dborders
(date_trunc('month',date));

This failed with

ERROR:  parser: parse error at or near "'month'" at character 62

I even renamed the 'date' column (type timestamp with timezone) to
'datetime' on the off chance that using and SQL key word (though not a
PostgreSQL key word) was causing some problem. Same errors.

What is that simple thing I'm overlooking? Any ideas what else I
should check? (I did a search on 'functional index' in the list
archives but kept getting timed out :(

Michael

</fontfamily>
I've implemented the aggregate grid method, building a "xtab" table as
outlined. The orders relation has about 300K records, which isn't that
big from the numbers thrown around on these lists. The basic query
(just the monthly columns, none of the inventory mess) took about 2
minutes. That's going through the 300K records and creating monthly
totals. Something tells me there's room for improvement, so I'll keep
trying. (Adding the inventory joins slows it back down to about 10
minutes, so there's a lot of room for improvement there, but one thing
at a time, right?)

On Wednesday, Oct 22, 2003, at 01:48 Asia/Tokyo, Josh Berkus wrote:

> I'd suggest instead that you use the "aggregate grid" method:
<snip/>
> This is much, much faster than the outer join method for large numbers
> of
> columns.  For better performance, make an index on extract(month from
> sales_date).

Searching for ways to improve performance, I tried to create a index on
the extract function, but for some reason I couldn't get it to work.
Following the documentation for CREATE INDEX and EXTRACT, I tried

CREATE INDEX dborders_extract_month_idx ON dborders (EXTRACT(MONTH from
date));

which gave me
ERROR:  parser: parse error at or near "(" at character 61

I also tried
CREATE INDEX dborders_extract_month_idx ON dborders EXTRACT(MONTH from
date);
ERROR:  parser: parse error at or near "EXTRACT" at character 53

and just for good measure, not that I thought it work

CREATE INDEX dborders_extract_month_idx on dborders EXTRACT(MONTH from
(date));
ERROR:  parser: parse error at or near "EXTRACT" at character 53

What am I missing from the CREATE statement? It seems in line with the
documentation for CREATE INDEX:
CREATE [ UNIQUE ] INDEX index_name ON table [ USING acc_method ] (
func_name (column [, ... ]) [ ops_name ] )
     [ WHERE predicate ]

and the example given in 8.5 Functional Indexes of the User's Guide:
CREATE INDEX test1_lower_col1_idx ON test1 (lower(col1));

I know this isn't the solution to all my problems, but I am a bit
curious why my CREATE INDEX isn't working.

Further debugging attempts:
Try a different fuction.
CREATE INDEX dborders_date_trunc_idx on dborders
(date_trunc('month',date));
This failed with
ERROR:  parser: parse error at or near "'month'" at character 62

I even renamed the 'date' column (type timestamp with timezone) to
'datetime' on the off chance that using and SQL key word (though not a
PostgreSQL key word) was causing some problem. Same errors.

What is that simple thing I'm overlooking? Any ideas what else I should
check? (I did a search on 'functional index' in the list archives but
kept getting timed out :(

Michael

Re: Functional index problems. (Was: Many joins: monthly summaries S-L--O--W)

From

Tom Lane

Date:

22 October 2003, 14:45:43

Michael Glaesmann <grzm@myrealbox.com> writes:
> Searching for ways to improve performance, I tried to create a index on
> the extract function, but for some reason I couldn't get it to work.
> Following the documentation for CREATE INDEX and EXTRACT, I tried

> CREATE INDEX dborders_extract_month_idx ON dborders (EXTRACT(MONTH from
> date));
> ERROR:  parser: parse error at or near "(" at character 61

You can't do that in pre-7.4 releases; the syntax of a functional index
can only be "ON table (func(col1,col2,...))" --- that is, a simple,
standard-notation function applied to one or more columns of the table.
So to do this, you'd need to create an intermediate function along
the lines of "month_trunc(date)"; and you'd have to use it in your
queries as well as in the index definition.

7.4 is more flexible though --- it will take the above as long as you
put an extra set of parentheses in there...

            regards, tom lane

Re: Functional index problems. (Was: Many joins: monthly summaries S-L--O--W)

From

Josh Berkus

Date:

22 October 2003, 14:56:46

Michael,

> What is that simple thing I'm overlooking? Any ideas what else I should
> check? (I did a search on 'functional index' in the list archives but
> kept getting timed out :(

Oh, sorry.  There's an implementation issue with funcional indexes, where they
can't take parameters other than column names.  So you need to do:

CREATE FUNCTION get_month (
    TIMESTAMPTZ ) RETURNS INTEGER AS
' SELECT EXTRACT(MONTH from $1); '
LANGUAGE sql IMMUTABLE STRICT;

Then do

CREATE INDEX dborders_date_trunc_idx on dborders
(get_month(date));


--
-Josh Berkus
 Aglio Database Solutions
 San Francisco

Re: Functional index problems. (Was: Many joins: monthly summaries S-L--O--W)

From

Michael Glaesmann

Date:

22 October 2003, 15:35:34

On Thursday, Oct 23, 2003, at 02:44 Asia/Tokyo, Tom Lane wrote:

> Michael Glaesmann <grzm@myrealbox.com> writes:
>> CREATE INDEX dborders_extract_month_idx ON dborders (EXTRACT(MONTH
>> from
>> date));
>> ERROR:  parser: parse error at or near "(" at character 61
>
> You can't do that in pre-7.4 releases; the syntax of a functional index
> can only be "ON table (func(col1,col2,...))" --- that is, a simple,
> standard-notation function applied to one or more columns of the table.
> So to do this, you'd need to create an intermediate function along
> the lines of "month_trunc(date)"; and you'd have to use it in your
> queries as well as in the index definition.

On Thursday, Oct 23, 2003, at 02:53 Asia/Tokyo, Josh Berkus wrote:
> Oh, sorry.  There's an implementation issue with funcional indexes,
> where they
> can't take parameters other than column names.  So you need to do:
>
> CREATE FUNCTION get_month (
>     TIMESTAMPTZ ) RETURNS INTEGER AS
> ' SELECT EXTRACT(MONTH from $1); '
> LANGUAGE sql IMMUTABLE STRICT;

Thanks, Tom and Josh! Added a type cast of the extract (which returns a
double precision) and it's all good.

Tom commented:

> 7.4 is more flexible though --- it will take the above as long as you
> put an extra set of parentheses in there...

I took a gander at the documentation for 7.4beta. I can tell it's been
reorganized. There's no longer a specific section on functional indexes
that I can see, though there is mention of it in the SQL CREATE INDEX
entry
<http://developer.postgresql.org/docs/postgres/sql-createindex.html>
The 7.3.2 documents I have say that there cannot be multicolumn
functional indexes, though there's no mention of this in the new
documentation. Does this mean this proscription has been lifted?

Thanks again for your help!

Michael

Re: Functional index problems. (Was: Many joins: monthly summaries S-L--O--W)

From

Tom Lane

Date:

22 October 2003, 16:18:36

Michael Glaesmann <grzm@myrealbox.com> writes:
> On Thursday, Oct 23, 2003, at 02:44 Asia/Tokyo, Tom Lane wrote:
>> 7.4 is more flexible though --- it will take the above as long as you
>> put an extra set of parentheses in there...

> I took a gander at the documentation for 7.4beta. I can tell it's been
> reorganized. There's no longer a specific section on functional indexes
> that I can see, though there is mention of it in the SQL CREATE INDEX
> entry
> <http://developer.postgresql.org/docs/postgres/sql-createindex.html>
> The 7.3.2 documents I have say that there cannot be multicolumn
> functional indexes, though there's no mention of this in the new
> documentation. Does this mean this proscription has been lifted?

Yes.  Any column of an index can now be an expression; the former
functional-index capability is now just a special case of "expressional
indexes".  For syntactic reasons we had to require an extra pair of
parens around expressions in the CREATE INDEX statement --- although
it proved possible to not require these when the expression looks like
a standard-syntax function call, thus providing backwards compatibility
with the old functional-index syntax.  See
http://developer.postgresql.org/docs/postgres/indexes-expressional.html
as well as the CREATE INDEX reference page.

I'm not particularly happy with the phrase "expressional index", and
would like to think of something else to use before the 7.4 docs go
final.  Any ideas?

            regards, tom lane

Re: Functional index problems. (Was: Many joins: monthly summaries S-L--O--W)

From

Josh Berkus

Date:

22 October 2003, 16:23:29

Michael,

> <http://developer.postgresql.org/docs/postgres/sql-createindex.html>
> The 7.3.2 documents I have say that there cannot be multicolumn
> functional indexes, though there's no mention of this in the new
> documentation. Does this mean this proscription has been lifted?

Yes.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco

Re: Functional index problems. (Was: Many joins: monthly summaries S-L--O--W)

From

Michael Glaesmann

Date:

22 October 2003, 21:33:43

On Thursday, Oct 23, 2003, at 04:17 Asia/Tokyo, Tom Lane wrote:

<excerpt>I'm not particularly happy with the phrase "expressional
index", and

would like to think of something else to use before the 7.4 docs go

final.  Any ideas?

</excerpt>

Though it might be nice to be creative and name it something like
"Bill", for a defined language like SQL I think it's best to work
within the framework already used to describe the language. Looking in
the PostgreSQL docs (which I'm sure you've done), I find this
definition of "value expression"

<fontfamily><param>Lucida Grande</param><bigger>A value expression is
one of the following:

- A constant or literal value; see
<color><param>0000,0000,6666</param>Section 1.1.2 </color>.

- A column reference.

- A positional parameter reference, in the body of a function
declaration.

- An operator invocation.

- A function call.

- An aggregate expression.

- A type cast.

- A scalar subquery.

- Another value expression in parentheses, useful to group
subexpressions and override precedence.

Preceding this is a paragraph mentioning that value expressions are
also called "scalar expressions" or just "expressions". I was
surprised to find column references are included in this list, but
thinking about it, a column has a scalar value, which is what any
function or operator returns as well. According to the above
description of "value expression", a normal column index is just a
special case of the more general "expression" index.

(Actually, I'm a bit confused with the definition of value expression.
Does column reference here mean value in the same row (and same
table)? That's the only way I can see getting a scalar value. If it
means reference in the constraint meaning, it's a set of allowed
values, rather than a specific one, which definitely wouldn't work for
an index. Perhaps the use of the word "reference" here is
inappropriate, though I haven't looked through the rest of the
documentation to see how the term "column reference" is used. My bad,
I know. I'll definitely accept criticism for not doing my homework
thoroughly.)

To me, an operator is just a special class of function that doesn't
use the more general func(arg [, arg]) construction. 2 + 2 could just
as easily be add(2,2) — and maybe it is already defined this way as
well. I haven't checked.

However, I can see that it might be useful to distinguish between
functions and column references. One term I thought about was "derived
index". Another was "evaluated index". But both of these terms seem to
introduce unnecessary language.

My final thought would be to continue using "functional index", noting
that operators are a special class of function. Out of the above
description of "value expression", I believe only functions and
operators are allowed as expressions in the CREATE INDEX syntax,
correct? (Besides referring to a column.)

What do you think?

Michael</bigger></fontfamily>On Thursday, Oct 23, 2003, at 04:17 Asia/Tokyo, Tom Lane wrote:
> I'm not particularly happy with the phrase "expressional index", and
> would like to think of something else to use before the 7.4 docs go
> final.  Any ideas?

Though it might be nice to be creative and name it something like
"Bill", for a defined language like SQL I think it's best to work
within the framework already used to describe the language. Looking in
the PostgreSQL docs (which I'm sure you've done), I find this
definition of "value expression"

A value expression is one of the following:
- A constant or literal value; see Section 1.1.2 .
- A column reference.
- A positional parameter reference, in the body of a function
declaration.
- An operator invocation.
- A function call.
- An aggregate expression.
- A type cast.
- A scalar subquery.
- Another value expression in parentheses, useful to group
subexpressions and override precedence.

Preceding this is a paragraph mentioning that value expressions are
also called "scalar expressions" or just "expressions". I was surprised
to find column references are included in this list, but thinking about
it, a column has a scalar value, which is what any function or operator
returns as well. According to the above description of "value
expression", a normal column index is just a special case of the more
general "expression" index.

(Actually, I'm a bit confused with the definition of value expression.
Does column reference here mean value in the same row (and same table)?
That's the only way I can see getting a scalar value. If it means
reference in the constraint meaning, it's a set of allowed values,
rather than a specific one, which definitely wouldn't work for an
index. Perhaps the use of the word "reference" here is inappropriate,
though I haven't looked through the rest of the documentation to see
how the term "column reference" is used. My bad, I know. I'll
definitely accept criticism for not doing my homework thoroughly.)

To me, an operator is just a special class of function that doesn't use
the more general func(arg [, arg]) construction. 2 + 2 could just as
easily be add(2,2) — and maybe it is already defined this way as well.
I haven't checked.

However, I can see that it might be useful to distinguish between
functions and column references. One term I thought about was "derived
index". Another was "evaluated index". But both of these terms seem to
introduce unnecessary language.

My final thought would be to continue using "functional index", noting
that operators are a special class of function. Out of the above
description of "value expression", I believe only functions and
operators are allowed as expressions in the CREATE INDEX syntax,
correct? (Besides referring to a column.)

What do you think?

Michael

Re: Functional index problems. (Was: Many joins: monthly summaries S-L--O--W)

From

Tom Lane

Date:

22 October 2003, 23:54:39

Michael Glaesmann <grzm@myrealbox.com> writes:
> My final thought would be to continue using "functional index", noting
> that operators are a special class of function. Out of the above
> description of "value expression", I believe only functions and
> operators are allowed as expressions in the CREATE INDEX syntax,
> correct? (Besides referring to a column.)

Actually, you can use any scalar-valued expression that does not contain
a sub-SELECT or aggregate function.  (Aggregates would be meaningless
--- what are you aggregating over?  The prohibition of sub-SELECTs is
partly because we don't do planning for index expressions, which could
be fixed if anyone were sufficiently motivated, and partly because it
seems quite unlikely that the result of a SELECT could reasonably be
considered immutable.  We do *not* want to get into updating indexes for
reasons other than updates of the indexed row.)

I see your point that "functional index" is still a valid description,
but I'm inclined to bow to Peter's position that it has another meaning
that could distract people.  Presently I'm going to go with "expression
index" unless someone can provide a better choice.

            regards, tom lane