Thread: Query performance

Query performance

From
"Bill"
Date:

Actually, I have some queries that are slow, however I was wondering if you could help me write a query that is rather simple, but I, as a true database novice, can't seem to conjure.  So we have stocks, as I have previously said, and I have a huge table which contains all of the opening and closing prices of some stocks from each day.  What I like to do, in English, for each stock in each day is find a ratio: abs(closing-opening)/opening.  Then I would like to average all of the ratios of each day of each individual stock together to find a final ratio for each stock, then I would like to find the highest average, to find the best performing stock.  So what query can I use, and (as is appropriate for this group), how can it be optimized to run the fastest?

 

Re: Query performance

From
Richard Huxton
Date:
Bill wrote:
> Actually, I have some queries that are slow, however I was wondering if you
> could help me write a query that is rather simple, but I, as a true database
> novice, can't seem to conjure.  So we have stocks, as I have previously
> said, and I have a huge table which contains all of the opening and closing
> prices of some stocks from each day.

Schemas, Bill - show us your table definitions so people can see exactly
where they stand.


--
   Richard Huxton
   Archonet Ltd

Re: Query performance

From
"Bill"
Date:
Ok....so here lies the output of oclh (i.e "\d oclh")

                    Table "public.oclh"
 Column |         Type          |           Modifiers
--------+-----------------------+-------------------------------
 symbol | character varying(10) | not null default ''
 date   | date                  | not null default '0001-01-01'
 open   | numeric(12,2)         | not null default '0.00'
 close  | numeric(12,2)         | not null default '0.00'
 low    | numeric(12,2)         | not null default '0.00'
 high   | numeric(12,2)         | not null default '0.00'
Indexes: symbol_2_oclh_index btree (symbol, date),
         symbol_oclh_index btree (symbol, date)

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Richard Huxton
Sent: Monday, June 28, 2004 4:14 AM
To: Bill
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query performance

Bill wrote:
> Actually, I have some queries that are slow, however I was wondering if
you
> could help me write a query that is rather simple, but I, as a true
database
> novice, can't seem to conjure.  So we have stocks, as I have previously
> said, and I have a huge table which contains all of the opening and
closing
> prices of some stocks from each day.

Schemas, Bill - show us your table definitions so people can see exactly
where they stand.


--
   Richard Huxton
   Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster


Re: Query performance

From
Richard Huxton
Date:
Bill wrote:
> Ok....so here lies the output of oclh (i.e "\d oclh")
>
>                     Table "public.oclh"
>  Column |         Type          |           Modifiers
> --------+-----------------------+-------------------------------
>  symbol | character varying(10) | not null default ''
>  date   | date                  | not null default '0001-01-01'
>  open   | numeric(12,2)         | not null default '0.00'
>  close  | numeric(12,2)         | not null default '0.00'
>  low    | numeric(12,2)         | not null default '0.00'
>  high   | numeric(12,2)         | not null default '0.00'
> Indexes: symbol_2_oclh_index btree (symbol, date),
>          symbol_oclh_index btree (symbol, date)

Well, I'm not sure why the two indexes on the same columns, and I'm not
sure it makes sense to have defaults for _any_ of the columns there.

So - you want:
1. ratio = abs(closing-opening)/opening
2. average = all the ratios of each day of each stock
3. Highest average

Well, I don't know what you mean by #2, but #1 is just:

SELECT
   symbol,
   "date",
   abs(close - open)/open AS ratio
FROM
   oclh
GROUP BY
   symbol, date;

I'd probably fill in a summary table with this and use that as the basis
for your further queries. Presumably from "yesterday" back, the
ratios/averages won't change.

--
   Richard Huxton
   Archonet Ltd

Re: Query performance

From
"Bill"
Date:
Ok, thanks.  So let me explain the query number 2 as this is the more
difficult to write.  So I have a list of stocks, this table contains the
price of all of the stocks at the open and close date.  Ok, now we have a
ratio from query (1) that returns at least a very rough index of the daily
performance of a given stock, with each ratio representing the stock's
performance in one day.  Now we need to average this with the same stock's
ratio every day, to get a total average for each stock contained in the
database.  Now I would simply like to find a ratio like this that represents
the average of every stock in the table and simply find the greatest ratio.
Sorry about the lousy explanation before, is this a bit better?

Here is an example if needed.

Say we have a stock by the name of YYY

I know, due to query 1 that stock YYY has a abs(close-open)/open price ratio
of for example, 1.3 on Dec 1 and (for simplicity let's say we only have two
dates) and Dec 2 the ratio for YYY is 1.5. So the query averages and gets
1.4.  Now it needs to do this for all of the stocks in the table and sort by
increasing ratio.


Thanks.


-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Richard Huxton
Sent: Tuesday, June 29, 2004 3:38 AM
To: Bill
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query performance

Bill wrote:
> Ok....so here lies the output of oclh (i.e "\d oclh")
>
>                     Table "public.oclh"
>  Column |         Type          |           Modifiers
> --------+-----------------------+-------------------------------
>  symbol | character varying(10) | not null default ''
>  date   | date                  | not null default '0001-01-01'
>  open   | numeric(12,2)         | not null default '0.00'
>  close  | numeric(12,2)         | not null default '0.00'
>  low    | numeric(12,2)         | not null default '0.00'
>  high   | numeric(12,2)         | not null default '0.00'
> Indexes: symbol_2_oclh_index btree (symbol, date),
>          symbol_oclh_index btree (symbol, date)

Well, I'm not sure why the two indexes on the same columns, and I'm not
sure it makes sense to have defaults for _any_ of the columns there.

So - you want:
1. ratio = abs(closing-opening)/opening
2. average = all the ratios of each day of each stock
3. Highest average

Well, I don't know what you mean by #2, but #1 is just:

SELECT
   symbol,
   "date",
   abs(close - open)/open AS ratio
FROM
   oclh
GROUP BY
   symbol, date;

I'd probably fill in a summary table with this and use that as the basis
for your further queries. Presumably from "yesterday" back, the
ratios/averages won't change.

--
   Richard Huxton
   Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org


Re: Query performance

From
Richard Huxton
Date:
Bill wrote:
> Ok, thanks.  So let me explain the query number 2 as this is the more
> difficult to write.  So I have a list of stocks, this table contains the
> price of all of the stocks at the open and close date.  Ok, now we have a
> ratio from query (1) that returns at least a very rough index of the daily
> performance of a given stock, with each ratio representing the stock's
> performance in one day.  Now we need to average this with the same stock's
> ratio every day, to get a total average for each stock contained in the
> database.  Now I would simply like to find a ratio like this that represents
> the average of every stock in the table and simply find the greatest ratio.
> Sorry about the lousy explanation before, is this a bit better?
>
> Here is an example if needed.
>
> Say we have a stock by the name of YYY
>
> I know, due to query 1 that stock YYY has a abs(close-open)/open price ratio
> of for example, 1.3 on Dec 1 and (for simplicity let's say we only have two
> dates) and Dec 2 the ratio for YYY is 1.5. So the query averages and gets
> 1.4.  Now it needs to do this for all of the stocks in the table and sort by
> increasing ratio.

Well, the simplest would be something like:

CREATE VIEW my_ratios AS SELECT ...(select details we used for #1
previously)

Query #1 then becomes:
SELECT * FROM my_ratios;

Then you could do:
SELECT
   symbol,
   avg(ratio) as ratio_avg
FROM
   my_ratios
GROUP BY
   symbol
ORDER BY
   avg(ratio)
;

Now, in practice, I'd probably create a symbol_ratio table and fill that
one day at a time. Then #2,#3 would be easier.

--
   Richard Huxton
   Archonet Ltd

Re: Query performance

From
Bruno Wolff III
Date:
On Tue, Jun 29, 2004 at 12:33:51 -0500,
  Bill <bill@math.uchicago.edu> wrote:
> Ok, thanks.  So let me explain the query number 2 as this is the more
> difficult to write.  So I have a list of stocks, this table contains the
> price of all of the stocks at the open and close date.  Ok, now we have a
> ratio from query (1) that returns at least a very rough index of the daily
> performance of a given stock, with each ratio representing the stock's
> performance in one day.  Now we need to average this with the same stock's
> ratio every day, to get a total average for each stock contained in the
> database.  Now I would simply like to find a ratio like this that represents
> the average of every stock in the table and simply find the greatest ratio.
> Sorry about the lousy explanation before, is this a bit better?

You can do something like:

SELECT symbol, avg((open-close)/open) GROUP BY symbol
  ORDER BY avg((open-close)/open) DESC LIMIT 1;

If you aren't interested in the variance of the daily change, it seems like
you would be best off using the opening price for the first day you have
recorded for the stock and the closing price on the last day and looking
at the relative change.

Re: Query performance

From
"Bill"
Date:
Thanks this query works for what I want.  So here is an output of the
explain analyze:
                                                                     QUERY
PLAN
----------------------------------------------------------------------------
------------------------------------------------------------------------
 Limit  (cost=2421582.59..2421582.65 rows=25 width=29) (actual
time=1985800.32..1985800.44 rows=25 loops=1)
   ->  Sort  (cost=2421582.59..2424251.12 rows=1067414 width=29) (actual
time=1985800.31..1985800.35 rows=26 loops=1)
         Sort Key: avg(((open - "close") / (open + 1::numeric)))
         ->  Aggregate  (cost=2200163.04..2280219.09 rows=1067414 width=29)
(actual time=910291.94..1984972.93 rows=22362 loops=1)
               ->  Group  (cost=2200163.04..2253533.74 rows=10674140
width=29) (actual time=910085.96..1105064.28 rows=10674140 loops=1)
                     ->  Sort  (cost=2200163.04..2226848.39 rows=10674140
width=29) (actual time=910085.93..988909.94 rows=10674140 loops=1)
                           Sort Key: symbol
                           ->  Seq Scan on oclh  (cost=0.00..228404.40
rows=10674140 width=29) (actual time=20.00..137720.61 rows=10674140 loops=1)
 Total runtime: 1986748.44 msec
(9 rows)

Can I get any better performance?

Thanks.

-----Original Message-----
From: Bruno Wolff III [mailto:bruno@wolff.to]
Sent: Tuesday, June 29, 2004 2:52 PM
To: Bill
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query performance

On Tue, Jun 29, 2004 at 12:33:51 -0500,
  Bill <bill@math.uchicago.edu> wrote:
> Ok, thanks.  So let me explain the query number 2 as this is the more
> difficult to write.  So I have a list of stocks, this table contains the
> price of all of the stocks at the open and close date.  Ok, now we have a
> ratio from query (1) that returns at least a very rough index of the daily
> performance of a given stock, with each ratio representing the stock's
> performance in one day.  Now we need to average this with the same stock's
> ratio every day, to get a total average for each stock contained in the
> database.  Now I would simply like to find a ratio like this that
represents
> the average of every stock in the table and simply find the greatest
ratio.
> Sorry about the lousy explanation before, is this a bit better?

You can do something like:

SELECT symbol, avg((open-close)/open) GROUP BY symbol
  ORDER BY avg((open-close)/open) DESC LIMIT 1;

If you aren't interested in the variance of the daily change, it seems like
you would be best off using the opening price for the first day you have
recorded for the stock and the closing price on the last day and looking
at the relative change.


Re: Query performance

From
Rod Taylor
Date:
> Can I get any better performance?

You can try bumping your sort memory way up (for this query only).

Another method would be to cluster the table by the symbol column
(eliminates the expensive sort).

If you could run a very simple calculation against open & close numbers
to eliminate a majority of symbols early, that would be useful as well.



Re: Query performance

From
"Mischa Sandberg"
Date:
Usually, when you post a request like this, you should provide something a little more concrete (the CREATE TABLE statement for that table, with
Since you didn't, I'll posit something that sounds like what you're using, and take a stab at your problem.
 
TABLE Prices (
    stock    VARCHAR(9)
    ,asof         DATE,
    ,opening   MONEY
    ,closing    MONEY
    ,PRIMARY KEY (stock, asof)
    )
 
SELECT    stock, AVG((closing-opening)/opening) as ratio
FROM    Prices
GROUP BY stock
ORDER BY ratio DESC LIMIT 10;    -- top 10 best-performing stocks.
 

Actually, I have some queries that are slow, however I was wondering if you could help me write a query that is rather simple, but I, as a true database novice, can't seem to conjure.  So we have stocks, as I have previously said, and I have a huge table which contains all of the opening and closing prices of some stocks from each day.  What I like to do, in English, for each stock in each day is find a ratio: abs(closing-opening)/opening.  Then I would like to average all of the ratios of each day of each individual stock together to find a final ratio for each stock, then I would like to find the highest average, to find the best performing stock.  So what query can I use, and (as is appropriate for this group), how can it be optimized to run the fastest?