Thread: Query performance
Actually, I have some queries that are slow, however I was wondering if you could help me write a query that is rather simple, but I, as a true database novice, can't seem to conjure. So we have stocks, as I have previously said, and I have a huge table which contains all of the opening and closing prices of some stocks from each day. What I like to do, in English, for each stock in each day is find a ratio: abs(closing-opening)/opening. Then I would like to average all of the ratios of each day of each individual stock together to find a final ratio for each stock, then I would like to find the highest average, to find the best performing stock. So what query can I use, and (as is appropriate for this group), how can it be optimized to run the fastest?
Bill wrote: > Actually, I have some queries that are slow, however I was wondering if you > could help me write a query that is rather simple, but I, as a true database > novice, can't seem to conjure. So we have stocks, as I have previously > said, and I have a huge table which contains all of the opening and closing > prices of some stocks from each day. Schemas, Bill - show us your table definitions so people can see exactly where they stand. -- Richard Huxton Archonet Ltd
Ok....so here lies the output of oclh (i.e "\d oclh") Table "public.oclh" Column | Type | Modifiers --------+-----------------------+------------------------------- symbol | character varying(10) | not null default '' date | date | not null default '0001-01-01' open | numeric(12,2) | not null default '0.00' close | numeric(12,2) | not null default '0.00' low | numeric(12,2) | not null default '0.00' high | numeric(12,2) | not null default '0.00' Indexes: symbol_2_oclh_index btree (symbol, date), symbol_oclh_index btree (symbol, date) -----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Richard Huxton Sent: Monday, June 28, 2004 4:14 AM To: Bill Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Query performance Bill wrote: > Actually, I have some queries that are slow, however I was wondering if you > could help me write a query that is rather simple, but I, as a true database > novice, can't seem to conjure. So we have stocks, as I have previously > said, and I have a huge table which contains all of the opening and closing > prices of some stocks from each day. Schemas, Bill - show us your table definitions so people can see exactly where they stand. -- Richard Huxton Archonet Ltd ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster
Bill wrote: > Ok....so here lies the output of oclh (i.e "\d oclh") > > Table "public.oclh" > Column | Type | Modifiers > --------+-----------------------+------------------------------- > symbol | character varying(10) | not null default '' > date | date | not null default '0001-01-01' > open | numeric(12,2) | not null default '0.00' > close | numeric(12,2) | not null default '0.00' > low | numeric(12,2) | not null default '0.00' > high | numeric(12,2) | not null default '0.00' > Indexes: symbol_2_oclh_index btree (symbol, date), > symbol_oclh_index btree (symbol, date) Well, I'm not sure why the two indexes on the same columns, and I'm not sure it makes sense to have defaults for _any_ of the columns there. So - you want: 1. ratio = abs(closing-opening)/opening 2. average = all the ratios of each day of each stock 3. Highest average Well, I don't know what you mean by #2, but #1 is just: SELECT symbol, "date", abs(close - open)/open AS ratio FROM oclh GROUP BY symbol, date; I'd probably fill in a summary table with this and use that as the basis for your further queries. Presumably from "yesterday" back, the ratios/averages won't change. -- Richard Huxton Archonet Ltd
Ok, thanks. So let me explain the query number 2 as this is the more difficult to write. So I have a list of stocks, this table contains the price of all of the stocks at the open and close date. Ok, now we have a ratio from query (1) that returns at least a very rough index of the daily performance of a given stock, with each ratio representing the stock's performance in one day. Now we need to average this with the same stock's ratio every day, to get a total average for each stock contained in the database. Now I would simply like to find a ratio like this that represents the average of every stock in the table and simply find the greatest ratio. Sorry about the lousy explanation before, is this a bit better? Here is an example if needed. Say we have a stock by the name of YYY I know, due to query 1 that stock YYY has a abs(close-open)/open price ratio of for example, 1.3 on Dec 1 and (for simplicity let's say we only have two dates) and Dec 2 the ratio for YYY is 1.5. So the query averages and gets 1.4. Now it needs to do this for all of the stocks in the table and sort by increasing ratio. Thanks. -----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Richard Huxton Sent: Tuesday, June 29, 2004 3:38 AM To: Bill Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Query performance Bill wrote: > Ok....so here lies the output of oclh (i.e "\d oclh") > > Table "public.oclh" > Column | Type | Modifiers > --------+-----------------------+------------------------------- > symbol | character varying(10) | not null default '' > date | date | not null default '0001-01-01' > open | numeric(12,2) | not null default '0.00' > close | numeric(12,2) | not null default '0.00' > low | numeric(12,2) | not null default '0.00' > high | numeric(12,2) | not null default '0.00' > Indexes: symbol_2_oclh_index btree (symbol, date), > symbol_oclh_index btree (symbol, date) Well, I'm not sure why the two indexes on the same columns, and I'm not sure it makes sense to have defaults for _any_ of the columns there. So - you want: 1. ratio = abs(closing-opening)/opening 2. average = all the ratios of each day of each stock 3. Highest average Well, I don't know what you mean by #2, but #1 is just: SELECT symbol, "date", abs(close - open)/open AS ratio FROM oclh GROUP BY symbol, date; I'd probably fill in a summary table with this and use that as the basis for your further queries. Presumably from "yesterday" back, the ratios/averages won't change. -- Richard Huxton Archonet Ltd ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Bill wrote: > Ok, thanks. So let me explain the query number 2 as this is the more > difficult to write. So I have a list of stocks, this table contains the > price of all of the stocks at the open and close date. Ok, now we have a > ratio from query (1) that returns at least a very rough index of the daily > performance of a given stock, with each ratio representing the stock's > performance in one day. Now we need to average this with the same stock's > ratio every day, to get a total average for each stock contained in the > database. Now I would simply like to find a ratio like this that represents > the average of every stock in the table and simply find the greatest ratio. > Sorry about the lousy explanation before, is this a bit better? > > Here is an example if needed. > > Say we have a stock by the name of YYY > > I know, due to query 1 that stock YYY has a abs(close-open)/open price ratio > of for example, 1.3 on Dec 1 and (for simplicity let's say we only have two > dates) and Dec 2 the ratio for YYY is 1.5. So the query averages and gets > 1.4. Now it needs to do this for all of the stocks in the table and sort by > increasing ratio. Well, the simplest would be something like: CREATE VIEW my_ratios AS SELECT ...(select details we used for #1 previously) Query #1 then becomes: SELECT * FROM my_ratios; Then you could do: SELECT symbol, avg(ratio) as ratio_avg FROM my_ratios GROUP BY symbol ORDER BY avg(ratio) ; Now, in practice, I'd probably create a symbol_ratio table and fill that one day at a time. Then #2,#3 would be easier. -- Richard Huxton Archonet Ltd
On Tue, Jun 29, 2004 at 12:33:51 -0500, Bill <bill@math.uchicago.edu> wrote: > Ok, thanks. So let me explain the query number 2 as this is the more > difficult to write. So I have a list of stocks, this table contains the > price of all of the stocks at the open and close date. Ok, now we have a > ratio from query (1) that returns at least a very rough index of the daily > performance of a given stock, with each ratio representing the stock's > performance in one day. Now we need to average this with the same stock's > ratio every day, to get a total average for each stock contained in the > database. Now I would simply like to find a ratio like this that represents > the average of every stock in the table and simply find the greatest ratio. > Sorry about the lousy explanation before, is this a bit better? You can do something like: SELECT symbol, avg((open-close)/open) GROUP BY symbol ORDER BY avg((open-close)/open) DESC LIMIT 1; If you aren't interested in the variance of the daily change, it seems like you would be best off using the opening price for the first day you have recorded for the stock and the closing price on the last day and looking at the relative change.
Thanks this query works for what I want. So here is an output of the explain analyze: QUERY PLAN ---------------------------------------------------------------------------- ------------------------------------------------------------------------ Limit (cost=2421582.59..2421582.65 rows=25 width=29) (actual time=1985800.32..1985800.44 rows=25 loops=1) -> Sort (cost=2421582.59..2424251.12 rows=1067414 width=29) (actual time=1985800.31..1985800.35 rows=26 loops=1) Sort Key: avg(((open - "close") / (open + 1::numeric))) -> Aggregate (cost=2200163.04..2280219.09 rows=1067414 width=29) (actual time=910291.94..1984972.93 rows=22362 loops=1) -> Group (cost=2200163.04..2253533.74 rows=10674140 width=29) (actual time=910085.96..1105064.28 rows=10674140 loops=1) -> Sort (cost=2200163.04..2226848.39 rows=10674140 width=29) (actual time=910085.93..988909.94 rows=10674140 loops=1) Sort Key: symbol -> Seq Scan on oclh (cost=0.00..228404.40 rows=10674140 width=29) (actual time=20.00..137720.61 rows=10674140 loops=1) Total runtime: 1986748.44 msec (9 rows) Can I get any better performance? Thanks. -----Original Message----- From: Bruno Wolff III [mailto:bruno@wolff.to] Sent: Tuesday, June 29, 2004 2:52 PM To: Bill Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Query performance On Tue, Jun 29, 2004 at 12:33:51 -0500, Bill <bill@math.uchicago.edu> wrote: > Ok, thanks. So let me explain the query number 2 as this is the more > difficult to write. So I have a list of stocks, this table contains the > price of all of the stocks at the open and close date. Ok, now we have a > ratio from query (1) that returns at least a very rough index of the daily > performance of a given stock, with each ratio representing the stock's > performance in one day. Now we need to average this with the same stock's > ratio every day, to get a total average for each stock contained in the > database. Now I would simply like to find a ratio like this that represents > the average of every stock in the table and simply find the greatest ratio. > Sorry about the lousy explanation before, is this a bit better? You can do something like: SELECT symbol, avg((open-close)/open) GROUP BY symbol ORDER BY avg((open-close)/open) DESC LIMIT 1; If you aren't interested in the variance of the daily change, it seems like you would be best off using the opening price for the first day you have recorded for the stock and the closing price on the last day and looking at the relative change.
> Can I get any better performance? You can try bumping your sort memory way up (for this query only). Another method would be to cluster the table by the symbol column (eliminates the expensive sort). If you could run a very simple calculation against open & close numbers to eliminate a majority of symbols early, that would be useful as well.
Actually, I have some queries that are slow, however I was wondering if you could help me write a query that is rather simple, but I, as a true database novice, can't seem to conjure. So we have stocks, as I have previously said, and I have a huge table which contains all of the opening and closing prices of some stocks from each day. What I like to do, in English, for each stock in each day is find a ratio: abs(closing-opening)/opening. Then I would like to average all of the ratios of each day of each individual stock together to find a final ratio for each stock, then I would like to find the highest average, to find the best performing stock. So what query can I use, and (as is appropriate for this group), how can it be optimized to run the fastest?