Thread: difficulties with time based queries

From:
"Rainer Mager"
Date:

I have a somewhat large table (more than 100 million rows) that contains log data with start_time and end_time columns. When I try to do queries on this table I always find them slower than what I need and what I believe should be possible.

 

For example, I limited the following query to just a single day and it still is much slower than what I would expect. In reality I need to do queries that span a few weeks.

 

explain analyze select * from ad_log where date(start_time) < date('2009-03-31') and date(start_time) >= date('2009-03-30');

 

 

Bitmap Heap Scan on ad_log  (cost=73372.57..3699152.24 rows=2488252 width=32) (actual time=49792.862..64611.255 rows=2268490 loops=1)

   Recheck Cond: ((date(start_time) < '2009-03-31'::date) AND (date(start_time) >= '2009-03-30'::date))

   ->  Bitmap Index Scan on ad_log_date_all  (cost=0.00..72750.51 rows=2488252 width=0) (actual time=49776.332..49776.332 rows=2268490 loops=1)

         Index Cond: ((date(start_time) < '2009-03-31'::date) AND (date(start_time) >= '2009-03-30'::date))

 Total runtime: 65279.352 ms

 

 

The definition of the table is:

 

   Column   |            Type             |                         Modifiers

------------+-----------------------------+------------------------------------------------------------

 ad_log_id  | integer                     | not null default nextval('ad_log_ad_log_id_seq'::regclass)

 channel    | integer                     | not null

 player     | integer                     | not null

 ad         | integer                     | not null

 start_time | timestamp without time zone |

 end_time   | timestamp without time zone |

Indexes:

    "ad_log_pkey" PRIMARY KEY, btree (ad_log_id)

    "ad_log_unique" UNIQUE, btree (channel, player, ad, start_time, end_time)

    "ad_log_ad" btree (ad)

    "ad_log_ad_date" btree (ad, date(start_time))

    "ad_log_channel" btree (channel)

    "ad_log_channel_date" btree (channel, date(start_time))

    "ad_log_date_all" btree (date(start_time), channel, player, ad)

    "ad_log_player" btree (player)

    "ad_log_player_date" btree (player, date(start_time))

Foreign-key constraints:

    "ad_log_ad_fkey" FOREIGN KEY (ad) REFERENCES ads(id)

    "ad_log_channel_fkey" FOREIGN KEY (channel) REFERENCES channels(id)

    "ad_log_player_fkey" FOREIGN KEY (player) REFERENCES players_history(id)

Triggers:

    rollup_ad_logs_daily AFTER INSERT ON ad_log FOR EACH ROW EXECUTE PROCEDURE rollup_ad_logs_daily()

 

 

Any suggestions would be appreciated.

 

--Rainer

From:
David Wilson
Date:

On Sun, Apr 5, 2009 at 7:26 PM, Rainer Mager <> wrote:
> Bitmap Heap Scan on ad_log  (cost=73372.57..3699152.24 rows=2488252
> width=32) (actual time=49792.862..64611.255 rows=2268490 loops=1)
>
>    Recheck Cond: ((date(start_time) < '2009-03-31'::date) AND
> (date(start_time) >= '2009-03-30'::date))
>
>    ->  Bitmap Index Scan on ad_log_date_all  (cost=0.00..72750.51
> rows=2488252 width=0) (actual time=49776.332..49776.332 rows=2268490
> loops=1)
>
>          Index Cond: ((date(start_time) < '2009-03-31'::date) AND
> (date(start_time) >= '2009-03-30'::date))
>
>  Total runtime: 65279.352 ms

The stats look good and it's using a viable index for your query. What
kind of hardware is this on, and what are the relevant postgresql.conf
lines? (Or, for that matter, what does iostat say while this query's
running?)

--
- David T. Wilson


From:
PFC
Date:

> When I try to do queries on this
> table I always find them slower than what I need and what I believe
> should be possible.
>
>    ->  Bitmap Index Scan on ad_log_date_all  (cost=0.00..72750.51
> rows=2488252 width=0) (actual time=49776.332..49776.332 rows=2268490
> loops=1)
>
>          Index Cond: ((date(start_time) < '2009-03-31'::date) AND
> (date(start_time) >= '2009-03-30'::date))
>
>  Total runtime: 65279.352 ms

    Well, it is grabbing 2.268.490 rows, that's a lot of rows, so it is not
going to be very fast like a few milliseconds.
    Your columns are small, ints, dates, not large text strings which would
augment the total amount of data.
    So your timing looks pretty slow, it should be faster than this, maybe a
few seconds.

    With this quantity of rows, you want to try to make the disk accesses as
linear as possible.
    This means your table should be organized on disk by date, at least
roughly.
    If your data comes from an import that was sorted on some other column,
this may not be the case.

    What kind of bytes/s do you get from the drives ?

=> Can you post the result of "vmstat 1" during the entire execution of
the query ?

    2 phases should be visible in the vmstat output, the indexscan, and the
bitmap heapscan.

    You could use CLUSTER on the table (it will take a long time), or simply
create another table and INSERT INTO ... SELECT ORDER BY date. This will
also take a long time, but faster than CLUSTER. Then you could recreate
the indexes.

    Do you UPDATE or DELETE a lot from this table ? Is it vacuum'd enough ?



From:
Tom Lane
Date:

"Rainer Mager" <> writes:
> explain analyze select * from ad_log where date(start_time) <
> date('2009-03-31') and date(start_time) >= date('2009-03-30');

> Bitmap Heap Scan on ad_log  (cost=73372.57..3699152.24 rows=2488252
> width=32) (actual time=49792.862..64611.255 rows=2268490 loops=1)
>    Recheck Cond: ((date(start_time) < '2009-03-31'::date) AND
> (date(start_time) >= '2009-03-30'::date))
>    ->  Bitmap Index Scan on ad_log_date_all  (cost=0.00..72750.51
> rows=2488252 width=0) (actual time=49776.332..49776.332 rows=2268490
> loops=1)
>          Index Cond: ((date(start_time) < '2009-03-31'::date) AND
> (date(start_time) >= '2009-03-30'::date))
>  Total runtime: 65279.352 ms

Hmm ... it's pretty unusual to see the index fetch portion of a bitmap
scan take the bulk of the runtime.  Usually that part is fast and where
the pain comes is in fetching from the heap.   I wonder whether that
index has become bloated.  How big are the table and the index
physically?  (Look at pg_class.relpages, or if you want a really
accurate number try pg_relation_size().)

What Postgres version is this, exactly?

BTW, I think you've gone way overboard in your indexing of this table;
those indexes are certainly consuming well more space than the table
does, and a lot of them are redundant.

            regards, tom lane

From:
"Rainer Mager"
Date:

Thanks for all the replies, I'll try to address the follow up questions:

> From: David Wilson [mailto:]
>
> The stats look good and it's using a viable index for your query. What
> kind of hardware is this on, and what are the relevant postgresql.conf
> lines? (Or, for that matter, what does iostat say while this query's
> running?)

I'm running on Windows, so I don't have iostat, but perfmon tells me my Avg.
Disk Queue Length went up to 1.2 during the query (versus a normal value of
about 0.02). Also disk throughput was at about 1.2 MB/s during the query. I
don't know how much of this is random versus linear.



> From: PFC [mailto:]
>
>     With this quantity of rows, you want to try to make the disk
> accesses as
> linear as possible.
>     This means your table should be organized on disk by date, at
> least
> roughly.
>     If your data comes from an import that was sorted on some other
> column,
> this may not be the case.
>
>     What kind of bytes/s do you get from the drives ?

The data should be mostly ordered by date. It is all logged in semi-realtime
such that 99% will be logged within an hour of the timestamp. Also, as
stated above, during this query it was about 1.2 MB/s, which I know isn't
great. I admit this isn't the best hardware in the world, but I would expect
better than that for linear queries.

>     Do you UPDATE or DELETE a lot from this table ? Is it vacuum'd
> enough ?

No, this table has no UPDATEs or DELETEs. It is auto vacuum'd, but no manual
vacuuming.

In regards to clustering, I'm hesitant to do that unless I have no other
choice. My understanding is that I would need to do periodic re-clustering
to maintain it, and during that time the table is very busy.


> From: Tom Lane [mailto:]
> Hmm ... it's pretty unusual to see the index fetch portion of a bitmap
> scan take the bulk of the runtime.  Usually that part is fast and where
> the pain comes is in fetching from the heap.   I wonder whether that
> index has become bloated.  How big are the table and the index
> physically?  (Look at pg_class.relpages, or if you want a really
> accurate number try pg_relation_size().)

Can you give me some more info on how to look at these stats? That is,
what/where is pg_class.relpages, etc. I'll also do some searching for this
info.

> What Postgres version is this, exactly?

8.3.3

> BTW, I think you've gone way overboard in your indexing of this table;
> those indexes are certainly consuming well more space than the table
> does, and a lot of them are redundant.

Agreed, I need to look carefully at all of the queries we do on this table
and reduce this.



--Rainer


From:
Tom Lane
Date:

"Rainer Mager" <> writes:
>> From: Tom Lane [mailto:]
>> Hmm ... it's pretty unusual to see the index fetch portion of a bitmap
>> scan take the bulk of the runtime.  Usually that part is fast and where
>> the pain comes is in fetching from the heap.   I wonder whether that
>> index has become bloated.  How big are the table and the index
>> physically?  (Look at pg_class.relpages, or if you want a really
>> accurate number try pg_relation_size().)

> Can you give me some more info on how to look at these stats?

Since you've got 8.3 it's easy: select pg_relation_size('tablename')
(or indexname).  The result is in bytes, so you might want to
divide by 1K or 1M to keep the number readable.

            regards, tom lane

From:
"Rainer Mager"
Date:


> -----Original Message-----
> From: Tom Lane [mailto:]
> "Rainer Mager" <> writes:
> >> From: Tom Lane [mailto:]
> >> Hmm ... it's pretty unusual to see the index fetch portion of a
> bitmap
> >> scan take the bulk of the runtime.  Usually that part is fast and
> where
> >> the pain comes is in fetching from the heap.   I wonder whether that
> >> index has become bloated.  How big are the table and the index
> >> physically?  (Look at pg_class.relpages, or if you want a really
> >> accurate number try pg_relation_size().)
>
> > Can you give me some more info on how to look at these stats?
>
> Since you've got 8.3 it's easy: select pg_relation_size('tablename')
> (or indexname).  The result is in bytes, so you might want to
> divide by 1K or 1M to keep the number readable.

Ok, nice and simple...I like it:

The result for the table ad_log, is 30,063 MB. The result for the index,
ad_log_date_all, is 17,151 MB. I guess this roughly makes sense since the
index is on 4 fields and the table only has 6 fields.

For the particular query I'm trying to optimize at the moment I believe I
should be able to use an index that references only 2 fields, which, I
imagine, should reduce the time needed to read it. I'll play with this a bit
and see what happens.

Any other suggestions?

--Rainer




From:
Robert Haas
Date:

On Sun, Apr 5, 2009 at 11:35 PM, Rainer Mager <> wrote:
>> -----Original Message-----
>> From: Tom Lane [mailto:]
>> "Rainer Mager" <> writes:
>> >> From: Tom Lane [mailto:]
>> >> Hmm ... it's pretty unusual to see the index fetch portion of a
>> bitmap
>> >> scan take the bulk of the runtime.  Usually that part is fast and
>> where
>> >> the pain comes is in fetching from the heap.   I wonder whether that
>> >> index has become bloated.  How big are the table and the index
>> >> physically?  (Look at pg_class.relpages, or if you want a really
>> >> accurate number try pg_relation_size().)
>>
>> > Can you give me some more info on how to look at these stats?
>>
>> Since you've got 8.3 it's easy: select pg_relation_size('tablename')
>> (or indexname).  The result is in bytes, so you might want to
>> divide by 1K or 1M to keep the number readable.
>
> Ok, nice and simple...I like it:
>
> The result for the table ad_log, is 30,063 MB. The result for the index,
> ad_log_date_all, is 17,151 MB. I guess this roughly makes sense since the
> index is on 4 fields and the table only has 6 fields.
>
> For the particular query I'm trying to optimize at the moment I believe I
> should be able to use an index that references only 2 fields, which, I
> imagine, should reduce the time needed to read it. I'll play with this a bit
> and see what happens.

Even if your query "could use" an index four fields, a lot of times it
won't be the winning strategy, because it means reading a lot more
data from the disk.  Plus, all of these huge indices are competing for
RAM with data from the table itself.  You might want to think about
getting rid of all of the indices with more than 1 or 2 columns.
ad_log_unique is probably huge and it seems like it's probably not
improving your data integrity as much as you might think...

...Robert

From:
Matthew Wakeling
Date:

On Mon, 6 Apr 2009, Rainer Mager wrote:
> The data should be mostly ordered by date. It is all logged in semi-realtime
> such that 99% will be logged within an hour of the timestamp. Also, as
> stated above, during this query it was about 1.2 MB/s, which I know isn't
> great. I admit this isn't the best hardware in the world, but I would expect
> better than that for linear queries.

Might you have an unbalanced index tree? Reindexing would also solve that
problem.

Matthew

--
 There are only two kinds of programming languages: those people always
 bitch about and those nobody uses. (Bjarne Stroustrup)

From:
"Rainer Mager"
Date:

So, I defragged my disk and reran my original query and it got a little better, but still far higher than I'd like. I then rebuilt (dropped and recreated) the ad_log_date_all index and reran the query and it is quite a bit better:

 

# explain analyze select * from ad_log where date(start_time) < date('2009-03-31') and date(start_time) >= date('2009-03-30');

                                                                  QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------

 Bitmap Heap Scan on ad_log  (cost=64770.21..3745596.62 rows=2519276 width=32) (actual time=1166.479..13862.107 rows=2275167 loops=1)

   Recheck Cond: ((date(start_time) < '2009-03-31'::date) AND (date(start_time) >= '2009-03-30'::date))

   ->  Bitmap Index Scan on ad_log_date_all  (cost=0.00..64140.39 rows=2519276 width=0) (actual time=1143.582..1143.582 rows=2275167 loops=1)

         Index Cond: ((date(start_time) < '2009-03-31'::date) AND (date(start_time) >= '2009-03-30'::date))

 Total runtime: 14547.885 ms

 

 

During the query the disk throughput peaked at 30MB/s and was mostly at around 20MB/s, much better.

 

So, a few questions:

 

What can I do to prevent the index from getting bloated, or in whatever state it was in?

 

What else can I do to further improve queries on this table? Someone suggested posting details of my conf file. Which settings are most likely to be useful for this?

 

Any other suggestions?

 

 

Thanks,

 

--Rainer

From:
"Rainer Mager"
Date:

Thanks for all of the suggestions so far. I’ve been trying to reduce the number of indices I have, but I’m running into a problem. I have a need to do queries on this table with criteria applied to the date and possibly any or all of the other key columns. As a reminder, here’s my table:

 

                                         Table "public.ad_log"

   Column   |            Type             |                         Modifiers

------------+-----------------------------+------------------------------------------------------------

 ad_log_id  | integer                     | not null default nextval('ad_log_ad_log_id_seq'::regclass)

 channel    | integer                     | not null

 player     | integer                     | not null

 ad         | integer                     | not null

 start_time | timestamp without time zone |

 end_time   | timestamp without time zone |

 

So, I need indices that make it fast querying against start_time as well as all possible combinations of channel, player, and ad. Below is a sample query that uses all of these (note that I’ve removed actual strings to protect customer data). The result is fine in terms of speed, but since it’s using the ad_log_ad_date index I’m wondering what the best strategy is to cover queries that don’t specify an ad. Should I have 4 indices, one with just the start_time (for when no other columns are specified) and the other three each with the start_time and the three other criteria: channel, player, and ad? I’m currently experimenting with various options, but since it takes a couple of hours to create a particular index this is taking a while.

 

 

# explain analyze SELECT ad_log.ad_log_id, channels.name as channel_name, players.name as player_name, ads.name as ad_name, start_time, end_time, (data IS NOT NULL) AS has_screenshot FROM channels, players, players_history, ads,  ad_log LEFT OUTER JOIN ad_log_screenshot USING (ad_log_id) WHERE channel=channels.id AND player=players_history.id AND players_history.player_instance = players.id AND ad=ads.id AND channels.name LIKE '<some channel>' AND players.name LIKE '<some player>' AND ads.name LIKE '<some ad>' AND  date(start_time) BETWEEN '2009-01-20' AND date('2009-01-21') ORDER BY channels.name, players.name, start_time, ads.name LIMIT 100 OFFSET 100;

 

                                                                                    QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Limit  (cost=7425.26..7425.26 rows=1 width=120) (actual time=1256.116..1256.202 rows=39 loops=1)

   ->  Sort  (cost=7425.26..7425.26 rows=1 width=120) (actual time=1255.965..1256.068 rows=139 loops=1)

         Sort Key: channels.name, players.name, ad_log.start_time, ads.name

         Sort Method:  quicksort  Memory: 35kB

         ->  Nested Loop Left Join  (cost=0.01..7425.25 rows=1 width=120) (actual time=179.086..1255.451 rows=139 loops=1)

               ->  Nested Loop  (cost=0.01..7417.06 rows=1 width=88) (actual time=137.488..1212.531 rows=139 loops=1)

                     Join Filter: (ad_log.channel = channels.id)

                     ->  Nested Loop  (cost=0.01..7415.73 rows=1 width=60) (actual time=120.308..1192.867 rows=139 loops=1)

                           Join Filter: (players_history.id = ad_log.player)

                           ->  Nested Loop  (cost=0.00..36.92 rows=1 width=17) (actual time=21.960..23.405 rows=1 loops=1)

                                 Join Filter: (players.id = players_history.player_instance)

                                 ->  Seq Scan on players  (cost=0.00..11.80 rows=1 width=17) (actual time=5.981..6.083 rows=1 loops=1)

                                       Filter: (name ~~ '<some player>'::text)

                                 ->  Seq Scan on players_history  (cost=0.00..14.50 rows=850 width=8) (actual time=15.880..16.592 rows=850 loops=1)

                           ->  Nested Loop  (cost=0.01..7371.03 rows=622 width=51) (actual time=75.161..1156.076 rows=15600 loops=1)

                                 ->  Seq Scan on ads  (cost=0.00..72.79 rows=1 width=27) (actual time=15.776..31.975 rows=1 loops=1)

                                       Filter: (name ~~ '<some ad>'::text)

                                 ->  Index Scan using ad_log_ad_date on ad_log  (cost=0.01..7267.77 rows=2438 width=32) (actual time=59.375..1095.229 rows=15600 loops=1)

                                       Index Cond: ((ad_log.ad = ads.id) AND (date(ad_log.start_time) >= '2009-01-20'::date) AND (date(ad_log.start_time) <= '2009-01-21'::date))

                     ->  Seq Scan on channels  (cost=0.00..1.31 rows=1 width=36) (actual time=0.128..0.132 rows=1 loops=139)

                           Filter: (channels.name ~~ '<some channel>'::text)

               ->  Index Scan using ad_log_screenshot_pkey on ad_log_screenshot  (cost=0.00..8.18 rows=1 width=36) (actual time=0.304..0.304 rows=0 loops=139)

                     Index Cond: (ad_log.ad_log_id = ad_log_screenshot.ad_log_id)

 Total runtime: 1256.572 ms

 

 

 

Thanks,

 

--Rainer

From:
Tom Lane
Date:

"Rainer Mager" <> writes:
> So, I need indices that make it fast querying against start_time as well as
> all possible combinations of channel, player, and ad.

There's some general principles in the manual --- have you read
http://www.postgresql.org/docs/8.3/static/indexes.html
especially 11.3 and 11.5?

            regards, tom lane

From:
PFC
Date:

> What can I do to prevent the index from getting bloated, or in whatever
> state it was in?
>
>
> What else can I do to further improve queries on this table? Someone
> suggested posting details of my conf file. Which settings are most
> likely to
> be useful for this?

    If you often do range queries on date, consider partitioning your table
by date (something like 1 partition per month).
    Of course, if you also often do range queries on something other than
date, and uncorrelated, forget it.

    If you make a lot of big aggregate queries, consider materialized views :

    Like "how many games player X won this week", etc

    - create "helper" tables which contain the query results
    - every night, recompute the results taking into account the most recent
data
    - don't recompute results based on old data that never changes

    This is only interesting if the aggregation reduces the data volume by
"an appreciable amount". For instance, if you run a supermarket with 1000
distinct products in stock and you sell 100.000 items a day, keeping a
cache of "count of product X sold each day" will reduce your data load by
about 100 on the query "count of product X sold this month".

    The two suggestion above are not mutually exclusive.

    You could try bizgres also. Or even MySQL !... MySQL's query engine is
slower than pg but the tables take much less space than Postgres, and it
can do index-only queries. So you can fit more in the cache. This is only
valid for MyISAM (InnoDB is a bloated hog). Of course, noone would want to
use MyISAM for the "safe" storage, but it's pretty good as a read-only
storage. You can even use the Archive format for even more compactness and
use of cache. Of course you'd have to devise a way to dump from pg and
load into MySQL but that's not hard. MySQL can be good if you target a
table with lots of small rows with a few ints, all of them in a
multicolumn index, so it doesn't need to hit the table itself.

    Note that one in his right mind would never run aggregate queries on a
live R/W MyISAM table since the long queries will block all writes and
blow up the reaction time. But for a read-only cache updated at night, or
replication slave, it's okay.

From:
Nikolas Everett
Date:

       If you often do range queries on date, consider partitioning your table by date (something like 1 partition per month).
       Of course, if you also often do range queries on something other than date, and uncorrelated, forget it.

If you pick your partition to line up with your queries than you can probably do away with the date index.  Even if it doesn't always line up perfectly its worth considering.



       If you make a lot of big aggregate queries, consider materialized views :

       Like "how many games player X won this week", etc

       - create "helper" tables which contain the query results
       - every night, recompute the results taking into account the most recent data
       - don't recompute results based on old data that never changes

       This is only interesting if the aggregation reduces the data volume by "an appreciable amount". For instance, if you run a supermarket with 1000 distinct products in stock and you sell 100.000 items a day, keeping a cache of "count of product X sold each day" will reduce your data load by about 100 on the query "count of product X sold this month".

This obviously creates some administration overhead.  So long as this is manageable for you this is a great solution.  You might also want to look at Mondrian at http://mondrian.pentaho.org/ .  It takes some tinkering but buys you some neat views into your data and automatically uses those aggregate tables.
 

Nik Everett