Re: timestamp indexing - Mailing list pgsql-performance
From | Tobias Brox |
---|---|
Subject | Re: timestamp indexing |
Date | |
Msg-id | 20050531022011.GC26100@tobias.exoweb.net Whole thread Raw |
In response to | Re: timestamp indexing (Tobias Brox <tobias@nordicbet.com>) |
Responses |
Re: timestamp indexing
|
List | pgsql-performance |
[Tobias Brox - Tue at 10:06:25AM +0800] > [Tom Lane - Mon at 01:57:54PM -0400] > > Your followup hasn't shown up here yet, > > I'll check up on that and resend it. Hrm ... messed-up mail configuration I suppose. Here we go: Paul McGarry unintentionally sent a request for more details off the list, since it was intended for the list I'll send my reply here. While writing up the reply, and doing research, I discovered that this is not a problem with indexing timestamps per se, but more with a query of the kind "give me 5% of the table"; it seems like it will often prefer to do a full table scan instead of going via the index. I think that when I had my university courses on databases, we also learned about flat indexes, where the whole index has to be rebuilt whenever a field is updated or inserted in the middle, and I also think we learned that the table usually would be sorted physically by the primary key on the disk. As long as we have strictly incrementing primary keys and timestamps, such a setup would probably be more efficient for queries of the kind "give me all activity for the last two weeks"? Here follows my reply to Paul, including some gory details: [Paul McGarry - Mon at 07:59:35PM +1000] > What version of postgresql are you using and what are the exact > datatypes and queries? We are still using 7.4.6, but I suppose that if our issues are completely or partially solved in pg 8, that would make a good case for upgrading :-) The datatypes I'm indexing are timestamp without time zone. Actually I may be on the wrong hunting ground now - the production system froze completely some days ago basically due to heavy iowait and load on the database server, rendering postgresql completely unresponsive - and back then we had too poor logging to find out what queries that was causing it to grind to a halt, and since we've never such a bad problem before, we didn't know how to handle the situation (we just restarted the entire postgresql; if we had been just killing the processes running the rogue database queries, we would have had very good tracks of it in the logs). I digress. The last days I've looked through profiling logs, and I'm checking if the accumulatively worst queries can be tuned somehow. Most of them are big joins, but I'm a bit concerned of the amounts of "Seq Scan" returned by "explain" despite the fact that only a small fraction of the tables are queried. I reduced the problem to a simple "select * from table where created>xxx" and discovered that it still won't use index, and still will be costly (though of course not much compared to the big joined query). The "ticket" table have less than a million rows, around 50k made the last ten days: NBET=> explain analyze select * from ticket where created>'2005-05-20'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Seq Scan on ticket (cost=0.00..19819.91 rows=89553 width=60) (actual time=535.884..1018.268 rows=53060 loops=1) Filter: (created > '2005-05-20 00:00:00'::timestamp without time zone) Total runtime: 1069.514 ms (3 rows) Anyway, it seems to me that "indexing on timestamp" is not the real issue here, because when restricting by primary key (numeric, sequential ID) the execution time is the same or worse, still doing a sequence scan: NBET=> explain analyze select * from ticket where id>711167; QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Seq Scan on ticket (cost=0.00..19819.91 rows=92273 width=60) (actual time=550.855..1059.843 rows=53205 loops=1) Filter: (id > 711167) Total runtime: 1110.469 ms (3 rows) I've tried running equivalent queries on a table with twice as many rows and width=180, it will pull from the index both when querying by ID and timestamp, and it will usually spend less time. Running "select * from ticket" seems to execute ~2x slower than when having the restriction. > I have a 7.3 database with a "timestamp with time zone" field and we > have to be very careful to explicitly cast values as that in queries > if it is to use the index correctly. I believe it's an issue that is > cleared up in newer versions though. I suppose so - as said, restricting by primary key didn't improve the performance significantly, so I was clearly wrong indicating that this is a special issue with indexing a timestamp. -- Tobias Brox, Beijing
pgsql-performance by date: