Re: Database slowness -- my design, hardware, or both? - Mailing list pgsql-general
From | Reuven M. Lerner
Subject | Re: Database slowness -- my design, hardware, or both?
Msg-id | 45EDB783.1040402@lerner.co.il
In response to | Re: Database slowness -- my design, hardware, or both? ("Webb Sprague" <webb.sprague@gmail.com>)
List | pgsql-general
Hi, Webb Sprague.  You wrote:

> ... but I see two seq scans in your explain in a loop -- this is
> probably not good.  If you can find a way to rewrite the IN clause
> (either de-normalizing through triggers to save whatever you need on
> an insert and not have to deal with a set, or by using except in the
> query, or something else more creative)...

I would normally agree that an IN clause is a place to worry -- except that I'm using IN on a very small table, with about 4-5 rows. That might indeed affect things, and I could certainly pull those values into a Perl variable that I then interpolate literally into the SQL. However, I have to assume that this wouldn't change things all that much.

> Also -- there is a good book on temporal databases by Snodgrass that
> might give some interesting ideas; maybe you have already seen it, but
> still.

Heh -- that's one of the first references I looked at when I started this project. Perhaps I'll look at it again; the design of my data warehouse took some of those suggestions into account, but I wouldn't be surprised if there's something more I could be doing.

> I am thinking you could increment a sequence variable to give
> you a "tick" integer with each action, rather than trying to use
> timestamps with all their overhead and inaccuracy (1 second is a long
> time, really).  Lamport also did work on clocks that might apply.

I agree that 1 second is a ridiculously long time. The logfiles were created by people outside of my research group, and are a wonderful lesson in how to use XML poorly and inefficiently. The designers of the logfile weren't expecting to get dozens or hundreds of values per second. You can be sure that my research group, which is now looking to create logfiles of our own, will place a strong emphasis on high-resolution timestamps.

Practically speaking, I've managed to get around this problem by using the "id" column, which comes from a sequence and is thus guaranteed to be increasing.
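To make the id-based ordering concrete, here is a minimal sketch. The table and column names are illustrative rather than our real schema, and SQLite stands in for PostgreSQL purely so the example is self-contained:

```python
import sqlite3

# Hypothetical, cut-down version of the "transactions" table discussed above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        id       INTEGER PRIMARY KEY,  -- assigned from a sequence, so always increasing
        var_name TEXT,
        value    TEXT
    );
    INSERT INTO transactions (id, var_name, value) VALUES
        (1, 'x', 'a'),
        (2, 'y', 'b'),
        (5, 'x', 'c'),
        (9, 'x', 'd');
""")

# The most recent prior setting of variable 'x' before the row with id 9:
# ordering by the sequence-assigned id replaces ordering by a
# one-second-resolution timestamp.
row = conn.execute("""
    SELECT value
    FROM transactions
    WHERE var_name = 'x' AND id < 9
    ORDER BY id DESC
    LIMIT 1
""").fetchone()
print(row[0])  # -> c
```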
We can thus be assured that a row with an ID of x comes before a row whose ID is x + 10, so long as their transaction types are the same. So while we can't guarantee via the ID that a node entry comes before a variable setting, we can order two variable settings based on the ID.

> Also have you tried dropping all your fk and checks just to see if you
> get a difference in speed on an update?  It would be interesting,
> perhaps.

I sorta figured that because our UPDATE takes place on a column without a foreign key, this wouldn't matter much. Am I wrong?

> If you could get rid of the sorted limit 1 clause in your function,
> there would be less variability and it would be easier to understand; you
> probably need to denormalize somehow, perhaps using ticks; I am not
> sure....

Yeah, that's a good point. I'm not sure how possible that'll be, though, given that at *some* point, I need to find the "most recent" setting of the same variable. There might be dozens or hundreds of rows separating the assignment of a new value to a particular variable.

> Could a trigger set your previous_value on insert or update, rather
> than querying for it later?

Well, what I've been doing is INSERTing each row into the "transactions" table without previous_value. Then I run the mass UPDATE for previous_value, doing nothing else at the same time. Then, after previous_value is set, I do the query.

Reuven
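P.S. In case it helps to make the above concrete, the mass previous_value UPDATE has roughly this shape: a correlated subquery that, for each row, takes the value of the latest earlier row (by sequence id) setting the same variable. Column names are illustrative, and SQLite stands in for PostgreSQL just so the sketch is runnable as-is:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        id             INTEGER PRIMARY KEY,
        var_name       TEXT,
        value          TEXT,
        previous_value TEXT
    );
    INSERT INTO transactions (id, var_name, value) VALUES
        (1, 'x', 'a'),
        (2, 'x', 'b'),
        (3, 'y', 'c'),
        (4, 'x', 'd');
""")

# One-pass backfill: each row's previous_value becomes the value of the
# latest earlier row (by id) that set the same variable; rows with no
# earlier setting stay NULL.
conn.execute("""
    UPDATE transactions
    SET previous_value = (
        SELECT t2.value
        FROM transactions t2
        WHERE t2.var_name = transactions.var_name
          AND t2.id < transactions.id
        ORDER BY t2.id DESC
        LIMIT 1
    )
""")

rows = conn.execute(
    "SELECT id, previous_value FROM transactions ORDER BY id"
).fetchall()
print(rows)  # -> [(1, None), (2, 'a'), (3, None), (4, 'b')]
```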