Re: Alter/update large tables - VERRRY annoying behaviour! - Mailing list pgsql-general
From:           Nigel J. Andrews
Subject:        Re: Alter/update large tables - VERRRY annoying behaviour!
Date:
Msg-id:         Pine.LNX.4.21.0204152335270.20382-100000@ponder.fairway2k.co.uk
In response to: Re: Alter/update large tables - VERRRY annoying behaviour! (Dmitry Tkach <dmitry@openratings.com>)
Responses:      Re: Alter/update large tables - VERRRY annoying behaviour!
List:           pgsql-general
On Mon, 15 Apr 2002, Dmitry Tkach wrote:

> Neil Conway wrote:
>
>> On Mon, 15 Apr 2002 13:07:20 -0400
>> "Dmitry Tkach" <dmitry@openratings.com> wrote:
>
>>> This took me awfully long, but worked (I guess).
>>> I say 'I guess', because I wasn't able so far to verify that - when I tried to do
>>>   select * from a limit 1;
>>> it just hangs on me ... at least, it looks like it does.
>
>> This didn't hang, it just requires a sequential scan of the whole table.
>
> I know it does (as I said below). The point is that it SHOULD NOT, and
> especially, that I can't imagine anyone not familiar with postgres
> internals expecting that it would - all it needs to do is to grab the
> first row and return immediately.
> That's what it would do if you just created a new table and populated it
> with data.
>
>> As you observe below, it will also need to scan through dead tuples,
>
> Not 'also' - JUST the dead ones! That's what's especially annoying about
> it - as soon as it finds the first tuple that's not dead, it returns.
>
>> but that is just a product of MVCC and there's no real way around
>> it.
>
> My whole point is that I don't believe it (that there is no way around)
> :-)
> For one thing, I have never seen ANY database engine (Oracle, Informix,
> DB2) that would take more than a second to get the first row from a
> table, regardless of what has been done to that table before.
> That (and my common sense too) tells me that there MUST be a 'way
> around it'.
> I can see that it's not currently implemented in postgres, but I do
> believe (and that's the whole point of me posting that message in the
> first place) that it is a huge usability issue and really needs to be
> fixed.

Ok, so I realise that addressing this issue of using a limit of 1 doesn't
address anything associated with using this table for something useful;
however, I find it an interesting point.

All that has been asked for is that a single row be returned. This doesn't
have to be the first valid row at all. Indeed, isn't it one of the axioms
of SQL that data is returned in some arbitrary order unless specifically
forced into an order? Therefore, shouldn't

    SELECT * FROM x LIMIT 1

use an index, any index, to just fetch a single row?

Or is this an effect of a row needing to be visited in order to determine
its 'visibility', as proposed as a reason why a sequential scan was being
performed for one of my queries I posted about recently? I am still a
little confused about that, though. If there is an index and deletes have
been committed, shouldn't the index have been updated and forced to forget
about the deleted rows, unless there is a transaction open that can still
access those deleted items?

As for the rest of the argument, is it constructive? I for one thought the
original poster had done more than a normal user, me included, would have
done before posting.

> [stuff deleted]
>
>>> First of all, a question for you - is ANY update to a table
>>> equivalent (in this respect) to a delete+insert?
>
>> Yes, AFAIK -- MVCC requires this.
>
> What's MVCC?

Funny, I was about to ask that question. Something about variable size of
fields in the physical storage?

>>> - Vacuum isn't the speediest thing in the world either (it's been
>>>   running for an hour now, and still has not finished).
>
>> Is this 7.2? If not, VACUUM should be substantially faster in 7.2.
>
> Yes, it is 7.2

I'm pleased I haven't tried what I'm doing in a version older than 7.2 in
that case, and my table's only 1 million rows.
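The update-is-delete-plus-insert behaviour being discussed above is easy
to observe for yourself on a scratch table, using the ctid and xmin
system columns. A minimal demonstration follows; the table and column
names are invented for the example:

    -- a small scratch table (hypothetical names)
    CREATE TABLE mvcc_demo (id integer, val text);
    INSERT INTO mvcc_demo VALUES (1, 'before');

    -- ctid is the tuple's physical location, xmin the inserting
    -- transaction; note both before the update
    SELECT ctid, xmin, * FROM mvcc_demo;

    -- an UPDATE does not overwrite in place: it writes a new row
    -- version and leaves the old one behind as a dead tuple
    UPDATE mvcc_demo SET val = 'after' WHERE id = 1;

    -- the row now shows a different ctid and xmin - it is a new
    -- physical tuple, and the old version still occupies space
    SELECT ctid, xmin, * FROM mvcc_demo;

    -- VACUUM reclaims the dead version left behind by the update
    VACUUM mvcc_demo;

After updating every row of a large table, a sequential scan (including
the one behind SELECT ... LIMIT 1) has to step over all those dead
versions until VACUUM has reclaimed them, which is the slowness the
thread describes.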
> [more deletions]

However, a useful thread, since it explains a 'feature' I ran into earlier
today after managing about 4/5 of an update of every row of my table. I
had been thinking the problem was large storage usage by the txtidx type;
I realise now it is the updated = deleted+inserted storage requirement.
I'm thinking of changing what I'm doing to a way I considered earlier
[without thinking about this thread] so I can stay within the limits of
my resources.

--
Nigel J. Andrews
Director

Logictree Systems Limited
Computer Consultants
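One sketch of the kind of resource-bounded rework hinted at above,
assuming the rows can be addressed by a key range (table, column, and
expression names here are all hypothetical): update the table a slice at
a time, vacuuming between slices, so the dead-tuple overhead stays
bounded to one slice rather than growing to the size of the whole table.

    -- update slice 1, then reclaim its dead tuples before moving on
    UPDATE big_table SET doc_idx = expensive_expr(doc_text)
        WHERE id BETWEEN 1 AND 100000;
    VACUUM big_table;

    -- update slice 2
    UPDATE big_table SET doc_idx = expensive_expr(doc_text)
        WHERE id BETWEEN 100001 AND 200000;
    VACUUM big_table;

    -- ...and so on for the remaining id ranges

Since this is 7.2, as noted in the thread, those intermediate VACUUMs
should be comparatively cheap.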