Re: proposal: be smarter about i/o patterns in index scan - Mailing list pgsql-hackers
From:           Glen Parker
Subject:        Re: proposal: be smarter about i/o patterns in index scan
Date:
Msg-id:         AJEKKAIECKNMBCEKADJPKEACCFAA.glenebob@nwlink.com
In response to: Re: proposal: be smarter about i/o patterns in index scan (Tom Lane <tgl@sss.pgh.pa.us>)
List:           pgsql-hackers
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Tom Lane
>
> > Or:
> > 2. Sort AND COPY TID's into physical order.
> > 3. Read heap in phy. order, match results to un-sorted TID list.
> > That sounds quite cheap.
>
> No, it sounds like handwaving.  What's your "match results" step, and

You got me :-)

> does it take less than O(N^2) time?  How do you get to *return* the

Less than O(N^2)???  Geez, I think so!

> tuples in index order, without having read them all before you return
> the first one?  (Which is really what the problem is with the sort
> alternative, at least for fast-start cases such as the LIMIT 1 example.)

(You'll have to pardon me for thinking as I type... and being too long
winded :-)

The matching step takes O(1) time per tuple.  If you have a hard maximum
on the number of TID's you'll grab from the index (which sounds right),
you could store the TID list in a vector.  The IO-order-sorted list
could be a singly-linked list OR a vector, but either way, each entry
would have to point back to an offset in the index-ordered list.

The index-ordered list would be (sizeof(TID) * max_TID_count) bytes, and
the IO-ordered list would be ((sizeof(TID) + sizeof(int)) *
max_TID_count) bytes.  If a TID is 8 bytes and we're going to fetch 2048
index entries per iteration, that gets us (*counting on fingers*) 16K
for the index-ordered list + 24K for the IO-ordered list.

The index-ordered list can also be a singly-linked list, in which case
the mapping would be by pointer.  If both lists are singly-linked lists,
then the size overhead (for a fully realized 2048-entry scan) would be:

((sizeof(TID) + sizeof(void*)) * max_TID_count) +
((sizeof(TID) + sizeof(void*) + sizeof(void*)) * max_TID_count)

= (*more counting on fingers*) ... 24K + 32K = 56K.

Hmm, the vector style would be crazy expensive for small queries.  The
linked-list approach sounds pretty good though.
I assume no one thinks doing this would be completely free :-)

And if you did the IO-ordered list as a vector, you'd save the
linked-list overhead, and you would know exactly how big to make it, so
it wouldn't have to hurt you on small queries.  2048 entries seems
pretty darn generous, actually.

> How do you get to *return* the
> tuples in index order, without having read them all before you return

Hmm, worst-case scenario would be to scan the heap in exactly the
reverse of index order.  With a 2048-entry batch size, you'd have a fair
amount of overhead on a LIMIT 1 :-)  Unless the index scanner could be
given a maximum tuple count, rather than just 'knowing' it from a GUC
value.  Would that dirty up the plan tree beyond all recognition?

> What happens when you run out of memory for the list of TIDs ... er,
> make that two lists of TIDs?

Now you really got me.  Hand waving: the same thing you do when you run
out of memory anywhere else :-)  By configuring the server to fetch
index entries in 1-entry batches, you'd be right back to the overhead
(or very nearly so) of the current implementation, so it would only hurt
people who chose to be hurt by it :-)

-Glen