Re: Escaping the ARC patent - Mailing list pgsql-hackers
From | Jonah H. Harris |
---|---|
Subject | Re: Escaping the ARC patent |
Date | |
Msg-id | 4202A7BF.5090505@tvi.edu |
In response to | Escaping the ARC patent (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Escaping the ARC patent (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
I'm familiar with the 2Q algorithm. I also remember seeing, I believe, a
public domain 2Q implementation floating around somewhere.

Tom Lane wrote:
>I've been doing a bit of research on $subj, and coming to the conclusion
>that the ARC patent is a lot narrower than it might appear. In fact
>most of the parts of the algorithm that we actually want have prior art.
>I looked in particular at Johnson and Shasha's well-known "2Q" paper,
>published in 1994 (http://citeseer.ist.psu.edu/63909.html). This paper
>describes the use of two lists, which they call A1 and Am (as opposed to
>ARC's T1 and T2) but the basic principle is the same: a page goes into
>A1 on first use, and doesn't get to Am unless used a second time before
>aging out of the cache. 2Q also includes a list of pages that have
>recently been in the cache but no longer are. So the actually
>patentable parts of ARC are just some rather narrow decisions about the
>management of these lists, in particular the use of a target T1len to
>dynamically adapt the sizes of the lists. The 2Q paper proposes using
>fixed fractions of the total available space for each list --- and it
>includes statistics showing that the algorithm isn't excessively
>sensitive to the exact values used, so ARC's claimed "self tuning"
>advantage isn't all that great after all.
>
>These conclusions are borne out by a close reading of the patent
>application (which is at
>http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220040098541%22.PGNR.&OS=DN/20040098541&RS=DN/20040098541
>if you want to look for yourself). Claim 1 reads
>
>1. A method for adaptively managing pages in a cache memory with a
>variable workload, comprising: maintaining the cache memory into a first
>list L1 and a second list L2; wherein the cache memory has a capacity to
>store c pages; and adaptively distributing the workload between the
>first list L1 and the second list L2, to a total capacity of c pages.
>
>Given the prior art, the critical word in this sentence is "adaptively";
>take that out and you have nothing that wasn't published long before.
>If we remove the adaptivity --- ie, just use a fixed division of list
>sizes --- we escape claim 1 and all the other claims that depend on it.
>
>The only other claim that isn't dependent on claim 1 or a restatement of
>it is
>
>45. A method for adaptively managing pages in a memory, comprising:
>defining a cache memory; defining a cache directory; organizing the
>cache directory into fours disjoint lists of pages: list T1, list T2,
>list B1, and list B2; and wherein the cache memory contains pages that
>are members of any of the list T1 or the list T2.
>
>So if we use non-variable sizes of T1/T2 and don't use the four-way
>list structure to manage remembrance of pages-formerly-in-cache,
>we escape the patent. But we still have scan resistance, which is the
>main thing that ARC was going to buy us. Pages that are scanned only
>once don't get out of A1 and so aren't able to swamp out pages
>referenced multiple times.
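
For concreteness, here is a toy sketch of the two-list layout described
above, using the fixed (non-adaptive) list sizes the 2Q paper proposes and
no ghost lists. This is invented illustration code, not anything from the
PostgreSQL tree; the names and size constants are all made up:

/*
 * Toy sketch of the two-list layout: fixed, non-adaptive sizes (so no
 * "adaptively distributing" as in claim 1) and no B1/B2 ghost lists
 * (so no four-way directory as in claim 45).  Invented code throughout.
 */
#include <stdio.h>
#include <string.h>

#define A1_MAX  4                 /* fixed fraction for first-touch pages */
#define AM_MAX 12                 /* fixed fraction for re-referenced pages */

static int a1[A1_MAX], a1_len;    /* FIFO of pages referenced once */
static int am[AM_MAX], am_len;    /* LRU of pages referenced more than once */

/* Linear search stands in for the real hashed buffer lookup. */
static int find(const int *list, int len, int id)
{
    for (int i = 0; i < len; i++)
        if (list[i] == id)
            return i;
    return -1;
}

/* Prepend id; if the fixed size is already reached, the tail page falls off. */
static void push_front(int *list, int *len, int max, int id)
{
    if (*len == max)
        (*len)--;
    memmove(&list[1], &list[0], (size_t) *len * sizeof(int));
    list[0] = id;
    (*len)++;
}

/* Remove the entry at position i, closing the gap. */
static void delete_at(int *list, int *len, int i)
{
    memmove(&list[i], &list[i + 1], (size_t) (*len - i - 1) * sizeof(int));
    (*len)--;
}

A real implementation would of course look pages up through a hash table
rather than scanning arrays; the arrays just keep the sketch short.
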
>After reading the 2Q paper my inclination is to use exactly Johnson and
>Shasha's "simplified 2Q" algorithm, which uses just A1 and Am with no
>remembrance of formerly cached pages. Their "full 2Q" algorithm strikes
>me as a tad bizarre because it will only promote a page into Am after it
>has fallen out of A1, ie, it takes two physical reads of the page to
>get into Am. That's just weird. I think that pages should advance from
>A1 into Am on second reference. Given that, you don't need any
>remembrance of pages that were formerly in A1, which basically halves
>the memory overhead of the ARC algorithm.
>
>An advantage of heading in this direction (as opposed to, say, LRU/k
>or other algorithms) is that this represents a direct simplification
>of the ARC code we have now. We can probably implement it almost
>entirely by deletions from freelist.c, with little newly written code.
>That gives me a whole lot more confidence that the result will be
>reliable enough to back-patch into 8.0.*.
>
>Comments?
>
> regards, tom lane
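
Continuing the toy sketch above, the access path for the proposed
promote-on-second-reference rule would be roughly the following (again
invented code, not freelist.c):

/*
 * Simplified-2Q access path: promote A1 -> Am on the SECOND reference,
 * while the page is still resident; nothing remembers evicted pages.
 * Reuses the lists and helpers from the previous sketch.
 */
static void access_page(int id)
{
    int i;

    if ((i = find(am, am_len, id)) >= 0)
    {
        delete_at(am, &am_len, i);              /* plain LRU bump within Am */
        push_front(am, &am_len, AM_MAX, id);
    }
    else if ((i = find(a1, a1_len, id)) >= 0)
    {
        delete_at(a1, &a1_len, i);              /* second reference: promote */
        push_front(am, &am_len, AM_MAX, id);
    }
    else
        push_front(a1, &a1_len, A1_MAX, id);    /* first touch lands in A1 */
}

int main(void)
{
    /* Touch pages 1..3 twice so they are promoted into Am. */
    for (int r = 0; r < 2; r++)
        for (int id = 1; id <= 3; id++)
            access_page(id);

    /* A one-pass scan of 100 cold pages, each touched exactly once. */
    for (int id = 100; id < 200; id++)
        access_page(id);

    printf("Am still holds:");                  /* prints: Am still holds: 3 2 1 */
    for (int i = 0; i < am_len; i++)
        printf(" %d", am[i]);
    printf("\n");
    return 0;
}

Running this, the one-pass scan only ever churns A1, and the
twice-referenced pages survive in Am -- which is exactly the scan
resistance the quoted message says ARC was supposed to buy us.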