Re: Escaping the ARC patent - Mailing list pgsql-hackers
From | Jonah H. Harris |
---|---|
Subject | Re: Escaping the ARC patent |
Date | |
Msg-id | 4202A7BF.5090505@tvi.edu |
In response to | Escaping the ARC patent (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Escaping the ARC patent (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
I'm familiar with the 2Q algorithm. I also remember seeing, I believe, a
public domain 2Q implementation floating around somewhere.

Tom Lane wrote:
>I've been doing a bit of research on $subj, and coming to the conclusion
>that the ARC patent is a lot narrower than it might appear. In fact
>most of the parts of the algorithm that we actually want have prior art.
>I looked in particular at Johnson and Shasha's well-known "2Q" paper,
>published in 1994 (http://citeseer.ist.psu.edu/63909.html). This paper
>describes the use of two lists, which they call A1 and Am (as opposed to
>ARC's T1 and T2) but the basic principle is the same: a page goes into
>A1 on first use, and doesn't get to Am unless used a second time before
>aging out of the cache. 2Q also includes a list of pages that have
>recently been in the cache but no longer are. So the actually
>patentable parts of ARC are just some rather narrow decisions about the
>management of these lists, in particular the use of a target T1len to
>dynamically adapt the sizes of the lists. The 2Q paper proposes using
>fixed fractions of the total available space for each list --- and it
>includes statistics showing that the algorithm isn't excessively
>sensitive to the exact values used, so ARC's claimed "self tuning"
>advantage isn't all that great after all.
>
>These conclusions are borne out by a close reading of the patent
>application (which is at
>http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220040098541%22.PGNR.&OS=DN/20040098541&RS=DN/20040098541
>if you want to look for yourself). Claim 1 reads
>
>1. A method for adaptively managing pages in a cache memory with a
>variable workload, comprising: maintaining the cache memory into a first
>list L1 and a second list L2; wherein the cache memory has a capacity to
>store c pages; and adaptively distributing the workload between the
>first list L1 and the second list L2, to a total capacity of c pages.
>
>Given the prior art, the critical word in this sentence is "adaptively";
>take that out and you have nothing that wasn't published long before.
>If we remove the adaptivity --- ie, just use a fixed division of list
>sizes --- we escape claim 1 and all the other claims that depend on it.
>
>The only other claim that isn't dependent on claim 1 or a restatement of
>it is
>
>45. A method for adaptively managing pages in a memory, comprising:
>defining a cache memory; defining a cache directory; organizing the
>cache directory into fours disjoint lists of pages: list T1, list T2,
>list B1, and list B2; and wherein the cache memory contains pages that
>are members of any of the list T1 or the list T2.
>
>So if we use non-variable sizes of T1/T2 and don't use the four-way
>list structure to manage remembrance of pages-formerly-in-cache,
>we escape the patent. But we still have scan resistance, which is the
>main thing that ARC was going to buy us. Pages that are scanned only
>once don't get out of A1 and so aren't able to swamp out pages
>referenced multiple times.
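
For concreteness, here is a toy sketch of the two-list layout described
above, using the fixed (non-adaptive) list sizes the 2Q paper proposes and
no ghost lists. This is invented illustration code, not anything from the
PostgreSQL tree; the names and size constants are all made up:

/*
 * Toy sketch of the two-list layout: fixed, non-adaptive sizes (so no
 * "adaptively distributing" as in claim 1) and no B1/B2 ghost lists
 * (so no four-way directory as in claim 45).  Invented code throughout.
 */
#include <stdio.h>
#include <string.h>

#define A1_MAX  4                 /* fixed fraction for first-touch pages */
#define AM_MAX 12                 /* fixed fraction for re-referenced pages */

static int a1[A1_MAX], a1_len;    /* FIFO of pages referenced once */
static int am[AM_MAX], am_len;    /* LRU of pages referenced more than once */

/* Linear search stands in for the real hashed buffer lookup. */
static int find(const int *list, int len, int id)
{
    for (int i = 0; i < len; i++)
        if (list[i] == id)
            return i;
    return -1;
}

/* Prepend id; if the fixed size is already reached, the tail page falls off. */
static void push_front(int *list, int *len, int max, int id)
{
    if (*len == max)
        (*len)--;
    memmove(&list[1], &list[0], (size_t) *len * sizeof(int));
    list[0] = id;
    (*len)++;
}

/* Remove the entry at position i, closing the gap. */
static void delete_at(int *list, int *len, int i)
{
    memmove(&list[i], &list[i + 1], (size_t) (*len - i - 1) * sizeof(int));
    (*len)--;
}

A real implementation would of course look pages up through a hash table
rather than scanning arrays; the arrays just keep the sketch short.
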
>After reading the 2Q paper my inclination is to use exactly Johnson and
>Shasha's "simplified 2Q" algorithm, which uses just A1 and Am with no
>remembrance of formerly cached pages. Their "full 2Q" algorithm strikes
>me as a tad bizarre because it will only promote a page into Am after it
>has fallen out of A1, ie, it takes two physical reads of the page to
>get into Am. That's just weird. I think that pages should advance from
>A1 into Am on second reference. Given that, you don't need any
>remembrance of pages that were formerly in A1, which basically halves
>the memory overhead of the ARC algorithm.
>
>An advantage of heading in this direction (as opposed to, say, LRU/k
>or other algorithms) is that this represents a direct simplification
>of the ARC code we have now. We can probably implement it almost
>entirely by deletions from freelist.c, with little newly written code.
>That gives me a whole lot more confidence that the result will be
>reliable enough to back-patch into 8.0.*.
>
>Comments?
>
> regards, tom lane
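
Continuing the toy sketch above, the access path for the proposed
promote-on-second-reference rule would be roughly the following (again
invented code, not freelist.c):

/*
 * Simplified-2Q access path: promote A1 -> Am on the SECOND reference,
 * while the page is still resident; nothing remembers evicted pages.
 * Reuses the lists and helpers from the previous sketch.
 */
static void access_page(int id)
{
    int i;

    if ((i = find(am, am_len, id)) >= 0)
    {
        delete_at(am, &am_len, i);              /* plain LRU bump within Am */
        push_front(am, &am_len, AM_MAX, id);
    }
    else if ((i = find(a1, a1_len, id)) >= 0)
    {
        delete_at(a1, &a1_len, i);              /* second reference: promote */
        push_front(am, &am_len, AM_MAX, id);
    }
    else
        push_front(a1, &a1_len, A1_MAX, id);    /* first touch lands in A1 */
}

int main(void)
{
    /* Touch pages 1..3 twice so they are promoted into Am. */
    for (int r = 0; r < 2; r++)
        for (int id = 1; id <= 3; id++)
            access_page(id);

    /* A one-pass scan of 100 cold pages, each touched exactly once. */
    for (int id = 100; id < 200; id++)
        access_page(id);

    printf("Am still holds:");                  /* prints: Am still holds: 3 2 1 */
    for (int i = 0; i < am_len; i++)
        printf(" %d", am[i]);
    printf("\n");
    return 0;
}

Running this, the one-pass scan only ever churns A1, and the
twice-referenced pages survive in Am -- which is exactly the scan
resistance the quoted message says ARC was supposed to buy us.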