Thread: Adding new dependencies for in-core
Hi,
I've had a quick look into https://github.com/pgjdbc/pgjdbc/issues/345 (Share parsed query texts across different connections), and it looks like https://github.com/ben-manes/concurrentlinkedhashmap would be useful to implement the cache.
We don't need everything CLHM has, but I see no easy and scalable way of building a concurrent map with eviction from plain java.util.concurrent classes.
Well, I can do lock splitting (i.e. use multiple small LRU caches instead of a single big cache), however that looks like reinventing the wheel.
Can you suggest the best way of adding the CLHM dependency?
Should it be added as a plain dependency?
Should it be shaded? (i.e. renamed to org.postgresql.clhm...)
I'm more inclined to the "add regular dependency" approach.
I expect a similar question might appear if we consider using netty for IO.
--
Regards,
Vladimir Sitnikov
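
The lock-splitting alternative mentioned above could look roughly like the following sketch. It is only an illustration, not pgjdbc code: the class name `StripedLruCache` and its constructor parameters are hypothetical. Each stripe is a small `java.util.LinkedHashMap` in access order guarded by its own lock, so threads that hit different stripes do not contend. Eviction is LRU per stripe, which only approximates a global LRU.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: several small synchronized LRU maps instead of one big one. */
class StripedLruCache<K, V> {
    private final Map<K, V>[] stripes;

    @SuppressWarnings("unchecked")
    StripedLruCache(int stripeCount, final int capacityPerStripe) {
        stripes = new Map[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            // accessOrder=true: the eldest entry is the least recently used one
            stripes[i] = new LinkedHashMap<K, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    return size() > capacityPerStripe;
                }
            };
        }
    }

    /** Picks the stripe for a key from its (spread) hash code. */
    private Map<K, V> stripeFor(K key) {
        int h = key.hashCode();
        h ^= (h >>> 16);
        return stripes[(h & 0x7fffffff) % stripes.length];
    }

    V get(K key) {
        Map<K, V> stripe = stripeFor(key);
        // get() reorders entries in access-order mode, so it needs the lock too
        synchronized (stripe) {
            return stripe.get(key);
        }
    }

    V put(K key, V value) {
        Map<K, V> stripe = stripeFor(key);
        synchronized (stripe) {
            return stripe.put(key, value);
        }
    }

    int size() {
        int total = 0;
        for (Map<K, V> stripe : stripes) {
            synchronized (stripe) {
                total += stripe.size();
            }
        }
        return total;
    }
}
```

With a single stripe this degenerates into one globally locked LRU cache; more stripes reduce contention at the cost of a less precise eviction order.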
Hi Vladimir,
There are a large number of people still downloading jars, so simply using a Maven dependency doesn't work.
It would have to be shaded IMO.
This is why I have avoided dependencies in the past.
On 30 June 2015 at 08:21, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
> Can you suggest the best way of adding the CLHM dependency?
> Should it be added as a plain dependency?
> Should it be shaded? (i.e. renamed to org.postgresql.clhm...)
> There are a large number of people still downloading jars, so simply using maven dependency doesn't work.
We can add a list of URLs like http://search.maven.org/remotecontent?filepath=org/postgresql/postgresql/9.4-1201-jdbc41/postgresql-9.4-1201-jdbc41.jar
> large number of people still downloading jars
You know, even the non-mavenized pgjdbc used maven to fetch dependencies :)
> It would have to be shaded IMO.
We can't shade forever, can we?
Vladimir
Vladimir,
sorry for being so ignorant as not to give it deeper thought, but what is in CLHM that is not also in java.util.concurrent.ConcurrentHashMap?
-Markus
-----Original Message-----
From: pgsql-jdbc-owner@postgresql.org [mailto:pgsql-jdbc-owner@postgresql.org] On Behalf Of Vladimir Sitnikov
Sent: Tuesday, 30 June 2015 14:21
To: List
Subject: [JDBC] Adding new dependencies for in-core
> [...] We don't need everything CLHM has, but I see no easy and scalable way of building a concurrent map with eviction from plain java.util.concurrent classes. [...]
It is a Concurrent *Linked* HashMap, which presumably facilitates LRU removal.
On Jun 30, 2015, at 10:06 AM, Markus KARG <markus@headcrashing.eu> wrote:
> [...] what is in CLHM that is not also in java.util.concurrent.ConcurrentHashMap?
I see. Just to throw in a different idea: we might need caches in other places too at a later time, so maybe we should apply the strategy design pattern instead of depending directly on one particular class? I mean, it is nice that the linked hash map actually implies LRU, but if we want LRU, it might be a better design choice to model this explicitly as a class "LruCacheStrategy", which would even allow us to configure or replace the strategy later.
-----Original Message-----
From: Steven Schlansker [mailto:stevenschlansker@gmail.com]
Sent: Tuesday, 30 June 2015 19:09
To: Markus KARG
Cc: List
Subject: Re: [JDBC] Adding new dependencies for in-core
> It is a Concurrent *Linked* HashMap, which presumably facilitates LRU removal.
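
The strategy idea above can be illustrated with a small sketch. Everything here is hypothetical: `QueryCache`, `NoCacheStrategy` and `ParsedQueryLookup` are names invented for the example, `LruCacheStrategy` borrows the name from the message but is not an existing pgjdbc class, and `String.trim()` merely stands in for real query parsing. The point is only that the calling code depends on a small interface, so the eviction policy behind it can be swapped.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical internal abstraction over a cache implementation. */
interface QueryCache<K, V> {
    V get(K key);
    void put(K key, V value);
}

/** LRU strategy backed by a synchronized java.util.LinkedHashMap in access order. */
final class LruCacheStrategy<K, V> implements QueryCache<K, V> {
    private final Map<K, V> map;

    LruCacheStrategy(final int maxEntries) {
        this.map = Collections.synchronizedMap(new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        });
    }

    @Override
    public V get(K key) {
        return map.get(key);
    }

    @Override
    public void put(K key, V value) {
        map.put(key, value);
    }
}

/** Alternative strategy that caches nothing, to show that strategies are interchangeable. */
final class NoCacheStrategy<K, V> implements QueryCache<K, V> {
    @Override
    public V get(K key) {
        return null;
    }

    @Override
    public void put(K key, V value) {
        // intentionally empty
    }
}

/** Hypothetical caller that only knows the QueryCache interface. */
final class ParsedQueryLookup {
    private final QueryCache<String, String> cache;

    ParsedQueryLookup(QueryCache<String, String> cache) {
        this.cache = cache;
    }

    String parse(String sql) {
        String parsed = cache.get(sql);
        if (parsed == null) {
            parsed = sql.trim(); // stand-in for real query parsing
            cache.put(sql, parsed);
        }
        return parsed;
    }
}
```

Switching the policy is then a one-line change at construction time, for example `new ParsedQueryLookup(new LruCacheStrategy<String, String>(256))`.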
There has to be some implementation behind that "strategy" anyway. That is the interesting part, since I do not want to reinvent the CLHM stuff.
BTW, CLHM implements java.util.concurrent.ConcurrentMap, so I think it is a safe interface for replacement in the future (e.g. with https://github.com/ben-manes/caffeine when we have jdk8-minimum builds).
PS. Sorry for the strange reply with empty subject.
Vladimir
I'm having trouble believing any end user will replace the caching strategy for prepared statements. Maybe one or two Gentoo refugees will, and they will probably misconfigure it horribly anyway ;)
On Jun 30, 2015, at 10:56 AM, Markus KARG <markus@headcrashing.eu> wrote:
> [...] if we want LRU, it might be a better design choice to model this explicitly as a class "LruCacheStrategy", which would even allow us to configure or replace the strategy later.
> I'm having trouble believing any end user will replace the caching strategy for prepared statements.
My wild guess is that Markus is over-engineering the _internal_ implementation of the driver a bit.
I think he is hinting at coding best practices, so that the driver does not get tied to a single cache implementation.
Markus, did I get you right?
Vladimir
On Jun 30, 2015, at 11:14 AM, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
>> I'm having trouble believing any end user will replace the caching strategy for prepared statements.
>
> My wild guess is that Markus is over-engineering the _internal_ implementation of the driver a bit.
> I think he is hinting at coding best practices, so that the driver does not get tied to a single cache implementation.
> Markus, did I get you right?
I am not opposed to hiding this behind an interface, but unless we expose this cache to the end user (which IMO we should not, barring a compelling reason) it is not hard to replace it internally even if it breaks the not-public-API. So the interface just adds complexity for no real gain unless we actually expect to switch it out at runtime.
Additionally, the ConcurrentMap interface doesn't really have any "removeLRU" sort of functionality, so it's not clear that it is the proper interface anyway.
Final note, is a ConcurrentLinkedHashMap actually the data structure we should be using? The "linked" part enforces that the removal strategy is not LRU, it is actually FIFO. So an incoming query may evict the oldest entry, which could very well be the most used entry. Maybe this is not a problem in practice but I thought I'd point it out.
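
Whether a linked hash map evicts in FIFO or LRU order depends on how it is configured. The sketch below shows this for the JDK's own `java.util.LinkedHashMap` only; it says nothing about what CLHM does internally. The class and method names (`LinkedHashMapOrderDemo`, `survivors`) are made up for the example. A map bounded to two entries is filled with "a" and "b", "a" is read once, and then "c" is inserted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LinkedHashMapOrderDemo {
    /** Returns the keys that survive in a 2-entry bounded map after the access pattern below. */
    static List<String> survivors(boolean accessOrder) {
        Map<String, Integer> map = new LinkedHashMap<String, Integer>(16, 0.75f, accessOrder) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > 2; // evict the eldest entry once the bound is exceeded
            }
        };
        map.put("a", 1);
        map.put("b", 2);
        map.get("a");    // only matters when accessOrder is true
        map.put("c", 3); // triggers one eviction
        return new ArrayList<String>(map.keySet());
    }

    public static void main(String[] args) {
        // Insertion order behaves like FIFO: "a" is evicted although it was just read.
        System.out.println("insertion order: " + survivors(false)); // [b, c]
        // Access order behaves like LRU: "b" is evicted because "a" was read more recently.
        System.out.println("access order:    " + survivors(true));  // [a, c]
    }
}
```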
> it is not hard to replace it internally even if it breaks the not-public-API
I fully agree here. Sorry that we moved off-topic.
> Final note, is a ConcurrentLinkedHashMap actually the data structure we should be using?
I am not yet sure which API of CLHM would be used. I just assume CLHM allows some "smart eviction".
If it turns out CLHM is a no-no in pgjdbc, I might have to resort to an org.postgresql.util.LruCache[128] kind of data structure. I do not like that very much (even though reusing my own code would make me a bit happy), so I want feedback on the policy for using dependencies.
Vladimir
I also do not think an END USER will do that, but WE (i.e. pgjdbc developers) might one day like to experiment with other caching strategies. :-)
-----Original Message-----
From: Steven Schlansker [mailto:stevenschlansker@gmail.com]
Sent: Tuesday, 30 June 2015 20:10
To: Markus KARG
Cc: List
Subject: Re: [JDBC] Adding new dependencies for in-core
> I'm having trouble believing any end user will replace the caching strategy for prepared statements. Maybe one or two Gentoo refugees will, and they will probably misconfigure it horribly anyway ;)
I wouldn't call it over-engineering: I think it is always a good idea to *explicitly* pick a strategy instead of taking one *for granted* just because it comes with the box. For example, at the JCP we discuss the intensive use of the Java Cache API for various endeavours exactly like this one. :-)
-----Original Message-----
From: Vladimir Sitnikov [mailto:sitnikov.vladimir@gmail.com]
Sent: Tuesday, 30 June 2015 20:14
To: Steven Schlansker; Markus KARG
Cc: List
Subject: Re: [JDBC] Adding new dependencies for in-core
> My wild guess is that Markus is over-engineering the _internal_ implementation of the driver a bit. I think he is hinting at coding best practices, so that the driver does not get tied to a single cache implementation. Markus, did I get you right?
Steven Schlansker <stevenschlansker <at> gmail.com> writes:
> On Jun 30, 2015, at 11:14 AM, Vladimir Sitnikov wrote:
>
> Final note, is a ConcurrentLinkedHashMap actually the data structure we should be using? The "linked" part enforces that the removal strategy is not LRU, it is actually FIFO. So an incoming query may evict the oldest entry, which could very well be the most used entry. Maybe this is not a problem in practice but I thought I'd point it out.
CLHM evicts by LRU. Java's LinkedHashMap may be configured in either insertion order (FIFO) or access order (LRU). The concurrent version provides only the access order as it is intended to be used as a cache. So the eviction is smart, as much as an LRU can be.
CLHM is really easy to shade (as many do), fork (as some do), and is pretty tiny.
An LRU policy is surprisingly tricky to implement fully concurrently, because every read is in fact a write. This is solved by borrowing an idea from Postgres - the write ahead log. The reads are recorded cheaply and replayed in batches so that reads and writes may operate concurrently.
The successor project, Caffeine, includes a design document that describes a similar approach, with some additional optimizations. Also see the benchmarks. That cache is heavier (JAR size) due to providing a richer feature set.
Cheers,
Ben (author)
https://github.com/ben-manes/caffeine/wiki/Design
https://github.com/ben-manes/caffeine/wiki/Benchmarks
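
The "record reads cheaply, replay them in batches" idea described above can be sketched with JDK classes alone. This is a deliberately simplified illustration and not the actual code of CLHM or Caffeine: the class name `BufferedLruCache`, the `DRAIN_THRESHOLD` constant and the use of a `ConcurrentLinkedQueue` as the read buffer are all assumptions made for the example, and unlike the real libraries it serializes every write on a single lock.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: reads are logged to a buffer and applied to the LRU order in batches. */
class BufferedLruCache<K, V> {
    private static final int DRAIN_THRESHOLD = 16; // assumed batch size for the example

    private final int capacity;
    private final ConcurrentHashMap<K, V> data = new ConcurrentHashMap<K, V>();
    private final Queue<K> readBuffer = new ConcurrentLinkedQueue<K>();
    private final AtomicInteger pendingReads = new AtomicInteger(); // approximate counter
    // Access-ordered map used only as the eviction policy; guarded by policyLock.
    private final LinkedHashMap<K, Boolean> lruOrder = new LinkedHashMap<K, Boolean>(16, 0.75f, true);
    private final ReentrantLock policyLock = new ReentrantLock();

    BufferedLruCache(int capacity) {
        this.capacity = capacity;
    }

    V get(K key) {
        V value = data.get(key); // lock-free lookup
        if (value != null) {
            readBuffer.add(key); // record the read instead of reordering right away
            if (pendingReads.incrementAndGet() >= DRAIN_THRESHOLD && policyLock.tryLock()) {
                try {
                    drainReads();
                } finally {
                    policyLock.unlock();
                }
            }
        }
        return value;
    }

    void put(K key, V value) {
        policyLock.lock();
        try {
            drainReads(); // replay earlier reads before deciding what to evict
            data.put(key, value);
            lruOrder.put(key, Boolean.TRUE);
            evictIfNeeded();
        } finally {
            policyLock.unlock();
        }
    }

    int size() {
        return data.size();
    }

    /** Replays buffered reads against the LRU order; caller must hold policyLock. */
    private void drainReads() {
        K key;
        while ((key = readBuffer.poll()) != null) {
            pendingReads.decrementAndGet();
            lruOrder.get(key); // moves the key to most-recently-used if still present
        }
    }

    /** Removes least recently used entries until the bound holds; caller must hold policyLock. */
    private void evictIfNeeded() {
        Iterator<K> it = lruOrder.keySet().iterator();
        while (lruOrder.size() > capacity && it.hasNext()) {
            K eldest = it.next();
            it.remove();
            data.remove(eldest);
        }
    }
}
```

Readers never block on the policy lock: they either win `tryLock()` and drain the buffer, or leave the buffered reads for a later reader or the next writer to replay.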