Thread: Adding new dependencies for in-core

Adding new dependencies for in-core

From

Vladimir Sitnikov

Date:

30 June 2015, 12:21:24

Hi,

I've had a quick look into https://github.com/pgjdbc/pgjdbc/issues/345
(Share parsed query texts across different connections) and it looks
like https://github.com/ben-manes/concurrentlinkedhashmap would be
useful to implement the cache.

We don't need everything CLHM has, but I see no easy and scalable way
of building a concurrent map with eviction using only
java.util.concurrent.

Well, I can do lock splitting (i.e. use multiple small LRU caches
instead of a single big cache), however that looks like reinventing
the wheel.
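
A minimal sketch of the lock-splitting idea mentioned above, assuming plain JDK classes only. The class name StripedLruCache and its parameters are hypothetical and are not part of pgjdbc; each stripe is an access-ordered LinkedHashMap guarded by its own monitor, so threads hitting different stripes do not contend.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of lock splitting; not pgjdbc code.
final class StripedLruCache<K, V> {
  private final Map<K, V>[] stripes;

  @SuppressWarnings("unchecked")
  StripedLruCache(int stripeCount, final int maxEntriesPerStripe) {
    stripes = new Map[stripeCount];
    for (int i = 0; i < stripeCount; i++) {
      // accessOrder=true keeps the least recently accessed entry first.
      stripes[i] = new LinkedHashMap<K, V>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
          return size() > maxEntriesPerStripe;
        }
      };
    }
  }

  // Picks the stripe for a key; the same key always maps to the same stripe.
  private Map<K, V> stripeFor(Object key) {
    int h = key.hashCode();
    h ^= (h >>> 16);
    return stripes[(h & 0x7fffffff) % stripes.length];
  }

  V get(K key) {
    Map<K, V> stripe = stripeFor(key);
    synchronized (stripe) { // a read reorders the stripe, so it needs the lock too
      return stripe.get(key);
    }
  }

  V put(K key, V value) {
    Map<K, V> stripe = stripeFor(key);
    synchronized (stripe) {
      return stripe.put(key, value);
    }
  }
}
```

Note that eviction is LRU only within a stripe, so the overall policy is an approximation of a single global LRU.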

Can you suggest the best way of adding the CLHM dependency?
Should it be added as a plain dependency?
Should it be shaded? (i.e. renamed to org.postgresql.clhm...)

I'm more inclined to the "add regular dependency" approach.

I expect a similar question might come up if we consider using netty for IO.

--
Regards,
Vladimir Sitnikov

Re: Adding new dependencies for in-core

From

Dave Cramer

Date:

30 June 2015, 12:33:08

Hi Vladimir,

There are a large number of people still downloading jars, so simply using a Maven dependency doesn't work.

It would have to be shaded IMO.

This is why I have avoided dependencies in the past.

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca

--
Sent via pgsql-jdbc mailing list (pgsql-jdbc@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-jdbc

Re: Adding new dependencies for in-core

From

Vladimir Sitnikov

Date:

30 June 2015, 15:47:48

>There are a large number of people still downloading jars, so simply using maven dependency doesn't work.

We can add a list of URLs like
http://search.maven.org/remotecontent?filepath=org/postgresql/postgresql/9.4-1201-jdbc41/postgresql-9.4-1201-jdbc41.jar

> large number of people still downloading jars

You know, even the non-mavenized pgjdbc used Maven to fetch dependencies :)

>It would have to be shaded IMO.

We can't shade forever, can we?

Vladimir

Re: Adding new dependencies for in-core

From

"Markus KARG"

Date:

30 June 2015, 17:06:53

Vladimir,

sorry for being so ignorant as not to give it deeper thought, but what is in CLHM that is not also in
java.util.concurrent.ConcurrentHashMap?

-Markus


Re: Adding new dependencies for in-core

From

Steven Schlansker

Date:

30 June 2015, 17:09:30

It is a Concurrent *Linked* HashMap, which presumably facilitates LRU removal.


Re: Adding new dependencies for in-core

From

"Markus KARG"

Date:

30 June 2015, 17:56:56

I see. Just to throw in a different idea: possibly we might need caches in
other places too at a later time, so maybe we might like to apply the strategy
design pattern instead of becoming directly dependent on one particular
class? I mean, it is nice that the linked hash map actually implies LRU, but
if we want LRU, it might be a better design choice to explicitly model this
as a class "LruCacheStrategy", which allows us to even configure or replace
the strategy later?
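
A rough sketch of what such a strategy could look like, under the assumption that a simple get/put contract is enough. Only the name LruCacheStrategy comes from the message above; the CacheStrategy interface, NoCacheStrategy, and all method signatures are hypothetical and do not exist in pgjdbc.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical contract that callers inside the driver would depend on.
interface CacheStrategy<K, V> {
  V get(K key);

  void put(K key, V value);
}

// LRU strategy backed by an access-ordered LinkedHashMap (coarse locking).
final class LruCacheStrategy<K, V> implements CacheStrategy<K, V> {
  private final Map<K, V> entries;

  LruCacheStrategy(final int maxEntries) {
    entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
      }
    };
  }

  @Override
  public synchronized V get(K key) {
    return entries.get(key);
  }

  @Override
  public synchronized void put(K key, V value) {
    entries.put(key, value);
  }
}

// Alternative strategy that caches nothing, to show the swap is local.
final class NoCacheStrategy<K, V> implements CacheStrategy<K, V> {
  @Override
  public V get(K key) {
    return null;
  }

  @Override
  public void put(K key, V value) {
    // intentionally empty
  }
}
```

With this shape, replacing the policy means constructing a different CacheStrategy in one place, while the code that looks up parsed queries stays unchanged.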


Re: Adding new dependencies for in-core

From

Vladimir Sitnikov

Date:

30 June 2015, 18:08:33

There has to be some implementation behind that "strategy" anyway.
That is the interesting part, since I do not want to reinvent CLHM.

BTW, CLHM implements java.util.concurrent.ConcurrentMap, so I think it
is a safe interface for a future replacement (e.g. with
https://github.com/ben-manes/caffeine when we have jdk8-minimum
builds).

PS. sorry for the strange reply with empty subject.
Vladimir

Re: Adding new dependencies for in-core

From

Steven Schlansker

Date:

30 June 2015, 18:09:42

I'm having trouble believing any end user will replace the caching strategy for prepared statements.  Maybe one or two
Gentoo refugees will, and they will probably misconfigure it horribly anyway ;)


Re: Adding new dependencies for in-core

From

Vladimir Sitnikov

Date:

30 June 2015, 18:14:31

>I'm having trouble believing any end user will replace the caching strategy for prepared statements.

My wild guess is that Markus is over-engineering the _internal_
implementation of the driver a bit.
I think he is hinting at coding best practices, so that the driver does
not get tied to a single cache implementation.
Markus, did I get you right?

Vladimir

Re: Adding new dependencies for in-core

From

Steven Schlansker

Date:

30 June 2015, 18:22:28

On Jun 30, 2015, at 11:14 AM, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:

>> I'm having trouble believing any end user will replace the caching strategy for prepared statements.
>
> My wild guess is Markus a bit over-engineers _internal_ implementation
> of the driver.
> I think he hints best practices of coding so the driver does not get
> tied to the single cache implementation.
> Markus, did I get you right?

I am not opposed to hiding this behind an interface, but unless we expose this cache to the end user (which IMO we
should not, barring a compelling reason) it is not hard to replace it internally even if it breaks the not-public-API.
So the interface just adds complexity for no real gain unless we actually expect to switch it out at runtime.

Additionally, the ConcurrentMap interface doesn't really have any "removeLRU" sorts of functionality, so it's not clear
that it is the proper interface anyway.

Final note, is a ConcurrentLinkedHashMap actually the data structure we should be using?  The "linked" part enforces
that the removal strategy is not LRU, it is actually FIFO.  So an incoming query may evict the oldest entry, which could
very well be the most used entry.  Maybe this is not a problem in practice but I thought I'd point it out.

Re: Adding new dependencies for in-core

From

Vladimir Sitnikov

Date:

30 June 2015, 18:31:06

> it is not hard to replace it internally even if it breaks the not-public-API

I fully agree here. Sorry that we moved off-topic.

>Final note, is a ConcurrentLinkedHashMap actually the data structure we should be using?

I am not yet sure which API of CLHM would be used. I just assume CLHM
allows some "smart eviction".

If it turns out CLHM is a no-no in pgjdbc, I might have to resort to an
org.postgresql.util.LruCache[128] kind of data structure.
I do not like that very much (even though reusing my own code would make
me a bit happy), so I want feedback on the policy for using
dependencies.

Vladimir

Re: Adding new dependencies for in-core

From

"Markus KARG"

Date:

30 June 2015, 20:21:06

I also do not think an END USER will do that, but WE (i. e. pgjdbc
developers) might one day like to experiment with other caching strategies.
:-)



Re: Adding new dependencies for in-core

From

"Markus KARG"

Date:

30 June 2015, 20:22:20

I won't say it is over-engineering: I think it is always a good idea to *explicitly* pick a strategy instead of taking
one *for granted* just because it comes with the box. For example, at the JCP we discuss the intensive use of the Java
Cache API for various endeavors exactly like this one. :-)


Re: Adding new dependencies for in-core

From

Ben Manes

Date:

02 July 2015, 22:58:49

Steven Schlansker <stevenschlansker <at> gmail.com> writes:

> On Jun 30, 2015, at 11:14 AM, Vladimir Sitnikov wrote:
>
> Final note, is a ConcurrentLinkedHashMap actually the data structure we
> should be using?  The "linked" part enforces that the removal strategy is
> not LRU, it is actually FIFO.  So an incoming query may evict the oldest
> entry, which could very well be the most used entry.  Maybe this is not a
> problem in practice but I thought I'd point it out.

CLHM evicts by LRU. Java's LinkedHashMap may be configured in either insertion
order (FIFO) or access order (LRU). The concurrent version provides only the
access order as it is intended to be used as a cache. So the eviction is
smart, as much as an LRU can be.
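
To illustrate the two orderings of the JDK's java.util.LinkedHashMap described above, here is a small demo. The class and method names are made up for this example; only the three-argument LinkedHashMap constructor is JDK API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; shows iteration order after one read of "a".
final class LinkedHashMapOrderDemo {
  static List<String> keysAfterReadingA(boolean accessOrder) {
    Map<String, Integer> map = new LinkedHashMap<String, Integer>(16, 0.75f, accessOrder);
    map.put("a", 1);
    map.put("b", 2);
    map.put("c", 3);
    map.get("a"); // moves "a" to the end only when accessOrder is true
    return new ArrayList<String>(map.keySet());
  }

  public static void main(String[] args) {
    System.out.println(keysAfterReadingA(false)); // [a, b, c]  insertion order (FIFO)
    System.out.println(keysAfterReadingA(true));  // [b, c, a]  access order (LRU)
  }
}
```

The first key in iteration order is the eviction candidate, so in access order a recently read entry survives while in insertion order it does not.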

CLHM is really easy to shade (as many do), fork (as some do), and is pretty
tiny. An LRU policy is surprisingly tricky to implement fully concurrently,
because every read is in fact a write. This is solved by borrowing an idea
from Postgres - the write ahead log. The reads are recorded cheaply and
replayed in batches so that reads and writes may operate concurrently.
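
The record-and-replay idea can be sketched roughly as follows. This is a simplified illustration written for this thread, not CLHM's actual implementation: the class BufferedLruCache and all of its members are hypothetical, and unlike a real cache the read log here is unbounded and is only drained on writes.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: reads are logged without locking, then replayed in a batch.
final class BufferedLruCache<K, V> {
  private final ConcurrentHashMap<K, V> data = new ConcurrentHashMap<K, V>();
  private final ConcurrentLinkedQueue<K> readLog = new ConcurrentLinkedQueue<K>();
  // Guarded by evictionLock; iteration order is least recently used first.
  private final LinkedHashSet<K> lruOrder = new LinkedHashSet<K>();
  private final ReentrantLock evictionLock = new ReentrantLock();
  private final int maxEntries;

  BufferedLruCache(int maxEntries) {
    this.maxEntries = maxEntries;
  }

  V get(K key) {
    V value = data.get(key);
    if (value != null) {
      readLog.add(key); // cheap record; the LRU order is not touched here
    }
    return value;
  }

  void put(K key, V value) {
    data.put(key, value);
    evictionLock.lock();
    try {
      replayReads();
      touch(key);
      evictOverflow();
    } finally {
      evictionLock.unlock();
    }
  }

  // Applies the logged reads to the LRU order in one batch.
  private void replayReads() {
    K key;
    while ((key = readLog.poll()) != null) {
      if (data.containsKey(key)) { // skip entries evicted since the read
        touch(key);
      }
    }
  }

  // Marks the key as most recently used.
  private void touch(K key) {
    lruOrder.remove(key);
    lruOrder.add(key);
  }

  private void evictOverflow() {
    Iterator<K> it = lruOrder.iterator();
    while (lruOrder.size() > maxEntries && it.hasNext()) {
      K eldest = it.next();
      it.remove();
      data.remove(eldest);
    }
  }
}
```

The point of the design is that get() never takes the lock that protects the ordering, so readers do not serialize on each other; the cost is that the recency order lags behind until the next replay.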

The successor project, Caffeine, includes a design document that describes a
similar approach, with some additional optimizations. Also see the benchmarks.
That cache is heavier (JAR size) due to providing a richer feature set.

Cheers,
Ben (author)

https://github.com/ben-manes/caffeine/wiki/Design
https://github.com/ben-manes/caffeine/wiki/Benchmarks