Java soft references: usage consequences on a memory-based architecture (data grid/caching, cloud)

My previous article on Java references introduced the Java reference model and the useful features it offers. One of them, the soft reference, is, as I explained, very useful for caches and buffers in a VM.

In this article I will discuss the particular effect that using soft references can have on a Java cloud or data grid. We will focus on GigaSpaces, as it is currently the most active end-to-end cloud-computing application server built in Java.

When developers use soft references, they let the JVM take control of part of the heap and manage it as it sees fit. The developer then has no real control over that memory. Although garbage collection cycles are controlled by the JVM, the developer can still request a GC (unless -XX:+DisableExplicitGC is set, which we do not consider good practice). As we all know, an explicit GC call is no guarantee of a real full GC cycle, but it certainly helps reduce heap usage in various scenarios. However, it gives the developer no control over soft references: when softly referenced objects are cleared remains the JVM's decision.
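To make this concrete, here is a minimal sketch of a soft-reference cache. The class name SoftCache and its methods are illustrative assumptions, not part of any product discussed here. A value is guaranteed to stay available only while something else still holds a strong reference to it; once it is only softly reachable, the JVM alone decides when to clear it.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration: a tiny cache whose values are held through soft references.
final class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new ConcurrentHashMap<>();

    void put(K key, V value) {
        entries.put(key, new SoftReference<>(value));
    }

    // Returns the cached value, or null if it was never stored or the JVM has cleared it.
    V get(K key) {
        SoftReference<V> ref = entries.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            // The referent was reclaimed by the garbage collector: drop the stale entry.
            entries.remove(key, ref);
        }
        return value;
    }
}
```

Callers must always be ready for get() to return null, and nothing in this class (or in a call to System.gc()) lets them decide when that happens.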

So “why can this be bad?” After all, the goal of a soft reference is to act as a cache: it uses the heap while still guaranteeing the safety of the JVM (no risk of OutOfMemoryError, since the JVM clears all softly reachable objects before throwing one). If there is no direct impact on most applications, we can ask what impact soft references have on heap monitoring. The objects held by soft references are indeed there, but they do not consume heap in a “risky way”. They cannot be held responsible for a memory leak, or even counted as “real usage”.

However, heap monitoring may lead people to believe that the heap is genuinely in use and that their application is at risk. People can be educated and can find out that the memory is held by soft references, but no statistic in the JVM clearly shows it.

The problem is even more relevant when an application uses self-monitoring to take actions when a given SLA is breached. This is what will, and should, happen in many cases on a Java cloud. An SLA can thus be triggered without any good reason.
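A minimal sketch of such a self-monitoring check, assuming the SLA is a simple used/max heap ratio. The class HeapSlaCheck and the threshold value are illustrative assumptions, not a GigaSpaces API; only the standard java.lang.management calls are real.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical self-monitoring check; the class name and threshold are illustrative only.
final class HeapSlaCheck {

    // True when used/max exceeds the threshold (a ratio between 0 and 1).
    static boolean isBreached(long usedBytes, long maxBytes, double threshold) {
        if (maxBytes <= 0) {
            return false; // max heap is undefined: nothing to compare against
        }
        return (double) usedBytes / maxBytes > threshold;
    }

    static boolean currentHeapBreaches(double threshold) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getUsed() makes no distinction between strongly and softly reachable objects,
        // so memory the JVM could reclaim on demand still counts against the SLA.
        return isBreached(heap.getUsed(), heap.getMax(), threshold);
    }
}
```

A call such as HeapSlaCheck.currentHeapBreaches(0.8) can therefore report a breach even though most of the used heap is held only through soft references.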

This is why soft references must be used with caution in those environments.

 

If we look at the example of GigaSpaces:

-As a cloud provider, GigaSpaces helps you monitor and define SLAs based on heap and memory usage. As we have seen, this can be really dangerous (depending on your SLA action) if many soft references are used. We can run into scenarios where part of your application will not be relocated to a VM because of these soft references. I really want to emphasize that such scenarios mostly come from bad usage of soft references in application code, or sometimes from really extreme situations that I will detail later.

-As a data-grid vendor, GigaSpaces has a very nice memory protection feature that allows Spaces to protect themselves from running out of memory by cancelling write operations performed on them. These protections are based on memory SLAs (see http://www.gigaspaces.com/wiki/display/XAP66/Memory+Management+Facility#MemoryManagementFacility-MemoryUsage) and are affected in the same way when memory is consumed by soft references.

Again, the scenarios that lead to such situations are rare and quite extreme. But let us detail one of them that can happen with GigaSpaces.

The GigaSpaces software uses soft references in its communication protocol, LRMI, to manage buffers. In most situations this never causes high heap consumption. However, when many clients connect to the space and concurrently request big objects or many small objects, we may run into a situation where many buffers are created and consume memory. This typically happens in grid-computing scenarios, when the whole grid (we mostly encounter DataSynapse or Platform systems) tries to connect to a single data-grid node and send it requests.

The question that comes next is: “so what can I do in such a case?”

The truth is that there is no easy answer to this question. The -XX:SoftRefLRUPolicyMSPerMB parameter helps tune how soft references are collected in the Sun JVM. However, this parameter is not part of the Java specification, and it is not necessarily available in other JVMs.

I therefore recommend using a Sun JVM in such use cases, to be able to perform the tuning that the application requires.
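To give an idea of what this tuning controls, here is a small worked sketch. It assumes the commonly documented HotSpot behaviour: a softly reachable object is kept while the time since its last access stays below the free heap in MB multiplied by SoftRefLRUPolicyMSPerMB (default 1000, i.e. one second per free megabyte). The class and method names are hypothetical, and the real collector's bookkeeping is more involved (for instance, which "free heap" figure is used differs between the client and server VMs).

```java
// Hypothetical helper that mirrors the assumed HotSpot policy; not a JVM API.
final class SoftRefRetention {

    // Approximate time (ms) a softly reachable object survives after its last access.
    static long retentionMillis(long freeHeapMb, long msPerMb) {
        return freeHeapMb * msPerMb;
    }

    // True when, under the assumed policy, the next GC would clear the reference.
    static boolean wouldBeCleared(long millisSinceLastAccess, long freeHeapMb, long msPerMb) {
        return millisSinceLastAccess > retentionMillis(freeHeapMb, msPerMb);
    }
}
```

Under these assumptions, with 512 MB of free heap the default value keeps an unused soft reference for roughly 512 seconds, while starting the JVM with -XX:SoftRefLRUPolicyMSPerMB=100 shortens that to about 51 seconds.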

I don’t know if any of you have been involved in similar situations, and I wonder how you managed to handle this.

The SpaceRetry plugin for OpenSpaces

The spaceRetry project is a plugin for GigaSpaces OpenSpaces.

Features:

  • Lazy-load: the PU can be instantiated even if the proxy cannot connect to the space
  • Connection is made when we invoke a method on the IJSpace Proxy (like a write)

Synopsis:

We developed this plugin to fix two problems with the space components of OpenSpaces.
1) The Space component of OpenSpaces allows you to create an IJSpace.
If we want to create a remote Space with a URL like jini://*/*/space, the remote site has to be deployed first. If not, the deployment of the PU will fail with an org.openspaces.core.space.CannotFindSpaceException (ATTENTION: Failed to find space with url …).

2) The second problem is about automatic proxy renewal.
If the proxy loses its connection, there is no mechanism that tries to re-establish it when a method of IJSpace is invoked.

Now, let’s dive into the configuration of this component.

Configuration

To make the <fc-os:spaceretry> namespace available, add

xmlns:fc-os="http://opensource.fastconnect.org/schema/spaceretry"

Then add the schema location

http://opensource.fastconnect.org/schema/spaceretry http://opensource.fastconnect.org/schema/space/fcspace.xsd

Finally, to use the fcSpaceRetry proxy, you have to wrap the <os-core:space /> with the <fc-os:spaceretry/> tag.

<fc-os:spaceretry id="myspace" max-retry-init="1" max-retry-invoke="1" wait-time="5000">
  <os-core:space id="ijspace" url="jini://*/*/mySpace" />
</fc-os:spaceretry>

Parameters:

  • max-retry-init: the number of connection attempts during the initialization of the proxy
  • max-retry-invoke: the number of connection attempts when a method of IJSpace is invoked
  • wait-time: the wait time between two connection attempts (in milliseconds)

Use the <os-core:space> as you usually do.

If you use Maven, you also have to add a dependency on space-retry in your pom.xml

<dependencies>
  <dependency>
    <groupId>fr.fastconnect.openspaces</groupId>
    <artifactId>space-retry</artifactId>
    <version>0.7</version>
  </dependency>
</dependencies>

Technical Information

The spaceRetry plugin extends the UrlSpaceFactoryBean and overrides the doCreateSpace method to create a new instance of IJSpaceProxy (a dynamic proxy of IJSpace).

Algorithm:

We try up to max-retry-init times to create an IJSpace with SpaceFinder.find(spaceURL). When max-retry-init is reached, we create a proxy of the IJSpace that is not yet initialised.

When a method of the IJSpace is invoked and the connection is not established, the IJSpaceProxy tries up to max-retry-invoke times to make the connection.
When max-retry-invoke is reached, the IJSpaceProxy throws a RuntimeException.
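The algorithm above can be sketched with a plain JDK dynamic proxy. This is a simplified illustration under stated assumptions, not the plugin's actual code: DemoSpace is a hypothetical stand-in for IJSpace, the Callable stands in for the SpaceFinder.find(spaceURL) call, and RetryingProxy is an invented name.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.concurrent.Callable;

// Hypothetical stand-in for IJSpace, reduced to a single method.
interface DemoSpace {
    String read(String key);
}

// Hypothetical sketch of the lazy-connect-and-retry idea.
final class RetryingProxy {

    // Tries the connector up to maxRetry times, sleeping waitMillis between attempts.
    // Returns null when every attempt failed.
    static <T> T tryConnect(Callable<T> connector, int maxRetry, long waitMillis) {
        for (int attempt = 1; attempt <= maxRetry; attempt++) {
            try {
                return connector.call();
            } catch (Exception e) {
                if (attempt < maxRetry) {
                    try {
                        Thread.sleep(waitMillis);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        return null;
                    }
                }
            }
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> type, Callable<T> connector,
                        int maxRetryInit, int maxRetryInvoke, long waitMillis) {
        InvocationHandler handler = new InvocationHandler() {
            // May still be null after initialization: the proxy is then created unconnected.
            private T target = tryConnect(connector, maxRetryInit, waitMillis);

            // A coarse lock keeps the sketch simple; it serializes all calls.
            @Override
            public synchronized Object invoke(Object proxy, Method method, Object[] args)
                    throws Throwable {
                if (target == null) {
                    target = tryConnect(connector, maxRetryInvoke, waitMillis);
                }
                if (target == null) {
                    throw new RuntimeException(
                            "Could not connect after " + maxRetryInvoke + " attempts");
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow the target's own exception
                }
            }
        };
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    }
}
```

This sketch only covers the lazy first connection. Re-establishing a connection that was lost after a successful call would additionally require detecting the failure and resetting the target, which is left out here.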

Compatibility

The spaceRetry plugin is compatible with all components that use the IJSpace (core, events or remoting components). Just use the id of the spaceRetry bean to get all the benefits of the plugin.
Example:

<fc-os:spaceretry id="myspace" max-retry-init="2" max-retry-invoke="5" wait-time="5000">
  <os-core:space id="dontuseit" url="/./mySpace" />
</fc-os:spaceretry>
<os-core:giga-space id="gigaSpace" space="myspace"/>

More information at https://opensource.fastconnect.org/redmine/projects/show/extended-space-bean