Single Mirror, Multiple Data Sources

As part of my job, I often visit customers with very different needs and application types, but most of them have one thing in common: data, data and more data. Data-related challenges are numerous, but in this post I’d like to share a few alternatives for solving a common scenario, in which the application uses multiple persistency layers that need to be kept synchronized without compromising consistency or performance.

Persistency as a Service

Data modifications performed on a GigaSpaces cluster can be received asynchronously by a central process. This process is called the Mirror Service.

This concept is mostly used in two scenarios:

  • As a persistence mechanism; click here to read more.
  • As an optimized facility for replication over the WAN

The Mirror Service lets you provide your own logic for handling batches of modifications by implementing the BulkDataPersister interface. While this is a powerful capability, it does not allow several implementations to be configured simultaneously.

Simultaneous Configuration

One way to work around this limitation is to implement a dedicated BulkDataPersister that internally broadcasts the data to the concrete implementations. Every method call is then forwarded to all persistency layers. Composite BulkDataPersister is a generic implementation of this pattern; it lets you simply configure multiple concrete BulkDataPersisters for use by a single Mirror Service.

What do I mean by simply? All you need to do is configure your regular BulkDataPersister implementations (using either a regular property file or Spring injection) and then configure the Mirror Service to use Composite BulkDataPersister.

Internally, a thread pool is used to run all delegate implementations in parallel.
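
To make the broadcast idea concrete, here is a minimal sketch in plain Java. It is not the actual Composite BulkDataPersister code: BatchPersister is a hypothetical stand-in for the BulkDataPersister interface (the real method signature and item type may differ), and CompositePersister, the String items and the pool sizing are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the BulkDataPersister interface;
// the real interface and its item type may differ.
interface BatchPersister {
    void executeBulk(List<String> batch) throws Exception;
}

// Forwards every batch to all delegates, running them in parallel.
final class CompositePersister implements BatchPersister {
    private final List<BatchPersister> delegates;
    private final ExecutorService pool;

    CompositePersister(List<BatchPersister> delegates) {
        this.delegates = List.copyOf(delegates);
        // Assumption: one worker thread per delegate.
        this.pool = Executors.newFixedThreadPool(Math.max(1, this.delegates.size()));
    }

    @Override
    public void executeBulk(List<String> batch) throws Exception {
        List<String> snapshot = List.copyOf(batch); // same immutable view for every delegate
        List<Future<?>> pending = new ArrayList<>();
        for (BatchPersister delegate : delegates) {
            pending.add(pool.submit(() -> {
                delegate.executeBulk(snapshot);
                return null;
            }));
        }
        // Wait for every delegate before reporting, so none is left half-done.
        Exception firstFailure = null;
        for (Future<?> result : pending) {
            try {
                result.get();
            } catch (ExecutionException e) {
                if (firstFailure == null) {
                    firstFailure = new Exception("delegate failed", e.getCause());
                }
            }
        }
        if (firstFailure != null) {
            throw firstFailure;
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

The sketch waits for every delegate to finish before reporting the first failure, which keeps the caller's view simple; what to do about a delegate that failed while the others succeeded is a separate question.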

Inconsistency Handling

The Composite BulkDataPersister approach can lead to inconsistency between the various BulkDataPersisters if one of the delegates fails.

To solve this, the following steps can be taken:

• A generic strategy can be used for handling partial failures.

• One of the default strategies will use another space to store all information related to a particular failure.

• Ultimately, if the strategy itself fails, an exception is propagated to the Mirror Service, which then triggers the redo-log mechanism (so one batch might be received several times by a delegate).
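
The three points above can be sketched roughly as follows. Again, this is an illustration under stated assumptions rather than the published implementation: Persister, PartialFailureStrategy, RecordingStrategy and GuardedComposite are hypothetical names, an in-memory list stands in for the separate space, and delegates are called sequentially to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the BulkDataPersister interface.
interface Persister {
    void executeBulk(List<String> batch) throws Exception;
}

// Hypothetical hook invoked when a single delegate rejects a batch.
interface PartialFailureStrategy {
    void onFailure(int delegateIndex, List<String> batch, Exception cause) throws Exception;
}

// Records each failure for later repair; a plain list stands in for
// the separate space that stores failure details.
final class RecordingStrategy implements PartialFailureStrategy {
    final List<String> failures = new ArrayList<>();

    @Override
    public void onFailure(int delegateIndex, List<String> batch, Exception cause) {
        failures.add("delegate " + delegateIndex + " missed " + batch + ": " + cause.getMessage());
    }
}

// Calls every delegate and routes individual failures to the strategy.
final class GuardedComposite implements Persister {
    private final List<Persister> delegates;
    private final PartialFailureStrategy strategy;

    GuardedComposite(List<Persister> delegates, PartialFailureStrategy strategy) {
        this.delegates = List.copyOf(delegates);
        this.strategy = strategy;
    }

    @Override
    public void executeBulk(List<String> batch) throws Exception {
        for (int i = 0; i < delegates.size(); i++) {
            try {
                delegates.get(i).executeBulk(batch);
            } catch (Exception e) {
                // If the strategy throws as well, the exception escapes to the
                // caller; in the real setup that is what would make the Mirror
                // Service replay the batch from its redo log.
                strategy.onFailure(i, batch, e);
            }
        }
    }
}
```

With RecordingStrategy, a failing delegate does not interrupt the others and the batch is acknowledged; with a strategy that throws, the whole batch is reported as failed and will be delivered again to every delegate, including those that already applied it.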

Implementation of this pattern with full documentation is available here.

Since inconsistencies pose a potential risk with this approach, I recommend keeping all failure scenarios in mind, including the most unlikely ones. If everything fails, data modifications can be received several times by the persistency layers because of the granularity of the Mirror Service redo log.
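
One common way to tolerate such replays, offered here as a suggestion rather than as part of the pattern itself, is to make each delegate idempotent by writing on a key instead of appending. The sketch below assumes every change carries a unique key; Change and UpsertingStore are hypothetical names, and a map stands in for the real data source.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical change record: the latest value for a given key.
final class Change {
    final String key;
    final String value;

    Change(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

// Hypothetical delegate that upserts by key, so a replayed batch
// overwrites rows with the same values instead of duplicating them.
final class UpsertingStore {
    private final Map<String, String> rows = new LinkedHashMap<>();

    synchronized void executeBulk(List<Change> batch) {
        for (Change change : batch) {
            rows.put(change.key, change.value);
        }
    }

    synchronized Map<String, String> snapshot() {
        return new LinkedHashMap<>(rows);
    }
}
```

Applying the same batch twice leaves the store in the same state as applying it once, which is the property a delegate needs when the redo log can deliver a batch more than once.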

Nevertheless, this simple pattern lets you leverage the performance improvement provided by the Mirror Service in highly distributed and dispersed environments, while still ensuring full reliability. In fact, one of our clients with strong WAN performance requirements is already using it.

Further improvements to this pattern might include transaction usage and JMX management.

Do you consider those features important? Are the available entry points sufficient for your needs?