Compute Grid – Parallel Processing new generation

Some years ago GridGain introduced the neat ability to execute Java code remotely, MapReduce style, without any deployment step. The new GigaSpaces 6.6 release introduces a similar feature.

How do they compare? Are they equally powerful? To try to answer these questions we will develop, on both platforms, a simple application that sums the number of characters in the words of a sentence.

API

Both products expose a similar high-level interface that lets you execute a task multiple times and then reduce all the results into a final answer. In GigaSpaces this interface is Task; in GridGain it is GridTask.

While similar in appearance, their usage is quite different.

In GridGain a task is split into several jobs on the server side, with full control over each individual job. This covers not only the arguments but also the job types. Each individual job execution can in turn be split into sub-jobs.
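The split, execute and reduce flow described here can be pictured with a JDK-only sketch. It runs on a local thread pool rather than a grid, and SplitReduceSketch and its method names are hypothetical, chosen for illustration; none of this is GridGain API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Local illustration only: a thread pool stands in for the grid nodes.
final class SplitReduceSketch {

  // "split": one job per word, each job carrying its own argument.
  static List<Callable<Integer>> split(final String sentence) {
    final List<Callable<Integer>> jobs = new ArrayList<>();
    for (final String word : sentence.split(" ")) {
      jobs.add(() -> word.length());
    }
    return jobs;
  }

  // "reduce": sum the partial results once every job has completed.
  static int reduce(final List<Integer> results) {
    int sum = 0;
    for (final int result : results) {
      sum += result;
    }
    return sum;
  }

  // Runs all jobs on the pool, collects their results and reduces them.
  static int run(final String sentence) {
    final ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      final List<Integer> results = new ArrayList<>();
      for (final Future<Integer> future : pool.invokeAll(split(sentence))) {
        results.add(future.get());
      }
      return reduce(results);
    } catch (final Exception e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(final String[] args) {
    System.out.println(run("This is a very simple test."));
  }
}
```

The point of the sketch is that the splitting logic decides how many jobs exist and what each one receives, which is the kind of control the GridGain model gives you on the server side.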

In GigaSpaces a task is executed on some (or all) nodes, but identically on each of them. You cannot split a task and dynamically generate parameterized tasks on the server side. This can be simulated with ExecutorBuilder, but only from the client. In our case we will have to reduce our results manually on the client side. Task executions are intended to be parallelized across a set of partitions and then reduced, which limits dynamic behavior.

Both products let you control what to do with incoming results during the reduce phase, and when to stop and return the final result.
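To make the identical-execution model more concrete, here is a JDK-only sketch. Partitions are modelled as plain in-memory lists and the class name BroadcastSketch is hypothetical; the real product would read from the space instead. The same logic runs against every partition, takes no per-execution argument, and only sees the data local to that partition.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

// Illustration only: each inner list plays the role of one partition's local data.
final class BroadcastSketch {

  // The identical task: count the characters of the words held locally.
  static final ToIntFunction<List<String>> TASK = localWords -> {
    int characters = 0;
    for (final String word : localWords) {
      characters += word.length();
    }
    return characters;
  };

  // Run the same task on every partition in parallel, then reduce by summing.
  static int broadcastAndReduce(final List<List<String>> partitions) {
    return partitions.parallelStream().mapToInt(TASK).sum();
  }

  public static void main(final String[] args) {
    final List<List<String>> partitions = Arrays.asList(
        Arrays.asList("This", "is"),
        Arrays.asList("a", "very"),
        Arrays.asList("simple", "test."));
    System.out.println(broadcastAndReduce(partitions));
  }
}
```

Because the task is the same everywhere, anything that varies between executions has to come from the data each partition holds, not from the caller.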
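As an illustration of deciding when to stop during the reduce phase, the following JDK-only sketch consumes results in completion order and returns as soon as it has enough information. It uses ExecutorCompletionService from java.util.concurrent, not either product's API, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustration only: inspect each incoming result and stop early when possible.
final class EarlyReduceSketch {

  // Returns true as soon as one job reports a word of at least minLength characters.
  static boolean anyWordAtLeast(final String sentence, final int minLength) {
    final ExecutorService pool = Executors.newFixedThreadPool(4);
    final CompletionService<Integer> completed = new ExecutorCompletionService<>(pool);
    final List<Future<Integer>> pending = new ArrayList<>();
    try {
      for (final String word : sentence.split(" ")) {
        pending.add(completed.submit(() -> word.length()));
      }
      for (int i = 0; i < pending.size(); i++) {
        // take() hands back the next finished job, in completion order.
        final int length = completed.take().get();
        if (length >= minLength) {
          return true; // enough information: stop waiting for the others
        }
      }
      return false; // every result was inspected without a match
    } catch (final Exception e) {
      throw new IllegalStateException(e);
    } finally {
      // Cancel whatever is still outstanding and release the pool.
      for (final Future<Integer> future : pending) {
        future.cancel(true);
      }
      pool.shutdownNow();
    }
  }

  public static void main(final String[] args) {
    System.out.println(anyWordAtLeast("This is a very simple test.", 6));
  }
}
```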

GigaSpaces

Our sample implementation is divided into two pieces.

The task itself:

import org.openspaces.core.executor.Task;

// Task is parameterized with the type returned by execute().
public class GridWordCountTask implements Task<Integer> {
  private static final long serialVersionUID = 1L;
  private final String word;

  public GridWordCountTask(final String word) {
    this.word = word;
  }

  public Integer execute() throws Exception {
    return this.word.length();
  }

}

and the client side code:

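// Look up the space from the URL given on the command line and wrap it in a GigaSpace.
final IJSpace proxy = IJSpace.class.cast(SpaceFinder.find(args[0]));
final GigaSpace gigaSpace = new GigaSpaceConfigurer(proxy).gigaSpace();
// SumReducer adds up the Integer returned by each task.
final ExecutorBuilder<Integer, Integer> executorBuilder =
    gigaSpace.executorBuilder(new SumReducer<Integer, Integer>(Integer.class));
final String sentence = "This is a very simple test.";

// One task per word; every task uses the same routing value (0).
for (final String word : sentence.split(" ")) {
  executorBuilder.add(new GridWordCountTask(word), 0);
}

// Submit all the tasks and wait for the reduced sum.
final Integer characters = executorBuilder.execute().get();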

Pros:

  • Native support through the GigaSpace component.
  • Resources defined in your Processing Unit server side.
  • Built-in reducers.

Cons:

  • Changing a class definition requires a space restart. This is currently a known limitation that GigaSpaces R&D will overcome in a future release.
  • Execution routing has to be handled manually.
  • Multiple concurrent executions are strictly identical. You can't provide different arguments (you have to retrieve the arguments as part of the logic, through the space for instance).
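To give an idea of what handling routing manually can involve, here is a JDK-only sketch that maps a routing value to a partition index. It assumes a simple hash-modulo scheme, which may differ from what GigaSpaces actually does internally, and RoutingSketch and its methods are hypothetical names. Words are routed by their length purely to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: assumes partitions are chosen by hash code modulo partition count.
final class RoutingSketch {

  // floorMod keeps the index non-negative even for negative hash codes.
  static int partitionFor(final Object routingValue, final int partitionCount) {
    return Math.floorMod(routingValue.hashCode(), partitionCount);
  }

  // Group the words of a sentence by the partition their routing value maps to.
  static Map<Integer, List<String>> groupByPartition(final String sentence, final int partitionCount) {
    final Map<Integer, List<String>> byPartition = new TreeMap<>();
    for (final String word : sentence.split(" ")) {
      final int partition = partitionFor(word.length(), partitionCount);
      byPartition.computeIfAbsent(partition, p -> new ArrayList<>()).add(word);
    }
    return byPartition;
  }

  public static void main(final String[] args) {
    System.out.println(groupByPartition("This is a very simple test.", 3));
  }
}
```

With a grouping like this, the client could add one task per partition and pass the partition's routing value alongside it, instead of sending every task with the same routing value.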

GridGain

Here is the task implementation:

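import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.gridgain.grid.GridException;
import org.gridgain.grid.GridJob;
import org.gridgain.grid.GridJobAdapter;
import org.gridgain.grid.GridJobResult;
import org.gridgain.grid.GridTaskSplitAdapter;

public class GridWordCountTask extends GridTaskSplitAdapter<String, Integer> {

  private static final long serialVersionUID = 1L;

  // Split the sentence into one job per word.
  @Override
  protected Collection<? extends GridJob> split(final int gridSize, final String argument) throws GridException {
    final List<GridJob> jobs = new ArrayList<GridJob>();
    for (final String word : argument.split(" ")) {
      // Each job carries its own word as argument.
      jobs.add(new GridJobAdapter<String>(word) {

        private static final long serialVersionUID = 1L;

        public Integer execute() {
          return word.length();
        }

      });
    }
    return jobs;
  }

  // Sum the lengths returned by all the jobs.
  public Integer reduce(final List<GridJobResult> results) throws GridException {
    int sum = 0;
    for (final GridJobResult result : results) {
      sum += (Integer) result.getData();
    }
    return sum;
  }

}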

and the client code:

GridFactory.start();
try {
  final String sentence = "This is a very simple test.";

  final GridTaskFuture future = GridFactory.getGrid().execute(GridWordCountTask.class, sentence);
  System.out.println("Sentence <"+sentence+"> has <"+future.get()+"> characters.");
} finally {
  GridFactory.stop(true);
}

Pros:

  • Annotation facility based on AOP (Gridify).
  • Easily extensible/pluggable to a number of products.
  • Good documentation/samples.
  • Progress indicators of tasks/jobs can be generated.

Cons:

  • No user interface.

ExecutorService support

Both products support ExecutorService for executing tasks. This removes almost all the vendor-specific code and makes it easy to switch between local and remote execution (which might be useful for testing).

The only mandatory step is to ensure that your Callable is Serializable. Relying on the following interface will do the job:

public interface GridCallable<E extends Serializable> extends Callable<E>, Serializable {
}
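A minimal sketch of how such a callable might be used follows. The interface is repeated so the example compiles on its own, WordLength and ExecutorServiceSketch are hypothetical names, and the executor is a plain local thread pool; a grid-backed ExecutorService obtained from either product would be passed to the same method instead.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ExecutorServiceSketch {

  // Works with any ExecutorService: a local pool here, a grid-backed one elsewhere.
  static int countCharacters(final ExecutorService executor, final String sentence) {
    try {
      final List<Future<Integer>> futures = new ArrayList<>();
      for (final String word : sentence.split(" ")) {
        futures.add(executor.submit(new WordLength(word)));
      }
      int sum = 0;
      for (final Future<Integer> future : futures) {
        sum += future.get();
      }
      return sum;
    } catch (final Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // Local execution, handy for tests: same code path, no grid involved.
  static int countCharactersLocally(final String sentence) {
    final ExecutorService local = Executors.newFixedThreadPool(2);
    try {
      return countCharacters(local, sentence);
    } finally {
      local.shutdown();
    }
  }

  public static void main(final String[] args) {
    System.out.println(countCharactersLocally("This is a very simple test."));
  }
}

// Same shape as the interface shown in the article, repeated for self-containment.
interface GridCallable<E extends Serializable> extends Callable<E>, Serializable {
}

// A serializable unit of work: the word travels with the callable.
final class WordLength implements GridCallable<Integer> {
  private static final long serialVersionUID = 1L;
  private final String word;

  WordLength(final String word) {
    this.word = word;
  }

  public Integer call() {
    return this.word.length();
  }
}
```

Since countCharacters only depends on the ExecutorService interface, switching between local and remote execution comes down to which executor instance is handed in.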

Conclusion

While still young, GigaSpaces' native support for remote execution of Java code is good news and will certainly become a killer feature. The integrated approach advocated by GigaSpaces makes it really easy to submit a task, which can then natively access in-memory information through the simple Space API. Furthermore, data affinity is a key feature here and does not have to be managed.
On the other hand, GridGain focuses on task execution and lets you plug in a number of components (including a caching component) through its nice SPI support. Its remote execution is more mature and supports concepts such as job stealing and pluggable load balancing.

For now, GridGain's specialized framework allows more complex remote execution workflows, but the introduction of this new Executor feature is a sign that GigaSpaces is willing to close the gap sooner rather than later.
GigaSpaces' integrated (by design) approach (in-memory caching, virtual ESB, parallel processing and transactional support) is definitely worth a look if you favor ease of use and management, or are looking for a very low latency application platform providing capabilities you would otherwise have to incorporate into your application by linking several pieces of technology together.

Resources