Articles

H2O is, at its core, a platform for distributed, in-memory computing. On top of the distributed computation platform, machine learning algorithms are implemented. At H2O, we design every operation, be it data transformation, training of machine learning models, or even parsing to utilize the distributed computation model. In order to work with big data fast, it’s necessary.

However, a single operation usually can not utilize clusters’ computational resources to the very maximum. Data needs to be distributed across the cluster, and many operations require sequential execution of tasks, which, even if implemented in a distributed manner, follow after each other and require data exchange. These and many other smaller factors, if summed up together, may introduce a significant overhead.

Source de l’article sur DZONE