Distributed analytics: Solid foundation for data-dependent world
It’s hardly breaking news to suggest that monolithic approaches to data management and analytics are not fit for purpose in a connected, big data world. The interesting question is how to extend the data and content architecture, and the accompanying analytics, to encompass new and unfamiliar sources such as Internet of Things (IoT) data, without throwing away millions of dollars of existing investment.
Step forward distributed analytics: a new way of thinking about how to extend capabilities out into the data landscape, by providing appropriate data management and analytics “in the moment” while transporting back to the core only the data and insights that are necessary.
A concert of machine learning, analytical agents and the cloud
We are not going to claim that distributed analytics is going to happen overnight, nor even that all the technology required to achieve the vision is available yet, at least not in an enterprise-friendly package. However, we believe that as a collection of design and approach principles, it offers one of the most solid foundations on which enterprises (across industries) can start planning for a much more connected, data-dependent world.
Briefly, the idea that already difficult-to-manage data warehouses and inflexible data governance processes can be adapted to incorporate much greater amounts of data, at much higher speeds, is largely discredited. While it might be theoretically possible, the costs associated with applying a legacy approach to a relatively new and growing problem would be too high. The conundrum is that legacy technology has cost enterprises a small fortune in investment, a fortune that is still far from delivering its returns. And core data-related business processes such as financial management and regulatory reporting require a heavily governed approach to ensure accuracy (if not always timeliness).
Some argue that it is time to throw away these technologies and start afresh with data lakes and the cloud, among other options. We argue it’s possible to have both: a locked-down core of technology matched to much more flexible technologies that augment it, something we refer to as the “elastic architecture.”
But what does this actually mean, in practice? The (slightly disappointing) answer is that no two distributed analytics solutions will look the same, but they will share characteristics.
In the short term, the data lake will become standard in most organizations. By its very nature, it should be a mix of both data landing zone and longer-term data storage, as well as, increasingly, home to some pretty complex data science-led analysis. Likely best located in the cloud (public, private or, much more likely, a hybrid deployment that helps bridge on- and off-premises environments), the data lake could be considered a buffer between a relatively well-organized internal data architecture and the more chaotic world beyond. This would, in essence, augment existing investments in information management technologies and provide the route to “bring the data home” if deemed necessary.
Longer term, we see a future in spreading analytic capabilities outside that core, beyond the warehouse and the data lake, and out toward the machines and devices generating the vast amounts of data that are causing the “problem.” Such capabilities can be delivered by analytical agents: packages of software loaded locally that don’t rely on heavy compute or memory. Using local resources backed up by machine learning-driven optimization running at the center, these analytical agents would send back to the core only the data needed for purposes such as reporting exceptional events, improving the machine learning algorithms and meeting regulatory reporting requirements. Immediate proximity means that optimization of physical processes could happen in near real time, without relying on the transport of vast quantities of data to and from the edge.
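To make the analytical agent idea concrete, here is a minimal sketch in Python of how such an agent might behave. Everything in it is an assumption for illustration: the class name `AnalyticalAgent`, the z-score test for "exceptional" readings, the default threshold and the `outbox` list standing in for the link back to the core are all hypothetical, not a description of any vendor's product.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AnalyticalAgent:
    """Hypothetical edge agent: keeps running statistics locally and
    forwards only readings that look exceptional."""
    z_threshold: float = 3.0   # assumed default; the core could retune it
    warmup: int = 5            # readings to observe before flagging anything
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0            # running sum of squared deviations (Welford)
    outbox: list = field(default_factory=list)  # stand-in for the link to the core

    def observe(self, value: float) -> bool:
        """Process one reading locally; return True if it was forwarded."""
        exceptional = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) / std > self.z_threshold:
                exceptional = True
                self.outbox.append(value)  # only this reading leaves the device
        # Update the running mean and variance in constant memory.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return exceptional

    def apply_update(self, z_threshold: float) -> None:
        """Accept a retuned threshold pushed down from the core."""
        self.z_threshold = z_threshold


agent = AnalyticalAgent()
for reading in [20.0, 20.1, 19.9, 20.2, 19.8, 20.0, 35.0, 20.1]:
    agent.observe(reading)
print(agent.outbox)
```

The agent holds only three numbers of state regardless of how many readings it sees, which is the point of the "no heavy compute or memory" requirement. The `apply_update` method is where centrally trained optimization would feed back into local behavior.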
A wide array of IoT applications will benefit from distributed analytics
In the context of the IoT, there are a number of clear and immediate applications for distributed analytics. These include:
- Critical processing at the edge for IoT applications with low-latency requirements, such as connected and autonomous vehicles
- Manufacturing 4.0 applications and devices, such as a robotic arm in a factory production line: making sure that any deviation from the usual output on the production line is immediately recognized and remedied
- Medical implants or medication dosage systems requiring quick-turnaround adjustments that depend on local processing of physiological, environmental and device functionality data
- Time-critical public security applications dependent on analytics at the edge for triggers/actions/alerts; for example, facial recognition for tracking criminals
- Support for IoT applications and device deployments in rural or remote locations, where connectivity is limited but local processing is likely to be needed to maximize effectiveness.
Distributed and edge analytics also have a role to play in filtering the IoT data stream. The sheer volume of IoT data is already presenting a challenge to networks and processing power, and this will only increase as more and more devices are connected. Edge analytics running on a local router, switch, IoT gateway or server can allow analytics to be performed on the fly, so that only the most relevant or important data need be transported to the cloud for deeper analytics (and in many cases, integration with other data or enterprise applications, or incorporation into machine learning models).
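As a rough illustration of filtering at a gateway, the Python sketch below collapses a window of raw readings into a compact summary and keeps individual values only when they fall outside an acceptable range. The function name `summarize_window`, the payload fields and the `low`/`high` bounds are assumptions made for this example rather than part of any particular edge platform.

```python
from statistics import mean


def summarize_window(readings, low, high):
    """Reduce a window of raw readings to a compact payload for the cloud.

    `low` and `high` bound an assumed acceptable range; readings outside
    it are kept verbatim, everything else survives only as aggregates.
    """
    if not readings:
        return None  # nothing to send for an empty window
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
        "out_of_range": [r for r in readings if r < low or r > high],
    }


payload = summarize_window([21.0, 21.5, 22.0, 48.0], low=15.0, high=30.0)
print(payload)
```

However many readings arrive in a window, the payload stays at four aggregate fields plus the handful of outliers, so the volume sent upstream scales with the number of anomalies rather than the number of devices or samples.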
Orchestration of these capabilities is only just beginning, and the technology is still emerging. Solutions are already being marketed by players including IBM, HPE, Cisco (with SAS) and Huawei, but there is still some way to go before technologies and integration strategies mature. Even so, it is already clear that, as an approach that looks to the future without abandoning existing requirements and the investments that support them, distributed analytics provides practical steps forward for the enterprise and will be a key element of a successful IoT strategy.
For more information about IoT research and analysis from Ovum, which belongs to the same corporate family as IoT Institute, send email to email@example.com.