Data Strategies for Efficient and Secure Edge Computing Services
Key Takeaways from this article include the following:
- IoT systems are distributed, which brings new complexity and security risks.
- Distributed systems need tiered strategies for data processing and storage.
- Edge computing architecture also requires new approaches for data at rest and in motion.
The challenges of building and properly managing an Internet of Things (IoT) network have grown alongside the benefits of the technology. At its core, IoT is a distributed processing framework, and it comes with the challenges of distributed systems. As a result, developers and architects have to weigh the business needs for the data (latency, security and volume requirements), cost, and other factors to determine how best to architect a distributed environment.
Data Considerations for Edge Computing Services
A long list of design questions comes with building an IoT network: Where does computation happen? Where and how do you store and encrypt data? Do you require encryption for data in motion or just at rest? How do you coordinate workflows across devices? And finally, how much does this cost? While this is an intimidating list, we can build on good practices that evolved both before the advent of IoT and more recently with the increasing use of edge computing.
First, let’s take a look at computation and data storage. When possible, computation should happen close to the data. By minimizing transmission time, you reduce the overall latency for receiving results. Remember that distributing computation can increase overall system complexity, creating new vulnerabilities in various endpoints, so it’s important to keep it simple.
One approach is to do minimal processing on IoT devices themselves. A data collection device may just need to package a payload of data, add routing and authentication to the payload, then send it to another device for further processing. There are some instances, however, where computing close to the collection site is necessary.
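As a minimal sketch of that packaging step, the snippet below wraps a reading with routing fields and an HMAC-SHA256 tag using only the Python standard library. The key, field names, and the functions `package_reading` and `verify_package` are hypothetical and chosen for illustration; a real deployment would follow whatever message format and credential scheme its platform defines.

```python
import hashlib
import hmac
import json
import time

# Hypothetical shared secret; a real device would load this from secure storage.
DEVICE_KEY = b"example-device-key"


def _encode(body: dict) -> bytes:
    # Serialize deterministically so sender and receiver sign the same bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def package_reading(device_id: str, destination: str, reading: dict,
                    key: bytes = DEVICE_KEY) -> dict:
    """Wrap a raw reading with routing fields and an authentication tag."""
    body = {
        "device_id": device_id,      # identifies the sender
        "destination": destination,  # routing hint for the next hop
        "timestamp": time.time(),
        "reading": reading,
    }
    tag = hmac.new(key, _encode(body), hashlib.sha256).hexdigest()
    return {"body": body, "hmac": tag}


def verify_package(message: dict, key: bytes = DEVICE_KEY) -> bool:
    """Recompute the tag on the receiving side and compare in constant time."""
    expected = hmac.new(key, _encode(message["body"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["hmac"])
```

The device does nothing beyond wrapping and signing here; parsing, enrichment, and analysis are left to whichever device receives the message.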
One example of computing close to a sensor is anomaly detection. If an IoT device is monitoring how equipment is functioning, you want to know about a malfunction as soon as possible.
On a factory floor, for instance, you may send sensor data to an edge device that analyzes data from all sensors on the floor, so that a malfunction is detected and you are alerted quickly. If immediate analysis is not necessary, however, it may be more cost-effective to send data to a centralized ingestion point, such as an ingestion service in the cloud that then writes the processed data to a data store. This delayed option makes sense when, for example, you are collecting data to train machine learning models. Assigning data to different processing locations thus involves understanding the business purpose of the data, which helps you decide how to architect your networks for edge and cloud processing.
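To make the edge-side case concrete, here is one simple check an edge device could run: a rolling z-score over recent readings. This is an illustrative technique, not one the article prescribes, and the class name, window size, and threshold are assumptions.

```python
from collections import deque
from statistics import mean, stdev


class RollingAnomalyDetector:
    """Flag readings that deviate strongly from a window of recent values."""

    def __init__(self, window: int = 20, threshold: float = 3.0):
        self.values = deque(maxlen=window)  # recent normal readings
        self.threshold = threshold          # allowed distance in standard deviations

    def observe(self, value: float) -> bool:
        """Return True if the reading looks anomalous."""
        is_anomaly = False
        if len(self.values) >= 5:  # wait for a few points before judging
            mu = mean(self.values)
            sigma = stdev(self.values)
            # With zero spread in the window, nothing is flagged.
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                is_anomaly = True
        # Only normal readings update the baseline, so a fault does not mask itself.
        if not is_anomaly:
            self.values.append(value)
        return is_anomaly
```

A check like this can raise an alert on the factory floor immediately, while the raw readings still travel to a central store for later analysis.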
A two-tiered data transmission strategy is also an option. In this case, you extract the most useful information from the raw data, such as aggregates (e.g., sums and averages) or deviations from baseline predictions, and transmit it over a low-latency network. Low-latency networks may cost too much for large volumes of data that are not needed immediately. For that data, you can instead use batch file uploads from edge devices or ship disks (known as shuttles) from edge devices to a centralized data collection site.
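The split between the two tiers might look like the sketch below, which reduces a batch of readings to a small summary for the fast path and sets the raw values aside for a later bulk upload. The function names, the summary fields, and the `bulk_buffer` list are hypothetical.

```python
from statistics import mean


def summarize_batch(readings: list, baseline: float, tolerance: float) -> dict:
    """Reduce raw readings to a compact summary for the low-latency path."""
    if not readings:
        raise ValueError("readings must not be empty")
    return {
        "count": len(readings),
        "sum": sum(readings),
        "mean": mean(readings),
        # Readings further from the predicted baseline than the tolerance allows.
        "deviations": [r for r in readings if abs(r - baseline) > tolerance],
    }


def route_batch(readings: list, baseline: float, tolerance: float,
                bulk_buffer: list) -> dict:
    """Queue raw data for batch upload and return the summary to send now."""
    bulk_buffer.extend(readings)  # tier two: uploaded or shipped later
    return summarize_batch(readings, baseline, tolerance)  # tier one: sent immediately
```

For example, `route_batch([10.0, 11.0, 12.0, 30.0], baseline=11.0, tolerance=5.0, bulk_buffer=buf)` sends four summary fields over the fast network while all four raw values wait in `buf`.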
Ensuring Security for Edge Computing Services
Computation and data transmission plans don’t get far without a proper security and encryption strategy. In the interest of business and personal confidentiality, use secure protocols for network transmissions, such as TLS (Transport Layer Security) with X.509 certificates to authenticate the endpoints. Luckily, cloud providers typically provide encryption at rest, but consider whether you want the cloud provider to manage encryption keys or whether you want to manage them yourself. If you manage them yourself, you should also ask whether you will use a cloud provider’s encryption key infrastructure or your own. Finally, message digests, ideally keyed or signed so that they cannot simply be recomputed by an attacker, let you detect tampering at the source or while data is on the move.
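For data in motion, a device-side TLS configuration could be sketched with Python's standard `ssl` module as follows. The function name `build_client_context` and its parameters are hypothetical, and the file paths would come from however the device is provisioned.

```python
import ssl
from typing import Optional


def build_client_context(ca_file: Optional[str] = None,
                         cert_file: Optional[str] = None,
                         key_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS client context that verifies the server's certificate."""
    # The default context requires a valid X.509 server certificate
    # and checks that it matches the hostname being contacted.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    if cert_file:
        # Present the device's own certificate so the server can authenticate it too.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context
```

The returned context can then be passed to `context.wrap_socket(...)` or to a client library that accepts an `SSLContext`.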
For the user experience and day-to-day operations, coordinating workflows is crucial. Keeping workflows simple is the best way to make them functional and reliable. You can do so by dividing responsibilities clearly among the components of the distributed system.
You may use edge devices for pre-processing, for example, while ingestion services handle data collection more centrally. A best practice is to ensure that the ingestion service does minimal processing. Typically it will only validate source devices and authenticate messages, then write the data to a message queue or other persistent storage solution for further processing.
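A minimal ingestion step along those lines is sketched below. The device registry, the `is_authentic` callback, and the in-memory `queue.Queue` are stand-ins chosen for illustration; a production service would typically write to a durable message queue instead.

```python
import queue
from typing import Callable

# Hypothetical registry of devices that are allowed to send data.
KNOWN_DEVICES = {"sensor-1", "sensor-2"}


def ingest(message: dict,
           is_authentic: Callable[[dict], bool],
           out_queue: queue.Queue,
           known_devices: set = KNOWN_DEVICES) -> bool:
    """Validate the source, authenticate the message, then hand it off."""
    if message.get("device_id") not in known_devices:
        return False  # unknown source: reject
    if not is_authentic(message):
        return False  # failed authentication: reject
    out_queue.put(message)  # no parsing or enrichment here; consumers do that later
    return True
```

Keeping the function this small means that slow or failing downstream processing does not block devices from delivering their data.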
Determining how you want to store data for long-term analysis is critical for streamlining workflows. Historical data can be especially valuable for training machine learning models, such as those used for anomaly detection. Finally, implement streaming workflows that apply basic business processing logic, such as limits on how long to wait for late data transmissions and rules for how to handle missing data in a time series.
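The two streaming rules mentioned above can be expressed in a few lines. The sketch below assumes a fixed lateness limit and uses forward filling for gaps; both are illustrative choices, and the function names are hypothetical.

```python
from typing import Optional


def accept_event(event_time: float, arrival_time: float, max_lateness: float) -> bool:
    """Accept an event only if it arrived within the allowed delay."""
    return arrival_time - event_time <= max_lateness


def fill_missing(series: dict, start: int, end: int, step: int) -> list:
    """Forward-fill gaps in a regularly sampled time series.

    `series` maps timestamps to values. Slots before the first
    observed value are filled with None.
    """
    filled = []
    last: Optional[float] = None
    for t in range(start, end + 1, step):
        if t in series:
            last = series[t]
        filled.append((t, last))  # reuse the most recent value for a missing slot
    return filled
```

Other rules, such as interpolating between neighbors or dropping incomplete windows, fit the same shape; which one is appropriate depends on how the data will be used.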
Controlling costs is the final piece of the distributed system puzzle. In particular, data processing and transfer costs will likely dominate your IoT network bill. However, there are ways to mitigate the cost of these operations. Storage adds to the bill as well, and a tiered storage strategy is a powerful way to reduce that cost. This strategy keeps new data on fast and accessible, albeit expensive, devices such as solid state drives, while older data moves to slower but lower-cost cold storage devices.
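A tiering rule can be as simple as an age cutoff. The sketch below assumes two tiers and a 30-day boundary; the tier names, the cutoff, and the function `choose_tier` are hypothetical and would be tuned to actual access patterns and prices.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cutoff: data newer than this stays on fast storage.
HOT_RETENTION = timedelta(days=30)


def choose_tier(created_at: datetime, now: datetime,
                hot_retention: timedelta = HOT_RETENTION) -> str:
    """Pick a storage tier based on how old the data is."""
    age = now - created_at
    if age <= hot_retention:
        return "hot"   # fast, expensive storage such as solid state drives
    return "cold"      # slower, lower-cost storage for older data
```

A scheduled job could apply this rule to stored objects and move any whose tier has changed.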
Developers and architects have many options when it comes to engineering IoT processing pipelines, and the best choices are often driven by application-specific constraints.