Using NoSQL and Embedded Databases to Prevent Corrupted Systems

Embedded data management for IoT has entered the limelight following two upsized IPOs for Confluent and Couchbase.

Callum Cyrus

August 4, 2021


As IoT engineers cram more data analysis onto devices to enable better decision making in the moment, there’s a renewed focus on database protocols that accelerate queries and prevent information loss.

In July 2021, Couchbase became the latest embedded database provider to try its luck on the trading floor, raising $200 million in its Nasdaq listing. Couchbase focuses on providing multi-model databases, document records and key-value stores for enterprises building mission-critical IoT applications.

Among Couchbase’s clients is the Irish budget airline Ryanair, which implemented the technology to direct airline data in its memory-constrained smartphone booking app. Ryanair claims mobile bookings that had taken 5 minutes to complete were reduced to 30 seconds, even without a reliable network connection.

Historically, Internet of Things (IoT) endpoints with limited resources have transported information to data servers for post-processing and analysis because these servers have greater resources. But the risks of not having an adequate database system for co-ordinating information at source are also significant and have increased alongside data volumes collected by intelligent IoT applications – everything from computer vision to medical diagnostics.

Power outages or system reboots could jeopardize the readings in machine-to-machine networks and undermine the performance of connected systems. Moreover, delays in retrieving information could hinder machine learning models deployed at the edge, which were designed to draw on cloud resources less often.

SQL vs. NoSQL

SQL is the most mature and perhaps the best-known language for building and querying databases, but it was historically considered a poor fit for IoT applications. The language uses rigid columns and rows to organize data according to relational logic. However, this is less appropriate for unstructured information, which lacks labels to denote its significance and is often produced by cloud servers and smart devices.

In addition, SQL databases are fairly storage- and resource-intensive. Some of the benefits, such as relational schemas that govern how data sets link to one another, are dampened given that many IoT logs are principally a basic record. Despite the drawbacks, SQL is still sometimes preferred for IoT use-cases where the database must be rock solid, particularly in line with a standard known as ACID compliance (atomicity, consistency, isolation and durability).

Operators in sectors such as finance or public policy may need to ensure SQL can be ported onto IoT gateways in order to maintain ACID compliance. In addition, some of the more recent SQL flavors incorporate document handling features that support IoT implementation more easily.
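To make the atomicity part of ACID concrete, here is a minimal sketch using SQLite through Python's standard `sqlite3` module. SQLite is not named in this article; it stands in here for any embedded SQL engine, and the `readings` table, the `record_batch` helper and the sample values are hypothetical. The point it illustrates is that a batch of sensor readings is either stored in full or not at all.

```python
import sqlite3


def record_batch(conn, readings):
    """Insert a batch of readings as one transaction.

    Returns True if every row was stored, False if the batch was rejected.
    On failure, none of the rows in the batch remain in the table.
    """
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.executemany(
                "INSERT INTO readings (sensor_id, ts, value) VALUES (?, ?, ?)",
                readings,
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Hypothetical schema: one row per sensor per timestamp.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    " sensor_id TEXT NOT NULL,"
    " ts INTEGER NOT NULL,"
    " value REAL NOT NULL,"
    " PRIMARY KEY (sensor_id, ts))"
)

good = [("pump-1", 1, 20.5), ("pump-1", 2, 20.7)]
# The second row repeats the key of the first, so the whole batch must fail.
bad = [("pump-1", 3, 21.0), ("pump-1", 3, 99.9)]

ok_good = record_batch(conn, good)
ok_bad = record_batch(conn, bad)
row_count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

After both calls, only the two rows from the valid batch are present: the first row of the rejected batch was rolled back along with the duplicate. A file-backed database would be needed for the durability guarantee to matter across a power loss; the in-memory database here only keeps the sketch self-contained.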

NoSQL refers to database techniques that may incorporate elements of SQL but do not store data in relational tables. These alternatives are often used by IoT device developers because they offer flexibility in key areas. But they often lack some of the safeguards afforded by relational models.

Popular NoSQL varieties include document databases, which efficiently store data as JSON documents for general-purpose usage, and columnar databases, which can query a column of a dataset without scanning every row. More generally, some NoSQL database query systems appear to be tailor-made for IoT. Time series databases, in particular, are well suited to efficiently categorizing each IoT recording as part of a log file or data history. Time-stamped information can then be dispatched to the cloud for more complex analytics functions, providing a user with complete oversight of the field as monitored by IoT over time.

“The smartest organizations understand that keeping a historical record of real-time samples and the metrics and key performance indicators they power is key to detect, remediate, and restore when things go wrong, and to introspect and amplify when things go right,” said Brian Gilmore, director of IoT product management at InfluxData, a San Francisco-based time series database provider.

By applying machine learning, time series analyses can be contextualized to benefit transactional IoT records in sectors ranging from supply chain management to e-commerce and online advertising. Among other uses, these strategies deploy compression to reduce database entries taken up by fluctuating IoT readings. For example, a rolling average integer calculated every hour can substitute minute-by-minute datapoints.
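The hourly substitution described above can be sketched in a few lines of Python. This is an illustrative simplification, not how any particular time series database implements it: it computes one plain mean per fixed one-hour bucket rather than a true rolling average, and the function name `hourly_averages` and the sample data are hypothetical.

```python
from collections import defaultdict


def hourly_averages(samples):
    """Collapse (unix_seconds, value) samples into one mean per hour.

    Returns a list of (hour_start_seconds, mean_value), sorted by hour.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        hour_start = ts - ts % 3600  # round down to the start of the hour
        buckets[hour_start].append(value)
    return [
        (hour, sum(values) / len(values))
        for hour, values in sorted(buckets.items())
    ]


# Two hours of minute-by-minute readings that alternate between 20.0 and 21.0.
samples = [(minute * 60, 20.0 + (minute % 2)) for minute in range(120)]

summary = hourly_averages(samples)
# 120 stored datapoints are replaced by 2 summary rows.
```

The trade-off is the usual one for compression of this kind: storage shrinks by the ratio of samples to buckets, but short-lived spikes within an hour are no longer visible in the summary.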

The Compatibility Dilemma

Regardless of the database protocol implemented, engineers must ensure compatibility with the specifications of all endpoints in the network. While embedded database languages will have more modest specifications than their traditional cousins, there is no guarantee they can be installed on every node within a large machine-to-machine ecosystem.

Some database packages require at least 24-bit microprocessor technology, and they cannot be installed on legacy 16-bit endpoints. The operator would then have to decide whether the use-case merits spending to upgrade, or else backhaul information to the cloud where post-processing is standardized.

Moreover, many IoT device varieties pose unique challenges for database engineers to circumvent. Sensors will have slender memory profiles, for instance, while IoT gateways need data writing to take place concurrently with read access, to exchange information from multiple endpoints.

But as new IoT microprocessors deliver more compute power at the edge – and often with on-board capacity for implementing machine learning – demand has increased for new database offerings which incorporate bespoke features for embedded environments.

Embedded database systems increasingly court IoT with low-latency architectures, security barriers to prevent decryption of stored information and enrichment fields that can record data from images and sound.

Handling data efficiently at the point where it is generated would also help IT engineers, because updating databases over wireless local or wide area networks may hinder rapid machine learning implementation at the edge, especially where connectivity is unreliable.

InfluxData’s Gilmore said: “Computer vision and digital audio processing are an extension of the sensor model, and while their outputs are far more complex than your typical thermistor, serializing and encoding images and audio clips brings them into the time-series domain.

“Tagged with ML-derived enrichment fields, these images and clips can be recalled, analyzed, modeled, and correlated with other time-series data in the business.”
