Beyond AI Marketing: A McAfee Data Scientist Sheds Light
Last year, a joke related to security vendors’ use of artificial intelligence popped up on the internet. “How do you know that a security vendor really uses AI in their product?” asked Anton Chuvakin, Ph.D., head of solution strategy at Chronicle Security, who was then a Gartner analyst. “If they say they do it, then you know they don’t.”
There are a variety of views about what, precisely, is required for a system to earn the artificial intelligence designation. More certain is the fact that marketers have cannibalized the term “AI” in cybersecurity and elsewhere. As a result, answering the question of what precisely a vendor means when describing products drawing on “machine learning,” “deep learning” or “AI” requires analyzing the offering from a mathematical perspective, according to Celeste Fralick, Ph.D., chief data scientist and senior principal engineer at McAfee.
In an interview at McAfee MPOWER, Fralick provided perspective on machine learning and deep learning as well as AI. Incidentally, she sees those three forming a triangle, with machine learning at the base and AI at the apex. She also touched on her work with statistics, research, quality control and other roles that paved the way to her current role, as well as an earlier data scientist role within Intel’s Internet of Things group.
The Importance of Context
Evaluating the efficacy of a given machine learning or deep learning strategy requires a contextual understanding. In that regard, data science has much in common with many normal daily activities. “You wouldn’t go to the beach in a tuxedo. You would go to the beach in a pair of shorts,” said Fralick, using a tuxedo as a metaphor for AI.
While AI is becoming increasingly vital for cybersecurity, just because a technology is more complex doesn’t necessarily make it more efficacious. “In AI, the compute is more intense. The intelligence is better, obviously, but maybe that’s not what the use case requires,” Fralick said.
In a similar vein, Steve Grobman, senior vice president and chief technology officer at McAfee, said in a press session at McAfee MPOWER: “We have to understand the limitations of AI. It’s an amazing tool, but it’s not magic.”
No matter what a given use case needs in the end, Fralick recommends starting with the most straightforward techniques before moving to machine learning, deep learning and ultimately, AI. “I use basic statistics, and then apply machine learning,” she said.
When determining which type of tool is best for the job, it is vital to ask the end user what kind of accuracy they want. “Ask them: ‘What can you tolerate? What are your expectations for the algorithm?’” she said.
Fralick also counsels organizations planning a data-science-based project to analyze how much similar research has been done on the topic. “See if anybody else has written anything about it,” she said. If a proposed project relates to, say, adversarial machine learning, data scientists should “go to Google Scholar and start looking for adversarial machine learning.” That’s a better alternative to coming up with, say, a strategy for defending against adversarial machine learning only to discover later that someone else had the same general idea.
Deploying a technology that leverages machine learning can be relatively simple. “Most machine learning nowadays uses statistical models that have been around for years,” Fralick said. Examples include linear regression, stepwise regression and mixed model regression. “Take any of your regressions, slap it into a software program that you’re going to train and verify,” she added. The system, continually learning from itself, provides a feedback loop.
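To make the “train and verify” idea concrete, here is a minimal sketch in Python. It is an illustration, not anything Fralick or McAfee described: the synthetic data, the 150/50 split, and the helper names `fit_line` and `mean_squared_error` are all assumptions chosen for the example. It fits an ordinary least-squares line on one portion of the data and checks the error on the portion it never saw. The continual feedback loop is left out.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between the line's predictions and the observed values."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data (an assumption for illustration): y = 3x + 2 plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(200)]
ys = [3.0 * x + 2.0 + rng.gauss(0, 0.5) for x in xs]

# Shuffle the row indices, then hold out 50 of the 200 points for verification.
indices = list(range(len(xs)))
rng.shuffle(indices)
train_idx, verify_idx = indices[:150], indices[150:]

train_x = [xs[i] for i in train_idx]
train_y = [ys[i] for i in train_idx]
verify_x = [xs[i] for i in verify_idx]
verify_y = [ys[i] for i in verify_idx]

# Train on one portion, verify on the held-out portion.
slope, intercept = fit_line(train_x, train_y)
verify_mse = mean_squared_error(verify_x, verify_y, slope, intercept)

print(f"slope={slope:.3f} intercept={intercept:.3f} verify MSE={verify_mse:.3f}")
```

If the verification error is close to the noise level in the data, the simple model may already be good enough for the use case, which is the kind of check that can come before reaching for anything heavier.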
More complex than machine learning, deep learning may be a powerful technique, but its trendy status is sparking backlash as well. Technology Review recently published an article based on an interview with New York University professor Gary Marcus that states: “the [broader AI research] field’s current overemphasis on [deep learning] may well lead to its demise.” While the technique excels at speech and image recognition, it stumbles in understanding “conversations or causal relationships,” the article continued. Marcus, along with collaborator Ernest Davis, recently penned the book “Rebooting AI: Building Artificial Intelligence We Can Trust” advocating for a more lucid and holistic approach to artificial intelligence research.
“A lot of computer scientists nowadays come out of school, and the first thing they do is focus on deep learning,” Fralick said. While deep learning has its place, organizations deploying it should have a good reason, Fralick said. Deep learning algorithms tend to have a heavy footprint, are slow to train, and are susceptible to adversarial machine learning. “So why not start with the simple stuff?”
Machine learning isn’t necessarily required for deep learning, Fralick said. “I was taught to do deep learning without a computer — by hand. It would drive you absolutely bonkers.” Still, it is possible. “But of course, if you want to do good deep learning, and you want to do it continuously, you’re going to use machine learning. But why start there?”
A Big-Picture View
While the roots of data science are certainly not new, the data scientist profession has but a short history. It makes sense, then, to draw on data-focused disciplines that predate the term “data science.”
Fralick provides an example of this principle when asked a question about support vector machines, which Wikipedia defines as: “supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis.”
“Someone asked me: ‘How can you predict the growth of a cluster on one side of a support vector machine?’” Fralick recalled. “For some reason, my head went to bacteria.” Having studied microbiology and chemistry as an undergraduate, and later bioengineering at the graduate level, Fralick once had a job working as a bench microbiologist. “I also spent quite a bit of time in quality and reliability, which is very systematic. You look at things from start to finish,” Fralick said, reflecting on her experience in the semiconductor and medical device fields. “And that’s what I think data scientists really need. Above and beyond the math, the statistics, and the software and hardware, data scientists need to be able to connect the dots from start to finish. And if they don’t understand that, they’re not going to be able to understand data lineage.”