SOAR’n With DevOps(?)

When you need Security at WebScale

Published in

The Startup

Sep 8, 2020

Achieving a technology-enabled business nirvana is a goal of cloud adoption (cloud migration, digital transformation, digital modernization, IT modernization, whatever buzz phrase you like). In this end state, an organization has achieved an aspirational level of business agility: it can challenge any competitor and adapt to the marketplace in near-real-time, securely! Dear reader, close your eyes and imagine such a moment in time. Imagine a global webscale infrastructure of thousands of microservices. Imagine thousands of transactions per second in every time zone. Now imagine your current organization keeping the lights on at webscale. It is okay if you need to open your eyes to escape the horror.

Adjusting the organization to operate securely at webscale is a multi-dimensional task. There are so many different concerns spanning people, process, and technology; where does anyone even begin? To anchor this story in real-world context, here are the initial technology conditions of an enterprise that wants to become webscale:

The customer base is external and not internal corporate users.
The organization has a few customer-facing products (the software applications) that are composed of various technologies.
Some products are simple and some resemble the complexity of supply chain or customer relationship management solutions.
Some applications are composed of imperfect microservices and others are monoliths/microlith cash machines.
There are some applications running in Kubernetes and the organization has generally started to adopt cloud-native concepts.
The customer-facing portfolio is not 100% in the cloud; for example, the payment gateways are still on-premises.
The organization is on the cusp of expanding beyond using just two cloud regions.
There are active product roadmaps and active infrastructure roadmaps.

That is the reference enterprise throughout this article.

Regarding the people dimension, this article will also address three questions your CISO will need answered:

Can my SOC (Security Operations Center) proactively see when a customer-facing app is compromised?
Can my application operators resolve issues that span multiple clouds?
Can I leverage DevOps to deliver webscale security capabilities?

The primary focus of this article will be the exploration of “SOAR” as a means to answer those questions for our representative enterprise.

What is SOAR? SOAR stands for Security Orchestration, Automation, and Response. Conceptually, it is an approach to delivering a robust security capability by aligning tools and processes so that an enterprise can respond to security incidents quickly and successfully. Each security vendor has its own SOAR story and definition. Curiously, each vendor’s SOAR story aligns with the products and services it offers; hmm, never would have thought that (wink). For the purposes of this article, SOAR is one of the rare acronyms that actually describes what needs to be achieved. An organization attempting to SOAR is pursuing Automated security capabilities coupled with an intelligent (maybe AI-assisted) Response capability, all of which can be Orchestrated. A vendor may frame the discussion around threat detection; this refers to a response capability. Another vendor may frame a SOAR conversation around vulnerability management; this is about automating that management.

This article is organized as follows:

Response: The organization needs response capability. What does this mean in the SOAR context?
Orchestration and Automation: These terms are often conflated. An attempt is made to differentiate and define these terms.
DevOps: Is there a relationship between SOAR and DevOps? How does a software development organization leverage or participate in SOAR-related activities?
Parting Thoughts: What are the next steps a CISO should take?

Response

The last letter in SOAR is probably the most important letter in the acronym. Keeping the lights on is all about how an organization responds to events that can turn the lights out. Most organizations already have various incident response plans, and there is even an ISO standard for incident management. Therefore, the concepts required to deal with all types of incidents (security or otherwise) are broadly known.

An effective response capability depends on good sensing capabilities (AKA observability); you can’t manage what you can’t measure. This is the first place where the SecOps and Ops members of the composite SRE team should align. Orchestration and automation will not matter if the sensing that supports response is disjointed, disparate, and incomplete. Therefore, those teams should use the same monitoring, logging, and tracing tools. I cannot stress enough how important unified visibility is for a technology organization operating at webscale. Unified visibility leads to a unified canonical language across the organization. Other industries have decades of research and targeted analysis on the links between language and error [1]. If the mixed-skill SRE team was assembled correctly (no small feat, and a topic for another time), that team will eventually coalesce on a common language. However, that coalescence should be facilitated by deploying common visibility solutions. This does not rule out specialized dashboards or a subset of tools used only by specific teams.
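To make the idea of a shared canonical language a little more concrete, here is a minimal sketch of a single structured event format that both Ops and SecOps tooling could consume. The `Event` class and its field names (`service`, `event_type`, `severity`, `trace_id`, `attributes`) are hypothetical illustrations, not the schema of any particular APM or SIEM product.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class Event:
    """Hypothetical shared event schema emitted by every service."""
    service: str     # emitting microservice
    event_type: str  # e.g. "http_request", "auth_failure", "config_change"
    severity: str    # e.g. "info", "warning", "critical"
    trace_id: str    # lets SecOps pivot into the same traces Ops uses
    attributes: dict = field(default_factory=dict)  # event-specific details
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # sort_keys gives a stable serialization for every consumer
        return json.dumps(asdict(self), sort_keys=True)


def parse_event(line: str) -> Event:
    """Parse one JSON log line back into an Event."""
    return Event(**json.loads(line))


if __name__ == "__main__":
    e = Event(service="checkout", event_type="auth_failure", severity="warning",
              trace_id="abc123", attributes={"source_ip": "203.0.113.7"})
    line = e.to_json()
    print(line)
    assert parse_event(line) == e  # both teams read the same record
```

The design point is less the specific fields than the fact that one format feeds both the APM dashboards and the security tooling, so both teams describe an incident with the same words.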

For example, you cannot have a security operations conversation without mentioning SIEM. SIEM stands for Security Information and Event Management. SIEM solutions are what security professionals use as their observability solution, and SIEM products in the marketplace also provide some SecOps capabilities. Thus, security professionals reading this story might reasonably assume that focusing on a SIEM solution solves the response component of SOAR. However, consider that the business value creators are software engineers working on products that make money (unless you are a security consultancy). SIEM products are built and sold to security people, not to software engineers. Therefore, your webscale observability solution should be composed of tools your webscale builders and operators are already using. Making a handful of security people happy while upsetting ten (or even 100) times as many software engineers is emotionally and organizationally incompetent. The observability tools to focus on are often called APMs.

APM (Application Performance Management) solutions provide various views into the state of applications and software platforms. The leading APMs even feature real-time, machine learning-based analysis. Describing APMs in detail is beyond the scope of this article. What matters here is that the target customers for APMs are software engineers working on products that make money. Ask a random software engineer working on such a product about APMs and you will hear various product names, opinions on dashboards, and views on what granularity of metrics and logs to use. Ask the same population about SIEM tools…crickets.

The APM/SIEM integration space is ripe for disruption. Maybe Datadog intends to do such a thing. Anyone who can merge the APM and SIEM ecosystems will help bring the Ops and Sec communities together, and make buckets of money in the process.

Can Response answer the following question?

Can my SOC (Security Operations Center) even proactively see when a customer-facing app is compromised?

YES

Response depends on a robust observability capability. In a properly instrumented environment, your SOC will have the intelligence-gathering capability required to respond.
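As a rough illustration of how instrumented events become a proactive signal, here is a sketch of one simple detection rule: flag any source IP with more than a threshold number of authentication failures inside a sliding time window. The `detect_bruteforce` function, the `(timestamp, event_type, source_ip)` tuple shape, and the default threshold and window are all assumptions chosen for illustration, not recommended values.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def detect_bruteforce(events, threshold=5, window=timedelta(minutes=1)):
    """Return source IPs with more than `threshold` auth failures within `window`.

    `events` is an iterable of (timestamp, event_type, source_ip) tuples sorted
    by timestamp; this shape is a hypothetical simplification of real log data.
    """
    recent = defaultdict(deque)  # source_ip -> timestamps of recent failures
    flagged = set()
    for ts, event_type, source_ip in events:
        if event_type != "auth_failure":
            continue
        q = recent[source_ip]
        q.append(ts)
        # Drop failures that have slid out of the window.
        while ts - q[0] > window:
            q.popleft()
        if len(q) > threshold:
            flagged.add(source_ip)
    return flagged


if __name__ == "__main__":
    t0 = datetime(2020, 9, 8, 12, 0, 0)
    events = [(t0 + timedelta(seconds=i), "auth_failure", "203.0.113.7") for i in range(6)]
    events.append((t0 + timedelta(seconds=3), "auth_failure", "198.51.100.2"))
    events.sort()
    print(detect_bruteforce(events))  # {'203.0.113.7'}
```

A real SOC would express rules like this in its SIEM or APM query language; the point is that the rule is only as good as the events the environment emits.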

Orchestration and Automation

Now that we have established the need to collect actionable intelligence about the computing environment, we need to act on it. This is where the orchestration and automation of SOAR come into play. First question: what is the difference between orchestration and automation?

Ah, yes that question. These internets are full of various definitions.

Automation refers to converting an existing mechanical action that requires human labor into an action that can be executed without human labor. A simple example is collecting a series of manually typed shell commands into a single bash script. A further iteration on the shell script could be an Ansible playbook or a Chef cookbook. The automation process results in the creation of various automated systems.

Orchestration refers to collecting and sequencing various processes. Those processes could be a mix of human processes and automated systems. For example, in some regulated industries, a person (entity) of record must initiate an action, and that initiation could invoke many automated systems. Therefore, one can consider orchestration as macro in scope and automation as micro in scope (like macro- and microeconomics).

Orchestration and automation are the foundational capabilities of any serious cloud journey. Each cloud provider delivers various services that can be manipulated via an API, and the preferred way to interact with a cloud provider is through its APIs. Additionally, each cloud provider has some sort of automation language or tooling construct, often expressed in a markup such as YAML. There are also third-party cloud management solutions, like Terraform, that interact with cloud provider APIs. Cloud providers themselves promote automation: Azure features workflow automation in its Security Center product, Google Cloud’s API and CLI often expose more parameter options than its web console, and AWS has its template-based automation service, CloudFormation. Instead of getting lost in different cloud providers, this article will use AWS.

To start the initial entry into AWS with best practices, AWS Control Tower can create a new multi-account AWS environment. AWS Control Tower is the managed version of the AWS Landing Zone solution, which itself represents an implementation of AWS’s Well-Architected Framework. The landing zone solution is code and markup that automates AWS management at the account level. On top of this automated account management framework, an organization can layer automated security controls, and even automated responses! With AWS, it is possible to implement automated responses to security misconfigurations using custom AWS Config rules. Without belaboring the details of AWS Config: if someone changes a security group (AKA firewall) configuration to one not aligned with corporate policy, a custom Config rule can flag the change as noncompliant, and a remediation action attached to that rule can automatically revert it.
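Here is a minimal sketch of the evaluation logic such a custom rule might run, written as pure functions so it can be tested without an AWS account. The input loosely mirrors the `IpPermissions` shape EC2 reports for a security group (`IpProtocol`, `FromPort`, `ToPort`, `IpRanges`/`CidrIp`, `Ipv6Ranges`/`CidrIpv6`), and the return values use AWS Config's `COMPLIANT`/`NON_COMPLIANT` vocabulary. The policy itself (only ports 80 and 443 may be open to the whole internet) is a hypothetical corporate rule, and the Lambda wiring and remediation action are deliberately left out.

```python
# Hypothetical corporate policy: only HTTP/HTTPS may be open to the internet.
ALLOWED_PUBLIC_PORTS = {80, 443}


def _is_public(perm: dict) -> bool:
    """True if an ingress rule admits traffic from anywhere (IPv4 or IPv6)."""
    v4 = any(r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", []))
    v6 = any(r.get("CidrIpv6") == "::/0" for r in perm.get("Ipv6Ranges", []))
    return v4 or v6


def violations(ip_permissions: list) -> list:
    """Describe each ingress rule that breaks the policy above."""
    found = []
    for perm in ip_permissions:
        if not _is_public(perm):
            continue
        protocol = perm.get("IpProtocol")
        if protocol == "-1":  # "-1" means all protocols and all ports
            found.append("all traffic open to the internet")
            continue
        lo, hi = perm["FromPort"], perm["ToPort"]
        if any(port not in ALLOWED_PUBLIC_PORTS for port in range(lo, hi + 1)):
            found.append(f"{protocol} ports {lo}-{hi} open to the internet")
    return found


def evaluate_compliance(ip_permissions: list) -> str:
    """Translate the result into AWS Config's compliance-type vocabulary."""
    return "NON_COMPLIANT" if violations(ip_permissions) else "COMPLIANT"


if __name__ == "__main__":
    ssh_from_anywhere = [{"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
                          "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]
    print(evaluate_compliance(ssh_from_anywhere))  # NON_COMPLIANT
    print(violations(ssh_from_anywhere))  # ['tcp ports 22-22 open to the internet']
```

Keeping the policy check as a pure function is a design choice: the same logic can be unit-tested in CI and then invoked by whatever Lambda handler reports results back to AWS Config.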

The capabilities of AWS Config enable an automated response based on the ability to detect a change. As mentioned in the previous section, one cannot respond to something that cannot be detected. Once there is a rich intelligence-gathering capability, it is possible to automate the actions that can be taken. However, obtaining that capability requires instrumenting EVERYTHING in the environment. It is impossible to manually instrument a globally dispersed webscale environment, so the deployment of the instrumentation must itself be automated. One could say there is a small feedback loop between creating a response capability and creating automation.

The AWS example can be expanded into an orchestration scenario. Assume the same security group change, but this time the detected change is not a clear policy violation. In reality, such situations are common and lead to exposure via misconfiguration. Making the appropriate risk assessment to prevent exposure is where an enterprise wants to spend its high-value human intelligence (as opposed to running scanners all day). Thus, the following three actions must be performed:

Route the notification of the configuration change to the correct parties
Perform a risk-based assessment of the change
Perform remediation if required

Step one, configuring and deploying the notification, is just another automation task; in our AWS example, one can use Amazon SNS topics and subscriptions. Step three is also just more automation: someone on the SRE team would change the CloudFormation and/or Terraform definitions and have the configuration applied via automation. Step two is a completely human intelligence-driven process. It may involve a ticketing system, a review board, a security test, and/or other activities.
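The three steps can be sketched as a small orchestration function in which automated steps (notify, remediate) bracket a human decision (assess). Everything here is hypothetical: `notify` stands in for publishing to an SNS topic, `remediate` stands in for applying a CloudFormation or Terraform change, and the human review is modeled as a callback rather than a real ticketing system or review board.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Decision(Enum):
    ACCEPT_RISK = "accept_risk"
    REMEDIATE = "remediate"


@dataclass
class ConfigChange:
    resource_id: str  # hypothetical identifier of the changed resource
    description: str


def orchestrate(change: ConfigChange,
                notify: Callable[[ConfigChange], None],
                assess: Callable[[ConfigChange], Decision],
                remediate: Callable[[ConfigChange], None]) -> Decision:
    """Sequence automated and human steps for one detected change."""
    notify(change)             # step 1: automated routing to the right people
    decision = assess(change)  # step 2: human, risk-based judgment
    if decision is Decision.REMEDIATE:
        remediate(change)      # step 3: automated remediation (e.g. an IaC apply)
    return decision


if __name__ == "__main__":
    log: List[str] = []
    change = ConfigChange("sg-example", "ingress 8080 opened to 10.0.0.0/8")
    result = orchestrate(
        change,
        notify=lambda c: log.append(f"notified: {c.resource_id}"),
        assess=lambda c: Decision.REMEDIATE,  # a reviewer's verdict, hard-coded here
        remediate=lambda c: log.append(f"reverted: {c.resource_id}"),
    )
    print(result, log)
```

Passing each step in as a callable mirrors the macro/micro distinction above: the orchestrator only sequences, while each step can be swapped for a real automated system or a human process.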

Essentially, the O and A in SOAR address the following question:

Can my application operators resolve issues that span multiple clouds?

Yes, assuming the threat intelligence exists, an organization with mature automation and orchestration can respond accordingly.

DevOps

With the elements of SOAR defined, how can we relate this security community term to the rest of the software community? Specifically, how does SOAR relate to DevOps? First, we need a definition of DevOps. We will use the following one from the research literature [2]:

DevOps is a development methodology aimed at bridging the gap between Development (Dev) and Operations, emphasizing communication and collaboration, continuous integration, quality assurance and delivery with automated deployment using a set of development practices

There are obvious alignments between how this article describes SOAR and the presented definition of DevOps. For example, the automation and orchestration elements of SOAR align with software delivery via automated deployments. A successful response capability begins with situational awareness, but a resolution (at the macro scale implied by orchestration) will require efficient communication and collaboration. However, there are still gaps between DevOps and SOAR.

One example of such a gap is the significant difference between how DevOps is described for the enterprise and how security is described within the enterprise. DevOps practices are centered on teams within an enterprise, specifically on how individual teams deliver their features and products. Security is about protecting the enterprise in its totality. This is reflected in the APM versus SIEM conversation: the marketing around APM solutions suggests customizations for various types of teams, while SIEM products focus on enterprise-wide integrations. Additionally, DevOps is an established set of practices, while SOAR is still a conceptual path to delivering robust security. Can these differing views of the enterprise, and this gap between concept and practice, be bridged?

The DevSecOps Manifesto represents an attempt to align the security community with the developer community, specifically by pushing the security community into a development frame of reference with the opening gambit of security-as-code. This allows some alignment of languages and processes between software creators and security practitioners. One can consider security-as-code in the same vein as infrastructure-as-code (IaC): because cloud providers are configured via APIs, the infrastructure lifecycle can be managed like the application lifecycle. Likewise, security-as-code allows security controls to be managed using the same application lifecycle concepts. Alongside DevSecOps, there are SecDevOps and DevOpsSec. At the macro level they are identical, but at the micro level there are differences; this article makes no recommendation among them.
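To illustrate security-as-code in the same vein as IaC, here is a sketch of a policy gate a delivery pipeline could run against planned resource definitions before they are applied. The resource dictionaries and both policies (storage must be encrypted; every resource needs an `owner` tag so incidents can be routed) are hypothetical; teams often use purpose-built policy engines instead, but the principle is the same: security rules are versioned, reviewed, and executed like any other code.

```python
from typing import Callable, Dict, List

Resource = Dict[str, object]
Policy = Callable[[Resource], List[str]]


def require_encryption(res: Resource) -> List[str]:
    # Hypothetical rule: any storage resource must declare encryption.
    if res.get("type") == "storage" and not res.get("encrypted", False):
        return [f"{res['name']}: storage must be encrypted"]
    return []


def require_owner_tag(res: Resource) -> List[str]:
    # Hypothetical rule: every resource names an owning team for incident routing.
    if "owner" not in res.get("tags", {}):
        return [f"{res['name']}: missing 'owner' tag"]
    return []


POLICIES: List[Policy] = [require_encryption, require_owner_tag]


def evaluate(resources: List[Resource]) -> List[str]:
    """Run every policy against every resource and collect violations."""
    return [msg for res in resources for policy in POLICIES for msg in policy(res)]


def ci_gate(resources: List[Resource]) -> int:
    """Print violations and return a process exit code; non-zero fails the stage."""
    problems = evaluate(resources)
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```

A pipeline step might call `sys.exit(ci_gate(planned_resources))`, so a policy violation blocks delivery the same way a failing unit test would.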

We must also briefly examine the people component of the enterprise. Which community of technologists has robust experience with cloud automation technologies like AWS CloudFormation and Terraform? Your cloud platform engineers, not your security engineers. Your platform engineers naturally work with your application teams (many are former application developers!). Your security engineers should be working with platform engineers to deliver robust security automation.

This leaves us with the question:

Can I leverage DevOps to deliver webscale security capabilities?

YES, but SOAR does NOT directly enable this. DevOps provides a series of practices and methodologies the security organization can adopt, while SOAR highlights which elements the security team should focus on: identifying the automation it should participate in, determining where it should merge process orchestration, and enhancing collective observability. That last point is what enables Response.

Parting Thoughts

This article explained the SOAR acronym beyond the Gartner definition and without a security vendor focus. The purpose was to explore if SOAR provides a path to answer CISO questions when operating at webscale. The article then attempted to relate SOAR to DevOps. The CISO questions were answerable!

Hopefully, this dissection of the SOAR acronym aids those outside the security space in understanding what the cloud-capable security practitioner wants to achieve. For the security practitioner, it provides a way to frame conversations with the rest of the enterprise in terms they are familiar with (which is also why this article has little security jargon).

SOAR provides a framework that enables the CISO to have a conversation across the enterprise about delivering robust security capabilities. For the enterprise on the cusp of being webscale, the foremost goal is achieving a robust observability capability. It is the root of Response. This will require a conversation about standard metrics and measures. This includes mundane conversations about implementing standard log messages and message formats. For our reference enterprise, it will require developing a roadmap that identifies when some acceptable level of uniformity will be achieved. The same roadmap development will be required regarding automation and orchestration.

The CISO will not “own” all the skills necessary to succeed. As mentioned earlier, the people in our reference enterprise with cloud automation skills will not come from the security community. Thus, any roadmap the CISO proposes is highly dependent on what the engineering organization is pursuing. This is why DevSecOps is important: the CISO’s organization needs to align with how the engineering organization executes delivery.

Finally, we come to the security marketplace where vendors are hawking their SOAR solutions. Some even refer to their solutions as “SOAR platforms”. If you are a technology executive and someone is selling you a “SOAR Platform” it should:

It must have out-of-the-box API integrations with cloud providers and the ability to invoke provider automation. For example, can the SOAR platform apply an AWS CloudFormation template? If it can’t, how can it deliver automation for the cloud?
It must have the ability to create workflows with lots of API integrations. Can the workflow be invoked via an API? Can the workflow invoke other APIs? This workflow integration allows for Orchestration. Look for Git, Slack, Jira as example integrations.
It must have customizable dashboards and integrations with SIEM products and standard security ecosystem products (think IDS, WAF, etc.). It should also, obviously, integrate with cloud provider observability solutions (like AWS CloudWatch). Remember, your primary webscale dashboards should be APMs, while the SOAR displays are specific to your security team; hence, it should be able to integrate with APMs in both directions. As APMs use machine learning to help separate the signal from the noise, your chosen “SOAR Platform” should have the same capability.

To repeat, a technology executive should not deploy a magic SOAR tool if the organization is not already pursuing an integrated DevSecOps (or SecDevOps or DevOpsSec) approach.

The expanding SOAR ecosystem will bring ever-improving best practices and solutions. SOAR as a concept provides a framework for better protecting the enterprise at webscale. Delivering automated responses to security incidents is far faster than relying on manual methods. Merging the goals of SOAR (evolved security through orchestration and automated response) with DevOps practices will allow the webscale enterprise to respond in near-real-time: the operational nirvana everyone in the C-suite desires, where the lights are never off.

Come SOAR with me.

— Nicholas

References

[1] Sexton, J. and Helmreich, R. Analyzing Cockpit Communication: The Links Between Language, Performance, Error, and Workload. University of Texas Team Research Project, Austin, USA, 2000.
[2] Jabbari, R., bin Ali, N., Petersen, K., and Tanveer, B. What is DevOps? A Systematic Mapping Study on Definitions and Practices. In Proceedings of the Scientific Workshop Proceedings of XP2016 (XP ’16 Workshops), Article 12, 1–11. Association for Computing Machinery, New York, NY, USA, 2016.