Friday, October 28, 2016

CISO Mindmap - Business Enablement

While doing some research on the CISO function, I noticed a very good mind map created by Rafeeq Rehman.

While what he has come up with is a mind map, I will try to deconstruct it and elaborate on the various functions performed by a CISO.

Let's begin:
  1. Business Enablement
  2. Security Operations
  3. Selling Infosec (internally)
  4. Compliance and Audit
  5. Security Architecture
  6. Project Delivery lifecycle
  7. Risk Management
  8. Governance
  9. Identity Management
  10. Budget
  11. HR and Legal 
So why did I number them, and why in this order?

I believe Business Enablement is the most important function of a CISO. If (s)he doesn't know the business in which (s)he operates, it will be very difficult to carry out the duties of a CISO. Consider a person coming from a technology background with no knowledge of the retail business. Hiring that person as a CISO just because (s)he knows the technology may not be a good deal. To become a successful CISO, one must know the business one is involved in; to shape the security function, one must understand the business climate.

If this retail business has a requirement to store credit card information in its systems, the CISO's job is to make sure appropriate PCI DSS controls are in place so the data doesn't get into the wrong hands, while at the same time making sure that PCI DSS does not get in the way of enabling the business to accept credit card transactions. Yes, security is a requirement, but not at the cost of not doing business.

That's why I rate business enablement as a very important function of a CISO.

What are some of the ways a CISO can enable the business to adopt technology and still not get in its way?
  • Cloud Computing
  • Mobile technologies
  • Internet of things
  • Artificial Intelligence
  • Data Analytics
  • Crypto currencies / Blockchain
  • Mergers and Acquisitions

We will review each of these items in detail in the following blog posts.

Friday, July 1, 2016

CIS: Center for Internet Security

Center for Internet Security: "The Center for Internet Security mobilizes a broad community of stakeholders to contribute their knowledge, experience, and expertise to identify, validate, promote, and sustain the adoption of cybersecurity best practices."

Two resources of interest:

  • Secure Configuration Guides (aka "Benchmarks")
  • "Top 20" Critical Security Controls (CSC)
Benchmarks vs. Critical Security Controls:
  • Benchmarks are technology-specific checklists that provide prescriptive guidance for secure configuration
  • CSCs are security-program-level activities:
    • Inventory your items
    • Securely configure them
    • Patch them
    • Reduce privileges
    • Train the humans
    • Monitor the access

CIS Benchmarks: 
  • 140 benchmarks available here
  • AWS CIS Foundations Benchmark here
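As a concrete (if tiny) illustration of how a single benchmark-style check can be automated, here is a minimal Python sketch using boto3 that tests whether any CloudTrail trail covers all regions, in the spirit of the AWS CIS Foundations Benchmark. Treat it as illustrative rather than the official audit procedure; it assumes boto3 is installed and AWS credentials are already configured.

```python
# Minimal sketch: automate one CIS-style check against an AWS account.
# Assumes boto3 is installed and AWS credentials/region are configured.
import boto3

def cloudtrail_covers_all_regions() -> bool:
    """Return True if at least one trail is multi-region (CIS-style check)."""
    trails = boto3.client("cloudtrail").describe_trails()["trailList"]
    return any(t.get("IsMultiRegionTrail") for t in trails)

if __name__ == "__main__":
    status = "PASS" if cloudtrail_covers_all_regions() else "FAIL"
    print(f"CloudTrail enabled in all regions: {status}")
```

In practice you would run many such checks (inventory, configuration, logging) and roll the results up into a report, which is exactly the kind of program-level activity the CSCs describe.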

Saturday, April 23, 2016

TPM (Trusted Platform Module)

TPM, or Trusted Platform Module, as defined by the TCG (Trusted Computing Group), is a microcontroller used in laptops and now also in servers to ensure the integrity of the platform. A TPM can securely store artifacts used to authenticate the platform. These artifacts can include passwords, certificates, or encryption keys. A TPM can also be used to store platform measurements that help ensure that the platform remains trustworthy. Authentication (ensuring that the platform can prove that it is what it claims to be) and attestation (a process helping to prove that a platform is trustworthy and has not been breached) are necessary steps to ensure safer computing in all environments.


source: http://www.trustedcomputinggroup.org
The above image depicts the overall function of the TPM. The standard use case I have seen is ensuring a secure boot process on servers. Secure boot validates the code run at each step in the process and stops the boot if the code is incorrect. The first step is to measure each piece of code before it is run. In this context, a measurement is effectively a SHA-1 hash of the code, taken before it is executed. The hash is stored in a platform configuration register (PCR) in the TPM.
Note: TPM 1.2 only supports the SHA-1 algorithm.
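To make the measurement idea concrete, here is a minimal conceptual sketch of the PCR extend operation: the new register value is the SHA-1 hash of the old value concatenated with the hash of the measured code, so the register accumulates the entire boot chain in order. This is only an illustration of the math in Python; it does not talk to real TPM hardware, and the stage names are made up.

```python
# Conceptual sketch of a TPM 1.2-style PCR "extend" (SHA-1, per the note above).
# This does not interact with real TPM hardware; it only shows the accumulation.
import hashlib

def extend_pcr(pcr_value: bytes, measured_code: bytes) -> bytes:
    """new_PCR = SHA1(old_PCR || SHA1(measured_code))"""
    measurement = hashlib.sha1(measured_code).digest()
    return hashlib.sha1(pcr_value + measurement).digest()

pcr = b"\x00" * 20                              # PCRs start zeroed at power-on
for stage in (b"BIOS code", b"boot loader", b"kernel"):
    pcr = extend_pcr(pcr, stage)                # each stage is measured before it runs
print("PCR value after boot chain:", pcr.hex())
```

Because the order and content of every stage feed into the final value, changing any piece of boot code produces a different PCR value, which is what makes attestation possible.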

Each TPM has at least 24 PCRs. The TCG Generic Server Specification, v1.0, March 2005, defines the PCR assignments for boot-time integrity measurements. The table below shows a typical PCR configuration. The context column indicates whether the values are determined by the node hardware (firmware) or by the software provisioned onto the node. Some values are influenced by firmware versions, disk sizes, and other low-level information.

Therefore, it is important to have good practices in place around configuration management to ensure that each system deployed is configured exactly as desired.

Register | What is measured | Context
PCR-00 | Core Root of Trust Measurement (CRTM), BIOS code, Host platform extensions | Hardware
PCR-01 | Host platform configuration | Hardware
PCR-02 | Option ROM code | Hardware
PCR-03 | Option ROM configuration and data | Hardware
PCR-04 | Initial Program Loader (IPL) code (for example, the master boot record) | Software
PCR-05 | IPL code configuration and data | Software
PCR-06 | State transition and wake events | Software
PCR-07 | Host platform manufacturer control | Software
PCR-08 | Platform specific, often kernel, kernel extensions, and drivers | Software
PCR-09 | Platform specific, often initramfs | Software
PCR-10 to PCR-23 | Platform specific | Software

So there are very good use cases for TPM to ensure secure boot and the integrity of hardware. Who is actually using TPM? Many institutions that run their own private clouds have been seen using TPM chipsets on their servers, while many public clouds do not support TPM. Why? That's a mystery!


Monday, April 11, 2016

Hadoop Stack

In this post, I am exploring the Hadoop stack and its ecosystem.

Hadoop:


Oozie:

Oozie is a server-based workflow engine specialized in running workflow jobs composed of actions. It is typically used for managing Apache Hadoop Map/Reduce and Pig jobs. In Oozie, there are workflow jobs and coordinator jobs. Workflow jobs are Directed Acyclic Graphs (DAGs) of actions, while coordinator jobs are recurrent Oozie workflow jobs that are triggered by time (or frequency) and data availability.

Due to Oozie's integration with the rest of the Hadoop stack, it supports several types of Hadoop jobs out of the box.

From a product point of view, it's a Java web application that runs in a Java servlet container. In Oozie, a workflow is a collection of actions (Hadoop Map/Reduce jobs, Pig jobs) arranged in a control dependency DAG (Directed Acyclic Graph). Here, a control dependency from one action to another means that the second action can't run until the first action has completed.

These workflow definitions are written in hPDL (an XML Process Definition Language). Oozie workflow actions start their jobs in remote systems (such as Pig or Hadoop). Once a job completes, the remote system calls back Oozie to notify it that the action has completed, and Oozie then proceeds to the next action in the workflow.





credit: https://oozie.apache.org/docs/4.2.0/DG_Overview.html


From Stackoverflow: DAG (Direct Acyclic Graph)

Graph = a structure consisting of nodes that are connected to each other with edges.
Directed = the connections between nodes (edges) have a direction: A --> B is not the same as B --> A.
Acyclic = "non-circular": moving from node to node by following the edges, you will never encounter the same node a second time.

A good example of a directed acyclic graph is a tree. Note, however, not all directed acyclic graphs are trees :)
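To tie this back to Oozie, here is a minimal Python sketch of a DAG of workflow-style actions and a valid execution order, where an action only runs after everything it depends on has completed. The action names are hypothetical, purely for illustration.

```python
# Minimal sketch: a DAG of workflow actions and one valid (topological) order.
# Action names are hypothetical; an edge means "must finish before".
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key maps an action to the set of actions it depends on.
dag = {
    "mapreduce-ingest": set(),               # no predecessors
    "pig-cleanup": {"mapreduce-ingest"},     # runs after ingest
    "hive-report": {"pig-cleanup"},          # runs after cleanup
}

order = list(TopologicalSorter(dag).static_order())
print(order)   # ['mapreduce-ingest', 'pig-cleanup', 'hive-report']
```

If the graph had a cycle (A depends on B, and B depends on A), no such ordering would exist, which is exactly why a workflow must be acyclic.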



Monday, March 14, 2016

Bare Metal - A dreary (but essential) part of Cloud

Recently I got a chance to attend the Open Compute Summit 2016 in San Jose, CA. It was full of industry peers from web-scale companies such as Facebook, Google, and Microsoft, along with many financial institutions like Goldman Sachs, Bloomberg, and Fidelity. The overall theme of this summit was to embrace openness in hardware and commodity hardware.
From a historical point of view, OCP is a project initiated by Facebook a few years ago, in which they opened up many of their hardware components - motherboard, power supply, chassis, rack, and later switches - because they needed things at scale, and doing it with branded servers (pre-packaged for the enterprise by HP, Dell, IBM) wasn't going to cut it for them; thus they created (designed) their own gear. More details here.
Below is one of the OCP-certified servers (courtesy: http://www.wiwynn.com). It is very minimalistic, a stripped-down version of a typical rack-mount server.
Coming back to this year's summit: considering this was my first year at the OCP Summit, I had certain expectations, and having been there I can say one thing for sure - "bare metal does look interesting again". Why do I say that? If it were only about bare metal, it would certainly be boring, but when you combine that bare metal with APIs, and particularly if you are operating at scale (it doesn't have to be Facebook scale), it's a fun time. Let's take a look.
The keynote started with Facebook's Jason Taylor covering the journey over the last year or so and where the community stands now. But the fun began when (another Jason) Jason Waxman from Intel talked about Intel's involvement, how the server and storage (think NVMe) industry is growing, and what they see coming in the future - including Xeon D and Yosemite.

A good talk was given by Peter Winzer of Bell Labs. I knew UNIX and C were born at Bell Labs, but it was fascinating to hear about the history and future of Bell Labs, with innovations underway in fiber optics and fiber capacity - 100G is a no-brainer, and 1 Tbps is on the horizon.

Microsoft Azure's CTO Mark Russinovich discussed how open Microsoft is - to be honest, other than the .NET framework being open, I had no idea they have been contributing back to the open source community - well, it's a good thing! In the past, Microsoft has contributed its server design specs - Open Cloud Server (OCS) and the Switch Abstraction Interface (SAI). OCS is the same server and data center design that powers the Azure hyper-scale cloud (~1M servers). SAI and its APIs help network infrastructure providers integrate software with hardware platforms that are continually and rapidly evolving at cloud speed and scale. This year, they have been working on a network switch and proposed a new innovation for OCP inclusion called Software for Open Networking in the Cloud (SONiC). More details here.

There were many interesting technologies showcased in the expo, but the one that struck me was a storage archival solution. The basic configuration can hold 26,112 disks (7.8 PB), and with expandable modules spanning a pair of data center rows it reaches a total capacity of up to 181 petabytes (HUGE!!). Is AWS Glacier running something like this underneath? Some details here.
For a coder at heart, it was good to see demonstrations by companies such as Microsoft and Intel showing some love for OpenBMC to manage bare metal. Firmware updates seem to be a common pain across the industry, but the innovative approach taken by Intel and Microsoft using UEFI Capsule updates - which wrap the update payload in an envelope and expose it via an API - tries to make it easier than it seems.
Overall, it was good exposure to a newer generation of hardware technologies, and by accepting contributions from multiple companies, OCP is moving towards standardization of hardware. With standardization and API integration, it will be fun to play with bare metal.
Do you still think Bare Metal is dreary?

This article originally appeared on LinkedIn under the title Bare Metal - A dreary (but essential) part of Cloud

Monday, January 4, 2016

Log Management

What are available options for Log Management?


There are logs everywhere - systems, applications, users, devices, thermostats, refrigerators, microwaves - you name it. And as your deployment grows, your complexity increases. When you need to analyze a situation or an outage, logs are your lifesaver.
There are tons of tools available - open source, pay-per-use, and a few others. Let's take a look at some of them here:



What are the different tools/frameworks available to store and analyze these logs - in real time if possible, or if not, for after-the-fact analysis?



Splunk:



Splunk is powerful log analysis software with the choice of running in an enterprise data center or in the cloud.

1. Splunk Enterprise: Search, monitor and analyze any machine data for powerful new insights.

2. Splunk Cloud: This provides Splunk Enterprise and all its features as SaaS in the cloud.


3. Splunk Light: A miniature version of Splunk Enterprise - log search and analysis for small IT environments.


4. Hunk: Hunk provides the power to rapidly detect patterns and find anomalies across petabytes of raw data in Hadoop without the need to move or replicate data.



Apache Flume: 

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application. 


Flume deploys as one or more agents, each contained within its own instance of the JVM (Java Virtual Machine). An agent has three components: sources, sinks, and channels, and it must have at least one of each in order to run. Sources collect incoming data as events, sinks write events out, and channels provide a queue to connect the source and sink. Flume allows Hadoop users to ingest high-volume streaming data directly into HDFS for storage.



credit: flume.apache.org
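To illustrate the source → channel → sink model in the plainest possible terms, here is a toy Python sketch of the architecture. This is not Flume code (Flume agents are configured, not written like this); it only shows how a channel acts as a queue between the component that produces events and the one that writes them out.

```python
# Toy sketch of Flume's agent model: source -> channel -> sink.
# Not Flume's actual API; just the shape of the architecture in plain Python.
from queue import Queue

channel = Queue()                                  # the channel buffers events

def source(lines):
    """Source: turn incoming data (log lines here) into events on the channel."""
    for line in lines:
        channel.put({"body": line})

def sink():
    """Sink: drain events from the channel and write them out (stdout here)."""
    while not channel.empty():
        print("delivered:", channel.get()["body"])

source(["user login ok", "disk 87% full"])         # sample log lines
sink()
```

In a real deployment the sink would write to HDFS and the channel would be durable, but the division of labor is the same.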





Apache Kafka:


Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. Kafka is fast, scalable, durable, and distributed by design. It started as a LinkedIn project, was later open-sourced, and is now a top-level Apache open source project. Many companies have deployed Kafka in their infrastructure.

Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.
  • Kafka maintains feeds of messages in categories called topics.
  • We'll call processes that publish messages to a Kafka topic producers.
  • We'll call processes that subscribe to topics and process the feed of published messages consumers.
  • Kafka is run as a cluster composed of one or more servers, each of which is called a broker.
So, at a high level, producers send messages over the network to the Kafka cluster which in turn serves them up to consumers like this:

credit: kafka.apache.org
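Here is a minimal sketch of that producer/consumer flow using the kafka-python client (my choice for illustration; any Kafka client library would do). The broker address and the "app-logs" topic are hypothetical.

```python
# Minimal sketch of the producer/consumer flow with the kafka-python library.
# Assumes a broker at localhost:9092 and a topic named "app-logs" (hypothetical).
from kafka import KafkaProducer, KafkaConsumer

# Producer: publish a log line (as bytes) to the "app-logs" topic.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("app-logs", b"2016-01-04 10:15:02 app01 login failed for user bob")
producer.flush()

# Consumer: subscribe to the topic and process the feed of published messages.
consumer = KafkaConsumer(
    "app-logs",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",     # read from the beginning of the topic
    consumer_timeout_ms=5000,         # stop iterating after 5s of no new messages
)
for message in consumer:
    print(message.value.decode())
```

In the log-collection setup described below, the producer role would be played by syslog-ng or Fluentd agents, and the consumer would be whatever does the downstream indexing or HDFS dump.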

Kafka has a good ecosystem surrounding the main product. With a wide range of choices to select from, it might be a good "free" foundation for a log management tool. For large systems deployments, Kafka can act as a broker with multiple publishers - perhaps syslog-ng (with an agent running on each system) or Fluentd (again, with Fluentd agents running on the nodes and a Kafka plugin) - to solve the problem of log collection. With the log4j appender, applications that use the log4j framework can feed it almost seamlessly. Once you have logs ingested via these subsystems, searching them can be cumbersome. With Kafka, there are alternatives where you can dump the data into HDFS, run a Hive query against it, and voila, you get your analysis.

Still, there is some work to be done on how easily someone can retrieve the data, for example via a Kibana dashboard.

ELK:


When we are talking about logs, how can we not mention the ELK stack? When I was introduced to the ELK stack, it was presented as an open source alternative to Splunk. I agree, it does have the feature set to compete against the core Splunk product, and with the right sizing (think: small, medium) we may not need Splunk at all; the ELK stack might be good enough. Though in recent usage, we have found some scalability issues once we reach a few hundred gigs of logs per day.


One good feature I like about the ELK stack, though, is that it is all-in-one: I have my log aggregator, search indexer, and dashboard within one suite of applications.



With so many choices, it becomes difficult to settle on one or the other. If you have enough money to spend, Splunk might be the right choice, but if you can throw a developer at it, either the ELK stack or Kafka - depending on the scale at which you are growing - might be the better option.