Saturday, August 15, 2015

Amazon Web Services (AWS) Risk and Compliance

This is a summary of the AWS Risk and Compliance whitepaper.

AWS publishes a SOC 1 report - formerly known as the Statement on Auditing Standards (SAS) No. 70 Service Organization report - a widely recognized auditing standard developed by the AICPA (American Institute of Certified Public Accountants).

The SOC 1 audit is an in-depth audit of the design and operating effectiveness of AWS's defined control objectives and control activities.

Type II means that each of the controls described in the report is not only evaluated for adequacy of design, but is also tested for operating effectiveness by the external auditor.

With its ISO 27001 certification, AWS complies with a broad, comprehensive security standard and follows best practices for maintaining a secure environment.

With the PCI Data Security Standard (PCI DSS), AWS complies with a set of controls important to companies that handle credit card information.

Through its compliance with FISMA standards, AWS meets a wide range of specific control requirements set by US government agencies.

Risk Management:
AWS management has developed a strategic business plan which includes risk identification and the implementation of controls to mitigate and manage risks. Based on my understanding, AWS management re-evaluates those plans at least twice a year.

Also, the AWS compliance team has adopted various information security and compliance frameworks, including but not limited to COBIT, ISO 27001/27002, the AICPA Trust Services Principles, NIST 800-53, and PCI DSS v3.1.

Additionally, AWS regularly scans all of its Internet-facing services for possible vulnerabilities and notifies the parties involved in remediation. External penetration tests (vulnerability assessments) are also performed by reputable independent firms, and the reports are shared with AWS management.

Reports/Certifications:

FedRAMP: AWS is a Federal Risk and Authorization Management Program (FedRAMP) compliant cloud service provider.

FIPS 140-2: The Federal Information Processing Standard (FIPS) Publication 140-2 is a US government security standard that specifies the security requirements for cryptographic modules protecting sensitive information. AWS operates GovCloud (US) with FIPS 140-2 validated hardware.

FISMA and DIACAP:
To allow US government agencies to comply with FISMA (the Federal Information Security Management Act), the AWS infrastructure has been evaluated by independent assessors for a variety of government systems as part of their system owners' approval processes.
Many agencies have successfully achieved security authorizations for systems hosted on AWS in accordance with the Risk Management Framework (RMF) process defined in NIST 800-37 and the DoD Information Assurance Certification and Accreditation Process (DIACAP).

HIPAA:
AWS enables covered entities that need to comply with the US Health Insurance Portability and Accountability Act (HIPAA) to leverage the secure AWS environment to process, maintain, and store protected health information.

ISO 9001:
AWS has achieved ISO 9001 certification to directly support customers who develop, migrate, and operate their quality-controlled IT systems in the AWS cloud. This allows customers to use AWS's compliance reports as evidence for their own ISO 9001 programs and for industry-specific quality programs such as ISO/TS 16949 in the automotive sector, ISO 13485 for medical devices, GxP in life sciences, and AS9100 in aerospace.

ISO 27001:
AWS has achieved ISO 27001 certification of its Information Security Management System (ISMS) covering AWS infrastructure, data centers, and multiple cloud services.

ITAR:
AWS GovCloud (US) supports US International Traffic in Arms Regulations (ITAR) compliance. Companies subject to ITAR export regulations must control unintended exports by restricting access to protected data to US persons and restricting the physical location of that data to the US. AWS GovCloud (US) provides these controls and meets the applicable compliance requirements.

PCI DSS Level 1:
AWS is a Level 1 compliant service provider under PCI DSS (Payment Card Industry Data Security Standard). Based on the February 2013 guidelines from the PCI Security Standards Council, AWS incorporated those guidelines into the AWS PCI Compliance Package for customers. The package includes the AWS PCI Attestation of Compliance (AoC), which shows that AWS has been successfully validated against the standard applicable to a Level 1 service provider under PCI DSS version 3.1.

SOC1/SOC2/SOC3:
AWS publishes a Service Organization Controls 1 (SOC 1), Type II report. The audit for this report is conducted in accordance with AICPA AT 801 (formerly SSAE 16) and the International Standards for Assurance Engagements No. 3402 (ISAE 3402).

This dual-standard report is intended to meet a broad range of financial auditing requirements of US and international auditing bodies.

In addition to SOC 1, AWS also publishes a SOC 2, Type II report, which expands the evaluation of controls to the criteria set forth by the AICPA Trust Services Principles. These principles define leading-practice controls relevant to security, availability, processing integrity, confidentiality, and privacy applicable to service organizations such as AWS.

The SOC 3 report is a publicly available summary of the AWS SOC 2 report. It includes the external auditor's opinion on the operation of controls (based on the AICPA Security Trust Principles included in the SOC 2 report), the assertion from AWS management regarding the effectiveness of controls, and an overview of AWS infrastructure and services.



Friday, July 10, 2015

Cloud Principles

This post explains some of the cloud principles to utilize when working with Amazon Web Services. Though the references here are to AWS services, the principles can be applied across multiple clouds.

Principles:
- Design for failure and nothing will fail:
  • What happens if a node in your system fails? How do you recognize that failure? How do you replace that node? What kinds of scenarios do you have to plan for?
  • What are your single points of failure? If a load balancer is sitting in front of an array of application servers, what if that load balancer fails?
  • If there are masters and slaves in your architecture, what if the master node fails? How does the failover occur, and how is a new slave instantiated and brought into sync with the master?
  • What happens to your application if a dependent service changes its interface?
  • What if a downstream service times out or returns an exception?
  • What if the cache keys grow beyond the memory limit of an instance?

Best Practices:
  1. Fail over gracefully using Elastic IPs: an Elastic IP is a static IP address that is dynamically re-mappable. You can quickly remap it and fail over to another set of servers so that your traffic is routed to the new servers. This works well when upgrading from old to new versions or in the case of hardware failures (a sketch follows this list).
  2. Utilize multiple Availability Zones: Availability Zones are conceptually like logical data centers. By deploying your architecture to multiple Availability Zones, you can ensure high availability. Utilize the Amazon RDS Multi-AZ deployment functionality to automatically replicate database updates across multiple Availability Zones.
  3. Maintain an Amazon Machine Image so that you can restore and clone environments easily in a different Availability Zone; maintain multiple database slaves across Availability Zones and set up hot replication.
  4. Utilize Amazon CloudWatch (or various real-time open-source monitoring tools) to get more visibility and take appropriate action in case of hardware failure or performance degradation. Set up an Auto Scaling group to maintain a fixed fleet size so that it replaces unhealthy Amazon EC2 instances with new ones.
  5. Utilize Amazon EBS and set up cron jobs so that incremental snapshots are automatically uploaded to Amazon S3 and data is persisted independent of your instances.
  6. Utilize Amazon RDS and set the retention period for backups, so that it can perform automated backups.
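A minimal sketch of practice 1, remapping an Elastic IP to a standby server with boto3 (the allocation ID and instance ID are hypothetical placeholders):

    import boto3

    ec2 = boto3.client('ec2', region_name='us-east-1')

    # Hypothetical IDs -- replace with your own Elastic IP allocation and standby instance.
    ALLOCATION_ID = 'eipalloc-12345678'
    STANDBY_INSTANCE_ID = 'i-0abcdef1234567890'

    def fail_over_elastic_ip():
        # Remap the Elastic IP to the standby instance so traffic shifts immediately.
        ec2.associate_address(
            AllocationId=ALLOCATION_ID,
            InstanceId=STANDBY_INSTANCE_ID,
            AllowReassociation=True  # take the EIP even if it is still attached to the failed instance
        )

    if __name__ == '__main__':
        fail_over_elastic_ip()

The same call can be wired into a health-check script or a monitoring-driven action so the remap happens without manual intervention.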
- Decouple your components:
The more loosely coupled the components of your system are, the bigger and better it scales.
  • Which business component or feature could be isolated from the current monolithic application and run standalone?
  • How can I add more instances of that component without breaking my current system, while serving more users at the same time?
  • How much effort will it take to encapsulate the component so that it can interact with other components asynchronously?
Best Practices:
  1. Use Amazon SQS to isolate components (see the sketch after this list)
  2. Use Amazon SQS as a buffer between components
  3. Design every component so that it exposes a service interface, is responsible for its own scalability in all appropriate dimensions, and interacts with other components asynchronously
  4. Bundle the logical construct of a component into an Amazon Machine Image so that it can be deployed more often
  5. Make your applications as stateless as possible. Store session state outside of the component (in Amazon SimpleDB, if appropriate)
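As a sketch of practices 1-3, the snippet below uses Amazon SQS as the buffer between a producer and a worker component; the queue name and the process() helper are hypothetical:

    import boto3

    sqs = boto3.resource('sqs', region_name='us-east-1')
    # Hypothetical queue acting as the buffer between the two components.
    queue = sqs.create_queue(QueueName='image-resize-jobs')

    def submit_job(s3_key):
        # Producer: enqueue work instead of calling the worker component directly.
        queue.send_message(MessageBody=s3_key)

    def worker_loop():
        # Consumer: poll the queue, process, then delete -- the components never talk to each other directly.
        while True:
            for message in queue.receive_messages(WaitTimeSeconds=20, MaxNumberOfMessages=10):
                process(message.body)   # hypothetical processing function
                message.delete()

Either side can be scaled independently (more producers or more workers) without the other noticing, which is the point of the buffer.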
- Implement elasticity:
  1. Proactive Cyclic Scaling: periodic scaling that occurs at a fixed interval (daily, weekly, monthly, quarterly)
  2. Proactive Event-based Scaling: scaling just when you are expecting a big surge of traffic due to a scheduled business event (new product launch, marketing campaign)
  3. Auto-scaling based on demand: by using a monitoring service, your system can send triggers to take appropriate actions so that it scales up or down based on metrics (utilization of the servers or network I/O, for instance) - a sketch follows this list
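A hedged sketch of demand-based auto-scaling: a simple scaling policy plus a CloudWatch alarm that triggers it when CPU stays high (the group name and thresholds are illustrative assumptions):

    import boto3

    autoscaling = boto3.client('autoscaling')
    cloudwatch = boto3.client('cloudwatch')

    GROUP = 'web-asg'   # hypothetical Auto Scaling group name

    # Scaling policy: add one instance each time the alarm fires.
    policy = autoscaling.put_scaling_policy(
        AutoScalingGroupName=GROUP,
        PolicyName='scale-out-on-cpu',
        AdjustmentType='ChangeInCapacity',
        ScalingAdjustment=1,
        Cooldown=300
    )

    # CloudWatch alarm: fire the policy when average CPU stays above 70% for two 5-minute periods.
    cloudwatch.put_metric_alarm(
        AlarmName='web-asg-high-cpu',
        Namespace='AWS/EC2',
        MetricName='CPUUtilization',
        Dimensions=[{'Name': 'AutoScalingGroupName', 'Value': GROUP}],
        Statistic='Average',
        Period=300,
        EvaluationPeriods=2,
        Threshold=70.0,
        ComparisonOperator='GreaterThanThreshold',
        AlarmActions=[policy['PolicyARN']]
    )

A mirror-image policy and alarm (negative adjustment, low-CPU threshold) would handle scaling back in.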
Automate Your Infrastructure
  • Create a library of “recipes” – small frequently-used scripts (for installation and configuration)
  • Manage the configuration and deployment process using agents bundled inside an AMI 
  • Bootstrap your instances
Bootstrap Your Instances
  1. Recreate the (dev, staging, production) environment with a few clicks and minimal effort
  2. More control over your abstract, cloud-based resources
  3. Reduce human-induced deployment errors
  4. Create a self-healing and self-discoverable environment which is more resilient to hardware failure
Best Practices:

  1. Define Auto Scaling groups for different clusters using the Auto Scaling feature of Amazon EC2.
  2. Monitor your system metrics (CPU, memory, disk I/O, network I/O) using Amazon CloudWatch and take appropriate actions (such as launching new AMIs dynamically using the Auto Scaling service) or send notifications.
  3. Store and retrieve machine configuration information dynamically: utilize Amazon DynamoDB to fetch configuration data during boot time of an instance (e.g., database connection strings). SimpleDB may also be used to store information about an instance such as its IP address, machine name, and role (see the bootstrap sketch after this list).
  4. Design your build process so that it dumps the latest builds to a bucket in Amazon S3; download the latest version of the application from that bucket during system startup.
  5. Invest in building resource management tools (automated scripts, pre-configured images) or use smart open-source configuration management tools like Chef, Puppet, CFEngine, or Genome.
  6. Bundle a Just Enough Operating System (JeOS) and your software dependencies into an Amazon Machine Image so that it is easier to manage and maintain. Pass configuration files or parameters at launch time and retrieve user data and instance metadata after launch.
  7. Reduce bundling and launch time by booting from Amazon EBS volumes and attaching multiple Amazon EBS volumes to an instance. Create snapshots of common volumes and share snapshots among accounts wherever appropriate.
  8. Application components should not assume the health or location of the hardware they are running on. For example, dynamically attach the IP address of a new node to the cluster. Automatically fail over and start a new clone in case of a failure.
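A minimal bootstrap sketch covering practices 3 and 4; the DynamoDB table, S3 bucket/key, and the use of user data to carry the instance's role are all illustrative assumptions:

    import boto3
    import urllib2  # Python 2, typical of 2015-era Amazon Linux; use urllib.request on Python 3

    CONFIG_TABLE = 'instance-config'            # hypothetical DynamoDB table keyed by role
    BUILD_BUCKET = 'my-app-builds'              # hypothetical build bucket
    BUILD_KEY = 'releases/latest/app.tar.gz'    # hypothetical build artifact

    def instance_role():
        # Read the role name this instance was launched with from EC2 user data.
        return urllib2.urlopen('http://169.254.169.254/latest/user-data').read().strip()

    def bootstrap():
        dynamodb = boto3.resource('dynamodb')
        s3 = boto3.client('s3')

        # Practice 3: pull configuration (e.g. database connection strings) from DynamoDB at boot time.
        config = dynamodb.Table(CONFIG_TABLE).get_item(Key={'role': instance_role()})['Item']

        # Practice 4: download the latest build from S3 during system startup.
        s3.download_file(BUILD_BUCKET, BUILD_KEY, '/opt/app/app.tar.gz')
        return config

Run from an init or user-data script, this keeps the AMI generic and moves everything environment-specific into data fetched at launch.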

- Think Parallel: The cloud makes parallelization effortless.
Best Practices:
  1. Multi-thread your Amazon S3 requests (a sketch follows this list)
  2. Multi-thread your Amazon SimpleDB GET and BATCHPUT requests
  3. Create a JobFlow using the Amazon Elastic MapReduce service for each of your daily batch processes (indexing, log analysis, etc.), which will compute the job in parallel and save time.
  4. Use the Elastic Load Balancing service and spread your load across multiple web/app servers dynamically
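A sketch of practice 1: fanning S3 uploads out over a thread pool (the bucket name is a hypothetical placeholder; boto3 clients are generally safe to share across threads):

    import boto3
    from concurrent.futures import ThreadPoolExecutor

    s3 = boto3.client('s3')
    BUCKET = 'my-log-archive'   # hypothetical bucket name

    def upload(path):
        # Each worker thread issues its own PUT; S3 handles the parallel requests independently.
        s3.upload_file(path, BUCKET, path)

    def upload_all(paths):
        with ThreadPoolExecutor(max_workers=16) as pool:
            list(pool.map(upload, paths))

The same pattern applies to parallel GETs or to SimpleDB/DynamoDB batch operations.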
- Keep Dynamic Data close to Compute and Static Data close to End User:
Best Practices:
  1. Ship your data drives to Amazon using the Import/Export service. It may be cheaper and faster to move large amounts of data using the sneakernet than to upload it over the Internet.
  2. Utilize the same Availability Zone to launch a cluster of machines
  3. Create a distribution for your Amazon S3 bucket and let Amazon CloudFront cache the content from that bucket across its edge locations around the world





Sunday, July 5, 2015

Amazon Web Services (AWS) Security - an outside view

AWS and Security - A view from outside

Shared Responsibility Model:
     - AWS is responsible for security of the cloud (facilities, hardware, network, virtualization layer); the customer is responsible for security in the cloud (guest OS, applications, data)

Secure SDLC
     - static code analysis run as a part of build process
     - threat modeling
MFA
     - Google authenticator/RSA
MFA for AWS service API
     - terminating EC2 instance
     - sensitive data in S3 bucket
Security of Access Keys
     - must be secured 
     - use IAM roles for EC2 management
Enable CloudTrail
Run Trusted Advisor
EC2:
- encrypted file systems
- disabling password-only access to your guests, 
- utilizing some form of multi-factor authentication to gain access to instances (or at a minimum certificate-based SSH Version 2 access).
- privilege escalation mechanism with logging on a per-user basis.
- utilize certificate-based SSHv2 to access the virtual instance,
- disable remote root login,
- use command-line logging, 
- use ‘sudo’ for privilege escalation.
- generate your own key pairs in order to keep full control of the private keys (rather than having AWS generate them)
Firewall
- ports which are required
- certain CIDR blocks
- think about IPTables
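A small boto3 sketch of the first two points: open only the required ports, and only to known CIDR blocks (the security group ID and office range are hypothetical):

    import boto3

    ec2 = boto3.client('ec2')

    SG_ID = 'sg-0123456789abcdef0'   # hypothetical security group
    OFFICE_CIDR = '203.0.113.0/24'   # hypothetical trusted address range

    ec2.authorize_security_group_ingress(
        GroupId=SG_ID,
        IpPermissions=[
            {'IpProtocol': 'tcp', 'FromPort': 443, 'ToPort': 443,
             'IpRanges': [{'CidrIp': '0.0.0.0/0'}]},       # HTTPS open to the world
            {'IpProtocol': 'tcp', 'FromPort': 22, 'ToPort': 22,
             'IpRanges': [{'CidrIp': OFFICE_CIDR}]},       # SSH restricted to the office range
        ]
    )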
EBS
- encrypt volume
- use DoD methods to wipe volume before deleting
ELB
- any particular cipher to use? for PCI/SOX compliance?
- use Server Order preference
- use of Perfect Forward Secrecy
VPC
- VPC security group
- IP range, Internet gateway, virtual private gateway
- Need Secret Access Key of the account
- To consider subnet and route tables
- To consider firewall/security groups
- Network ACLs:  inbound/outbound from a subnet within VPC
- ENI: Elastic Network Interface for management network / security appliance on network

CloudFront:
- By default, you can deliver content to viewers over HTTPS by using https://dxxxxx.cloudfront.net/image.jpg. If you want to deliver your content over HTTPS using your own domain name and your own SSL certificate, you can use SNI Custom SSL or Dedicated IP Custom SSL.
- With Server Name Indication (SNI) Custom SSL, CloudFront relies on the SNI extension of the TLS protocol, which is supported by most modern browsers.
- With Dedicated IP Custom SSL, CloudFront dedicates IP addresses to your SSL certificate at each CloudFront edge location so that CloudFront can associate the incoming requests with the proper SSL certificate.
S3 security:
- Use IAM policies
- Use of ACL to grant read/write access to other AWS account users
- Bucket policies: add or deny permissions across some or all of the objects within a single bucket
- Restrict access to specific resources using POLICY KEYS: based on request time (date condition), whether the request was sent using SSL (Boolean condition), the requester's IP address (IP condition), or the requester's client (string condition) - see the sketch at the end of this list
- Use SSL endpoint for S3 via internet or via EC2
- Use client encryption library
- Use server side encryption (SSE)- S3 managed encryption
- S3 metadata not encrypted
- S3 data to Glacier archival at regular frequency
- S3 delete control via MFA (MFA Delete)
- CORS (cross-origin resource sharing): allows S3 resources to be referenced from web pages served from other domains, which browsers would otherwise block under the same-origin policy
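To illustrate the "policy keys" idea above, here is a hedged sketch of a bucket policy, applied with boto3, that denies any request not sent over SSL and allows reads only from a known IP range (the bucket name and CIDR are hypothetical):

    import boto3
    import json

    BUCKET = 'example-sensitive-bucket'   # hypothetical bucket name

    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {   # Boolean condition: refuse any request not sent over SSL
                "Sid": "DenyNonSSL",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": "arn:aws:s3:::%s/*" % BUCKET,
                "Condition": {"Bool": {"aws:SecureTransport": "false"}}
            },
            {   # IP condition: allow reads only from a known address range
                "Sid": "AllowReadFromOffice",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::%s/*" % BUCKET,
                "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}}
            }
        ]
    }

    boto3.client('s3').put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))

Date and string conditions from the list above slot into the same "Condition" block.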
DynamoDB:
- DynamoDB resources and API permissions via IAM
- Database-level permissions that allow/deny access at the item (row) and attribute (column) level
- Fine-grained access control allows you to specify, via policy, under what circumstances a user or application can access a DynamoDB table (see the sketch after this list).
- An IAM policy can restrict access to individual items in a table, to attributes in those items, or both.
- Allow Web Identity Federation via AWS STS (Security Token Service) instead of creating IAM users.
- Each request sent to DynamoDB must contain an HMAC-SHA256 signature in the request header.
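A sketch of fine-grained access control: an IAM policy document (shown here as a Python dict) that lets a federated user read only the items whose partition key is their own user ID, and only a few attributes of those items. The table name, account ID, and the ${www.amazon.com:user_id} variable (which assumes Login with Amazon web identity federation) are illustrative:

    fine_grained_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/UserPreferences",
            "Condition": {
                "ForAllValues:StringEquals": {
                    # Row-level restriction: partition key must equal the caller's federated user ID
                    "dynamodb:LeadingKeys": ["${www.amazon.com:user_id}"],
                    # Column-level restriction: only these attributes may be requested
                    "dynamodb:Attributes": ["UserId", "Theme", "Language"]
                },
                "StringEqualsIfExists": {"dynamodb:Select": "SPECIFIC_ATTRIBUTES"}
            }
        }]
    }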
Amazon RDS:
- Access Control: master user account and password; create additional user accounts; DB security groups - similar to EC2 security groups - default to "deny all". Access can be granted by opening the database port in the firewall to a network IP range or to an EC2 security group.
- Using IAM further granular access can be granted.
- Network isolation in multi-AZ deployments using DB subnet groups.
- An RDS instance in a VPC can be accessed from EC2 instances outside the VPC using an SSH bastion host and an Internet gateway.
- Encryption in transit is available for RDS: an SSL certificate is installed on MySQL and SQL Server instances, so the application-to-database connection can be secured.
- Encryption at rest is supported via TDE (Transparent Data Encryption) for SQL Server and Oracle Enterprise Edition.
- Encryption at rest is not supported natively for MySQL; the application must encrypt data before writing it if data-at-rest encryption is required.
- Point-in-time recovery via automated backups, with database and transaction logs stored for a user-specified retention period.
- Restore to any point up to roughly the last five minutes; backups can be retained for up to 35 days.
- During backups, storage I/O is briefly suspended, but with a multi-AZ deployment the backup is taken from the standby, so there is no performance impact on the primary.
AWS RedShift:
- The cluster is closed to everyone by default.
- Utilize security groups for network access to the cluster.
- Database user permissions are granted per cluster rather than per table. However, a user can see only the data in table rows generated by their own activities; rows generated by other users are not visible.
- The user who creates an object is its owner, and only the owner or a superuser can query, modify, or grant permissions on the object.
- Redshift data is spread across multiple compute nodes in a cluster. Snapshot backups are uploaded to S3 at a user-defined period.
- Four-tier Key Based architecture:
  • Data Encryption Keys: Encrypts Data Blocks in Cluster
  • Database Key: Encrypts Data Encryption Keys in Cluster
  • Cluster Key: Encrypts Database Keys in Cluster. Use AWS or HSM to store the cluster key.
  • Master Key: Encrypts Cluster Key, if stored in AWS. Encrypts the Cluster-Key-Encrypted-Database-Key if Cluster key is in HSM.

- RedShift uses Hardware-Accelerated SSL
- Offers strong cipher suites that use the Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) key exchange, which provides Perfect Forward Secrecy (PFS).
AWS ElastiCache:
- Cache Security group like firewall
- By default, network access is turned off
- Use the AuthorizeCacheSecurityGroupIngress API/CLI to authorize an EC2 security group (which in turn allows its EC2 instances) - see the sketch below
- Backups/snapshots of an ElastiCache Redis cluster can be taken point-in-time or on a schedule.
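A hedged sketch of opening the cache to a single EC2 security group (the names and account ID are hypothetical placeholders):

    import boto3

    elasticache = boto3.client('elasticache')

    # Allow instances in the application's EC2 security group to reach the cache; nothing else is permitted.
    elasticache.authorize_cache_security_group_ingress(
        CacheSecurityGroupName='app-cache-sg',
        EC2SecurityGroupName='app-server-sg',
        EC2SecurityGroupOwnerId='123456789012'
    )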
AWS CloudSearch:
- Access to search domain’s endpoint is restricted by IP address so that only authorized hosts can submit documents and send search requests. 
- IP address authorization is used only to control access to the document and search endpoints.
AWS SQS:
- Access is based on an AWS account or IAM user; once authenticated, the user has full access to all user operations.
- By default, access to an individual queue is restricted to the AWS account that created it.
- Data stored in SQS is not encrypted by AWS, but it can be encrypted/decrypted by the application.
AWS SNS:
- Amazon SNS delivers notifications to clients using a “push” mechanism that eliminates the need to periodically check or “poll” for new information and updates. Amazon SNS can be leveraged to build highly reliable, event-driven workflows and messaging applications without the need for complex middleware and application management. The potential uses for Amazon SNS include monitoring applications, workflow systems, time-sensitive information updates, mobile applications, and many others.
- SNS provides access control mechanisms so that topics and messages are secured against unauthorized access.
- Topic owners can set policies on who can publish/subscribe to a topic.
AWS SWF:
- Access is granted based on an AWS account/IAM user. 
- Actors that participate in the execution of a workflow - deciders, activity workers, workflow administrators - must be IAM users under the AWS account that owns the SWF resources. Other AWS accounts can't be granted access to SWF workflows.
AWS SES:
- AWS SES requires users to verify their email address or domain in order to confirm that they own it and to prevent others from using it. To verify a domain, Amazon SES requires the sender to publish a DNS record that Amazon SES supplies as proof of control over the domain. 
- SES uses content-filtering technologies to help detect and block messages containing viruses or malware before they can be sent.
- SES maintains complaint feedback loops with major ISPs.
- SES supports authentication mechanisms such as Sender Policy Framework (SPF) and DomainKeys Identified Mail (DKIM). When you authenticate an email, you provide evidence to ISPs that you own the domain. 
- For SES over SMTP, the connection must be encrypted using TLS; supported mechanisms are STARTTLS and TLS Wrapper.
- For SES over HTTP, communication will be protected by TLS through AWS SES’s HTTPS endpoint.
AWS Kinesis:
- Logical access to Kinesis is via AWS IAM, controlling which Kinesis operations users have permission to perform. 
- By associating an EC2 instance with an IAM role, the role's temporary credentials are made available to applications running on that instance, avoiding the need to distribute long-term AWS security credentials.

AWS IAM:
- Allows you to create multiple users and manage the permissions for each user within an AWS account.
- User permissions must be granted explicitly.
- IAM is integrated with the AWS Marketplace to control software subscriptions, usage, and cost.
- A role uses temporary security credentials to delegate access to users or services that normally don't have access to AWS resources.
- Temporary security credentials have a short lifespan (default 12 hours) and can't be reused after they expire.
- Temporary security credentials consist of a security (session) token, an access key ID, and a secret access key.
- Useful in situations such as:
  • Federated (non-AWS) user access:
    • Identity federation between AWS and non-AWS users managed in a corporate identity and authorization system.
    • Using SAML, with AWS as the service provider, users get federated single sign-on (SSO) to the AWS Management Console or federated access to call AWS APIs.
  • Cross-account access: for organizations that use multiple AWS accounts to manage their resources, a role can give users who have permissions in one account access to resources in another account (see the sketch below).
  • Applications running on EC2 instances that need to access AWS resources: if an EC2 instance needs to make calls to S3 or DynamoDB, it can use a role, which simplifies credential management for large fleets of instances and Auto Scaling.
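A sketch of the cross-account case: assume a role in another account through STS and use the returned temporary credentials (the account ID and role name are hypothetical):

    import boto3

    sts = boto3.client('sts')

    # Hypothetical role in another account whose trust policy allows this account to assume it.
    resp = sts.assume_role(
        RoleArn='arn:aws:iam::210987654321:role/ReadOnlyAuditor',
        RoleSessionName='audit-session',
        DurationSeconds=3600            # temporary credentials expire after an hour
    )

    creds = resp['Credentials']         # AccessKeyId, SecretAccessKey, SessionToken, Expiration

    # Use the temporary credentials to act in the other account.
    s3 = boto3.client(
        's3',
        aws_access_key_id=creds['AccessKeyId'],
        aws_secret_access_key=creds['SecretAccessKey'],
        aws_session_token=creds['SessionToken']
    )
    print(s3.list_buckets()['Buckets'])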

AWS CloudHSM:
- A dedicated Hardware Security Module (HSM) appliance that provides secure cryptographic key storage and operations within an intrusion-resistant, tamper-evident device.
- Supports a variety of use cases such as database encryption, Digital Rights Management (DRM), Public Key Infrastructure (PKI), authentication and authorization, document signing, and transaction processing.
- Supports some of the strongest cryptographic algorithms available: AES, RSA, ECC, etc.
- Connections to CloudHSM from EC2 instances and VPCs use SSL/TLS with two-way digital certificate authentication.
- A cryptographic partition is a logical and physical security boundary that restricts access to keys, so only the owner of the keys can control them and perform operations on the HSM.
- CloudHSM's tamper detection erases the cryptographic key material and generates event logs if tampering (physical or logical) is detected. After three unsuccessful attempts to access an HSM partition with admin credentials, the HSM appliance erases that partition.

CloudTrail:
- Once CloudTrail is enabled, it delivers events to your S3 bucket roughly every 5 minutes. Data captured: information about every API call and where that call came from (console, CLI, or SDK); console sign-in events are also captured, creating a log record every time the AWS account owner, a federated user, or an IAM user signs in (a sketch for enabling a trail follows these notes).
- CloudTrail access can be limited to only certain users via IAM.
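A minimal sketch of turning CloudTrail on programmatically; the trail and bucket names are hypothetical, and the bucket's policy must already allow CloudTrail to write to it:

    import boto3

    cloudtrail = boto3.client('cloudtrail')

    cloudtrail.create_trail(Name='account-audit-trail', S3BucketName='my-cloudtrail-logs')
    cloudtrail.start_logging(Name='account-audit-trail')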


Monday, March 2, 2015

XEN security bug may be forcing AWS, RackSpace to reboot their servers

The Xen security team reported a bug where a buggy or malicious HVM guest can crash the host or read data belonging to other guests or to the hypervisor itself. This poses a significant security risk in public cloud environments - such as Amazon Web Services or Rackspace Public Cloud - where Xen is used as the hypervisor for guest VMs.

That is probably the reason for the AWS reboots across regions reported last week.


Saturday, February 28, 2015

Risk Management Models

There are various risk management models in use; some of them are discussed here:

General Risk Management Model: 


This five-step general risk management model can be used in virtually any risk management process:

Step 1: Asset Identification

Identify and classify the assets, systems, and processes that need protection because they are vulnerable to threats. 

Step 2: Threat Assessment

After identifying assets, you identify both the threats and the vulnerabilities associated with each asset and the likelihood of their occurrence. All things have vulnerabilities; one of the keys is to focus on exploitable vulnerabilities. Useful lists include CWE (from mitre.org), the SANS Top 25, and the OWASP Top 10.

Step 3: Impact Determination and Quantification:

An impact is the loss created when a threat is realized and exploits a vulnerability. A tangible impact results in financial loss or physical damage. For an intangible impact, such as damage to a company's reputation, assigning a financial value can be difficult.
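As an illustration of quantification (not part of the model itself), a common approach is annualized loss expectancy; the figures below are made up:

    # Illustrative quantification of a tangible impact (all values are assumptions):
    asset_value = 250000.0            # value of the asset at risk, in dollars
    exposure_factor = 0.40            # fraction of the asset value lost if the threat is realized
    annual_rate_of_occurrence = 0.5   # expected occurrences per year (once every two years)

    single_loss_expectancy = asset_value * exposure_factor                             # SLE = 100,000
    annualized_loss_expectancy = single_loss_expectancy * annual_rate_of_occurrence    # ALE = 50,000
    print(single_loss_expectancy, annualized_loss_expectancy)

The ALE gives a rough ceiling on what it is worth spending per year on controls for that particular risk.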

Step 4: Control Design and Evaluation:

Determine the controls (also called countermeasures or safeguards) to put in place to mitigate the risks. A list of security controls can be found in NIST SP 800-53.

Step 5: Residual Risk Management:

The risk that remains after implementing controls is termed residual risk. Multiple controls can be applied to achieve a better defense posture through defense in depth.

Software Engineering Institute Model:


1. Identify:

Examine the system, enumerating potential risks.

2. Analyze:

Convert the risk data gathered into information that can be used to make decisions. Evaluate the impact, probability, and timeframe of the risk. Classify and prioritize each of the risks.

3. Plan: 

Review and evaluate the risks and decide what actions to take to mitigate them. Implement the plan.

4. Track:

Monitor the risks and the mitigation plans. Review periodically to measure progress and identify new risks.

5. Control:

Make corrections for deviations from risk mitigation plans. Changes in business procedures may require adjustments in plans or actions, as do faulty plans and risks that become problems.






Tuesday, February 17, 2015

Security Models

Security models are used to understand the systems and processes developed to enforce security principles. There are three key elements that play a role in model implementation:

  • People
  • Processes
  • Technology

Various models discussed here are:

Access Control Models: 

Various access control models provide different aspects of protection, but the Access Control List (ACL) is the most commonly used. An ACL is a list of the subjects that have access rights to a particular object. An ACL identifies not only the subject, but also the specific access that subject has to the object.

Other access control models include: Discretionary Access Control (DAC), Mandatory Access Control (MAC), Role-Based Access Control (RBAC), and Rule-Based Access Control.

Bell-LaPadula Confidentiality Model:


The Bell-LaPadula security model is a combination of mandatory and discretionary access control mechanisms.

The first principle, known as the Simple Security Rule, states that no subject can read information from an object with a security classification higher than that possessed by the subject itself. This is also referred to as the "no-read-up" rule.

Access levels are therefore arranged in hierarchical form, with defined higher and lower levels of access.

Bell-LaPadula was designed to preserve confidentiality - it is focused on read and write access.

Reading material classified higher than the subject's level is a form of unauthorized access.
(Diagram courtesy: rutgers.edu)

The second principle, known as the *-property (star property), states that a subject can write to an object only if its security classification is less than or equal to the object's security classification.

This is also known as the "no-write-down" principle.

This prevents the dissemination of information to users who do not have the appropriate level of access.

Usage example: preventing data leakage, such as a bank balance being written to a public page.
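A toy sketch of the two Bell-LaPadula rules; the classification levels and labels are made up for illustration:

    # Toy illustration of Bell-LaPadula's two rules.
    LEVELS = {'public': 0, 'confidential': 1, 'secret': 2, 'top_secret': 3}

    def can_read(subject_level, object_level):
        # Simple Security Rule ("no read up"): the subject must dominate the object.
        return LEVELS[subject_level] >= LEVELS[object_level]

    def can_write(subject_level, object_level):
        # *-property ("no write down"): the subject may write only at or above its own level.
        return LEVELS[subject_level] <= LEVELS[object_level]

    assert can_read('secret', 'public')          # reading down is allowed
    assert not can_read('public', 'secret')      # no read up
    assert not can_write('secret', 'public')     # no write down (e.g. a bank balance leaking to a public page)
    assert can_write('confidential', 'secret')   # writing up is allowed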


Take-Grant Model:

  • Built upon Graph Theory
  • Distinct Advantage: Definitively Determine Rights - Unique Rights (take and grant)
(Diagram courtesy: http://clinuxpro.com/wp-content/uploads/2013/10/Take-Grant-Model.png)

  • Its value lies in the ability to analyze whether an implementation is complete or whether it might be capable of leaking information.




Friday, January 2, 2015

Is OpenStack a hype?

As OpenStack matures and more enterprises start adopting it, it's no longer hype. Don't believe me? Come and join me at the OpenStack Summit in May 2015.

The OpenStack Summit is a five-day conference for developers, users, and administrators of OpenStack Cloud Software.