Ensure in-transit and at-rest encryption is enabled for Amazon EMR clusters

Data encryption helps prevent unauthorized users from reading data on a cluster and associated data storage systems. This includes data saved to persistent media, known as data at rest, and data that may be intercepted as it travels the network, known as data in transit.

Risk Level: High
Cloud Entity: EMR Cluster
CloudGuard Rule ID: D9.AWS.CRY.43
Category: Analytics

GSL LOGIC

EmrCluster should have securityConfiguration
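
The rule above passes when a cluster has a security configuration attached. As an illustration only (this is not CloudGuard's implementation), the same check can be sketched in Python against a cluster description. The sketch assumes each cluster is a dictionary shaped like the Cluster object in `aws emr describe-cluster` output, where the attached configuration appears under `SecurityConfiguration`. The function name and the sample cluster IDs are hypothetical.

```python
def has_security_configuration(cluster: dict) -> bool:
    """Return True when the cluster description names a security configuration."""
    # Assumption: the field is called "SecurityConfiguration", as in describe-cluster output.
    name = cluster.get("SecurityConfiguration")
    return isinstance(name, str) and name.strip() != ""


# Hypothetical sample data: one cluster with a configuration, one without.
clusters = [
    {"Id": "j-EXAMPLE1", "SecurityConfiguration": "emrsc_example"},
    {"Id": "j-EXAMPLE2"},
]

non_compliant = [c["Id"] for c in clusters if not has_security_configuration(c)]
print(non_compliant)  # prints ['j-EXAMPLE2']
```

Note that this check only confirms a security configuration is attached; it does not inspect which encryption options that configuration enables.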

REMEDIATION

From Portal
Amazon EMR versions 4.8.0 and later support security configurations, which specify settings for encrypting data at rest, data in transit, or both. When you enable at-rest data encryption, you can choose to encrypt EMRFS data in Amazon S3, data on local disks, or both. Each security configuration that you create is stored in Amazon EMR rather than in the cluster configuration, so you can reuse it to specify data encryption settings whenever you create a cluster.
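
To make the at-rest and in-transit options concrete, here is a minimal Python sketch that reads a security configuration JSON document and reports which kinds of encryption it enables. It assumes the JSON layout used by EMR security configurations (`EncryptionConfiguration`, `EnableAtRestEncryption`, `EnableInTransitEncryption`, and so on). The function name `encryption_summary` and the sample document are hypothetical.

```python
import json


def encryption_summary(security_configuration_json: str) -> dict:
    """Report which encryption options a security configuration enables."""
    config = json.loads(security_configuration_json)
    enc = config.get("EncryptionConfiguration", {})
    at_rest = enc.get("AtRestEncryptionConfiguration", {})
    at_rest_enabled = enc.get("EnableAtRestEncryption", False) is True
    return {
        "in_transit": enc.get("EnableInTransitEncryption", False) is True,
        "at_rest_s3": at_rest_enabled and "S3EncryptionConfiguration" in at_rest,
        "at_rest_local_disk": at_rest_enabled
        and "LocalDiskEncryptionConfiguration" in at_rest,
    }


# Hypothetical sample: at-rest encryption for EMRFS data in S3 only.
example = json.dumps(
    {
        "EncryptionConfiguration": {
            "EnableInTransitEncryption": False,
            "EnableAtRestEncryption": True,
            "AtRestEncryptionConfiguration": {
                "S3EncryptionConfiguration": {"EncryptionMode": "SSE-S3"}
            },
        }
    }
)

summary = encryption_summary(example)
print(summary)
```

For this sample, only `at_rest_s3` is reported as enabled, so the configuration would not satisfy the intent of this rule, which calls for both in-transit and at-rest encryption.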

Follow these steps to create a security configuration in the AWS console:

  1. Open the Amazon EMR console at https://console.aws.amazon.com/emr.
  2. In the navigation pane, choose Security Configurations, Create security configuration.
  3. Type a Name for the security configuration.
  4. Choose options for Encryption and Authentication (see the references below for the available settings), and then choose Create.

From TF

  1. Use the following Terraform code to create the security configuration for EMR clusters.
resource "aws_emr_security_configuration" "foo" {
	name = "emrsc_example"

	# Replace VALUE and KMS_KEY with your own settings.
	# Enabling in-transit encryption usually also requires an
	# InTransitEncryptionConfiguration block (TLS certificate settings); see reference 2.
	configuration = <<EOF
{
	"EncryptionConfiguration": {
		"AtRestEncryptionConfiguration": {
			"S3EncryptionConfiguration": {
				"EncryptionMode": "VALUE"
			},
			"LocalDiskEncryptionConfiguration": {
				"EncryptionKeyProviderType": "VALUE",
				"AwsKmsKey": "KMS_KEY"
			}
		},
		"EnableInTransitEncryption": true,
		"EnableAtRestEncryption": true
	}
}
EOF
}
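  2. Use the following Terraform code to create a new EMR cluster that uses the security configuration.
    Note: the heredoc syntax (<<EOF ... EOF) passes the JSON as-is, so quotes and line breaks do not need to be escaped (for example with \" or \n).
resource "aws_emr_cluster" "cluster" {
	name          = "example_name"
	release_label = "EMR_RELEASE_VERSION"
	applications  = ["Spark"]
	service_role  = "EMR_SERVICE_ROLE" # required argument; placeholder value

	# Attach the security configuration defined by aws_emr_security_configuration.foo
	security_configuration = aws_emr_security_configuration.foo.name

	additional_info = <<EOF
{
	"instanceAwsClientConfiguration": {
		"proxyPort": 8099,
		"proxyHost": "my-proxy.example.com"
	}
}
EOF
}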

From Command Line

  1. Use the create-security-configuration command below to create a security configuration.
    a. For SECURITY_CONFIG_NAME, specify the name of the security configuration. This is the name you specify when you create a cluster that uses this security configuration.
    b. For SEC_CONFIG_DEF, specify an inline JSON structure or the path to a local JSON file, such as file://MySecConfig.json. The JSON parameters define options for Encryption, IAM Roles for EMRFS access to Amazon S3, and Authentication (see the references below).
aws emr create-security-configuration --name SECURITY_CONFIG_NAME --security-configuration SEC_CONFIG_DEF
  2. Use the create-cluster command below to create a new cluster that uses the security configuration.
aws emr create-cluster --release-label EMR_RELEASE_VERSION --instance-type VALUE --instance-count VALUE --security-configuration SECURITY_CONFIG_NAME

Note: If --instance-type is used without the --instance-count parameter, the cluster consists of a single master node running on the EC2 instance type specified. When used together with --instance-count, one instance is used for the master node, and the remainder are used for the core node type.
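
One way to prepare the SEC_CONFIG_DEF file mentioned above is to generate it with a short script. The following Python sketch writes MySecConfig.json with both at-rest and in-transit encryption enabled. It is an illustration under stated assumptions, not a definitive configuration: the function name `build_security_configuration` is hypothetical, the KMS key and the S3 path of the certificate archive are placeholders, and the chosen modes (SSE-S3 for S3, AwsKms for local disks, PEM certificates for TLS) are just one valid combination. See the encryption options reference below for the alternatives.

```python
import json
from pathlib import Path


def build_security_configuration(kms_key_arn: str, certificate_s3_object: str) -> dict:
    """Build a security configuration with at-rest and in-transit encryption enabled."""
    return {
        "EncryptionConfiguration": {
            "EnableInTransitEncryption": True,
            "EnableAtRestEncryption": True,
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": certificate_s3_object,
                }
            },
            "AtRestEncryptionConfiguration": {
                "S3EncryptionConfiguration": {"EncryptionMode": "SSE-S3"},
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": kms_key_arn,
                },
            },
        }
    }


config = build_security_configuration(
    kms_key_arn="KMS_KEY",  # placeholder: replace with your KMS key ARN
    certificate_s3_object="s3://EXAMPLE_BUCKET/my-certs.zip",  # placeholder path
)

# Write the file that can be passed as file://MySecConfig.json
path = Path("MySecConfig.json")
path.write_text(json.dumps(config, indent=2))
```

The resulting file can then be passed to the create-security-configuration command as `--security-configuration file://MySecConfig.json`.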

References

  1. https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-create-security-configuration.html
  2. https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-data-encryption-options.html
  3. https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/emr_security_configuration
  4. https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/emr_cluster
  5. https://awscli.amazonaws.com/v2/documentation/api/latest/reference/emr/create-cluster.html
  6. https://awscli.amazonaws.com/v2/documentation/api/latest/reference/emr/create-security-configuration.html

EMR Cluster

Amazon EMR (previously called Amazon Elastic MapReduce) is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. Using these frameworks and related open-source projects, you can process data for analytics purposes and business intelligence workloads. Amazon EMR also lets you transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

Compliance Frameworks

  • AWS CCPA Framework
  • AWS CloudGuard Best Practices
  • AWS CloudGuard SOC2 based on AICPA TSC 2017
  • AWS HITRUST
  • AWS HITRUST v11.0.0
  • AWS ITSG-33
  • AWS MITRE ATT&CK Framework v10
  • AWS MITRE ATT&CK Framework v11.3
  • AWS PCI-DSS 4.0