
Cloud Identity and Access Management

Learning Guide: Prompt engineering solves "how to say things clearly," while cloud account permission management solves "who can do what." This chapter revolves around one question: In the cloud world, how do you grant access conveniently without handing the keys to the wrong people?

Before you begin, it's recommended to brush up on two fundamentals:


0. Introduction: Why Do People "Step on Landmines" as Soon as They Get on the Cloud?


Many people encounter similar situations when they first start using cloud services:

  • Hard-coding AccessKeys directly in code and committing them to GitHub for convenience;
  • Giving all employees "admin permissions," only to have someone accidentally delete the production database;
  • After a project handover, having no idea who still has former employees' account credentials;
  • Hearing about enabling MFA but putting it off because it seems "too much trouble."

Intuitively, we might think: "These employees lack security awareness."

But most of the time, the problem isn't the people — it's the failure to establish a proper permission management system.


Faced with these challenges, relying on "being more careful" is no longer enough. We need a systematic approach to permission management — and that's exactly what IAM (Identity and Access Management) sets out to solve.


1. What Is IAM/RAM? Starting with the "Access Control System"

1.1 Analogy: A Company's Smart Access Control

Imagine your company moves into a new office building:

| Scenario | Without IAM | With IAM |
|---|---|---|
| New employee onboarding | Give them a master key that opens every door | Give them an access card that only opens doors in their work area |
| Employee departure | The key is simply lost, and no one knows who has it | Immediately revoke their access card in the system; every door now rejects it |
| Contractors | Lend them the key for a few days | Issue a temporary access card set to expire automatically in 3 days |
| Visitors | The front desk hands them a key | Issue a one-time visitor code that only opens the meeting room |

IAM (Identity and Access Management) is like this "smart access control system":

  • Identity: Who? Employee, contractor, visitor, application
  • Access: Which doors can they enter? What operations can they perform?
  • Management: How to issue keys, how to revoke them, how to check records

1.2 AWS IAM vs Alibaba Cloud RAM

IAM and RAM offer the same building blocks: user management, user groups, role assumption, permission policies, identity federation, and access keys. Take user management as an example. An AWS IAM user supports both programmatic and console access, and an Alibaba Cloud RAM user has similar capabilities, including sub-account sign-in.

Core idea: IAM and RAM share the same core concepts. Most differences are terminology and implementation details.

Different cloud providers have their own IAM implementations:

| Cloud provider | Service name | Core concepts |
|---|---|---|
| AWS | IAM (Identity and Access Management) | User, Group, Role, Policy |
| Alibaba Cloud | RAM (Resource Access Management) | User, User Group, Role, Policy |
| Tencent Cloud | CAM (Cloud Access Management) | User, User Group, Role, Policy |
| Huawei Cloud | IAM | User, User Group, Agency, Policy |
| Azure | Microsoft Entra ID (formerly Azure AD) + Azure RBAC | User, Group, Role definition, Role assignment |

Although the names differ, the core concepts are the same:

  • User: Represents a specific person or application
  • Group: Manages permissions for a batch of users
  • Role: Defines a set of permissions that can be "assumed"
  • Policy: Specific permission rules (what is allowed/denied)

2. Users, Groups, Roles: Which One Should You Use?

2.1 Differences Between the Three "Identities"

Figure: Identity provider integration, showing the enterprise SSO login flow: open app → redirect to IdP → user login → issue token → return to app → exchange credentials → access resources. In the first step, the user opens a business system in the browser, and the app detects that no valid session exists.

Core idea: An enterprise IdP centralizes identity management and avoids creating separate accounts on every cloud platform.

Let's use an office scenario as an analogy:

| Concept | Analogy | Use case | Characteristics |
|---|---|---|---|
| User | Full-time employee with their own desk and access card | Long-term, stable team members | Has permanent credentials (password, AK/SK) |
| Group | A department, such as "Engineering" or "Sales" | Managing permissions for many users at once | Cannot log in; just a permission container |
| Role | Temporary visitor pass or contractor card | Temporary authorization, cross-account access | No permanent credentials; obtains temporary credentials when "assumed" |

2.2 Real Case: Permission Evolution at a Startup

Phase 1: Founding Team (2-3 people)

Problem: Using the root account directly to log into the console because it's "easier"
Risk: The root account has all permissions; if compromised, the entire account is ruined

Phase 2: Team Expansion (5-10 people)

Improvement: Create IAM Users for everyone, assign different permissions
Problems:
- Ops engineer Xiao Wang left. Which servers still hold his AK/SK?
- The new frontend dev needs S3 read-only access, the backend dev needs RDS access — configuring each one manually is too tedious

Phase 3: Standardization (10-30 people)

Improvements:
1. Create IAM Groups by role:
   - Developers: S3, EC2, RDS read/write
   - DevOps: Full permissions, but MFA required
   - ReadOnly: View all resources, cannot modify
   - QAs: Test environment resource access

2. Use IAM Roles:
   - EC2 instances use Instance Profiles — no more storing AK/SK on servers
   - Cross-account access via Role Assume — no shared AK/SK
   - CI/CD uses OIDC Federation — no long-term credential storage

Phase 4: Multi-Account / Enterprise (30+ people)

Architecture:
- Management Account (formerly called the master account): Used only for billing and organizational management; no resources are placed here
- Audit Account: Collects logs from all accounts
- Dev Account: Development environment
- Staging Account: Pre-release/testing environment
- Prod Account: Production environment, strictest permissions

Permission Flow:
- Developers have read-only access to the Dev account by default
- To modify production, submit a ticket to request Assume into a temporary Prod Role
- All Assume operations are logged by CloudTrail for periodic auditing

3. Roles and Policies: The "Soul" of Permission Management

3.1 The Essence of a Role: Trust + Permissions

Figure: Roles and policies, showing how policies combine. The example role CrossAccountS3AccessRole is a cross-account access role with three attached policies:
- S3ReadWritePolicy (Allow s3:GetObject, Allow s3:PutObject)
- CloudWatchLogsPolicy
- DenySensitiveData

Core idea: A role can attach multiple policies, and its final permissions are the combined result. An explicit Deny takes priority over any Allow.

An IAM Role has two core components:

  1. Trust Policy: Who can assume this role?
  2. Permission Policy: What can they do after successfully assuming it?

Using a theater performance analogy:

| Concept | Analogy | Explanation |
|---|---|---|
| Role | "Hamlet" in the script | Defines what play to perform (permissions) |
| Trust Policy | The director deciding who can play Hamlet | Could be actors from this troupe (same-account users), actors borrowed from a neighboring troupe (cross-account), or guest stars (an external IdP) |
| Permission Policy | The script content | What Hamlet can do: deliver lines, duel, feign madness (specific permissions) |
| Assume Role | An actor going on stage | Xiao Li is chosen to play Hamlet; once on stage, he has all the permissions defined in the script |
| Temporary Credentials | A performance pass | Xiao Li gets a temporary performance pass that expires after the show |

3.2 Policy: The "Grammar" of Permissions

Figure: Permission hierarchy, showing how scope differs across permission levels:
- Root account: highest account-wide privilege
- IAM admin: full IAM access
- Regular user: limited permissions
- Temporary role: permissions defined by policy
- Service account: API access

The root account belongs to the account owner. It holds all permissions, including full management, billing management, and closing the account.

Core idea: Least privilege means always granting only the minimum permissions required to complete the work.

An IAM Policy is a JSON document that defines "who can do what to which resources."

A Complete Policy Example:

json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowS3ReadWrite",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-app-bucket/*",
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "ap-northeast-1"
        },
        "Bool": {
          "aws:MultiFactorAuthPresent": "true"
        }
      }
    },
    {
      "Sid": "DenySensitiveData",
      "Effect": "Deny",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::my-app-bucket/sensitive/*"
    }
  ]
}

Key Field Explanations:

| Field | Meaning | Example |
|---|---|---|
| Version | Policy syntax version | "2012-10-17" |
| Statement | Array of permission statements; can contain multiple rules | [...] |
| Sid | Optional statement ID used to identify the rule | "AllowS3ReadWrite" |
| Effect | Allow or Deny | "Allow" |
| Action | Operations allowed or denied; supports wildcards | "s3:GetObject", "s3:*" |
| Resource | Target resource, identified by ARN | "arn:aws:s3:::bucket/*" |
| Condition | Optional; the statement applies only when the specified conditions are met | Region restriction, MFA requirement, etc. |

3.3 Permission Priority: Deny > Allow > Default Deny

IAM's permission evaluation logic can be summed up in one sentence: Explicit Deny always wins; no Allow means Deny.

The evaluation flow is as follows:

1. First check if there is a Deny policy
   ├─ Has Deny → Denied (regardless of any Allow)
   └─ No Deny → Continue checking

2. Then check if there is an Allow policy
   ├─ Has Allow → Allowed
   └─ No Allow → Denied (default deny principle)
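The two-step flow above can be sketched in a few lines of Python. This is a simplified model, not AWS's actual evaluator. It assumes identity-based policies only and ignores Condition blocks, resource-based policies, permission boundaries, and SCPs. It also uses fnmatch-style wildcards as a stand-in for IAM's * and ? matching. The helper names (evaluate, _matches, _as_list) are hypothetical.

```python
from fnmatch import fnmatchcase

def _as_list(value):
    # IAM accepts either a single string or a list for Action/Resource
    return [value] if isinstance(value, str) else list(value)

def _matches(statement, action, resource):
    # Action names are case-insensitive in IAM; resource ARNs are compared as written
    action_ok = any(fnmatchcase(action.lower(), pattern.lower())
                    for pattern in _as_list(statement["Action"]))
    resource_ok = any(fnmatchcase(resource, pattern)
                      for pattern in _as_list(statement["Resource"]))
    return action_ok and resource_ok

def evaluate(policies, action, resource):
    """Return "Allow" or "Deny" for a single request against a list of policy documents."""
    statements = [s for policy in policies for s in policy["Statement"]]
    # Step 1: any matching explicit Deny wins, regardless of Allows
    if any(s["Effect"] == "Deny" and _matches(s, action, resource) for s in statements):
        return "Deny"
    # Step 2: a matching Allow grants access
    if any(s["Effect"] == "Allow" and _matches(s, action, resource) for s in statements):
        return "Allow"
    # Nothing matched: the implicit default deny applies
    return "Deny"

developer_policy = {"Version": "2012-10-17", "Statement": [
    {"Effect": "Allow", "Action": ["s3:*"], "Resource": "arn:aws:s3:::company-data/*"}]}
protect_sensitive = {"Version": "2012-10-17", "Statement": [
    {"Effect": "Deny", "Action": ["s3:*"], "Resource": "arn:aws:s3:::company-data/sensitive/*"}]}

policies = [developer_policy, protect_sensitive]
print(evaluate(policies, "s3:GetObject", "arn:aws:s3:::company-data/reports/q1.csv"))     # Allow
print(evaluate(policies, "s3:GetObject", "arn:aws:s3:::company-data/sensitive/keys.txt"))  # Deny
print(evaluate(policies, "ec2:RunInstances", "*"))                                         # Deny
```

The last call shows the default deny at work. No statement mentions EC2, so the request is refused even though no policy denies it explicitly.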

Practical Example: Protecting Sensitive Data

Policy 1: Normal permissions for developers

json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:*"],
    "Resource": "arn:aws:s3:::company-data/*"
  }]
}

Policy 2: Protect sensitive directories (even developers with s3:* cannot access them)

json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Action": ["s3:*"],
    "Resource": "arn:aws:s3:::company-data/sensitive/*"
  }]
}

Key Points:

  • Although developers have s3:* Allow permissions
  • The sensitive directory has an explicit Deny rule
  • Deny takes higher priority, so developers cannot access sensitive data
  • Even if the developer also has AdministratorAccess, this explicit Deny still applies. The root user is not affected, because identity-based policies cannot be attached to it.

4. Access Keys (AK/SK): A "Key" That Needs Careful Handling

4.1 What Are AK/SK?

Figure: Access key management, showing the AK/SK lifecycle. The example is an active key created 45 days ago, with its Access Key ID (AKIAIOSF...), a masked Secret Key, its API call count, and when it was last used.

Core idea: Access key leaks are a common cause of cloud security incidents. Prefer IAM roles, and rotate keys regularly when keys are unavoidable.

Access Keys are long-term credentials provided by cloud services for programmatic API calls. They consist of two parts:

| Component | Purpose | Analogy |
|---|---|---|
| Access Key ID | Identifies who you are (like a username) | Bank card number |
| Secret Access Key | Proves you are who you say you are (like a password) | Bank card PIN |

4.2 Why Are AK/SK "High-Risk Items"?

Real Case: A Startup's Lesson

Xiao Li is a new backend engineer at a startup. In his first week, his task is to debug a file upload feature.

python
# Xiao Li's code (serious security issue!)
import boto3

# Hard-coded AK/SK directly in the code for convenience
s3 = boto3.client(
    's3',
    aws_access_key_id='AKIAIOSFODNN7EXAMPLE',
    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
    region_name='ap-northeast-1'
)

def upload_file(file_path, bucket_name, object_name):
    s3.upload_file(file_path, bucket_name, object_name)
    print(f"File uploaded to s3://{bucket_name}/{object_name}")

# Test upload
upload_file('./test.jpg', 'my-company-bucket', 'uploads/test.jpg')

What Happened a Week Later:

  1. Xiao Li committed the code to GitHub (including AK/SK)
  2. The code on GitHub was scanned by crawlers, and the AK/SK were extracted
  3. Attackers used these credentials to create a large number of EC2 instances in the company account for crypto mining
  4. At the end of the month, the bill arrived: an extra $12,000 in charges
  5. An audit revealed the AK/SK leak, and Xiao Li was called in for a talk...

What Does This Case Teach Us?

| Wrong practice | Correct practice |
|---|---|
| Hard-coding AK/SK in code | Use IAM Roles so the program automatically obtains temporary credentials |
| Committing AK/SK to a Git repository | Use .gitignore to exclude config files; use a secrets management service |
| Using the same AK/SK long-term without rotation | Rotate AK/SK regularly; use temporary credentials instead of long-term ones |
| Assigning excessive permissions to AK/SK | Follow the principle of least privilege; grant only necessary permissions |
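The crawlers in Xiao Li's story work by pattern-matching public code, and you can run the same kind of check before committing. Below is a minimal sketch of such a scanner. It relies on the AWS key ID format: long-term IAM user key IDs start with AKIA, temporary STS key IDs start with ASIA, and both are followed by 16 uppercase letters or digits. The secret-key pattern is only a heuristic, and the function name is hypothetical. Dedicated tools such as GitHub secret scanning or git-secrets are more thorough.

```python
import re

# Access key IDs: 4-character prefix + 16 uppercase letters/digits (20 characters total).
# AKIA = long-term IAM user key, ASIA = temporary STS credential.
ACCESS_KEY_ID = re.compile(r"\b(AKIA|ASIA)[A-Z0-9]{16}\b")
# Heuristic: a 40-character base64-like value assigned to something named like a secret key
SECRET_KEY = re.compile(
    r"(?i)(aws_secret_access_key|secret_access_key|secretkey)\s*[=:]\s*['\"]?([A-Za-z0-9/+]{40})['\"]?"
)

def scan_for_aws_keys(source: str):
    """Return (line_number, finding) pairs for lines that look like they contain AWS keys."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            kind = "long-term" if match.group(1) == "AKIA" else "temporary"
            # Report only a prefix so the scanner itself never prints a full key
            findings.append((lineno, f"{kind} access key ID {match.group(0)[:8]}..."))
        if SECRET_KEY.search(line):
            findings.append((lineno, "possible secret access key"))
    return findings

# Xiao Li's client setup from the example above
code = (
    "s3 = boto3.client(\n"
    "    's3',\n"
    "    aws_access_key_id='AKIAIOSFODNN7EXAMPLE',\n"
    "    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',\n"
    ")\n"
)
for lineno, finding in scan_for_aws_keys(code):
    print(f"line {lineno}: {finding}")
```

Wired into a pre-commit hook, a non-empty result would block the commit before the keys ever reach GitHub.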

4.3 AK/SK Security Best Practices

Scenario 1: Local Development

bash
# Correct approach: configure credentials with the AWS CLI instead of writing them in code
aws configure
# Enter the Access Key ID and Secret Access Key when prompted.
# They are saved in plain text in ~/.aws/credentials, so make sure only you can read the file:
chmod 600 ~/.aws/credentials

python
# No credential configuration needed in code
import boto3
s3 = boto3.client('s3')  # boto3's default credential chain reads ~/.aws/credentials

Scenario 2: Servers / EC2

python
# Correct approach: Use IAM Instance Profile
# 1. Create an IAM Role and attach the needed permissions (e.g., the AmazonS3ReadOnlyAccess managed policy)
# 2. Create an Instance Profile and associate it with this Role
# 3. When launching EC2, select this Instance Profile

# No credentials needed in code at all
import boto3
s3 = boto3.client('s3')  # Automatically obtains temporary credentials from EC2 metadata service

# Temporary credentials auto-rotate — no need to worry about expiration
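boto3 refreshes instance-profile credentials automatically, but it helps to see what it is working with. The EC2 instance metadata service returns JSON that includes AccessKeyId, SecretAccessKey, Token, and an Expiration timestamp. The sketch below shows the kind of "refresh before it expires" check an SDK performs on cached temporary credentials. The 5-minute safety margin and the needs_refresh helper are assumptions for illustration, and the credential values are placeholders.

```python
import json
from datetime import datetime, timedelta, timezone

def parse_expiration(value: str) -> datetime:
    # Metadata-service timestamps look like "2024-05-01T12:00:00Z" (UTC)
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def needs_refresh(creds: dict, now: datetime, margin: timedelta = timedelta(minutes=5)) -> bool:
    """True if the temporary credentials expire within `margin` of `now`."""
    return parse_expiration(creds["Expiration"]) - now <= margin

# Shape of a metadata-service credential response (placeholder values)
sample = json.loads("""{
  "Code": "Success",
  "Type": "AWS-HMAC",
  "AccessKeyId": "ASIAEXAMPLEEXAMPLE00",
  "SecretAccessKey": "placeholder-secret",
  "Token": "placeholder-session-token",
  "Expiration": "2024-05-01T12:00:00Z"
}""")

now = datetime(2024, 5, 1, 11, 0, tzinfo=timezone.utc)
print(needs_refresh(sample, now))                          # False: 60 minutes left
print(needs_refresh(sample, now + timedelta(minutes=57)))  # True: only 3 minutes left
```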

Scenario 3: CI/CD Pipelines

yaml
# Correct approach: Use OIDC Federation (OpenID Connect)
# Example with GitHub Actions:

# 1. Create an OIDC Identity Provider in AWS, trusting GitHub
# 2. Create an IAM Role with a trust policy allowing specific GitHub repos to assume it
# 3. Configure in GitHub Actions

name: Deploy
on: [push]

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write # Critical: allows requesting an OIDC token
      contents: read
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsRole
          aws-region: ap-northeast-1
          # Note: No Access Key here! Entirely using temporary credentials

      - name: Deploy
        run: aws s3 sync ./build s3://my-bucket/
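Step 2 in the comments above carries most of the security: the role's trust policy must pin down which repository, and ideally which branch, may assume it. The Python sketch below builds such a trust policy. It assumes an OIDC identity provider for token.actions.githubusercontent.com already exists in account 123456789012, and it uses a hypothetical repository my-org/my-app whose main branch is the only one allowed to deploy.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"

def github_actions_trust_policy(account_id: str, repo: str, branch: str = "main") -> dict:
    """Trust policy letting one GitHub repository/branch assume the role via OIDC."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                # The token must have been issued for AWS STS...
                "StringEquals": {f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com"},
                # ...and must come from this repository and branch only
                "StringLike": {f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:ref:refs/heads/{branch}"},
            },
        }],
    }

print(json.dumps(github_actions_trust_policy("123456789012", "my-org/my-app"), indent=2))
```

Leaving out the sub condition is a classic mistake. Without it, a workflow in any GitHub repository that can obtain a token for sts.amazonaws.com could assume the role.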

Summary: AK/SK Usage Security Levels

| Security level | Practice | Suitable for | Risk level |
|---|---|---|---|
| Highest | Use IAM Role (no long-term credentials) | EC2, Lambda, ECS, CI/CD | Very low |
| High | Use OIDC Federation | GitHub Actions, GitLab CI | Low |
| Medium | Use a secrets management service | Local development, small teams | Medium |
| Low | Use environment variables | Rapid prototyping, personal projects | High |
| Very low | Hard-code in source code | Not recommended for any scenario | Very high |

5. Multi-Factor Authentication (MFA): Adding a "Lock" to Your Account

5.1 What Is MFA?

Figure: Multi-factor authentication, showing the MFA two-factor sign-in flow: enter password → enter MFA code → success.

Core idea: MFA is one of the most effective protections against account takeover. Microsoft reports that it blocks 99.9% of automated attacks. Even if your password leaks, an attacker cannot sign in without your MFA device.

MFA (Multi-Factor Authentication) is a security mechanism that requires users to provide two or more different types of authentication factors when logging in. Its most common form, with exactly two factors, is called 2FA (Two-Factor Authentication):

| Factor type | What it is | Examples |
|---|---|---|
| Knowledge factor (something you know) | Information only the user knows | Password, PIN code |
| Possession factor (something you have) | A physical device the user possesses | Phone, hardware key |
| Inherence factor (something you are) | The user's biological characteristics | Fingerprint, facial recognition |

5.2 Why Is MFA So Important?

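How MFA Changes the Odds:

| Attack method | Success rate without MFA | Success rate with MFA |
|---|---|---|
| Password guessing / brute force | Very high | Extremely low (the second factor is still required) |
| Phishing attacks to obtain passwords | Very high | Low (a stolen password alone is not enough; real-time phishing proxies can still relay one-time codes, which phishing-resistant FIDO2 security keys prevent) |
| Password leaks from other websites' breaches | Very high | Extremely low (the attacker does not have the second factor) |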

Microsoft Security Report (2020): Enabling MFA can block 99.9% of automated attacks.

5.3 MFA in Practice: Enabling MFA for the AWS Root Account

Step 1: Log into the AWS Console

  1. Log in with your root account email and password
  2. Click your account name in the top-right corner and select "Security Credentials"

Step 2: Enable MFA

  1. Find the "Multi-factor authentication (MFA)" section
  2. Click "Assign MFA device"
  3. Choose the MFA device type ("Authenticator app" recommended)

Step 3: Configure Virtual MFA

  1. Install Google Authenticator or Microsoft Authenticator on your phone
  2. Scan the QR code or manually enter the secret key
  3. Enter two consecutive 6-digit codes from the app: type the current code, wait for it to refresh (every 30 seconds), then type the next one

Done! Your root account now has MFA protection.
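Authenticator apps such as Google Authenticator implement TOTP (RFC 6238). The QR code carries a shared base32 secret. Each code is an HMAC-SHA1 of the current 30-second time step, truncated to 6 digits. The sketch below uses only the Python standard library. The function name totp is our own, and the secret is the RFC 6238 test key, not a real one. It shows why the code changes every 30 seconds and why AWS can verify it without contacting your phone.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional

def totp(secret_b32: str, for_time: Optional[float] = None, digits: int = 6, step: int = 30) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) from a base32-encoded secret."""
    secret = secret_b32.replace(" ", "").upper()
    secret += "=" * (-len(secret) % 8)      # apps often omit base32 padding
    key = base64.b32decode(secret)
    now = time.time() if for_time is None else for_time
    counter = int(now // step)              # number of 30-second steps since the Unix epoch
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F              # RFC 4226 dynamic truncation
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)

# RFC 6238 test secret: the ASCII bytes "12345678901234567890"
secret = base64.b32encode(b"12345678901234567890").decode()
print(totp(secret, for_time=59))             # 287082
print(totp(secret, for_time=59, digits=8))   # 94287082
print(totp(secret))                          # the code for the current 30-second window
```

Asking for two consecutive codes in Step 3 lets AWS check that your app's secret and clock agree across two adjacent time steps.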


6. Cross-Account Access: How to "Visit" Safely?

6.1 Why Do You Need Cross-Account Access?

As businesses grow, many companies adopt a multi-account architecture to isolate different environments:

| Account type | Purpose | Permission requirements |
|---|---|---|
| Master Account | Organization management, billing | Rarely used |
| Security Audit | Centralized log collection from all accounts | Read-only access to other accounts |
| Shared Services | Shared resources (image registries, etc.) | Read-only access from other accounts |
| Development | Development environment | Full access for developers |
| Staging | Testing / pre-release environment | Tester permissions |
| Production | Production environment | Strictly limited, requires approval |

The Problem: How does the Production account's EC2 pull images from the Shared Services account's registry?

  • Option A: Write AK/SK in Production's user data (Dangerous! AK/SK leakage risk)
  • Option B: Use cross-account Role Assume (Recommended! Temporary credentials, auto-rotation)

6.2 How Cross-Account Role Assume Works

Account A (Production)                    Account B (Shared Services)
    |                                           |
    |  1. Request Assume Role                  |
    |  "I want to assume Account B's          |
    |   ECRReadRole"                            |
    |------------------------------------------>|
    |                                           |
    |                    2. Check Trust Policy  |
    |                    "Can Account A         |
    |                     assume me?"           |
    |                                           |
    |  3. Return temporary credentials         |
    |  AccessKeyId, SecretKey, SessionToken    |
    |<------------------------------------------|
    |                                           |
    |  4. Use temporary credentials            |
    |     to access ECR                         |
    |  docker pull accountB.dkr.ecr...         |

Key Points:

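  • Temporary credentials are valid for 1 hour by default, and the role's maximum session duration can be raised to 12 hours. When one role assumes another (role chaining), the session is capped at 1 hour. An EC2 instance profile assuming this cross-account role is an example of role chaining.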
  • No need to store any long-term credentials in code
  • Trust policies can restrict who can assume the role (e.g., specific accounts, specific external IDs)
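The trust policy is what answers "Can Account A assume me?" in step 2. The Python sketch below builds one that optionally requires an external ID. It assumes a placeholder Production account ID of 111122223333, and the external ID string and function name are hypothetical. The principal arn:aws:iam::<account>:root stands for the whole trusted account, not just its root user. Principals in that account also need their own IAM permission to call sts:AssumeRole on this role.

```python
import json

def cross_account_trust_policy(trusted_account_id: str, external_id: str = "") -> dict:
    """Trust policy that lets principals in another account assume this role."""
    statement = {
        "Effect": "Allow",
        # ":root" designates the whole trusted account, which then delegates via its own IAM policies
        "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
        "Action": "sts:AssumeRole",
    }
    if external_id:
        # The caller must pass the same value (aws sts assume-role --external-id ...)
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}

print(json.dumps(cross_account_trust_policy("111122223333", "example-external-id"), indent=2))
```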

6.3 Hands-On: Configuring Cross-Account ECR Access

Scenario: The Production account's EC2 needs to pull Docker images from the Shared Services account.

Step 1: Create an IAM Role in the Shared Services Account

  1. Log into the Shared Services account's AWS Console
  2. Go to IAM → Roles → Create role
  3. Select "Another AWS account"
  4. Enter the Production account's Account ID
  5. Optional: Check "Require external ID" and enter a random string (adds security)
  6. Attach permission: AmazonEC2ContainerRegistryReadOnly
  7. Name the Role: CrossAccountECRReadRole

Step 2: Get the Role ARN

After creation, copy the Role's ARN:

arn:aws:iam::SHARED_SERVICES_ACCOUNT_ID:role/CrossAccountECRReadRole

Step 3: Configure EC2 Instances in the Production Account

Method A: Use Instance Profile (Recommended)

  1. Create an IAM Role in the Production account (for EC2 use)
  2. Trust policy: Trust the EC2 service
  3. Permission policy: Allow assuming the cross-account Role
json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::SHARED_SERVICES_ACCOUNT_ID:role/CrossAccountECRReadRole"
    }
  ]
}
  4. Create an Instance Profile and associate it with this Role
  5. When launching EC2, select this Instance Profile

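Method B: Assume the Role Dynamically in EC2 User Data

This method still relies on the instance profile from Method A, or some other base credentials. The sts assume-role call below must be signed by an identity in the Production account.

bash
#!/bin/bash
# The AWS CLI is preinstalled on Amazon Linux; install jq (for JSON parsing) and Docker
yum install -y jq docker
systemctl start docker

# Assume the cross-account Role (signed with the instance profile credentials)
CREDS=$(aws sts assume-role \
  --role-arn arn:aws:iam::SHARED_SERVICES_ACCOUNT_ID:role/CrossAccountECRReadRole \
  --role-session-name EC2PullSession)

# Extract the temporary credentials (quoting "$CREDS" avoids word splitting)
export AWS_ACCESS_KEY_ID=$(echo "$CREDS" | jq -r '.Credentials.AccessKeyId')
export AWS_SECRET_ACCESS_KEY=$(echo "$CREDS" | jq -r '.Credentials.SecretAccessKey')
export AWS_SESSION_TOKEN=$(echo "$CREDS" | jq -r '.Credentials.SessionToken')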

# Log in to ECR
aws ecr get-login-password --region ap-northeast-1 | \
  docker login --username AWS --password-stdin SHARED_SERVICES_ACCOUNT_ID.dkr.ecr.ap-northeast-1.amazonaws.com

# Pull the image
docker pull SHARED_SERVICES_ACCOUNT_ID.dkr.ecr.ap-northeast-1.amazonaws.com/my-app:latest
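The three jq lines above just reshape the JSON printed by aws sts assume-role. Where jq is not available, the Python standard library can do the same step. This sketch assumes the CLI's output shape (a Credentials object with AccessKeyId, SecretAccessKey, SessionToken, and Expiration). The function name is hypothetical and the sample values are placeholders.

```python
import json
import shlex

def to_env_exports(assume_role_output: str) -> str:
    """Turn `aws sts assume-role` JSON output into shell export lines."""
    creds = json.loads(assume_role_output)["Credentials"]
    env = {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],  # temporary credentials are useless without it
    }
    # shlex.quote keeps each value intact even if it contains shell-special characters
    return "\n".join(f"export {name}={shlex.quote(value)}" for name, value in env.items())

sample_output = json.dumps({
    "Credentials": {
        "AccessKeyId": "ASIAEXAMPLEPLACEHOLD",
        "SecretAccessKey": "placeholder/secret+key",
        "SessionToken": "placeholder-session-token",
        "Expiration": "2024-05-01T12:00:00Z",
    }
})
print(to_env_exports(sample_output))
```

Forgetting AWS_SESSION_TOKEN is the most common mistake at this step. Without it, the key pair is rejected.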

Step 4: Test Cross-Account Access

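Run on the Production EC2 instance, in the shell where the Method B temporary credentials are exported:

bash
# Check which identity the CLI is using
aws sts get-caller-identity
# Should show the role in the Shared Services account (an assumed-role ARN carries the account that owns the role):
# arn:aws:sts::SHARED_SERVICES_ACCOUNT_ID:assumed-role/CrossAccountECRReadRole/EC2PullSession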

# Test if we can list Shared Services ECR repositories
aws ecr describe-repositories --registry-id SHARED_SERVICES_ACCOUNT_ID
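To check the get-caller-identity result programmatically, for example in a startup health check, you can split the returned ARN. The sketch below assumes the arn:aws:sts::<account>:assumed-role/<role>/<session> format. The helper name is hypothetical, and 222233334444 is a placeholder for SHARED_SERVICES_ACCOUNT_ID.

```python
def parse_assumed_role_arn(arn: str) -> dict:
    """Split an STS assumed-role ARN into account, role name, and session name."""
    parts = arn.split(":", 5)  # arn : partition : service : region : account : resource
    if len(parts) != 6 or parts[2] != "sts":
        raise ValueError(f"not an STS ARN: {arn}")
    resource = parts[5].split("/", 2)
    if len(resource) != 3 or resource[0] != "assumed-role":
        raise ValueError(f"not an assumed-role ARN: {arn}")
    return {"account": parts[4], "role": resource[1], "session": resource[2]}

identity = parse_assumed_role_arn(
    "arn:aws:sts::222233334444:assumed-role/CrossAccountECRReadRole/EC2PullSession"
)
print(identity)
# {'account': '222233334444', 'role': 'CrossAccountECRReadRole', 'session': 'EC2PullSession'}
```

A health check could compare identity["account"] and identity["role"] with the expected values and fail fast if the instance is still running under its own instance profile.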

Done! Now the Production EC2 can safely pull images from Shared Services without sharing any long-term credentials.


7. Hands-On: Building a Secure Permission System

7.1 Building a Permission Architecture from Scratch

Figure: Permission management best practices, with security controls applied by priority:
- P0 Protect the root account. The root account owns the cloud service and needs the strongest protection: enable MFA, create an IAM admin user, and delete root access keys.
- P0 Minimize user permissions
- P1 Prefer IAM roles
- P1 Manage access keys safely
- P2 Monitoring and auditing

Core idea: Start from P0 and improve step by step. Each improvement significantly raises account security.

Suppose you're the tech lead at a 10-person startup and need to design an AWS permission architecture from scratch. Here are the recommended implementation steps:

Phase 1: Root Account Protection (Day 1)

Goal: Protect the root account — this is the most important account

1. Enable root account MFA (mandatory)
   - Hardware MFA recommended (YubiKey), or Google Authenticator

2. Create an IAM admin user
   - Username: admin (or your name)
   - Permissions: AdministratorAccess (but will be tightened later)
   - Enable MFA

3. Delete the root account's Access Keys (if any were created)
   - The root account should never have AK/SK

4. Configure root account usage alerts
   - Use CloudWatch + SNS to send email/SMS whenever the root account logs in
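A root-usage alert comes down to spotting CloudTrail events whose userIdentity.type is Root. In AWS this is usually wired up as an EventBridge rule or a CloudWatch Logs metric filter that notifies SNS. The Python sketch below shows only the matching logic and assumes records in CloudTrail's log-file format (a top-level Records array). The function name and the sample events are made up for illustration.

```python
import json

def find_root_activity(log_file_json: str):
    """Return (eventTime, eventName, sourceIPAddress) for every event performed by the root user."""
    records = json.loads(log_file_json).get("Records", [])
    return [
        (r.get("eventTime"), r.get("eventName"), r.get("sourceIPAddress"))
        for r in records
        if r.get("userIdentity", {}).get("type") == "Root"
    ]

sample_log = json.dumps({"Records": [
    {"eventTime": "2024-05-01T09:00:00Z", "eventName": "ConsoleLogin",
     "sourceIPAddress": "203.0.113.10", "userIdentity": {"type": "Root"}},
    {"eventTime": "2024-05-01T09:05:00Z", "eventName": "DescribeInstances",
     "sourceIPAddress": "198.51.100.7", "userIdentity": {"type": "IAMUser", "userName": "admin"}},
]})

for event in find_root_activity(sample_log):
    print("ALERT: root activity", event)
```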

Phase 2: Team Permission Grouping (Week 1)

Goal: Group team members and manage permissions in batches

1. Analyze team roles:
   - Backend developers (2)
   - Frontend developer (1)
   - Mobile developer (1)
   - Product manager (1)
   - Designer (1)
   - Founders / admins (3)

2. Create IAM Groups:

   Group: Developers
   ├── Members: All developers (backend, frontend, mobile)
   ├── Permissions:
   │   ├── EC2: Start, stop, view (but cannot delete others' instances)
   │   ├── S3: Read/write development environment buckets
   │   ├── RDS: Read-only (cannot modify production database)
   │   └── CloudWatch: View logs
   └── Restriction: Can only operate in the ap-northeast-1 region

   Group: ProductTeam
   ├── Members: Product manager, designer
   ├── Permissions:
   │   ├── S3: Read-only (view data files)
   │   ├── CloudWatch Dashboard: View monitoring charts
   │   └── Cost Explorer: View billing (but cannot modify)
   └── Restriction: Read-only; cannot modify any resources

   Group: Administrators
   ├── Members: Founders, tech lead
   ├── Permissions: AdministratorAccess
   └── Requirement: Must use MFA to perform operations

3. Create an IAM User for each person and add them to the corresponding Group
   - Never attach permissions directly to individuals — always manage via Groups
   - Enable MFA (mandatory)
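The "Restriction" and "Requirement" lines in the group design above are usually written as explicit Deny statements, so they hold no matter what the group's other policies allow. The Python sketch below generates two such guardrails. It uses the same aws:RequestedRegion and aws:MultiFactorAuthPresent condition keys as the policy example in section 3.2. The Sids and function names are hypothetical. AWS's published versions of these patterns add exemptions via NotAction that this sketch omits for brevity. The region lock should exempt global services such as IAM, and the MFA rule should exempt the actions a user needs to set up MFA.

```python
import json

def region_lock_statement(allowed_region: str) -> dict:
    # Deny any request whose target region is not the allowed one
    return {
        "Sid": "DenyOutsideAllowedRegion",
        "Effect": "Deny",
        "Action": "*",
        "Resource": "*",
        "Condition": {"StringNotEquals": {"aws:RequestedRegion": allowed_region}},
    }

def require_mfa_statement() -> dict:
    # Deny everything unless the session was authenticated with MFA.
    # BoolIfExists also matches requests that carry no MFA context at all (e.g., long-term access keys).
    return {
        "Sid": "DenyWithoutMFA",
        "Effect": "Deny",
        "Action": "*",
        "Resource": "*",
        "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}},
    }

developers_guardrail = {"Version": "2012-10-17", "Statement": [region_lock_statement("ap-northeast-1")]}
admins_guardrail = {"Version": "2012-10-17", "Statement": [require_mfa_statement()]}
print(json.dumps(developers_guardrail, indent=2))
print(json.dumps(admins_guardrail, indent=2))
```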

Phase 3: Application-Layer Permission Optimization (Weeks 2-4)

Goal: Let applications access AWS resources securely

1. EC2 instances use Instance Profiles
   - No more configuring AK/SK on servers
   - Create an IAM Role and attach needed permissions (e.g., S3 read/write)
   - Create an Instance Profile and associate it with this Role
   - Select this Instance Profile when launching EC2
   - Application code uses boto3 directly without credential configuration

2. If AK/SK must be used (third-party integrations)
   - Use AWS Secrets Manager to store AK/SK
   - Application reads from Secrets Manager at startup
   - Set up regular rotation (90 days)
   - Monitor AK/SK usage

3. Configure CloudTrail to record all API calls
   - Create a dedicated S3 bucket for log storage
   - Enable log file validation (to prevent tampering)
   - Configure SNS notifications for critical events (e.g., root account usage, policy changes)

Phase 4: Security Hardening (Ongoing)

Goal: Establish continuous security monitoring and improvement mechanisms

1. Enable AWS Config
   - Monitor resource configuration changes
   - Check compliance (e.g., whether security groups have 0.0.0.0/0 open)

2. Enable IAM Access Analyzer
   - Continuously analyze resource policies
   - Identify external access (e.g., whether S3 buckets are public)

3. Regularly review IAM configuration
   - Monthly check for unused IAM Users and Roles
   - Check Access Key usage
   - Verify Group membership is reasonable

4. Establish a security incident response process
   - If an AK/SK leak is discovered: Immediately deactivate and delete the key, issue a replacement, and audit the scope of impact
   - If abnormal API calls are detected: Immediately investigate and restrict permissions
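The monthly "check Access Key usage" review is easy to automate with the IAM credential report. This is a CSV you can download from the console or fetch with aws iam generate-credential-report and aws iam get-credential-report. The Python sketch below reads only the user, mfa_active, access_key_1_active, and access_key_1_last_rotated columns. Key 2 has matching columns and is omitted for brevity. The sketch applies the 90-day rotation threshold from Phase 3. The function name is hypothetical and the sample rows are made up.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

MAX_KEY_AGE = timedelta(days=90)

def audit_credential_report(report_csv: str, now: datetime):
    """Yield (user, problem) pairs for users without MFA or with stale access keys."""
    for row in csv.DictReader(io.StringIO(report_csv)):
        user = row["user"]
        if row["mfa_active"] != "true":
            yield user, "MFA not enabled"
        rotated = row["access_key_1_last_rotated"]
        if row["access_key_1_active"] == "true" and rotated not in ("N/A", ""):
            age = now - datetime.fromisoformat(rotated)
            if age > MAX_KEY_AGE:
                yield user, f"access key 1 is {age.days} days old"

sample = """user,mfa_active,access_key_1_active,access_key_1_last_rotated
alice,true,true,2024-04-01T00:00:00+00:00
bob,false,true,2023-12-01T00:00:00+00:00
carol,true,false,N/A
"""
now = datetime(2024, 5, 1, tzinfo=timezone.utc)
for user, problem in audit_credential_report(sample, now):
    print(user, "->", problem)
# bob -> MFA not enabled
# bob -> access key 1 is 152 days old
```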

8. Common Misconceptions and Pitfall Avoidance Guide

8.1 Top 10 IAM Anti-Patterns

| # | Anti-pattern | Why it's bad | Correct practice |
|---|---|---|---|
| 1 | Using the root account for daily operations | The root account has all permissions; damage cannot be limited if it is compromised | Create an IAM admin user; use the root account only when necessary |
| 2 | Giving everyone AdministratorAccess | Violates least privilege; increases the risk of mistakes and insider threats | Group by role; grant only necessary permissions |
| 3 | Hard-coding AK/SK in source code | AK/SK are easily leaked via GitHub and hard to rotate | Use IAM Roles, environment variables, or secrets management services |
| 4 | Not rotating AK/SK for long periods | Increases the exposure window after a credential leak | Set a 90-day rotation policy, or better, use temporary credentials |
| 5 | Ignoring MFA | The account is immediately compromised if the password leaks | Enable MFA for all IAM users, especially high-privilege users |
| 6 | Not using CloudTrail | You cannot audit who did what, so incidents are impossible to trace | Enable CloudTrail and store logs in a separate audit account |
| 7 | IAM Policies that are too permissive | e.g., Resource: "*" and Action: "*" increase the attack surface | Explicitly specify resource ARNs and specific Actions |
| 8 | Not cleaning up departed employees' IAM Users | Zombie accounts can become backdoors | Establish an offboarding process; immediately disable and delete IAM Users |
| 9 | Not using IAM Access Analyzer | Overly permissive resource policies (e.g., public S3 buckets) go unnoticed | Enable IAM Access Analyzer; regularly check for external access |
| 10 | Not validating Policies in a test environment | Applying Policies directly in production may cause service outages | Use the IAM Policy Simulator to test; validate in a test environment first |
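Anti-patterns 7 and 10 can be partly caught before a policy ever reaches production. Below is a minimal linter sketch that assumes policies are already parsed into Python dicts. It flags only full and service-wide wildcards in Allow statements, and the function name is hypothetical. It is no substitute for IAM Access Analyzer's policy validation or the IAM Policy Simulator.

```python
def _as_list(value):
    # IAM accepts either a single string or a list for Action/Resource
    return [value] if isinstance(value, str) else list(value)

def lint_policy(policy: dict) -> list:
    """Return warnings for Allow statements that use overly broad wildcards."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement object is also valid
        statements = [statements]
    warnings = []
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue  # broad Deny statements restrict access, so they are not flagged
        label = stmt.get("Sid", f"Statement[{i}]")
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        if "*" in actions:
            warnings.append(f"{label}: Action '*' allows every API of every service")
        elif any(a.endswith(":*") for a in actions):
            warnings.append(f"{label}: service-wide wildcard in {actions}")
        if "*" in resources:
            warnings.append(f"{label}: Resource '*' applies to every resource")
    return warnings

too_broad = {"Version": "2012-10-17", "Statement": [
    {"Sid": "Everything", "Effect": "Allow", "Action": "*", "Resource": "*"}]}
for warning in lint_policy(too_broad):
    print(warning)
```

Running a check like this in CI on every policy change, before the Policy Simulator step, catches the most obvious over-grants cheaply.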

9. Glossary

| English term | Chinese translation | Explanation |
|---|---|---|
| IAM (Identity and Access Management) | 身份与访问管理 | Cloud service for managing user identities and access permissions |
| RAM (Resource Access Management) | 资源访问管理 | Alibaba Cloud's IAM service name |
| Root Account | 根账号 | The owner account created when registering a cloud account; has the highest privileges |
| IAM User | IAM 用户/子账号 | A sub-identity created by the root account for daily operations |
| IAM Role | IAM 角色 | A temporary permission carrier with no long-term credentials; needs to be "assumed" |
| IAM Policy | IAM 策略 | JSON-formatted permission rule definition |
| ARN | 亚马逊资源名称 | Globally unique resource identifier |
| AK/SK | 访问密钥/密钥 | Credentials for programmatic cloud API access |
| STS | 安全令牌服务 | Service that provides temporary security credentials |
| MFA | 多因素认证 | Authentication method requiring two or more factors |
| SSO | 单点登录 | Authentication method allowing users to access multiple systems with a single login |
| ExternalId | 外部 ID | Security identifier used to prevent confused deputy attacks |
| CloudTrail | 云审计服务 | Logging service that records all API calls and operations in a cloud account |

Summary: Core Principles of Cloud Account Permission Management

Cloud account permission management is not a one-time effort — it needs to evolve continuously based on team size and business needs:

  1. Starting Phase (1-10 people):

    • Protect the root account (MFA + don't use the root account for daily operations)
    • Create an IAM admin user
    • Basic grouping (Developers, Admins)
  2. Growth Phase (10-50 people):

    • Refined permission grouping (frontend/backend, ops, product, etc.)
    • Use IAM Roles instead of AK/SK
    • Enable CloudTrail auditing
    • Regular permission reviews
  3. Maturity Phase (50+ people / multi-account):

    • Multi-account architecture (Dev, Staging, Prod separation)
    • Centralized log audit account
    • Automated permission reviews and alerts
    • Well-established permission request and approval workflows

Remember Three Core Principles:

  1. Principle of Least Privilege: Grant only necessary permissions; don't give AdministratorAccess
  2. No Long-Term Credentials: Prefer IAM Roles and temporary credentials to avoid AK/SK leaks
  3. Enable MFA: Especially for root accounts and high-privilege accounts — this is the most effective security measure

Further Reading: