Infrastructure as Code

Foreword

Have you ever experienced this nightmare: a production server goes down, and nobody remembers how it was originally configured? Logging into servers by hand, running commands from memory, and hoping you don't make a typo: this is the daily reality of traditional operations. Infrastructure as Code (IaC) changes this by using code to define and manage infrastructure, so that server configuration becomes as version-controllable, reproducible, and auditable as software.

What will you learn from this chapter?

After completing this chapter, you will gain:

  • Core concepts: Understand what IaC is and why it's the cornerstone of modern operations
  • Workflow understanding: Master Terraform's Write → Plan → Apply → Destroy four-stage workflow
  • Tool selection: Understand the pros and cons of mainstream tools like Terraform, Pulumi, and CloudFormation
  • Risk awareness: Understand the dangers of configuration drift and detection methods
  • Best practices: Master engineering management methods for IaC projects

| Chapter | Content | Core Concepts |
| --- | --- | --- |
| Chapter 1 | IaC Concepts | Manual operations vs. code management |
| Chapter 2 | Terraform Workflow | Write → Plan → Apply |
| Chapter 3 | Tool Comparison | Terraform, Pulumi, CDK |
| Chapter 4 | Configuration Drift | Detection, prevention, remediation |
| Chapter 5 | Best Practices | Modularization, state management, CI/CD |

0. Big Picture: Why Does Infrastructure Need "Source Code" Too?

Imagine you're a chef. If you cook every dish by feel — a spoonful of salt today, two spoonfuls tomorrow — the taste will never be consistent. But if you write down the recipe — specifying the exact grams of each seasoning — anyone can reproduce the same taste.

Infrastructure management faces the same problem. A single server's configuration might involve dozens of parameters: operating system, network rules, security groups, storage volumes, environment variables, and more. Manual configuration is not only error-prone, but also irreproducible, unauditable, and irreversible.

Core Values of IaC

  • Reproducible: The same code produces the same result no matter how many times it's executed (idempotency)
  • Version controllable: Infrastructure changes are managed through Git — who changed what and why is clear at a glance
  • Auditable: All changes are recorded, meeting compliance requirements
  • Automatable: Automatic deployment through CI/CD pipelines reduces human error
  • Collaborative: Team members review infrastructure changes through Pull Requests, just like code reviews

1. IaC Concepts: From "Manual Clicking" to "Code Declarations"

Traditional operations works like this: log into the cloud platform console, manually click to create servers, configure networks, and set up security groups. This approach works when managing a few servers, but becomes a nightmare when scaling to dozens or hundreds.

The core idea of IaC is: use declarative code to describe your desired infrastructure state, and let tools automatically implement it. You don't need to tell the tool "first create a VPC, then create a subnet, then create a security group" (imperative). Instead, you simply say "I want this kind of network environment" (declarative), and the tool automatically calculates the steps needed.
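To make "the tool automatically calculates the steps needed" concrete, here is a minimal Python sketch of that calculation. It is not how Terraform is implemented: real tools also order steps by resource dependencies, which this toy ignores. The resource names and attributes (`vpc`, `subnet`, `old_bucket`, and so on) are invented for illustration.

```python
from typing import Dict, List, Tuple

Resource = Dict[str, object]   # attribute name -> value
State = Dict[str, Resource]    # resource name -> attributes


def plan(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Return the (action, resource_name) steps that turn `actual` into `desired`."""
    steps: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in actual:
            steps.append(("create", name))
        elif actual[name] != desired[name]:
            steps.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            steps.append(("delete", name))
    return steps


def apply(desired: State, actual: State) -> State:
    """Carry out the plan; in this toy model that means adopting the desired attributes."""
    new_state = {name: dict(attrs) for name, attrs in actual.items()}
    for action, name in plan(desired, actual):
        if action == "delete":
            del new_state[name]
        else:
            new_state[name] = dict(desired[name])
    return new_state


# Hypothetical desired state (what the code declares).
desired = {
    "vpc": {"cidr": "10.0.0.0/16"},
    "subnet": {"cidr": "10.0.1.0/24"},
    "security_group": {"ingress_ports": [443]},
}
# Hypothetical actual state (what currently exists).
actual = {
    "vpc": {"cidr": "10.0.0.0/16"},
    "security_group": {"ingress_ports": [22, 443]},
    "old_bucket": {"versioning": False},
}

first_plan = plan(desired, actual)
converged = apply(desired, actual)
second_plan = plan(desired, converged)

print(first_plan)   # [('update', 'security_group'), ('create', 'subnet'), ('delete', 'old_bucket')]
print(second_plan)  # []
```

The empty second plan is the idempotency property in miniature: once the actual state matches the declaration, running the same code again changes nothing.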

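Manual Operations vs. Infrastructure as Code

A typical manual workflow, with the pitfall hiding in each step:

  1. Log into the cloud console: someone has to remember the password
  2. Create the server by hand: settings are easily missed
  3. Configure security group rules: it is easy to open too many ports
  4. Attach a storage volume: the wrong size may be selected
  5. Configure load balancing: routing rules are error-prone
  6. Record the work in a document: the document quickly goes out of date

| Dimension | Manual Operations | Infrastructure as Code |
| --- | --- | --- |
| Operation method | Log into console and click | Write code files |
| Reproducibility | Relies on documentation and memory; each run may differ | Code is the documentation; the same code yields the same result |
| Speed | Minutes to hours | Seconds to minutes |
| Change tracking | No records, or incomplete manual records | Git version control, complete history |
| Collaboration method | Verbal communication, screenshots, document passing | Pull Request reviews |
| Rollback capability | Manual reverse operations, often impractical | git revert + re-apply |
| Consistency | Large differences between environments | Dev/test/production are built from the same code |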

Declarative vs Imperative

  • Declarative: Describe "what I want," and the tool automatically figures out "how to do it." Terraform and CloudFormation use this approach. The advantage is good idempotency; the disadvantage is limited flexibility.
  • Imperative: Describe "how to do it," executing step by step. Shell scripts work this way, and Ansible playbooks are also executed task by task (although many Ansible modules are themselves idempotent). The advantage is flexibility; the disadvantage is difficulty ensuring idempotency.
  • Hybrid: Pulumi and AWS CDK are written in general-purpose programming languages, combining declarative state management with imperative flexibility.
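The idempotency difference between the two styles can be seen in a few lines of Python. This is a toy model rather than any tool's real behavior: the "security group" is just a list of port numbers, and both function names are invented for illustration.

```python
from typing import List, Set


def open_port_imperative(rules: List[int], port: int) -> None:
    """Imperative step: 'add a rule for this port', whatever the list already holds."""
    rules.append(port)


def ensure_ports_declarative(rules: List[int], wanted: Set[int]) -> None:
    """Declarative step: 'the rule list must be exactly `wanted`'."""
    rules[:] = sorted(wanted)


imperative_rules: List[int] = []
declarative_rules: List[int] = []

for _ in range(3):  # simulate running the same automation three times
    open_port_imperative(imperative_rules, 443)
    ensure_ports_declarative(declarative_rules, {443})

print(imperative_rules)   # [443, 443, 443]
print(declarative_rules)  # [443]
```

An imperative script can be made idempotent by checking before it acts, but that check has to be written by hand for every step; a declarative tool does it as part of its model.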

2. Terraform Workflow: Write → Plan → Apply

Terraform, developed by HashiCorp, is one of the most widely used IaC tools. Its workflow is clear and intuitive, divided into four stages, much like software development's "code → review → deploy → cleanup."

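Terraform workflow at a glance: Write, Plan, Apply, Destroy

Write: author the infrastructure code. Describe the desired infrastructure state in a declarative language (HCL). The code doubles as documentation and can be committed to Git for version control and code review.

  • Resources are described in .tf files
  • HCL syntax is concise and readable
  • Modules allow reuse
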
Four-Stage Workflow

  1. Write: Write infrastructure definition files (.tf) using HCL (HashiCorp Configuration Language). Declare the resources you need: servers, databases, networks, etc.
  2. Plan: Run terraform plan. Terraform compares the current state with the desired state and generates an "execution plan" — telling you what resources it intends to create, modify, or delete. This is a safety net that lets you confirm changes before actually executing.
  3. Apply: After confirming the plan is correct, run terraform apply. Terraform creates or modifies resources according to the plan. After execution, the current state is saved to the state file (terraform.tfstate).
  4. Destroy: When no longer needed, run terraform destroy to clean up all resources and avoid unnecessary costs.

| Command | Purpose | Modifies Infrastructure | Use Case |
| --- | --- | --- | --- |
| terraform init | Initialize project, download providers | No | First use or adding new providers |
| terraform plan | Preview changes, generate execution plan | No | Must run before every change |
| terraform apply | Execute changes, create/modify resources | Yes | Execute after confirming plan |
| terraform destroy | Destroy all resources | Yes | Clean up test environments, decommission services |
| terraform state | View/manage state file | Depends on operation | State migration, resource import |
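
As a rough sketch of how the init → plan → apply sequence might be scripted, the Python below builds the commands and stops at the first failure. It assumes the `terraform` binary is on the PATH; the plan file name `tfplan` and the helper names are illustrative choices, not part of Terraform. Saving the plan with `-out` and applying that file means apply executes the plan that was reviewed. `terraform destroy` is left out deliberately, since it should not be part of a routine pipeline.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def workflow_commands(plan_file: str = "tfplan") -> List[List[str]]:
    """Commands for a non-interactive init -> plan -> apply run."""
    return [
        ["terraform", "init", "-input=false"],
        ["terraform", "plan", "-input=false", f"-out={plan_file}"],
        ["terraform", "apply", "-input=false", plan_file],
    ]


def run_workflow(runner: Runner = run_command, plan_file: str = "tfplan") -> bool:
    """Run each stage in order; stop and report failure at the first non-zero exit."""
    for argv in workflow_commands(plan_file):
        if runner(argv) != 0:
            return False
    return True


# Dry run with a fake runner that records commands instead of executing them.
recorded: List[str] = []


def fake_runner(argv: Sequence[str]) -> int:
    recorded.append(" ".join(argv))
    return 0


succeeded = run_workflow(fake_runner)
print(succeeded)
for line in recorded:
    print(line)
```

Passing the runner in as a parameter keeps the sketch testable without touching real infrastructure; calling `run_workflow()` with no arguments would invoke the actual CLI.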

3. Tool Comparison: Choosing the Right IaC Tool for You

There are multiple tools in the IaC space, each with different focuses. When choosing a tool, consider the team's tech stack, cloud platform, and project scale. There's no "best" tool — only the one most suitable for your scenario.

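Terraform vs. CloudFormation in detail

| Feature | Terraform | CloudFormation |
| --- | --- | --- |
| Vendor | HashiCorp | AWS |
| Configuration language | HCL | YAML / JSON |
| Declarative/imperative | Declarative | Declarative |
| Multi-cloud support | Native multi-cloud | AWS only |
| State management | State file | Managed by AWS |
| Learning curve | Medium | Medium to high |
| Community ecosystem | Very active | AWS ecosystem |
| Best scenario | Multi-cloud / hybrid cloud | Pure AWS environments |

Mainstream tools overview

| Tool | Language | Cloud Support | Learning Curve | Use Case |
| --- | --- | --- | --- | --- |
| Terraform | HCL | Multi-cloud (AWS/Azure/GCP) | Medium | Multi-cloud environments, team collaboration |
| Pulumi | Python/TS/Go | Multi-cloud | Low (familiar programming language) | Developer-friendly, complex logic |
| AWS CloudFormation | JSON/YAML | AWS only | Medium to high | Pure AWS environments |
| AWS CDK | Python/TS/Java | AWS only | Low | AWS + programming language preference |
| Ansible | YAML | Multi-cloud + bare metal | Low | Configuration management, hybrid environments |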

How to Choose?

  • Startups / Single cloud: CloudFormation (AWS) or the cloud platform's native tool for best ecosystem integration
  • Multi-cloud / Mid-to-large teams: Terraform — largest community, most providers, easiest hiring
  • Developer-led teams: Pulumi or CDK — write infrastructure in familiar programming languages with good IDE support
  • Need configuration management: Ansible — excels at server-internal configuration (installing software, modifying config files)

4. Configuration Drift: The Silent Time Bomb

Configuration drift is the most insidious enemy in IaC practice. It refers to the gradual divergence between the actual infrastructure state and the code-defined state.

How does this drift typically occur? Someone makes a "quick fix" for a production issue by directly logging into the console and manually changing a security group rule. Someone temporarily increases a server's configuration for debugging but forgets to change it back. These "small changes" accumulate over time, eventually causing serious misalignment between the code and the actual environment.

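Drift timeline: initial deployment → manual change → another manual change → drift worsens → IaC detection

Step 0: initial deployment through IaC. The team uses Terraform to deploy three web servers with identical configuration: Nginx 1.24, port 443, 2 GB of memory. Code and actual state match exactly.

| Server | Desired state (defined in code) | Actual state (live environment) |
| --- | --- | --- |
| Server-A | Nginx 1.24, port 443, 2 GB | Nginx 1.24, port 443, 2 GB |
| Server-B | Nginx 1.24, port 443, 2 GB | Nginx 1.24, port 443, 2 GB |
| Server-C | Nginx 1.24, port 443, 2 GB | Nginx 1.24, port 443, 2 GB |

Status: consistent

Key lessons

  • Prohibit manual changes to live environments; every change must go through code
  • Run terraform plan regularly to detect drift
  • Restrict SSH access to production to reduce human intervention
  • Establish a change approval process (PR → Review → Merge → Apply)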

Dangers of Configuration Drift

  1. Irreproducible: The environment described by the code is inconsistent with the actual environment, causing problems when creating new environments
  2. Failed rollbacks: You think rolling back to the previous version will restore everything, but the actual environment has been manually modified
  3. Security risks: Manually opened ports and relaxed permissions may be forgotten, becoming attack vectors
  4. Audit failure: Compliance audits are based on code, but the code doesn't reflect the real state

| Prevention Measure | Description |
| --- | --- |
| Prohibit manual changes | Restrict console operation permissions through IAM policies |
| Regular drift detection | Periodically run terraform plan to check for differences |
| Auto-remediation | Automatically execute apply to restore consistency when drift is detected |
| Change audit | Enable CloudTrail and other audit logs to track all change sources |
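
Regular drift detection can be automated with the `-detailed-exitcode` flag of `terraform plan`, which exits with 0 when there is nothing to change, 2 when the plan contains changes, and 1 on error. The Python sketch below shows one way a scheduled job might interpret that; the function names and status labels are illustrative assumptions. Note that exit code 2 also fires for code changes that have been merged but not yet applied, so a check like this is most meaningful when run against code that has already been applied.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]

# `-detailed-exitcode` makes the exit code distinguish "no changes" from "changes present".
DRIFT_CHECK = ["terraform", "plan", "-detailed-exitcode", "-input=false"]


def run_command(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def classify(exit_code: int) -> str:
    """Map the plan exit code to a status label (labels are our own naming)."""
    if exit_code == 0:
        return "in_sync"
    if exit_code == 2:
        return "drift_detected"
    return "error"


def check_drift(runner: Runner = run_command) -> str:
    """Run the drift check and return its status label."""
    return classify(runner(DRIFT_CHECK))


# Simulated runs; a real scheduled job would call check_drift() with the default runner.
print(check_drift(lambda argv: 0))  # in_sync
print(check_drift(lambda argv: 2))  # drift_detected
print(check_drift(lambda argv: 1))  # error
```

A job built on this could alert the team on `drift_detected` rather than applying automatically, which keeps a human in the loop before anything is overwritten.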

5. Best Practices: Making IaC Projects Sustainable

IaC code, just like application code, needs good engineering practices to ensure maintainability. As infrastructure scales up, unstructured IaC code becomes another form of "technical debt."

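Practice 1: Put infrastructure code under version control

Manage infrastructure code the same way you manage application code.

Recommended:

  • Commit all .tf files to a Git repository
  • Use a branching strategy (main / dev / feature)
  • Review changes through Pull Requests
  • Run terraform plan automatically in CI

Anti-patterns:

  • Running apply locally without committing the code
  • Editing directly on the main branch
  • Committing .tfstate files to Git
  • Skipping code review and deploying directly

Example .gitignore:

    # Ignore local state files
    *.tfstate
    *.tfstate.backup
    .terraform/

    # Ignore variable files that may hold sensitive values
    *.tfvars
    !example.tfvars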

Six Core Best Practices

  1. Modularization: Abstract reusable infrastructure into modules (like VPC modules, database modules) to avoid copy-pasting. Like writing functions — define once, call everywhere.
  2. Environment isolation: Development, testing, and production use independent state files and variable files, isolated through workspaces or directory structures.
  3. Remote state management: State files (tfstate) are stored in a remote backend (for example, S3 with DynamoDB for locking), which supports team collaboration and state locking to prevent concurrent conflicts.
  4. Sensitive information management: Passwords, keys, and other sensitive data should not be written in code. Use tools like Vault or AWS Secrets Manager for management.
  5. CI/CD integration: Integrate terraform plan into the PR process; apply is automatically executed through pipelines, eliminating local manual operations.
  6. Code review: Infrastructure changes need Code Review just like application code, especially changes involving security groups and IAM policies.
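
One way to combine CI/CD integration with focused code review is to have the pipeline read the machine-readable plan (the output of `terraform show -json` on a saved plan file) and flag the changes that deserve a closer look. The Python sketch below assumes that JSON has already been captured as a string. The list of "sensitive" resource types and the sample plan are illustrative assumptions; only the `resource_changes`, `address`, `type`, and `change.actions` fields come from Terraform's JSON plan format.

```python
import json
from typing import List

# Assumption: the resource types a team wants reviewed with extra care.
SENSITIVE_TYPES = {"aws_security_group", "aws_iam_policy", "aws_iam_role"}


def review_flags(plan_json: str) -> List[str]:
    """Return one message per resource change that needs careful review."""
    plan = json.loads(plan_json)
    flags: List[str] = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions == ["no-op"]:
            continue
        summary = "/".join(actions)
        if "delete" in actions:
            flags.append(f"{rc['address']}: destructive change ({summary})")
        elif rc["type"] in SENSITIVE_TYPES:
            flags.append(f"{rc['address']}: security-sensitive change ({summary})")
    return flags


# Hand-written sample in the shape of a JSON plan (not real output).
sample_plan = json.dumps({
    "resource_changes": [
        {"address": "aws_instance.web", "type": "aws_instance",
         "change": {"actions": ["update"]}},
        {"address": "aws_security_group.web", "type": "aws_security_group",
         "change": {"actions": ["update"]}},
        {"address": "aws_db_instance.main", "type": "aws_db_instance",
         "change": {"actions": ["delete", "create"]}},
        {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
         "change": {"actions": ["no-op"]}},
    ]
})

flags = review_flags(sample_plan)
for flag in flags:
    print(flag)
```

A pipeline could post these messages as a Pull Request comment, or require an extra approval whenever the list is non-empty.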

Summary

Infrastructure as Code is the cornerstone of modern cloud-native operations. It transforms "undocumented manual operations" into "version-controllable code," turning infrastructure management from "art" into "engineering."

Key takeaways from this chapter:

  1. The essence of IaC: Use code to declare the desired state of infrastructure, and let tools automatically implement it
  2. Terraform workflow: Write → Plan → Apply → Destroy in four stages, with Plan as the safety net
  3. Tool selection: Multi-cloud → Terraform, single cloud → native tools, developer teams → Pulumi
  4. Configuration drift: The most insidious risk, requiring dual protection through processes and tools
  5. Engineering management: Modularization, environment isolation, remote state, and CI/CD integration are all essential

Further Reading