feat: add dev-cicd skill + enhance dev-deploy

新增 dev-cicd(CI/CD 流水线设计/优化/排查):
- Gitea Actions 模板(Go/iOS/Web/Docker)
- Pipeline 优化(浅克隆/缓存/并发取消)
- 故障排查决策树(20+ 常见错误)
- 安全检查清单 + Runner 管理

增强 dev-deploy(部署执行):
- Docker Staging/Production 部署模板
- 部署前健康检查(证书/Docker/磁盘)
- 回滚策略(TestFlight/Docker/数据库)
- 部署监控(Feishu通知/ASC API)

技能总数: 28 (dev 分类: 7)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
dongliang
2026-04-06 11:10:13 +09:30
parent b5f44ac6aa
commit a58dc39795
4 changed files with 961 additions and 9 deletions

View File

@@ -273,19 +273,468 @@ xcodebuild -exportArchive \
---
## Docker 容器部署
## Docker Staging/Production 部署
### Staging自动
### 架构概览
Push 到 `develop` 分支自动触发 staging 部署。
### Production
```bash
./scripts/build-and-push.sh prod --detect --deploy --wait --verify
```
develop push → Build Image → Push ACR → SSH Deploy (staging) → Health Check
main push → Build Image → Push ACR → 人工审批 → SSH Deploy (prod) → Health Check
```
详见项目 `scripts/build-and-push.sh`
| 组件 | 说明 |
|------|------|
| 服务器 | 39.104.87.246(阿里云 ECS |
| Registry | Aliyun ACR: `crpi-q4nnuivosic0zc98.cn-beijing.personal.cr.aliyuncs.com` |
| 镜像 | `xiaoqu-gateway`, `xiaoqu-web` |
| SSH Key | `~/.ssh/xiaoqu.pem` |
| 部署方式 | Docker Compose |
### 完整部署流程
```
1. 本地构建镜像 → docker build -t <image>:<tag>
2. 推送到 ACR → docker push <registry>/<image>:<tag>
3. SSH 到服务器 → docker compose pull + up -d
4. 健康检查 → curl /health
5. 通知 → 飞书 Webhook 发送部署结果
```
### Staging 部署develop 分支自动触发)
Push 到 `develop` 分支自动触发 staging 部署。流程:
```bash
# 1. 构建镜像tag 用 commit SHA 前 8 位)
TAG=$(git rev-parse --short=8 HEAD)
REGISTRY=crpi-q4nnuivosic0zc98.cn-beijing.personal.cr.aliyuncs.com
docker build -t $REGISTRY/xiaoqu-gateway:$TAG -f gateway/Dockerfile .
docker build -t $REGISTRY/xiaoqu-web:$TAG -f web/Dockerfile .
# 2. 推送到 ACR
docker push $REGISTRY/xiaoqu-gateway:$TAG
docker push $REGISTRY/xiaoqu-web:$TAG
# 3. SSH 部署
ssh -i ~/.ssh/xiaoqu.pem root@39.104.87.246 "
cd /opt/xiaoqu/staging
export IMAGE_TAG=$TAG
docker compose pull
docker compose up -d
"
# 4. 健康检查
sleep 10
curl -sf http://39.104.87.246:8080/health || echo 'Health check failed!'
```
### Production 部署(手动审批)
Production 部署需要人工确认,不会自动触发:
```bash
# 使用 build-and-push 脚本
./scripts/build-and-push.sh prod --detect --deploy --wait --verify
# 或手动执行:
TAG=v1.2.3 # 使用语义化版本号
REGISTRY=crpi-q4nnuivosic0zc98.cn-beijing.personal.cr.aliyuncs.com
# 构建 + 推送
docker build -t $REGISTRY/xiaoqu-gateway:$TAG -f gateway/Dockerfile .
docker build -t $REGISTRY/xiaoqu-web:$TAG -f web/Dockerfile .
docker push $REGISTRY/xiaoqu-gateway:$TAG
docker push $REGISTRY/xiaoqu-web:$TAG
# 部署(生产环境目录)
ssh -i ~/.ssh/xiaoqu.pem root@39.104.87.246 "
cd /opt/xiaoqu/production
export IMAGE_TAG=$TAG
docker compose pull
docker compose up -d
"
# 验证
curl -sf http://39.104.87.246/health && echo 'Production deploy OK'
```
### build-and-push 脚本模板
```bash
#!/bin/bash
# scripts/build-and-push.sh
set -euo pipefail
ENV=${1:-staging}
REGISTRY=crpi-q4nnuivosic0zc98.cn-beijing.personal.cr.aliyuncs.com
SERVER=39.104.87.246
SSH_KEY=~/.ssh/xiaoqu.pem
IMAGES=(xiaoqu-gateway xiaoqu-web)
# 确定 tag
if [ "$ENV" = "prod" ]; then
TAG=${2:-$(git describe --tags --abbrev=0)}
else
TAG=$(git rev-parse --short=8 HEAD)
fi
echo "=== Deploying to $ENV with tag $TAG ==="
# 构建
for img in "${IMAGES[@]}"; do
echo "Building $img..."
docker build -t $REGISTRY/$img:$TAG -f ${img#xiaoqu-}/Dockerfile .
done
# 推送
for img in "${IMAGES[@]}"; do
echo "Pushing $img..."
docker push $REGISTRY/$img:$TAG
done
# 部署
DEPLOY_DIR=/opt/xiaoqu/$ENV
ssh -i $SSH_KEY root@$SERVER "
cd $DEPLOY_DIR
export IMAGE_TAG=$TAG
docker compose pull
docker compose up -d --remove-orphans
"
# 健康检查(重试 3 次)
echo "Waiting for health check..."
for i in 1 2 3; do
sleep 5
if curl -sf http://$SERVER/health > /dev/null 2>&1; then
echo "✓ Health check passed"
exit 0
fi
echo "Attempt $i failed, retrying..."
done
echo "✗ Health check failed after 3 attempts"
exit 1
```
### Docker Compose 示例
```yaml
# docker-compose.yml
version: "3.8"
services:
gateway:
image: crpi-q4nnuivosic0zc98.cn-beijing.personal.cr.aliyuncs.com/xiaoqu-gateway:${IMAGE_TAG:-latest}
ports:
- "8080:8080"
environment:
- DATABASE_URL=postgres://...
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
retries: 3
restart: unless-stopped
web:
image: crpi-q4nnuivosic0zc98.cn-beijing.personal.cr.aliyuncs.com/xiaoqu-web:${IMAGE_TAG:-latest}
ports:
- "3000:3000"
depends_on:
gateway:
condition: service_healthy
restart: unless-stopped
```
---
## 部署前健康检查
部署前进行预检,避免部署失败浪费时间。
### iOS 预检
```bash
preflight_ios() {
local errors=0
# 检查 Distribution 证书
if ! security find-identity -v -p codesigning | grep -q "Apple Distribution"; then
echo "ERROR: Apple Distribution 证书未安装"
((errors++))
fi
# 检查 Provisioning Profile 有效期
local profile_dir="$HOME/Library/MobileDevice/Provisioning Profiles"
if [ -d "$profile_dir" ]; then
for profile in "$profile_dir"/*.mobileprovision; do
local expiry
expiry=$(security cms -D -i "$profile" 2>/dev/null | plutil -extract ExpirationDate raw - 2>/dev/null)
if [ -n "$expiry" ]; then
local expiry_epoch
expiry_epoch=$(date -j -f "%Y-%m-%dT%H:%M:%SZ" "$expiry" "+%s" 2>/dev/null)
local now_epoch
now_epoch=$(date "+%s")
if [ "$expiry_epoch" -lt "$now_epoch" ]; then
echo "WARNING: Profile 已过期: $(basename "$profile")"
((errors++))
fi
fi
done
else
echo "ERROR: Provisioning Profiles 目录不存在"
((errors++))
fi
# 检查 API Key
if [ ! -f "${API_KEY_PATH:-/dev/null}" ]; then
echo "ERROR: ASC API Key (.p8) 文件不存在: $API_KEY_PATH"
((errors++))
fi
# 检查 Xcode
if ! xcode-select -p > /dev/null 2>&1; then
echo "ERROR: Xcode Command Line Tools 未安装"
((errors++))
fi
if [ $errors -gt 0 ]; then
echo "iOS 预检失败: $errors 个问题"
return 1
fi
echo "iOS 预检通过"
return 0
}
```
### Docker 预检
```bash
preflight_docker() {
local errors=0
# 检查 Docker daemon
if ! docker info > /dev/null 2>&1; then
echo "ERROR: Docker daemon 未运行"
((errors++))
fi
# 检查 ACR registry 可达
local registry=crpi-q4nnuivosic0zc98.cn-beijing.personal.cr.aliyuncs.com
if ! docker login $registry --username dummy --password dummy 2>&1 | grep -qv "connection refused"; then
# login 会失败但不应该是 connection refused
echo "WARNING: ACR registry 可能不可达(将在 push 时验证)"
fi
# 检查 SSH 连通性
if ! ssh -i ~/.ssh/xiaoqu.pem -o ConnectTimeout=5 -o BatchMode=yes root@39.104.87.246 "echo ok" > /dev/null 2>&1; then
echo "ERROR: 无法 SSH 连接到部署服务器 39.104.87.246"
((errors++))
fi
# 检查服务器磁盘空间
local disk_usage
disk_usage=$(ssh -i ~/.ssh/xiaoqu.pem root@39.104.87.246 "df -h / | tail -1 | awk '{print \$5}' | tr -d '%'" 2>/dev/null)
if [ -n "$disk_usage" ] && [ "$disk_usage" -gt 85 ]; then
echo "WARNING: 服务器磁盘使用率 ${disk_usage}%(建议清理 docker system prune"
fi
# 检查本地磁盘空间
local local_disk
local_disk=$(df -h . | tail -1 | awk '{print $5}' | tr -d '%')
if [ "$local_disk" -gt 90 ]; then
echo "ERROR: 本地磁盘使用率 ${local_disk}%,空间不足"
((errors++))
fi
if [ $errors -gt 0 ]; then
echo "Docker 预检失败: $errors 个问题"
return 1
fi
echo "Docker 预检通过"
return 0
}
```
---
## 回滚策略
### iOS TestFlight 回滚
TestFlight **无法真正回滚**已安装的版本,但有以下应急手段:
| 手段 | 说明 | API |
|------|------|-----|
| 停止分发 | 将 build 从测试中移除,用户不再收到更新 | `PATCH /v1/builds/{id}` 设置 `expired: true` |
| 过期 build | 强制过期有问题的 build | 同上 |
| 紧急热修 | 构建新版本覆盖上线 | 常规部署流程 |
```bash
# 通过 ASC API 停止分发某个 build
curl -X PATCH "https://api.appstoreconnect.apple.com/v1/builds/$BUILD_ID" \
-H "Authorization: Bearer $JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{"data":{"type":"builds","id":"'$BUILD_ID'","attributes":{"expired":true}}}'
```
### Docker 回滚
Docker 回滚相对简单,拉取上一个正常版本的镜像重新部署即可:
```bash
# 1. 确定上一个正常的 tag
PREVIOUS_TAG=<previous-good-tag>
REGISTRY=crpi-q4nnuivosic0zc98.cn-beijing.personal.cr.aliyuncs.com
# 2. 在服务器上回滚
ssh -i ~/.ssh/xiaoqu.pem root@39.104.87.246 "
cd /opt/xiaoqu/production # 或 /opt/xiaoqu/staging
export IMAGE_TAG=$PREVIOUS_TAG
docker compose pull
docker compose up -d
"
# 3. 验证回滚成功
curl -sf http://39.104.87.246/health && echo 'Rollback OK'
```
### 数据库回滚注意事项
| 场景 | 策略 |
|------|------|
| 可逆 migration加列、加表 | 部署回滚后数据库无需回滚,旧代码忽略新列 |
| 不可逆 migration删列、改类型 | **必须先回滚 migration 再回滚代码**,否则旧代码报错 |
| 数据 migration | 评估是否需要补偿脚本,建议 migration 前做备份快照 |
```bash
# 数据库 migration 回滚示例(如果使用 golang-migrate
ssh -i ~/.ssh/xiaoqu.pem root@39.104.87.246 "
docker compose exec gateway migrate -path /migrations -database \$DATABASE_URL down 1
"
```
---
## 部署监控
### Post-deploy 健康检查模式
```bash
# 通用部署后验证函数
post_deploy_verify() {
local url=$1
local max_retries=${2:-5}
local interval=${3:-10}
echo "Verifying deployment at $url ..."
for i in $(seq 1 $max_retries); do
local status
status=$(curl -sf -o /dev/null -w "%{http_code}" "$url" 2>/dev/null || echo "000")
if [ "$status" = "200" ]; then
echo "Health check passed (attempt $i/$max_retries)"
return 0
fi
echo "Attempt $i/$max_retries: status=$status, retrying in ${interval}s..."
sleep $interval
done
echo "Health check FAILED after $max_retries attempts"
return 1
}
# 使用示例
post_deploy_verify "http://39.104.87.246/health" 5 10
```
### 飞书通知模板
部署完成后通过飞书 Webhook 发送通知:
```bash
# 部署成功通知
send_feishu_deploy_notification() {
local env=$1 # staging / production
local version=$2 # 版本号或 tag
local status=$3 # success / failure
local detail=$4 # 额外说明
local WEBHOOK_URL="<飞书群 Webhook 地址>"
if [ "$status" = "success" ]; then
local color="green"
local emoji="✅"
else
local color="red"
local emoji="❌"
fi
curl -s -X POST "$WEBHOOK_URL" \
-H "Content-Type: application/json" \
-d '{
"msg_type": "interactive",
"card": {
"header": {
"title": {"tag": "plain_text", "content": "'"$emoji"' 部署通知 - '"$env"'"},
"template": "'"$color"'"
},
"elements": [
{"tag": "div", "text": {"tag": "lark_md", "content": "**环境**: '"$env"'\n**版本**: '"$version"'\n**状态**: '"$status"'\n**时间**: '"$(date '+%Y-%m-%d %H:%M:%S')"'\n**详情**: '"$detail"'"}}
]
}
}'
}
# 使用示例
send_feishu_deploy_notification "production" "v1.2.3" "success" "Gateway + Web 部署完成"
send_feishu_deploy_notification "staging" "abc12345" "failure" "Health check 超时"
```
### iOS TestFlight 构建状态监控
通过 ASC API 持续监控 build 处理状态:
```bash
# 监控 TestFlight build 处理状态
monitor_testflight_build() {
local build_id=$1
local jwt_token=$2
local max_wait=600 # 最长等待 10 分钟
local elapsed=0
while [ $elapsed -lt $max_wait ]; do
local response
response=$(curl -s "https://api.appstoreconnect.apple.com/v1/builds/$build_id" \
-H "Authorization: Bearer $jwt_token")
local state
state=$(echo "$response" | python3 -c "import sys,json; print(json.load(sys.stdin)['data']['attributes']['processingState'])" 2>/dev/null)
echo "[$(date '+%H:%M:%S')] Build $build_id: $state"
case "$state" in
VALID)
echo "Build 处理完成,可用于测试"
return 0
;;
FAILED|INVALID)
echo "Build 处理失败: $state"
return 1
;;
PROCESSING)
sleep 30
((elapsed+=30))
;;
*)
sleep 15
((elapsed+=15))
;;
esac
done
echo "Build 处理超时(${max_wait}s"
return 1
}
```
---