---
name: dev-cicd
description: CI/CD pipeline design, optimization, and troubleshooting, tailored to the Gitea Actions + Go/Swift/Next.js/Docker stack. Activates automatically when the user mentions CI, CD, pipelines (流水线), workflows, build failures, or runner-related tasks.
---

# CI/CD Pipeline Skill (dev-cicd)

## Overview

Manages the design, optimization, and troubleshooting of Gitea Actions CI/CD pipelines. Target stack:

- **Git**: Gitea (self-hosted, GitHub Actions YAML compatible)
- **Backend**: Go (Gin + GORM)
- **iOS**: Swift 6 + SwiftUI + TCA
- **Web**: Next.js (React)
- **Container**: Docker + Docker Compose
- **Registry**: Aliyun ACR
- **Runners**: self-hosted (Linux) + macos-arm64 (iOS)

---

## Command Reference

| Command | Description |
|------|------|
| `/cicd analyze` | Analyze the current workflows for optimization opportunities |
| `/cicd troubleshoot` | Diagnose why a pipeline failed |
| `/cicd template [go\|ios\|web\|docker]` | Generate a workflow template |
| `/cicd status` | Show the status of recent workflow runs |

---

## 1. Pipeline Design

### 1.1 Monorepo Path Filtering

The repository contains several subprojects. Use `paths` so that a push triggers only the builds it affects:

```yaml
# .gitea/workflows/ci-cd.yml: Go + Web + Docker
on:
  push:
    branches: [develop, main]
    paths:
      - 'gateway/**'
      - 'web/**'
      - 'docker/**'
      - 'scripts/**'

# .gitea/workflows/ios-testflight.yml: iOS, in its own workflow
on:
  push:
    branches: [develop, main]
    paths:
      - 'ios/**'
```

### 1.2 Pipeline Structure Principles

```
Fast feedback first:
1. Static checks (lint/vet): seconds
2. Unit tests (test): 1-5 min
3. Build (build): 2-10 min
4. Integration tests (optional): 5-15 min
5. Release (deploy): 5-15 min
```

### 1.3 Go Backend Template

```yaml
jobs:
  ci:
    runs-on: self-hosted
    steps:
      - name: Checkout
        run: |
          cd ${{ github.workspace }}
          if [ -d .git ]; then
            git fetch --depth 1 origin ${{ github.ref_name }}
            # FETCH_HEAD also works when this workspace was first cloned from a
            # different branch (a single-branch clone only tracks that branch)
            git reset --hard FETCH_HEAD
          else
            git clone --depth 1 --branch ${{ github.ref_name }} \
              http://xiaoqu:${{ secrets.REPO_TOKEN }}@localhost:3000//.git .
          fi
      - name: Go Vet
        run: cd gateway && go vet ./...
      - name: Go Test
        run: cd gateway && go test ./... -count=1 -timeout 120s
      - name: Go Build
        run: cd gateway && go build ./cmd/gateway/
```

### 1.4 iOS Template

```yaml
jobs:
  ios:
    runs-on: macos-arm64
    if: "!contains(github.event.head_commit.message, '[skip ci]')"
    steps:
      - name: Checkout
        run: git clone --depth 1 --branch ${{ github.ref_name }} .
      - name: xcodegen
        run: /opt/homebrew/bin/xcodegen generate
        working-directory: ios
      - name: Test
        run: |
          set -o pipefail
          swift test 2>&1 | tee /tmp/test.log | tail -20
        working-directory: ios
      - name: Deploy TestFlight
        env:
          KEYCHAIN_PASSWORD: ${{ secrets.KEYCHAIN_PASSWORD }}
          ASC_KEY_ID: ${{ secrets.ASC_KEY_ID }}
          ASC_ISSUER_ID: ${{ secrets.ASC_ISSUER_ID }}
        run: ./scripts/ios-testflight.sh
```

### 1.5 Web (Next.js) Template

```yaml
- name: Web Install
  run: cd web && npm ci --legacy-peer-deps
- name: Web Build
  run: cd web && npm run build
- name: Docker Build Web
  run: |
    docker build -t $REGISTRY/$WEB_IMAGE:${{ github.sha }} \
      -t $REGISTRY/$WEB_IMAGE:latest ./web
```

### 1.6 Single Job vs. Multiple Jobs

| Scenario | Choice | Reason |
|------|------|------|
| Runner capacity=1 | Single job | Multiple jobs run serially and each does its own checkout, so it is slower |
| Multiple runners available | Multiple jobs + `needs` | Jobs run in parallel |
| Different OSes (Linux + macOS) | Separate workflows | Different runner labels |

**Current recommendation**: one job on the Linux runner (Go + Web + Docker) and one job on the macOS runner (iOS).

---

## 2. Optimization

### 2.1 Shallow Clone

```bash
# First clone
git clone --depth 1 --branch ${{ github.ref_name }} .
# Incremental fetch
git fetch --depth 1 origin ${{ github.ref_name }}
git reset --hard FETCH_HEAD
```

**Effect**: when the repository contains many binary files, clone time drops from 30s+ to 3-5s.

**Note**: if the job needs to push, run `git fetch --unshallow` first.

### 2.2 Dependency Caching

Instead of relying on `actions/cache`, a self-hosted runner can reuse its local disk:

```yaml
# Go modules: global cache on the runner
env:
  GOMODCACHE: /opt/runner-cache/go/mod
  GOCACHE: /opt/runner-cache/go/build

# npm: reuse the persisted node_modules
# (a self-hosted runner keeps its workspace between runs)
- run: |
    if [ -f web/node_modules/.cache-hash ] && \
       [ "$(cat web/node_modules/.cache-hash)" = "$(md5sum web/package-lock.json | cut -d' ' -f1)" ]; then
      echo "npm cache hit, skip install"
    else
      cd web && npm ci --legacy-peer-deps
      md5sum package-lock.json | cut -d' ' -f1 > node_modules/.cache-hash
    fi

# SPM: Xcode caches packages in DerivedData, which a self-hosted runner keeps
```

### 2.3 Cancel Superseded Runs

This prevents repeated pushes to the same branch from queuing up behind each other:

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```

### 2.4 Conditional Skips

```yaml
# Skip automated commits made by the CI bot
if: "!contains(github.event.head_commit.message, '[skip ci]')"

# Deploy only from the develop branch
if: github.ref == 'refs/heads/develop'
```

### 2.5 Reusing Build Artifacts

```yaml
# Build once, reuse in deploy
- name: Build
  # If the Docker base image is Alpine or distroless, add CGO_ENABLED=0 to get a static binary
  run: cd gateway && go build -o /tmp/gateway ./cmd/gateway/
- name: Docker Build
  run: |
    # Reuse the compiled binary instead of recompiling inside Docker
    cp /tmp/gateway docker/
    docker build -f docker/gateway.prebuilt.Dockerfile -t $IMAGE .
```

### 2.6 Slimming the Docker Context

**Problem**: `docker build` sends the entire context directory to the daemon. Without a `.dockerignore`, everything is sent, including `node_modules` (hundreds of MB), `.next/` and `.git/`. The result is `transferring context: 768MB`, which alone takes 30s+.

**Diagnosis**:

```bash
# Check the context size (approximates what docker build sends)
du -sh --exclude=.git
# Check whether a .dockerignore exists
cat /.dockerignore 2>/dev/null || echo "Missing .dockerignore!"
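
# Check every directory that contains a Dockerfile for a .dockerignore.
# Minimal sketch of the `/cicd analyze` check. Assumptions: Dockerfiles are
# named exactly "Dockerfile", and each one's build context is its own directory.
# check_dockerignore is a hypothetical helper name. It prints one line per
# directory that lacks a .dockerignore and returns 1 if any are missing.
# node_modules and .git are pruned so vendored Dockerfiles are not reported.
check_dockerignore() {
  local root="${1:-.}" missing
  missing=$(find "$root" \( -name node_modules -o -name .git \) -prune -o -type f -name Dockerfile -print |
    while IFS= read -r df; do
      dir=$(dirname "$df")
      [ -f "$dir/.dockerignore" ] || echo "MISSING .dockerignore: $dir"
    done)
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing"
    return 1
  fi
  return 0
}
# Usage (from the repo root): check_dockerignore .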
```

**`/cicd analyze` must-check**: for every directory that contains a Dockerfile, check whether a `.dockerignore` exists. Raise a warning if it is missing.

**Standard .dockerignore template**:

```
# Node.js
node_modules
.next
.turbo
coverage

# Common
.git
.gitignore
.env*
*.md
.vscode
.idea
```

**Result**: the web project's context shrank from 768MB → ~10MB, and the Docker build became about 10x faster.

**Lesson learned**: `.gitignore` is not a `.dockerignore`. Files that Git ignores can still exist in the runner workspace, for example the `node_modules` cache kept by a self-hosted runner. `docker build` will package and send all of them. Every subdirectory with a Dockerfile **must have a `.dockerignore`**.

---

## 3. Troubleshooting

### 3.1 Decision Tree

```
Pipeline failed
├── Workflow not triggered
│   ├── Check the paths filter → changes are outside the matched paths
│   ├── Check the branch filter → branch name does not match
│   ├── Check for [skip ci] → commit message contains the skip marker
│   └── Runner offline → check its status under Gitea Admin > Runners
│
├── Checkout failed
│   ├── "Authentication failed" → REPO_TOKEN secret expired/invalid
│   ├── "Connection refused :3000" → Gitea service is not running
│   └── Checkout is slow → add --depth 1 for a shallow clone
│
├── Go build failed
│   ├── "module not found" → GOPROXY setting / go mod tidy
│   ├── "cannot find package" → go.sum is incomplete
│   └── "go: version mismatch" → Go version on the runner does not match go.mod
│
├── iOS build failed
│   ├── "Macro must be enabled" → add -skipMacroValidation
│   ├── "cannot find type" → xcodegen generate was not run
│   ├── "errSecInternalComponent" → unlock-keychain + set-key-partition-list
│   ├── "No signing certificate" → sign in under Xcode > Accounts and download the certificates
│   ├── "Redundant Binary Upload" → increment CURRENT_PROJECT_VERSION
│   └── "Missing required icon" → Assets.xcassets lacks a 1024x1024 icon
│
├── Docker build failed/slow
│   ├── "Cannot connect to daemon" → Docker daemon not running (e.g. Docker Desktop not started)
│   ├── "unauthorized" / "denied" → docker login credentials expired, or ACR namespace missing
│   ├── "no space left" → docker system prune
│   ├── "transferring context: XXX MB" is slow → missing .dockerignore (node_modules gets sent)
│   ├── build succeeds but push is denied → image path lacks the namespace (registry/namespace/image)
│   ├── docker compose pull times out → with no arguments it also pulls postgres/redis from Docker Hub; pull only the app images
│   └── docker compose up -d also pulls → add `--no-deps gateway web` to restart only the app containers
│
└── Deploy failed
    ├── "Connection refused" (SSH) → check the target server's SSH port/key
    ├── "health check failed" → the app starts slowly; add retries and wait time
    ├── "port already in use" → run docker compose down first to stop the old containers
    ├── "no such service: xxx" → the server's compose file is out of sync with the CI config
    ├── health check fails but the container is running → the port in the curl URL differs from the service's actual port
    ├── --no-deps skipped nginx → the health check goes through port 80 but nginx is not running
    ├── gateway has no port mapping → the prod compose file exposes no ports; check with docker exec
    └── nginx crashes with "upstream not allowed" → nginx.conf was mounted over /etc/nginx/nginx.conf and replaced the main config; mount it at /etc/nginx/conf.d/default.conf instead
```

### 3.2 Common Errors Quick Reference

| Error | Cause | Fix |
|------|------|------|
| `errSecInternalComponent` | The SSH session cannot access the Keychain | `security unlock-keychain` + `set-key-partition-list` |
| `Macro "X" must be enabled` | Swift Macros security restriction | `-skipMacroValidation` |
| `cannot find type 'Foo'` | The xcodeproj does not include the new file | `xcodegen generate` |
| `Redundant Binary Upload` | Duplicate build number | Increment `CURRENT_PROJECT_VERSION` |
| `Cloud signing permission error` | API key lacks permission, or the Issuer ID is wrong | Use manual signing + a local profile |
| `HTTP 401 Unauthorized` (ASC API) | JWT is missing the `kid` header | `headers={"kid": KEY_ID}` |
| `No profiles for bundle id` | No distribution profile | Create and install one in Apple Developer |
| `transferring context: 768MB` | Missing .dockerignore | Create a .dockerignore that excludes node_modules/.next/.git |
| `denied: requested access` (push) | ACR image path lacks the namespace | registry/**namespace**/image |
| `docker compose pull` times out | It pulls postgres/redis from Docker Hub | `docker compose pull gateway web` pulls only the app images |
| `docker compose up -d` also times out | `up` implicitly pulls every service | `docker compose up -d --no-deps gateway web` |
| Health check fails but the container is running | Port in the curl URL ≠ service port | Compare nginx (80) with gateway (8080); `curl localhost:8080/health` directly |
| nginx not running after `--no-deps` | `--no-deps` skipped nginx | Add it explicitly: `--no-deps gateway web nginx` |
| `no such service: xxx` | The server's compose file lacks the service | SSH in and check the actual compose file |
| gateway is healthy but curl fails | The prod compose file has no port mapping | `docker exec wget -q -O- localhost:8080/health` |
| nginx `upstream not allowed` | nginx.conf mounted at /etc/nginx/nginx.conf | Mount it at `/etc/nginx/conf.d/default.conf` instead |
| `missing icon file 120x120` | No App Icon asset | Create Assets.xcassets + AppIcon |
| `UIInterfaceOrientation` (iPad) | Missing iPad orientation declarations | All four orientations + `UIRequiresFullScreen` |

### 3.3 Debugging Tips

```bash
# Check Gitea runner status
curl -s -H "Authorization: token " \
  http:///api/v1/repos///actions/runners

# List recent workflow runs (quote the URL so the shell does not glob the "?")
curl -s -H "Authorization: token " \
  "http:///api/v1/repos///actions/runs?limit=5"

# Reproduce the CI environment locally
# Go (run from gateway/)
docker run --rm -v "$(pwd)":/app -w /app golang:1.25 go build ./cmd/gateway/
# iOS: macOS only
ssh bjwework "cd ~/workspace/xiaoqu-ai/ios && swift test"
```

---

## 4. Security

### 4.1 Secrets Management

```bash
# Configure secrets through the Gitea API (never write them into workflow files)
curl -X PUT -H "Authorization: token " \
  -H "Content-Type: application/json" \
  "http:///api/v1/repos///actions/secrets/" \
  -d '{"data": ""}'
```

**Required secrets**:

| Secret | Purpose | Rotation |
|--------|------|---------|
| `REPO_TOKEN` | Git clone authentication | As needed |
| `ACR_USERNAME` / `ACR_PASSWORD` | Docker image push | 90 days |
| `SSH_PRIVATE_KEY` | Server deployment | As needed |
| `KEYCHAIN_PASSWORD` | Unlocking the keychain for macOS signing | When the password changes |
| `ASC_KEY_ID` / `ASC_ISSUER_ID` | App Store Connect | As needed |
| `FEISHU_WEBHOOK` | Notifications | Does not expire |

### 4.2 Leak-Prevention Checklist

- [ ] `.gitignore` covers `.env`, `*.p8`, `*.pem`, `*.mobileprovision`
- [ ] No hard-coded passwords/tokens in workflows (everything goes through `${{ secrets.* }}`)
- [ ] Scripts use `${VAR:?error}` to require environment variables (no default values that expose credentials)
- [ ] Docker images contain no `.env` files (each Dockerfile directory has a `.dockerignore`)
- [ ] Git remote URLs contain no tokens (inject them via secrets)

### 4.3 Pre-Commit Check

```bash
# Scan the files about to be committed for likely secrets
# (added/copied/modified files only; NUL-separated so odd filenames are safe; case-insensitive)
git diff --cached --name-only --diff-filter=ACM -z | xargs -0 -r grep -liE \
  '(PRIVATE KEY|password|secret|token|apikey)' 2>/dev/null
```

---

## 5. Monitoring

### 5.1 Checking Pipeline Status

```bash
# Recent runs
curl -s -H "Authorization: token " \
  "http:///api/v1/repos///actions/runs?limit=5" | \
  python3 -c "
import json, sys
for r in json.load(sys.stdin).get('workflow_runs', []):
    print(f\"{r['id']} | {r['display_title'][:40]} | {r['status']} | {r['conclusion']}\")
"
```

### 5.2 Feishu Notification Template

```yaml
# Success/failure notification (last step of the workflow, with if: always())
- name: Notify
  if: always()
  env:
    # Pass values through env instead of interpolating ${{ }} into the script:
    # a commit message with quotes or newlines would otherwise break the JSON
    # (or inject shell code)
    STATUS: ${{ job.status }}
    BRANCH: ${{ github.ref_name }}
    SHA: ${{ github.sha }}
    COMMIT_MSG: ${{ github.event.head_commit.message }}
    FEISHU_WEBHOOK: ${{ secrets.FEISHU_WEBHOOK }}
  run: |
    # Build the JSON with python3 (also used in 5.1; assumed to be installed on the runner)
    python3 - > /tmp/notify.json << 'EOF'
    import json, os
    ok = os.environ["STATUS"] == "success"
    content = "**Branch**: {}\n**Commit**: {}\n**Trigger**: {}".format(
        os.environ["BRANCH"], os.environ["SHA"], os.environ["COMMIT_MSG"])
    card = {
        "header": {
            "title": {"tag": "plain_text",
                      "content": ("✅ " if ok else "❌ ") + os.environ["STATUS"]},
            "template": "green" if ok else "red",
        },
        "elements": [{"tag": "div", "text": {"tag": "lark_md", "content": content}}],
    }
    print(json.dumps({"msg_type": "interactive", "card": card}))
    EOF
    curl -s -X POST "$FEISHU_WEBHOOK" \
      -H "Content-Type: application/json" -d @/tmp/notify.json || true
```

### 5.3 Build Time Tracking

Add timestamps at the start and end of the workflow:

```yaml
steps:
  - name: Start Timer
    run: echo "BUILD_START=$(date +%s)" >> "$GITHUB_ENV"
  # ... build steps ...
  - name: Report Duration
    if: always()
    run: |
      DURATION=$(( $(date +%s) - $BUILD_START ))
      echo "Build duration: ${DURATION}s"
```

---

## 6. Runner Management

### 6.1 Runner Types

| Runner | Label | Purpose | Location |
|--------|------|------|------|
| xiaoqu-runner | `self-hosted` | Go + Web + Docker | Aliyun 39.104.65.241 |
| bjwework-macos | `macos-arm64` | iOS + Swift | Tailscale 100.69.230.116 |

### 6.2 Adding a Runner

```bash
# 1. Get a registration token
curl -s -H "Authorization: token " \
  "http:///api/v1/repos///actions/runners/registration-token"

# 2. Register
./act_runner register --no-interactive \
  --instance http:// \
  --token \
  --name \
  --labels