refactor: split general skills into per-category directories

skills/ → skills-dev(9), skills-req(10), skills-ops(4), skills-integration(8), skills-biz(4), skills-workflow(7). generate-marketplace.py now auto-scans all skills-* directories.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
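The auto-scan behavior can be sketched in shell (a sketch only — the real logic lives in generate-marketplace.py; the per-category counts above come from this commit):

```shell
# Enumerate every skills-* category directory and count its plugin subdirectories
for dir in skills-*/; do
  category=${dir%/}                                      # e.g. skills-ops
  count=$(find "$dir" -maxdepth 1 -mindepth 1 -type d | wc -l)
  count=$((count))                                       # normalize wc padding
  echo "$category: $count plugins"
done
```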
8
skills-ops/ops-tools-plugin/.claude-plugin/plugin.json
Normal file
@@ -0,0 +1,8 @@
{
  "name": "ops-tools-plugin",
  "description": "Plugin for ops-tools",
  "version": "1.0.0",
  "author": {
    "name": "qiudl"
  }
}
246
skills-ops/ops-tools-plugin/ai-proj-deploy.md
Normal file
@@ -0,0 +1,246 @@
# AI-Proj Deployment Guide

**Created**: 2026-01-29 11:50:00 CST
**Parent skill**: ops-tools

## Environment Overview

| Environment | Server | Domain | Image tag |
|------|--------|------|----------|
| Production | tools_ai_proj (152.136.104.251) | https://ai.pipexerp.com | `latest` |
| Staging | singapore (43.134.28.147) | http://staging.ai.pipexerp.com | `test` |

## Image Information

| Service | Image |
|------|------|
| Backend | `saltthing123/ai-proj-backend` |
| Frontend | `saltthing123/ai-proj-frontend` |

## Standard Deployment Workflow

### Deploy to Staging

```bash
cd /path/to/new-ai-proj

# Build the backend test image
docker buildx build --platform linux/amd64 -f backend/Dockerfile --target production \
  -t saltthing123/ai-proj-backend:test --push backend/

# Build the frontend test image
docker buildx build --platform linux/amd64 -f frontend/Dockerfile.prod --target production \
  --build-arg REACT_APP_API_URL=https://staging.ai.pipexerp.com/api/v1 \
  --build-arg REACT_APP_API_BASE_URL=https://staging.ai.pipexerp.com/api/v1 \
  --build-arg REACT_APP_ENV=staging \
  -t saltthing123/ai-proj-frontend:test --push frontend/

# Deploy to the staging server
ssh singapore "cd /opt/ai-project-staging && sudo docker-compose pull && sudo docker-compose up -d"
```

### Deploy to Production

```bash
cd /path/to/new-ai-proj

# Build the backend latest image
docker buildx build --platform linux/amd64 -f backend/Dockerfile --target production \
  -t saltthing123/ai-proj-backend:latest --push backend/

# Build the frontend latest image
docker buildx build --platform linux/amd64 -f frontend/Dockerfile.prod --target production \
  --build-arg REACT_APP_API_URL=https://ai.pipexerp.com/api/v1 \
  --build-arg REACT_APP_API_BASE_URL=https://ai.pipexerp.com/api/v1 \
  --build-arg REACT_APP_ENV=production \
  -t saltthing123/ai-proj-frontend:latest --push frontend/

# Deploy to the production server
ssh tools_ai_proj "cd /opt/ai-project && \
  docker compose -f deploy/tencent-cloud/docker-compose.dockerhub.yml pull && \
  docker compose -f deploy/tencent-cloud/docker-compose.dockerhub.yml up -d"
```

## Building on the Singapore Server (Fallback)

When the local network is slow, build on the Singapore server instead:

```bash
# Backend
ssh singapore "cd ~/projects/new-ai-proj && git pull && \
  docker build --platform linux/amd64 -f backend/Dockerfile --target production \
  -t saltthing123/ai-proj-backend:latest ./backend && \
  docker push saltthing123/ai-proj-backend:latest"

# Frontend (production)
ssh singapore "cd ~/projects/new-ai-proj && \
  docker build --platform linux/amd64 -f frontend/Dockerfile.prod --target production \
  --build-arg REACT_APP_API_URL=https://ai.pipexerp.com/api/v1 \
  --build-arg REACT_APP_API_BASE_URL=https://ai.pipexerp.com/api/v1 \
  --build-arg REACT_APP_ENV=production \
  -t saltthing123/ai-proj-frontend:latest ./frontend && \
  docker push saltthing123/ai-proj-frontend:latest"
```

## Automatic Deployment (Webhook)

**Status**: enabled (2026-01-16)

```
git push main → Gitea webhook → Jenkins ai-proj → auto deploy to production
```

## Service Management

```bash
# Check container status
ssh tools_ai_proj "docker ps --format 'table {{.Names}}\t{{.Status}}'"

# Tail logs
ssh tools_ai_proj "docker logs -f ai_backend_prod --tail 100"

# Restart a service
ssh tools_ai_proj "docker restart ai_backend_prod"

# Health check
curl -s https://ai.pipexerp.com/api/v1/health | jq .
```

## Staging Environment Management

```bash
# Check status
ssh singapore "sudo docker-compose -f /opt/ai-project-staging/docker-compose.yml ps"

# Tail logs
ssh singapore "sudo docker logs -f ai_backend_staging --tail 100"

# Health check
ssh singapore "curl -s -H 'Host: staging.ai.pipexerp.com' http://127.0.0.1/api/v1/health"
```

## Docker Volumes Configuration

**Important**: data volumes must be marked `external: true`

```yaml
volumes:
  postgres_prod_data:
    external: true
    name: ai-project_postgres_prod_data
  redis_prod_data:
    external: true
    name: ai-project_redis_prod_data
```

## Database Operations

```bash
# Run migrations
ssh tools_ai_proj "cd /opt/ai-project/backend/migrations && \
  for file in \$(ls *.sql | grep -v _down.sql | sort); do \
    docker exec -i ai_postgres_prod psql -U ai_prod_user -d ai_project_prod < \"\$file\"; \
  done"

# Backup
ssh tools_ai_proj "docker exec ai_postgres_prod pg_dump -U ai_prod_user ai_project_prod > /tmp/backup.sql"
```

## User Management

> **Important**: password hashes must use bcrypt **cost 12**, which is the `DefaultCost` value in the backend's `utils/password.go`.

### Full User-Creation Workflow

Because bcrypt hashes contain `$` characters that the shell would interpret, the SQL must be transferred as a file:

```bash
# 1. Generate the password hash (on a machine with a Go toolchain)
cd /path/to/new-ai-proj/backend
cat > /tmp/genhash.go << 'EOF'
package main

import (
	"fmt"

	"golang.org/x/crypto/bcrypt"
)

func main() {
	hash, _ := bcrypt.GenerateFromPassword([]byte("user-password"), 12)
	fmt.Println(string(hash))
}
EOF
HASH=$(go run /tmp/genhash.go)
echo "Generated hash: $HASH"

# 2. Create the SQL file
cat > /tmp/create_user.sql << EOF
INSERT INTO users (username, email, password_hash, user_type, role, status, created_at, updated_at)
VALUES ('newuser', 'newuser@example.com', '$HASH', 'system', 'admin', 'active', NOW(), NOW());
EOF

# 3. Transfer and execute
scp /tmp/create_user.sql tools_ai_proj:/tmp/
ssh tools_ai_proj "docker cp /tmp/create_user.sql ai_postgres_prod:/tmp/ && \
  docker exec ai_postgres_prod psql -U ai_prod_user -d ai_project_prod -f /tmp/create_user.sql"

# 4. Verify
curl -s -X POST "https://ai.pipexerp.com/api/v1/auth/login" \
  -H "Content-Type: application/json" \
  -d '{"username":"newuser","password":"user-password"}' | jq '.success'
```

### User Roles

| user_type | role | Permissions |
|-----------|------|------|
| system | admin | System administrator |
| system | user | System user |
| tenant | admin | Tenant administrator |
| tenant | user | Tenant user |

### Resetting a Password

```bash
# Generate a new hash and update the user row
# (note: `go run` has no -e flag for inline programs; write a file instead)
cd /path/to/new-ai-proj/backend
cat > /tmp/h.go << 'EOF'
package main

import (
	"fmt"

	"golang.org/x/crypto/bcrypt"
)

func main() {
	h, _ := bcrypt.GenerateFromPassword([]byte("new-password"), 12)
	fmt.Println(string(h))
}
EOF
HASH=$(go run /tmp/h.go)

cat > /tmp/reset.sql << EOF
UPDATE users SET password_hash = '$HASH' WHERE username = 'targetuser';
EOF

scp /tmp/reset.sql tools_ai_proj:/tmp/
ssh tools_ai_proj "docker cp /tmp/reset.sql ai_postgres_prod:/tmp/ && \
  docker exec ai_postgres_prod psql -U ai_prod_user -d ai_project_prod -f /tmp/reset.sql"
```

### Common Errors

| Problem | Cause | Fix |
|------|------|------|
| Login fails | bcrypt cost is not 12 | regenerate the hash with Go at cost 12 |
| Hash gets truncated | shell interpreted the `$` characters | use the file-transfer approach |

### Existing System Users

| Username | Email | user_type | role | Created |
|--------|------|-----------|------|----------|
| qiudl | qiudl@zhiyuncai.com | system | admin | - |
| jiaxiang | jiaxiang@joylodging.com | system | admin | 2026-01 |
| haiqing | haiqing@joylodging.com | system | admin | 2026-02 |

> Note: passwords are not recorded in this document; to reset one, use the password-reset workflow above.

## Frontend Build Notes

Both URL variables must be set:
- `REACT_APP_API_URL`
- `REACT_APP_API_BASE_URL`

Otherwise the build falls back to the production URLs in `.env.production`.

Verify the URLs baked into an image:
```bash
docker exec <container> sh -c 'grep -oE "https://[a-zA-Z0-9.-]*pipexerp[a-zA-Z0-9./-]*" /usr/share/nginx/html/static/js/main*.js | sort | uniq -c'
```

100
skills-ops/ops-tools-plugin/coolbuy-deploy.md
Normal file
@@ -0,0 +1,100 @@
# Coolbuy-PaaS Deployment Guide

**Created**: 2026-01-29 11:50:00 CST
**Parent skill**: ops-tools

## Repository Information

| Repository | Address | Notes |
|------|------|------|
| coolbuy-paas | git@gitea.pipexerp.com:pipexerp/coolbuy-paas.git | tenant business system |
| coolbuy-platform | git@gitea.pipexerp.com:pipexerp/coolbuy-platform.git | platform admin console |
| coolbuy-legacy | git@gitea.pipexerp.com:pipexerp/coolbuy-legacy.git | legacy project |

## Image Information

| Service | Image | Dockerfile |
|------|------|------------|
| Auth | saltthing123/coolbuy-paas-auth | auth-service/Dockerfile |
| Foundation | saltthing123/coolbuy-paas-foundation | foundation-service/Dockerfile |
| ERP | saltthing123/coolbuy-paas-erp | erp-service/Dockerfile |
| Web | saltthing123/coolbuy-paas-web | web/Dockerfile |
## 生产环境
|
||||
|
||||
| 项目 | 值 |
|
||||
|------|-----|
|
||||
| 服务器 IP | 39.106.88.83 |
|
||||
| 架构 | AMD64 |
|
||||
| 部署目录 | /opt/coolbuy-paas |
|
||||
| Web 端口 | 8888 |
|
||||
|
||||
## Deployment Workflow (Local Build + Jenkins Deploy)

### Step 1: Build Locally and Push

```bash
cd /path/to/coolbuy-paas

# Build a single service (AMD64)
docker buildx build --platform linux/amd64 -t saltthing123/coolbuy-paas-web:latest ./web --push

# Build all services
./scripts/build-and-push.sh --push --platform linux/amd64
```

### Step 2: Trigger the Jenkins Deploy

```bash
source ~/.config/devops/credentials.env

# Deploy to production
curl -X POST "$JENKINS_URL/job/coolbuy-paas/buildWithParameters" \
  -u "$JENKINS_USER:$JENKINS_TOKEN" \
  --data "ACTION=deploy-prod&IMAGE_TAG=latest"

# Deploy to staging
curl -X POST "$JENKINS_URL/job/coolbuy-paas/buildWithParameters" \
  -u "$JENKINS_USER:$JENKINS_TOKEN" \
  --data "ACTION=deploy-test&IMAGE_TAG=latest"
```

### One-Shot Deploy Command

```bash
cd /path/to/coolbuy-paas && \
docker buildx build --platform linux/amd64 -t saltthing123/coolbuy-paas-web:latest ./web --push && \
source ~/.config/devops/credentials.env && \
curl -X POST "$JENKINS_URL/job/coolbuy-paas/buildWithParameters" \
  -u "$JENKINS_USER:$JENKINS_TOKEN" \
  --data "ACTION=deploy-prod&IMAGE_TAG=latest"
```

## Checking Build Status

```bash
source ~/.config/devops/credentials.env

# Build status
curl -s "$JENKINS_URL/job/coolbuy-paas/lastBuild/api/json" \
  -u "$JENKINS_USER:$JENKINS_TOKEN" | jq '.result, .building'

# Build log
curl -s "$JENKINS_URL/job/coolbuy-paas/lastBuild/consoleText" \
  -u "$JENKINS_USER:$JENKINS_TOKEN" | tail -50
```

## Checking Image Architecture

```bash
# Local image
docker inspect saltthing123/coolbuy-paas-web:latest | grep Architecture

# DockerHub image
docker manifest inspect saltthing123/coolbuy-paas-web:latest | grep architecture
```

## Important Reminders

- The production server is AMD64; always pass `--platform linux/amd64`
- Never build images on the Jenkins server; build all images locally and push them to DockerHub

837
skills-ops/ops-tools-plugin/db-backup.md
Normal file
@@ -0,0 +1,837 @@
# Database Backup & Restore Skill

**Parent skill**: ops-tools
**Scope**: global (all project databases)
**Created**: 2026-01-15 07:30:00 ACDT
**Last updated**: 2026-02-02

---

## Overview

A global database-backup skill covering the PostgreSQL databases of all projects: pre-migration backups, automatic backups, data restore, and disaster-recovery strategy.

**Core principles**:
- ⚠️ **Always back up before any database migration**
- Retention: the last 7 days plus one permanent backup per month
- Storage: the server-local `/backup/` directory

---

## Database Inventory

| Database | Server | Container | Purpose | Backup path |
|--------|--------|------|------|----------|
| ai_project_prod | tools_ai_proj | ai_postgres_prod | AI-Proj production | /backup/ai-project/database/ |
| ai_project_staging | singapore | ai_postgres_staging | AI-Proj staging | /backup/ai-project-staging/ |
| coolbuy_prod | coolbuy-dev | postgres | Coolbuy 3.0 | /backup/coolbuy/ |

---

## ⚡ Quick Pre-Migration Backup (Required Reading)

> **Important**: before any `UPDATE`, `DELETE`, `ALTER`, or data-migration operation, **run a backup first**.

### One-Shot Backup Commands

```bash
# AI-Proj production DB - pre-migration backup
ssh tools_ai_proj 'REASON="pre_migration_$(date +%Y%m%d_%H%M%S)" && \
  docker exec ai_postgres_prod pg_dump -U ai_prod_user -Fc ai_project_prod \
  > /backup/ai-project/database/ai_project_${REASON}.dump && \
  echo "✓ Backup complete: /backup/ai-project/database/ai_project_${REASON}.dump"'

# AI-Proj staging DB - pre-migration backup
ssh singapore 'REASON="pre_migration_$(date +%Y%m%d_%H%M%S)" && \
  sudo docker exec ai_postgres_staging pg_dump -U ai_staging_user -Fc ai_project_staging \
  > /backup/ai-project-staging/ai_project_staging_${REASON}.dump && \
  echo "✓ Backup complete"'
```

### Backup with a Reason Tag (Recommended)

```bash
# Name the backup after the reason, for easier tracing
ssh tools_ai_proj 'REASON="migrate_project_165_to_167" && \
  docker exec ai_postgres_prod pg_dump -U ai_prod_user -Fc ai_project_prod \
  > /backup/ai-project/database/ai_project_$(date +%Y%m%d_%H%M%S)_${REASON}.dump && \
  ls -lh /backup/ai-project/database/ | tail -3'
```

### Post-Backup Verification

```bash
# List recent backup files
ssh tools_ai_proj 'ls -lh /backup/ai-project/database/ | tail -5'

# Check the backup file size (should be > 10MB)
ssh tools_ai_proj 'stat --printf="%s bytes\n" /backup/ai-project/database/ai_project_*.dump | tail -1'
```

---

## Quick Restore Commands

### Restore from the Latest Backup

```bash
# 1. Find the latest backup
ssh tools_ai_proj 'ls -lt /backup/ai-project/database/*.dump | head -3'

# 2. Restore (pg_restore)
ssh tools_ai_proj 'BACKUP_FILE="/backup/ai-project/database/ai_project_XXXXXXXX.dump" && \
  docker stop ai_backend_prod && \
  docker exec ai_postgres_prod pg_restore -U ai_prod_user -d ai_project_prod --clean --if-exists -Fc "$BACKUP_FILE" && \
  docker start ai_backend_prod && \
  echo "✓ Restore complete"'

# 3. Verify
curl -s https://ai.pipexerp.com/api/v1/health | jq .
```

### Restore to a Specific Point in Time

```bash
# List all backups and pick the target
ssh tools_ai_proj 'ls -lht /backup/ai-project/database/*.dump'

# Restore the chosen backup
ssh tools_ai_proj 'docker exec ai_postgres_prod pg_restore \
  -U ai_prod_user -d ai_project_prod --clean --if-exists -Fc \
  /backup/ai-project/database/ai_project_20260202_180000_migrate_project_165_to_167.dump'
```

---

## Retention Policy

### Policy

| Type | Retention | Cleanup rule |
|------|----------|----------|
| Daily backups | 7 days | deleted automatically after 7 days |
| Monthly backups | permanent | the backup from the 1st of each month is kept forever |
| Pre-migration backups | 30 days | files tagged `pre_migration` are kept for 30 days |

### Automatic Cleanup Script

```bash
#!/bin/bash
# /opt/scripts/cleanup-backups.sh
BACKUP_DIR="/backup/ai-project/database"

# Delete daily backups older than 7 days
# (keep monthly backups and pre-migration backups, which have their own rule)
find "$BACKUP_DIR" -name "*.dump" -mtime +7 ! -name "*_01_*" ! -name "*pre_migration*" -delete
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +7 ! -name "*_01_*" ! -name "*pre_migration*" -delete

# Delete pre-migration backups older than 30 days
find "$BACKUP_DIR" -name "*pre_migration*" -mtime +30 -delete

echo "$(date): Cleanup completed" >> /var/log/backup-cleanup.log
```

### Cron Configuration

```cron
# Clean old backups daily at 03:00
0 3 * * * /opt/scripts/cleanup-backups.sh
```

---

## Quick Reference

| Operation | Command |
|------|------|
| Run a backup manually | `ssh tools_ai_proj "/opt/ai-project/deploy/scripts/backup-database.sh"` |
| List local backups | `ssh tools_ai_proj "ls -lh /backup/ai-project/database/"` |
| Tail the backup log | `ssh tools_ai_proj "tail -f /var/log/ai-project-backup.log"` |
| Trigger an OSS sync | `ssh tools_ai_proj "/opt/ai-project/deploy/scripts/backup-to-oss.sh"` |
| List OSS backups | `ssh tools_ai_proj "ossutil ls oss://fnos2026/ai-project/backups/ --config-file ~/.ossutilconfig"` |
| Download the latest backup | `ssh tools_ai_proj "ossutil cp oss://fnos2026/ai-project/backups/latest.sql.gz /tmp/ --config-file ~/.ossutilconfig"` |
| Verify backup integrity | `ssh tools_ai_proj "gzip -t /backup/ai-project/database/latest.sql.gz"` |

---

## Backup Architecture

### Two-Tier Backup Strategy

```
┌─────────────────────────────────────────────────┐
│ AI-Proj production server                       │
│ (tools_ai_proj: 152.136.104.251)                │
├─────────────────────────────────────────────────┤
│                                                 │
│ PostgreSQL database (ai_postgres_prod)          │
│     │                                           │
│     │ daily at 02:00 (cron)                     │
│     ▼                                           │
│ Local backup (/backup/ai-project/database/)     │
│     │  - gzip compression                       │
│     │  - 30-day retention                       │
│     │  - integrity verification                 │
│     │  - symlink (latest.sql.gz)                │
│     │                                           │
│     │ daily at 02:30 (cron)                     │
│     ▼                                           │
│ OSS sync (backup-to-oss.sh)                     │
│     │                                           │
└─────┼───────────────────────────────────────────┘
      │
      │ internet (623 KB/s)
      ▼
┌─────────────────────────────────────────────────┐
│ Alibaba Cloud Object Storage (OSS)              │
│ Beijing region                                  │
├─────────────────────────────────────────────────┤
│ Bucket: fnos2026                                │
│ Path:   /ai-project/backups/                    │
│                                                 │
│ ├── YYYYMMDD/                                   │
│ │   └── ai_project_YYYYMMDD_HHMMSS.sql.gz       │
│ └── latest.sql.gz (latest backup)               │
│                                                 │
│ ✅ Off-site disaster recovery (99.9% uptime)    │
│ ✅ 30-day auto cleanup                          │
│ ✅ Cost: ~¥0.25/month                           │
└─────────────────────────────────────────────────┘
```

### Backup Schedule

| Time | Operation | Script | Log file |
|------|------|------|----------|
| 02:00 | local database backup | `/opt/ai-project/deploy/scripts/backup-database.sh` | `/var/log/ai-project-backup.log` |
| 02:30 | OSS off-site sync | `/opt/ai-project/deploy/scripts/backup-to-oss.sh` | `/var/log/ai-project-oss-sync.log` |

---

## Automatic Backup Configuration

### Local Backup

**Script**: `/opt/ai-project/deploy/scripts/backup-database.sh`

**Features**:
- ✅ Full PostgreSQL pg_dump backup
- ✅ gzip compression
- ✅ Organized into per-date directories
- ✅ 30-day auto cleanup
- ✅ Backup integrity verification
- ✅ Symlink pointing to the latest backup

**Cron**:
```cron
0 2 * * * /opt/ai-project/deploy/scripts/backup-database.sh >> /var/log/ai-project-backup.log 2>&1
```

**Manual run**:
```bash
ssh tools_ai_proj "/opt/ai-project/deploy/scripts/backup-database.sh"
```

**View the log**:
```bash
ssh tools_ai_proj "tail -f /var/log/ai-project-backup.log"
```

**Backup directory layout**:
```
/backup/ai-project/database/
├── 20260115/
│   ├── ai_project_20260115_020001.sql.gz (13M)
│   └── ai_project_20260115_120000.sql.gz (13M)
├── 20260116/
│   └── ai_project_20260116_020001.sql.gz (13M)
└── latest.sql.gz -> 20260116/ai_project_20260116_020001.sql.gz
```

---

## Alibaba Cloud OSS Off-Site Backup

**Configured**: 2026-01-15 01:12:00 CST
**First sync**: 2026-01-15 02:30:01 CST

### OSS Configuration

| Item | Value |
|--------|-----|
| Endpoint | oss-cn-beijing.aliyuncs.com |
| Bucket | fnos2026 |
| Storage path | oss://fnos2026/ai-project/backups/ |
| Retention | 30-day auto cleanup |
| Estimated cost | ~¥0.25/month |

### Credentials

**Location**: `~/.config/devops/credentials.env` (mode 600)

```bash
OSS_ENDPOINT="oss-cn-beijing.aliyuncs.com"
OSS_BUCKET="fnos2026"
OSS_ACCESS_KEY_ID="LTAI5tEARCztp3Bj3FUYd9rh"
OSS_ACCESS_KEY_SECRET="RSvwURFo2cgF1krSgeriyrAUIqQyGE"
```

**Load the credentials**:
```bash
source ~/.config/devops/credentials.env
```

### The ossutil Tool

**Version**: v1.7.15
**Install location**: `/usr/local/bin/ossutil`
**Installed**: 2026-01-15 00:45:00 CST

**Installation**:
```bash
wget https://gosspublic.alicdn.com/ossutil/1.7.15/ossutil64
sudo mv ossutil64 /usr/local/bin/ossutil
sudo chmod +x /usr/local/bin/ossutil
```

**Configuration**:
```bash
source ~/.config/devops/credentials.env
ossutil config -e ${OSS_ENDPOINT} \
  -i ${OSS_ACCESS_KEY_ID} \
  -k ${OSS_ACCESS_KEY_SECRET} \
  -L CH \
  --config-file ~/.ossutilconfig
```

**Connectivity test**:
```bash
ossutil ls oss://${OSS_BUCKET}/
```

### Automatic Sync Script

**Script**: `/opt/ai-project/deploy/scripts/backup-to-oss.sh`

**Features**:
- ✅ Syncs the current day's backup directory to OSS
- ✅ Uploads latest.sql.gz
- ✅ Auto-cleans backups older than 30 days
- ✅ Backup statistics report
- ✅ Colored log output

**Cron**:
```cron
30 2 * * * /opt/ai-project/deploy/scripts/backup-to-oss.sh >> /var/log/ai-project-oss-sync.log 2>&1
```

**Manual run**:
```bash
ssh tools_ai_proj "/opt/ai-project/deploy/scripts/backup-to-oss.sh"
```

**View the log**:
```bash
ssh tools_ai_proj "tail -f /var/log/ai-project-oss-sync.log"
```

### Common OSS Operations

```bash
# Load credentials
source ~/.config/devops/credentials.env

# List all backup files
ssh tools_ai_proj "ossutil ls oss://${OSS_BUCKET}/ai-project/backups/ -r --config-file ~/.ossutilconfig"

# Backup size statistics
ssh tools_ai_proj "ossutil du oss://${OSS_BUCKET}/ai-project/backups/ --config-file ~/.ossutilconfig"

# Download a specific day's backup
ssh tools_ai_proj "ossutil cp oss://${OSS_BUCKET}/ai-project/backups/20260115/ai_project_20260115_*.sql.gz /tmp/ --config-file ~/.ossutilconfig"

# Download the latest backup
ssh tools_ai_proj "ossutil cp oss://${OSS_BUCKET}/ai-project/backups/latest.sql.gz /tmp/ --config-file ~/.ossutilconfig"

# Inspect a backup object
ssh tools_ai_proj "ossutil stat oss://${OSS_BUCKET}/ai-project/backups/latest.sql.gz --config-file ~/.ossutilconfig"

# Manually delete a specific day's backups
ssh tools_ai_proj "ossutil rm oss://${OSS_BUCKET}/ai-project/backups/20260101/ -r -f --config-file ~/.ossutilconfig"
```

### Backup Verification

```bash
# Verify the latest backup was uploaded
ssh tools_ai_proj "ossutil stat oss://${OSS_BUCKET}/ai-project/backups/latest.sql.gz --config-file ~/.ossutilconfig"

# Download and test backup integrity
ssh tools_ai_proj "
  ossutil cp oss://${OSS_BUCKET}/ai-project/backups/latest.sql.gz /tmp/test_restore.sql.gz --config-file ~/.ossutilconfig
  gzip -t /tmp/test_restore.sql.gz && echo '✓ backup file intact' || echo '✗ backup file corrupted'
  rm /tmp/test_restore.sql.gz
"
```

---

## Manual Backup

### Full Backup

```bash
# Connect to the production server
ssh tools_ai_proj

# Dump the database
docker exec ai_postgres_prod pg_dump -U ai_prod_user ai_project_prod \
  --no-owner --no-acl --clean --if-exists \
  > /tmp/ai_project_backup_$(date +%Y%m%d_%H%M%S).sql

# Compress
gzip /tmp/ai_project_backup_*.sql

# Verify integrity
gzip -t /tmp/ai_project_backup_*.sql.gz
```

### Download to a Local Machine

**Direct download** (when the network is good):
```bash
scp tools_ai_proj:/tmp/ai_project_backup_*.sql.gz /tmp/
```

**Relay through the jump host** (high-latency environments):
```bash
# Relay via the Singapore jump host (Australia → Singapore → Tencent Cloud)
scp tools_ai_proj:/tmp/ai_project_backup_*.sql.gz singapore:/tmp/
scp singapore:/tmp/ai_project_backup_*.sql.gz /tmp/

# Clean up the temporary file on the jump host
ssh singapore "rm /tmp/ai_project_backup_*.sql.gz"
```
|
||||
|
||||
---
|
||||
|
||||
## 数据库恢复
|
||||
|
||||
### 场景 1: 从 OSS 备份恢复(推荐)
|
||||
|
||||
```bash
|
||||
# 1. 从 OSS 下载最新备份
|
||||
ssh tools_ai_proj "
|
||||
source ~/.config/devops/credentials.env
|
||||
ossutil cp oss://fnos2026/ai-project/backups/latest.sql.gz /tmp/restore.sql.gz --config-file ~/.ossutilconfig -f
|
||||
"
|
||||
|
||||
# 2. 验证文件完整性
|
||||
ssh tools_ai_proj "gzip -t /tmp/restore.sql.gz"
|
||||
|
||||
# 3. 停止后端服务
|
||||
ssh tools_ai_proj "docker stop ai_backend_prod"
|
||||
|
||||
# 4. 恢复数据库
|
||||
ssh tools_ai_proj "
|
||||
gunzip -c /tmp/restore.sql.gz | \
|
||||
docker exec -i ai_postgres_prod psql -U ai_prod_user ai_project_prod
|
||||
"
|
||||
|
||||
# 5. 启动后端服务
|
||||
ssh tools_ai_proj "docker start ai_backend_prod"
|
||||
|
||||
# 6. 验证服务
|
||||
curl -s https://ai.pipexerp.com/api/v1/health | jq .
|
||||
|
||||
# 7. 清理临时文件
|
||||
ssh tools_ai_proj "rm /tmp/restore.sql.gz"
|
||||
```

### Scenario 2: Restore from a Local Backup

```bash
ssh tools_ai_proj

# Stop the backend service
docker stop ai_backend_prod

# Restore the database
gunzip -c /backup/ai-project/database/latest.sql.gz | \
  docker exec -i ai_postgres_prod psql -U ai_prod_user ai_project_prod

# Start the backend service
docker start ai_backend_prod
```

### Scenario 3: Restore from a Local Dev Environment to Production (Full Rebuild)

```bash
# 1. Export locally
pg_dump -U donglinlai ai_project_local \
  --no-owner --no-acl --clean --if-exists \
  --exclude-table=audit_logs \
  > /tmp/ai_project_clean.sql

# 2. Compress
gzip /tmp/ai_project_clean.sql

# 3. Transfer via the Singapore jump host (high-latency optimization)
scp /tmp/ai_project_clean.sql.gz singapore:/tmp/
ssh singapore "scp /tmp/ai_project_clean.sql.gz tools_ai_proj:/tmp/"

# 4. Restore on production
ssh tools_ai_proj

# Stop the backend service
docker stop ai_backend_prod

# Fully rebuild the database (avoids dependency conflicts)
docker exec ai_postgres_prod psql -U ai_prod_user postgres \
  -c 'DROP DATABASE IF EXISTS ai_project_prod;'

docker exec ai_postgres_prod psql -U ai_prod_user postgres \
  -c 'CREATE DATABASE ai_project_prod OWNER ai_prod_user;'

# Restore the data
gunzip -c /tmp/ai_project_clean.sql.gz | \
  docker exec -i ai_postgres_prod psql -U ai_prod_user ai_project_prod

# Create tables that may be missing
docker exec -i ai_postgres_prod psql -U ai_prod_user ai_project_prod << 'EOF'
CREATE TABLE IF NOT EXISTS audit_logs (
    id SERIAL PRIMARY KEY,
    user_id INTEGER,
    action VARCHAR(100),
    resource_type VARCHAR(100),
    resource_id VARCHAR(100),
    details TEXT,
    ip_address VARCHAR(50),
    user_agent TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
EOF

# Run database migrations (if any)
cd /opt/ai-project/backend/migrations
for file in $(ls *.sql | grep -v _down.sql | sort); do
  echo "Running migration: $file"
  docker exec -i ai_postgres_prod psql -U ai_prod_user ai_project_prod < "$file"
done

# Start the backend service
docker start ai_backend_prod

# Clean up
rm /tmp/ai_project_clean.sql.gz
```

---

## Best Practices

### Docker Volumes Safety

**Key rule**: all production data volumes must be configured with `external: true`

**Correct configuration** (`deploy/tencent-cloud/docker-compose.dockerhub.yml`):
```yaml
volumes:
  postgres_prod_data:
    external: true
    name: ai-project_postgres_prod_data
  redis_prod_data:
    external: true
    name: ai-project_redis_prod_data
```

**Dangerous configuration** (gets deleted by `docker compose down`):
```yaml
volumes:
  postgres_prod_data:
  redis_prod_data:
```

**Verify**:
```bash
ssh tools_ai_proj "docker volume ls | grep ai-project"
# Expect to see:
# ai-project_postgres_prod_data
# ai-project_redis_prod_data
```

### Backup Strategy

1. **Daily automatic backups** - cron jobs
2. **Two tiers** - local + Alibaba Cloud OSS
3. **Regular verification** - test the restore procedure weekly
4. **Retention** - 30-day auto cleanup

### Recommended pg_dump Flags

```bash
# Cross-server migration
pg_dump --no-owner --no-acl --clean --if-exists --exclude-table=<problem_table>

# Flags:
# --no-owner        don't restore object ownership (avoids username conflicts)
# --no-acl          don't restore access privileges (avoids permission issues)
# --clean           include DROP statements (full replacement)
# --if-exists       check existence before DROP (avoids errors)
# --exclude-table   skip problem tables (e.g. tables with JSON format issues)
```

### Restore Checklist

Before any restore, confirm each item:

- [ ] Backup file integrity verified (`gzip -t`)
- [ ] Related application services stopped (avoids inconsistent data)
- [ ] Database fully rebuilt (DROP + CREATE, avoids dependency conflicts)
- [ ] Missing tables recreated after restore (e.g. excluded tables)
- [ ] Database migrations run (schema up to date)
- [ ] Data integrity verified (row counts of key tables)
- [ ] Application functions tested (login, key business flows)
- [ ] Temporary files cleaned up (backup files, SQL files)

### Network Transfer Optimization

**Scenario**: cross-region, high-latency links (e.g. Australia → Tencent Cloud)

**Problem**: direct connections see 370ms+ latency, making large file transfers extremely slow

**Solution**: relay through a geographically intermediate jump host

```bash
# Direct (slow): Australia → Tencent Cloud (370ms+)
scp file.gz tools_ai_proj:/tmp/

# Relayed (fast): Australia → Singapore → Tencent Cloud
scp file.gz singapore:/tmp/
ssh singapore "scp /tmp/file.gz tools_ai_proj:/tmp/"
```

**Singapore jump host**:
- Alias: singapore
- IP: 43.134.28.147
- User: ubuntu
- SSH key: ~/.ssh/singpore.pem

---

## Monitoring & Alerts

### Weekly Checklist

**Suggested frequency**: once a week

```bash
# 1. Check local backups
ssh tools_ai_proj "ls -lh /backup/ai-project/database/$(date +%Y%m%d)/"

# 2. Check the OSS backup
ssh tools_ai_proj "ossutil stat oss://fnos2026/ai-project/backups/latest.sql.gz --config-file ~/.ossutilconfig"

# 3. Check cron logs
ssh tools_ai_proj "tail -20 /var/log/ai-project-backup.log"
ssh tools_ai_proj "tail -20 /var/log/ai-project-oss-sync.log"

# 4. Verify backup size (should be in the 10-20M range)
ssh tools_ai_proj "du -sh /backup/ai-project/database/$(date +%Y%m%d)/"

# 5. Test backup integrity
ssh tools_ai_proj "
  gzip -t /backup/ai-project/database/latest.sql.gz && \
  echo '✓ local backup intact' || echo '✗ local backup corrupted'
"
```

### Troubleshooting Failed Backups

If a backup or sync fails:

1. **Check disk space**:
   ```bash
   ssh tools_ai_proj "df -h"
   ```

2. **Check the PostgreSQL container**:
   ```bash
   ssh tools_ai_proj "docker ps | grep postgres"
   ```

3. **Check the ossutil config**:
   ```bash
   ssh tools_ai_proj "cat ~/.ossutilconfig"
   ```

4. **Test the OSS connection**:
   ```bash
   ssh tools_ai_proj "ossutil ls oss://fnos2026/ --config-file ~/.ossutilconfig"
   ```

5. **Run the scripts by hand to see the full error output**:
   ```bash
   ssh tools_ai_proj "/opt/ai-project/deploy/scripts/backup-database.sh"
   ssh tools_ai_proj "/opt/ai-project/deploy/scripts/backup-to-oss.sh"
   ```

---

## Cost Estimate

**Based on the current data volume** (13MB/day):

| Item | Calculation | Monthly cost |
|------|------|--------|
| OSS storage | 13MB × 30 days = 390MB × ¥0.12/GB | ¥0.05 |
| OSS traffic | 13MB × 30 days = 390MB × ¥0.50/GB | ¥0.20 |
| **Total** | | **¥0.25** |
|
||||
|
||||
---
|
||||
|
||||
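The arithmetic behind the table can be sketched as follows (decimal GB, i.e. 1 GB = 1000 MB, matching the table's 390 MB figure; the table then rounds each line item to the nearest fen):

```python
def monthly_oss_cost(daily_mb: float, days: int = 30,
                     storage_cny_per_gb: float = 0.12,
                     traffic_cny_per_gb: float = 0.50) -> dict:
    """Estimate monthly OSS backup cost from the daily dump size."""
    gb = daily_mb * days / 1000  # 13 MB/day -> 0.39 GB/month
    return {
        "gb": gb,
        "storage_cny": gb * storage_cny_per_gb,
        "traffic_cny": gb * traffic_cny_per_gb,
    }

cost = monthly_oss_cost(13)
print(cost)
```

Re-running this with a larger `daily_mb` is a quick sanity check before the backup size grows past the current 10-20M range.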
## Incident Cases

### 2026-01-15: Production Database Loss

**Incident window**: 2026-01-15 00:00:00 - 00:46:00 CST

**Incident**: during a Jenkins deployment, `docker compose down` deleted the non-external volumes

**Impact**: the production database was completely wiped; no users could log in

**Recovery process**:
1. Exported complete data from the local development environment (41 users, 54 projects, 4,722 tasks)
2. Used the Singapore jump host to speed up the transfer (working around 370ms+ latency)
3. Rebuilt the database from scratch to avoid dependency conflicts
4. Reset all administrator passwords

**Recovery completed**: 2026-01-15 00:46:00 CST

**Preventive measures**:
1. ✅ All data volumes marked `external: true` (completed: 2026-01-15 00:46:00)
2. ✅ Automatic database migration added to the Jenkinsfile (completed: 2026-01-15 00:46:00)
3. ✅ Webhook auto-deploy temporarily disabled (completed: 2026-01-15 00:46:00)
4. ✅ Automatic backup strategy configured (local + OSS, two tiers) (completed: 2026-01-15 02:30:46)

**Full record**: see the SiYuan note `devops/运维记录/2026-01-15 AI-Proj生产数据库恢复记录`

---
## User Management

For database user management (creating users, resetting passwords), see:
- **ops-tools/SKILL.md** - the "AI-Proj User Management" section
- **ai-proj-deploy.md** - the "User Management" section

**Key caveats**:
- Password hashes use bcrypt with **cost 12** (`DefaultCost` in the backend's `utils/password.go`)
- Because bcrypt hashes contain `$` characters, SQL must be executed via file transfer rather than inline shell strings

---
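The `$` caveat is easy to demonstrate: inside a double-quoted shell string, the `$`-delimited fields of a bcrypt hash are expanded as (empty) shell parameters. A sketch with an obviously fake hash (not a real credential):

```python
import subprocess

# Illustrative placeholder only -- bcrypt hashes look like "$2b$12$...".
h = "$2b$12$exampleexampleexampleexampleexampleexampleexampleexa"

# Double quotes: the shell expands $2, $1 and $example... to empty strings.
double = subprocess.run(["sh", "-c", f'echo "{h}"'],
                        capture_output=True, text=True).stdout.strip()
# Single quotes (or, better still, a file transfer) keep the hash intact.
single = subprocess.run(["sh", "-c", f"echo '{h}'"],
                        capture_output=True, text=True).stdout.strip()
print(repr(double), repr(single))
```

This is why the runbook insists on shipping SQL files instead of interpolating hashes into shell commands.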
## Related Resources

- **Parent skill**: ops-tools/skill.md
- **Backup script**: `/opt/ai-project/deploy/scripts/backup-database.sh`
- **OSS sync script**: `/opt/ai-project/deploy/scripts/backup-to-oss.sh`
- **Credentials file**: `~/.config/devops/credentials.env`
- **SiYuan note**: `devops/运维记录/2026-01-15 AI-Proj生产数据库恢复记录`

---
## Standard Database Migration Procedure

> **Mandatory**: every database migration must follow this procedure.

### Migration Checklist

```
Standard database migration flow

1. Pre-migration backup  (REQUIRED)
   ssh tools_ai_proj 'docker exec ai_postgres_prod \
     pg_dump -U ai_prod_user -Fc ai_project_prod \
     > /backup/ai-project/database/pre_migration.dump'

2. Verify the backup file
   ssh tools_ai_proj 'ls -lh /backup/.../pre_migration.dump'

3. Record the pre-migration state
   SELECT COUNT(*) FROM <table>;

4. Run the migration (inside a transaction)
   BEGIN; ... COMMIT;

5. Verify the result
   SELECT COUNT(*) FROM <table>;

6. If anything is wrong, restore the backup
   pg_restore -Fc pre_migration.dump
```
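Steps 3-5 (count, migrate inside a transaction, re-count, roll back on mismatch) can be sketched with SQLite standing in for PostgreSQL; table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO tasks (status) VALUES (?)",
                 [("open",), ("open",), ("done",)])

# 3. Record the pre-migration state
before = conn.execute("SELECT COUNT(*) FROM tasks WHERE status='open'").fetchone()[0]

try:
    with conn:  # BEGIN ... COMMIT; rolls back automatically on exception
        # 4. Run the migration
        conn.execute("UPDATE tasks SET status='in_progress' WHERE status='open'")
        # 5. Verify the result before the transaction commits
        after = conn.execute(
            "SELECT COUNT(*) FROM tasks WHERE status='in_progress'").fetchone()[0]
        if after != before:
            raise RuntimeError(f"expected {before} migrated rows, got {after}")
except RuntimeError:
    pass  # transaction rolled back, data unchanged

print(before, after)
```

The same shape applies verbatim in psql: keep the verification `SELECT` inside the `BEGIN`/`COMMIT` window so a bad result can still be rolled back.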
### Migration SQL Template

```sql
-- ============================================
-- Migration script template
-- Back up before running!
-- ============================================

BEGIN;

-- Pre-migration stats
\echo '=== Pre-migration stats ==='
SELECT 'table_name' as info, COUNT(*) as count FROM table_name WHERE condition;

-- Run the migration
\echo '=== Running migration ==='
UPDATE table_name SET column = new_value WHERE condition;

-- Post-migration stats
\echo '=== Post-migration stats ==='
SELECT 'table_name' as info, COUNT(*) as count FROM table_name WHERE condition;

-- Commit only after confirming the results
COMMIT;

\echo '=== Migration complete ==='
```
### Rolling Back a Failed Migration

```bash
# 1. Stop the backend service
ssh tools_ai_proj 'docker stop ai_backend_prod'

# 2. Restore the backup
ssh tools_ai_proj 'docker exec ai_postgres_prod pg_restore \
  -U ai_prod_user -d ai_project_prod --clean --if-exists -Fc \
  /backup/ai-project/database/ai_project_XXXXXXXX_pre_migration.dump'

# 3. Start the backend service
ssh tools_ai_proj 'docker start ai_backend_prod'

# 4. Verify the service
curl -s https://ai.pipexerp.com/api/v1/health | jq .
```

---
## Version History

| Version | Date | Changes |
|---------|------|---------|
| 2.0 | 2026-02-02 | Promoted to a global skill: added pre-migration backup procedure, multi-database support, 7-day + monthly retention policy, quick-restore commands |
| 1.0 | 2026-01-15 | Initial version: AI-Proj backup and OSS sync |

---

**Document created**: 2026-01-15 07:30:00 ACDT
**Last updated**: 2026-02-02
**Document status**: ✅ Operational
41
skills-ops/ops-tools-plugin/deploy-check.sh
Executable file
#!/bin/bash
# Deployment status check script
# Usage: ./deploy-check.sh [ai-proj|pipeXerp]

set -e

TOOLS_SERVER="root@101.200.136.200"
TOOLS_KEY="~/.ssh/tools.pem"
JOB_NAME="${1:-ai-proj}"

echo "======================================"
echo "Jenkins Job: $JOB_NAME"
echo "======================================"

ssh -i $TOOLS_KEY -o ConnectTimeout=5 $TOOLS_SERVER << EOF
echo "--- Last 5 builds ---"
ls -lt /var/lib/jenkins/jobs/$JOB_NAME/builds/ 2>/dev/null | head -6

echo ""
echo "--- Last successful build ---"
if [ -L /var/lib/jenkins/jobs/$JOB_NAME/builds/lastSuccessfulBuild ]; then
  BUILD_NUM=\$(readlink /var/lib/jenkins/jobs/$JOB_NAME/builds/lastSuccessfulBuild)
  echo "Build #\$BUILD_NUM"
  if [ -f "/var/lib/jenkins/jobs/$JOB_NAME/builds/\$BUILD_NUM/log" ]; then
    echo "Build time: \$(stat -c %y /var/lib/jenkins/jobs/$JOB_NAME/builds/\$BUILD_NUM/log 2>/dev/null || stat -f %Sm /var/lib/jenkins/jobs/$JOB_NAME/builds/\$BUILD_NUM/log)"
  fi
else
  echo "No successful builds on record"
fi

echo ""
echo "--- Last failed build ---"
if [ -L /var/lib/jenkins/jobs/$JOB_NAME/builds/lastFailedBuild ]; then
  BUILD_NUM=\$(readlink /var/lib/jenkins/jobs/$JOB_NAME/builds/lastFailedBuild)
  echo "Build #\$BUILD_NUM"
  echo "Error log (last 20 lines):"
  tail -20 /var/lib/jenkins/jobs/$JOB_NAME/builds/\$BUILD_NUM/log 2>/dev/null || echo "Cannot read log"
else
  echo "No failed builds on record"
fi
EOF
97
skills-ops/ops-tools-plugin/gitea-pr.sh
Executable file
#!/bin/bash
# Gitea PR helper script
# Usage:
#   ./gitea-pr.sh list                  # list PRs
#   ./gitea-pr.sh create <title> <head> # create a PR
#   ./gitea-pr.sh merge <pr-number>     # merge a PR

set -e

# Load credentials
source ~/.config/devops/credentials.env

REPO="Tools/new-ai-proj"
ACTION="${1:-list}"

case "$ACTION" in
  list)
    echo "=== Open Pull Requests ==="
    curl -s "$GITEA_URL/api/v1/repos/$REPO/pulls?state=open" \
      -H "Authorization: token $GITEA_TOKEN" | \
      python3 -c "
import sys, json
prs = json.load(sys.stdin)
if not prs:
    print('No open PRs')
else:
    for pr in prs:
        print(f\"#{pr['number']} [{pr['state']}] {pr['title']}\")
        print(f\"  {pr['head']['ref']} -> {pr['base']['ref']}\")
        print(f\"  Author: {pr['user']['login']}\")
        print()
"
    ;;

  create)
    TITLE="$2"
    HEAD="$3"
    BASE="${4:-main}"

    if [ -z "$TITLE" ] || [ -z "$HEAD" ]; then
      echo "Usage: $0 create <title> <head-branch> [base-branch]"
      exit 1
    fi

    echo "Creating PR: $TITLE"
    echo "Branches: $HEAD -> $BASE"

    curl -s -X POST "$GITEA_URL/api/v1/repos/$REPO/pulls" \
      -H "Authorization: token $GITEA_TOKEN" \
      -H "Content-Type: application/json" \
      -d "{\"title\":\"$TITLE\",\"head\":\"$HEAD\",\"base\":\"$BASE\"}" | \
      python3 -c "
import sys, json
pr = json.load(sys.stdin)
if 'number' in pr:
    print(f\"PR created! #{pr['number']}\")
    print(f\"URL: {pr['html_url']}\")
else:
    print(f\"Create failed: {pr.get('message', 'unknown error')}\")
"
    ;;

  merge)
    PR_NUM="$2"

    if [ -z "$PR_NUM" ]; then
      echo "Usage: $0 merge <pr-number>"
      exit 1
    fi

    echo "Merging PR #$PR_NUM..."

    curl -s -X POST "$GITEA_URL/api/v1/repos/$REPO/pulls/$PR_NUM/merge" \
      -H "Authorization: token $GITEA_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"Do":"merge"}' | \
      python3 -c "
import sys, json
try:
    result = json.load(sys.stdin)
    if result:
        print(f\"Result: {result}\")
    else:
        print('PR merged!')
except Exception:
    # Gitea returns an empty body on a successful merge
    print('PR merged!')
"
    ;;

  *)
    echo "Usage: $0 {list|create|merge}"
    echo "  list                  - list open PRs"
    echo "  create <title> <head> - create a new PR"
    echo "  merge <pr-number>     - merge a PR"
    exit 1
    ;;
esac
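The inline formatter used by the `list` action can be pulled out and tested on its own; field names follow Gitea's pull-request JSON, and the sample payload is fabricated for illustration:

```python
import json

def format_prs(payload: str) -> str:
    """Render Gitea's open-PR JSON the way gitea-pr.sh's list action does."""
    prs = json.loads(payload)
    if not prs:
        return "No open PRs"
    lines = []
    for pr in prs:
        lines.append(f"#{pr['number']} [{pr['state']}] {pr['title']}")
        lines.append(f"  {pr['head']['ref']} -> {pr['base']['ref']}")
        lines.append(f"  Author: {pr['user']['login']}")
    return "\n".join(lines)

sample = json.dumps([{
    "number": 7, "state": "open", "title": "Fix backup cron",
    "head": {"ref": "fix/backup"}, "base": {"ref": "main"},
    "user": {"login": "qiudl"},
}])
print(format_prs(sample))
```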
84
skills-ops/ops-tools-plugin/health-check.sh
Executable file
#!/bin/bash
# Server health check script
# Usage: ./health-check.sh [aliyun|website|all]

set -e

TOOLS_SERVER="root@101.200.136.200"
TOOLS_KEY="~/.ssh/tools.pem"
WEBSITE_SERVER="root@192.144.137.14"
WEBSITE_KEY="~/.ssh/officialWebsite.pem"

check_tools_server() {
  echo "======================================"
  echo "Tools server (101.200.136.200)"
  echo "======================================"

  ssh -i $TOOLS_KEY -o ConnectTimeout=5 $TOOLS_SERVER << 'EOF'
echo "--- System load ---"
uptime

echo ""
echo "--- Memory usage ---"
free -h

echo ""
echo "--- Disk usage ---"
df -h | grep -E '^/dev|Filesystem'

echo ""
echo "--- Docker containers ---"
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'

echo ""
echo "--- System services ---"
echo -n "Jenkins: "; systemctl is-active jenkins
echo -n "Nginx: "; systemctl is-active nginx
echo -n "Docker: "; systemctl is-active docker

echo ""
echo "--- Port check ---"
netstat -tlnp 2>/dev/null | grep -E ':3000|:8080|:10022|:5000' | awk '{print $4, $7}'
EOF
}

check_website_server() {
  echo "======================================"
  echo "Website server (192.144.137.14)"
  echo "======================================"

  ssh -i $WEBSITE_KEY -o ConnectTimeout=5 $WEBSITE_SERVER << 'EOF'
echo "--- System load ---"
uptime

echo ""
echo "--- Memory usage ---"
free -h

echo ""
echo "--- Disk usage ---"
df -h | grep -E '^/dev|Filesystem'

echo ""
echo "--- Nginx status ---"
systemctl is-active nginx || echo "nginx not running"
EOF
}

case "${1:-all}" in
  aliyun|tools)
    check_tools_server
    ;;
  website)
    check_website_server
    ;;
  all)
    check_tools_server
    echo ""
    check_website_server
    ;;
  *)
    echo "Usage: $0 [aliyun|website|all]"
    exit 1
    ;;
esac
76
skills-ops/ops-tools-plugin/incidents.md
Normal file
# Major Incident Log

**Created**: 2026-01-29 11:50:00 CST
**Parent skill**: ops-tools

---

## 2026-01-17: Melbourne Server Crash from a VNC Configuration Mistake

**Incident window**: 2026-01-17 08:49:00 - 09:02:00+ ACDT

**Incident**: running `sudo pkill -9 -u coolbuy-dev` crashed the Melbourne server and took it offline

**Consequences**:
- The macOS graphics stack crashed
- The Tailscale VPN dropped
- SSH became completely unreachable

**Root cause**: the macOS graphics stack depends on the user session; force-killing the session crashes WindowServer

**Lesson**:
```bash
# Never run these on a remote macOS machine
sudo pkill -9 -u <username>
sudo killall -9 -u <username>
```

**Resolution**: physical access was required to recover

---

## 2026-01-16: AI-Proj Webhook Auto-Deploy Re-enabled

**Time**: 2026-01-16 11:06:01 CST

**Event**: the Gitea webhook was re-enabled, restoring automatic deployment of the main branch

**Verification**:
- Jenkins build #64 triggered automatically and succeeded
- Database data fully preserved (verified against 664 requirement records)
- The external volumes configuration proved effective

**Deploy flow**: `developer merges PR → Gitea webhook → Jenkins → automatic production deploy`

---

## 2026-01-15: AI-Proj Production Database Loss and Recovery

**Incident window**: 2026-01-15 00:00:00 - 00:46:00 CST

**Incident**: during a Jenkins deployment, `docker compose down` deleted the non-external volumes

**Impact**: the production database was completely wiped; no users could log in

**Recovery**:
1. Restored complete data from the local development environment (41 users, 54 projects, 4,722 tasks)
2. Used the Singapore jump host to speed up the transfer
3. Rebuilt the database from scratch

**Recovery completed**: 2026-01-15 00:46:00 CST

**Preventive measures**:
1. All data volumes marked `external: true`
2. Automatic database migration added to the Jenkinsfile
3. Automatic backup strategy configured (local + Alibaba Cloud OSS)
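The `external: true` measure can be sketched as a compose fragment; the service, image, and volume names below are illustrative, not the production configuration:

```yaml
# The volume is created once, out of band, with:
#   docker volume create ai_project_pgdata
# Because compose does not own an external volume, `docker compose down -v`
# will not delete it during deployments.
services:
  postgres:
    image: postgres:15
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
    external: true
    name: ai_project_pgdata
```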
---

## Incident Logging Conventions

Every major incident record must include:
- **Incident window**: full start and end times
- **Description**: a clear, concise account of what happened
- **Impact**: the affected systems and users
- **Recovery**: detailed recovery steps
- **Preventive measures**: each with its completion time
46
skills-ops/ops-tools-plugin/jenkins-build.sh
Executable file
#!/bin/bash
# Jenkins build trigger script
# Usage: ./jenkins-build.sh [job-name] [env]
# Example: ./jenkins-build.sh ai-proj staging

set -e

# Load credentials
source ~/.config/devops/credentials.env

JOB_NAME="${1:-ai-proj}"
DEPLOY_ENV="${2:-staging}"

echo "Triggering Jenkins build..."
echo "Job: $JOB_NAME"
echo "Environment: $DEPLOY_ENV"
echo ""

# Trigger the build
RESPONSE=$(curl -s -w "\n%{http_code}" -X POST \
  "$JENKINS_URL/job/$JOB_NAME/buildWithParameters" \
  -u "$JENKINS_USER:$JENKINS_TOKEN" \
  --data "DEPLOY_ENV=$DEPLOY_ENV&SKIP_TESTS=false")

HTTP_CODE=$(echo "$RESPONSE" | tail -1)

if [ "$HTTP_CODE" = "201" ]; then
  echo "Build triggered!"
  echo ""
  echo "Build status: $JENKINS_URL/job/$JOB_NAME/"

  # Wait 2 seconds, then fetch the build number
  sleep 2
  BUILD_INFO=$(curl -s "$JENKINS_URL/job/$JOB_NAME/lastBuild/api/json" \
    -u "$JENKINS_USER:$JENKINS_TOKEN" 2>/dev/null)

  if [ -n "$BUILD_INFO" ]; then
    BUILD_NUM=$(echo "$BUILD_INFO" | python3 -c "import sys,json; print(json.load(sys.stdin).get('number','N/A'))" 2>/dev/null || echo "N/A")
    BUILD_STATUS=$(echo "$BUILD_INFO" | python3 -c "import sys,json; d=json.load(sys.stdin); print('running' if d.get('building') else d.get('result','unknown'))" 2>/dev/null || echo "unknown")
    echo "Build number: #$BUILD_NUM"
    echo "Status: $BUILD_STATUS"
  fi
else
  echo "Failed to trigger the build! HTTP status: $HTTP_CODE"
  exit 1
fi
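The two inline `python3 -c` snippets above boil down to one small function over Jenkins's `lastBuild/api/json` payload; the sample payloads here are fabricated for illustration:

```python
import json

def build_status(payload: str) -> tuple:
    """Extract (build number, status) from Jenkins's lastBuild JSON."""
    d = json.loads(payload)
    num = d.get("number", "N/A")
    # While a build is running, "building" is true and "result" is absent.
    status = "running" if d.get("building") else d.get("result", "unknown")
    return num, status

print(build_status('{"number": 64, "building": false, "result": "SUCCESS"}'))
print(build_status('{"number": 65, "building": true}'))
```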
165
skills-ops/ops-tools-plugin/mcp-config.md
Normal file
# Claude Code MCP Configuration Guide

**Created**: 2026-01-29 11:50:00 CST
**Parent skill**: ops-tools

## Overview

MCP (Model Context Protocol) is the standard protocol for integrating Claude Code with external services.

## Config File Locations

| File | Scope | Notes |
|------|-------|-------|
| `~/.claude/.mcp.json` | User level (recommended) | Shared by all projects |
| `.claude/mcp.json` | Project level | Applies to the current project only |

## Config Template (stdio mode)

```json
{
  "mcpServers": {
    "<service-name>": {
      "type": "stdio",
      "command": "node",
      "args": ["<path-to-service-entry-file>"],
      "env": {
        "API_BASE": "<API base URL>",
        "API_TOKEN": "<PAT token>",
        "NODE_ENV": "production"
      }
    }
  }
}
```
**Parameters**:

| Parameter | Description | Example |
|-----------|-------------|---------|
| `type` | Transport type, always `stdio` | `stdio` |
| `command` | Launch command | `node` |
| `args` | Entry file path | `["dist/index.js"]` |
| `env` | Environment variables | API URL, PAT token |
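A quick structural lint of a `.mcp.json` file against the template above can catch typos before restarting Claude Code. This is a minimal sketch of the fields the template uses, not the official schema:

```python
import json

REQUIRED = {"type", "command", "args"}

def check_mcp_config(text: str) -> list:
    """Return a list of structural problems; empty means the shape looks right."""
    problems = []
    cfg = json.loads(text)
    servers = cfg.get("mcpServers")
    if not isinstance(servers, dict) or not servers:
        return ["missing or empty 'mcpServers'"]
    for name, spec in servers.items():
        missing = REQUIRED - spec.keys()
        if missing:
            problems.append(f"{name}: missing {sorted(missing)}")
        if spec.get("type") != "stdio":
            problems.append(f"{name}: type should be 'stdio'")
    return problems

good = '{"mcpServers": {"ai-proj": {"type": "stdio", "command": "node", "args": ["dist/index.js"]}}}'
bad = '{"mcpServers": {"ai-proj": {"command": "node"}}}'
print(check_mcp_config(good), check_mcp_config(bad))
```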
## ai-proj MCP Configuration

**Config file** (`~/.claude/.mcp.json`):

```json
{
  "mcpServers": {
    "ai-proj": {
      "type": "stdio",
      "command": "node",
      "args": ["/Users/coolbuy-dev/coding/new-ai-proj/mcp-task-bridge/dist/index.js"],
      "env": {
        "TASK_API_BASE": "https://ai.pipexerp.com/api/v1",
        "TASK_API_TOKEN": "aiproj_pk_2ecf8f8728b70afd4420af3875f4f7505c9fe8231a4771972b0f385aa1c75099",
        "NODE_ENV": "production"
      }
    }
  }
}
```
### Environment Variables

| Variable | Value |
|----------|-------|
| `TASK_API_BASE` | `https://ai.pipexerp.com/api/v1` |
| `TASK_API_TOKEN` | `aiproj_pk_2ecf8f8728b70afd4420af3875f4f7505c9fe8231a4771972b0f385aa1c75099` |
### Prerequisites

```bash
# Build mcp-task-bridge
cd /Users/coolbuy-dev/coding/new-ai-proj/mcp-task-bridge
npm run build
```

### Verification

```bash
# 1. Run the service manually
cd /Users/coolbuy-dev/coding/new-ai-proj/mcp-task-bridge
TASK_API_BASE="https://ai.pipexerp.com/api/v1" \
TASK_API_TOKEN="aiproj_pk_2ecf8f8728b70afd4420af3875f4f7505c9fe8231a4771972b0f385aa1c75099" \
node dist/index.js

# 2. Test the API connection
curl -s "https://ai.pipexerp.com/api/v1/projects" \
  -H "Authorization: Bearer aiproj_pk_2ecf8f8728b70afd4420af3875f4f7505c9fe8231a4771972b0f385aa1c75099" | head -c 200

# 3. Reconnect in Claude Code
/mcp
```
## Developing a New MCP Service

### Step 1: Create the Project

```bash
mkdir my-mcp-service && cd my-mcp-service
npm init -y
npm install @modelcontextprotocol/sdk
```

### Step 2: Implement the Service

```typescript
// src/index.ts
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";

const server = new Server({
  name: "my-mcp-service",
  version: "1.0.0",
}, { capabilities: { tools: {} } });

// Register and implement tools...

const transport = new StdioServerTransport();
await server.connect(transport);
```

### Step 3: Build and Configure

```bash
npm run build
vim ~/.claude/.mcp.json   # add the config entry
claude                    # restart Claude Code
```
## Troubleshooting

### MCP Connection Failures

```bash
# 1. Check the config file
cat ~/.claude/.mcp.json | jq .

# 2. Check that the service file exists
ls -la <service-path>/dist/index.js

# 3. Run the service manually
<env vars> node dist/index.js
```

**Common causes**:
- Service not built → `npm run build`
- Invalid token → regenerate the PAT
- `/mcp` has no effect → restart Claude Code

### Authentication Failures (401)

```bash
curl -s "<API_BASE>/auth/me" -H "Authorization: Bearer <TOKEN>"
```

## Related Files

| File | Description |
|------|-------------|
| `~/.claude/.mcp.json` | MCP configuration |
| `mcp-task-bridge/dist/index.js` | ai-proj MCP service |
41
skills-ops/ops-tools-plugin/scripts/deploy-check.sh
Executable file
(identical to deploy-check.sh above)
97
skills-ops/ops-tools-plugin/scripts/gitea-pr.sh
Executable file
(identical to gitea-pr.sh above)
84
skills-ops/ops-tools-plugin/scripts/health-check.sh
Executable file
(identical to health-check.sh above)
46
skills-ops/ops-tools-plugin/scripts/jenkins-build.sh
Executable file
(identical to jenkins-build.sh above)
44
skills-ops/ops-tools-plugin/scripts/service-restart.sh
Executable file
#!/bin/bash
# Service restart script
# Usage: ./service-restart.sh <service-name>
# Supported: gitea, jenkins, nginx, registry, docker

set -e

TOOLS_SERVER="root@101.200.136.200"
TOOLS_KEY="~/.ssh/tools.pem"
SERVICE_NAME="$1"

if [ -z "$SERVICE_NAME" ]; then
  echo "Usage: $0 <service-name>"
  echo "Supported services: gitea, jenkins, nginx, registry, docker"
  exit 1
fi

echo "Restarting service: $SERVICE_NAME ..."

case "$SERVICE_NAME" in
  gitea)
    ssh -i $TOOLS_KEY $TOOLS_SERVER "docker restart gitea && docker logs --tail 10 gitea"
    ;;
  registry)
    ssh -i $TOOLS_KEY $TOOLS_SERVER "docker restart registry && docker ps | grep registry"
    ;;
  jenkins)
    ssh -i $TOOLS_KEY $TOOLS_SERVER "systemctl restart jenkins && systemctl status jenkins --no-pager"
    ;;
  nginx)
    ssh -i $TOOLS_KEY $TOOLS_SERVER "nginx -t && systemctl restart nginx && systemctl status nginx --no-pager"
    ;;
  docker)
    ssh -i $TOOLS_KEY $TOOLS_SERVER "systemctl restart docker && docker ps"
    ;;
  *)
    echo "Unsupported service: $SERVICE_NAME"
    echo "Supported services: gitea, jenkins, nginx, registry, docker"
    exit 1
    ;;
esac

echo ""
echo "Service $SERVICE_NAME restarted"
44
skills-ops/ops-tools-plugin/service-restart.sh
Executable file
(identical to scripts/service-restart.sh above)
1560
skills-ops/ops-tools-plugin/skills/SKILL.md
Normal file
File diff suppressed because it is too large