chore: enable auto-deploy for saltthing.top
- Added version comment for deployment tracking
- Auto-deploy configured on fnos with 5-minute sync interval

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
391
ops-unified-management-plan.md
Normal file
@@ -0,0 +1,391 @@
# OPS Unified Management Plan: Design Document

> Created: 2025-12-18
> Status: Pending implementation

---
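
## 1. Background and Current State

### 1.1 Organization
- **Companies**: 5, each with a different line of business
- **R&D teams**: 2, one in Chengdu and one in Beijing
- **Developers**: about 15, widely distributed, some working from home
- **Ops ownership**: the ops staff belong to one company but serve all five

### 1.2 Infrastructure
- **Servers**: about 30
- **Cloud providers**: mainly Alibaba Cloud and Tencent Cloud
- **Existing tools**: JumpServer (underused), a monitoring system, Jenkins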
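
### 1.3 Current Pain Points
| Pain point | Description |
|------|------|
| Conflicting response priorities | Several companies raise requests at the same time, with no rule for which comes first |
| Blurred permission and security boundaries | Data and systems are not clearly isolated between companies |
| Difficult cross-site collaboration | The Chengdu and Beijing teams have trouble working together |
| Tedious server and account management | **The biggest pain point**: scattered keys, shared keys, reused passwords |

### 1.4 Why JumpServer Never Caught On
- Poor experience: multiple hops make connections unstable
- AI tools and automation need direct connections to servers, which the bastion-host model does not fit
- Servers cannot reach each other directly, which adds even more hops

---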
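
## 2. Target State

1. **One entry point for everything**: a unified platform with a global view
2. **Isolated per company, unified in view**: resources are logically isolated, while the overall owner sees everything
3. **Automation first**: permissions sync automatically when staff change
4. **AI-tool friendly**: direct connections, no multi-hop latency

---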

## 3. Solution

### 3.1 Overall Architecture: Headscale Mesh Network

```
                  Headscale controller
      (identity authentication and node discovery)
                            │
        ┌───────────────────┼───────────────────┐
        ▼                   ▼                   ▼
    Server A ◄────────► Server B ◄────────► Developer
   100.64.0.1          100.64.0.2          100.64.0.100

Note: all nodes connect directly, peer to peer; the controller does not relay traffic
```

### 3.2 Why Headscale

| Aspect | Traditional bastion host | Headscale mesh |
|--------|-----------|---------------|
| Connection path | All traffic relayed through the bastion host | Direct peer-to-peer |
| Server-to-server access | Multiple hops | Direct |
| AI tool support | Poor experience | Native |
| Latency | High | Low |
| Security | Depends on the bastion host | Private network isolation + ACL |

### 3.3 User/Namespace Layout

```
# Development
company-a-dev   → Company A dev servers + Company A developers
company-b-dev   → Company B dev servers + Company B developers
company-c-dev   → Company C dev servers + Company C developers
company-d-dev   → Company D dev servers + Company D developers
company-e-dev   → Company E dev servers + Company E developers

# Production
company-a-prod  → Company A production servers
company-b-prod  → Company B production servers
company-c-prod  → Company C production servers
company-d-prod  → Company D production servers
company-e-prod  → Company E production servers

# Management roles
ops             → Ops staff (access to everything)
cicd            → Jenkins (production access only, for releases)
```

---

## 4. Technical Implementation

### 4.1 Deploying Headscale

#### Directory layout
```bash
mkdir -p /opt/headscale/{config,data}
```

#### config.yaml
```yaml
# Key names vary between Headscale releases; compare with the example
# config shipped for the version you deploy.
server_url: https://hs.yourdomain.com:443
listen_addr: 0.0.0.0:8080
metrics_listen_addr: 0.0.0.0:9090

ip_prefixes:
  - 100.64.0.0/10

database:
  type: sqlite
  sqlite:
    path: /var/lib/headscale/db.sqlite

acl_policy_path: /etc/headscale/acl.yaml
```
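
When building the server IP table or debugging routing, it helps to tell at a glance whether an address was handed out by Headscale. The sketch below checks membership in the `100.64.0.0/10` prefix configured above, using only POSIX shell features. The function name `in_tailnet_range` is an illustrative assumption, not part of Headscale or Tailscale.

```bash
# Succeeds (exit status 0) only if $1 is an IPv4 address inside 100.64.0.0/10,
# that is, 100.64.0.0 through 100.127.255.255.
in_tailnet_range() {
  # Reject empty input and anything other than digits and dots.
  case "$1" in
    ''|*[!0-9.]*) return 1 ;;
  esac
  # Split the address on dots into the positional parameters.
  local IFS=.
  set -- $1
  [ "$#" -eq 4 ] || return 1
  local octet
  for octet in "$@"; do
    [ -n "$octet" ] && [ "$octet" -le 255 ] || return 1
  done
  # A /10 prefix fixes the first octet (100) and the top two bits of the
  # second octet (01xxxxxx), so the second octet must be 64..127.
  [ "$1" -eq 100 ] && [ "$2" -ge 64 ] && [ "$2" -le 127 ]
}
```

For example, `in_tailnet_range 100.64.0.2 && echo "mesh address"` prints the message, while a public or LAN address makes the function return non-zero.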

#### docker-compose.yml
```yaml
version: '3'
services:
  headscale:
    image: headscale/headscale:latest
    container_name: headscale
    restart: unless-stopped
    ports:
      - "8080:8080"
      - "9090:9090"
    volumes:
      - ./config:/etc/headscale
      - ./data:/var/lib/headscale
    command: serve
```

#### Nginx reverse proxy
```nginx
server {
    listen 443 ssl;
    server_name hs.yourdomain.com;

    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location / {
        proxy_pass http://127.0.0.1:8080;
        # The Upgrade handshake requires HTTP/1.1 towards the upstream
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```

### 4.2 ACL Configuration

```yaml
# /opt/headscale/config/acl.yaml

groups:
  group:ops: ["ops"]
  group:cicd: ["cicd"]
  group:all-prod:
    - "company-a-prod"
    - "company-b-prod"
    - "company-c-prod"
    - "company-d-prod"
    - "company-e-prod"

acls:
  # Developers can only reach their own company's development environment
  - action: accept
    src: ["company-a-dev"]
    dst: ["company-a-dev:*"]
  - action: accept
    src: ["company-b-dev"]
    dst: ["company-b-dev:*"]
  - action: accept
    src: ["company-c-dev"]
    dst: ["company-c-dev:*"]
  - action: accept
    src: ["company-d-dev"]
    dst: ["company-d-dev:*"]
  - action: accept
    src: ["company-e-dev"]
    dst: ["company-e-dev:*"]

  # Ops can reach everything
  - action: accept
    src: ["group:ops"]
    dst: ["*:*"]

  # CI/CD can reach production on port 22
  - action: accept
    src: ["group:cicd"]
    dst: ["group:all-prod:22"]

  # Production servers of the same company can reach each other
  - action: accept
    src: ["company-a-prod"]
    dst: ["company-a-prod:*"]
  - action: accept
    src: ["company-b-prod"]
    dst: ["company-b-prod:*"]
  - action: accept
    src: ["company-c-prod"]
    dst: ["company-c-prod:*"]
  - action: accept
    src: ["company-d-prod"]
    dst: ["company-d-prod:*"]
  - action: accept
    src: ["company-e-prod"]
    dst: ["company-e-prod:*"]
```
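
The ten per-company rules above differ only in the company letter and the environment suffix, so they can be generated rather than typed by hand. The following is a minimal sketch, assuming the same YAML layout as the policy above; the function name `print_self_access_rules` and its argument order are illustrative assumptions.

```bash
# Print one "a user may reach its own nodes" rule per company for the given
# environment suffix (dev or prod), in the same YAML shape as the policy above.
# Usage: print_self_access_rules <env> [company-letter ...]
# With no company letters given, a..e are used.
print_self_access_rules() {
  [ "$#" -ge 1 ] || return 2
  local env="$1"
  shift
  [ "$#" -gt 0 ] || set -- a b c d e
  local company
  for company in "$@"; do
    printf '  - action: accept\n'
    printf '    src: ["company-%s-%s"]\n' "$company" "$env"
    printf '    dst: ["company-%s-%s:*"]\n' "$company" "$env"
  done
}
```

Running `print_self_access_rules dev` and `print_self_access_rules prod` yields the two repetitive blocks of the `acls:` list; the ops and CI/CD rules remain hand-written because they follow a different pattern.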

### 4.3 Creating Users

```bash
# Development
docker exec -it headscale headscale users create company-a-dev
docker exec -it headscale headscale users create company-b-dev
docker exec -it headscale headscale users create company-c-dev
docker exec -it headscale headscale users create company-d-dev
docker exec -it headscale headscale users create company-e-dev

# Production
docker exec -it headscale headscale users create company-a-prod
docker exec -it headscale headscale users create company-b-prod
docker exec -it headscale headscale users create company-c-prod
docker exec -it headscale headscale users create company-d-prod
docker exec -it headscale headscale users create company-e-prod

# Management roles
docker exec -it headscale headscale users create ops
docker exec -it headscale headscale users create cicd
```
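
The twelve commands above can also be produced by a loop. The sketch below only prints the commands, so the list can be reviewed before anything runs. It drops `-it`, because `docker exec -t` refuses to run without a terminal (for example when the output is piped or the script runs from a job). The function name `print_user_create_commands` is an illustrative assumption.

```bash
# Print one "headscale users create" command per user in the plan:
# company-{a..e}-dev, company-{a..e}-prod, then ops and cicd.
print_user_create_commands() {
  local company env user
  for env in dev prod; do
    for company in a b c d e; do
      printf 'docker exec headscale headscale users create company-%s-%s\n' "$company" "$env"
    done
  done
  for user in ops cicd; do
    printf 'docker exec headscale headscale users create %s\n' "$user"
  done
}
```

After reviewing the output, `print_user_create_commands | sh` on the controller host executes the commands one by one.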

### 4.4 Generating Auth Keys

```bash
# Example: generate a reusable key for company-a-dev
docker exec -it headscale headscale preauthkeys create \
  --user company-a-dev \
  --expiration 720h \
  --reusable
```
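
The `720h` in the example is 30 days expressed in hours (30 × 24). A small sketch, assuming keys are planned in whole days, that builds the same command for any user and lifetime; `preauthkey_command` is a hypothetical helper name, and the flags are exactly those used above.

```bash
# Print the preauthkeys command for a user and a lifetime given in days.
# Usage: preauthkey_command <user> <days>
preauthkey_command() {
  local user="$1" days="$2"
  [ -n "$user" ] || return 1
  # Accept only positive whole numbers without leading zeros.
  case "$days" in
    ''|0*|*[!0-9]*) return 1 ;;
  esac
  printf 'docker exec headscale headscale preauthkeys create --user %s --expiration %sh --reusable\n' \
    "$user" "$((days * 24))"
}
```

Calling `preauthkey_command company-a-dev 30` reproduces the example above with `--expiration 720h`; a shorter lifetime such as 7 days becomes `168h`.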

### 4.5 Joining Clients

#### Servers (Linux)
```bash
curl -fsSL https://tailscale.com/install.sh | sh
tailscale up --login-server https://hs.yourdomain.com --authkey <key>
```

#### Developer machines
```bash
# After installing the Tailscale client on Mac/Windows
tailscale up --login-server https://hs.yourdomain.com --authkey <key>
```

---
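
## 5. Implementation Checklist

### Phase 1: Preparation
- [ ] 1.1 Provision the Headscale controller server (1 vCPU, 1 GB RAM, public IP)
- [ ] 1.2 Prepare the domain name and SSL certificate
- [ ] 1.3 Inventory the servers (30 machines, labelled by company and environment)
- [ ] 1.4 Inventory the staff (15 people, labelled by company and location)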
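
The server inventory from step 1.3 decides which Headscale user, and therefore which auth key, each machine joins with. A minimal sketch of that mapping, assuming a hypothetical CSV inventory with `hostname,company,env` columns (the file format and the function name `map_inventory_to_users` are not part of the plan):

```bash
# Read "hostname,company,env" rows on stdin and print
# "hostname<TAB>headscale-user" for every row whose env is dev or prod.
# Rows with any other environment are reported on stderr and skipped.
map_inventory_to_users() {
  local host company env
  while IFS=, read -r host company env; do
    case "$env" in
      dev|prod) ;;
      *)
        printf 'skipping %s: unknown environment "%s"\n' "$host" "$env" >&2
        continue
        ;;
    esac
    printf '%s\tcompany-%s-%s\n' "$host" "$company" "$env"
  done
}
```

Feeding it a row such as `web01,a,prod` yields `web01` paired with `company-a-prod`, which is the user whose key that server should use in Phase 4.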

### Phase 2: Deploy Headscale
- [ ] 2.1 Create the directory layout
- [ ] 2.2 Create config.yaml
- [ ] 2.3 Create acl.yaml
- [ ] 2.4 Create docker-compose.yml
- [ ] 2.5 Start the service
- [ ] 2.6 Configure Nginx + HTTPS
- [ ] 2.7 Verify the service is reachable

### Phase 3: Create Users and Auth Keys
- [ ] 3.1 Create the development users (5)
- [ ] 3.2 Create the production users (5)
- [ ] 3.3 Create the management users (ops, cicd)
- [ ] 3.4 Generate a preauthkey for each user
- [ ] 3.5 Validate the ACL configuration

### Phase 4: Server Onboarding
- [ ] 4.1 Pilot with 1-2 development servers
- [ ] 4.2 Onboard the remaining development servers in bulk
- [ ] 4.3 Onboard the production servers
- [ ] 4.4 Onboard the Jenkins server
- [ ] 4.5 Build a server IP lookup table

### Phase 5: Developer Onboarding
- [ ] 5.1 Write the developer onboarding guide
- [ ] 5.2 Have the ops staff try it first
- [ ] 5.3 Batch 1: core developers in Chengdu (3-5 people)
- [ ] 5.4 Batch 2: core developers in Beijing (3-5 people)
- [ ] 5.5 Batch 3: the remaining developers
- [ ] 5.6 Verify that AI tools work correctly

### Phase 6: Parallel Run (1-2 weeks)
- [ ] 6.1 Keep public port 22 open
- [ ] 6.2 Collect feedback
- [ ] 6.3 Resolve issues
- [ ] 6.4 Monitor Headscale service stability

### Phase 7: Cutover
- [ ] 7.1 Confirm everyone has adapted
- [ ] 7.2 Close public port 22 on the servers
- [ ] 7.3 Retire the old SSH keys
- [ ] 7.4 Update the Jenkins deployment configuration
- [ ] 7.5 Decide what to do with JumpServer

### Phase 8: Documentation and Policies
- [ ] 8.1 Update the ops documentation
- [ ] 8.2 Define the access request process
- [ ] 8.3 Define the key rotation mechanism

---

## 6. Time Estimate

| Phase | Effort |
|------|--------|
| Preparation | 1 day |
| Deploy Headscale | 0.5 day |
| Create users and configuration | 0.5 day |
| Server onboarding | 1-2 days |
| Developer onboarding | 2-3 days |
| Parallel run | 1-2 weeks |
| Cutover | 1 day |

**Total: about 2-3 weeks to complete the full cutover**

---
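
## 7. Follow-up Plans (Layers 2 and 3)

Once the Headscale network is in place, the work can continue with:

### Layer 2: Unified Identity
- Set up LDAP/KeyCloak as the single identity source
- Connect JumpServer, Jenkins, monitoring, and Git to that identity source
- Provision or revoke accounts in one step when staff join or leave

### Layer 3: Multi-cloud Account Governance
- Manage multi-cloud resources with Terraform
- Tighten cloud console permissions
- Tag resources by company for cost allocation

---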
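
## 8. Risks and Mitigations

| Risk | Mitigation |
|------|----------|
| Headscale controller outage | Already-connected nodes can still reach each other; only new node enrollment is affected |
| ACL misconfiguration | Validate in a test environment first, then open access gradually |
| Developer resistance | Communicate thoroughly during the parallel run; collect feedback and improve |
| No access during an emergency | Keep the public port open on 1-2 servers as an emergency entry point |

---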

## 9. Command Quick Reference

```bash
# List all nodes
docker exec -it headscale headscale nodes list

# List all users
docker exec -it headscale headscale users list

# List the nodes of one user
docker exec -it headscale headscale nodes list -u company-a-dev

# Delete a node
docker exec -it headscale headscale nodes delete -i <node_id>

# Validate the ACL
docker exec -it headscale headscale policy validate /etc/headscale/acl.yaml

# List preauthkeys
docker exec -it headscale headscale preauthkeys list -u company-a-dev
```
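
Every command in this reference shares the same `docker exec ... headscale headscale` prefix, so a small shell function can shorten day-to-day use. This is a convenience sketch only: the name `hs` and the `HS_DRY_RUN` switch are assumptions introduced here, and the container name `headscale` comes from the docker-compose.yml above.

```bash
# Shorthand for "docker exec headscale headscale <args...>".
# With HS_DRY_RUN=1 the command is printed instead of executed,
# which is handy for checking what would run.
# Example: hs nodes list -u company-a-dev
hs() {
  if [ "${HS_DRY_RUN:-0}" = "1" ]; then
    printf 'docker exec headscale headscale %s\n' "$*"
  else
    docker exec headscale headscale "$@"
  fi
}
```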

---

*End of document*