move claude-marketplace to ai-proj-helper
8
plugins/openclaw-ops-plugin/.claude-plugin/plugin.json
Normal file
@@ -0,0 +1,8 @@
{
  "name": "openclaw-ops-plugin",
  "description": "Plugin for openclaw-ops",
  "version": "1.0.0",
  "author": {
    "name": "qiudl"
  }
}
797
plugins/openclaw-ops-plugin/skills/SKILL.md
Normal file
@@ -0,0 +1,797 @@
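# OpenClaw Operations Skill

A complete guide to containerized deployment, operations monitoring, and troubleshooting for OpenClaw.

## Table of Contents

- [Server Deployment](#server-deployment)
- [Container Management](#container-management)
- [Performance Optimization](#performance-optimization)
- [Troubleshooting](#troubleshooting)
- [Best Practices](#best-practices)

---
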
## Server Deployment

### LazyCat compute box (lazycat)

**Server information**:
- Hostname: haiqing.heiyu.space
- SSH aliases: lazycat, lanmao
- Purpose: OpenClaw compute service
- OS: Debian-based Linux
- Container platform: lzc-docker

**OpenClaw container information**:
- Container ID: 5f3bf33e090b
- Image: registry.lazycat.cloud/openclaw:1.1.5
- OpenClaw version: 2026.2.9
- Container name: iamxiaoelzcappopenclaw-openclaw-1

**Access**:
```bash
# SSH connection
ssh lazycat

# Enter the container (-t allocates the TTY that `exec -it` needs)
ssh -t lazycat "lzc-docker exec -it 5f3bf33e090b bash"

# Start the OpenClaw TUI
openclaw-tui  # uses the local shortcut script
```

---

## Container Management

### Shortcut Script

**~/bin/openclaw-tui**:
```bash
#!/bin/bash
# OpenClaw TUI shortcut script (starts the Gateway automatically)

set -e

echo "🦞 Connecting to the lobster server (LazyCat)..."
echo ""

# Check the Gateway and start it if needed
echo "Checking OpenClaw Gateway status..."
GATEWAY_STATUS=$(ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway status" 2>/dev/null | grep "RPC probe" || echo "failed")

if echo "$GATEWAY_STATUS" | grep -q "ok"; then
    echo "✅ Gateway is running"
else
    echo "🔧 Starting Gateway..."
    ssh lazycat "lzc-docker exec -d 5f3bf33e090b bash -c 'nohup openclaw gateway run > /tmp/gateway.log 2>&1 &'" 2>/dev/null
    sleep 2
    echo "✅ Gateway started"
fi

echo ""
echo "Starting OpenClaw TUI..."

# SSH to the LazyCat server, enter the Docker container, and start the OpenClaw TUI
ssh -t lazycat "lzc-docker exec -it 5f3bf33e090b bash -c 'openclaw tui'"
```
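
The script above sleeps for a fixed 2 seconds after launching the Gateway and then reports success without checking again. A minimal sketch of polling instead is shown below. `wait_until` is a hypothetical helper introduced here for illustration, not part of OpenClaw, and the probe in the usage comment simply reuses the status command from the script.

```bash
#!/usr/bin/env bash
# wait_until ATTEMPTS DELAY CMD...
# Runs CMD until it succeeds, at most ATTEMPTS times, sleeping DELAY seconds
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
wait_until() {
    local attempts=$1 delay=$2
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        # No need to sleep after the final failed attempt.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage (not executed here; the probe reuses the status command from the script):
#   probe() { ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway status" 2>/dev/null | grep "RPC probe" | grep -q "ok"; }
#   wait_until 10 1 probe && echo "Gateway is up" || echo "Gateway did not come up"
```

Polling bounds the wait at `ATTEMPTS * DELAY` seconds and only reports success once the probe has actually passed.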

### Container Commands

```bash
# Show the container status
ssh lazycat "lzc-docker ps | grep openclaw"

# Follow the container log
ssh lazycat "lzc-docker logs -f 5f3bf33e090b --tail 100"

# Restart the container
ssh lazycat "lzc-docker restart 5f3bf33e090b"

# Show the container's resource usage
ssh lazycat "lzc-docker stats --no-stream 5f3bf33e090b"

# Open a shell in the container (-t allocates the TTY that `exec -it` needs)
ssh -t lazycat "lzc-docker exec -it 5f3bf33e090b bash"
```
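
Every command in this guide hardcodes the container ID `5f3bf33e090b`, which changes whenever the container is recreated from the image. A minimal sketch of looking the ID up by name follows. It assumes `lzc-docker` accepts the same `ps --filter` and `--format` options as the Docker CLI; `DOCKER_CMD` and `resolve_container_id` are hypothetical names used only for this illustration.

```bash
#!/usr/bin/env bash
# Command prefix used to reach the container runtime (assumption: the
# "lazycat" SSH alias and a Docker-compatible lzc-docker CLI).
DOCKER_CMD=${DOCKER_CMD:-"ssh lazycat lzc-docker"}

# resolve_container_id [NAME_PATTERN]
# Prints the ID of the first running container whose name matches
# NAME_PATTERN (default: openclaw). Prints nothing if none is found.
resolve_container_id() {
    local name_pattern=${1:-openclaw}
    # $DOCKER_CMD is intentionally unquoted so it splits into command + args.
    $DOCKER_CMD ps --filter "name=${name_pattern}" --format '{{.ID}}' | head -n 1
}

# Usage (not executed here):
#   CONTAINER_ID=$(resolve_container_id openclaw)
#   ssh lazycat "lzc-docker restart $CONTAINER_ID"
```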

### Gateway Management

```bash
# Check the Gateway status
ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway status"

# Start the Gateway (foreground)
ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway run"

# Start the Gateway (background)
ssh lazycat "lzc-docker exec -d 5f3bf33e090b bash -c 'nohup openclaw gateway run > /tmp/gateway.log 2>&1 &'"

# Stop the Gateway
ssh lazycat "lzc-docker exec 5f3bf33e090b pkill -f 'openclaw-gateway'"

# Follow the Gateway log (the date is expanded on the local machine)
ssh lazycat "lzc-docker exec 5f3bf33e090b tail -f /tmp/openclaw/openclaw-$(date +%Y-%m-%d).log"
```

---

## Performance Optimization

### Resource Configuration

**Current configuration** (close to the system ceiling, to guarantee ample performance):
- Memory limit: 30GB (97% of system memory)
- Memory + swap: 32GB
- CPU limit: 8.0 cores (100% of system CPUs)
- Process limit: 10,000

```bash
# Show the configured resource limits
ssh lazycat "lzc-docker inspect 5f3bf33e090b --format='
Memory limit: {{.HostConfig.Memory}} bytes
Memory + swap: {{.HostConfig.MemorySwap}} bytes
CPU quota: {{.HostConfig.CpuQuota}}
CPU period: {{.HostConfig.CpuPeriod}}
PID limit: {{.HostConfig.PidsLimit}}
'"

# Show live resource usage
ssh lazycat "lzc-docker stats --no-stream 5f3bf33e090b"
```

**Configuration notes**:
- The main job of the LazyCat compute box is to serve OpenClaw
- Resource limits are set close to the system ceiling so that OpenClaw has enough resources to run
- The limits still provide basic protection against runaway processes
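
The `inspect` output above reports memory in bytes and CPU as a quota/period pair in microseconds, which is hard to compare with the "30GB" and "8.0 cores" figures at a glance. A small sketch of the conversion follows; `bytes_to_gib` and `cpu_cores` are hypothetical helper names, and the sample values assume the limits were set as 30 GiB and 8 cores with the default 100000 µs period.

```bash
#!/usr/bin/env bash
# bytes_to_gib BYTES: print BYTES as GiB with one decimal place.
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024 * 1024) }'
}

# cpu_cores QUOTA PERIOD: print the CPU limit in cores (quota / period).
# A quota or period of 0 or less means no CFS quota is set, reported as "none".
cpu_cores() {
    awk -v q="$1" -v p="$2" 'BEGIN {
        if (q > 0 && p > 0) printf "%.1f\n", q / p
        else print "none"
    }'
}

# Usage (sample values, assumed rather than measured):
#   bytes_to_gib 32212254720   # 30 GiB expressed in bytes
#   cpu_cores 800000 100000    # 8 cores at the default period
```

The first call prints `30.0` and the second prints `8.0`, matching the configured limits listed above under those assumptions.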

### Automated Optimization Measures

#### 1. Scheduled automatic restart (every Sunday at 03:00)

**Purpose**: clear accumulated zombie processes and free resources

**Check status**:
```bash
# Show the timer status
ssh lazycat "systemctl status openclaw-restart.timer"

# Show the restart log
ssh lazycat "tail -50 /var/log/openclaw-restart.log"

# Run the restart manually
ssh lazycat "/root/restart-openclaw.sh"
```

**Configuration files**:
- Service: `/etc/systemd/system/openclaw-restart.service`
- Timer: `/etc/systemd/system/openclaw-restart.timer`
- Script: `/root/restart-openclaw.sh`

**Restart script** (`/root/restart-openclaw.sh`):
```bash
#!/bin/bash
# Periodic restart script for the OpenClaw container
# Runs every Sunday at 03:00

LOG_FILE='/var/log/openclaw-restart.log'

echo "[$(date '+%Y-%m-%d %H:%M:%S')] Restarting the OpenClaw container" >> "$LOG_FILE"

# Restart the container
/lzcsys/bin/lzc-docker restart 5f3bf33e090b >> "$LOG_FILE" 2>&1

# Wait for the container to start
sleep 10

# Check the health status
STATUS=$(/lzcsys/bin/lzc-docker inspect -f '{{.State.Health.Status}}' 5f3bf33e090b 2>/dev/null || echo 'unknown')
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Restart finished, health status: $STATUS" >> "$LOG_FILE"

# Count zombie processes (STAT starts with Z).
# `ps aux | grep 'Z'` would also match the VSZ column header.
ZOMBIE_COUNT=$(/lzcsys/bin/lzc-docker exec 5f3bf33e090b ps -eo stat= | grep -c '^Z')
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Current zombie process count: $ZOMBIE_COUNT" >> "$LOG_FILE"
echo "----------------------------------------" >> "$LOG_FILE"
```
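
The Service and Timer unit files listed above are named but not reproduced in this document. The sketch below prints one plausible pair for a "Sunday 03:00" schedule. The unit contents are an assumption made for illustration and are not copied from the server; only the script path and the schedule come from the text. `print_restart_service` and `print_restart_timer` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Print a hypothetical openclaw-restart.service (assumed contents).
print_restart_service() {
    cat <<'EOF'
[Unit]
Description=Restart the OpenClaw container

[Service]
Type=oneshot
ExecStart=/root/restart-openclaw.sh
EOF
}

# Print a hypothetical openclaw-restart.timer (assumed contents).
# A timer activates the service with the same base name by default.
print_restart_timer() {
    cat <<'EOF'
[Unit]
Description=Weekly OpenClaw container restart

[Timer]
OnCalendar=Sun *-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# Usage (not executed here):
#   print_restart_timer | ssh lazycat "cat > /etc/systemd/system/openclaw-restart.timer"
#   ssh lazycat "systemctl daemon-reload && systemctl enable --now openclaw-restart.timer"
```

`Persistent=true` asks systemd to run a missed restart at the next boot, which seems a reasonable choice for a machine that may be powered off on a Sunday night.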

#### 2. Automatic zombie-process monitoring (hourly check)

**Purpose**: monitor the number of zombie processes and restart the container automatically when it exceeds the threshold

**Check status**:
```bash
# Show the monitor timer status
ssh lazycat "systemctl status openclaw-zombie-monitor.timer"

# Show the monitor log
ssh lazycat "tail -50 /var/log/openclaw-zombie-monitor.log"

# Run the zombie check manually
ssh lazycat "/root/monitor-openclaw-zombies.sh"

# Show the zombie count directly (STAT starts with Z)
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo stat= | grep -c '^Z'"
```

**Monitoring parameters**:
- Check frequency: hourly
- Trigger threshold: 50 zombie processes
- Automatic action: restart the container

**Monitoring script** (`/root/monitor-openclaw-zombies.sh`):
```bash
#!/bin/bash
# OpenClaw zombie-process monitor
# Restarts the container automatically when there are more than 50 zombie processes

ZOMBIE_THRESHOLD=50
CONTAINER_ID='5f3bf33e090b'
LOG_FILE='/var/log/openclaw-zombie-monitor.log'

# Count zombie processes (STAT starts with Z).
# grep -c already prints 0 when nothing matches, so no `|| echo 0` fallback is needed.
ZOMBIE_COUNT=$(/lzcsys/bin/lzc-docker exec "$CONTAINER_ID" ps -eo stat= 2>/dev/null | grep -c '^Z')

echo "[$(date '+%Y-%m-%d %H:%M:%S')] Zombie process count: $ZOMBIE_COUNT" >> "$LOG_FILE"

if [ "$ZOMBIE_COUNT" -gt "$ZOMBIE_THRESHOLD" ]; then
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] ⚠️ Zombie count exceeds the threshold ($ZOMBIE_THRESHOLD), restarting automatically" >> "$LOG_FILE"

    # Restart the container
    /lzcsys/bin/lzc-docker restart "$CONTAINER_ID" >> "$LOG_FILE" 2>&1

    # Wait for the container to start
    sleep 10

    # Check again
    NEW_ZOMBIE_COUNT=$(/lzcsys/bin/lzc-docker exec "$CONTAINER_ID" ps -eo stat= 2>/dev/null | grep -c '^Z')
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] Zombie process count after restart: $NEW_ZOMBIE_COUNT" >> "$LOG_FILE"
    echo "----------------------------------------" >> "$LOG_FILE"
fi
```
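
Counting zombies by matching the letter Z anywhere in `ps aux` output over-counts, because the header line (which contains `VSZ`) and any command name containing a Z also match. A minimal sketch that counts only entries whose STAT field starts with Z is shown below; `count_zombies` is a hypothetical helper that reads `ps -eo stat=` output on standard input.

```bash
#!/usr/bin/env bash
# count_zombies: read one process state per line on stdin (the output of
# `ps -eo stat=`) and print how many of them are zombies (state begins with Z).
count_zombies() {
    awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}

# Usage (not executed here):
#   ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo stat=" | count_zombies
```

Using `awk` with `n + 0` always prints exactly one number, including `0`, so the result can be passed straight to a numeric test such as `[ "$count" -gt 50 ]`.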

#### 3. Full health check

```bash
# One-shot health check
ssh lazycat "
echo '=== System load ===' && uptime &&
echo '' && echo '=== Zombie processes ===' &&
lzc-docker exec 5f3bf33e090b ps -eo stat= | grep '^Z' | wc -l &&
echo '' && echo '=== Container resources ===' &&
lzc-docker stats --no-stream 5f3bf33e090b &&
echo '' && echo '=== Gateway status ===' &&
lzc-docker exec 5f3bf33e090b openclaw gateway status | grep 'RPC probe' &&
echo '' && echo '=== Container health ===' &&
lzc-docker inspect 5f3bf33e090b --format='Status: {{.State.Status}}, Health: {{.State.Health.Status}}'
"

# List all OpenClaw timers
ssh lazycat "systemctl list-timers | grep openclaw"
```

---

## Troubleshooting

### Tower crashes repeatedly (fixed 2026-02-16)

**Symptoms**:
- The Tower log shows repeated crashes: `[tower] OpenClaw crashed: exit status 1`
- The Gateway fails to start: `gateway already running (pid xxx); lock timeout`
- Zombie Gateway processes pile up and are never reaped
- Multiple zombie processes show up as `[openclaw-gatewa] <defunct>`

**Typical error log**:
```
[22:19:39] [tower] OpenClaw crashed: exit status 1
[22:24:52] [tower] OpenClaw crashed: signal: killed
[22:27:33] Gateway failed to start: gateway already running (pid 2005)
[22:27:33] If the gateway is supervised, stop it with: openclaw gateway stop
```

**Root cause**:
- Tower runs as the container's PID 1 but is not a proper init process
- It has no child-reaping mechanism, so zombie processes are never cleaned up
- The leftover Gateway processes keep the lock file and port (18789) looking occupied, which blocks a new Gateway from starting
- The container's PID 1 is `/usr/local/bin/tower`, which cannot reap zombie processes

**Diagnostic commands**:
```bash
# Show the PID 1 process
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -p 1 -o pid,ppid,cmd"

# Show zombie process details
ssh lazycat "lzc-docker exec 5f3bf33e090b ps aux | grep 'defunct'"

# Check whether the port is in use
ssh lazycat "lzc-docker exec 5f3bf33e090b netstat -tlnp | grep 18789"

# Show the process tree
ssh lazycat "lzc-docker exec 5f3bf33e090b ps auxf | head -30"
```
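
The log lines quoted in this section carry two useful signals: how often Tower reported a crash, and which PID the Gateway believes is still running. A small sketch for pulling both out of a log stream follows. `count_tower_crashes` and `blocking_gateway_pids` are hypothetical helpers, and the patterns assume the exact message wording shown in the sample log.

```bash
#!/usr/bin/env bash
# count_tower_crashes: read a log on stdin and print the number of
# "[tower] OpenClaw crashed" lines. `|| true` keeps the exit status at 0
# when grep finds nothing (it still prints 0).
count_tower_crashes() {
    grep -c '\[tower\] OpenClaw crashed' || true
}

# blocking_gateway_pids: read a log on stdin and print each distinct PID
# named in a "gateway already running (pid N)" message.
blocking_gateway_pids() {
    sed -n 's/.*gateway already running (pid \([0-9][0-9]*\)).*/\1/p' | sort -u
}

# Usage (not executed here):
#   ssh lazycat "lzc-docker logs 5f3bf33e090b --since 1h 2>&1" | count_tower_crashes
#   ssh lazycat "lzc-docker logs 5f3bf33e090b --since 1h 2>&1" | blocking_gateway_pids
```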

**Permanent fix (applied)**:

Use **tini** as the container's PID 1 so that zombie processes are reaped automatically.

```bash
# 1. Install tini (a proper init process) in the container
ssh lazycat "lzc-docker exec 5f3bf33e090b bash -c 'apt-get update -qq && apt-get install -y tini'"

# 2. Edit the entrypoint so that tini wraps tower
ssh lazycat "lzc-docker exec 5f3bf33e090b sed -i 's|exec /usr/local/bin/tower|exec /usr/bin/tini -- /usr/local/bin/tower|g' /usr/local/bin/clawdbot-entrypoint.sh"

# 3. Verify the change
ssh lazycat "lzc-docker exec 5f3bf33e090b grep 'exec.*tower' /usr/local/bin/clawdbot-entrypoint.sh"
# Expected: exec /usr/bin/tini -- /usr/local/bin/tower ...

# 4. Restart the container to apply the change
ssh lazycat "lzc-docker restart 5f3bf33e090b"

# 5. Verify that tini is now PID 1
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -p 1 -o pid,ppid,cmd"
# Expected: PID 1 -> /usr/bin/tini -- /usr/local/bin/tower ...

# 6. Check the process tree
ssh lazycat "lzc-docker exec 5f3bf33e090b ps auxf | head -15"
```

**Process layout after the fix**:
```
PID 1: /usr/bin/tini (proper init process, reaps zombie processes automatically)
  └─ PID 58: tower
      └─ PID 64: openclaw
          └─ PID 72: openclaw-gateway
```

**Results**:
- ✅ tini runs as PID 1 and reaps all zombie processes automatically
- ✅ The zombie count dropped from 5+ to 1-2 (a healthy level)
- ✅ Tower runs stably and no longer crashes repeatedly
- ✅ The Gateway starts normally, with no lock-file conflicts
- ✅ RPC probe consistently reports ok

**Notes**:
- ⚠️ The change lives inside the running container, so it **must be re-applied after the container is rebuilt**
- 💡 Consider submitting a PR to the image maintainer (LazyCat Cloud) that adds tini to the Dockerfile
- 📌 Each time the container is recreated from the image, repeat steps 1-4 above

**Image-level permanent fix** (suggested for submission to LazyCat Cloud):

Add the following to the OpenClaw image's Dockerfile:
```dockerfile
# Install tini
RUN apt-get update && apt-get install -y tini && rm -rf /var/lib/apt/lists/*

# Or use a lighter-weight install (pick one of the two)
ADD https://github.com/krallin/tini/releases/download/v0.19.0/tini /usr/bin/tini
RUN chmod +x /usr/bin/tini

# Wrap tower with tini in the entrypoint script (already changed in the current entrypoint)
```

### Too many zombie processes

**Symptoms**:
- More than 50 zombie processes
- The Gateway responds slowly
- Container memory usage rises

**Diagnosis**:
```bash
# Show zombie process details (STAT is column 8 of `ps aux`)
ssh lazycat "lzc-docker exec 5f3bf33e090b ps aux | awk '\$8 ~ /^Z/'"

# Count zombie processes
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo stat= | grep -c '^Z'"

# Show the parent of each zombie process
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo pid,ppid,stat,comm | awk '\$3 ~ /^Z/'"
```

**Solution**:
```bash
# Option 1: restart the container (recommended)
ssh lazycat "lzc-docker restart 5f3bf33e090b"

# Option 2: kill the Gateway processes manually, then start a new Gateway in the background
ssh lazycat "lzc-docker exec 5f3bf33e090b pkill -9 -f 'openclaw-gateway'"
ssh lazycat "lzc-docker exec -d 5f3bf33e090b bash -c 'nohup openclaw gateway run > /tmp/gateway.log 2>&1 &'"

# Verify the result
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo stat= | grep -c '^Z'"
```

### Gateway unresponsive

**Symptoms**:
- `RPC probe: failed` or a timeout
- The TUI fails to connect: `gateway not connected`
- The Dashboard is unreachable

**Diagnosis**:
```bash
# Check the Gateway process
ssh lazycat "lzc-docker exec 5f3bf33e090b ps aux | grep gateway"

# Check that the port is listening
ssh lazycat "lzc-docker exec 5f3bf33e090b netstat -tlnp | grep 18789"

# Show the Gateway log (the glob has to be expanded by a shell inside the container)
ssh lazycat "lzc-docker exec 5f3bf33e090b bash -c 'tail -n 100 /tmp/openclaw/openclaw-*.log'"

# Test a local connection
ssh lazycat "lzc-docker exec 5f3bf33e090b curl -I http://127.0.0.1:18789"
```

**Solution**:
```bash
# 1. Kill all Gateway processes
ssh lazycat "lzc-docker exec 5f3bf33e090b pkill -9 -f 'openclaw-gateway'"

# 2. Start a new Gateway
ssh lazycat "lzc-docker exec -d 5f3bf33e090b bash -c 'openclaw gateway run > /tmp/gateway.log 2>&1 &'"

# 3. Wait for it to start
sleep 5

# 4. Verify the status
ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway status"
```
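
Step 4 above leaves the status output for a person to read. A minimal sketch of turning it into a pass/fail result for scripts is shown below. It assumes the status output contains a line with `RPC probe` followed by `ok` when healthy, as the `grep` calls elsewhere in this guide imply; `rpc_probe_ok` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# rpc_probe_ok: read `openclaw gateway status` output on stdin and succeed
# only if the "RPC probe" line reports ok (assumed output format).
rpc_probe_ok() {
    grep 'RPC probe' | grep -q 'ok'
}

# Usage (not executed here):
#   if ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway status" | rpc_probe_ok; then
#       echo "Gateway healthy"
#   else
#       echo "Gateway unhealthy, see the steps above"
#   fi
```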

### Multiple OpenClaw TUI instances running (fixed 2026-02-16)

**Symptoms**:
- `pkill -9 openclaw` is needed before every OpenClaw TUI start
- Startup fails or reports a port conflict
- Several `openclaw-tui` processes run in the background
- Container resource usage is abnormally high

**Diagnosis**:
```bash
# List the running OpenClaw processes
ssh lazycat "lzc-docker exec 5f3bf33e090b ps aux | grep openclaw"

# Typically several openclaw-tui instances show up:
# PID 3041 - openclaw-tui (pts/0)
# PID 6338 - openclaw-tui (pts/1)
# PID 7223 - openclaw-tui (pts/2)

# Check whether the port is in use
ssh lazycat "lzc-docker exec 5f3bf33e090b netstat -tlnp | grep 18789"
```

**Root cause**:
- Every run of `openclaw tui` starts a new process
- The process is not fully cleaned up when the TUI exits
- Multiple instances running at once compete for resources

**Permanent fix (applied)**:

**1. Create an automatic cleanup script** (inside the container):

```bash
# Create /usr/local/bin/openclaw-clean inside the container
ssh lazycat "lzc-docker exec 5f3bf33e090b bash -c \"cat > /usr/local/bin/openclaw-clean << 'EOF'
#!/bin/bash
# OpenClaw cleanup-and-restart script

# Kill every openclaw process that is not managed by Tower
echo '🧹 Cleaning up old OpenClaw processes...'
pkill -9 -f 'openclaw-tui' || true
pkill -9 -f 'openclaw tui' || true

# Wait for the processes to exit
sleep 1

# Check for leftover processes (this script itself is excluded from the count)
REMAINING=\\\$(ps aux | grep -E 'openclaw' | grep -v 'openclaw-gateway' | grep -v 'openclaw-clean' | grep -v 'tower' | grep -v 'grep' | wc -l)
if [ \\\$REMAINING -gt 0 ]; then
    echo '⚠️ Warning: '\\\$REMAINING' openclaw process(es) still running'
else
    echo '✅ Cleanup complete'
fi

# Start the OpenClaw TUI
echo ''
echo '🦞 Starting OpenClaw TUI...'
exec openclaw tui
EOF
chmod +x /usr/local/bin/openclaw-clean\""
```

**2. Update the local openclaw-tui script**:

Change the last line of `~/bin/openclaw-tui`:

```bash
# Before
ssh -t lazycat "lzc-docker exec -it 5f3bf33e090b bash -c 'openclaw tui'"

# After
ssh -t lazycat "lzc-docker exec -it 5f3bf33e090b openclaw-clean"
```

**Results**:
- ✅ Old processes are cleaned up automatically on every start
- ✅ No more manual `pkill -9 openclaw`
- ✅ No resources wasted on multiple instances
- ✅ A single command, `openclaw-tui`, does everything

**Usage**:

```bash
# Before (manual cleanup required)
ssh -t lazycat "lzc-docker exec -it 5f3bf33e090b bash -c 'pkill -9 openclaw && openclaw tui'"

# Now (automatic cleanup)
openclaw-tui  # one command does it all
```

**Manual cleanup** (if needed):

```bash
# Kill all openclaw-tui processes
ssh lazycat "lzc-docker exec 5f3bf33e090b pkill -9 -f 'openclaw-tui'"

# Verify the result
ssh lazycat "lzc-docker exec 5f3bf33e090b ps aux | grep openclaw | grep -v tower | grep -v openclaw-gateway"
```

### Container out of memory

**Symptoms**:
- Container memory usage above 90%
- OOM (Out of Memory) errors
- Processes get killed

**Diagnosis**:
```bash
# Check memory usage
ssh lazycat "lzc-docker stats --no-stream 5f3bf33e090b"

# Show the memory limit
ssh lazycat "lzc-docker inspect 5f3bf33e090b --format='{{.HostConfig.Memory}}'"

# Show total system memory
ssh lazycat "free -h"
```

**Solution**:
```bash
# Adjust the memory limit (if the current limit is too low)
# Note: the LazyCat compute box is already set to 30GB, so this is rarely needed

# If an adjustment really is needed, use:
ssh lazycat "lzc-docker update 5f3bf33e090b --memory=30g --memory-swap=32g"

# Restart the container to apply the configuration
ssh lazycat "lzc-docker restart 5f3bf33e090b"
```

### Automatic restart fails

**Symptoms**:
- The systemd timer does not fire
- The restart script fails
- The log shows `lzc-docker: command not found`

**Diagnosis**:
```bash
# Check the timer status
ssh lazycat "systemctl status openclaw-restart.timer"

# Check the service status
ssh lazycat "systemctl status openclaw-restart.service"

# Show the service journal
ssh lazycat "journalctl -u openclaw-restart.service -n 50"

# Show the script log
ssh lazycat "tail -50 /var/log/openclaw-restart.log"

# Test the script manually
ssh lazycat "bash -x /root/restart-openclaw.sh"
```

**Solution**:

The usual cause is that the script cannot find the `lzc-docker` command (a PATH problem), because systemd runs it with a minimal PATH.

```bash
# Confirm the lzc-docker path
ssh lazycat "which lzc-docker"
# Output: /lzcsys/bin/lzc-docker

# Make sure the scripts use the full path
ssh lazycat "grep 'lzc-docker' /root/restart-openclaw.sh"
# Expected: /lzcsys/bin/lzc-docker

# If the bare command name is used, rewrite it
# (only occurrences without a leading / are changed, so re-running is safe)
ssh lazycat "sed -i -E 's#(^|[^/])lzc-docker#\1/lzcsys/bin/lzc-docker#g' /root/restart-openclaw.sh"
ssh lazycat "sed -i -E 's#(^|[^/])lzc-docker#\1/lzcsys/bin/lzc-docker#g' /root/monitor-openclaw-zombies.sh"

# Reload the systemd configuration
ssh lazycat "systemctl daemon-reload"

# Test run
ssh lazycat "/root/restart-openclaw.sh"
```
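
A blanket replacement of `lzc-docker` with the full path would double the prefix on lines that already use `/lzcsys/bin/lzc-docker`. The sketch below shows a substitution that only touches bare occurrences, so it can be run repeatedly. `use_absolute_lzc_docker` is a hypothetical helper that operates on a file path given as its argument; it writes through a temporary file rather than relying on `sed -i`.

```bash
#!/usr/bin/env bash
# use_absolute_lzc_docker FILE
# Rewrites bare `lzc-docker` to `/lzcsys/bin/lzc-docker` in FILE.
# Occurrences already preceded by "/" are left alone, so the edit is idempotent.
use_absolute_lzc_docker() {
    local file=$1 tmp status
    tmp=$(mktemp) || return 1
    # (^|[^/]) matches start of line or any character other than "/",
    # and \1 puts that character back in front of the full path.
    sed -E 's#(^|[^/])lzc-docker#\1/lzcsys/bin/lzc-docker#g' "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage on the server (not executed here):
#   use_absolute_lzc_docker /root/restart-openclaw.sh
#   use_absolute_lzc_docker /root/monitor-openclaw-zombies.sh
```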

---

## Best Practices

### 1. Regular health checks

Run a full health check once a day:

```bash
#!/bin/bash
# OpenClaw health check script

echo "🔍 OpenClaw health check - $(date)"
echo "================================"

# Container status
echo -e "\n📦 Container status:"
ssh lazycat "lzc-docker ps --filter id=5f3bf33e090b --format 'Status: {{.Status}}'"

# PID 1 process
echo -e "\n🏗️ PID 1 process:"
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -p 1 -o pid,ppid,cmd"

# Zombie process count (STAT starts with Z)
echo -e "\n👻 Zombie processes:"
ZOMBIE_COUNT=$(ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo stat= | grep -c '^Z'")
echo "Zombie process count: $ZOMBIE_COUNT"
if [ "${ZOMBIE_COUNT:-0}" -gt 10 ]; then
    echo "⚠️ Warning: many zombie processes, consider restarting the container"
fi

# Resource usage
echo -e "\n💾 Resource usage:"
ssh lazycat "lzc-docker stats --no-stream 5f3bf33e090b"

# Gateway status
echo -e "\n🔌 Gateway status:"
ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway status | grep 'RPC probe'"

# System load
echo -e "\n📊 System load:"
ssh lazycat "uptime"

echo -e "\n================================"
echo "✅ Health check complete"
```

### 2. Log management

```bash
# Show recent error log entries
ssh lazycat "lzc-docker logs 5f3bf33e090b --since 1h 2>&1 | grep -i error"

# Show Tower crash log entries
ssh lazycat "lzc-docker logs 5f3bf33e090b 2>&1 | grep -i 'crashed\|failed'"

# Show the OpenClaw application log (the date is expanded on the local machine)
ssh lazycat "lzc-docker exec 5f3bf33e090b tail -100 /tmp/openclaw/openclaw-$(date +%Y-%m-%d).log"

# Delete old logs (keep the last 7 days)
ssh lazycat "lzc-docker exec 5f3bf33e090b find /tmp/openclaw -name '*.log' -mtime +7 -delete"
```

### 3. Backup and restore

```bash
# Back up the OpenClaw configuration, streaming the archive straight to the local machine.
# An archive written to /tmp inside the container is not visible to scp on the host
# unless that directory is bind-mounted.
mkdir -p ~/backups
ssh lazycat "lzc-docker exec 5f3bf33e090b tar czf - -C /home/node/.openclaw ." > ~/backups/openclaw-config-backup-$(date +%Y%m%d).tar.gz

# Restore the configuration (replace YYYYMMDD with the date of the backup to restore)
ssh lazycat "lzc-docker exec -i 5f3bf33e090b tar xzf - -C /home/node/.openclaw" < ~/backups/openclaw-config-backup-YYYYMMDD.tar.gz
ssh lazycat "lzc-docker restart 5f3bf33e090b"
```
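
### 4. Monitoring and alerts

Recommended metrics to monitor:

- **Zombie process count** > 50: raise an alert and restart automatically (already implemented)
- **Memory usage** > 90%: raise an alert
- **Gateway offline time** > 5 minutes: raise an alert
- **Container restarts** > 3 per day: raise an alert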
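
Of the metrics above, only the zombie threshold is described as implemented. A minimal sketch of the memory check follows. It assumes `lzc-docker stats` supports the Docker CLI's `--format '{{.MemPerc}}'` placeholder, which prints a value such as `45.32%`; `mem_over_threshold` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# mem_over_threshold PERCENT THRESHOLD
# PERCENT is a string such as "45.32%" (the trailing % is optional).
# Succeeds (exit 0) only when the value is strictly above THRESHOLD.
mem_over_threshold() {
    local value=${1%\%}
    awk -v v="$value" -v t="$2" 'BEGIN { exit !(v > t) }'
}

# Usage (not executed here):
#   perc=$(ssh lazycat "lzc-docker stats --no-stream --format '{{.MemPerc}}' 5f3bf33e090b")
#   if mem_over_threshold "$perc" 90; then
#       echo "ALERT: container memory at $perc"
#   fi
```

The comparison is done in `awk` because shell arithmetic cannot compare fractional values.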

### 5. Recovery checklist after a container rebuild

If the container is recreated from the image, re-apply the following fixes:

```bash
# 1. Install tini
ssh lazycat "lzc-docker exec 5f3bf33e090b bash -c 'apt-get update -qq && apt-get install -y tini'"

# 2. Edit the entrypoint
ssh lazycat "lzc-docker exec 5f3bf33e090b sed -i 's|exec /usr/local/bin/tower|exec /usr/bin/tini -- /usr/local/bin/tower|g' /usr/local/bin/clawdbot-entrypoint.sh"

# 3. Restart the container
ssh lazycat "lzc-docker restart 5f3bf33e090b"

# 4. Verify
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -p 1 -o cmd | grep tini"
```
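
Because this checklist has to be repeated after every rebuild, it helps if the entrypoint edit reports whether it actually changed anything. The sketch below wraps the same substitution in a function that checks first. `patch_entrypoint` is a hypothetical helper that takes the entrypoint file path as its argument; running it inside the container (for example through `lzc-docker exec`) is left to the reader.

```bash
#!/usr/bin/env bash
# patch_entrypoint FILE
# Wraps the `exec /usr/local/bin/tower` line with tini, once.
# Prints "patched" or "already patched"; fails if the tower line is missing.
patch_entrypoint() {
    local file=$1 tmp
    # Skip the edit when the tini wrapper is already in place.
    if grep -q 'exec /usr/bin/tini -- /usr/local/bin/tower' "$file"; then
        echo "already patched"
        return 0
    fi
    if ! grep -q 'exec /usr/local/bin/tower' "$file"; then
        echo "tower exec line not found in $file" >&2
        return 1
    fi
    tmp=$(mktemp) || return 1
    # Same substitution as step 2 above, written to a temp file first so a
    # failed sed never truncates the entrypoint.
    if sed 's|exec /usr/local/bin/tower|exec /usr/bin/tini -- /usr/local/bin/tower|' "$file" > "$tmp"; then
        cat "$tmp" > "$file"
        rm -f "$tmp"
        echo "patched"
    else
        rm -f "$tmp"
        return 1
    fi
}
```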

---

## Appendix

### Related documentation

- OpenClaw official documentation: https://docs.openclaw.ai/
- Troubleshooting guide: https://docs.openclaw.ai/troubleshooting
- Tini project: https://github.com/krallin/tini

### Contact

- LazyCat Cloud support: support@lazycat.cloud
- OpenClaw community: https://community.openclaw.ai/

### Version history

- 2026-02-16:
  - Created the document, recording the Tower crash fix (using tini)
  - Added the multiple-TUI-instance issue and the openclaw-clean solution
- 2026-02-15: Implemented zombie-process monitoring and automatic restart
- 2026-02-14: Raised the container resource limits close to the system ceiling

---

## Quick Reference

### Common commands

```bash
# Connect to the OpenClaw TUI
openclaw-tui

# Show the container status
ssh lazycat "lzc-docker ps | grep openclaw"

# Restart the container
ssh lazycat "lzc-docker restart 5f3bf33e090b"

# Show the zombie process count (STAT starts with Z)
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo stat= | grep -c '^Z'"

# Check the Gateway status
ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway status | grep 'RPC probe'"

# Show resource usage
ssh lazycat "lzc-docker stats --no-stream 5f3bf33e090b"

# List the timers
ssh lazycat "systemctl list-timers | grep openclaw"

# Kill stray OpenClaw TUI processes
ssh lazycat "lzc-docker exec 5f3bf33e090b pkill -9 -f 'openclaw-tui'"

# Start OpenClaw (cleans up old processes first; the TUI needs a TTY)
ssh -t lazycat "lzc-docker exec -it 5f3bf33e090b openclaw-clean"

# Full health check
ssh lazycat "echo '=== Container ===' && lzc-docker ps | grep openclaw && echo '' && echo '=== Zombie processes ===' && lzc-docker exec 5f3bf33e090b ps -eo stat= | grep '^Z' | wc -l && echo '' && echo '=== Gateway ===' && lzc-docker exec 5f3bf33e090b openclaw gateway status | grep 'RPC probe'"
```

### Troubleshooting cheat sheet

| Problem | Quick fix |
|------|----------|
| Tower crashes repeatedly | See the "Tower crashes repeatedly" section and install tini |
| Multiple TUI instances | Use `openclaw-tui` (automatic cleanup) or run `pkill -9 -f openclaw-tui` manually |
| Gateway unresponsive | `ssh lazycat "lzc-docker restart 5f3bf33e090b"` |
| Too many zombie processes | `ssh lazycat "lzc-docker restart 5f3bf33e090b"` |
| Out of memory | Check the resource limits, then restart the container |
| Automatic restart fails | Check that the scripts use the full path `/lzcsys/bin/lzc-docker` |