# OpenClaw Operations Skill
A complete guide to containerized deployment, operational monitoring, and troubleshooting for OpenClaw.
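## Table of Contents
- [Server Deployment](#server-deployment)
- [Container Management](#container-management)
- [Performance Optimization](#performance-optimization)
- [Troubleshooting](#troubleshooting)
- [Best Practices](#best-practices)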
---
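## Server Deployment
### LazyCat compute box (lazycat)
**Server information**
- Hostname: haiqing.heiyu.space
- SSH aliases: lazycat, lanmao
- Purpose: OpenClaw compute service
- OS: Debian-based Linux
- Container platform: lzc-docker
**OpenClaw container information**
- Container ID: 5f3bf33e090b
- Image: registry.lazycat.cloud/openclaw:1.1.5
- OpenClaw version: 2026.2.9
- Container name: iamxiaoelzcappopenclaw-openclaw-1
**Access**
```bash
# Connect over SSH
ssh lazycat
# Enter the container (-t gives `exec -it` the TTY it needs)
ssh -t lazycat "lzc-docker exec -it 5f3bf33e090b bash"
# Start the OpenClaw TUI
openclaw-tui # uses the local shortcut script
```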
---
## Container Management
### Shortcut access script
**~/bin/openclaw-tui**
```bash
#!/bin/bash
# Shortcut script for the OpenClaw TUI (starts the Gateway automatically)
set -e
echo "🦞 Connecting to the lobster server (LazyCat)..."
echo ""
# Check the Gateway and start it if needed
echo "Checking OpenClaw Gateway status..."
GATEWAY_STATUS=$(ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway status" 2>/dev/null | grep "RPC probe" || echo "failed")
if echo "$GATEWAY_STATUS" | grep -q "ok"; then
  echo "✅ Gateway is running"
else
  echo "🔧 Starting Gateway..."
  ssh lazycat "lzc-docker exec -d 5f3bf33e090b bash -c 'nohup openclaw gateway run > /tmp/gateway.log 2>&1 &'" 2>/dev/null
  sleep 2
  echo "✅ Gateway started"
fi
echo ""
echo "Starting OpenClaw TUI..."
# SSH to the LazyCat server, then enter the Docker container and start the OpenClaw TUI
ssh -t lazycat "lzc-docker exec -it 5f3bf33e090b bash -c 'openclaw tui'"
```
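The script above waits a fixed 2 seconds after launching the Gateway and then reports success without checking again. A minimal sketch of a polling alternative, assuming the status output contains a line such as `RPC probe: ok` (the wording is inferred from the `grep` calls above); `probe_ok` and `wait_for` are hypothetical helper names, not part of the deployed script:

```bash
#!/bin/bash
# Hypothetical helpers; not part of the deployed ~/bin/openclaw-tui script.

# Succeeds when the status text on stdin has an "RPC probe" line containing "ok".
probe_ok() {
  grep "RPC probe" | grep -q "ok"
}

# wait_for ATTEMPTS DELAY CMD...: rerun CMD until it succeeds or attempts run out.
wait_for() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage sketch (needs the lazycat SSH alias):
#   gateway_up() { ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway status" 2>/dev/null | probe_ok; }
#   wait_for 10 1 gateway_up && echo "✅ Gateway is running"
```

Polling keeps the happy path fast and turns a slow start into a visible failure instead of a false success message.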
### Container commands
```bash
# Show container status
ssh lazycat "lzc-docker ps | grep openclaw"
# Follow the container logs
ssh lazycat "lzc-docker logs -f 5f3bf33e090b --tail 100"
# Restart the container
ssh lazycat "lzc-docker restart 5f3bf33e090b"
# Show container resource usage
ssh lazycat "lzc-docker stats --no-stream 5f3bf33e090b"
# Open a shell in the container (-t gives `exec -it` the TTY it needs)
ssh -t lazycat "lzc-docker exec -it 5f3bf33e090b bash"
```
### Gateway management
```bash
# Check Gateway status
ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway status"
# Start the Gateway (foreground)
ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway run"
# Start the Gateway (background)
ssh lazycat "lzc-docker exec -d 5f3bf33e090b bash -c 'nohup openclaw gateway run > /tmp/gateway.log 2>&1 &'"
# Stop the Gateway
ssh lazycat "lzc-docker exec 5f3bf33e090b pkill -f 'openclaw-gateway'"
# Follow the Gateway log (the date is expanded on the local machine)
ssh lazycat "lzc-docker exec 5f3bf33e090b tail -f /tmp/openclaw/openclaw-$(date +%Y-%m-%d).log"
```
---
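## Performance Optimization
### Resource configuration
**Current configuration** (close to the system maximum, to ensure ample performance):
- Memory limit: 30 GB (97% of system memory)
- Memory + swap: 32 GB
- CPU limit: 8.0 cores (100% of the system)
- Process limit: 10,000
```bash
# Show the configured resource limits
ssh lazycat "lzc-docker inspect 5f3bf33e090b --format='
Memory limit: {{.HostConfig.Memory}} bytes
Memory + swap: {{.HostConfig.MemorySwap}} bytes
CPU quota: {{.HostConfig.CpuQuota}}
CPU period: {{.HostConfig.CpuPeriod}}
PID limit: {{.HostConfig.PidsLimit}}
'"
# Show live resource usage
ssh lazycat "lzc-docker stats --no-stream 5f3bf33e090b"
```
**Configuration notes**
- The main job of the LazyCat compute box is to run the OpenClaw service
- The limits are set close to the system maximum so that OpenClaw has ample resources
- They still provide basic protection against runaway resource use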
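The `inspect` fields above are raw values: memory in bytes and CPU as a quota/period pair. A small sketch for turning them into the GB and core figures listed earlier, assuming binary gigabytes (1 GB = 1024³ bytes) and that the CPU limit was set through quota and period; the function names are illustrative:

```bash
#!/bin/bash
# Illustrative converters for the raw values printed by `lzc-docker inspect`.

# bytes_to_gib BYTES: prints the size in GiB with one decimal place.
bytes_to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024 * 1024) }'
}

# cpu_cores QUOTA PERIOD: prints the CPU limit in cores, or "unlimited"
# when the quota or period is not a positive number (no quota configured).
cpu_cores() {
  awk -v q="$1" -v p="$2" 'BEGIN {
    if (q <= 0 || p <= 0) { print "unlimited"; exit }
    printf "%.1f\n", q / p
  }'
}

bytes_to_gib 32212254720   # 30 * 1024^3 bytes, prints 30.0
cpu_cores 800000 100000    # quota of 800 ms per 100 ms period, prints 8.0
```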
### Automated optimization measures
#### 1. Scheduled automatic restart (every Sunday at 03:00)
**Purpose**: clear accumulated zombie processes and free resources
**Check status**
```bash
# Check the timer status
ssh lazycat "systemctl status openclaw-restart.timer"
# View the restart log
ssh lazycat "tail -50 /var/log/openclaw-restart.log"
# Run a restart manually
ssh lazycat "/root/restart-openclaw.sh"
```
**Configuration files**
- Service: `/etc/systemd/system/openclaw-restart.service`
- Timer: `/etc/systemd/system/openclaw-restart.timer`
- Script: `/root/restart-openclaw.sh`
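The unit files themselves are not reproduced in this guide. As a rough sketch of what a weekly timer pair for this setup could look like, the hypothetical `write_restart_units` function below writes both files into a directory of your choice. The `Description`, `Type=oneshot`, `Persistent=true`, and `WantedBy` settings are assumptions, not a copy of what is deployed on the server:

```bash
#!/bin/bash
# Hypothetical generator; compare with the real files under /etc/systemd/system before use.
write_restart_units() {
  local dir=$1
  mkdir -p "$dir"
  cat > "$dir/openclaw-restart.service" <<'EOF'
[Unit]
Description=Restart the OpenClaw container

[Service]
Type=oneshot
ExecStart=/root/restart-openclaw.sh
EOF
  cat > "$dir/openclaw-restart.timer" <<'EOF'
[Unit]
Description=Weekly OpenClaw container restart

[Timer]
OnCalendar=Sun *-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# Usage sketch: write into a scratch directory and review the result first.
#   write_restart_units /tmp/openclaw-units
#   systemd-analyze calendar "Sun *-*-* 03:00:00"   # shows the next trigger time
```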
**Restart script** (`/root/restart-openclaw.sh`)
```bash
#!/bin/bash
# Periodic restart script for the OpenClaw container
# Runs every Sunday at 03:00
LOG_FILE='/var/log/openclaw-restart.log'
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Restarting the OpenClaw container" >> "$LOG_FILE"
# Restart the container
/lzcsys/bin/lzc-docker restart 5f3bf33e090b >> "$LOG_FILE" 2>&1
# Wait for the container to start
sleep 10
# Check the health status
STATUS=$(/lzcsys/bin/lzc-docker inspect -f '{{.State.Health.Status}}' 5f3bf33e090b 2>/dev/null || echo 'unknown')
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Restart finished, health status: $STATUS" >> "$LOG_FILE"
# Count zombie processes by state column only; matching 'Z' anywhere in
# `ps aux` output would also count the header line (it contains "VSZ")
ZOMBIE_COUNT=$(/lzcsys/bin/lzc-docker exec 5f3bf33e090b ps -eo stat= | grep -c '^Z')
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Current zombie process count: $ZOMBIE_COUNT" >> "$LOG_FILE"
echo "----------------------------------------" >> "$LOG_FILE"
```
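#### 2. Automatic zombie process monitoring (hourly check)
**Purpose**: monitor the number of zombie processes and restart the container automatically when it exceeds the threshold
**Check status**
```bash
# Check the monitor status
ssh lazycat "systemctl status openclaw-zombie-monitor.timer"
# View the monitor log
ssh lazycat "tail -50 /var/log/openclaw-zombie-monitor.log"
# Run the zombie check manually
ssh lazycat "/root/monitor-openclaw-zombies.sh"
# Count zombie processes directly (state column starts with Z)
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo stat= | grep -c '^Z'"
```
**Monitoring parameters**
- Check frequency: hourly
- Trigger threshold: 50 zombie processes
- Automatic action: restart the container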
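Counting zombies by matching the letter `Z` anywhere in `ps aux` output overcounts: the header line contains `VSZ`, and any command name with a capital Z matches too. A sketch of a stricter count that looks only at the process state column; `count_zombies` and `over_threshold` are illustrative names:

```bash
#!/bin/bash
# Illustrative helpers for the hourly zombie check.

# Reads `ps -eo stat=` output on stdin (one state per line) and prints
# how many entries are zombies, i.e. whose state starts with "Z".
count_zombies() {
  awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}

# over_threshold COUNT THRESHOLD: succeeds when COUNT is strictly greater.
over_threshold() {
  [ "$1" -gt "$2" ]
}

# Example with canned input: two of the four states are zombies, prints 2.
printf 'Ss\nZ\nR+\nZs\n' | count_zombies

# Usage sketch against the real container:
#   n=$(ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo stat=" | count_zombies)
#   over_threshold "$n" 50 && echo "restart needed"
```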
**Monitor script** (`/root/monitor-openclaw-zombies.sh`)
```bash
#!/bin/bash
# OpenClaw zombie process monitor
# Restarts the container automatically when there are more than 50 zombie processes
ZOMBIE_THRESHOLD=50
CONTAINER_ID='5f3bf33e090b'
LOG_FILE='/var/log/openclaw-zombie-monitor.log'
# Count zombie processes (state column starts with Z).
# grep -c prints 0 but exits non-zero when nothing matches, hence `|| true`.
ZOMBIE_COUNT=$(/lzcsys/bin/lzc-docker exec "$CONTAINER_ID" ps -eo stat= 2>/dev/null | grep -c '^Z' || true)
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Zombie process count: $ZOMBIE_COUNT" >> "$LOG_FILE"
if [ "$ZOMBIE_COUNT" -gt "$ZOMBIE_THRESHOLD" ]; then
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] ⚠️ Zombie processes exceed the threshold ($ZOMBIE_THRESHOLD), restarting automatically" >> "$LOG_FILE"
  # Restart the container
  /lzcsys/bin/lzc-docker restart "$CONTAINER_ID" >> "$LOG_FILE" 2>&1
  # Wait for the container to start
  sleep 10
  # Check again
  NEW_ZOMBIE_COUNT=$(/lzcsys/bin/lzc-docker exec "$CONTAINER_ID" ps -eo stat= 2>/dev/null | grep -c '^Z' || true)
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] Zombie process count after restart: $NEW_ZOMBIE_COUNT" >> "$LOG_FILE"
  echo "----------------------------------------" >> "$LOG_FILE"
fi
```
#### 3. Full health check
```bash
# One-shot health check
ssh lazycat "
echo '=== System load ===' && uptime &&
echo '' && echo '=== Zombie processes ===' &&
lzc-docker exec 5f3bf33e090b ps -eo stat= | grep '^Z' | wc -l &&
echo '' && echo '=== Container resources ===' &&
lzc-docker stats --no-stream 5f3bf33e090b &&
echo '' && echo '=== Gateway status ===' &&
lzc-docker exec 5f3bf33e090b openclaw gateway status | grep 'RPC probe' &&
echo '' && echo '=== Container health ===' &&
lzc-docker inspect 5f3bf33e090b --format='Status: {{.State.Status}}, Health: {{.State.Health.Status}}'
"
# List all OpenClaw timers
ssh lazycat "systemctl list-timers | grep openclaw"
```
---
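## Troubleshooting
### Tower crashes repeatedly (fixed 2026-02-16)
**Symptoms**
- The Tower log shows repeated crashes: `[tower] OpenClaw crashed: exit status 1`
- The Gateway fails to start: `gateway already running (pid xxx); lock timeout`
- Zombie Gateway processes pile up and are never reaped
- Several zombie processes show up in the logs: `[openclaw-gatewa] <defunct>`
**Typical error log**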
```
[22:19:39] [tower] OpenClaw crashed: exit status 1
[22:24:52] [tower] OpenClaw crashed: signal: killed
[22:27:33] Gateway failed to start: gateway already running (pid 2005)
[22:27:33] If the gateway is supervised, stop it with: openclaw gateway stop
```
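**Root cause**
- Tower runs as the container's PID 1 but is not a proper init process
- It has no child reaping mechanism, so zombie processes are never cleaned up
- The zombie processes leave the lock file and the port (18789) looking occupied, which blocks a new Gateway from starting
- The container's PID 1 is `/usr/local/bin/tower`, which cannot reap zombie processes
**Diagnostic commands**
```bash
# Show the PID 1 process
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -p 1 -o pid,ppid,cmd"
# Show zombie process details
ssh lazycat "lzc-docker exec 5f3bf33e090b ps aux | grep 'defunct'"
# Check who is using the port
ssh lazycat "lzc-docker exec 5f3bf33e090b netstat -tlnp | grep 18789"
# Show the process tree
ssh lazycat "lzc-docker exec 5f3bf33e090b ps auxf | head -30"
```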
**Permanent fix (implemented)**
Use **tini** as the container's PID 1 so that zombie processes are reaped automatically.
```bash
# 1. Install tini (a proper init process) in the container
ssh lazycat "lzc-docker exec 5f3bf33e090b bash -c 'apt-get update -qq && apt-get install -y tini'"
# 2. Change the entrypoint so that tini wraps tower
ssh lazycat "lzc-docker exec 5f3bf33e090b sed -i 's|exec /usr/local/bin/tower|exec /usr/bin/tini -- /usr/local/bin/tower|g' /usr/local/bin/clawdbot-entrypoint.sh"
# 3. Verify the change
ssh lazycat "lzc-docker exec 5f3bf33e090b grep 'exec.*tower' /usr/local/bin/clawdbot-entrypoint.sh"
# Expected: exec /usr/bin/tini -- /usr/local/bin/tower ...
# 4. Restart the container to apply the change
ssh lazycat "lzc-docker restart 5f3bf33e090b"
# 5. Verify that tini is now PID 1
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -p 1 -o pid,ppid,cmd"
# Expected output: PID 1 -> /usr/bin/tini -- /usr/local/bin/tower ...
# 6. Check the process tree
ssh lazycat "lzc-docker exec 5f3bf33e090b ps auxf | head -15"
```
**Process layout after the fix**
```
PID 1: /usr/bin/tini (proper init process, reaps zombie processes automatically)
└─ PID 58: tower
   └─ PID 64: openclaw
      └─ PID 72: openclaw-gateway
```
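**Results**
- ✅ Tini runs as PID 1 and reaps all zombie processes automatically
- ✅ The zombie process count dropped from 5+ to 1-2 (a healthy level)
- ✅ Tower runs stably and no longer crashes repeatedly
- ✅ The Gateway starts normally with no lock file conflicts
- ✅ The RPC probe consistently reports ok
**Notes**
- ⚠️ The change lives inside the running container and **must be reapplied after the container is recreated**
- 💡 Consider submitting a PR to the image maintainer (LazyCat Cloud) to add tini in the Dockerfile
- 📌 Every time the container is recreated from the image, rerun steps 1-4 above
**Image-level permanent fix** (suggested for submission to LazyCat Cloud):
Add the following to the Dockerfile of the OpenClaw image:
```dockerfile
# Option 1: install tini from the distribution packages
RUN apt-get update && apt-get install -y tini && rm -rf /var/lib/apt/lists/*
# Option 2 (instead of option 1): a lighter install from the release binary
ADD https://github.com/krallin/tini/releases/download/v0.19.0/tini /usr/bin/tini
RUN chmod +x /usr/bin/tini
# Wrap tower with tini in the entrypoint script (already changed in the current entrypoint)
```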
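### Too many zombie processes
**Symptoms**
- More than 50 zombie processes
- The Gateway responds slowly
- Container memory usage rises
**Diagnosis**
```bash
# Show zombie process details (STAT is the 8th column of `ps aux`)
ssh lazycat "lzc-docker exec 5f3bf33e090b ps aux | awk '\$8 ~ /^Z/'"
# Count zombie processes
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo stat= | grep -c '^Z'"
# Show the parent of each zombie process
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo pid,ppid,stat,comm | awk '\$3 ~ /^Z/'"
```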
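A zombie can only be reaped by its parent, so grouping zombies by PPID shows which process is failing to collect its children. A minimal sketch that summarizes `ps -eo pid,ppid,stat,comm` output; `zombies_by_parent` is a hypothetical helper and the sample rows are made up:

```bash
#!/bin/bash
# Hypothetical helper: reads `ps -eo pid,ppid,stat,comm` output on stdin and
# prints "PPID COUNT" for every parent that has zombie children.
zombies_by_parent() {
  awk 'NR > 1 && $3 ~ /^Z/ { count[$2]++ }
       END { for (p in count) print p, count[p] }' | sort -n
}

# Example with made-up rows: PID 1 has two defunct children, PID 64 has one.
zombies_by_parent <<'EOF'
  PID  PPID STAT COMMAND
    1     0 Ss   tower
   64     1 Sl   openclaw
  101     1 Z    openclaw-gatewa
  102     1 Z    openclaw-gatewa
  140    64 Z    sh
EOF

# Usage sketch:
#   ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo pid,ppid,stat,comm" | zombies_by_parent
```

If most zombies share PPID 1, the container's init process is the one not reaping, which matches the Tower diagnosis earlier in this guide.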
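**Solutions**
```bash
# Option 1: restart the container (recommended)
ssh lazycat "lzc-docker restart 5f3bf33e090b"
# Option 2: clean up the Gateway processes manually, then start a detached Gateway
ssh lazycat "lzc-docker exec 5f3bf33e090b pkill -9 -f 'openclaw-gateway'"
ssh lazycat "lzc-docker exec -d 5f3bf33e090b bash -c 'nohup openclaw gateway run > /tmp/gateway.log 2>&1 &'"
# Verify the cleanup
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo stat= | grep -c '^Z'"
```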
### Gateway not responding
**Symptoms**
- `RPC probe: failed` or a timeout
- The TUI fails to connect: `gateway not connected`
- The Dashboard is unreachable
**Diagnosis**
```bash
# Check the Gateway process
ssh lazycat "lzc-docker exec 5f3bf33e090b ps aux | grep gateway"
# Check that the port is listening
ssh lazycat "lzc-docker exec 5f3bf33e090b netstat -tlnp | grep 18789"
# View the Gateway logs (sh -c so that the glob is expanded inside the container)
ssh lazycat "lzc-docker exec 5f3bf33e090b sh -c 'tail -n 100 /tmp/openclaw/openclaw-*.log'"
# Test a local connection
ssh lazycat "lzc-docker exec 5f3bf33e090b curl -I http://127.0.0.1:18789"
```
**Solution**
```bash
# 1. Kill all Gateway processes
ssh lazycat "lzc-docker exec 5f3bf33e090b pkill -9 -f 'openclaw-gateway'"
# 2. Start a new Gateway
ssh lazycat "lzc-docker exec -d 5f3bf33e090b bash -c 'openclaw gateway run > /tmp/gateway.log 2>&1 &'"
# 3. Wait for it to start
sleep 5
# 4. Verify the status
ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway status"
```
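### Multiple OpenClaw TUI instances running (fixed 2026-02-16)
**Symptoms**
- `pkill -9 openclaw` is needed before every OpenClaw TUI start
- Startup fails or reports a port conflict
- Several `openclaw-tui` processes keep running in the background
- Container resource usage is abnormally high
**Diagnosis**
```bash
# Check the running OpenClaw processes
ssh lazycat "lzc-docker exec 5f3bf33e090b ps aux | grep openclaw"
# You will typically see several openclaw-tui instances:
# PID 3041 - openclaw-tui (pts/0)
# PID 6338 - openclaw-tui (pts/1)
# PID 7223 - openclaw-tui (pts/2)
# Check who is using the port
ssh lazycat "lzc-docker exec 5f3bf33e090b netstat -tlnp | grep 18789"
```
**Root cause**
- Every run of `openclaw tui` starts a new process
- The process is not fully cleaned up when the TUI exits
- Several instances running at once compete for resources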
**Permanent fix (implemented)**
**1. Create an automatic cleanup script** (inside the container):
```bash
# Create /usr/local/bin/openclaw-clean in the container
ssh lazycat "lzc-docker exec 5f3bf33e090b bash -c \"cat > /usr/local/bin/openclaw-clean << 'EOF'
#!/bin/bash
# OpenClaw cleanup and restart script
# Kills every openclaw process that is not managed by Tower
echo '🧹 Cleaning up old OpenClaw processes...'
pkill -9 -f 'openclaw-tui' || true
pkill -9 -f 'openclaw tui' || true
# Wait for the processes to exit completely
sleep 1
# Check for remaining processes
REMAINING=\\\$(ps aux | grep -E 'openclaw' | grep -v 'openclaw-gateway' | grep -v 'tower' | grep -v 'grep' | wc -l)
if [ \\\$REMAINING -gt 0 ]; then
echo '⚠️ Warning: '\\\$REMAINING' openclaw processes are still running'
else
echo '✅ Cleanup complete'
fi
# Start the OpenClaw TUI
echo ''
echo '🦞 Starting OpenClaw TUI...'
exec openclaw tui
EOF
chmod +x /usr/local/bin/openclaw-clean\""
```
**2. Update the local openclaw-tui script**
Change the last line of `~/bin/openclaw-tui`:
```bash
# Before
ssh -t lazycat "lzc-docker exec -it 5f3bf33e090b bash -c 'openclaw tui'"
# After
ssh -t lazycat "lzc-docker exec -it 5f3bf33e090b openclaw-clean"
```
**Results**
- ✅ Old processes are cleaned up automatically on every start
- ✅ Manual `pkill -9 openclaw` is no longer needed
- ✅ No more resources wasted on multiple instances
- ✅ A single command, `openclaw-tui`, handles everything
**Usage**
```bash
# Before (manual cleanup required)
ssh lazycat "lzc-docker exec 5f3bf33e090b bash -c 'pkill -9 openclaw && openclaw tui'"
# Now (automatic cleanup)
openclaw-tui # one command does it all
```
**Manual cleanup** (if needed):
```bash
# Kill all openclaw-tui processes
ssh lazycat "lzc-docker exec 5f3bf33e090b pkill -9 -f 'openclaw-tui'"
# Verify the result
ssh lazycat "lzc-docker exec 5f3bf33e090b ps aux | grep openclaw | grep -v tower | grep -v openclaw-gateway"
```
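The cleanup script removes stale instances after the fact. A different way to keep at most one TUI session is to refuse to start a second one; this is an alternative technique, not what was deployed. The sketch below uses `mkdir`, which either creates the directory or fails atomically, as a lock. The lock path and the function names are hypothetical:

```bash
#!/bin/bash
# Hypothetical single-instance guard built on an atomic `mkdir`.

# acquire_lock DIR: succeeds only for the first caller; later callers fail
# until release_lock removes the directory.
acquire_lock() {
  mkdir "$1" 2>/dev/null
}

release_lock() {
  rmdir "$1" 2>/dev/null || true
}

# run_single DIR CMD...: run CMD under the lock and release it afterwards.
run_single() {
  local lock=$1 status=0
  shift
  if ! acquire_lock "$lock"; then
    echo "another instance holds $lock" >&2
    return 1
  fi
  "$@" || status=$?
  release_lock "$lock"
  return "$status"
}

# Usage sketch inside the container:
#   run_single /tmp/openclaw-tui.lock openclaw tui
```

A session killed with SIGKILL leaves the lock directory behind, so a real script would also need stale-lock handling.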
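### Container out of memory
**Symptoms**
- Container memory usage above 90%
- OOM (Out of Memory) errors
- Processes get killed
**Diagnosis**
```bash
# Check memory usage
ssh lazycat "lzc-docker stats --no-stream 5f3bf33e090b"
# Show the memory limit
ssh lazycat "lzc-docker inspect 5f3bf33e090b --format='{{.HostConfig.Memory}}'"
# Show total system memory
ssh lazycat "free -h"
```
**Solution**
```bash
# Adjust the memory limit (only if the current limit is too low)
# Note: the LazyCat compute box is already set to 30 GB, so this is rarely needed
# If an adjustment is really required, use the following command
ssh lazycat "lzc-docker update 5f3bf33e090b --memory=30g --memory-swap=32g"
# Restart the container to apply the configuration
ssh lazycat "lzc-docker restart 5f3bf33e090b"
```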
### Automatic restart fails
**Symptoms**
- The systemd timer does not fire
- The restart script fails
- The log shows `lzc-docker: command not found`
**Diagnosis**
```bash
# Check the timer status
ssh lazycat "systemctl status openclaw-restart.timer"
# Check the service status
ssh lazycat "systemctl status openclaw-restart.service"
# View the service journal
ssh lazycat "journalctl -u openclaw-restart.service -n 50"
# View the script log
ssh lazycat "tail -50 /var/log/openclaw-restart.log"
# Test the script manually
ssh lazycat "bash -x /root/restart-openclaw.sh"
```
**Solution**
The usual cause is that the script cannot find the `lzc-docker` command (a PATH problem).
```bash
# Confirm the lzc-docker path
ssh lazycat "which lzc-docker"
# Output: /lzcsys/bin/lzc-docker
# Make sure the script uses the full path
ssh lazycat "grep 'lzc-docker' /root/restart-openclaw.sh"
# Expected: /lzcsys/bin/lzc-docker
# If the script uses the bare command name, rewrite it
# (occurrences that already follow a slash are left untouched)
ssh lazycat "sed -i -E 's#(^|[^/])lzc-docker#\1/lzcsys/bin/lzc-docker#g' /root/restart-openclaw.sh"
ssh lazycat "sed -i -E 's#(^|[^/])lzc-docker#\1/lzcsys/bin/lzc-docker#g' /root/monitor-openclaw-zombies.sh"
# Reload the systemd configuration
ssh lazycat "systemctl daemon-reload"
# Test run
ssh lazycat "/root/restart-openclaw.sh"
```
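A script that works in an interactive shell can still fail under a systemd timer, because units run with a fixed default `PATH` rather than the login shell's. When rewriting the scripts, a plain `s|lzc-docker|/lzcsys/bin/lzc-docker|g` substitution would also change calls that already use the full path. The sketch below only touches bare occurrences and is safe to run twice; `absolutize_lzc_docker` is an illustrative name:

```bash
#!/bin/bash
# Illustrative helper: prefix bare `lzc-docker` calls in FILE with /lzcsys/bin/,
# leaving occurrences that already follow a slash untouched.
absolutize_lzc_docker() {
  local file=$1 tmp
  tmp=$(mktemp) || return 1
  if sed -E 's#(^|[^/])lzc-docker#\1/lzcsys/bin/lzc-docker#g' "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # overwrite in place so the file keeps its mode and owner
  fi
  rm -f "$tmp"
}

# Usage sketch on the server:
#   absolutize_lzc_docker /root/restart-openclaw.sh
#   absolutize_lzc_docker /root/monitor-openclaw-zombies.sh
```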
---
## Best Practices
### 1. Regular health checks
Run a full health check once a day:
```bash
#!/bin/bash
# OpenClaw health check script
echo "🔍 OpenClaw health check - $(date)"
echo "================================"
# Container status
echo -e "\n📦 Container status:"
ssh lazycat "lzc-docker ps --filter id=5f3bf33e090b --format 'Status: {{.Status}}'"
# PID 1 process
echo -e "\n🏗 PID 1 process:"
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -p 1 -o pid,ppid,cmd"
# Zombie process count (state column starts with Z)
echo -e "\n👻 Zombie processes:"
ZOMBIE_COUNT=$(ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo stat= | grep -c '^Z'")
echo "Zombie process count: $ZOMBIE_COUNT"
if [ "$ZOMBIE_COUNT" -gt 10 ]; then
  echo "⚠️ Warning: many zombie processes, consider restarting the container"
fi
# Resource usage
echo -e "\n💾 Resource usage:"
ssh lazycat "lzc-docker stats --no-stream 5f3bf33e090b"
# Gateway status
echo -e "\n🔌 Gateway status:"
ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway status | grep 'RPC probe'"
# System load
echo -e "\n📊 System load:"
ssh lazycat "uptime"
echo -e "\n================================"
echo "✅ Health check complete"
```
### 2. Log management
```bash
# Show recent error logs
ssh lazycat "lzc-docker logs 5f3bf33e090b --since 1h 2>&1 | grep -i error"
# Show Tower crash logs
ssh lazycat "lzc-docker logs 5f3bf33e090b 2>&1 | grep -i 'crashed\|failed'"
# Show the OpenClaw application log
ssh lazycat "lzc-docker exec 5f3bf33e090b tail -100 /tmp/openclaw/openclaw-$(date +%Y-%m-%d).log"
# Delete old logs (keep the last 7 days)
ssh lazycat "lzc-docker exec 5f3bf33e090b find /tmp/openclaw -name '*.log' -mtime +7 -delete"
```
### 3. Backup and restore
```bash
# Back up the OpenClaw configuration by streaming the archive to the local machine
# (unless /tmp is bind-mounted, a file written to /tmp inside the container
# is not visible to scp on the host)
mkdir -p ~/backups
ssh lazycat "lzc-docker exec 5f3bf33e090b tar czf - -C /home/node/.openclaw ." > ~/backups/openclaw-config-backup-$(date +%Y%m%d).tar.gz
# Restore the configuration from one backup (replace YYYYMMDD with its date)
ssh lazycat "lzc-docker exec -i 5f3bf33e090b tar xzf - -C /home/node/.openclaw" < ~/backups/openclaw-config-backup-YYYYMMDD.tar.gz
ssh lazycat "lzc-docker restart 5f3bf33e090b"
```
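### 4. Monitoring and alerting
Recommended alert thresholds:
- **Zombie process count** > 50: raise an alert (automatic restart already implemented)
- **Memory usage** > 90%: raise an alert
- **Gateway offline time** > 5 minutes: raise an alert
- **Container restarts** > 3 per day: raise an alert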
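The four thresholds can be evaluated in a single check. A minimal sketch, assuming the measurements have already been collected as plain numbers (how they are gathered is left out); `check_alerts` and its argument order are hypothetical:

```bash
#!/bin/bash
# Hypothetical evaluator for the alert thresholds listed above.
# check_alerts ZOMBIES MEM_PERCENT GATEWAY_DOWN_MINUTES RESTARTS_TODAY
# Prints one line per breached threshold and fails if any was breached.
check_alerts() {
  local zombies=$1 mem=$2 down=$3 restarts=$4 breached=0
  if [ "$zombies" -gt 50 ]; then
    echo "ALERT zombies=$zombies (>50)"; breached=1
  fi
  # Memory may be fractional (for example 91.5), so compare it with awk.
  if awk -v m="$mem" 'BEGIN { exit !(m > 90) }'; then
    echo "ALERT memory=${mem}% (>90%)"; breached=1
  fi
  if [ "$down" -gt 5 ]; then
    echo "ALERT gateway_down=${down}min (>5min)"; breached=1
  fi
  if [ "$restarts" -gt 3 ]; then
    echo "ALERT restarts=${restarts}/day (>3/day)"; breached=1
  fi
  [ "$breached" -eq 0 ]
}

# Example: only the memory threshold is breached here.
check_alerts 12 91.5 0 1 || echo "at least one threshold breached"
```

Returning a non-zero status on any breach lets a timer-driven wrapper decide how to notify without parsing the output.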
### 5. Recovery checklist after a container rebuild
If the container is recreated from the image, reapply the following fixes:
```bash
# 1. Install tini
ssh lazycat "lzc-docker exec 5f3bf33e090b bash -c 'apt-get update -qq && apt-get install -y tini'"
# 2. Change the entrypoint
ssh lazycat "lzc-docker exec 5f3bf33e090b sed -i 's|exec /usr/local/bin/tower|exec /usr/bin/tini -- /usr/local/bin/tower|g' /usr/local/bin/clawdbot-entrypoint.sh"
# 3. Restart the container
ssh lazycat "lzc-docker restart 5f3bf33e090b"
# 4. Verify
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -p 1 -o cmd | grep tini"
```
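Because these steps have to be repeated after every rebuild, it helps that the `sed` substitution is idempotent: once `tini` is in place, the pattern `exec /usr/local/bin/tower` no longer matches. A small sketch that demonstrates this on a scratch file; `wrap_with_tini` is an illustrative name and the sample entrypoint line is made up:

```bash
#!/bin/bash
# Illustrative helper: apply the same substitution as step 2 to a local FILE.
wrap_with_tini() {
  local file=$1 tmp
  tmp=$(mktemp) || return 1
  if sed 's|exec /usr/local/bin/tower|exec /usr/bin/tini -- /usr/local/bin/tower|g' "$file" > "$tmp"; then
    cat "$tmp" > "$file"
  fi
  rm -f "$tmp"
}

# Demonstration on a scratch file with a made-up entrypoint line.
demo=$(mktemp)
echo 'exec /usr/local/bin/tower "$@"' > "$demo"
wrap_with_tini "$demo"
wrap_with_tini "$demo"   # the second run changes nothing
cat "$demo"              # prints: exec /usr/bin/tini -- /usr/local/bin/tower "$@"
rm -f "$demo"
```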
---
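## Appendix
### Related documentation
- OpenClaw official documentation: https://docs.openclaw.ai/
- Troubleshooting guide: https://docs.openclaw.ai/troubleshooting
- Tini project: https://github.com/krallin/tini
### Contact
- LazyCat Cloud support: support@lazycat.cloud
- OpenClaw community: https://community.openclaw.ai/
### Version history
- 2026-02-16
  - Created the document, recording the Tower crash fix (using tini)
  - Added the multiple TUI instances issue and the openclaw-clean fix
- 2026-02-15: implemented zombie process monitoring and automatic restart
- 2026-02-14: raised the container resource limits close to the system maximum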
---
## Quick Reference
### Common commands
```bash
# Connect to the OpenClaw TUI
openclaw-tui
# Show container status
ssh lazycat "lzc-docker ps | grep openclaw"
# Restart the container
ssh lazycat "lzc-docker restart 5f3bf33e090b"
# Count zombie processes
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo stat= | grep -c '^Z'"
# Check Gateway status
ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway status | grep 'RPC probe'"
# Show resource usage
ssh lazycat "lzc-docker stats --no-stream 5f3bf33e090b"
# List timers
ssh lazycat "systemctl list-timers | grep openclaw"
# Kill leftover OpenClaw TUI processes
ssh lazycat "lzc-docker exec 5f3bf33e090b pkill -9 -f 'openclaw-tui'"
# Start OpenClaw (cleans up old processes automatically; the TUI needs a TTY)
ssh -t lazycat "lzc-docker exec -it 5f3bf33e090b openclaw-clean"
# Full health check
ssh lazycat "echo '=== Container ===' && lzc-docker ps | grep openclaw && echo '' && echo '=== Zombie processes ===' && lzc-docker exec 5f3bf33e090b ps -eo stat= | grep '^Z' | wc -l && echo '' && echo '=== Gateway ===' && lzc-docker exec 5f3bf33e090b openclaw gateway status | grep 'RPC probe'"
```
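### Troubleshooting cheat sheet
| Problem | Quick fix |
|------|----------|
| Tower crashes repeatedly | See the "Tower crashes repeatedly" section and install tini |
| Multiple TUI instances | Use `openclaw-tui` (automatic cleanup) or run `pkill -9 -f openclaw-tui` manually |
| Gateway not responding | `ssh lazycat "lzc-docker restart 5f3bf33e090b"` |
| Too many zombie processes | `ssh lazycat "lzc-docker restart 5f3bf33e090b"` |
| Out of memory | Check the resource limits, then restart the container |
| Automatic restart fails | Check that the scripts use the full path `/lzcsys/bin/lzc-docker` |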