refactor: split the general skills into separate per-category directories

skills/ → skills-dev(9), skills-req(10), skills-ops(4),
skills-integration(8), skills-biz(4), skills-workflow(7)

generate-marketplace.py now scans all skills-* directories automatically.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 11:31:58 +10:30
parent ea266e9cce
commit 712063071c
170 changed files with 341 additions and 346 deletions

@@ -0,0 +1,797 @@
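# OpenClaw Operations Skill
A complete guide to containerized deployment, operations monitoring, and troubleshooting for OpenClaw.
## Table of contents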
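- [Server deployment](#server-deployment)
- [Container management](#container-management)
- [Performance optimization](#performance-optimization)
- [Troubleshooting](#troubleshooting)
- [Best practices](#best-practices)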
---
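## Server deployment
### LazyCat compute box (lazycat)
**Server information**
- Hostname: haiqing.heiyu.space
- SSH aliases: lazycat, lanmao
- Purpose: OpenClaw compute service
- OS: Debian-based Linux
- Container platform: lzc-docker
**OpenClaw container information**
- Container ID: 5f3bf33e090b
- Image: registry.lazycat.cloud/openclaw:1.1.5
- OpenClaw version: 2026.2.9
- Container name: iamxiaoelzcappopenclaw-openclaw-1
**Access**
```bash
# Connect over SSH
ssh lazycat
# Enter the container (-t allocates the TTY that `exec -it` needs)
ssh -t lazycat "lzc-docker exec -it 5f3bf33e090b bash"
# Start the OpenClaw TUI
openclaw-tui # uses the local shortcut script
```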
---
## Container management
### Shortcut script
**~/bin/openclaw-tui**
```bash
#!/bin/bash
# OpenClaw TUI shortcut script (starts the Gateway automatically)
set -e
echo "🦞 Connecting to the lobster server (LazyCat)..."
echo ""
# Check the Gateway and start it if needed
echo "Checking OpenClaw Gateway status..."
GATEWAY_STATUS=$(ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway status" 2>/dev/null | grep "RPC probe" || echo "failed")
if echo "$GATEWAY_STATUS" | grep -q "ok"; then
    echo "✅ Gateway is running"
else
    echo "🔧 Starting Gateway..."
    ssh lazycat "lzc-docker exec -d 5f3bf33e090b bash -c 'nohup openclaw gateway run > /tmp/gateway.log 2>&1 &'" 2>/dev/null
    sleep 2
    echo "✅ Gateway started"
fi
echo ""
echo "Starting OpenClaw TUI..."
# SSH to the LazyCat server, enter the Docker container, and start the OpenClaw TUI
ssh -t lazycat "lzc-docker exec -it 5f3bf33e090b bash -c 'openclaw tui'"
```
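The shortcut script waits a fixed two seconds after launching the Gateway and then reports success without checking again. A minimal sketch of a polling alternative follows. `rpc_probe_ok`, `wait_for_gateway`, and `OC_STATUS_CMD` are hypothetical names introduced here, and the sketch assumes the status output contains a line with `RPC probe` and `ok` when the Gateway is healthy.

```bash
#!/bin/bash
# Hypothetical helper: poll the Gateway status until the "RPC probe" line reports ok.
# OC_STATUS_CMD is an assumption introduced here so the status command can be swapped out.
OC_STATUS_CMD=${OC_STATUS_CMD:-'ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway status"'}

# Succeeds if the status text on stdin has an "RPC probe" line containing "ok".
rpc_probe_ok() {
    grep 'RPC probe' | grep -q 'ok'
}

# wait_for_gateway [attempts] [delay_seconds]
# Returns 0 as soon as the probe reports ok, 1 if every attempt fails.
wait_for_gateway() {
    local attempts=${1:-10} delay=${2:-1} i
    for ((i = 1; i <= attempts; i++)); do
        if eval "$OC_STATUS_CMD" 2>/dev/null | rpc_probe_ok; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}
```

In the shortcut script, `sleep 2` could then become something like `wait_for_gateway 10 1 || echo "Gateway did not come up"`, so a failed start is reported instead of hidden.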
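### Container operation commands
```bash
# Show container status
ssh lazycat "lzc-docker ps | grep openclaw"
# Follow the container logs, starting from the last 100 lines
ssh lazycat "lzc-docker logs -f 5f3bf33e090b --tail 100"
# Restart the container
ssh lazycat "lzc-docker restart 5f3bf33e090b"
# Show container resource usage
ssh lazycat "lzc-docker stats --no-stream 5f3bf33e090b"
# Open a shell in the container (-t allocates the TTY that `exec -it` needs)
ssh -t lazycat "lzc-docker exec -it 5f3bf33e090b bash"
```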
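Every command in this guide hard-codes the container ID `5f3bf33e090b`, which changes whenever the container is recreated from the image. The following is a sketch of resolving the ID from the container name instead. `oc_container_id`, `oc_exec`, and `OC_NAME_FILTER` are hypothetical names, and the sketch assumes `lzc-docker` accepts Docker's `ps --filter name=...` option in the same way it accepts the `--filter id=...` form used in the health check script.

```bash
#!/bin/bash
# Hypothetical helpers; assumes lzc-docker accepts Docker's `ps --filter name=...`.
OC_NAME_FILTER=${OC_NAME_FILTER:-openclaw}

# Print the ID of the first running container whose name matches the filter.
oc_container_id() {
    ssh lazycat "lzc-docker ps --filter name=$OC_NAME_FILTER --format '{{.ID}}'" | head -n 1
}

# oc_exec CMD...: run a command inside the resolved container.
oc_exec() {
    local id
    id=$(oc_container_id)
    if [ -z "$id" ]; then
        echo "no running container matches '$OC_NAME_FILTER'" >&2
        return 1
    fi
    ssh lazycat "lzc-docker exec $id $*"
}
```

With these in place, `oc_exec openclaw gateway status` would keep working after a rebuild without editing any script.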
### Gateway management
```bash
# Check the Gateway status
ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway status"
# Start the Gateway (foreground)
ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway run"
# Start the Gateway (background)
ssh lazycat "lzc-docker exec -d 5f3bf33e090b bash -c 'nohup openclaw gateway run > /tmp/gateway.log 2>&1 &'"
# Stop the Gateway
ssh lazycat "lzc-docker exec 5f3bf33e090b pkill -f 'openclaw-gateway'"
# Follow the Gateway log (the date is expanded on the local machine before the command is sent)
ssh lazycat "lzc-docker exec 5f3bf33e090b tail -f /tmp/openclaw/openclaw-$(date +%Y-%m-%d).log"
```
---
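## Performance optimization
### Resource configuration
**Current configuration** (close to the system maximum, to guarantee ample performance):
- Memory limit: 30 GB (97% of the system)
- Memory + swap: 32 GB
- CPU limit: 8.0 cores (100% of the system)
- Process limit: 10,000
```bash
# Show the configured resource limits
ssh lazycat "lzc-docker inspect 5f3bf33e090b --format='
Memory limit: {{.HostConfig.Memory}} bytes
Memory + swap: {{.HostConfig.MemorySwap}} bytes
CPU quota: {{.HostConfig.CpuQuota}}
CPU period: {{.HostConfig.CpuPeriod}}
PID limit: {{.HostConfig.PidsLimit}}
'"
# Show live resource usage
ssh lazycat "lzc-docker stats --no-stream 5f3bf33e090b"
```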
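The `inspect` output reports memory in bytes and CPU as a quota/period pair, which is awkward to compare with the 30 GB and 8.0 core figures above. A small sketch of the conversion follows. `bytes_to_gib` and `cpu_cores` are hypothetical helper names, and the sample values are illustrative: they are what a `30g` memory limit and an 8-core quota with a period of 100000 would look like, not values read from the server.

```bash
#!/bin/bash
# Convert raw `inspect` values into the units used in this document.

# bytes_to_gib BYTES -> GiB with one decimal place
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024 * 1024) }'
}

# cpu_cores QUOTA PERIOD -> number of cores; a quota <= 0 means no quota is set
cpu_cores() {
    awk -v q="$1" -v p="$2" 'BEGIN {
        if (q <= 0 || p <= 0) { print "unlimited"; exit }
        printf "%.1f\n", q / p
    }'
}

bytes_to_gib 32212254720   # prints 30.0 (30 * 1024^3 bytes)
cpu_cores 800000 100000    # prints 8.0  (quota / period)
```

If `CpuQuota` reads 0 even though a CPU limit is in force, the limit may have been set another way; Docker, for example, records `--cpus` under `NanoCpus` rather than as a quota.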
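**Notes on the configuration**
- The main job of the LazyCat compute box is to serve OpenClaw
- The limits are set close to the system maximum so that OpenClaw has ample resources
- The limits still provide basic protection against a runaway container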
### Automated optimization measures
#### 1. Scheduled automatic restart (every Sunday at 03:00)
**Purpose**: clear accumulated zombie processes and free resources
**Check status**
```bash
# Show the timer status
ssh lazycat "systemctl status openclaw-restart.timer"
# Show the restart log
ssh lazycat "tail -50 /var/log/openclaw-restart.log"
# Run the restart manually
ssh lazycat "/root/restart-openclaw.sh"
```
**Configuration files**
- Service: `/etc/systemd/system/openclaw-restart.service`
- Timer: `/etc/systemd/system/openclaw-restart.timer`
- Script: `/root/restart-openclaw.sh`
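The contents of the two unit files are not reproduced in this guide. As a rough sketch of what they might look like for a Sunday 03:00 schedule, the functions below print assumed unit contents to stdout. Only the file paths, the script path, and the schedule come from this guide; the `Description`, `Type=oneshot`, and `Persistent=true` lines and the function names are assumptions.

```bash
#!/bin/bash
# Sketch only: prints assumed unit file contents; it does not install anything.

render_restart_service() {
    cat <<'EOF'
[Unit]
Description=Restart the OpenClaw container

[Service]
Type=oneshot
ExecStart=/root/restart-openclaw.sh
EOF
}

render_restart_timer() {
    cat <<'EOF'
[Unit]
Description=Weekly OpenClaw container restart

[Timer]
OnCalendar=Sun *-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

On the server, the output could be redirected to the paths listed above, followed by `systemctl daemon-reload` and `systemctl enable --now openclaw-restart.timer`. `systemd-analyze calendar 'Sun *-*-* 03:00:00'` shows when the expression next fires.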
**Restart script** (`/root/restart-openclaw.sh`)
```bash
#!/bin/bash
# Periodic restart script for the OpenClaw container
# Runs every Sunday at 03:00
LOG_FILE='/var/log/openclaw-restart.log'
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Restarting the OpenClaw container" >> "$LOG_FILE"
# Restart the container
/lzcsys/bin/lzc-docker restart 5f3bf33e090b >> "$LOG_FILE" 2>&1
# Wait for the container to start
sleep 10
# Check the health status
STATUS=$(/lzcsys/bin/lzc-docker inspect -f '{{.State.Health.Status}}' 5f3bf33e090b 2>/dev/null || echo 'unknown')
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Restart finished, health status: $STATUS" >> "$LOG_FILE"
# Count zombie processes (match only the STAT column; a bare grep 'Z' on `ps aux` also matches the VSZ header)
ZOMBIE_COUNT=$(/lzcsys/bin/lzc-docker exec 5f3bf33e090b ps -eo stat= 2>/dev/null | grep -c '^Z' || true)
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Current zombie process count: $ZOMBIE_COUNT" >> "$LOG_FILE"
echo "----------------------------------------" >> "$LOG_FILE"
```
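#### 2. Automatic zombie process monitoring (hourly check)
**Purpose**: track the number of zombie processes and restart the container automatically when it exceeds a threshold
**Check status**
```bash
# Show the monitor timer status
ssh lazycat "systemctl status openclaw-zombie-monitor.timer"
# Show the monitor log
ssh lazycat "tail -50 /var/log/openclaw-zombie-monitor.log"
# Run the zombie check manually
ssh lazycat "/root/monitor-openclaw-zombies.sh"
# Print the zombie process count directly
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo stat= | grep -c '^Z'"
```
**Monitoring parameters**
- Check frequency: hourly
- Trigger threshold: 50 zombie processes
- Automatic action: restart the container
**Monitoring script** (`/root/monitor-openclaw-zombies.sh`)
```bash
#!/bin/bash
# OpenClaw zombie process monitor
# Restarts the container automatically when there are more than 50 zombie processes
ZOMBIE_THRESHOLD=50
CONTAINER_ID='5f3bf33e090b'
LOG_FILE='/var/log/openclaw-zombie-monitor.log'
# Count zombie processes (match only the STAT column; a bare grep 'Z' on `ps aux` also matches the VSZ header)
ZOMBIE_COUNT=$(/lzcsys/bin/lzc-docker exec "$CONTAINER_ID" ps -eo stat= 2>/dev/null | grep -c '^Z' || true)
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Zombie process count: $ZOMBIE_COUNT" >> "$LOG_FILE"
if [ "$ZOMBIE_COUNT" -gt "$ZOMBIE_THRESHOLD" ]; then
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] ⚠️ Zombie count exceeds the threshold ($ZOMBIE_THRESHOLD), restarting automatically" >> "$LOG_FILE"
    # Restart the container
    /lzcsys/bin/lzc-docker restart "$CONTAINER_ID" >> "$LOG_FILE" 2>&1
    # Wait for the container to start
    sleep 10
    # Check again
    NEW_ZOMBIE_COUNT=$(/lzcsys/bin/lzc-docker exec "$CONTAINER_ID" ps -eo stat= 2>/dev/null | grep -c '^Z' || true)
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] Zombie process count after restart: $NEW_ZOMBIE_COUNT" >> "$LOG_FILE"
    echo "----------------------------------------" >> "$LOG_FILE"
fi
```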
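A zombie counter is easy to get subtly wrong. Matching a bare `Z` anywhere in `ps aux` output also matches the `VSZ` column header and any command line that happens to contain a capital Z, so the count never drops to zero. The sketch below counts only rows whose process state starts with `Z`, and guards the threshold comparison against empty input. `count_zombies` and `should_restart` are hypothetical helper names.

```bash
#!/bin/bash
# count_zombies: read `ps -eo stat=` output on stdin and print how many states start with Z.
count_zombies() {
    awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}

# should_restart COUNT THRESHOLD: succeed only when COUNT is a number above THRESHOLD.
should_restart() {
    case "$1" in
        '' | *[!0-9]*) return 1 ;;   # empty or non-numeric input: do not restart
    esac
    [ "$1" -gt "$2" ]
}
```

A monitoring script could use these as `ZOMBIE_COUNT=$(/lzcsys/bin/lzc-docker exec 5f3bf33e090b ps -eo stat= | count_zombies)` followed by `if should_restart "$ZOMBIE_COUNT" 50; then ...`. Treating unreadable output as "do not restart" is a design choice: a failed `exec` then cannot trigger a restart loop.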
#### 3. Full health check
```bash
# One-shot health check
ssh lazycat "
echo '=== System load ===' && uptime &&
echo '' && echo '=== Zombie processes ===' &&
(lzc-docker exec 5f3bf33e090b ps -eo stat= | grep -c '^Z' || true) &&
echo '' && echo '=== Container resources ===' &&
lzc-docker stats --no-stream 5f3bf33e090b &&
echo '' && echo '=== Gateway status ===' &&
lzc-docker exec 5f3bf33e090b openclaw gateway status | grep 'RPC probe' &&
echo '' && echo '=== Container health ===' &&
lzc-docker inspect 5f3bf33e090b --format='Status: {{.State.Status}}, Health: {{.State.Health.Status}}'
"
# List all OpenClaw timers
ssh lazycat "systemctl list-timers | grep openclaw"
```
---
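## Troubleshooting
### Tower crashes repeatedly (fixed 2026-02-16)
**Symptoms**
- The Tower log shows repeated crashes: `[tower] OpenClaw crashed: exit status 1`
- The Gateway fails to start: `gateway already running (pid xxx); lock timeout`
- Zombie Gateway processes pile up and are never reaped
- The logs show multiple zombie processes: `[openclaw-gatewa] <defunct>`
**Typical error log**
```
[22:19:39] [tower] OpenClaw crashed: exit status 1
[22:24:52] [tower] OpenClaw crashed: signal: killed
[22:27:33] Gateway failed to start: gateway already running (pid 2005)
[22:27:33] If the gateway is supervised, stop it with: openclaw gateway stop
```
**Root cause**
- The container's PID 1 is `/usr/local/bin/tower`, which is not a proper init process
- Tower has no child-reaping mechanism, so zombie processes are never cleaned up
- The leftover Gateway processes keep the lock file and port 18789 looking busy, which blocks new Gateway starts. A true zombie has already released its sockets, so the lock check finding a PID that still exists is the more likely blocker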
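The reaping problem can be reproduced on any Linux machine. The sketch below starts a parent that never waits for its child and then counts the resulting zombie. It illustrates the general mechanism and is not taken from Tower itself; `zombie_children` and `zombie_demo` are hypothetical names. In a container, orphaned processes are re-parented to PID 1, so a PID 1 that never waits accumulates zombies in the same way.

```bash
#!/bin/bash
# Demonstration: a parent that never waits for its exited child leaves it as a zombie.

# zombie_children PID: count children of PID whose state starts with Z.
zombie_children() {
    ps -eo ppid=,stat= | awk -v p="$1" '$1 == p && $2 ~ /^Z/ { n++ } END { print n + 0 }'
}

# The inner bash starts a short-lived child and then exec's into `sleep`,
# which keeps the same PID but never reaps children.
zombie_demo() {
    bash -c 'sleep 0.2 & exec sleep 2' &
    local parent=$!
    sleep 1                      # the child has exited by now; its parent is still alive
    zombie_children "$parent"    # prints the number of zombie children
    wait "$parent"               # once the parent exits, the zombie is cleaned up
}
```

Calling `zombie_demo` prints the number of zombie children seen while the non-reaping parent is still alive.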
**Diagnostic commands**
```bash
# Show the PID 1 process
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -p 1 -o pid,ppid,cmd"
# Show zombie process details
ssh lazycat "lzc-docker exec 5f3bf33e090b ps aux | grep 'defunct'"
# Check what is listening on the Gateway port
ssh lazycat "lzc-docker exec 5f3bf33e090b netstat -tlnp | grep 18789"
# Show the process tree
ssh lazycat "lzc-docker exec 5f3bf33e090b ps auxf | head -30"
```
**Permanent fix (applied)**
Use **tini** as the container's PID 1 so that zombie processes are reaped automatically.
```bash
# 1. Install tini (a proper init process) in the container
ssh lazycat "lzc-docker exec 5f3bf33e090b bash -c 'apt-get update -qq && apt-get install -y tini'"
# 2. Change the entrypoint so that tini wraps tower
ssh lazycat "lzc-docker exec 5f3bf33e090b sed -i 's|exec /usr/local/bin/tower|exec /usr/bin/tini -- /usr/local/bin/tower|g' /usr/local/bin/clawdbot-entrypoint.sh"
# 3. Verify the change
ssh lazycat "lzc-docker exec 5f3bf33e090b grep 'exec.*tower' /usr/local/bin/clawdbot-entrypoint.sh"
# Expected: exec /usr/bin/tini -- /usr/local/bin/tower ...
# 4. Restart the container to apply the change
ssh lazycat "lzc-docker restart 5f3bf33e090b"
# 5. Verify that tini is now PID 1
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -p 1 -o pid,ppid,cmd"
# Expected: PID 1 -> /usr/bin/tini -- /usr/local/bin/tower ...
# 6. Check the process tree
ssh lazycat "lzc-docker exec 5f3bf33e090b ps auxf | head -15"
```
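**Process layout after the fix**
```
PID 1: /usr/bin/tini (proper init process, reaps zombie processes automatically)
└─ PID 58: tower
└─ PID 64: openclaw
└─ PID 72: openclaw-gateway
```
**Results**
- ✅ Tini runs as PID 1 and reaps zombie processes automatically
- ✅ The zombie process count dropped from 5+ to 1-2 (a healthy level)
- ✅ Tower runs stably and no longer crashes repeatedly
- ✅ The Gateway starts normally, with no lock file conflicts
- ✅ The RPC probe consistently reports ok
**Caveats**
- ⚠️ The change lives inside the running container, so it **must be reapplied after the container is rebuilt**
- 💡 Consider submitting a PR to the image maintainer (LazyCat Cloud) that adds tini to the Dockerfile
- 📌 Every time the container is recreated from the image, steps 1-4 above have to be run again
**Image-level permanent fix** (suggested for submission to LazyCat Cloud):
Add the following to the Dockerfile of the OpenClaw image:
```dockerfile
# Install tini
RUN apt-get update && apt-get install -y tini && rm -rf /var/lib/apt/lists/*
# Alternative to the apt install above: download the release binary directly (lighter)
ADD https://github.com/krallin/tini/releases/download/v0.19.0/tini /usr/bin/tini
RUN chmod +x /usr/bin/tini
# Wrap tower with tini in the entrypoint script (already changed in the entrypoint of the current container)
```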
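### Too many zombie processes
**Symptoms**
- More than 50 zombie processes
- The Gateway responds slowly
- Container memory usage rises
**Diagnosis**
```bash
# Show zombie process details (header plus rows whose STAT starts with Z)
ssh lazycat "lzc-docker exec 5f3bf33e090b ps aux | awk 'NR==1 || \$8 ~ /^Z/'"
# Count zombie processes
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo stat= | grep -c '^Z'"
# Show the parent of each zombie process
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo pid,ppid,stat,comm | awk '\$3 ~ /^Z/'"
```
**Fixes**
```bash
# Option 1: restart the container (recommended)
ssh lazycat "lzc-docker restart 5f3bf33e090b"
# Option 2: kill the Gateway processes manually and start a fresh Gateway in the background
ssh lazycat "lzc-docker exec 5f3bf33e090b pkill -9 -f 'openclaw-gateway'"
ssh lazycat "lzc-docker exec -d 5f3bf33e090b bash -c 'nohup openclaw gateway run > /tmp/gateway.log 2>&1 &'"
# Verify the result
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo stat= | grep -c '^Z'"
```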
### Gateway unresponsive
**Symptoms**
- `RPC probe: failed` or a timeout
- The TUI fails to connect: `gateway not connected`
- The Dashboard is unreachable
**Diagnosis**
```bash
# Check the Gateway processes
ssh lazycat "lzc-docker exec 5f3bf33e090b ps aux | grep gateway"
# Check that the port is listening
ssh lazycat "lzc-docker exec 5f3bf33e090b netstat -tlnp | grep 18789"
# Show the Gateway logs (bash -c makes the glob expand inside the container, not on the host)
ssh lazycat "lzc-docker exec 5f3bf33e090b bash -c 'tail -n 100 /tmp/openclaw/openclaw-*.log'"
# Test a local connection
ssh lazycat "lzc-docker exec 5f3bf33e090b curl -I http://127.0.0.1:18789"
```
**Fix**
```bash
# 1. Kill all Gateway processes
ssh lazycat "lzc-docker exec 5f3bf33e090b pkill -9 -f 'openclaw-gateway'"
# 2. Start a new Gateway
ssh lazycat "lzc-docker exec -d 5f3bf33e090b bash -c 'openclaw gateway run > /tmp/gateway.log 2>&1 &'"
# 3. Wait for it to start
sleep 5
# 4. Verify the status
ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway status"
```
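### Multiple OpenClaw TUI instances running (fixed 2026-02-16)
**Symptoms**
- `pkill -9 openclaw` was needed before every OpenClaw TUI start
- Startup failures or port conflicts
- Multiple `openclaw-tui` processes running in the background
- Abnormally high container resource usage
**Diagnosis**
```bash
# List running OpenClaw processes
ssh lazycat "lzc-docker exec 5f3bf33e090b ps aux | grep openclaw"
# This typically shows several openclaw-tui instances:
# PID 3041 - openclaw-tui (pts/0)
# PID 6338 - openclaw-tui (pts/1)
# PID 7223 - openclaw-tui (pts/2)
# Check what is listening on the Gateway port
ssh lazycat "lzc-docker exec 5f3bf33e090b netstat -tlnp | grep 18789"
```
**Root cause**
- Every `openclaw tui` run starts a new process
- The process is not fully cleaned up when the TUI exits
- Several instances running at once compete for resources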
**Permanent fix (applied)**
**1. Create an automatic cleanup script** (inside the container):
```bash
# Create /usr/local/bin/openclaw-clean inside the container
ssh lazycat "lzc-docker exec 5f3bf33e090b bash -c \"cat > /usr/local/bin/openclaw-clean << 'EOF'
#!/bin/bash
# OpenClaw cleanup and restart script
# Kill all openclaw TUI processes that are not managed by Tower
echo '🧹 Cleaning up old OpenClaw processes...'
pkill -9 -f 'openclaw-tui' || true
pkill -9 -f 'openclaw tui' || true
# Wait for the processes to exit completely
sleep 1
# Check for remaining processes
REMAINING=\\\$(ps aux | grep -E 'openclaw' | grep -v 'openclaw-gateway' | grep -v 'tower' | grep -v 'grep' | wc -l)
if [ \\\$REMAINING -gt 0 ]; then
    echo '⚠️ Warning: '\\\$REMAINING' openclaw process(es) still running'
else
    echo '✅ Cleanup finished'
fi
# Start the OpenClaw TUI
echo ''
echo '🦞 Starting OpenClaw TUI...'
exec openclaw tui
EOF
chmod +x /usr/local/bin/openclaw-clean\""
```
**2. Update the local openclaw-tui script**
Change the last line of `~/bin/openclaw-tui`:
```bash
# Before
ssh -t lazycat "lzc-docker exec -it 5f3bf33e090b bash -c 'openclaw tui'"
# After
ssh -t lazycat "lzc-docker exec -it 5f3bf33e090b openclaw-clean"
```
**Results**
- ✅ Old processes are cleaned up automatically on every start
- ✅ A manual `pkill -9 openclaw` is no longer needed
- ✅ No more resources wasted on multiple instances
- ✅ A single command, `openclaw-tui`, handles everything
**Usage**
```bash
# Before (manual cleanup required)
ssh lazycat "lzc-docker exec 5f3bf33e090b bash -c 'pkill -9 openclaw && openclaw tui'"
# Now (automatic cleanup)
openclaw-tui # one command does it all
```
**Manual cleanup** (if needed):
```bash
# Kill all openclaw-tui processes
ssh lazycat "lzc-docker exec 5f3bf33e090b pkill -9 -f 'openclaw-tui'"
# Verify the result
ssh lazycat "lzc-docker exec 5f3bf33e090b ps aux | grep openclaw | grep -v tower | grep -v openclaw-gateway"
```
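### Container out of memory
**Symptoms**
- Container memory usage above 90%
- OOM (Out of Memory) errors
- Processes get killed
**Diagnosis**
```bash
# Check memory usage
ssh lazycat "lzc-docker stats --no-stream 5f3bf33e090b"
# Show the memory limit
ssh lazycat "lzc-docker inspect 5f3bf33e090b --format='{{.HostConfig.Memory}}'"
# Show total system memory
ssh lazycat "free -h"
```
**Fix**
```bash
# Adjust the memory limit (if the current limit is too low)
# Note: the LazyCat compute box is already set to 30 GB, so this is rarely needed
# If an adjustment really is required, use the following command
ssh lazycat "lzc-docker update 5f3bf33e090b --memory=30g --memory-swap=32g"
# Restart the container to apply the configuration
ssh lazycat "lzc-docker restart 5f3bf33e090b"
```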
### Automatic restart fails
**Symptoms**
- The systemd timer does not fire
- The restart script fails
- The log shows `lzc-docker: command not found`
**Diagnosis**
```bash
# Check the timer status
ssh lazycat "systemctl status openclaw-restart.timer"
# Check the service status
ssh lazycat "systemctl status openclaw-restart.service"
# Show the service journal
ssh lazycat "journalctl -u openclaw-restart.service -n 50"
# Show the script log
ssh lazycat "tail -50 /var/log/openclaw-restart.log"
# Run the script manually with tracing
ssh lazycat "bash -x /root/restart-openclaw.sh"
```
**Fix**
The usual cause is that the script cannot find the `lzc-docker` command (a PATH problem).
```bash
# Confirm the lzc-docker path
ssh lazycat "which lzc-docker"
# Output: /lzcsys/bin/lzc-docker
# Make sure the script uses the full path
ssh lazycat "grep 'lzc-docker' /root/restart-openclaw.sh"
# Expected: /lzcsys/bin/lzc-docker
# If the scripts call bare `lzc-docker`, rewrite it (the pattern skips occurrences that already have a path, so it is safe to rerun)
ssh lazycat "sed -i -E 's#(^|[^/])lzc-docker#\1/lzcsys/bin/lzc-docker#g' /root/restart-openclaw.sh"
ssh lazycat "sed -i -E 's#(^|[^/])lzc-docker#\1/lzcsys/bin/lzc-docker#g' /root/monitor-openclaw-zombies.sh"
# Reload the systemd configuration
ssh lazycat "systemctl daemon-reload"
# Test run
ssh lazycat "/root/restart-openclaw.sh"
```
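The failure shows up only under systemd because services run with a minimal `PATH` that, on this server, evidently does not include `/lzcsys/bin`, while an interactive SSH session does. As an alternative to hard-coding the path in every command, a script can resolve the binary once. This is a sketch; `resolve_bin` and `LZC_DOCKER` are hypothetical names.

```bash
#!/bin/bash
# resolve_bin NAME FALLBACK: print NAME's location if it is on PATH, else FALLBACK.
# (Hypothetical helper introduced for this sketch.)
resolve_bin() {
    command -v "$1" 2>/dev/null || printf '%s\n' "$2"
}

# Resolve once at the top of a script, then use "$LZC_DOCKER" everywhere below.
LZC_DOCKER=$(resolve_bin lzc-docker /lzcsys/bin/lzc-docker)
```

A later line such as `"$LZC_DOCKER" restart 5f3bf33e090b` then works both interactively and from the timer.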
---
## Best practices
### 1. Regular health checks
Run a full health check once a day:
```bash
#!/bin/bash
# OpenClaw health check script
echo "🔍 OpenClaw health check - $(date)"
echo "================================"
# Container status
echo -e "\n📦 Container status:"
ssh lazycat "lzc-docker ps --filter id=5f3bf33e090b --format 'Status: {{.Status}}'"
# PID 1 process
echo -e "\n🏗 PID 1 process:"
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -p 1 -o pid,ppid,cmd"
# Zombie process count (match only the STAT column so the `ps` header is not counted)
echo -e "\n👻 Zombie processes:"
ZOMBIE_COUNT=$(ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo stat= | grep -c '^Z'" || true)
echo "Zombie process count: $ZOMBIE_COUNT"
if [ "${ZOMBIE_COUNT:-0}" -gt 10 ]; then
    echo "⚠️ Warning: many zombie processes, consider restarting the container"
fi
# Resource usage
echo -e "\n💾 Resource usage:"
ssh lazycat "lzc-docker stats --no-stream 5f3bf33e090b"
# Gateway status
echo -e "\n🔌 Gateway status:"
ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway status | grep 'RPC probe'"
# System load
echo -e "\n📊 System load:"
ssh lazycat "uptime"
echo -e "\n================================"
echo "✅ Health check finished"
```
### 2. Log management
```bash
# Show recent error logs
ssh lazycat "lzc-docker logs 5f3bf33e090b --since 1h 2>&1 | grep -i error"
# Show Tower crash logs
ssh lazycat "lzc-docker logs 5f3bf33e090b 2>&1 | grep -i 'crashed\|failed'"
# Show the OpenClaw application log (the date is expanded on the local machine)
ssh lazycat "lzc-docker exec 5f3bf33e090b tail -100 /tmp/openclaw/openclaw-$(date +%Y-%m-%d).log"
# Delete old logs (keep the last 7 days)
ssh lazycat "lzc-docker exec 5f3bf33e090b find /tmp/openclaw -name '*.log' -mtime +7 -delete"
```
### 3. Backup and restore
```bash
# Back up the OpenClaw configuration. The archive is streamed over SSH because
# the container's /tmp is not the host's /tmp unless it is bind-mounted.
mkdir -p ~/backups
ssh lazycat "lzc-docker exec 5f3bf33e090b tar czf - -C /home/node/.openclaw ." > ~/backups/openclaw-config-backup-$(date +%Y%m%d).tar.gz
# Restore the configuration (set BACKUP to the archive to restore)
BACKUP=~/backups/openclaw-config-backup-YYYYMMDD.tar.gz
ssh lazycat "lzc-docker exec -i 5f3bf33e090b tar xzf - -C /home/node/.openclaw" < "$BACKUP"
ssh lazycat "lzc-docker restart 5f3bf33e090b"
```
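### 4. Monitoring and alerting
The following metrics are worth monitoring:
- **Zombie process count** > 50: raise an alert and restart automatically (already implemented)
- **Memory usage** > 90%: raise an alert
- **Gateway offline time** > 5 minutes: raise an alert
- **Container restarts** > 3 per day: raise an alert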
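Only the zombie threshold is described as implemented. As a sketch of how all four thresholds could be evaluated together, the function below takes already-collected integer values and prints one line per breached threshold. `check_alerts` and its argument order are hypothetical, and collecting the inputs (memory percentage, offline minutes, restart count) is left out because the guide does not specify how they are measured.

```bash
#!/bin/bash
# check_alerts ZOMBIES MEM_PERCENT GATEWAY_DOWN_MIN RESTARTS_TODAY
# Prints one line per breached threshold; returns 1 if anything was breached.
# All arguments are expected to be whole numbers.
check_alerts() {
    local zombies=$1 mem=$2 down=$3 restarts=$4 rc=0
    if [ "$zombies" -gt 50 ]; then echo "ALERT zombies=$zombies (> 50)"; rc=1; fi
    if [ "$mem" -gt 90 ]; then echo "ALERT memory=${mem}% (> 90%)"; rc=1; fi
    if [ "$down" -gt 5 ]; then echo "ALERT gateway_down=${down}min (> 5)"; rc=1; fi
    if [ "$restarts" -gt 3 ]; then echo "ALERT restarts=$restarts/day (> 3)"; rc=1; fi
    return "$rc"
}
```

For example, `check_alerts 12 95 0 1` prints only the memory alert and returns 1, which a wrapper script could use to send a notification.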
### 5. Recovery checklist after a container rebuild
If the container is recreated from the image, the following fixes have to be reapplied:
```bash
# 1. Install tini
ssh lazycat "lzc-docker exec 5f3bf33e090b bash -c 'apt-get update -qq && apt-get install -y tini'"
# 2. Patch the entrypoint
ssh lazycat "lzc-docker exec 5f3bf33e090b sed -i 's|exec /usr/local/bin/tower|exec /usr/bin/tini -- /usr/local/bin/tower|g' /usr/local/bin/clawdbot-entrypoint.sh"
# 3. Restart the container
ssh lazycat "lzc-docker restart 5f3bf33e090b"
# 4. Verify
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -p 1 -o cmd | grep tini"
```
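Because this checklist is rerun after every rebuild, it helps if the entrypoint patch reports whether it actually changed anything. The sketch below wraps the same `sed` expression in a check. `patch_entrypoint` is a hypothetical name, and the function is meant to run inside the container against `/usr/local/bin/clawdbot-entrypoint.sh`.

```bash
#!/bin/bash
# patch_entrypoint FILE: wrap the tower exec line with tini unless it is already wrapped.
patch_entrypoint() {
    local file=$1
    if grep -q 'exec /usr/bin/tini -- /usr/local/bin/tower' "$file"; then
        echo "already patched"
        return 0
    fi
    if ! grep -q 'exec /usr/local/bin/tower' "$file"; then
        echo "tower exec line not found in $file" >&2
        return 1
    fi
    sed -i 's|exec /usr/local/bin/tower|exec /usr/bin/tini -- /usr/local/bin/tower|g' "$file"
    echo "patched"
}
```

The explicit "not found" failure matters if a future image renames the entrypoint or changes how tower is launched: the checklist then stops with an error instead of silently doing nothing.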
---
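## Appendix
### Related documentation
- OpenClaw official documentation: https://docs.openclaw.ai/
- Troubleshooting guide: https://docs.openclaw.ai/troubleshooting
- Tini project: https://github.com/krallin/tini
### Contact
- LazyCat Cloud support: support@lazycat.cloud
- OpenClaw community: https://community.openclaw.ai/
### Version history
- 2026-02-16
  - Created this document, recording the Tower crash fix (using tini)
  - Added the multiple-TUI-instance issue and the openclaw-clean solution
- 2026-02-15: Implemented zombie process monitoring and automatic restart
- 2026-02-14: Raised the container resource limits to near the system maximum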
---
## Quick reference
### Common commands
```bash
# Connect to the OpenClaw TUI
openclaw-tui
# Show container status
ssh lazycat "lzc-docker ps | grep openclaw"
# Restart the container
ssh lazycat "lzc-docker restart 5f3bf33e090b"
# Show the zombie process count
ssh lazycat "lzc-docker exec 5f3bf33e090b ps -eo stat= | grep -c '^Z'"
# Check the Gateway status
ssh lazycat "lzc-docker exec 5f3bf33e090b openclaw gateway status | grep 'RPC probe'"
# Show resource usage
ssh lazycat "lzc-docker stats --no-stream 5f3bf33e090b"
# List the timers
ssh lazycat "systemctl list-timers | grep openclaw"
# Kill leftover OpenClaw TUI processes
ssh lazycat "lzc-docker exec 5f3bf33e090b pkill -9 -f 'openclaw-tui'"
# Start OpenClaw (cleans up old processes automatically; the TUI needs a TTY)
ssh -t lazycat "lzc-docker exec -it 5f3bf33e090b openclaw-clean"
# Full health check
ssh lazycat "echo '=== Container ===' && lzc-docker ps | grep openclaw && echo '' && echo '=== Zombie processes ===' && (lzc-docker exec 5f3bf33e090b ps -eo stat= | grep -c '^Z' || true) && echo '' && echo '=== Gateway ===' && lzc-docker exec 5f3bf33e090b openclaw gateway status | grep 'RPC probe'"
```
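### Troubleshooting cheat sheet
| Problem | Quick fix |
|------|----------|
| Tower crashes repeatedly | See the "Tower crashes repeatedly" section and install tini |
| Multiple TUI instances | Use `openclaw-tui` (automatic cleanup) or run `pkill -9 -f openclaw-tui` manually |
| Gateway unresponsive | `ssh lazycat "lzc-docker restart 5f3bf33e090b"` |
| Too many zombie processes | `ssh lazycat "lzc-docker restart 5f3bf33e090b"` |
| Out of memory | Check the resource limits, then restart the container |
| Automatic restart fails | Check that the scripts use the full path `/lzcsys/bin/lzc-docker` |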