move claude-marketplace to ai-proj-helper

This commit is contained in:
2026-03-12 21:42:30 +08:00
parent d7b6835e1d
commit 43585b8504
188 changed files with 39510 additions and 0 deletions

View File

@@ -0,0 +1,948 @@
---
name: finance
description: 财务对账技能。支持银行对账、应收对账、应付对账、内部往来对账等。通过自然语言实现账单导入、差异分析、对账报告生成。当用户提到对账、银行流水、应收应付、账单核对、财务核销相关任务时自动激活。
---
# 财务对账技能
## 功能概述
- **银行对账**: 银行流水 vs 系统收付款记录匹配
- **应收对账**: 客户对账单生成与核对
- **应付对账**: 供应商对账单核对
- **内部对账**: 公司间/部门间往来核对
- **差异分析**: 自动识别未达账项、差异原因
- **对账报告**: 生成标准化对账报告
- **扫描单据OCR**: PDF/图片扫描件识别,提取订单号与对账单匹配(适用于诉讼证据核对)
---
## 对账类型
### 1. 银行对账
| 对账项 | 说明 |
|--------|------|
| 银行已收/企业未收 | 银行已入账,系统未记录 |
| 企业已收/银行未收 | 系统已记录,银行未入账 |
| 银行已付/企业未付 | 银行已扣款,系统未记录 |
| 企业已付/银行未付 | 系统已记录,银行未扣款 |
### 2. 应收对账
- 客户欠款余额核对
- 发票与回款匹配
- 账龄分析
### 3. 应付对账
- 供应商应付余额核对
- 采购入库与付款匹配
- 预付款冲销
---
## 数据格式要求
### 银行流水格式
```csv
交易日期,交易类型,摘要,金额,余额,对方账户,对方名称,交易流水号
2026-01-15,收入,货款,50000.00,150000.00,622848******1234,深圳XX公司,TXN20260115001
2026-01-16,支出,采购款,-30000.00,120000.00,622848******5678,广州YY公司,TXN20260116001
```
### 系统收付款格式
```csv
单据日期,单据类型,单据编号,金额,客户/供应商,备注,关联流水号
2026-01-15,收款单,SK20260115001,50000.00,深圳XX公司,货款,TXN20260115001
2026-01-16,付款单,FK20260116001,30000.00,广州YY公司,采购款,
```
---
## 自然语言操作示例
### 银行对账
| 用户说 | 执行操作 |
|--------|----------|
| "导入银行流水" | 解析银行导出的 Excel/CSV |
| "和系统记录对账" | 自动匹配流水号/金额/日期 |
| "找出未达账项" | 识别双方差异记录 |
| "生成银行余额调节表" | 输出标准调节表 |
### 应收对账
| 用户说 | 执行操作 |
|--------|----------|
| "生成客户对账单" | 按客户汇总应收明细 |
| "核对XX公司的应收款" | 单客户对账 |
| "应收账龄分析" | 按账龄分段统计 |
| "哪些发票还没收到款" | 查询未核销发票 |
### 应付对账
| 用户说 | 执行操作 |
|--------|----------|
| "核对供应商对账单" | 导入并匹配供应商账单 |
| "本月应付汇总" | 按供应商统计应付 |
| "哪些采购单还没付款" | 查询未付款采购单 |
### 扫描单据OCR对账
| 用户说 | 执行操作 |
|--------|----------|
| "识别扫描件与对账单匹配" | OCR 提取 PDF 订单号并与 Excel 对账 |
| "承运商起诉了,核对证据" | 诉讼证据对账分析 |
| "PDF发票识别" | OCR 提取发票信息 |
| "找出对账单没有的订单" | 差异订单识别 |
---
## Python 代码模板
### 银行对账核心逻辑
```python
import pandas as pd
from datetime import datetime
def load_bank_statement(file_path: str) -> pd.DataFrame:
"""加载银行流水"""
df = pd.read_excel(file_path) if file_path.endswith('.xlsx') else pd.read_csv(file_path)
# 标准化列名
column_mapping = {
'交易日期': 'date',
'金额': 'amount',
'交易流水号': 'ref_no',
'对方名称': 'counterparty',
'摘要': 'memo'
}
df = df.rename(columns=column_mapping)
df['date'] = pd.to_datetime(df['date'])
df['source'] = 'bank'
return df
def load_system_records(file_path: str) -> pd.DataFrame:
"""加载系统收付款记录"""
df = pd.read_excel(file_path) if file_path.endswith('.xlsx') else pd.read_csv(file_path)
column_mapping = {
'单据日期': 'date',
'金额': 'amount',
'关联流水号': 'ref_no',
'客户/供应商': 'counterparty',
'单据编号': 'doc_no'
}
df = df.rename(columns=column_mapping)
df['date'] = pd.to_datetime(df['date'])
df['source'] = 'system'
return df
def reconcile(bank_df: pd.DataFrame, system_df: pd.DataFrame) -> dict:
"""
执行对账
Returns:
{
'matched': 已匹配记录,
'bank_only': 银行有/系统无 (银行未达),
'system_only': 系统有/银行无 (企业未达),
'summary': 对账汇总
}
"""
# 方法1: 按流水号精确匹配
bank_with_ref = bank_df[bank_df['ref_no'].notna() & (bank_df['ref_no'] != '')]
system_with_ref = system_df[system_df['ref_no'].notna() & (system_df['ref_no'] != '')]
matched_by_ref = pd.merge(
bank_with_ref,
system_with_ref,
on='ref_no',
suffixes=('_bank', '_system'),
how='inner'
)
matched_refs = set(matched_by_ref['ref_no'])
# 方法2: 无流水号的按金额+日期模糊匹配
bank_unmatched = bank_df[~bank_df['ref_no'].isin(matched_refs)]
system_unmatched = system_df[~system_df['ref_no'].isin(matched_refs)]
# 按金额和日期(±3天)匹配
fuzzy_matched = []
bank_remaining = bank_unmatched.copy()
system_remaining = system_unmatched.copy()
for idx, bank_row in bank_unmatched.iterrows():
for sys_idx, sys_row in system_remaining.iterrows():
if (abs(bank_row['amount'] - sys_row['amount']) < 0.01 and
abs((bank_row['date'] - sys_row['date']).days) <= 3):
fuzzy_matched.append({
'bank_idx': idx,
'system_idx': sys_idx,
'amount': bank_row['amount'],
'bank_date': bank_row['date'],
'system_date': sys_row['date']
})
system_remaining = system_remaining.drop(sys_idx)
bank_remaining = bank_remaining.drop(idx)
break
# 未匹配项
bank_only = bank_remaining # 银行有/系统无
system_only = system_remaining # 系统有/银行无
# 汇总
summary = {
'bank_total': bank_df['amount'].sum(),
'system_total': system_df['amount'].sum(),
'matched_count': len(matched_by_ref) + len(fuzzy_matched),
'bank_only_count': len(bank_only),
'bank_only_amount': bank_only['amount'].sum() if len(bank_only) > 0 else 0,
'system_only_count': len(system_only),
'system_only_amount': system_only['amount'].sum() if len(system_only) > 0 else 0,
'difference': bank_df['amount'].sum() - system_df['amount'].sum()
}
return {
'matched': pd.concat([matched_by_ref, pd.DataFrame(fuzzy_matched)]),
'bank_only': bank_only,
'system_only': system_only,
'summary': summary
}
```
### 银行余额调节表
```python
def generate_bank_reconciliation_report(
bank_balance: float,
book_balance: float,
bank_only: pd.DataFrame,
system_only: pd.DataFrame,
as_of_date: str
) -> str:
"""生成银行余额调节表"""
# 分类未达账项
bank_receipts_not_in_book = bank_only[bank_only['amount'] > 0]['amount'].sum()
bank_payments_not_in_book = bank_only[bank_only['amount'] < 0]['amount'].sum()
book_receipts_not_in_bank = system_only[system_only['amount'] > 0]['amount'].sum()
book_payments_not_in_bank = system_only[system_only['amount'] < 0]['amount'].sum()
report = f"""
================================================================================
银行存款余额调节表
截止日期: {as_of_date}
================================================================================
一、银行对账单余额 {bank_balance:>15,.2f}
加: 企业已收/银行未收 {book_receipts_not_in_bank:>15,.2f}
减: 企业已付/银行未付 {abs(book_payments_not_in_bank):>15,.2f}
─────────────────
调节后余额 {bank_balance + book_receipts_not_in_bank + book_payments_not_in_bank:>15,.2f}
--------------------------------------------------------------------------------
二、企业账面余额 {book_balance:>15,.2f}
加: 银行已收/企业未收 {bank_receipts_not_in_book:>15,.2f}
减: 银行已付/企业未付 {abs(bank_payments_not_in_book):>15,.2f}
─────────────────
调节后余额 {book_balance + bank_receipts_not_in_book + bank_payments_not_in_book:>15,.2f}
================================================================================
未达账项明细
================================================================================
【银行已收/企业未收】
"""
if len(bank_only[bank_only['amount'] > 0]) > 0:
for _, row in bank_only[bank_only['amount'] > 0].iterrows():
report += f" {row['date'].strftime('%Y-%m-%d')} {row.get('counterparty', ''):<20} {row['amount']:>12,.2f}\n"
else:
report += " (无)\n"
report += "\n【银行已付/企业未付】\n"
if len(bank_only[bank_only['amount'] < 0]) > 0:
for _, row in bank_only[bank_only['amount'] < 0].iterrows():
report += f" {row['date'].strftime('%Y-%m-%d')} {row.get('counterparty', ''):<20} {row['amount']:>12,.2f}\n"
else:
report += " (无)\n"
report += "\n【企业已收/银行未收】\n"
if len(system_only[system_only['amount'] > 0]) > 0:
for _, row in system_only[system_only['amount'] > 0].iterrows():
report += f" {row['date'].strftime('%Y-%m-%d')} {row.get('counterparty', ''):<20} {row['amount']:>12,.2f}\n"
else:
report += " (无)\n"
report += "\n【企业已付/银行未付】\n"
if len(system_only[system_only['amount'] < 0]) > 0:
for _, row in system_only[system_only['amount'] < 0].iterrows():
report += f" {row['date'].strftime('%Y-%m-%d')} {row.get('counterparty', ''):<20} {row['amount']:>12,.2f}\n"
else:
report += " (无)\n"
return report
```
### 应收对账
```python
def generate_ar_statement(
customer_name: str,
transactions: pd.DataFrame,
as_of_date: str
) -> str:
"""
生成客户对账单
transactions 格式:
- date: 日期
- doc_type: 单据类型 (发票/收款/退货)
- doc_no: 单据编号
- debit: 借方(应收增加)
- credit: 贷方(应收减少)
- memo: 摘要
"""
df = transactions.sort_values('date')
report = f"""
================================================================================
客户对账单
================================================================================
客户名称: {customer_name}
对账期间: {df['date'].min().strftime('%Y-%m-%d')}{as_of_date}
================================================================================
日期 单据类型 单据编号 借方 贷方 余额
--------------------------------------------------------------------------------
"""
balance = 0
for _, row in df.iterrows():
debit = row.get('debit', 0) or 0
credit = row.get('credit', 0) or 0
balance += debit - credit
report += f"{row['date'].strftime('%Y-%m-%d')} {row['doc_type']:<8} {row['doc_no']:<16} {debit:>12,.2f} {credit:>12,.2f} {balance:>12,.2f}\n"
report += f"""
--------------------------------------------------------------------------------
合计 {df['debit'].sum():>12,.2f} {df['credit'].sum():>12,.2f} {balance:>12,.2f}
================================================================================
期末应收余额: {balance:,.2f}
请贵司核对以上账目,如有异议请于收到对账单后 7 日内书面告知。
如无异议,视为确认。
对账联系人: _______________
联系电话: _______________
"""
return report
```
### 账龄分析
```python
from datetime import datetime, timedelta
def aging_analysis(
receivables: pd.DataFrame,
as_of_date: str,
aging_buckets: list = [30, 60, 90, 180, 365]
) -> pd.DataFrame:
"""
应收账龄分析
receivables 格式:
- customer: 客户
- invoice_date: 发票日期
- amount: 未收金额
"""
as_of = datetime.strptime(as_of_date, '%Y-%m-%d')
def get_aging_bucket(invoice_date):
days = (as_of - invoice_date).days
for i, bucket in enumerate(aging_buckets):
if days <= bucket:
if i == 0:
return f'0-{bucket}'
else:
return f'{aging_buckets[i-1]+1}-{bucket}'
return f'{aging_buckets[-1]+1}天以上'
df = receivables.copy()
df['invoice_date'] = pd.to_datetime(df['invoice_date'])
df['aging_days'] = (as_of - df['invoice_date']).dt.days
df['aging_bucket'] = df['invoice_date'].apply(get_aging_bucket)
# 按客户和账龄分组汇总
summary = df.pivot_table(
index='customer',
columns='aging_bucket',
values='amount',
aggfunc='sum',
fill_value=0
)
# 添加合计列
summary['合计'] = summary.sum(axis=1)
# 添加合计行
summary.loc['合计'] = summary.sum()
return summary
```
---
## 对账报告模板
### Excel 对账报告
```python
from openpyxl import Workbook
from openpyxl.styles import Font, Alignment, Border, Side, PatternFill
def export_reconciliation_to_excel(
result: dict,
output_path: str,
report_date: str
):
"""导出对账结果到 Excel"""
wb = Workbook()
# Sheet1: 对账汇总
ws_summary = wb.active
ws_summary.title = "对账汇总"
summary = result['summary']
ws_summary['A1'] = '对账汇总报告'
ws_summary['A1'].font = Font(bold=True, size=14)
ws_summary['A2'] = f'对账日期: {report_date}'
headers = ['项目', '金额', '笔数']
for col, header in enumerate(headers, 1):
ws_summary.cell(row=4, column=col, value=header)
data = [
('银行流水合计', summary['bank_total'], '-'),
('系统记录合计', summary['system_total'], '-'),
('已匹配', '-', summary['matched_count']),
('银行有/系统无', summary['bank_only_amount'], summary['bank_only_count']),
('系统有/银行无', summary['system_only_amount'], summary['system_only_count']),
('差异金额', summary['difference'], '-'),
]
for row, (item, amount, count) in enumerate(data, 5):
ws_summary.cell(row=row, column=1, value=item)
ws_summary.cell(row=row, column=2, value=amount)
ws_summary.cell(row=row, column=3, value=count)
# Sheet2: 银行未达账项
ws_bank = wb.create_sheet("银行未达账项")
if len(result['bank_only']) > 0:
for col, header in enumerate(result['bank_only'].columns, 1):
ws_bank.cell(row=1, column=col, value=header)
for row, data_row in enumerate(result['bank_only'].values, 2):
for col, value in enumerate(data_row, 1):
ws_bank.cell(row=row, column=col, value=value)
# Sheet3: 企业未达账项
ws_system = wb.create_sheet("企业未达账项")
if len(result['system_only']) > 0:
for col, header in enumerate(result['system_only'].columns, 1):
ws_system.cell(row=1, column=col, value=header)
for row, data_row in enumerate(result['system_only'].values, 2):
for col, value in enumerate(data_row, 1):
ws_system.cell(row=row, column=col, value=value)
wb.save(output_path)
print(f"对账报告已保存: {output_path}")
```
---
## 常见对账场景
### 场景1: 月末银行对账
```python
# 1. 加载数据
bank_df = load_bank_statement('银行流水_202601.xlsx')
system_df = load_system_records('收付款记录_202601.xlsx')
# 2. 执行对账
result = reconcile(bank_df, system_df)
# 3. 查看汇总
print(f"已匹配: {result['summary']['matched_count']}")
print(f"银行未达: {result['summary']['bank_only_count']} 笔, 金额 {result['summary']['bank_only_amount']:,.2f}")
print(f"企业未达: {result['summary']['system_only_count']} 笔, 金额 {result['summary']['system_only_amount']:,.2f}")
# 4. 生成调节表
report = generate_bank_reconciliation_report(
bank_balance=150000.00, # 银行对账单余额
book_balance=145000.00, # 企业账面余额
bank_only=result['bank_only'],
system_only=result['system_only'],
as_of_date='2026-01-31'
)
print(report)
# 5. 导出 Excel
export_reconciliation_to_excel(result, '银行对账_202601.xlsx', '2026-01-31')
```
### 场景2: 客户对账
```python
# 加载客户交易明细
ar_transactions = pd.read_excel('应收明细.xlsx')
# 筛选特定客户
customer_data = ar_transactions[ar_transactions['customer'] == '深圳XX公司']
# 生成对账单
statement = generate_ar_statement(
customer_name='深圳XX公司',
transactions=customer_data,
as_of_date='2026-01-31'
)
print(statement)
```
### 场景3: 账龄分析
```python
# 加载未收款数据
receivables = pd.read_excel('应收账款.xlsx')
# 账龄分析
aging = aging_analysis(
receivables=receivables,
as_of_date='2026-01-31',
aging_buckets=[30, 60, 90, 180, 365]
)
print(aging)
# 导出
aging.to_excel('账龄分析_202601.xlsx')
```
### 场景4: 扫描单据OCR识别与对账匹配诉讼证据核对
适用于:承运商/供应商提供扫描件作为诉讼证据,需要与我司对账单进行匹配核对。
```python
from pdf2image import convert_from_path
import pytesseract
import pandas as pd
import re
def ocr_pdf_extract_orders(pdf_path: str, dpi: int = 150) -> pd.DataFrame:
"""
从 PDF 扫描件中 OCR 提取订单信息
Args:
pdf_path: PDF 文件路径
dpi: 图像分辨率,越高识别越准但速度越慢
Returns:
DataFrame 包含 order_no, page, tc_no 等字段
"""
print(f"正在转换 PDF...")
images = convert_from_path(pdf_path, dpi=dpi)
print(f"{len(images)} 页,开始 OCR 识别...")
all_records = []
for i, img in enumerate(images, 1):
if i % 10 == 0:
print(f" 处理进度: {i}/{len(images)}")
text = pytesseract.image_to_string(img, lang='chi_sim+eng')
# 提取多种订单号格式(根据实际业务调整)
hm_orders = re.findall(r'(HM\d{12,15})(?:-\d+)?', text) # 华住订单
hpo_orders = re.findall(r'(HPO-\d{8}-\d+)', text) # 华住采购订单
fy_orders = re.findall(r'(FY\d{9,})', text) # 凤悦订单
phgco_orders = re.findall(r'(PHGCO\d{9,})', text) # 锦江订单
tc_orders = re.findall(r'(TC\d{8,})', text) # 运单号
# 提取日期
dates = re.findall(r'(\d{4}-\d{2}-\d{2})', text)
all_orders = hm_orders + hpo_orders + fy_orders + phgco_orders
for order in all_orders:
all_records.append({
'order_no': order,
'page': i,
'tc_no': tc_orders[0] if tc_orders else None,
'date': dates[0] if dates else None,
'raw_text': text[:500] # 保留部分原文用于核查
})
df = pd.DataFrame(all_records)
df = df.drop_duplicates(subset=['order_no'])
print(f"提取完成,共 {len(df)} 条唯一订单")
return df
def extract_base_order_no(order_no: str) -> str:
"""
提取基础订单号(移除后缀如 -1, -2
例如: HM202505227320-1 -> HM202505227320
"""
if pd.isna(order_no):
return None
order_str = str(order_no)
match = re.match(r'(HM\d{12,15}|HPO-\d{8}-\d+|FY\d{9,}|PHGCO\d{9,})', order_str)
return match.group(1) if match else order_str
def reconcile_scanned_with_excel(
scanned_df: pd.DataFrame,
excel_path: str,
main_sheet: str,
pending_sheet: str = None,
order_column: str = '华住/锦江订单号'
) -> dict:
"""
扫描单据与 Excel 对账单匹配
Args:
scanned_df: OCR 提取的订单 DataFrame
excel_path: Excel 对账单路径
main_sheet: 主数据 Sheet 名称
pending_sheet: 待结算/次月结 Sheet 名称(可选)
order_column: 订单号列名
Returns:
{
'matched_main': 在主表中匹配的订单,
'matched_pending': 在待结算表中匹配的订单,
'not_found': 未找到的订单,
'summary': 汇总统计
}
"""
# 加载主表
df_main = pd.read_excel(excel_path, sheet_name=main_sheet)
# 处理表头(如果第一行是表头)
if df_main.iloc[0].astype(str).str.contains(order_column).any():
df_main.columns = df_main.iloc[0]
df_main = df_main.iloc[1:].reset_index(drop=True)
# 提取基础订单号
main_orders = set(df_main[order_column].apply(extract_base_order_no).dropna())
# 加载待结算表(如有)
pending_orders = set()
df_pending = None
if pending_sheet:
try:
df_pending = pd.read_excel(excel_path, sheet_name=pending_sheet)
pending_orders = set(df_pending[order_column].apply(extract_base_order_no).dropna())
except:
pass
all_excel_orders = main_orders | pending_orders
# 扫描件订单号
scan_orders = set(scanned_df['order_no'].apply(extract_base_order_no).dropna())
# 匹配分析
matched_main = scan_orders & main_orders
matched_pending = scan_orders & pending_orders
not_found = scan_orders - all_excel_orders
# 汇总
summary = {
'scan_total': len(scan_orders),
'matched_main_count': len(matched_main),
'matched_pending_count': len(matched_pending),
'not_found_count': len(not_found),
'match_rate': (len(matched_main) + len(matched_pending)) / len(scan_orders) * 100 if scan_orders else 0
}
return {
'matched_main': matched_main,
'matched_pending': matched_pending,
'not_found': not_found,
'summary': summary,
'df_main': df_main,
'df_pending': df_pending
}
def generate_litigation_reconciliation_report(
result: dict,
scanned_df: pd.DataFrame,
output_path: str
):
"""
生成诉讼对账分析报告
"""
summary = result['summary']
with pd.ExcelWriter(output_path, engine='openpyxl') as writer:
# Sheet1: 汇总
summary_df = pd.DataFrame({
'项目': [
'扫描件订单数',
'在主对账单中找到',
'在待结算表中找到',
'完全未找到',
'匹配率'
],
'数值': [
summary['scan_total'],
summary['matched_main_count'],
summary['matched_pending_count'],
summary['not_found_count'],
f"{summary['match_rate']:.1f}%"
]
})
summary_df.to_excel(writer, sheet_name='对账汇总', index=False)
# Sheet2: 未找到的订单
if result['not_found']:
not_found_df = scanned_df[
scanned_df['order_no'].apply(extract_base_order_no).isin(result['not_found'])
][['order_no', 'page', 'tc_no', 'date']]
not_found_df.columns = ['订单号', 'PDF页码', '运单号', '日期']
not_found_df.to_excel(writer, sheet_name='未找到订单', index=False)
# Sheet3: 待结算订单详情
if result['matched_pending'] and result['df_pending'] is not None:
pending_detail = result['df_pending'][
result['df_pending']['华住/锦江订单号'].apply(extract_base_order_no).isin(result['matched_pending'])
]
pending_detail.to_excel(writer, sheet_name='待结算订单', index=False)
print(f"报告已保存: {output_path}")
```
#### 使用示例
```python
# 1. 从 PDF 扫描件提取订单
pdf_path = "承运商证据_6月业务.pdf"
scanned_df = ocr_pdf_extract_orders(pdf_path, dpi=150)
# 2. 与 Excel 对账单匹配
result = reconcile_scanned_with_excel(
scanned_df=scanned_df,
excel_path="对账单_2025年6月.xlsx",
main_sheet="2025.6",
pending_sheet="次月结",
order_column="华住/锦江订单号"
)
# 3. 查看匹配结果
print(f"匹配率: {result['summary']['match_rate']:.1f}%")
print(f"未找到: {result['summary']['not_found_count']}")
if result['not_found']:
print("\n未找到的订单:")
for order in result['not_found']:
print(f" {order}")
# 4. 生成报告
generate_litigation_reconciliation_report(
result=result,
scanned_df=scanned_df,
output_path="诉讼对账分析报告.xlsx"
)
```
#### OCR 识别注意事项
1. **环境依赖**:
```bash
# macOS
brew install tesseract tesseract-lang poppler
pip install pdf2image pytesseract pandas openpyxl
```
2. **识别精度**:
- DPI 150 适合快速扫描DPI 200-250 适合精确识别
- 中文识别需要 `chi_sim` 语言包
3. **订单号格式差异**:
- PDF 中可能是 `HM202505227320`
- Excel 中可能是 `HM202505227320-1`、`HM202505227320-2`(带后缀)
- 匹配时应使用基础订单号
4. **常见问题**:
- 空白页(签收单背面)会被自动跳过
- OCR 可能误识别字符(如 5→S, 0→O建议人工抽查
- 大 PDF 处理较慢,可先用低 DPI 预览
5. **"次月结"/"待结算" 订单**:
- 这类订单通常标记为"没有报价"、"未建单"等
- 虽然在系统中有记录,但可能未计入正式结算金额
- 诉讼中需特别关注这类订单的状态
6. **多承运商订单归属校验(重要风险点)**:
- 承运商可能将其他公司的订单混入证据,企图多收运费
- 必须检查PDF订单是否出现在其他承运商的对账单中
- 如果订单同时出现在多个承运商对账单中,则为异常订单
#### 多承运商订单归属校验
```python
def check_order_ownership(
scanned_df: pd.DataFrame,
my_company_excel: str,
other_carrier_excels: list[str],
order_column: str = '华住/锦江订单号'
) -> dict:
"""
检查订单归属,识别是否有其他承运商的订单被混入
Args:
scanned_df: OCR 提取的订单 DataFrame
my_company_excel: 我司对账单路径
other_carrier_excels: 其他承运商对账单路径列表
order_column: 订单号列名
Returns:
{
'my_orders': 属于我司的订单,
'other_carrier_orders': 属于其他承运商的订单(异常),
'disputed_orders': 同时出现在多处的争议订单,
'summary': 汇总
}
"""
# 提取PDF订单
scan_orders = set(scanned_df['order_no'].apply(extract_base_order_no).dropna())
# 加载我司对账单订单
my_orders = set()
try:
xl = pd.ExcelFile(my_company_excel)
for sheet in xl.sheet_names:
df = pd.read_excel(my_company_excel, sheet_name=sheet)
for col in df.columns:
if '订单' in str(col):
orders = df[col].apply(extract_base_order_no).dropna()
my_orders.update(orders)
except Exception as e:
print(f"加载我司对账单失败: {e}")
# 加载其他承运商对账单订单
other_orders = {} # {carrier_name: set of orders}
for excel_path in other_carrier_excels:
carrier_name = excel_path.split('/')[-1]
carrier_orders = set()
try:
xl = pd.ExcelFile(excel_path)
for sheet in xl.sheet_names:
df = pd.read_excel(excel_path, sheet_name=sheet)
for col in df.columns:
if '订单' in str(col):
orders = df[col].apply(extract_base_order_no).dropna()
carrier_orders.update(orders)
other_orders[carrier_name] = carrier_orders
except:
pass
# 分析订单归属
all_other_orders = set()
for orders in other_orders.values():
all_other_orders.update(orders)
# 分类
only_mine = scan_orders & my_orders - all_other_orders # 仅在我司
only_others = scan_orders & all_other_orders - my_orders # 仅在其他承运商(异常!)
in_both = scan_orders & my_orders & all_other_orders # 同时出现(争议)
nowhere = scan_orders - my_orders - all_other_orders # 哪里都没有
# 识别异常订单属于哪个承运商
abnormal_details = []
for order in only_others:
for carrier, orders in other_orders.items():
if order in orders:
page = scanned_df[scanned_df['order_no'].apply(extract_base_order_no) == order]['page'].values
abnormal_details.append({
'order_no': order,
'belongs_to': carrier,
'pdf_page': page[0] if len(page) > 0 else None
})
summary = {
'scan_total': len(scan_orders),
'my_orders_count': len(only_mine),
'other_carrier_count': len(only_others), # 异常订单数
'disputed_count': len(in_both),
'not_found_count': len(nowhere)
}
return {
'my_orders': only_mine,
'other_carrier_orders': only_others,
'abnormal_details': abnormal_details,
'disputed_orders': in_both,
'not_found': nowhere,
'summary': summary
}
```
#### 使用示例(多承运商校验)
```python
# 检查订单归属
result = check_order_ownership(
scanned_df=scanned_df,
my_company_excel="智慧云彩对账单_2025年6月.xlsx",
other_carrier_excels=[
"友利速运对账单_2025年6月.xlsx",
"其他承运商对账单.xlsx"
]
)
# 检查是否有异常订单
if result['summary']['other_carrier_count'] > 0:
print(f"⚠️ 发现 {result['summary']['other_carrier_count']} 个异常订单!")
print("这些订单属于其他承运商,不应由我司支付:")
for item in result['abnormal_details']:
print(f" {item['order_no']} -> 属于 {item['belongs_to']} (PDF第{item['pdf_page']}页)")
else:
print("✓ 未发现异常订单,所有订单归属正确")
```
---
## 注意事项
1. **数据格式统一**: 确保日期、金额格式一致后再对账
2. **匹配规则灵活**: 先精确匹配流水号,再模糊匹配金额+日期
3. **差异原因分析**: 对未达账项要逐笔核实原因
4. **定期对账**: 建议至少每月进行一次银行对账
5. **凭证留存**: 对账报告需打印签字存档
6. **OCR 核对**: 扫描件 OCR 后应人工抽查,确保识别准确
7. **多承运商校验**: 诉讼场景下必须检查订单是否属于其他承运商,防止对方混入他人订单骗取运费
---
## 与其他技能协作
- **data-excel**: 处理复杂的 Excel 数据转换
- **feishu**: 将对账报告保存到飞书云文档
- **siyuan**: 记录对账工作日志到思源笔记