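Three subsystems: ARENA, LANTERN, and MOCHA
Dataset construction: built around the functionality of each subsystem
Correctness validation set: contains more than ten categories of questions, mainly explanations of database terminology, SQL generation, introductions to DBEDU, consecutive (multi-turn) DBEDU queries, and single DBEDU queries. Part of the test set is shown below: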
Q1: Who are you?
Q2: Generate a SQL to retrieve items with an extended price greater than 100
Q3: How will select
l_orderkey,
sum(l_extendedprice * (1 - l_discount)) as revenue,
o_orderdate,
o_shippriority
from
customer,
orders,
lineitem
where
c_mktsegment = 'BUILDING'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_totalprice > 10
and l_extendedprice > 10
group by
l_orderkey,
o_orderdate,
o_shippriority
order by
revenue desc,
o_orderdate execute?
Q4: What is hash join?
Q5: How will the SQL execute without hash join and index scan?
Q6: Can you give me some other plans?
Q7: What is relational processor?
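Prompt generator: rule-based generation. It takes the user input, generates the corresponding prompt to guide GPT, and supplies the necessary information.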
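A minimal sketch of what rule-based prompt generation could look like, not the actual implementation. The function name `build_prompt`, the `SELECT ... FROM` regex rule, and the prompt wording are all assumptions made for illustration; only the subsystem names come from these notes. The instruction asking GPT to decide the question type reflects the note that GPT ended up doing that classification.

```python
import re

# Hypothetical rule: treat the input as containing SQL if it has SELECT ... FROM.
SQL_PATTERN = re.compile(r"\bselect\b.*\bfrom\b", re.IGNORECASE | re.DOTALL)

SUBSYSTEMS = ("ARENA", "LANTERN", "MOCHA")  # names taken from the notes


def build_prompt(user_input: str) -> str:
    """Assemble a prompt from fixed rules; the wording is illustrative only."""
    parts = [
        "You are the assistant for DBEDU.",
        "Available subsystems: " + ", ".join(SUBSYSTEMS) + ".",
    ]
    # Rule: flag embedded SQL so the model knows a query is part of the question.
    if SQL_PATTERN.search(user_input):
        parts.append("The user input contains a SQL statement.")
    parts.append("Decide what type of question this is, then answer it.")
    parts.append("User input: " + user_input.strip())
    return "\n".join(parts)
```

For example, `build_prompt("What is hash join?")` produces a prompt without the SQL note, while an input like Q3 (which embeds a full `select ... from ...` query) gets the extra line.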
Datalink processor: provides the connection between the LLM and the subsystems.
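One way to picture the LLM-to-subsystem connection is a small dispatch table, sketched below. The class `DatalinkProcessor`, its `register`/`dispatch` methods, and the placeholder handlers are hypothetical; the notes do not describe the real subsystem interfaces, so the handlers here only echo what they receive.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


class DatalinkProcessor:
    """Routes a request named by the LLM to a registered subsystem handler."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        # Store names in upper case so lookups are case-insensitive.
        self._handlers[name.upper()] = handler

    def dispatch(self, name: str, payload: str) -> str:
        handler = self._handlers.get(name.upper())
        if handler is None:
            raise KeyError(f"unknown subsystem: {name}")
        return handler(payload)


# Placeholder handlers (assumption): each one just echoes the payload it gets.
link = DatalinkProcessor()
for subsystem in ("ARENA", "LANTERN", "MOCHA"):
    link.register(subsystem, lambda payload, s=subsystem: f"[{s}] received: {payload}")
```

Keeping the routing in one table means the LLM only has to name a subsystem and pass a payload, and an unknown name fails loudly instead of being silently ignored.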
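Discriminator question: I originally assumed we would need one, but it turned out that GPT ended up judging the question type itself 💔. GPT is probably just a bit smarter.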
Overall framework diagram:
