LLM Data Cleaning · Paper Notes (4)

AnnoLLM - Making Large Language Models to Be Better Crowdsourced Annotators

https://arxiv.org/html/2303.16854

Preview

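  • Two-stage explain-then-annotate method
    • The LLM generates explanations for a small set of human-annotated examples
    • A chain-of-thought + few-shot prompt is built automatically from them
    • The LLM then annotates the remaining data automatically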
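
The two stages above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the paper's code: `llm` is a hypothetical callable that maps a prompt string to a completion, and `explain_prompt` / `annotate_prompt` are hypothetical prompt builders supplied by the caller.

```python
from typing import Callable, Dict, List

# Hypothetical record shape: {"query": ..., "keyword": ..., "label": ...}
Example = Dict[str, str]


def explain_then_annotate(
    demos: List[Example],
    unlabeled: List[Example],
    llm: Callable[[str], str],
    explain_prompt: Callable[[Example], str],
    annotate_prompt: Callable[[List[Example], Example], str],
) -> List[str]:
    """Run both stages and return one raw model response per unlabeled item."""
    # Stage 1: ask the model why each human-assigned label is correct.
    explained = [dict(d, explanation=llm(explain_prompt(d))) for d in demos]
    # Stage 2: reuse the explained demos as few-shot chain-of-thought examples.
    return [llm(annotate_prompt(explained, x)) for x in unlabeled]
```

Passing the model and the prompt builders in as arguments keeps the sketch independent of any particular API client.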

Approach

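Training human annotators shows that they need guidance and worked examples before their annotations become consistent.

The same holds for model annotation.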

Explanation

GPT-3.5 is used to generate the explanations:

Directions:

Given a search engine query: "google data studio sharepoint", first, consider what the user could have in mind when they type in the query and allow for misspellings or other ambiguity, then classify the relevance of keyword: "sharepoint migration tool file share" to the query into one of the following categories: "Not bad", "Bad".

Definitions of the categories:

- **"Not bad"** : The keyword is relevant to the users search query. This can include: broader or narrower product selection, competitor or alternative products, accessories, products often purchased together, and related topics as well as direct matches to the users search.
- **"Bad"**: The keyword is not relevant to the users search query. There is no relationship between the query and keyword. This includes but is not limited to: incorrect/unrelated product, unrelated topic, wrong location when location is important, cannot be used in place of query product nor are they commonly purchased together, etc.

Briefly explain why the relevance is "Bad", with a response length not exceeding 100 words.
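
The prompt above can be turned into a template that is filled once per human-labeled example. The following is a minimal sketch: the function name, the `definitions` parameter (the category definitions passed in as one string), and the `max_words` parameter are assumptions made for illustration.

```python
EXPLAIN_TEMPLATE = (
    "Directions:\n\n"
    'Given a search engine query: "{query}", first, consider what the user '
    "could have in mind when they type in the query and allow for "
    "misspellings or other ambiguity, then classify the relevance of "
    'keyword: "{keyword}" to the query into one of the following '
    'categories: "Not bad", "Bad".\n\n'
    "Definitions of the categories:\n\n{definitions}\n\n"
    'Briefly explain why the relevance is "{label}", with a response '
    "length not exceeding {max_words} words."
)

VALID_LABELS = ("Not bad", "Bad")


def build_explanation_prompt(query, keyword, label, definitions, max_words=100):
    """Fill the explanation prompt for one human-labeled (query, keyword) pair."""
    if label not in VALID_LABELS:
        raise ValueError(f"label must be one of {VALID_LABELS}, got {label!r}")
    return EXPLAIN_TEMPLATE.format(
        query=query,
        keyword=keyword,
        label=label,
        definitions=definitions,
        max_words=max_words,
    )
```

Because the human label is inserted into the final sentence, the model is asked to justify a known answer instead of predicting one.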

Building the prompt

Given a search engine query, first, consider what the user could have in mind when they type in the query and allow for misspellings or other ambiguity, then classify the relevance of keyword to the query into one of the following categories: "Not bad", or "Bad". The definitions of the categories are:

- **"Not bad"**: the keyword is relevant to the users search query. This can include: broader or narrower product selection, competitor or alternative products, accessories, products often purchased together and related topics as well as direct matches to the users search.
- **"Bad"**: the keyword is not relevant to the users search query. There is no relationship between the query and keyword. This includes but is not limited to: incorrect/unrelated product, unrelated topic, wrong location when location is important, cannot be used in place of query product nor are they commonly purchased together, etc.

Please predict whether the keyword is relevant to the query or not. The answer should be exact "Not bad" or "Bad".

### Examples:
1. **Query**: google data studio sharepoint  
   **Keyword**: sharepoint migration tool file share  
   **Answer**: The relevance of the keyword "sharepoint migration tool file share" to the search query "google data studio sharepoint" is "Bad". This keyword is not relevant to the users search query as it pertains to a different tool or technology (SharePoint) that is not mentioned in the search query. The keyword focuses on a migration tool for SharePoint file sharing, while the search query is focused on Google Data Studio. There is no relationship between the two, so the keyword is not relevant to the users search query. Therefore, the relevance is "Bad".

2. **Query**: motorhomes sale  
   **Keyword**: rv sale used class c  
   **Answer**: The relevance of keyword "rv sale used class c" is "Not bad". The keyword "rv sale used class c" contains relevant information to the users search query "motorhomes sale". It includes a specific type of motorhome, "class c", that is being sold used, which is a relevant product selection for the users search. Additionally, the keyword "rv sale used class c" is a direct match to the users search and is commonly purchased together with the products in the search query. Therefore, the relevance is "Not bad".

3. **Query**: southern exposure seed exchange company  
   **Keyword**: uk poppy seeds  
   **Answer**: The relevance of the keyword "uk poppy seeds" to the search query "southern exposure seed exchange company" is "Not bad". This is because the search query is about a seed exchange company, and the keyword "uk poppy seeds" is a specific type of seed that could be offered by such a company. Even though the company mentioned in the search query is based in the southern hemisphere, they may still offer seeds from other regions, including the UK, which could include poppy seeds. Therefore, the relevance is "Not bad".

4. **Query**: nissan parts canada  
   **Keyword**: purchase tires  
   **Answer**: The relevance of "purchase tires" to the query "nissan parts canada" is "Bad". The keyword "purchase tires" is not directly related to the users search for Nissan parts in Canada. Tires are not a part of a Nissan car and therefore are not relevant to the search for Nissan parts. Additionally, the keyword does not relate to a common purchase with Nissan parts, and it is not a substitute for the search query. Therefore, the relevance is "Bad".

### Task:
**Query**: {query}  
**Keyword**: {keyword}  
**Answer**: 
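
Assembling this prompt from the explained demonstrations and reading the label back out of the response can be sketched as follows. This is a hedged illustration, not the paper's implementation: the function names and the demo dictionary keys are hypothetical, and `parse_label` assumes the model imitates the demonstrations and closes with a sentence like `Therefore, the relevance is "Bad".`, so the last quoted label is taken as the verdict.

```python
import re
from typing import Dict, List, Optional


def format_demo(index: int, demo: Dict[str, str]) -> str:
    """Render one demonstration in the numbered layout used above."""
    return (
        f"{index}. **Query**: {demo['query']}  \n"
        f"   **Keyword**: {demo['keyword']}  \n"
        f"   **Answer**: {demo['explanation']}"
    )


def build_annotation_prompt(
    instructions: str, demos: List[Dict[str, str]], query: str, keyword: str
) -> str:
    """Join task instructions, explained demos, and the new item to label."""
    examples = "\n\n".join(format_demo(i, d) for i, d in enumerate(demos, 1))
    return (
        f"{instructions}\n\n### Examples:\n{examples}\n\n"
        f"### Task:\n**Query**: {query}  \n**Keyword**: {keyword}  \n**Answer**: "
    )


# Matches a label only when it appears inside straight double quotes.
LABEL_RE = re.compile(r'"(Not bad|Bad)"')


def parse_label(response: str) -> Optional[str]:
    """Return the last quoted label in the response, or None if there is none."""
    matches = LABEL_RE.findall(response)
    return matches[-1] if matches else None
```

Returning `None` for responses without a quoted label lets the caller route those items to a retry or to human review.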