An Introduction to the SICK Dataset

xiaoxiao  2021-02-28



Official website: http://clic.cimec.unitn.it/composes/sick.html

SICK is an acronym for Sentences Involving Compositional Knowledge.

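The SICK dataset contains about 10,000 English sentence pairs, derived from two existing datasets: the 8K ImageFlickr dataset (http://nlp.cs.illinois.edu/HockenmaierGroup/data.html) and the SemEval-2012 semantic textual similarity (STS) video description dataset (http://www.cs.york.ac.uk/semeval-2012/task6/index.php?id=data). Each sentence pair is annotated for relatedness in meaning and for the entailment relation between the two sentences.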

SICK is released under the following license: Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (http://creativecommons.org/licenses/by-nc-sa/3.0/deed.en_US)

When using SICK in published research, please cite: M. Marelli, S. Menini, M. Baroni, L. Bentivogli, R. Bernardi and R. Zamparelli. 2014. A SICK cure for the evaluation of compositional distributional semantic models. Proceedings of LREC 2014, Reykjavik (Iceland): ELRA.

The SICK dataset was used in SemEval 2014 - Task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment

File structure: tab-separated text file

Field definitions:

- pair_ID: sentence pair ID

- sentence_A: sentence A

- sentence_B: sentence B

- entailment_label: textual entailment gold label (NEUTRAL, ENTAILMENT, or CONTRADICTION)

- relatedness_score: semantic relatedness gold score (on a 1-5 continuous scale)

- entailment_AB: entailment for the A-B order (A_neutral_B, A_entails_B, or A_contradicts_B)

- entailment_BA: entailment for the B-A order (B_neutral_A, B_entails_A, or B_contradicts_A)

- sentence_A_original: original sentence from which sentence A is derived

- sentence_B_original: original sentence from which sentence B is derived

- sentence_A_dataset: dataset from which the original sentence A was extracted (FLICKR vs. SEMEVAL)

- sentence_B_dataset: dataset from which the original sentence B was extracted (FLICKR vs. SEMEVAL)

- SemEval_set: set including the sentence pair in SemEval 2014 Task 1 (TRIAL, TRAIN, or TEST)
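
As a rough illustration of the file layout listed above, here is a minimal Python sketch that reads a tab-separated file with these columns and groups the pairs by SemEval_set. It assumes the file starts with a header row that uses exactly these field names. The helper names (read_sick, split_by_set) are hypothetical, and the two sample rows are made up for the demo; they are not taken from the real data.

```python
import csv
import io
from collections import defaultdict

# Column names as listed in the field definitions above.
# Assumption: the real file carries them in a header row, in this order.
FIELDS = [
    "pair_ID", "sentence_A", "sentence_B", "entailment_label",
    "relatedness_score", "entailment_AB", "entailment_BA",
    "sentence_A_original", "sentence_B_original",
    "sentence_A_dataset", "sentence_B_dataset", "SemEval_set",
]


def read_sick(handle):
    """Parse a tab-separated SICK-style file into a list of dicts."""
    # QUOTE_NONE: treat quote characters inside sentences as ordinary text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        # The relatedness score is continuous (1-5), so store it as a float.
        row["relatedness_score"] = float(row["relatedness_score"])
        rows.append(row)
    return rows


def split_by_set(rows):
    """Group rows by their SemEval_set value (TRIAL, TRAIN, or TEST)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["SemEval_set"]].append(row)
    return dict(groups)


# Made-up rows for illustration only; these are NOT from the real dataset.
SAMPLE_ROWS = [
    ["1", "A dog is running", "A dog is moving", "ENTAILMENT", "4.5",
     "A_entails_B", "B_neutral_A", "A dog runs", "A dog runs",
     "FLICKR", "FLICKR", "TRAIN"],
    ["2", "A man is singing", "Nobody is singing", "CONTRADICTION", "3.2",
     "A_contradicts_B", "B_contradicts_A", "A man sings", "A man sings",
     "SEMEVAL", "SEMEVAL", "TEST"],
]
SAMPLE = "\n".join("\t".join(values) for values in [FIELDS] + SAMPLE_ROWS) + "\n"

rows = read_sick(io.StringIO(SAMPLE))
by_set = split_by_set(rows)
print(sorted(by_set))
print(rows[0]["entailment_label"], rows[0]["relatedness_score"])
```

To try this on a downloaded copy, pass an open file instead of the in-memory sample, for example `open("SICK.txt", encoding="utf-8", newline="")`. The file name here is an assumption; use whatever name the download actually has.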

