Corpus of Claim Microstructures

Version: 1.0
Release date: July 29, 2017

1 Description

The Claim Microstructure dataset contains user posts split into claim segments, each translated into a claim microstructure. The dataset was created to explore how claims can be structured using a restricted language and grammar. It was also used for the stance classification task, where claim microstructure information helps determine the stance of a claim. The task and the dataset are described in:

Filip Boltužić and Jan Šnajder (2017). Toward Stance Classification Based on Claim Microstructures. Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA 2017), Copenhagen. Association for Computational Linguistics.

If you use the Claim Microstructure dataset in your own work, please cite the above paper. The BibTeX citation is:

@InProceedings{boltuzic2017back,
  author    = {Boltu\v{z}i\'{c}, Filip  and  \v{S}najder, Jan},
  title     = {Toward Stance Classification Based on Claim Microstructures},
  booktitle = {Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment \& Social Media Analysis},
  month     = {September},
  year      = {2017},
  address   = {Copenhagen},
  publisher = {Association for Computational Linguistics}
}

2 Dataset

The dataset is available here: TakeLab-claim-microstructure.tar.gz.
There are two files in the archive:

a JSON file containing 100 user posts, split into 920 claim segments with paraphrases, annotated with 882 microstructures (A1) and 842 microstructures (A2), and
a hierarchical list of 307 allowed concepts.
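
To check the contents of the archive after downloading it, something like the following minimal sketch may help. The function name list_archive_files is hypothetical, and the sketch assumes the archive has been saved locally under the name given above; the names of the two files inside the archive are not stated here, so the code only lists whatever it finds.

```python
import tarfile


def list_archive_files(archive_path):
    """Return the names of the regular files stored in a .tar.gz archive."""
    # "r:gz" opens a gzip-compressed tar archive for reading.
    with tarfile.open(archive_path, "r:gz") as tar:
        return [member.name for member in tar.getmembers() if member.isfile()]


if __name__ == "__main__":
    # Assumption: the archive sits in the current working directory.
    for name in list_archive_files("TakeLab-claim-microstructure.tar.gz"):
        print(name)
```

If the description above holds, the listing should show two files: the JSON file and the concept list.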

The JSON schema of the user posts, claim segments, paraphrases, and microstructures is as follows:

[
        {
            "segment_id": "..", 
            "post_id": "..", 
            "post_text": "..", 
            "segment_text": "..", 
            "segment_paraphrase": "..", 
            "a1_log_claim": "..", 
            "a1_log_claim_quality_score": "..", 
            "a1_stance": "..", 
            "a2_log_claim": "..", 
            "a2_log_claim_quality_score": "..", 
            "a2_stance": ".."
        }
]
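
As an illustration of working with this schema, the sketch below loads the records, groups the segments by post, and keeps the segments that both annotators translated. The function names are hypothetical, the file path is left to the caller because the JSON file's name is not given here, and the sketch assumes that a missing annotation appears as an empty or absent a1_log_claim / a2_log_claim value, which is a guess rather than something the schema states.

```python
import json
from collections import defaultdict


def load_segments(path):
    """Load the list of claim-segment records from the JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def group_by_post(records):
    """Map each post_id to its segment records, preserving file order."""
    posts = defaultdict(list)
    for record in records:
        posts[record["post_id"]].append(record)
    return dict(posts)


def doubly_annotated(records):
    """Keep records with a non-empty microstructure from both A1 and A2.

    Assumption: an unannotated segment has an empty or absent *_log_claim field.
    """
    return [
        r for r in records
        if r.get("a1_log_claim") and r.get("a2_log_claim")
    ]
```

Grouping by post_id is useful because post_text is repeated in every segment record of the same post, so the full post can be read from any one of its segments.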

3 License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.