MBTI9k — Corpus of Reddit comments and posts labeled with MBTI personality types

Version: 1.0
Release date: June 6, 2018

1 Description

MBTI9k is a dataset of Reddit posts and comments labeled with MBTI personality types. It consists of several datasets.

The dataset acquisition process is described in:

Matej Gjurković and Jan Šnajder (2018). Reddit: A Gold Mine for Personality Prediction. Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media.

If you use the MBTI9k dataset in your own work, please cite the above paper. The BibTeX citation is:

@inproceedings{gjurkovic2018reddit,
  title={Reddit: A Gold Mine for Personality Prediction},
  author={Gjurkovi{\'c}, Matej and {\v{S}}najder, Jan},
  booktitle={Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media},
  pages={87--97},
  month={June},
  year={2018},
  address={New Orleans, Louisiana, USA},
  url={http://aclweb.org/anthology/W18-1112},
  doi={10.18653/v1/W18-1112},
  publisher={Association for Computational Linguistics}
}

2 Dataset

The datasets are available on request. Please contact me at matej.gjurkovic@fer.hr.

3 License


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.