Kongregate Corpus

Kongregate: www.kongregate.com

Kongregate is a website aimed primarily at providing online games that are played in a web browser with the Adobe Flash plugin. It also offers some community features, in particular chatrooms in which players can communicate in real time.

The corpus we provide consists of logs of these online chats, recorded on different dates in different chatrooms. The training data set contains about 150,000 messages.

The zip file contains 12 files (one per chatroom), each in the following format:

<thread id="kg_3_15546"> <!-- the id is "kg_" followed by an identifier encoding the recording session and the chatroom number -->
<posts> <!-- the list of messages from that chatroom -->
<post id="kg_3_15546_342">
<user><username>LordShadow</username></user> <!--  username as used in the chat -->
<body> The user comment</body>
</post>
<post> <!-- other posts --></post>
<post> <!-- other posts --></post>
</posts>
</thread>
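As a minimal sketch, assuming each chatroom file is well-formed XML with exactly the thread/posts/post layout shown above (the XML comments are annotations and are not necessarily present in the real files), the standard library's xml.etree.ElementTree can read one file into simple records. The names `Post` and `parse_thread` are hypothetical, and stripping surrounding whitespace from each message body is a choice made here, not part of the format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Post:
    post_id: Optional[str]  # value of the <post id="..."> attribute, None if absent
    username: str           # text of <user><username>, "" if absent
    body: str               # text of <body>, whitespace-stripped (an assumption made here)


def parse_thread(xml_data: Union[str, bytes]) -> Tuple[Optional[str], List[Post]]:
    """Hypothetical helper: parse one chatroom file laid out as
    <thread id=...><posts><post id=...>...</post></posts></thread>."""
    root = ET.fromstring(xml_data)  # the root element is <thread>
    thread_id = root.get("id")
    posts = []
    # Visit every <post> that is a direct child of <posts>, in document order.
    for post in root.iterfind("posts/post"):
        username = post.findtext("user/username", default="")
        body = post.findtext("body", default="").strip()
        posts.append(Post(post.get("id"), username, body))
    return thread_id, posts
```

To process one extracted chatroom file, pass its raw bytes, e.g. `parse_thread(open(path, "rb").read())`; handing the parser bytes rather than decoded text lets it honour any encoding declaration the file may carry.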

Attachment: kongregate.zip (2.44 MB)
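To inspect the archive without unpacking it, for instance to count the messages in each chatroom file, the files can be read directly from kongregate.zip with the zipfile module. This is a sketch under the assumption that every non-directory member of the archive is one chatroom file in the XML format above; `count_posts_in_archive` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import BinaryIO, Dict, Union


def count_posts_in_archive(archive_file: Union[str, BinaryIO]) -> Dict[str, int]:
    """Hypothetical helper: map each member name to its number of <post> elements.

    Assumes every non-directory member is a well-formed chatroom XML file.
    """
    counts: Dict[str, int] = {}
    with zipfile.ZipFile(archive_file) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            root = ET.fromstring(archive.read(name))  # root element is <thread>
            counts[name] = len(root.findall("posts/post"))
    return counts
```

Summing the returned values, e.g. `sum(count_posts_in_archive("kongregate.zip").values())`, gives the total number of messages across all chatroom files.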