Thread Discourse Structure Analysis
of Web User Forum Data
Li WANG PhD Student
Language Technology Group
Department of Computing and
University of Melbourne
12 December 2013 (Thursday)
4:00pm - 5:00pm
Seminar Room 2.4, Level 2
School of Information Systems
Singapore Management University
80 Stamford Road
We look forward to seeing you at this research seminar.
Web user forums (or simply "forums") are online platforms for people to discuss and obtain information via a text-based threaded discourse, generally in a pre-determined domain (e.g. IT support or DSLR cameras). Due to the sheer scale of the data and the complex thread structure, it is often hard to extract and access relevant information from forums. To address this problem, we propose the task of automatically parsing the discourse structure of forum threads, for the purpose of enhancing information access and solution sharing over web user forums.
The discourse structure of a forum thread is modelled as a rooted directed acyclic graph (DAG), in which each post in the thread is a node. The reply-to relations between posts are denoted as directed edges (LINKs) between nodes, and the type of each reply-to relation is defined as a dialogue act (DA). We take two approaches to parsing the discourse structure of threads. The first uses conditional random fields (CRFs), either to classify the LINK and the DA separately and compose them afterwards, or to classify the combined LINK and DA directly. The second treats discourse structure parsing as a dependency parsing problem. Dependency parsing is the task of automatically predicting the dependency structure of a token sequence, in the form of binary asymmetric dependency relations with dependency types. We obtain high discourse structure parsing F-scores with the proposed methods.
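The thread representation described above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the speaker's implementation: the class names, the dialogue act names ("Answer", "Resolution") and the "<distance>+<DA>" encoding of a combined LINK and DA label are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    index: int   # position in the thread; 0 is the initial post (the root)
    author: str
    text: str


@dataclass
class Thread:
    posts: list
    # Each edge is (child_index, parent_index, dialogue_act).
    edges: list = field(default_factory=list)

    def add_link(self, child, parent, dialogue_act):
        # A post may only reply to an earlier post, so no cycle can form.
        if not 0 <= parent < child < len(self.posts):
            raise ValueError("a post must link to an earlier post in the thread")
        self.edges.append((child, parent, dialogue_act))

    def combined_labels(self):
        # Hypothetical encoding: every edge becomes "<distance>+<DA>",
        # grouped by the post that the edge starts from.
        labels = {}
        for child, parent, dialogue_act in self.edges:
            labels.setdefault(child, []).append(f"{child - parent}+{dialogue_act}")
        return labels


# Toy thread; the posts and dialogue act names are invented for illustration.
thread = Thread(posts=[
    Post(0, "alice", "My camera will not turn on."),
    Post(1, "bob", "Have you charged the battery?"),
    Post(2, "carol", "Try reseating the memory card too."),
    Post(3, "alice", "Charging fixed it, thanks."),
])
thread.add_link(1, 0, "Answer")
thread.add_link(2, 0, "Answer")
thread.add_link(3, 1, "Resolution")

print(thread.combined_labels())
```

Collapsing each edge into a single label turns the joint LINK and DA prediction into one tag per post, which is the shape of problem a sequence labeller works on. Predicting the two parts separately would instead require a composition step afterwards.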
Furthermore, we investigate ways of using thread discourse structure information to improve information access and solution sharing over web user forums. In particular, we explore the tasks of thread Solvedness classification (i.e. whether the problem asked in a thread is solved or not), and thread-level information retrieval over forums. Our experiments show that using the discourse structure information of forum threads can benefit both tasks significantly.
About the speaker
Li WANG is a final-year PhD student in the Language Technology Group at The University of Melbourne. His research interests lie in knowledge discovery and extraction from social media data. Currently, his research focuses on improving information access over web user forums. Li obtained a BE from Wuhan University and a Master of Information Technology from The University of Melbourne. He has published papers at top-tier NLP conferences including EMNLP, COLING, IJCNLP and CoNLL, and was awarded the Google Plenary Highlight Paper Award.
LARC is supported by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office, Media Development Authority (MDA).