I'm currently doing research for my bachelor's thesis on how to
automatically extract FAQs from unstructured data.
For this, I've built a system that automatically performs the following:
- Load thousands of conversations from forums and mailing lists
(ignoring the categories already assigned there).
- Build a categorization based solely on the conversations' texts (by
- Pick the best-modelled categories, each as the basis for one FAQ.
- For each question (the first entry in a conversation), find the best
reply among its answers.
- Select the most relevant and well-formatted question/answer pairs for