Københavns Universitet


PhD defence about learning from educational data by Stephan Sloth Lorenzen

PhD defence about Learning from Educational Data: Improving Methods and Theoretical Guarantees for Data Mining by Stephan Sloth Lorenzen from the Department of Computer Science


Date & Time:

HCØ AUD 10, Universitetsparken 5, 2100 Copenhagen Ø

Hosted by:
Department of Computer Science (Datalogisk Institut)



This thesis summarizes my PhD project. The structure of the thesis, and of my project, follows the philosophy behind the Danish Center for Big Data Analytics driven Innovation (DABAI): in collaboration with companies, we develop solutions for educational data mining. Taking inspiration from the challenges we encounter, we define and investigate research problems in algorithms and machine learning.

During my project, I have worked with the Danish companies Clio and MaCom. With Clio, the main objective has been to give teachers insight into their primary school students. We do so by predicting student performance in an online quiz system and by analyzing behavioral patterns in log data to determine optimal study behavior. With MaCom, we investigate methods for detecting ghostwriters in high school, that is, external authors hired by students to write their essays. We extend this work into a tool for analyzing and tracking changes in high school students' writing style, providing insights for teachers.

Based on the problems faced while working with Clio, we develop novel techniques for improving budgeted maximum inner product search, an important algorithmic ingredient in many data mining methods.

Furthermore, we investigate theoretical bounds for majority vote classifiers, providing theoretical guarantees for the random forest classifier. Although these bounds are often still too loose for practical use, this area of research is important, as highlighted by our work with MaCom.

Finally, the thesis concludes with an overview of the company collaborations and a discussion of the challenges encountered during them.

Assessment Committee

Professor Jakob Grue Simonsen, Department of Computer Science
Associate Professor Troels Andreasen, Roskilde University (RUC)
Associate Professor Jaap Kamps, University of Amsterdam


Principal Supervisor: Stephen Alstrup
Co-Supervisor: Mikkel Abrahamsen
Co-Supervisor: Christian Igel

For an electronic copy of the thesis, please contact phdadmin@di.ku.dk.