Universitetsavisen
PhD defence
On 6 May 2021, Simon Hellemann Flachs, PhD student at the Department of Computer Science, will defend his PhD thesis titled "Computational Grammatical Error Correction: Bridging the Gap from Academia to Industry".
Date & Time:
6 May 2021
Place:
Zoom: https://ucph-ku.zoom.us/j/63156025235?pwd=RWp4c0Iwbyt4UVZWc3hBM2pRSnFPZz09
Hosted by:
Department of Computer Science
Cost:
Free
Computational Grammatical Error Correction:
Bridging the Gap from Academia to Industry
Grammatical Error Correction (GEC) is the research field studying computational methods for correcting grammatical errors in text. These methods have the potential to improve human communication by enabling clear and error-free text.
While GEC has been studied thoroughly in academia, industrial adoption has been limited. This thesis aims to overcome some of the obstacles preventing industrial use.
In the first part of the thesis, we look into two methods for developing GEC systems without large amounts of expensive training data. First, we show that artificially generated training data can be used to train robust systems for detecting subject-verb agreement errors. Second, we show that language models can be used to build useful GEC systems without annotated training data.
In the second part of the thesis, we examine how GEC systems perform on text that was not written by English language learners. We release a new GEC benchmark, CWEB, consisting of website text with corrections, and show that current GEC systems perform poorly on this domain.
In the final part, we focus on GEC for non-English languages and show that GEC systems pre-trained on noisy data can be fine-tuned effectively on only small amounts of expert-annotated data, which makes it possible to create inexpensive GEC systems in new languages.
Professor, Anders Søgaard, Department of Computer Science, University of Copenhagen
Assistant professor, Daniel Hershcovich, Department of Computer Science, University of Copenhagen
For an electronic copy of the thesis, please go to: https://di.ku.dk/english/research/phd/