The University of Zululand’s Department of Computer Science recently hosted an intensive two-day workshop titled “Corpus Creation & Linguistic Text Processing Tools,” facilitated by Marissa Griesel, Project Manager at SADiLaR (South African Centre for Digital Language Resources). The event took place on March 17 and 18, 2026, at the D3/D4 laboratory, drawing participants from various academic and research backgrounds interested in natural language processing (NLP) and low-resource language research.
The event was officially opened by Professor Pragasen Mudali, who highlighted the department’s strategic motivation for hosting the workshop. He emphasised that the initiative aligns with the ongoing Telkom–NRF-funded project, which focuses on advancing research in NLP and large language models (LLMs) for low-resource South African languages, including isiZulu, isiXhosa, and Sepedi. He further noted the importance of addressing key challenges such as data scarcity and of fostering meaningful community involvement in language resource development.
Providing additional context, Professor Matthew Adigun delivered a comprehensive overview of the Telkom–NRF project, particularly for the benefit of Ms Griesel and the postgraduate students newly entering the LLM research space. His presentation outlined the project’s objectives and its significance in strengthening local capacity in language technologies.
Day 1 of the workshop concentrated on SADiLaR’s digital resources, best practices in corpus collection, data cleaning with regular expressions, and text exploration using Voyant Tools. Participants engaged in practical exercises designed to enhance their understanding of corpus management and analysis.
Day 2 shifted focus to annotation principles, an introduction to the NCHLT (National Centre for Human Language Technologies) Web Services, and hands-on activities involving annotation and evaluation. Attendees practised applying these skills to their own research, gaining practical tools for linguistic data processing.
Notable attendees included Dr. Gladness Myeni and Dr. Khayelihle Khumalo from the African Languages Department, both of whom expressed enthusiasm about the training. They highlighted the significance of such capacity-building efforts for advancing domain expertise and supporting the Telkom–NRF project’s goals.
The workshop marks an important step in strengthening UniZulu’s role in advancing digital language research and equipping researchers with the necessary tools to contribute to the development of inclusive and representative language technologies in South Africa.
By Sakhile Fatyi
