Focal Research Area in Language Technology
Language Technology (LT) is the interdisciplinary study of computational systems that process (and to a certain degree ‘understand’) natural languages. As such, Language Technology sits at the interface of Computer Science, Linguistics, and Psychology. Computational Linguistics and Natural Language Processing (NLP) are closely related names for this region of overlap between the disciplines. We say Language Technology because the term emphasizes the applied nature of our research: we are specifically interested in methods and technologies that are sufficiently scalable (and ultimately engineered to a sufficient degree) to enable practical demonstrations of our research.
With a short history going back to code-breaking efforts around World War II, some ‘classic’ Language Technology applications are Machine Translation (enabling computers to translate from one language into another) and various kinds of dialogue systems, for example voice interfaces for controlling complex in-car electronics. With the advent of massive on-line repositories of natural language content (e.g. the World-Wide Web), problems of search (or retrieval) and information extraction have attracted rapidly increasing interest in the past decades. The concept of a semantic web is a viable vision; expecting, however, that large-scale semantic structuring of on-line content will be achieved fully manually, by human authors or editors, would seem overly optimistic. Web-scale semantic parsing (language technology in pursuit of its ultimate scientific goal, text understanding) will be a prerequisite to the realization of the semantic web.
In 2007, the LNS research group at IFI was recognized as a focal research area by the UiO School of Mathematics and Natural Sciences (MN). The goal here is to ‘cluster’ research activities in Language Technology, so as to create a critical mass that enables the group to establish itself nationally and internationally. Language Technology at UiO has a long and proud history, going back to the 1960s and pioneering joint activities between researchers in mathematics and the humanities. Steadily growing demands on Language Technology and increased internationalization of research and training alike require sustained efforts, long-term planning, a good balance of breadth and focus, and active collaboration and exchange with leading academic players world-wide. The UiO focal research area in Language Technology makes it possible for the group to develop, over time, into a scientific environment that contributes actively to the advancement of Language Technology, both scientifically—at a competitive international level—and practically, by virtue of partnerships that enable the transfer of new knowledge into innovative applications.
Approaches, Projects, Visions
Language Technology is traditionally grounded in formal and computational accounts of the system of rules that govern human language use. After all, there is an infinite number of natural language utterances, and in everyday use a significant proportion of utterances have never been coined in this exact form before. The human ability to constantly produce new utterances, as well as the observation that natural language serves its purpose (communication) very effectively, both strongly suggest that there is indeed such a system of rules, with ‘grammar’ at its core. Research in our group is grounded in this assumption, and we employ (and develop further) specialized descriptive formalisms that enable computers to take advantage of grammatical knowledge. LT group members are among the inventors and co-developers of one of the most widely used toolkits for precision grammar engineering and natural language applications, the Deep Linguistic Processing with HPSG (DELPH-IN) repository of linguistic resources and associated software. DELPH-IN technology is in active use at many dozens of R&D sites world-wide. The de-facto standard status of the DELPH-IN formalism, its stability over time, and its open-source support have enabled linguists to develop broad-coverage computational grammars of individual languages that serve as general-purpose resources in a variety of NLP research projects (and already a small number of commercial applications).
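The observation that a finite system of rules licenses an unbounded set of utterances can be illustrated with a small sketch. The Python example below uses a hypothetical toy grammar, invented here purely for illustration and unrelated to the DELPH-IN formalism or any grammar developed by the group. Because noun phrases may contain verb phrases and vice versa, the set of derivable sentences keeps growing as deeper rule nesting is allowed.

```python
from itertools import product

# Hypothetical toy grammar (an assumption for illustration only).
# Symbols that do not appear as keys are treated as words.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "that", "VP"]],
    "VP": [["sleeps"], ["sees", "NP"]],
    "N": [["dog"], ["cat"]],
}


def expand(symbol, depth):
    """Return all word sequences derivable from `symbol` with rule
    applications nested at most `depth` levels deep."""
    if symbol not in GRAMMAR:
        return {(symbol,)}          # a word expands to itself
    if depth == 0:
        return set()                # no nesting budget left for a rule
    results = set()
    for right_hand_side in GRAMMAR[symbol]:
        parts = [expand(child, depth - 1) for child in right_hand_side]
        for combination in product(*parts):
            results.add(tuple(word for part in combination for word in part))
    return results


for depth in range(3, 7):
    sentences = expand("S", depth)
    longest = max(sentences, key=len)
    print(depth, len(sentences), " ".join(longest))
```

Each additional level of nesting yields sentences that were not derivable before, although the rule set itself never changes.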
Through a strategic research partnership with the Center for the Study of Language and Information (CSLI) at Stanford University (USA), the UiO LT group focuses its current research on English language technology, specifically on methods and applications that prioritize accuracy and high-quality results over robustness to out-of-scope inputs. One such effort in the recent past was the LOGON Machine Translation (MT) project. In joint work with other Norwegian universities and Stanford, the consortium succeeded in prototyping an automated translation system from Norwegian to English that, within its clearly bounded subject area and genre, is highly competitive with other state-of-the-art approaches in terms of translation quality. This quality comes at the cost of leaving about one third of a document untranslated, which makes the system a candidate tool for a professional translator rather than an on-line service. Parallel to its MT work, the group investigates the use of its technology for large-scale semantic analysis of scholarly literature (in particular the on-line encyclopedia Wikipedia), aiming to advance tasks like information extraction and ontology learning on the basis of general-purpose, logical-form semantics.
Making the rules of grammar accessible to computation, however, is but one part of contemporary language technology. Natural language utterances are notoriously ambiguous. For example, the string ‘Time flies like an arrow.’ may be interpreted in a variety of ways, including (among others) the observation that ‘time moves quickly just like an arrow does’, the instruction to ‘measure the speed of flying insects like one would measure that of an arrow’, or a statement about a specific species of insects (‘time flies’, in contrast to, say, ‘fruit flies’) and their fondness for arrows. Obviously, some of these interpretations are much more plausible than others, with some bordering on the nonsensical. Human language users make highly efficient use of natural language ambiguity, but it presents a major challenge to computational NLP. Among other parameters, earlier experience (that is, observed frequencies of the usage of words, combinations of words, and grammatical structures) is a central factor in determining the most likely interpretation of an utterance. Language Technology today captures such phenomena in probabilistic models of language structure, mathematically complex systems that ‘learn’ generalizations over usage patterns from training data.
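As a minimal sketch of how usage frequencies can rank competing readings, the following Python example parses ‘time flies like an arrow’ with a toy probabilistic grammar and a simple chart parser. Every rule, category name, and probability here is an assumption invented for illustration; in a real system the probabilities would be estimated from annotated training data, and nothing below reflects the group's actual models. To keep the example short, the toy grammar covers only declarative sentences, so the imperative reading is left out.

```python
from collections import defaultdict

# Toy probabilistic grammar; all numbers are made up for illustration.
# Binary rules: (parent, left child, right child, probability).
BINARY_RULES = [
    ("S", "NP", "VP", 1.0),
    ("NP", "Det", "N", 0.3),
    ("NP", "N", "N", 0.1),      # compound noun, e.g. 'time flies'
    ("VP", "V", "NP", 0.5),
    ("VP", "V", "PP", 0.3),
    ("VP", "VP", "PP", 0.2),
    ("PP", "P", "NP", 1.0),
]
# Lexical rules: (category, word, probability).
LEXICON = [
    ("NP", "time", 0.3), ("NP", "flies", 0.2), ("NP", "arrow", 0.1),
    ("N", "time", 0.4), ("N", "flies", 0.3), ("N", "arrow", 0.3),
    ("V", "flies", 0.5), ("V", "like", 0.4), ("V", "time", 0.1),
    ("P", "like", 1.0),
    ("Det", "an", 1.0),
]


def parse(words):
    """Return all S analyses of `words` as (probability, tree), best first."""
    n = len(words)
    # chart[(i, j)][category] holds (probability, tree) pairs for words[i:j].
    chart = defaultdict(lambda: defaultdict(list))
    for i, word in enumerate(words):
        for category, form, prob in LEXICON:
            if form == word:
                chart[(i, i + 1)][category].append((prob, f"({category} {word})"))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):           # split point
                for parent, left, right, rule_prob in BINARY_RULES:
                    for left_prob, left_tree in chart[(i, k)][left]:
                        for right_prob, right_tree in chart[(k, j)][right]:
                            chart[(i, j)][parent].append(
                                (rule_prob * left_prob * right_prob,
                                 f"({parent} {left_tree} {right_tree})"))
    return sorted(chart[(0, n)]["S"], reverse=True)


for probability, tree in parse("time flies like an arrow".split()):
    print(f"{probability:.6f}  {tree}")
```

With these invented numbers, the reading in which ‘time’ is the subject and ‘flies’ the verb scores well above the ‘time flies’ insect reading, because the compound-noun rule is given a low probability. Storing every analysis in the chart is only practical for toy inputs; realistic parsers keep packed representations or only the best candidates per cell.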
The combination of formal, grammatical knowledge and rich probabilistic models has led to a third generation of NLP approaches, so-called hybrid systems that seek to balance the use of linguistic description and machine learning. Again as part of the DELPH-IN repository, LT group members have helped create the infrastructure that supports the creation and maintenance of training data, acquisition of probabilistic models in a variety of frameworks and for various tasks, and quantitative evaluation of different configurations. Large-scale linguistic analysis and probabilistic modeling alike are computationally intensive. The group develops algorithmic improvements to core components and investigates methods and tools for distributed processing over standard high-performance computing (HPC) facilities. The group maintains a small specialized compute cluster and, through integration with the central UiO HPC group, has access to massive computational infrastructure. On-going initiatives at the LT group include WeScience (semantic analysis of scholarly literature) and DELPH-IN21 (scaling DELPH-IN technology for HPC use).
Contact Information
The LT group is part of the IFI research group in Logic and Natural Languages, with offices at the Oslo Innovation Center (Forskningsparken). Current group members include
- Doctoral Fellow Liv Ellingsen
- Doctoral Fellow Gordana Ilić Holen
- Doctoral Fellow Elisabeth Lien
- Professor Jan Tore Lønning
- Professor Stephan Oepen
- Post-Doctoral Fellow Erik Velldal
- Doctoral Fellow Gisle Ytrestøl
- Associate Professor NN
- Doctoral Fellow NN
Stephan Oepen is the administrative head of the language technology group and can be contacted at oe@ifi.uio.no, (+47) 2284 0125. A set of slides presented at an internal seminar may provide further background information. Furthermore, a more detailed overview of the general field of Language Technology is maintained on-line by Hans Uszkoreit, one of our scientific partners.
