Swedish as Distorted English (Intro)

Swedish is a Scandinavian language with some ten million speakers found all over the globe. Most of them live in Sweden, of course, but they travel a lot and often settle down in remote areas such as Denmark, Brussels, California or the Pacific Islands. Maybe you’d care to address one of them in their native tongue, or read a book by a Swedish author as it was written? If so, this is a good place to start, or to return to once you have started to pick the language up.

We employ a strategy inspired by the so-called Noisy Channel Model. The model is primarily associated with Claude Shannon and the Noisy Channel Coding Theorem, but it is of more interest to us for the way it has been used to model translation in the field of statistical natural-language processing (SNLP).

The model has three components: a Transmitter (or Source) that produces signals (x) of some sort, a Channel through which the signals have to pass and in which they are likely to be distorted, and a Receiver that has to deal with the distorted signal (y). Graphically, the model may be depicted as below:

Fig. 1. The Noisy Channel Model. Source: Alchetron.com

Here we will focus our attention on the channel and, more specifically, on the relation between x and y. In our case, x will be an English sentence and y will be a Swedish sentence and the relation can be seen as some kind of translation, or coding.

When the model is applied in machine translation research, translation is equated with decoding, i.e., finding x given y. What we see at the receiving end is y, but what we want is x, something in a language we presumably know. Moreover, computing x requires a model of the Transmitter, which in statistical machine translation (SMT) will be a probabilistic one, but this model is not needed for our aims. Here, we will only deal with differences and similarities between x and y.
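For reference, the decoding step is usually written with Bayes' rule, as in the sketch below. The notation is an assumption on our part and follows common usage in the SMT literature: P(x) stands for the probabilistic model of the Transmitter, and P(y | x) for the model of the Channel.

```latex
\hat{x} = \arg\max_{x} P(x \mid y) = \arg\max_{x} P(x)\, P(y \mid x)
```

The second equality holds because P(y) does not depend on x and can be dropped from the maximization. Of the two remaining factors, it is the one for the Channel, relating x to y, that corresponds to our concern here.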

More specifically, we will be concerned with contrastive aspects of the two languages, and, as in many other contrastive studies (e.g., Aijmer, 1999; Johansson, 2001), we use parallel corpora as our data. Our approach will be more formal and, in some sense, more superficial. Using the metaphor of the Channel, we will try to characterize Swedish as a form of (distorted) English. The question then becomes “What kind of general and more specific distortions does the Channel cause?”

To answer this question we also need to say something about the properties of the signals, i.e., the language data we are dealing with. Simply stated, they are strings of words, one after the other. What, then, is a word? We take the view, familiar from structural linguistics and modern grammar models such as Head-Driven Phrase Structure Grammar (HPSG), that words are signs, relating meaning, form and matter. Strings of words are also signs, with the added complexity that they partake in relations and create units such as phrases, clauses and sentences.

So, a word may be seen as an object with three layers. The layer we can perceive (the letters of a piece of writing, or the sounds of a spoken utterance) is an outer layer acting as a kind of cover, or peel, around the rest. We will refer to it as the character layer. The second layer is the form, which can be said to give shape to the third layer, which is the meaning. Forms will be described in grammatical terms, and we make the assumption that they can occur across languages. Thus, an English noun and a Swedish noun are assumed to have a formal property in common. We will not have much to say about meanings, but we will insist that there can be sameness of meaning between languages, and that we can make judgements on meaning relations: not only whether two words have the same or different meanings, but also whether one meaning is more specific or more general than another.

When a word enters the channel it may sometimes pass through without change, but more often something will be done to it. It could be modified at one or more layers, removed, replaced, moved around, broken up or attached to other words, all of this contributing to its distortion.
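The three-layer view of a word, and the idea that the channel may change some layers while leaving others intact, can be illustrated with a minimal sketch. Everything in it is a hypothetical illustration and not part of the approach as described: the `Word` class, the `changed_layers` function, the use of a part-of-speech label for the form layer, and the upper-case labels standing in for meanings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A word as a sign with three layers (illustrative assumption)."""
    characters: str  # outer layer: the letters we can perceive
    form: str        # grammatical form, here simply a part-of-speech label
    meaning: str     # placeholder label standing in for the meaning


def changed_layers(source: Word, target: Word) -> list[str]:
    """Return the names of the layers on which the two words differ."""
    layers = ("characters", "form", "meaning")
    return [name for name in layers
            if getattr(source, name) != getattr(target, name)]


# English "house" and Swedish "hus": only the character layer differs.
english = Word("house", "noun", "HOUSE")
swedish = Word("hus", "noun", "HOUSE")
print(changed_layers(english, swedish))

# English "hand" and Swedish "hand": the word passes through unchanged.
print(changed_layers(Word("hand", "noun", "HAND"),
                     Word("hand", "noun", "HAND")))
```

The sketch covers only modification at one or more layers of a single word. Removal, replacement, movement, splitting and attachment involve strings of words and would need a richer representation.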

 

