Schäfer: Bachelor Thesis - Random Feature Transformer for language translation

by Jörg Schäfer

We offer the following bachelor thesis in our research group:


Introduction to the topic

Transformer models revolutionized the field of natural language processing (NLP) by introducing a novel architecture that achieved state-of-the-art performance across various language tasks. Unlike earlier sequential models such as recurrent neural networks (RNNs), transformers leverage self-attention mechanisms to weigh the significance of different words in a sentence, enabling parallel processing of words and capturing long-range dependencies more effectively. Introduced by Vaswani et al. in 2017, transformers quickly became the backbone of many cutting-edge NLP applications, including machine translation, text generation, and sentiment analysis.
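
For orientation, the following is a minimal sketch of scaled dot-product self-attention in PyTorch; the function, tensor shapes, and variable names are illustrative assumptions and are not taken from any particular codebase.

    import math
    import torch

    def self_attention(x, W_q, W_k, W_v):
        # x: (batch, seq_len, d_model); W_q, W_k, W_v: (d_model, d_k) projection matrices
        q, k, v = x @ W_q, x @ W_k, x @ W_v
        # Every position scores every other position; scaling by sqrt(d_k) stabilizes the softmax.
        scores = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))
        weights = scores.softmax(dim=-1)
        # The output is an attention-weighted sum of the value vectors.
        return weights @ v

    # Toy usage: 2 sentences, 5 tokens each, model dimension 16.
    x = torch.randn(2, 5, 16)
    W_q, W_k, W_v = (torch.randn(16, 16) * 0.25 for _ in range(3))
    out = self_attention(x, W_q, W_k, W_v)   # shape: (2, 5, 16)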

Task:

The heart of the transformer architecture lies in its learnable embedding matrices, namely the query, key, and value projection matrices W_Q, W_K, and W_V, which evolve during training. A pertinent research question emerges:

What is the efficacy of transformers when we forego learning these matrices and opt for fixed random alternatives instead?

This deviation represents a random feature approach. To evaluate it empirically, we intend to compare, using the BLEU score, the performance of a vanilla transformer model, such as the Annotated Transformer, with that of a counterpart that uses fixed random embedding matrices. This investigation aims to elucidate the importance of learned embeddings in transformer models and the potential consequences of employing random features in place of learned ones.
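
As a rough illustration of what "fixed random embedding matrices" could mean in practice, the sketch below assumes a PyTorch codebase in the style of the Annotated Transformer, where the projections are ordinary nn.Linear layers; the function name and the way the layers are selected are assumptions and would need to be adapted to the actual code.

    import torch.nn as nn

    def fix_projections_randomly(attention_block: nn.Module) -> None:
        # Assumption: the attention block's Q/K/V (and output) projections are nn.Linear layers.
        for layer in attention_block.modules():
            if isinstance(layer, nn.Linear):
                # Draw a fresh random matrix and keep it fixed for the whole training run.
                nn.init.normal_(layer.weight, mean=0.0, std=layer.weight.size(1) ** -0.5)
                layer.weight.requires_grad = False
                if layer.bias is not None:
                    nn.init.zeros_(layer.bias)
                    layer.bias.requires_grad = False

Only the parameters that still require gradients would then be handed to the optimizer, so the random projections stay frozen while the rest of the model trains as usual.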

Expectation:
  • Implement a transformer architecture with random embedding matrices. You can start from an existing codebase such as the Annotated Transformer.
  • Choose a commonly used language translation task and compare the performance of the vanilla transformer with the random feature transformer (a BLEU evaluation sketch follows this list).
  • Include visual representations of your findings in your thesis.
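
For the BLEU comparison mentioned above, a corpus-level score can be computed, for instance, with the sacrebleu package; translate_fn is a placeholder for whatever decoding routine (greedy or beam search) is used, and the variable names are purely illustrative.

    import sacrebleu

    def bleu_for_model(translate_fn, source_sentences, reference_translations):
        # translate_fn: callable mapping one source sentence to a model translation (placeholder).
        hypotheses = [translate_fn(s) for s in source_sentences]
        # sacrebleu expects a list of hypothesis strings and a list of reference streams.
        return sacrebleu.corpus_bleu(hypotheses, [reference_translations]).score

    # e.g. compare bleu_for_model(vanilla_translate, test_src, test_ref)
    #      with    bleu_for_model(random_feature_translate, test_src, test_ref)
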
Skills Required:
  • Basic understanding of Machine Learning
  • Experience in Python
  • Programming or modifying ML pipelines (PyTorch or TensorFlow)
  • Strong interest in or expertise with the attention mechanism

Supervisor: Marius Lotz (mailto:marius.lotz@fb2.fra-uas.de)