Mariano Kamp

Amazon Web Services (AWS)
hosted by the Engineering with Generative AI course

""Look Ma: I shrunk Bert!" -- Knowledge Distillation"

Large NLP models are already dazzling us with their impressive performance. But their size can make them cumbersome to handle, making experimentation harder and more expensive. Let’s use knowledge distillation to leverage the power of a high-capacity teacher model to help a smaller student model learn better, creating a model that is wise beyond its size. The student model can even use a different network architecture than the teacher, one that better fits the downstream task. We’ll see step by step how knowledge distillation works and how we can mix and match the architectural features needed to optimize model performance. We answer questions such as: what impact do, for example, the type and length of the input have on choosing the right tokenizer and neural network architecture? We see how knowledge distillation effectively compresses a large model, reducing inference latency and enabling its use in use cases such as online bidding and fraud detection.
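To give a flavor of the technique discussed in the talk: the student is trained not only on the ground-truth labels but also on the teacher's softened output distribution. The sketch below is a generic, minimal illustration (assuming PyTorch; the function and parameter names are hypothetical), not the speaker's implementation.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=2.0, alpha=0.5):
        # Soft targets: the student mimics the teacher's softened distribution.
        soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
        soft_loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            soft_targets,
            reduction="batchmean",
        ) * (temperature ** 2)
        # Hard targets: the usual cross-entropy against ground-truth labels.
        hard_loss = F.cross_entropy(student_logits, labels)
        # Blend the two objectives.
        return alpha * soft_loss + (1.0 - alpha) * hard_loss

    # Toy usage: a batch of 4 examples with 3 classes.
    teacher_logits = torch.randn(4, 3)   # e.g. from a large BERT-sized teacher
    student_logits = torch.randn(4, 3)   # e.g. from a smaller student network
    labels = torch.tensor([0, 2, 1, 0])
    print(distillation_loss(student_logits, teacher_logits, labels))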


Time: Tuesday, 30.01.2024, 12:15
Place: 42-115

Download the appointment as an iCal file and import it into your calendar.