
Notes for ICML Physics of LLM Talk

Source: https://youtu.be/yBL7J0kgldU?si=koiBhKpq3Cp1M8G7

Research methodology
- Deconstruct LLM capability into building blocks: structure, knowledge, reasoning, etc.
- Study each block in a controlled, idealized environment: control the data, tweak the params
- Highly repeatable experiments: ~100M-parameter models, looking for universal laws
- Each experiment fits on 1x H100 within a day
- Probe the inner workings of the model

Knowledge extraction
- Two types of data:
  - Biographies of N individuals
  - QA data that extracts the facts about the N individuals from their biographies
- Training data: all N biographies + QA data for N/2 of the individuals
- Test data: the QA data for the other N/2 individuals
- If the model answers the held-out N/2 individuals' biography questions well, it has knowledge extraction capability (see the data-split sketch after this list)
- Option 1: pretrain with both the N biographies and the N/2 QA data → good knowledge extraction
- Option 2: pretrain with biography data only, fine-tune with QA → bad knowledge extraction
- Option 3: augment the biography data for each person, pretrain with the biographies, fine-tune with QA → good knowledge extraction
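A minimal sketch of how such a controlled data split could be built, assuming synthetic biography/QA templates. The `make_biography` and `make_qa` helpers, the attribute names, and the split logic are hypothetical illustrations of the setup described above, not the talk's actual code.

```python
import random

# Hypothetical helpers: the real experiments use synthetic biography/QA
# templates filled with each individual's attributes.
def make_biography(person):
    return (f"{person['name']} was born on {person['birthday']} in "
            f"{person['city']} and studied {person['major']}.")

def make_qa(person):
    return [
        (f"When was {person['name']} born?", person["birthday"]),
        (f"Which city was {person['name']} born in?", person["city"]),
        (f"What did {person['name']} study?", person["major"]),
    ]

def build_split(people, seed=0):
    """Split N individuals: biographies of everyone plus QA for half of them
    go into training; QA for the other half is held out for evaluation."""
    rng = random.Random(seed)
    shuffled = people[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2

    train_bios = [make_biography(p) for p in shuffled]               # all N biographies
    train_qa = [qa for p in shuffled[:half] for qa in make_qa(p)]    # QA for N/2 people
    test_qa = [qa for p in shuffled[half:] for qa in make_qa(p)]     # QA for the other N/2

    return train_bios, train_qa, test_qa
```

Option 1 mixes train_bios and train_qa into pretraining; option 2 pretrains on train_bios only and fine-tunes on train_qa; in every case test_qa measures whether facts stored during pretraining can actually be extracted for unseen individuals.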

MIT Efficient ML Course Notes and Highlights

Personal highlights
- Memory movement is more expensive than computation
- Network latency is more significant than computation
- With the same memory consumption, we want the network to perform as much computation as possible to increase accuracy

Common techniques: pruning, quantization, distillation
- Different levels of grouping and granularity are used in pruning, quantization, and parallel execution (see the pruning sketch after this list)

Common evaluation and optimization criteria
- Weight significance, activation significance; tensor-wise, channel-wise, batch-wise granularity, ...
- L2 loss, KL divergence, accuracy, latency, number of computations, memory usage

Common ideas for optimizing a neural network structure with the above techniques
- Treat an architecture option as a trainable parameter with an additional loss or KL-divergence term; optimize the architecture params and the regular weights either jointly, or freeze one and optimize the other iteratively
- Iteratively prune / quantize / distill, then fine-tune and re-evaluate in each round
- Ablation study: delete one component at a time and measure the impact
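A minimal sketch of the granularity idea, written in NumPy rather than the course's PyTorch labs: the same magnitude (L2-norm) significance criterion applied element-wise versus channel-wise. The array shapes and the 50% sparsity target are arbitrary choices for illustration.

```python
import numpy as np

def prune_elementwise(weight, sparsity=0.5):
    """Zero out the smallest-magnitude individual weights (finest granularity)."""
    threshold = np.quantile(np.abs(weight), sparsity)
    mask = np.abs(weight) > threshold
    return weight * mask

def prune_channelwise(weight, sparsity=0.5):
    """Zero out whole output channels ranked by L2 norm (coarse granularity,
    hardware-friendly because the remaining channels stay dense)."""
    channel_norms = np.linalg.norm(weight.reshape(weight.shape[0], -1), axis=1)
    n_keep = int(weight.shape[0] * (1 - sparsity))
    keep = np.argsort(channel_norms)[-n_keep:]
    mask = np.zeros(weight.shape[0], dtype=bool)
    mask[keep] = True
    return weight * mask[:, None, None, None]

# Example: a conv layer with 8 output channels, 4 input channels, 3x3 kernels.
w = np.random.randn(8, 4, 3, 3)
print("element-wise sparsity:", np.mean(prune_elementwise(w) == 0))
print("channel-wise sparsity:", np.mean(prune_channelwise(w) == 0))
```

In the iterative schemes above, each pruning round like this would be followed by fine-tuning and a re-evaluation of accuracy, latency, and memory usage before pruning further.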