Notes for ICML Physics of LLM Talk
Source: https://youtu.be/yBL7J0kgldU?si=koiBhKpq3Cp1M8G7

Research methodology
  Deconstruct into building blocks: structure, knowledge, reasoning, etc.
  Study in a controlled way, in an idealized environment: control the data, tweak the parameters.
  Highly repeatable experiments: 100M-parameter models, looking for universal laws; 1x H100, within a day.
  Probe the inner workings.

Knowledge extraction
  Two types of data:
    Biographies of N individuals.
    QA data that extracts facts about the N individuals from their biographies.
  Training data: N biographies + QA data for N/2 of the individuals.
  Test data: QA data for the other N/2 individuals.
  If the model performs well on the biography questions for the other N/2 individuals, it has knowledge-extraction capability.

  Option 1: pretrain on both the N biographies and the N/2 QA data.
    Result: good knowledge extraction.
  Option 2: pretrain on biography data only, then fine-tune on QA.
    Result: bad knowledge extraction.
  Option 3: augment the biography data for each person, pretrain on biographies, then fine-tune on QA.
    Result:
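
Sketch of the knowledge-extraction data split (N biographies plus QA for N/2 individuals in training, QA for the other N/2 held out for testing). This is only an illustration under assumptions: the names `build_dataset` and `split_for_extraction`, the single "birth city" attribute, and the sentence templates are hypothetical and are not taken from the talk.

```python
import random

# Hypothetical attribute values; the talk's actual dataset is not reproduced here.
CITIES = ["Paris", "Lagos", "Lima", "Osaka"]


def build_dataset(n, seed=0):
    """Create n toy biographies and one QA pair per individual."""
    rng = random.Random(seed)
    people = [{"name": f"Person{i}", "city": rng.choice(CITIES)} for i in range(n)]
    bios = [f"{p['name']} was born in {p['city']}." for p in people]
    qa = [(f"Where was {p['name']} born?", p["city"]) for p in people]
    return bios, qa


def split_for_extraction(bios, qa, seed=0):
    """Keep all biographies for training; split QA pairs in half by individual."""
    rng = random.Random(seed)
    idx = list(range(len(qa)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train_ids, test_ids = idx[:half], idx[half:]
    return {
        "train_bios": list(bios),                  # all N biographies
        "train_qa": [qa[i] for i in train_ids],    # QA for N/2 individuals
        "test_qa": [qa[i] for i in test_ids],      # QA for the other N/2
    }


bios, qa = build_dataset(8)
split = split_for_extraction(bios, qa)
```

Accuracy on `split["test_qa"]` is then the measure of knowledge extraction: those individuals' biographies were seen in training, but their questions were not.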