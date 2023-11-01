The words, and in some cases parts of words, such as the plural marker "s" or the prefix "un-", are stored in the model as sets of numbers, much like the pair of map coordinates for a place such as Paris. The first number represents distance and direction east-west from Greenwich. The second does the same for distance from the equator. The numbers also tell us about the relationship between the places. The numbers for Paris and Orléans are close together, because the two places are close together.
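The coordinate analogy can be tried out in a few lines of Python. This is only a rough sketch: the coordinates are approximate, Marseille is an extra place added here for comparison, and the helper names `distance` and `nearest` are made up for illustration. A real model would use far more than two numbers per word.

```python
import math

# Approximate (east-west, north-south) pairs in degrees, in the same order
# as the text: distance from Greenwich first, distance from the equator second.
# Marseille is an added example, not taken from the text.
places = {
    "Paris": (2.35, 48.86),
    "Orléans": (1.91, 47.90),
    "Marseille": (5.37, 43.30),
}

def distance(a, b):
    """Straight-line gap between two places' numbers (in degrees, not km)."""
    return math.dist(places[a], places[b])

def nearest(name):
    """Return the other place whose numbers are closest to those of `name`."""
    others = [p for p in places if p != name]
    return min(others, key=lambda p: distance(name, p))

print(nearest("Paris"))
```

Because the numbers for Paris and Orléans are close together, `nearest("Paris")` picks Orléans over Marseille. No map is needed, only the numbers.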
The basic predictive text in SMS apps, in contrast, only really has one dimension: which word most commonly, in all scenarios, comes next. But crucially an LLM is still, deep down, only figuring out what word, or sequence of words, is most likely to come next. It can do so much better than predictive text for two related reasons: "transformers" and "attention".
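To make the "one dimension" idea concrete, here is a minimal sketch of a predictor that only counts which word most often follows another. It is an assumption about how such a feature might work in principle, not a description of any real SMS app. The sample sentence and the names `build_table` and `predict_next` are invented for the example.

```python
from collections import Counter, defaultdict

def build_table(text):
    """Count, for every word, how often each other word comes right after it."""
    words = text.lower().split()
    table = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        table[current][following] += 1
    return table

def predict_next(table, word):
    """Return the single most common follower of `word`, or None if unseen."""
    counts = table.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Tiny made-up corpus standing in for a user's message history.
corpus = "see you soon . see you later . see you soon . thank you very much"
table = build_table(corpus)

print(predict_next(table, "you"))
```

The predictor always answers the same way for "you", whatever came before it in the message. That lack of context is the limitation the text describes.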
First, the LLM weights all the relationships between all the words it knows, in thousands of dimensions, based on its immense corpus of training data. But then, crucially, it looks at what words have come before and reweights those associations. It is possible that within the model's computations, "Tidy" has a close relationship with "utility" and "tools", and that informs the output.
That reweighting step is what the LLM technicians call a “transformer”, and the principle of re-evaluating the weights based on the salience given to previous bits of the text is what they call “attention”.
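The reweighting can be sketched with a toy version of scaled dot-product attention, the form of attention commonly used in transformer models. Everything specific here is an assumption for illustration: the two-number vectors are invented, "banana" is added as an unrelated word, and the function `attend` is a made-up name. Real models learn their vectors from data and apply further learned transformations that are omitted here.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Similarity between two vectors: larger means more closely related."""
    return sum(x * y for x, y in zip(a, b))

def attend(query, context):
    """Weight each earlier word by its relevance to `query`, then blend them.

    `context` is a list of (word, vector) pairs for the preceding words.
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, vec) / scale for _, vec in context]
    weights = softmax(scores)
    blended = [
        sum(w * vec[i] for w, (_, vec) in zip(weights, context))
        for i in range(len(query))
    ]
    return weights, blended

# Invented 2-number vectors; real models use thousands of dimensions.
context = [
    ("utility", [1.0, 0.2]),
    ("tools", [0.9, 0.1]),
    ("banana", [-0.8, 1.0]),  # unrelated word, added for contrast
]
query = [1.0, 0.0]  # stands in for "Tidy"

weights, blended = attend(query, context)
for (word, _), w in zip(context, weights):
    print(f"{word}: {w:.2f}")
```

With these made-up numbers, "utility" and "tools" receive more weight than "banana", so they contribute more to the blended result that would inform the next word.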
