Detailed Explanation
The context window is the maximum number of tokens an LLM can process at once, typically counting both the input prompt and the tokens the model generates in response. If a conversation or document exceeds the context window, the excess has to be handled somehow: many chat applications drop or summarize the earliest parts of the input, so the model appears to 'forget' them, while some APIs simply reject the oversized request. Modern models offer very large context windows; Google's Gemini 1.5 Pro, for example, can handle over 1 million tokens.
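The 'forgetting' of early input can be illustrated with a minimal sketch of how an application might trim a conversation to fit a token budget. Everything here is an assumption for illustration: `count_tokens` is a hypothetical stand-in that counts whitespace-separated words (real models use subword tokenizers, so actual counts differ), and `fit_to_window` is a hypothetical helper, not any particular library's API.

```python
def count_tokens(text: str) -> int:
    # Hypothetical stand-in tokenizer: one token per whitespace-separated word.
    # Real tokenizers split text into subword units and give different counts.
    return len(text.split())


def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined token count fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so the earliest messages are dropped first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "My name is Ada.",
    "I am planning a trip to Lisbon.",
    "What should I pack?",
]
print(fit_to_window(history, max_tokens=12))
# → ['I am planning a trip to Lisbon.', 'What should I pack?']
```

With a budget of 12 tokens the first message no longer fits, so a model given only the trimmed history would have no way to recall the user's name. Dropping the oldest messages is just one possible policy; summarizing older turns is another common choice.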