Massive Context Windows Are Here - Hype or Game-changer?

A recent trend in the AI landscape is the rapid expansion of context windows, with some models now offering truly massive ones.

But what exactly is a context window, and how does it work? What does a larger context window mean for users and developers? And are there alternative solutions that work around the context window's limitations?

Though larger context windows are widely perceived as beneficial, the reality is not so straightforward. In this article, we will explain what context windows are and answer the questions above.

What is a Context Window?

The 'context window' is a crucial element of a large language model's (LLM's) performance and practical application. It is the number of tokens the LLM can take as input when generating a response; the larger the context window, the more data can fit into the prompt.

A 'token' in a model's context window is a single unit of text. Think of it as a slice of language: a word, a syllable, a prefix, or even a punctuation mark. For example, the word "unimaginable" might be split into the tokens "un", "imagin", and "able". This lets LLMs process language efficiently, handling complex words or phrases they may not have encountered before through combinations of tokens they already know.
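The subword splitting above can be sketched with a toy greedy longest-match tokenizer. Real LLMs use BPE-style vocabularies learned from large corpora, and the tiny vocabulary here is entirely made up for illustration:

```python
# Hypothetical subword vocabulary; real tokenizer vocabularies have
# tens of thousands of entries learned from data.
VOCAB = {"un", "imagin", "able", "re", "think"}

def tokenize(word, vocab):
    """Greedily split `word` into the longest known subword pieces,
    falling back to single characters for unknown spans."""
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest match first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown character: emit it alone
            i += 1
    return tokens

print(tokenize("unimaginable", VOCAB))  # ['un', 'imagin', 'able']
```

A production tokenizer (e.g. BPE) merges the most frequent character pairs during training rather than matching a fixed list, but the effect is the same: rare words decompose into familiar pieces.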

By having a larger context window, more data can be fed to the LLM, increasing its short-term memory. Think of it as the AI's immediate workspace, where it keeps the most recent part of a conversation or text that it's processing.
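This "short-term memory" framing implies a practical consequence: once a conversation exceeds the window, the oldest content must be dropped. A minimal sketch of that truncation, assuming a simple per-message token counter (the whitespace-based counter here is a stand-in for a real tokenizer):

```python
def fit_to_window(messages, max_tokens, count_tokens):
    """Keep the most recent messages whose combined token count
    fits within max_tokens, preserving their original order."""
    kept, total = [], 0
    for msg in reversed(messages):  # walk backwards from the newest
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break  # everything older than this is dropped
        kept.append(msg)
        total += cost
    return list(reversed(kept))

# Naive stand-in for a real tokenizer: one token per word.
count = lambda m: len(m.split())

history = ["hello there friend", "how are you", "fine thanks"]
print(fit_to_window(history, max_tokens=5, count_tokens=count))
# ['how are you', 'fine thanks'] -- the oldest message no longer fits
```

A larger window simply means this truncation happens later, so the model "remembers" more of the exchange.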

LLM Context Window Size Comparison

While there are various large language models available nowadays, each one differs when it comes to context window size. Here is a quick comparison between the context window size of leading LLMs.

  1. GPT-3 by OpenAI has a context window of 2,048 tokens.
  2. GPT-3.5-Turbo by OpenAI has a context window of 4,096 tokens. Another version, GPT-3.5-Turbo-16k, can handle up to 16,384 tokens.
  3. GPT-4 by OpenAI offers a context window of up to 8,192 tokens. Its sibling, GPT-4-32k, has an even larger context window of up to 32,768 tokens.
  4. Claude by Anthropic offers a 9,000-token context window. Claude is still in beta, and the API is available in a limited number of territories as of this writing.
  5. Claude 2 by Anthropic has a much larger context window of up to 100,000 tokens. It can handle a document of roughly 75,000 words in a single prompt.
  6. Large Language Model Meta AI (LLaMA) is a family of LLMs released by Meta AI. The base models ship with 2,048-token (LLaMA) and 4,096-token (Llama 2) windows, but community fine-tunes have extended them beyond 100,000 tokens.
  7. GPT-4-Turbo is OpenAI's latest and most capable model, handling up to 128,000 tokens of context. It has knowledge of world events up to April 2023.
  8. As of this writing, Nous-Capybara, a new open-source model, holds the record for the largest context window of any LLM, at an astonishing 200,000 tokens.



The Importance of a Large Context Window Size

A larger context window improves an LLM's in-context learning: users can supply more and richer examples in the prompt, helping the model produce more accurate responses.

  1. Can Process More Context A larger context window lets you feed the model a wider range of material, from full technical documents and source code repositories to PDFs, legal documents, medical records, student notes and essays, research papers, software logs, and even news updates.
  2. Better Understanding of Data A massive context window means the model can grasp and connect information from parts of the text that are far apart. This is particularly valuable in document summarization, extended AI conversations, and complex problem-solving, where earlier input and context are vital.
  3. Better AI Conversations An LLM with a small context window is likely to "forget" the content of the most recent conversation, causing it to veer off-topic. It may also "forget" its initial instructions after processing a few thousand words and generate responses based only on the most recent information within its context window.

    LLMs with a large context window, on the other hand, can “remember” more effectively, allowing them to deliver responses that are more relevant to the ongoing conversation and tasks.
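The in-context learning benefit described above is easiest to see in a few-shot prompt: the more examples that fit in the window, the more guidance the model gets. A minimal sketch (the task and examples are made up for illustration):

```python
# Hypothetical sentiment-classification examples for a few-shot prompt.
examples = [
    ("The movie was a masterpiece.", "positive"),
    ("I want my money back.", "negative"),
]

def build_prompt(examples, query):
    """Assemble an instruction, worked examples, and the new query
    into a single few-shot prompt string."""
    lines = ["Classify the sentiment of each review."]
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {query}\nSentiment:")  # model completes this
    return "\n\n".join(lines)

prompt = build_prompt(examples, "Absolutely loved it.")
print(prompt)
```

With a 2,048-token window, only a handful of examples fit alongside the query; a 100,000-token window leaves room for hundreds, plus supporting documents.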



The Challenges of Increasing an LLM’s Context Window Size

Sure enough, a massive context window improves the way we use and interact with AI. However, it is not without challenges, from both a user and a developer perspective. While it seems convenient to "just provide the model with all the data and let it figure things out on its own," this approach has issues everyone should know about.

  1. Exhaustive Cost Increasing a model's context window also increases its computational demands and training cost. This results in longer processing times and the need for more powerful hardware and resources. For real-time applications or resource-constrained situations, this can be impractical.

    To put it in context, the estimated training cost for a model like LLaMA by Meta AI, with an initial context length of 2,048 and an embedding size of 4,096, is roughly $3 million. OpenAI's GPT-3, with a 2,048-token context length, cost a little over $4.6 million to train, while the more recent GPT-4, with a context of over 32,000 tokens, has an estimated training cost of over $100 million. These figures show how greatly context window size affects training cost.
  2. Performance and Accuracy Are NOT Guaranteed A large context window does not always translate to better performance and accuracy—some models struggle and tend to repeat or contradict themselves. The benefits vary with the task at hand: some tasks gain nothing from the additional context, particularly those whose data and information fit comfortably in a smaller window.
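The cost point above has a simple root cause: self-attention compute grows quadratically with sequence length. A rough back-of-the-envelope sketch (it counts only the QK^T and attention-times-value products, ignoring MLP layers and implementation details):

```python
def attention_flops(seq_len, d_model):
    """Approximate multiply-adds for one self-attention layer:
    the QK^T score matrix and the attention-weighted value sum
    each cost about seq_len^2 * d_model operations."""
    return 2 * seq_len**2 * d_model

# Doubling the context from 4,096 to 8,192 tokens roughly
# quadruples the attention compute:
ratio = attention_flops(8192, 4096) / attention_flops(4096, 4096)
print(ratio)  # 4.0
```

This quadratic scaling is why a 32k-token window is so much more expensive to train and serve than an 8k one, and why techniques like sparse or sliding-window attention exist.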



In a recent study titled "Lost in the Middle," Nelson F. Liu and colleagues from Stanford demonstrated that even advanced LLMs frequently struggle to extract significant information from their context windows, particularly when it is buried in the middle of the context.

According to their findings, LLMs do best when given fewer, more relevant pieces of information in the context, rather than a huge amount of it.

Takeaways

Massive context windows for LLMs are an exciting new development, but come with tradeoffs. In particular, the computational cost scales rapidly, risk of hallucinations goes up, and models can struggle to extract relevant details from too much context.

At the same time, further innovations may mitigate some of the current downsides of large context windows. Techniques that help models focus on the most salient parts could unlock more benefits while reducing spurious outputs. The future capabilities of LLMs will depend on both hardware advances and better algorithmic approaches to utilizing ever-growing amounts of knowledge.

While massive context windows are an exciting milestone, their practical value is complex. We are still discovering the right way to leverage these capabilities for beneficial and safe deployment. Strategic use of context and further research will be key in realizing their promise while avoiding pitfalls.

References:

  • https://www.e2enetworks.com/blog/the-competitive-advantage-of-100k-context-window-in-llms
  • https://www.hopsworks.ai/dictionary/context-window-for-llms
  • https://www.techtarget.com/whatis/definition/context-window
  • https://matthewdwhite.medium.com/the-allure-of-larger-context-windows-a66ed5d6420b
  • https://www.respell.ai/post/what-are-context-windows-and-what-do-they-do
  • https://www.linkedin.com/pulse/whats-context-window-anyway-caitie-doogan-phd
  • https://www.pinecone.io/blog/why-use-retrieval-instead-of-larger-context/
  • https://ai.plainenglish.io/context-window-size-and-language-model-performance-balancing-act-2ae2964e3ec1
  • https://arxiv.org/pdf/2307.03172.pdf
  • https://openai.com/blog/new-models-and-developer-products-announced-at-devday
  • https://huggingface.co/NousResearch/Nous-Capybara-34B
  • https://twitter.com/LouisKnightWebb/status/1724039951610761343
