Context engineering is easy with Elasticsearch; we will demonstrate it using Mistral Chat completions.
Throwing axes from a skateboard
I have vivid memories of playing Super Adventure Island, a Super Nintendo game that featured a caveman character on a skateboard throwing stone axes at enemies. However, none of the LLMs I asked at the time of this writing could tell me what game that was. In this blog, I will show you how to influence LLMs using some context engineering.
Why is this important? Enterprises often deal with complex domain-specific knowledge: product catalogs, internal documentation, regulations, or customer support data. Out-of-the-box LLMs might give generic answers. By applying context engineering, you ensure the model's responses are grounded in your company's data, increasing reliability and trust. Of course, it works for video game titles as well.
Contextualizing context engineering
We can describe context as a set of attributes intended to skew or influence the data passed to an LLM in order to ground its responses. Context engineering is then the practice of carefully designing, structuring, and selecting these contextual attributes so that the LLM produces outputs that are more accurate, relevant, or aligned with a specific goal.
As Dexter Horthy outlines in his 3rd principle of 12-Factor Agents, it's important to own your context to ensure LLMs generate the best outputs possible.

It’s effectively a combination of prompt design, data curation, and system instruction tuning, aimed at controlling the behavior of the model without modifying its underlying parameters.
Using Mistral Chat completions in Elasticsearch
You can run all the code below in the mistral-chat-completions.ipynb notebook. We are going to ask Mistral about Super Nintendo games from the 90s, and then try to skew its response towards mentioning a lesser-known game called Super Adventure Island.
Install Elasticsearch
Get Elasticsearch up and running either by creating a cloud deployment (instructions here) or by running it in Docker (instructions here).
Assuming you are using the cloud deployment, grab the API key and the Elasticsearch host for the deployment as mentioned in the instructions. We will use them later.
Get a Mistral API key
To call Mistral models from Elasticsearch you need:
- A Mistral account (La Plateforme / console).
- An API key created in your console (the key is displayed once — copy it and store it securely).
Mistral also announced a free API tier and pricing adjustments, so there’s often a free or low-cost way to start.
Testing
We can use Dev Tools to test the _inference endpoint creation and the _stream API. The inference endpoint can be created as follows:
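A minimal sketch of the endpoint creation, assuming an endpoint name of mistral_chat_completions (check the Inference API docs for the full set of service settings):

```
PUT _inference/chat_completion/mistral_chat_completions
{
  "service": "mistral",
  "service_settings": {
    "api_key": "<MISTRAL_API_KEY>",
    "model": "mistral-large-latest"
  }
}
```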
Now let's ask Mistral about Super Nintendo games from the 90s, but let's pick one that is not popular at all. There was a game called Super Adventure Island that featured a character on a skateboard throwing stone axes. We will use the _inference/chat_completion API to stream the response.
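A sketch of the streaming request in Dev Tools (the question wording is ours; chat_completion endpoints are consumed through the _stream API):

```
POST _inference/chat_completion/mistral_chat_completions/_stream
{
  "messages": [
    {
      "role": "user",
      "content": "What SNES games from the 90s had a character on a skateboard throwing stone axes?"
    }
  ]
}
```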
Unsurprisingly, the answer will not mention Super Adventure Island, since it's not a popular game; instead, Mistral answers with games like Joe & Mac (featuring stone axes but not skateboards) and tells you that there is no game featuring both. Note: This is not necessarily Mistral's fault, as none of the other LLMs we tested answered with the correct game.
The response structure looks like the following, with:
- event: The type of event being sent (here, always "message").
- data: The JSON payload with the actual chunk of the response.
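For illustration, an abbreviated stream might look like this (IDs, text, and token counts are made up and trimmed; chunks follow the OpenAI-compatible format):

```
event: message
data: {"id":"…","object":"chat.completion.chunk","model":"mistral-large-latest","choices":[{"index":0,"delta":{"content":"Joe & Mac"}}]}

event: message
data: {"id":"…","object":"chat.completion.chunk","model":"mistral-large-latest","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":24,"completion_tokens":187,"total_tokens":211}}
```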
Note that the final object also contains metadata like finish_reason and usage. Our notebook will take care of parsing this structure and displaying it like text from a chat.
Running the example
First, you will need the ELASTICSEARCH_HOST, ELASTICSEARCH_API_KEY, and MISTRAL_API_KEY credentials set in the mistral-chat-completions.ipynb notebook.
The model we are using is mistral-large-latest, set in MISTRAL_MODEL; the list of available models can be found in Mistral's Models Overview.
The notebook has code to send requests to Elasticsearch, parse the response, and stream it in the console. Let's ask the question again, now using Python.
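Here is a minimal sketch of that flow, using requests to call the _stream API and print chunks as they arrive (the endpoint name and helper function are ours; the notebook has the authoritative version):

```python
import json
import os

import requests

ES_HOST = os.environ["ELASTICSEARCH_HOST"]
ES_API_KEY = os.environ["ELASTICSEARCH_API_KEY"]


def stream_chat(messages):
    """Stream a chat completion from Elasticsearch's _inference API."""
    response = requests.post(
        f"{ES_HOST}/_inference/chat_completion/mistral_chat_completions/_stream",
        headers={
            "Authorization": f"ApiKey {ES_API_KEY}",
            "Content-Type": "application/json",
        },
        json={"messages": messages},
        stream=True,
    )
    response.raise_for_status()
    for line in response.iter_lines(decode_unicode=True):
        # Server-sent events: only 'data:' lines carry response chunks
        if not line or not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # defensive: some streams end with a sentinel
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                print(content, end="", flush=True)
    print()


stream_chat([
    {
        "role": "user",
        "content": "What SNES games from the 90s had a character "
                   "on a skateboard throwing stone axes?",
    }
])
```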

Engineering the context for Mistral with Elasticsearch
Alongside the notebook, we have a dataset called snes_games.csv containing all 1,700+ Super Nintendo games, including title, publisher, category, year of release in both the United States and Japan, and a short description. This will serve as our internal database, which we will index into Elasticsearch like so:
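A sketch of the index mapping; the field names are assumptions based on the CSV columns described above (the notebook defines the real schema). Note the copy_to into the semantic_text field:

```
PUT snes_games
{
  "mappings": {
    "properties": {
      "title":       { "type": "text",    "copy_to": "description_semantic" },
      "publisher":   { "type": "keyword" },
      "category":    { "type": "text",    "copy_to": "description_semantic" },
      "release_us":  { "type": "keyword" },
      "release_jp":  { "type": "keyword" },
      "description": { "type": "text",    "copy_to": "description_semantic" },
      "description_semantic": { "type": "semantic_text" }
    }
  }
}
```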
Note that we are copying the title, category, and description fields to description_semantic (of type semantic_text). This is all we need to generate sparse vector embeddings for our fields, without requiring separate embedding models or complex vector operations, since semantic_text uses ELSER under the hood.
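With the mapping in place, loading the CSV can be as simple as a bulk ingest. A minimal sketch with the Python client (column names are assumptions, as above):

```python
import csv
import os

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(
    os.environ["ELASTICSEARCH_HOST"],
    api_key=os.environ["ELASTICSEARCH_API_KEY"],
)

with open("snes_games.csv", newline="", encoding="utf-8") as f:
    # Each CSV row becomes one document; ELSER embeddings for
    # description_semantic are generated at ingest time via copy_to
    helpers.bulk(
        es,
        ({"_index": "snes_games", "_source": row} for row in csv.DictReader(f)),
    )
```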
Semantic search
Once we have indexed our dataset (follow the notebook for details), we are ready to search the index. There are many ways to combine lexical search with semantic search, but for this example we are going to use only the description_semantic field to issue a semantic search query:
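A sketch of the query, using the question from the notebook's example:

```
GET snes_games/_search
{
  "size": 5,
  "query": {
    "semantic": {
      "field": "description_semantic",
      "query": "What SNES games had a character on a skateboard throwing axes?"
    }
  }
}
```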
The initial results are encouraging; the same query, "What SNES games had a character on a skateboard throwing axes?", successfully identifies the game we're seeking, including the previously unknown (to me) Super Adventure Island II, released as 高橋名人の大冒険島II in Japan!
Now that we have more domain knowledge available, we are ready to feed this data to Mistral to help it produce better responses, without completely taking over Mistral's ability to reason.
RAG chat as part of context engineering
Here is a simple RAG-augmenting function to include our search results (the full document _source) in the context:
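A minimal sketch of such a function, reusing the stream_chat helper and the es client defined above (the notebook's version may differ):

```python
import json


def build_rag_messages(question, es, index="snes_games", size=5):
    """Retrieve relevant games and pack their full _source into the context."""
    results = es.search(
        index=index,
        size=size,
        query={"semantic": {"field": "description_semantic", "query": question}},
    )
    # The full documents from the top hits become grounding context
    context = "\n".join(
        json.dumps(hit["_source"]) for hit in results["hits"]["hits"]
    )
    return [
        {
            "role": "system",
            "content": "Use the following SNES game records to ground your "
                       "answer, but reason freely if they are not relevant:\n"
                       + context,
        },
        {"role": "user", "content": question},
    ]


question = "What SNES games had a character on a skateboard throwing axes?"
stream_chat(build_rag_messages(question, es))
```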
Asking again, we get a much more concise and grounded response:
Conclusion
We have covered the core of context engineering with the user prompt, instructions/system prompt, and RAG, but this example can be extended to include short- and long-term memory as well, since both can easily be represented as documents in a separate index in Elasticsearch itself.
Learn more about what we covered in this blog:
Elasticsearch Inference Endpoint API