Class AssistantService

java.lang.Object
ecmwf.common.ai.AssistantService

public class AssistantService extends Object
AssistantService provides an AI assistant for OpenECPDS documentation.

This service uses a two-model setup:

  • FAST model: used exclusively to rewrite user questions into optimized search queries for RAG (Retrieval-Augmented Generation).
  • DEEP model: always streams the final answer to the user, based on documentation segments retrieved from the RAG index.
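The two-model flow described above can be sketched as follows. This is a minimal illustration, not the AssistantService implementation: `TwoModelSketch` and its three functional fields are hypothetical stand-ins for the FAST model, the RAG index, and the DEEP model, and passing the original question (rather than the rewritten query) to the answering step is an assumption.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical sketch of a two-model RAG flow; all names here are illustrative. */
final class TwoModelSketch {
    private final Function<String, String> fastRewriter;                  // stand-in for the FAST model
    private final Function<String, List<String>> retriever;               // stand-in for the RAG index
    private final BiFunction<String, List<String>, String> deepAnswerer;  // stand-in for the DEEP model
    private final boolean rewriteEnabled;                                 // mirrors the idea of a rewrite toggle

    TwoModelSketch(Function<String, String> fastRewriter,
                   Function<String, List<String>> retriever,
                   BiFunction<String, List<String>, String> deepAnswerer,
                   boolean rewriteEnabled) {
        this.fastRewriter = fastRewriter;
        this.retriever = retriever;
        this.deepAnswerer = deepAnswerer;
        this.rewriteEnabled = rewriteEnabled;
    }

    String answer(String question) {
        // Step 1: optionally turn the question into a search-friendly query.
        String query = rewriteEnabled ? fastRewriter.apply(question) : question;
        // Step 2: fetch documentation segments for that query.
        List<String> segments = retriever.apply(query);
        // Step 3: answer the original question, grounded in the retrieved segments (assumption).
        return deepAnswerer.apply(question, segments);
    }
}
```

Keeping the rewriter separate from the answering model means the cheaper model only ever influences what is retrieved, never the text shown to the user.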

The class supports:

  • Streaming AI responses with cancellation support.
  • RAG retrieval using vector embeddings and Lucene-based indexing.
  • Query rewriting for improved search coverage.
  • Segment reranking and hierarchical filtering to produce concise, grounded answers.
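As an illustration of retrieval over vector embeddings, the sketch below ranks segments by cosine similarity to a query vector and keeps the top k. Cosine similarity is a common choice for this step, but whether AssistantService uses it is an assumption; `CosineRankSketch` and its methods are hypothetical and do not reflect the class's actual reranking or hierarchical filtering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: rank embedded segments by cosine similarity to a query. */
final class CosineRankSketch {

    /** Cosine similarity of two equal-length vectors; returns 0 if either has zero norm. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the indices of the k segments most similar to the query, best first. */
    static List<Integer> topK(float[] query, List<float[]> segments, int k) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < segments.size(); i++) {
            indices.add(i);
        }
        indices.sort(Comparator
                .comparingDouble((Integer i) -> cosine(query, segments.get(i)))
                .reversed());
        return new ArrayList<>(indices.subList(0, Math.min(k, indices.size())));
    }
}
```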

Thread safety:

Author:
Laurent Gougeon
  • Field Details

  • Constructor Details

    • AssistantService

      public AssistantService()
      Default constructor using base configuration from Cnf.

      Initializes the FAST, DEEP, and embedding models and builds the RAG index if needed.

    • AssistantService

      public AssistantService(String baseUrl, String fastModelName, String deepModelName, String embeddingModelName, Path docsPath, Predicate<Path> filter, String deepSystemPrompt)
      Full constructor.
      Parameters:
      baseUrl - base URL for Ollama API
      fastModelName - model name for the FAST model (query rewriting)
      deepModelName - model name for the DEEP model (answer streaming)
      embeddingModelName - model name for the embedding model
      docsPath - root path to documentation files
      filter - predicate to select which files to index
      deepSystemPrompt - system prompt text to guide DEEP model answers; if null, defaults to DEFAULT_DEEP_PROMPT
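A possible way to build the `filter` argument is sketched below. `DocsFilterSketch` is a hypothetical helper, and restricting the index to `.md` and `.html` files is an assumption chosen for illustration. The commented constructor call uses placeholder values: the model names and docs path are invented, and the URL assumes a local Ollama instance on its default port.

```java
import java.nio.file.Path;
import java.util.Locale;
import java.util.function.Predicate;

/** Hypothetical helper that builds a file filter for documentation indexing. */
final class DocsFilterSketch {

    /** Accepts paths whose file name ends in .md or .html (case-insensitive). */
    static Predicate<Path> markdownOrHtml() {
        return path -> {
            Path fileName = path.getFileName();
            if (fileName == null) {
                return false; // e.g. a root path with no file name component
            }
            String name = fileName.toString().toLowerCase(Locale.ROOT);
            return name.endsWith(".md") || name.endsWith(".html");
        };
    }
}

// Illustrative call only; every argument value below is a placeholder:
//
// AssistantService service = new AssistantService(
//         "http://localhost:11434",          // baseUrl (assumed local Ollama)
//         "fast-model",                      // fastModelName (placeholder)
//         "deep-model",                      // deepModelName (placeholder)
//         "embedding-model",                 // embeddingModelName (placeholder)
//         Paths.get("docs"),                 // docsPath (placeholder)
//         DocsFilterSketch.markdownOrHtml(), // filter
//         null);                             // deepSystemPrompt: null falls back to DEFAULT_DEEP_PROMPT
```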
  • Method Details

    • askStreaming

      public String askStreaming(List<ChatConversation.Message> history, String question, com.fasterxml.jackson.databind.JsonNode context, Consumer<String> consumer, AtomicBoolean cancelled)
      Main entry point for streaming answers from the DEEP model.

      Rewrites the question with the FAST model, retrieves relevant documentation segments from the RAG index, then reranks, filters, and compresses those segments before streaming the answer.

      If REWRITE_PROMPT_WITH_FAST_MODEL is false, rewriting is skipped and the original question is used directly for RAG retrieval.

      Parameters:
      history - multi-turn chat history
      question - user question
      context - optional additional user context as JSON
      consumer - callback that receives tokens as they are streamed
      cancelled - atomic flag used to cancel streaming
      Returns:
      full answer as a string
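The `consumer` and `cancelled` parameters suggest a calling pattern like the one sketched below. `StreamingSketch.stream` is a hypothetical stand-in that only mimics the shape of the contract, and it makes two assumptions not stated in the documentation: that the flag is checked before each token is delivered, and that a cancelled call returns the text streamed so far.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Hypothetical stand-in showing a token consumer combined with a cancellation flag. */
final class StreamingSketch {

    /**
     * Delivers tokens to the consumer one by one until they run out or the
     * flag is set, and returns everything delivered (assumed behaviour).
     */
    static String stream(List<String> tokens, Consumer<String> consumer, AtomicBoolean cancelled) {
        StringBuilder full = new StringBuilder();
        for (String token : tokens) {
            if (cancelled.get()) {
                break; // stop as soon as the caller requests cancellation
            }
            consumer.accept(token);
            full.append(token);
        }
        return full.toString();
    }
}
```

A caller would typically share the same `AtomicBoolean` with whatever handles a "stop" request, for example a UI thread or a request handler that sets it to true when the user disconnects.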
    • shutdown

      public void shutdown()
      Shuts down the streaming consumer executor.

      Should be called at application shutdown to clean up resources.
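One common way to shut down an executor cleanly is sketched below. `ShutdownSketch` is a hypothetical class, not the AssistantService implementation; the single-thread executor and the five-second grace period are assumptions made for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of an orderly executor shutdown. */
final class ShutdownSketch {
    // Stand-in for a streaming consumer executor (assumed single-threaded).
    private final ExecutorService consumerExecutor = Executors.newSingleThreadExecutor();

    void shutdown() {
        consumerExecutor.shutdown(); // stop accepting new tasks
        try {
            // Give in-flight tasks a short grace period, then force termination.
            if (!consumerExecutor.awaitTermination(5, TimeUnit.SECONDS)) {
                consumerExecutor.shutdownNow();
            }
        } catch (InterruptedException e) {
            consumerExecutor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    boolean isShutdown() {
        return consumerExecutor.isShutdown();
    }
}
```

In an application, a call such as `Runtime.getRuntime().addShutdownHook(new Thread(service::shutdown))` is one way to make sure the cleanup runs when the JVM exits.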