
My LlamaIndex Query Engine Handles Complex Data

📖 12 min read · 2,381 words · Updated Mar 26, 2026

Hey there, tech fam! Nina Torres back again from agntbox.com. Today, we’re exploring something that’s been making waves in the AI community, especially for those of us who dabble in creating more personalized, context-aware AI experiences. I’m talking about LlamaIndex, and specifically, how its query engine has evolved to handle increasingly complex data interactions. Forget just throwing a PDF at it and hoping for the best; we’re looking at how to make it truly think across multiple, different data sources.

My inbox, like many of yours, has been flooded with questions about how to move beyond basic RAG (Retrieval-Augmented Generation) setups. People are building incredible applications, but they hit a wall when their AI needs to answer questions that require synthesizing information from a database, a set of documents, and a live API. That’s not a simple “find this text” problem anymore. It’s a “figure out what to ask where, then combine the answers” problem. And honestly, it’s a problem I’ve been wrestling with on a personal project for a client who runs a niche e-commerce site – imagine an AI assistant that needs to check product inventory (database), pull up user reviews (documents), and suggest related items based on live pricing (API). My initial LlamaIndex setup, while decent for the review part, completely fell short on the multi-source synthesis.

So, today, we’re going to explore LlamaIndex’s advanced query engine capabilities for multi-source data synthesis. This isn’t just about indexing different types of data; it’s about building a query engine that intelligently orchestrates information retrieval across those diverse sources to answer complex, multi-part questions. We’re talking about going from a simple retriever to a sophisticated routing and planning system.

The Evolution of “Asking a Question” in LlamaIndex

Remember when LlamaIndex first burst onto the scene? It was awesome for turning unstructured data into something an LLM could chat with. You’d load documents, build an index, and ask away. Simple, effective. But life isn’t always simple. Our data isn’t always in neat text files. It’s scattered across SQL databases, NoSQL stores, APIs, and a bunch of PDFs.

The initial approach often involved creating separate indexes for each data source. You’d have your document index, your SQL index, maybe an API tool. Then, you’d manually decide which one to query based on the user’s input. That works for straightforward cases, but what about a question like, “What are the average ratings for products launched in the last quarter, and do any of them have open support tickets?”

That question requires:

  1. Querying a database for product launch dates and average ratings.
  2. Querying another system (maybe a separate document store or API) for open support tickets related to those products.
  3. Synthesizing both pieces of information to provide a coherent answer.

This is where LlamaIndex has really stepped up its game, moving from isolated retrievers to integrated query engines that can understand and act on such multi-faceted queries.

Beyond Basic Retrieval: Query Planning and Routing

The magic happens with what LlamaIndex calls “query planning” and “routing.” Instead of just fetching chunks of text, the query engine, often powered by an LLM itself, first tries to understand the user’s intent and then decides the best strategy to answer it. This involves:

  • Identifying Sub-Questions: Breaking down a complex question into smaller, independent questions.
  • Matching Sub-Questions to Tools/Indexes: Determining which specific data source (e.g., a SQL database, a vector index of documents, an API endpoint) is best suited to answer each sub-question.
  • Executing Queries: Running those sub-queries against the chosen tools.
  • Synthesizing Results: Taking the individual answers and combining them into a single, thorough response.
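Stripped of LlamaIndex specifics, the loop above can be sketched in a few lines of plain Python. Everything here is a toy stand-in: keyword matching plays the role of the LLM’s routing decision, and string concatenation plays the role of synthesis.

```python
# Toy sketch of plan -> route -> execute -> synthesize.
# Keyword matching stands in for the LLM's routing decision.

def price_source(q: str) -> str: return "price: $129.99"
def review_source(q: str) -> str: return "reviews: mostly positive"
def stock_source(q: str) -> str: return "stock: In Stock"

TOOLS = {
    "price": price_source,    # e.g. a SQL database
    "review": review_source,  # e.g. a vector index of documents
    "stock": stock_source,    # e.g. a live inventory API
}

def answer(question: str) -> str:
    # 1. Identify sub-questions (here: which keywords appear)
    # 2. Match each sub-question to a tool and execute it
    partial_answers = [
        tool(question)
        for keyword, tool in TOOLS.items()
        if keyword in question.lower()
    ]
    # 3. Synthesize (here: concatenation; an LLM in the real thing)
    return "; ".join(partial_answers)

print(answer("What's the price and is it in stock?"))
# -> price: $129.99; stock: In Stock
```

The real query engines replace every one of those stand-ins with an LLM call, but the control flow is recognizably the same.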

This isn’t just a theoretical concept; it’s practically implemented through things like LlamaIndex’s `QueryPipeline`, `RouterQueryEngine`, and the ability to define custom `Tools`.

Setting Up for Multi-Source Synthesis: A Practical Example

Let’s walk through a simplified version of my e-commerce client’s problem. Imagine we have three data sources:

  1. Product Database: A SQL database with product IDs, names, prices, and launch dates.
  2. Review Documents: A collection of user reviews (PDFs, text files) for each product.
  3. Inventory API: A simple API that returns current stock levels for a given product ID.

Our goal is to answer a question like: “Tell me about ‘Fancy Widget Pro’ – what’s its price, what do users say about it, and is it currently in stock?”

Step 1: Preparing Your Data Sources and Tools

First, we need to make each data source accessible to LlamaIndex as a “tool.”

A. SQL Database Tool

We’ll wrap LlamaIndex’s text-to-SQL engine, `NLSQLTableQueryEngine`, in a `QueryEngineTool` for this. Assume a simple SQLite database named `products.db` with a `products` table.


from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine
from llama_index.core.tools import QueryEngineTool
from sqlalchemy import create_engine, text

# Create a dummy database and table for demonstration
engine = create_engine("sqlite:///products.db")
with engine.connect() as connection:
    connection.execute(text("""
        CREATE TABLE IF NOT EXISTS products (
            product_id TEXT PRIMARY KEY,
            name TEXT,
            price REAL,
            launch_date TEXT
        );
    """))
    connection.execute(text("""
        INSERT OR IGNORE INTO products (product_id, name, price, launch_date) VALUES
        ('FWP001', 'Fancy Widget Pro', 129.99, '2025-01-15'),
        ('MGS002', 'Mega Gadget Super', 249.00, '2024-11-01');
    """))
    connection.commit()

sql_database = SQLDatabase(engine=engine)

# Text-to-SQL query engine scoped to the 'products' table
sql_query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["products"],
)

sql_tool = QueryEngineTool.from_defaults(
    query_engine=sql_query_engine,
    name="sql_tool",
    description=(
        "Useful for querying product information such as name, price, and launch date. "
        "Input should be a natural language question about the 'products' table."
    ),
)

Nina’s Take: This text-to-SQL tooling is a lifesaver. Before it, I was writing custom functions to interface with databases, and it felt like I was reinventing the wheel every time. Wrapping `NLSQLTableQueryEngine` in a `QueryEngineTool` makes it so much cleaner, though you still need to be mindful of prompt engineering to ensure the LLM generates good SQL queries.
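If the router ever returns odd answers, it helps to rule out the data layer first. Here’s a standalone sanity check using Python’s built-in `sqlite3`, recreating the same dummy rows in memory and running the kind of query we hope the LLM generates:

```python
import sqlite3

# Recreate the demo table in memory so this check is self-contained
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE products (
    product_id TEXT PRIMARY KEY, name TEXT, price REAL, launch_date TEXT)""")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [("FWP001", "Fancy Widget Pro", 129.99, "2025-01-15"),
     ("MGS002", "Mega Gadget Super", 249.00, "2024-11-01")],
)

# The kind of SQL we hope the LLM generates for a price question
row = conn.execute(
    "SELECT price FROM products WHERE name = ?", ("Fancy Widget Pro",)
).fetchone()
print(row[0])  # -> 129.99
```

If a hand-written query like this fails, no amount of prompt engineering on the tool description will save you.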

B. Review Document Index Tool

For reviews, we’ll create a vector index from some dummy review documents.


import os
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool

# Create a dummy reviews directory and files
os.makedirs("reviews", exist_ok=True)
with open("reviews/FWP001_reviews.txt", "w") as f:
 f.write("Fancy Widget Pro reviews:\n")
 f.write("1. 'Absolutely love the Fancy Widget Pro! So fast and reliable.' - User A\n")
 f.write("2. 'A bit pricey, but worth it for the quality.' - User B\n")
 f.write("3. 'Improved my workflow significantly.' - User C\n")

with open("reviews/MGS002_reviews.txt", "w") as f:
 f.write("Mega Gadget Super reviews:\n")
 f.write("1. 'The battery life is incredible!' - User X\n")
 f.write("2. 'Confusing setup process.' - User Y\n")

# Load documents and create an index
documents = SimpleDirectoryReader("reviews").load_data()
review_index = VectorStoreIndex.from_documents(documents)
review_query_engine = review_index.as_query_engine()

review_tool = QueryEngineTool.from_defaults(
    query_engine=review_query_engine,
    name="review_tool",
    description=(
        "Useful for answering questions about product reviews from user feedback documents. "
        "Input should be a specific product name or ID."
    ),
)

Nina’s Take: The `QueryEngineTool` is your go-to for wrapping any existing LlamaIndex query engine into a tool. It’s incredibly flexible. Just make sure your `description` is super clear so the LLM knows when to use it.

C. Inventory API Tool

For the API, we’ll simulate a simple Python function as an API call using `FunctionTool`. Since the `RouterQueryEngine` we’ll build in Step 2 expects `QueryEngineTool` instances, we’ll also wrap the same function in a minimal custom query engine.


from llama_index.core.query_engine import CustomQueryEngine
from llama_index.core.tools import FunctionTool, QueryEngineTool

# Simulate an inventory API
def get_stock_level(product_id: str) -> str:
    """
    Returns the current stock level for a given product ID.

    Args:
        product_id (str): The ID of the product.

    Returns:
        str: "In Stock", "Low Stock", or "Out of Stock".
    """
    if product_id == "FWP001":
        return "In Stock"
    elif product_id == "MGS002":
        return "Low Stock"
    else:
        return "Out of Stock"

# A FunctionTool exposes the function directly; the natural fit
# when an agent is doing the calling
inventory_fn_tool = FunctionTool.from_defaults(
    fn=get_stock_level,
    description=(
        "Useful for checking the current stock level of a product. "
        "Input should be a product ID (e.g., 'FWP001')."
    ),
)

# For the router, wrap the same function in a tiny query engine
class InventoryQueryEngine(CustomQueryEngine):
    def custom_query(self, query_str: str) -> str:
        return get_stock_level(query_str.strip())

inventory_tool = QueryEngineTool.from_defaults(
    query_engine=InventoryQueryEngine(),
    name="inventory_tool",
    description=(
        "Useful for checking the current stock level of a product. "
        "Input should be a product ID (e.g., 'FWP001')."
    ),
)

Nina’s Take: `FunctionTool` is pure genius. You can wrap almost any Python function and expose it to your LLM. This is how you connect to actual APIs, internal services, or even run local scripts. It’s a significant shift for bringing external actions into your AI’s capabilities.
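To build intuition for what a function-wrapping tool has to work with, here’s a rough stdlib-only illustration (my own sketch of the idea, not LlamaIndex’s actual internals): the function’s name, docstring, and signature are the raw material for the tool’s metadata.

```python
import inspect

def get_stock_level(product_id: str) -> str:
    """Returns the current stock level for a given product ID."""
    return {"FWP001": "In Stock", "MGS002": "Low Stock"}.get(product_id, "Out of Stock")

# Roughly the idea: derive a name, description, and parameter list
# from the function itself, so the LLM knows how to call it
tool_metadata = {
    "name": get_stock_level.__name__,
    "description": inspect.getdoc(get_stock_level),
    "parameters": list(inspect.signature(get_stock_level).parameters),
}
print(tool_metadata["name"])        # -> get_stock_level
print(tool_metadata["parameters"])  # -> ['product_id']
```

This is also why good docstrings and type hints pay off: they flow straight into what the LLM sees.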

Step 2: Building the Router Query Engine

Now that we have our individual tools, we need a way for LlamaIndex to intelligently choose and use them. This is where the `RouterQueryEngine` comes in. It uses an LLM to decide which tool (or sequence of tools) to use based on the user’s query.


from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMMultiSelector
from llama_index.llms.openai import OpenAI  # assumes your OpenAI API key is configured

# Initialize LLM (e.g., OpenAI) for routing and synthesis
llm = OpenAI(model="gpt-3.5-turbo")  # Or gpt-4, depends on your needs and budget

# Combine all tools
all_tools = [sql_tool, review_tool, inventory_tool]

# Create the RouterQueryEngine; a multi-selector lets it pick more
# than one tool per question and summarize the combined answers
router_query_engine = RouterQueryEngine(
    selector=LLMMultiSelector.from_defaults(llm=llm),
    query_engine_tools=all_tools,
    verbose=True,  # Set to True to see the routing decisions
)

Nina’s Take: `LLMSingleSelector` is the default and picks exactly one tool per query, which is fine for pure routing; for multi-part questions that need several tools at once, `LLMMultiSelector` can select multiple tools and summarize their answers. For more complex scenarios, LlamaIndex offers other selectors, or you can even build a custom one. The `verbose=True` is absolutely essential during development – it shows you what the LLM is “thinking” when it tries to route your query, which is gold for debugging.

Step 3: Querying the Multi-Source Engine

Let’s ask our complex question:


response = router_query_engine.query("Tell me about 'Fancy Widget Pro' – what's its price, what do users say about it, and is it currently in stock?")
print(response)

When you run this, you’ll see the `verbose` output showing the LLM’s thought process:

  • It will identify that “price” needs the `sql_tool`.
  • “What users say” needs the `review_tool`.
  • “Currently in stock” needs the `inventory_tool`.

The LLM will then execute each of these tools, get their respective answers, and finally synthesize them into a single, coherent response. It’s like having a mini-orchestra conductor for your data!

A typical output might look something like:


> Selecting query engine tool: sql_tool
> Selecting query engine tool: review_tool
> Selecting query engine tool: inventory_tool

The Fancy Widget Pro (product ID FWP001) costs $129.99. Users generally love it, describing it as "fast and reliable" and stating it "improved their workflow significantly," though some mention it's "a bit pricey." It is currently In Stock.

This is a vast improvement over manually querying each source. The LLM handles the orchestration, making the interaction much more natural and powerful.

Advanced Scenarios: Query Pipelines and Recursive Routing

The `RouterQueryEngine` is fantastic for choosing a single tool to answer a sub-question. But what if one tool’s output needs to be fed as input to another? Or if you need a specific sequence of operations that can’t be handled by a simple single-tool selection?

This is where `QueryPipeline` becomes incredibly powerful. It allows you to chain together multiple components, including tools, retrievers, LLMs, and even other query engines, in a directed acyclic graph (DAG) fashion.

Imagine a scenario: “Find all products launched in the last 6 months that have low stock and positive reviews.”

  1. Query SQL database for products launched in the last 6 months.
  2. For each product ID from step 1, check stock level via API. Filter for “Low Stock”.
  3. For remaining products, query review documents for sentiment. Filter for “positive reviews”.
  4. Synthesize the final list.

This is a multi-step, conditional process. You could build a `QueryPipeline` for this, where the output of one step becomes the input to the next, possibly with an LLM in between to process intermediate results or decide on the next step.
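Here’s what that four-step flow looks like as plain Python, with hypothetical dummy dictionaries standing in for the three sources (I’ve adjusted the launch dates and stock levels so one product survives every filter):

```python
from datetime import date, timedelta

# Hypothetical stand-ins for the three data sources
PRODUCTS = [
    {"id": "FWP001", "name": "Fancy Widget Pro", "launched": date(2026, 1, 15)},
    {"id": "MGS002", "name": "Mega Gadget Super", "launched": date(2024, 11, 1)},
]
STOCK = {"FWP001": "Low Stock", "MGS002": "In Stock"}
SENTIMENT = {"FWP001": "positive", "MGS002": "mixed"}

def pipeline(today: date) -> list:
    # Step 1: products launched in the last ~6 months (SQL in the real setup)
    recent = [p for p in PRODUCTS if today - p["launched"] <= timedelta(days=182)]
    # Step 2: keep only low-stock items (inventory API in the real setup)
    low = [p for p in recent if STOCK[p["id"]] == "Low Stock"]
    # Step 3: keep only positively reviewed items (review index in the real setup)
    liked = [p for p in low if SENTIMENT[p["id"]] == "positive"]
    # Step 4: synthesize the final list
    return [p["name"] for p in liked]

print(pipeline(date(2026, 3, 1)))  # -> ['Fancy Widget Pro']
```

A `QueryPipeline` expresses the same chain declaratively, with each list comprehension replaced by a real tool call and, where needed, an LLM step in between.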

While we won’t build a full `QueryPipeline` example here (that’s a whole article on its own!), understand that it provides the flexibility for truly complex workflow automation within your LlamaIndex application. You can even embed a `RouterQueryEngine` within a `QueryPipeline` for recursive decision-making.

Actionable Takeaways for Your Next AI Project

  1. Map Your Data Sources: Before you write a single line of code, clearly identify all the data sources your AI needs to interact with. Understand their structure (structured, unstructured, API-driven).
  2. Define Clear Tools: For each data source, create a specific LlamaIndex `Tool` (e.g., `SQLTableRetrieverTool`, `QueryEngineTool`, `FunctionTool`). Crucially, write clear and descriptive `metadata` for each tool. This description is what the LLM uses to decide when to invoke it. Spend time on this; it’s the prompt engineering for your tools!
  3. Start with `RouterQueryEngine`: For initial multi-source projects, `RouterQueryEngine` is often the easiest entry point. It handles the LLM-based decision-making for you.
  4. Use `verbose=True` Extensively: Seriously, this is your best friend. It gives you insight into the LLM’s reasoning and helps you refine your tool descriptions when the router makes unexpected choices.
  5. Consider `QueryPipeline` for Workflows: If your questions involve sequential steps, conditional logic, or where the output of one tool directly feeds another, start thinking about `QueryPipeline`. It’s more complex to set up but offers unparalleled control over multi-step interactions.
  6. Iterate on Tool Descriptions: The LLM’s ability to pick the right tool depends heavily on how well you’ve described each tool’s purpose and expected input/output. Don’t be afraid to experiment with different phrasings.
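To see why descriptions matter so much, here’s a deliberately naive selector that scores tools purely by word overlap between the query and each description – a crude caricature of what the LLM-based selector does, but it makes the failure mode obvious: vague, overlapping descriptions produce ties and misroutes.

```python
# Toy illustration of tool selection by description.
# Real selectors use an LLM, not word overlap.
DESCRIPTIONS = {
    "sql_tool": "product information such as name, price, and launch date",
    "review_tool": "product reviews from user feedback documents",
    "inventory_tool": "current stock level of a product",
}

def select_tool(query: str) -> str:
    # Keep words longer than 3 chars, stripped of punctuation
    words = {w.strip(",.?'") for w in query.lower().split() if len(w) > 3}
    def score(tool: str) -> int:
        desc_words = {w.strip(",.") for w in DESCRIPTIONS[tool].split()}
        return len(words & desc_words)
    return max(DESCRIPTIONS, key=score)

print(select_tool("What's the price and launch date of this product?"))  # -> sql_tool
print(select_tool("Is it currently in stock?"))  # -> inventory_tool
```

If two descriptions share most of their vocabulary, even a real LLM selector will waver between them; making each description distinctive is the cheapest reliability win available.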

The ability of LlamaIndex to intelligently orchestrate queries across different data sources is a major leap forward for building truly intelligent agents. It moves us away from brittle, hard-coded logic and towards systems that can reason about information needs and dynamically fetch data from the most appropriate places. My client’s e-commerce assistant is now much smarter, able to pull pricing from the database, user sentiment from reviews, and stock levels from the API, all from a single natural language query. It’s truly something to behold!

That’s it for today, folks! Go forth and build some incredibly smart, multi-source AI agents. Let me know what you’re building in the comments!

🕒 Last updated: March 26, 2026 · Originally published: March 17, 2026
