Kinetica, which offers real-time GPU-accelerated analytics, introduced a generative AI (GenAI) solution for enterprise customers that showcases what it calls “the next step” in retrieval-augmented generation (RAG).
GenAI applications use RAG to access and integrate up-to-date information from external knowledge bases, ensuring responses go beyond the large language model’s (LLM’s) original training data. But the prevalent method of enriching context, vector similarity search, is primarily designed to understand textual content and is inadequate for quantitative data, so such solutions cannot effectively support use cases that need to interface with real-time operational data.
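To make the limitation concrete, here is a minimal sketch of what a vector similarity lookup does during RAG context enrichment (pure Python; the toy three-dimensional vectors stand in for a real embedding model's output, and the documents are invented for this example):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, store, top_k=1):
    """Return the top_k stored documents ranked by similarity to the query."""
    ranked = sorted(
        store,
        key=lambda doc: cosine_similarity(query_vec, doc["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]

# Toy document store: each entry pairs text with a precomputed embedding.
store = [
    {"text": "Network latency spiked in cell A", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Quarterly revenue grew 4%",        "embedding": [0.0, 0.2, 0.9]},
]

hits = retrieve([0.8, 0.2, 0.1], store)
```

The lookup finds text whose embedding is *semantically near* the question, which works well for documents; it cannot, by itself, compute an aggregate or join over live operational tables, which is the gap the article describes.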
Kinetica’s solution is powered by NVIDIA accelerated computing infrastructure and NVIDIA NeMo, part of the NVIDIA AI Enterprise software platform. It is built on low-latency vector search and the ability to perform complex data queries in real time. This combination lets organizations enrich GenAI applications with domain-specific analytical insight derived directly from operational data.
Kinetica has built native database objects that allow users to define semantic context for enterprise data. An LLM can use these objects to grasp the referential context it needs to interact with a database in a context-aware manner.
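As a sketch of what such a context-defining database object might look like, the DDL below is illustrative only: the object name, table, comment and rules are all invented for this example, and the syntax is not verbatim Kinetica DDL.

```sql
-- Illustrative only: a semantic-context object that tells an LLM what a
-- table means and which business rules apply when it generates queries.
CREATE CONTEXT telemetry_ctx (
    TABLE = radio_telemetry,
    COMMENT = 'Per-cell L2/L3 radio telemetry, one row per measurement',
    RULES = ('latency is reported in milliseconds',
             'default to the most recent 15 minutes unless asked otherwise')
);
```

An LLM supplied with an object like this alongside a user question can resolve ambiguous column and metric names without the full schema being pasted into every prompt.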
“Kinetica’s real-time RAG solution, powered by NVIDIA NeMo Retriever microservices, seamlessly integrates LLMs with real-time streaming data insights, overcoming the limitations of traditional approaches,” said Nima Negahban, Kinetica co-founder and CEO. “This innovation helps enterprise clients and analysts gain business insights from operational data, like network data in telcos, using just plain English. All they have to do is ask questions, and we handle the rest.”
These features are exposed to developers through a relational SQL API and LangChain plugins, so application builders can harness the enterprise-grade capabilities of a relational database: control over who can access the data, reduced data movement from existing data lakes and warehouses, and preservation of existing relational schemas.
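As a rough sketch of what such a plugin hands to the model, the helper below assembles schema context and business rules into a text-to-SQL prompt. All names and strings here are illustrative assumptions, not Kinetica's or LangChain's actual API:

```python
def build_prompt(schema_ddl, business_rules, question):
    """Assemble a context-aware text-to-SQL prompt.

    schema_ddl and business_rules stand in for the semantic context that a
    database-side context object would supply to the LLM.
    """
    rules = "\n".join(f"- {r}" for r in business_rules)
    return (
        "Translate the question into a single SQL query.\n"
        f"Schema:\n{schema_ddl}\n"
        f"Business rules:\n{rules}\n"
        f"Question: {question}\n"
        "SQL:"
    )

prompt = build_prompt(
    "CREATE TABLE radio_telemetry (cell_id TEXT, latency_ms DOUBLE)",
    ["latency_ms is measured in milliseconds"],
    "Which cell has the worst latency?",
)
```

Because the schema and rules ride along with every question, the model's generated SQL stays anchored to the real tables rather than hallucinated column names.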
“Data is the foundation of AI, and enterprises everywhere are eager to connect theirs to generative AI applications,” said Ronnie Vasishta, SVP of Telecom, NVIDIA. “Kinetica uses the NVIDIA AI Enterprise software platform and accelerated computing infrastructure to infuse real-time data into LLMs, helping customers transform their productivity with generative AI.”
Kinetica’s real-time generative AI solution removes the need to re-index vectors before they are available for query. It can also ingest vector embeddings 5X faster than the previous market leader, based on the popular VectorDBBench benchmark. Together, these capabilities deliver best-in-class performance for vector similarity searches in real-time use cases.
Under the hood, Kinetica uses the NVIDIA CUDA Toolkit to build vectorized database kernels that harness the massive parallelism of NVIDIA GPUs. Its analytical functions are fully vectorized, covering the fundamental operations common to most analytical databases, such as filtering, joining and aggregating data, as well as specialized functions tailored for spatial, time-series and graph-based analytics.
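To illustrate the filter-and-aggregate pattern those kernels execute, here is the same logic in plain Python over two toy columns (the CUDA kernels apply each step across an entire column in parallel on the GPU; this sketch only shows the column-wise semantics, and the data is invented):

```python
# Columnar toy data: parallel lists stand in for database columns.
latency_ms = [12.0, 55.0, 48.0, 90.0]
cell_id    = ["a",  "b",  "b",  "a"]

# Filter: one predicate evaluated over the whole column.
mask = [v > 40.0 for v in latency_ms]

# Aggregate: group surviving rows by cell and average them.
groups = {}
for keep, cid, v in zip(mask, cell_id, latency_ms):
    if keep:
        groups.setdefault(cid, []).append(v)
avg_by_cell = {cid: sum(vs) / len(vs) for cid, vs in groups.items()}
```

On a GPU, the predicate in the filter step and the per-group sums are each computed by thousands of threads at once, which is what makes this pattern fast enough for real-time queries.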
This analytical breadth across different domains is particularly handy for domain-specific GenAI applications. For instance, in telcos, Kinetica’s generative AI solution can be used to explore and analyze pcap traces in real time, which requires extensive use of complex spatial joins, aggregations and time-series operations.
Another implementation of this solution uses two data inputs: a stream of L2/L3 radio telemetry data and a vector table that stores telecom-specific rules and definitions, along with their embeddings. A domain-specific telco LLM that is trained on telecom data samples and schema is integrated with NVIDIA NeMo to create a chatbot application. The telco LLM converts user questions into a query that is executed in real time. The results of the query, along with any relevant business rules or definitions, are sent to NeMo, which then translates these results into a human-friendly response.
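The flow described above can be sketched as follows. Every function is a stub standing in for a real component (the telecom-tuned LLM, the Kinetica query engine, the vector table of rules, and NeMo), and all names, signatures and return values are illustrative assumptions:

```python
def text_to_sql(question):
    # Stub: in the real system a telecom-tuned LLM generates this query.
    return "SELECT cell_id, AVG(latency_ms) FROM radio_telemetry GROUP BY cell_id"

def run_query(sql):
    # Stub: stands in for executing the query against streaming telemetry.
    return [("cell_7", 42.0)]

def lookup_rules(question):
    # Stub: stands in for the vector-table lookup of telecom rules/definitions.
    return ["latency_ms is measured at the L2 scheduler"]

def summarize(results, rules):
    # Stub: stands in for NeMo turning rows plus rules into readable prose.
    return f"{len(results)} cell(s) returned; context: {rules[0]}"

def answer(question):
    """End-to-end flow: question -> SQL -> live results + rules -> response."""
    sql = text_to_sql(question)
    return summarize(run_query(sql), lookup_rules(question))

response = answer("What is the average latency per cell?")
```

The important property of this shape is that the LLM never has to memorize the data: the query runs against live telemetry at answer time, and the business rules are retrieved rather than baked into the model.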