Overcoming AI Challenges with Data Chunking and RAG

We discover how innovative techniques like data chunking and Retrieval Augmented Generation (RAG) are reshaping the landscape of AI

The management and processing of data remain pivotal in the ever-evolving field of AI. As data sets grow larger and more complex, traditional methods of handling information become increasingly inefficient. Amid this challenge, data chunking has emerged as an innovative approach that not only streamlines data management but also enhances AI capabilities, fostering more efficient and scalable solutions.

At its core, data chunking involves breaking down large volumes of data into smaller, more manageable chunks or segments. This process facilitates easier storage, retrieval, and processing of information, enabling AI systems to handle vast data sets with enhanced efficiency. By partitioning data in this way, AI algorithms can operate more swiftly, reducing processing times and resource utilisation.
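To see the idea in concrete terms, here is a minimal Python sketch of the simplest chunking strategy: fixed-size windows with a small overlap so that context is not lost at the boundaries. The function name and the 500-character window are illustrative choices, not a prescribed standard.

```python
# Minimal sketch: split a long text into fixed-size, overlapping chunks.
# Chunk size and overlap are illustrative; real systems tune them to the
# embedding model's input limits and the shape of the source data.

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Return overlapping character windows taken across the text."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

document = "Sensor readings and maintenance notes. " * 200  # stand-in for a large file
chunks = chunk_text(document)
print(f"{len(document)} characters -> {len(chunks)} chunks")
```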

The potential of data chunking

Data chunking holds immense potential across various industries, revolutionising data management practices and unlocking new possibilities for AI-driven solutions. In healthcare, for instance, medical imaging data can be chunked for faster analysis and diagnosis, leading to improved patient outcomes. In finance, large-scale transaction data can be segmented for fraud detection and risk assessment, bolstering security measures. Similarly, in manufacturing, sensor data from IoT devices can be chunked for real-time monitoring and predictive maintenance, optimising operational efficiency.

Deloitte Research Leader Chris Arkenberg says: “If data is the new oil, as many have said, large language models (LLMs) and diffusion models may offer a higher-performance engine to help make it work. Many companies have accumulated large amounts of data that generative AI can help them operationalise. Generative AI (Gen AI) can offer them a better lens on their data, combining a conversational and visual interface with the ability to reckon with vast troves of data far beyond human reasoning. Gazing across 2024, more companies may see the influence of Gen AI not only on their operations and product lines but also within their C-suite and boardrooms.”

Dom Couldwell, Head of Field Engineering, EMEA at DataStax, adds: “When you get started with Gen AI, you might be happy to use the responses that a public LLM can prepare for you. However, if you want to provide more value to your customers, you’ll need to leverage your own company data around products and customers to personalise results. To achieve this, you will have to prepare your data for Retrieval Augmented Generation, or RAG.”

RAG combines pre-trained language models with a retrieval system that enables you to talk to your company’s own data. Couldwell continues: “RAG is incredibly useful because it reduces hallucinations, helps LLMs to be more specific by using enterprise data and can be refreshed as quickly as new information becomes available. It’s also a more resource-efficient approach, as the embedded retrieval inference costs are substantially lower than batch training custom models.”

To get RAG right, Couldwell suggests organisations should look at ‘Day 1’ and ‘Day 2’ problems. ‘Day 1’ issues are those around getting started, like preparing your data for RAG. ‘Day 2’ problems concern making systems work at scale, which can be a significantly bigger challenge.

Couldwell’s ‘Day 1’ data problems explained

“To get started with RAG, you have to look at what data you have, what formats it exists in and how to get it ready for use with Generative AI. This can be unstructured or structured data in a variety of formats. All this data will then be turned into document objects that contain both text and associated metadata. This data is then split into smaller portions called chunks that can be indexed and understood.
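As a hedged illustration of that preparation step, the sketch below uses a hypothetical Document container pairing text with metadata; frameworks such as LangChain and LlamaIndex offer their own equivalents.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Hypothetical document object: raw text plus associated metadata."""
    text: str
    metadata: dict = field(default_factory=dict)

def split_into_chunks(doc: Document, size: int = 500, overlap: int = 50) -> list[Document]:
    """Split one document into smaller chunks, carrying its metadata along."""
    step = size - overlap
    return [
        Document(text=doc.text[i:i + size],
                 metadata={**doc.metadata, "chunk_index": i // step})
        for i in range(0, len(doc.text), step)
    ]

manual = Document(text="Care instructions for running shoes... " * 50,
                  metadata={"source": "manual.pdf", "product": "trainer-x1"})
chunks = split_into_chunks(manual)
```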

“The chunks are indexed and converted into vector embeddings, which capture the semantic relationships between concepts or objects in mathematical form. These vectors are stored in a vector database for future use.
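The sketch below mimics the embedding step with a stub function and a plain in-memory list standing in for the vector database; a production system would call a real embedding model (so that similar texts land near each other in vector space) and a dedicated vector store.

```python
import numpy as np

def embed(text: str, dim: int = 384) -> np.ndarray:
    """Stub: a pseudo-random unit vector keyed on the text. A real system
    would call an embedding model here to capture semantic relationships."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

chunks = ["Red Nike running trainers, size 9", "Returns policy for footwear"]
vector_store = [(embed(chunk), chunk) for chunk in chunks]  # in-memory stand-in
```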

“Each of these steps is needed to get your data ready to search when a query comes in from a user. That query gets turned into a vector, which then gets used for a search comparison against all the information held in the vector database. The system finds the information that has the closest semantic match to the query, and then shares that information back to the LLM to build the response.
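Continuing that sketch, a top-k similarity search and the hand-off to the LLM might look like the following; because the stub vectors are unit length, a dot product is exactly cosine similarity (with real embeddings, the ranking would reflect meaning rather than the stub's randomness).

```python
import numpy as np

def embed(text: str, dim: int = 384) -> np.ndarray:
    # Same stub as before; a real system would call an embedding model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

vector_store = [(embed(t), t) for t in
                ["Red Nike running trainers, size 9", "Returns policy for footwear"]]

def search(query: str, store: list, k: int = 2) -> list[str]:
    """Embed the query, rank stored chunks by cosine similarity, keep top k."""
    q = embed(query)
    ranked = sorted(store, key=lambda pair: float(np.dot(q, pair[0])), reverse=True)
    return [text for _, text in ranked[:k]]

context = "\n".join(search("scarlet Nike sneakers", vector_store))
prompt = (f"Using only the context below, answer the question.\n\n"
          f"Context:\n{context}\n\nQuestion: Do you stock scarlet Nike sneakers?")
```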

“As an example, say you ask a retailer for product information around ‘scarlet Nike sneakers’ - a traditional search engine would look for exact matches to that term, while a vector search would understand that ‘scarlet’ is a synonym for ‘red’, that ‘Nike’ is a brand name, and that ‘sneakers’ equates to ‘trainers’ or ‘shoes’. The vector search should deliver a mix of results that match the search idea, not just the words involved, and provide that customer with a better set of results in context.

“To implement RAG effectively, you have to pick the right embedding model, data chunking strategy and index approach. Each of these choices suits specific scenarios and types of text. With multiple chunking approaches to consider, you can experiment with different methods to determine which best suits the specific needs of your RAG application.”
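As a hedged example of that experimentation, the sketch below contrasts two common chunking strategies, fixed-size windows and sentence-aware packing; which performs better depends on your texts, your embedding model and the queries you expect.

```python
import re

def fixed_size_chunks(text: str, size: int = 200) -> list[str]:
    """Simple and fast, but may cut sentences in half."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def sentence_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences up to a budget, keeping ideas intact."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks

sample = ("Chunking affects retrieval quality. Fixed windows are simple. "
          "Sentence packing preserves meaning. Test both against real queries.")
print(fixed_size_chunks(sample, 60))
print(sentence_chunks(sample, 60))
```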

Couldwell’s ‘Day 2’ problems explained

“Beyond getting your data prepared, the next problem to consider is how to run at production levels. While you might be able to use data in test implementations and get relevant results, how does that same implementation scale up to thousands or millions of requests coming in? The complexity of accessing structured and unstructured data can introduce latency that impacts the user experience.

“Up to 40% of that latency can come from calls to the embedding service and vector search service. Reducing that overhead per transaction can have a huge impact on the user, while also cutting the cost of those transactions. If transactions are more efficient, then the cost to service customers will be reduced, and it is therefore more likely that the application will deliver on its business case.
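One common way to claw back some of that per-transaction overhead (an illustrative tactic, not necessarily the one Couldwell has in mind) is to cache embedding calls so that repeated or popular queries skip the network round trip entirely.

```python
import time
from functools import lru_cache

def call_embedding_service(text: str) -> tuple[float, ...]:
    """Stub standing in for a remote embedding API call: the slow, metered part."""
    time.sleep(0.05)  # simulate ~50 ms of network latency
    return tuple(float(b) / 255 for b in text.encode()[:8])

@lru_cache(maxsize=10_000)
def embed_cached(text: str) -> tuple[float, ...]:
    """Identical queries are served from memory after the first call."""
    return call_embedding_service(text)

embed_cached("scarlet Nike sneakers")  # first call pays the latency
embed_cached("scarlet Nike sneakers")  # repeat call returns instantly
```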

“To achieve this, developers have two choices - you can either manage the components yourself over time, or you can look at employing a stack that comes pre-integrated. Implementing your own components enables you to pick the best approach and toolset for your application, but it does require management and development overhead as tools are changed or updated. Conversely, a stack-based approach can support those different tools in one overall solution, letting your developers concentrate on how the application expands rather than tending to the components.

“Alongside data retrieval performance, you will want to look at how you scale up your data deployment. Running in multiple locations requires you to host your data in different regions and keep it consistent. This is necessary because you will want to hold your data closer to your users rather than only having it in one location. It also makes it easier to streamline testing and validation, as you can replicate your staging environments to different regions and thoroughly test performance in geographically diverse settings. This also helps you manage increasing demand without sacrificing performance or availability as user data volumes grow.

“Generative AI will only be as good as the data that you can put in. Using RAG can help improve your responses, but this has to be implemented so that you can scale up your approach to cope with customer demand. Once you have RAG in place, you will also have to look at your Day 2 issues, so you continue to get the value out of Gen AI that you want.”
