Artificial Intelligence (AI) is a machine’s ability to perform tasks using human-like reasoning, learning, and problem solving. AI relies on Machine Learning (ML) models, which are algorithms that detect patterns, make predictions, and learn from data rather than from fixed, explicit programming instructions. Deep learning (DL) is the type of ML in which the models are based on artificial neural networks inspired by how biological brains work.
Generative AI (GenAI) is the type of AI used to create content such as conversations, images, or video based on prior learning from existing content. GenAI relies on foundational models (FMs), which are exceptionally large ML models trained on vast amounts of generalized, unlabeled data to perform a variety of general tasks, such as understanding language and generating new text, audio, or images from user-provided prompts in a human language. Foundational models work by using patterns and relationships learned from the training data to predict the next item in a sequence, given a prompt. Using foundational models as starting points, rather than building models from scratch, makes it cheaper and faster for data scientists to build ML apps.
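The idea of predicting the next item in a sequence can be illustrated with a deliberately tiny sketch. The code below is not how a foundational model is built (those use large neural networks); it is a word-pair frequency counter that shows the same principle at toy scale. The function names `train_bigram` and `predict_next` and the training text are made up for this illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def train_bigram(corpus: str) -> Dict[str, Counter]:
    """Count, for every word, which words follow it in the training text."""
    words = corpus.lower().split()
    follows: Dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(model: Dict[str, Counter], prompt: str) -> Optional[str]:
    """Return the word seen most often after the prompt's last word."""
    words = prompt.lower().split()
    if not words:
        return None
    candidates = model.get(words[-1])
    if not candidates:
        return None  # the last word never appeared in the training text
    return candidates.most_common(1)[0][0]


# Tiny made-up training text, for illustration only.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)
print(predict_next(model, "on the"))  # prints: cat
```

In the training text, "the" is followed by "cat" twice and by "mat" and "fish" once each, so "cat" is the predicted continuation. A real model learns far richer relationships, but the prompt-in, next-item-out contract is the same.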
Large Language Models (LLMs) are a class of foundational models trained on text data and used to perform a variety of tasks, such as understanding language, reasoning over text, and generating new text based on user prompts in a human language. Examples of LLMs include Llama, Claude, and the GPT models that power ChatGPT.
LLM-based AI apps use language understanding, reasoning, and text generation to augment or automate complex tasks that typically require human intervention, such as summarizing legal documents or triaging customer support tickets.
Typically, AI developers build LLM-based AI apps that automate complex workflows by combining multiple LLMs with components such as prompts, vectors, or agents, each of which solves a discrete task. LLM orchestration frameworks connect these components in different ways through chains or pipelines.
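A chain of this kind can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the API of any particular orchestration framework: `retrieve`, `build_prompt`, `fake_llm`, and `run_chain` are hypothetical names, the retriever is a keyword lookup standing in for vector search, and the model call is a stub.

```python
from typing import Callable, Dict, List

State = Dict[str, str]
Step = Callable[[State], State]


def retrieve(state: State) -> State:
    """Hypothetical retriever: keyword lookup standing in for vector search."""
    docs = {
        "refund": "Refunds are issued within 14 days.",
        "shipping": "Orders ship within 2 business days.",
    }
    question = state["question"].lower()
    hits = [text for keyword, text in docs.items() if keyword in question]
    return {**state, "context": " ".join(hits)}


def build_prompt(state: State) -> State:
    """Fill a prompt template with the retrieved context and the question."""
    prompt = f"Context: {state['context']}\nQuestion: {state['question']}"
    return {**state, "prompt": prompt}


def fake_llm(state: State) -> State:
    """Stub for a model call; a real app would call an LLM here."""
    answer = f"(stub reply to a {len(state['prompt'])}-character prompt)"
    return {**state, "answer": answer}


def run_chain(steps: List[Step], state: State) -> State:
    """Run each component in order, passing the growing state along."""
    for step in steps:
        state = step(state)
    return state


result = run_chain(
    [retrieve, build_prompt, fake_llm],
    {"question": "How do I get a refund?"},
)
print(result["prompt"])
print(result["answer"])
```

Each step solves one discrete task and only reads or adds keys on a shared state, which is what lets the same components be rearranged into different pipelines.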
When deployed to production, the different parts of a multi-component, distributed LLM-based AI app run on a combination of different kinds of AI infrastructure, such as LLM-as-a-Service, GPU (graphics processing unit) clouds, managed cloud services, or a custom-engineered AI stack. Typically, IT DevOps engineers manage these systems in production.
AI developers code, monitor, debug and optimize the resources in an LLM-based AI application. IT DevOps engineers monitor, troubleshoot, and optimize the services in the AI infra that the LLM-based AI application runs on.
Organizational leaders rely on usage, cost, and other high-level metrics to track ROI and allocate resources across AI initiatives.
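As a rough illustration of how such high-level metrics might be derived, the sketch below rolls individual LLM calls up into usage and cost per initiative. Everything in it is an assumption made for the example: the `LLMCall` record, the model names, the initiative names, and especially the per-token prices, which are invented and do not reflect any provider's pricing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LLMCall:
    initiative: str      # which AI initiative made the call
    model: str           # which model served it
    input_tokens: int
    output_tokens: int


# Hypothetical prices in USD per 1,000 tokens (invented for this example).
PRICE_PER_1K = {
    "model-a": {"input": 0.50, "output": 1.50},
    "model-b": {"input": 0.10, "output": 0.30},
}


def call_cost(call: LLMCall) -> float:
    """Cost of one call: tokens times the per-1K price, input plus output."""
    price = PRICE_PER_1K[call.model]
    total = call.input_tokens * price["input"] + call.output_tokens * price["output"]
    return total / 1000


def summarize(calls: List[LLMCall]) -> Dict[str, Dict[str, float]]:
    """Aggregate call count, token usage, and cost per initiative."""
    summary: Dict[str, Dict[str, float]] = {}
    for call in calls:
        row = summary.setdefault(
            call.initiative, {"calls": 0, "tokens": 0, "cost_usd": 0.0}
        )
        row["calls"] += 1
        row["tokens"] += call.input_tokens + call.output_tokens
        row["cost_usd"] += call_cost(call)
    return summary


calls = [
    LLMCall("support-triage", "model-a", 1000, 200),
    LLMCall("support-triage", "model-a", 2000, 400),
    LLMCall("legal-summaries", "model-b", 5000, 1000),
]
summary = summarize(calls)
for initiative, row in summary.items():
    print(initiative, row["calls"], row["tokens"], round(row["cost_usd"], 2))
```

Comparing a roll-up like this against the value each initiative delivers is one simple way to reason about ROI.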
AI Observability is the proactive monitoring of your AI apps and the cloud infra they run on, so that you can understand how to make them more reliable, performant, and cost-effective. AI observability involves discovering the components that make up an AI app and the dependencies among them to understand how they work together. By observing the execution of your app and the utilization of your infra, you can understand what impacts the operation of your app.
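One way to picture component discovery is a tracer that records which component ran inside which, and for how long. The sketch below is a simplified, single-threaded illustration written for this page; the `Tracer` class and the component names are hypothetical, and a production setup would normally use a dedicated tracing library rather than hand-rolled code.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List, Set, Tuple


class Tracer:
    """Records one span per component execution, with its parent and duration."""

    def __init__(self) -> None:
        self.spans: List[Dict[str, object]] = []
        self._stack: List[str] = []  # components currently executing

    @contextmanager
    def span(self, component: str) -> Iterator[None]:
        parent = self._stack[-1] if self._stack else None
        self._stack.append(component)
        start = time.perf_counter()
        try:
            yield
        finally:
            seconds = time.perf_counter() - start
            self._stack.pop()
            self.spans.append(
                {"component": component, "parent": parent, "seconds": seconds}
            )

    def dependencies(self) -> Set[Tuple[str, str]]:
        """Discovered (caller, callee) pairs, derived from the recorded spans."""
        return {
            (str(s["parent"]), str(s["component"]))
            for s in self.spans
            if s["parent"] is not None
        }


tracer = Tracer()
with tracer.span("support-app"):
    with tracer.span("retriever"):
        pass  # a vector lookup would run here
    with tracer.span("llm"):
        pass  # a model call would run here

print(sorted(tracer.dependencies()))
# prints: [('support-app', 'llm'), ('support-app', 'retriever')]
```

The recorded durations show where time is spent, and the parent links reveal how components depend on one another, without the dependency map having to be declared up front.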