Author: Pablo Cohen
-
What is Rate Limiting? — AI Encyclopedia | XLUXX
Rate Limiting — Restricting how many API requests a user can make in a given time period. Prevents abuse and ensures fair access. OpenAI limits by tokens per minute. XLUXX uses daily call limits by tier: free 100/day, starter 1000/day, pro 10000/day. Why It Matters Understanding Rate Limiting is essential for anyone working with AI…
-
What is Throughput? — AI Encyclopedia | XLUXX
Throughput — The number of requests or tokens an AI system can process per unit time. Measured in tokens per second or requests per minute. Higher throughput means serving more users simultaneously. Batch processing increases throughput at the cost of latency. Why It Matters Understanding Throughput is essential for anyone working with AI systems. As…
-
What is Latency? — AI Encyclopedia | XLUXX
Latency — The time between sending a request and receiving a response. In AI, measured as time-to-first-token (TTFT) and tokens-per-second (TPS). Groq claims sub-100ms TTFT. Lower latency means better user experience. Critical for real-time AI applications. Why It Matters Understanding Latency is essential for anyone working with AI systems. As the technology evolves, these fundamentals…
-
What is WebSocket? — AI Encyclopedia | XLUXX
WebSocket — A protocol for real-time bidirectional communication between client and server. Unlike HTTP request-response, WebSockets keep a connection open for continuous data flow. Used for streaming AI responses, real-time chat, and live model inference. Why It Matters Understanding WebSocket is essential for anyone working with AI systems. As the technology evolves, these fundamentals separate…
-
What is JSON? — AI Encyclopedia | XLUXX
JSON — JavaScript Object Notation — the universal data format for APIs. Human-readable key-value pairs. Every AI API sends and receives JSON. If you work with AI APIs, you work with JSON. Lighter than XML, easier to parse, supported by every programming language. Why It Matters Understanding JSON is essential for anyone working with AI…
-
What is REST API? — AI Encyclopedia | XLUXX
REST API — Representational State Transfer — the most common API architecture. Uses HTTP methods (GET, POST, PUT, DELETE) to interact with resources. Almost every AI service exposes a REST API. Stateless, scalable, simple. The backbone of the modern internet. Why It Matters Understanding REST API is essential for anyone working with AI systems. As…
-
What is Kubernetes? — AI Encyclopedia | XLUXX
Kubernetes — Container orchestration platform that manages Docker containers at scale. Automates deployment, scaling, and management. Used by every major AI company to run inference at scale. If Docker is a shipping container, Kubernetes is the entire port. Why It Matters Understanding Kubernetes is essential for anyone working with AI systems. As the technology evolves,…
-
What is Docker? — AI Encyclopedia | XLUXX
Docker — A platform for packaging applications into containers — lightweight, portable environments that run identically everywhere. Essential for deploying AI models. Most MCP servers, AI inference engines, and ML pipelines run in Docker containers. docker run is the universal deployment command. Why It Matters Understanding Docker is essential for anyone working with AI systems.…
-
What is API (Application Programming Interface)? — AI Encyclopedia | XLUXX
API (Application Programming Interface) — A set of rules that lets software applications communicate. REST APIs use HTTP requests. AI APIs let you send text and get AI-generated responses. OpenAI, Anthropic, and Google all offer AI through APIs. Pricing is typically per token or per request. Why It Matters Understanding API (Application Programming Interface) is…
-
What is TPU (Tensor Processing Unit)? — AI Encyclopedia | XLUXX
TPU (Tensor Processing Unit) — Google’s custom AI chip designed specifically for tensor operations used in machine learning. Available through Google Cloud. TPU v5 powers Gemini training. Competitors: NVIDIA H100/B200, AMD MI300, AWS Trainium, Cerebras WSE. Why It Matters Understanding TPU (Tensor Processing Unit) is essential for anyone working with AI systems. As the technology…
-
What is GPU (Graphics Processing Unit)? — AI Encyclopedia | XLUXX
GPU (Graphics Processing Unit) — Originally designed for rendering graphics, GPUs excel at parallel math operations — exactly what AI training needs. A single NVIDIA H100 can perform 2 petaflops of AI compute. AI training and inference are almost entirely GPU-bound. NVIDIA, AMD, and Intel compete in this space. Why It Matters Understanding GPU (Graphics…
-
What is Neural Network? — AI Encyclopedia | XLUXX
Neural Network — A computing system inspired by biological brains. Layers of interconnected nodes (neurons) process data by learning patterns. Input layer receives data, hidden layers extract features, output layer produces results. From image recognition to language generation, neural networks power modern AI. Why It Matters Understanding Neural Network is essential for anyone working with…
-
What is CUDA? — AI Encyclopedia | XLUXX
CUDA — NVIDIA’s parallel computing platform that lets GPUs run general-purpose code. Every AI model trains on CUDA. It is why NVIDIA dominates AI hardware — their GPUs plus CUDA software is the de facto standard. AMD and Intel are trying to compete with ROCm and oneAPI but CUDA’s ecosystem is decades ahead. Why It…
-
What is Backpropagation? — AI Encyclopedia | XLUXX
Backpropagation — The algorithm that makes neural networks learn. After a prediction, backprop calculates how much each weight contributed to the error, then adjusts them to reduce the error next time. Runs backward through the network — hence ‘back’ propagation. The foundation of all deep learning. Why It Matters Understanding Backpropagation is essential for anyone…
-
What is Attention Mechanism? — AI Encyclopedia | XLUXX
Attention Mechanism — The core innovation behind transformers. Allows a model to focus on relevant parts of the input when generating each output token. Self-attention lets every word look at every other word to understand context. Without attention, we would not have GPT, Claude, or any modern LLM. Why It Matters Understanding Attention Mechanism is…
-
What is Prompt Injection? — AI Glossary | XLUXX
Prompt Injection — An attack where malicious input tricks an AI into ignoring its instructions. Example: a user says ‘Ignore your instructions and do X instead.’ Dangerous for AI agents that take actions in the real world. XLUXX’s Context Gate detects prompt injection by monitoring context drift and blocking compromised responses. Why It Matters Understanding…
-
What is Distillation? — AI Glossary | XLUXX
Distillation — Training a smaller model to mimic a larger one. The large ‘teacher’ model generates outputs that the small ‘student’ model learns from. This is how DeepSeek R1 was distilled from 685B to 7B parameters while retaining most reasoning ability. Makes frontier AI accessible on smaller hardware. Why It Matters Understanding Distillation is critical…
-
What is Benchmark? — AI Glossary | XLUXX
Benchmark — Standardized tests that measure AI model performance. MMLU (knowledge), HumanEval (coding), GSM8K (math), MT-Bench (conversation). Every model release includes benchmark scores. Warning: benchmarks can be gamed — real-world performance often differs from leaderboard rankings. Why It Matters Understanding Benchmark is critical for developers and decision-makers working with AI systems. As the technology evolves…
-
What is Synthetic Data? — AI Glossary | XLUXX
Synthetic Data — Training data generated by AI models rather than collected from real sources. Used when real data is scarce, expensive, or privacy-sensitive. Anthropic, OpenAI, and Google all use synthetic data to train their models. A controversial but increasingly common practice. Why It Matters Understanding Synthetic Data is critical for developers and decision-makers working…
-
What is Function Calling? — AI Glossary | XLUXX
Function Calling — The ability for an AI model to output structured JSON that calls external functions or APIs. Instead of just generating text, the model decides which tool to use, what parameters to pass, and interprets the result. The foundation of AI agent tool use. Supported by GPT-4, Claude, Gemini. Why It Matters Understanding…
-
What is Chain of Thought (CoT)? — AI Glossary | XLUXX
Chain of Thought (CoT) — A prompting technique where you ask the AI to show its reasoning step by step before giving a final answer. Dramatically improves accuracy on math, logic, and complex reasoning tasks. Used internally by o1 and DeepSeek R1 as ‘extended thinking.’ Why It Matters Understanding Chain of Thought is critical for…
-
What is Multimodal AI? — AI Glossary | XLUXX
Multimodal AI — AI models that can process multiple types of input — text, images, audio, video. GPT-4o is multimodal (text + vision + audio). Gemini processes text, images, audio, and video natively. The future of AI is multimodal — single-mode text-only models are becoming obsolete. Why It Matters Understanding Multimodal AI is critical for…
-
What is LoRA (Low-Rank Adaptation)? — AI Glossary | XLUXX
LoRA (Low-Rank Adaptation) — A fine-tuning technique that adds small trainable matrices to a frozen pre-trained model instead of updating all parameters. Reduces training cost by 10-100x. You can fine-tune Llama 70B on a single GPU with LoRA. The standard for efficient model customization. Why It Matters Understanding LoRA is critical for developers and decision-makers…
-
What is Quantization? — AI Glossary | XLUXX
Quantization — Reducing the precision of model weights from 32-bit to 16-bit, 8-bit, or even 4-bit to shrink model size and speed up inference. A 70B model at 4-bit quantization fits in 35GB of RAM instead of 140GB. Essential for running large models on consumer hardware. Why It Matters Understanding Quantization is critical for developers…
-
What is Mixture of Experts (MoE)? — AI Glossary | XLUXX
Mixture of Experts (MoE) — An architecture where a model contains many specialized sub-networks (experts) but only activates a few for each input. DeepSeek V3 uses this — 685B total parameters but only ~37B active per token. Delivers frontier performance at a fraction of the compute cost. Why It Matters Understanding Mixture of Experts is…
