Welcome to the Large Language Model (LLM) Glossary, your go-to resource for understanding the key terms and concepts in the rapidly evolving field of artificial intelligence. As LLMs become increasingly integral to applications like natural language processing, text generation, and more, this glossary aims to demystify the terminology, providing clear and concise definitions to support your learning and application of these advanced technologies.
Some of the terms below may be missing a description; this glossary is continuously updated to keep pace with new additions and changes.
Term | Tags | Description |
---|---|---|
Tokenization | Preprocessing | Breaking text into smaller units, like words or subwords, for model input. (sketch after the table) |
Embedding | Representations | Numerical vector representation of text to capture semantic meaning. (sketch after the table) |
Transformer Architecture | Model Structure | Neural network design using attention mechanisms for handling sequential data. |
Self-Attention | Mechanism | Technique allowing models to focus on relevant words in a sequence. (sketch after the table) |
Context Window | Limitations | Maximum text length a model can process in a single input. |
Fine-Tuning | Training | Adjusting a pretrained model on specific data for a specialized task. |
Few-Shot Learning | Training | Using limited examples to guide model behavior on a task. |
Zero-Shot Learning | Training | Performing tasks without specific examples, relying on general language understanding. |
Overfitting | Training | When a model performs well on training data but poorly on unseen data. |
Prompt Engineering | Usage | Crafting input prompts to elicit desired responses from a model. |
Language Modeling | | Predicting the next word or sequence based on preceding text. |
Pretraining | Training | Initial training phase on large datasets to learn general language patterns. |
Multimodal Models | Model Types | Models integrating text with other data types, like images or audio. |
Latent Space | Representations | High-dimensional space where text is mapped to abstract features. |
Dropout | Training | Regularization method to prevent overfitting by randomly ignoring neurons. |
Bias | Ethics | Unintended prejudices learned from biased training data. |
Model Drift | Performance | Decline in model performance over time due to changing contexts. |
Beam Search | | Algorithm for selecting the best sequence of tokens during text generation. (sketch after the table) |
GPT-4 | LLM | Developed by OpenAI, GPT-4 is a multimodal model capable of processing both text and image inputs, excelling in complex reasoning and understanding. |
Claude 3 | LLM | Anthropic's Claude 3 focuses on ethical AI interactions, emphasizing safety and reliability in generating human-like text. |
PaLM 2 | LLM | Google's PaLM 2 is designed for advanced language understanding, including reasoning, coding, and multilingual capabilities. |
Llama 3 | LLM | Meta's Llama 3 offers open-source accessibility with strong performance in text generation and coding, supporting over 30 languages. |
Mixtral 8x22B | LLM | Mistral AI's Mixtral 8x22B is a powerful open-source model known for top-tier reasoning in high-complexity tasks. |
StableLM 2 | LLM | Stability AI's StableLM 2 is an open-source model optimized for stability and efficiency in various language tasks. |
DBRX | LLM | Databricks' DBRX is an open-source model designed for large-scale data analysis and processing tasks. |
Pythia | LLM | EleutherAI's Pythia is a suite of open-source models ranging from 70 million to 12 billion parameters, suitable for various natural language processing tasks. |
Alpaca 7B | LLM | Stanford's Alpaca 7B is an open-source model fine-tuned for instruction-following capabilities, based on LLaMA 7B. |
Open LLM Leaderboard | Evaluation Tools | The Open LLM Leaderboard from HuggingFace assesses models based on several benchmarks. |
XGen-7B | LLM | Developed by Salesforce, XGen-7B is an open-source model tailored for business applications, offering efficient performance with 7 billion parameters. |
RAG (Retrieval-Augmented Generation) | Mechanism | Combines external knowledge retrieval with generative models to provide accurate, context-aware responses to queries. (sketch after the table) |
Chunking | Preprocessing | Dividing text into smaller, manageable pieces for efficient processing and improved context management. (sketch after the table) |
Hallucination | Limitations | When an LLM generates false or nonsensical information not grounded in its training data or context. |
Hughes Hallucination Evaluation Model (HHEM) | Evaluation Tools | A framework to assess the tendency of LLMs to produce hallucinated or incorrect outputs in generated content. Hosted on HuggingFace. |
Transformers | Model Structure | Neural networks that apply attention mechanisms to weigh the importance of all words in a sentence simultaneously for better context understanding. |
Emergent Abilities | Capabilities | Unexpected skills or behaviors that arise in LLMs when scaling up model size or training data. |
Alignment | Ethics | Ensuring LLM behavior aligns with human values, intentions, and ethical guidelines during training and deployment. |
Context Length | Limitations | The maximum amount of text an LLM can process or retain in a single input sequence. |
GPT | Model Types | A series of LLMs using transformer architecture, pretrained on vast data for versatile text generation tasks. |
Vicuna-13B | Model Types | An open-source chatbot fine-tuned from LLaMA on user-shared conversations, achieving over 90% of ChatGPT's quality. |
Temperature | | Controls randomness in text generation by scaling token probabilities. (sketch after the table) |
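
To make a few of the entries above more concrete, the sketches below walk through them in minimal, self-contained Python. First, Tokenization: a greedy longest-match subword tokenizer in the style of WordPiece. The tiny vocabulary and the `##` continuation marker are illustrative assumptions, not any real model's vocabulary.

```python
# Minimal greedy longest-match subword tokenizer (WordPiece-style sketch).
# The vocabulary below is invented purely for illustration.
VOCAB = {"un", "believ", "token", "the", "##believ", "##able", "##ization", "##s"}

def tokenize_word(word: str) -> list[str]:
    """Split one lowercase word into the longest vocabulary pieces available."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            # pieces that continue a word are marked with "##"
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in VOCAB:
                pieces.append(piece)
                break
            end -= 1
        if end == start:          # no vocabulary piece matched: unknown token
            return ["[UNK]"]
        start = end
    return pieces

print(tokenize_word("unbelievable"))   # ['un', '##believ', '##able']
print(tokenize_word("tokenizations"))  # ['token', '##ization', '##s']
```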
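
For the Embedding entry, the standard way to compare two embedding vectors is cosine similarity. The four-dimensional vectors below are toy values chosen by hand; real models produce embeddings with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: semantically related words should point in similar directions.
king  = [0.80, 0.65, 0.10, 0.05]
queen = [0.75, 0.70, 0.12, 0.06]
apple = [0.05, 0.10, 0.90, 0.70]

print(round(cosine_similarity(king, queen), 3))  # close to 1.0: similar meaning
print(round(cosine_similarity(king, apple), 3))  # much lower: unrelated meaning
```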
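
The Self-Attention and Transformer entries both rest on scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The sketch below applies it to one toy sequence; the random input stands in for what the learned projections W_Q, W_K, and W_V would produce, so this is an illustration of the operation, not a full transformer layer.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q @ K.T / sqrt(d_k)) @ V"""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # how well each query matches each key
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted mixture of value vectors

# One toy sequence of 3 tokens, each represented by a 4-dimensional vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4): every token becomes a context-aware mix of all tokens
```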
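
The Beam Search entry describes an algorithm, so a worked toy example helps. The next-token probability table below is invented for illustration; in a real system those probabilities come from the LLM's output distribution at every generation step.

```python
import math

# Toy next-token probabilities, keyed by the previous token (invented for illustration).
NEXT = {
    "<s>":   {"the": 0.6, "a": 0.4},
    "the":   {"cat": 0.5, "dog": 0.3, "<eos>": 0.2},
    "a":     {"dog": 0.7, "cat": 0.1, "<eos>": 0.2},
    "cat":   {"<eos>": 1.0},
    "dog":   {"<eos>": 1.0},
    "<eos>": {},
}

def beam_search(beam_width=2, max_len=4):
    beams = [(["<s>"], 0.0)]                           # (token sequence, log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            options = NEXT[seq[-1]]
            if not options:                            # finished hypotheses carry over
                candidates.append((seq, score))
                continue
            for token, p in options.items():
                candidates.append((seq + [token], score + math.log(p)))
        # keep only the `beam_width` highest-scoring hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

for seq, score in beam_search():
    print(" ".join(seq), round(math.exp(score), 3))
# "<s> the cat <eos>" (p=0.3) and "<s> a dog <eos>" (p=0.28) survive the beam
```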
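
The Chunking entry can be shown with a simple sliding window over characters. Overlap keeps sentences that straddle a boundary intact in at least one chunk; the window and overlap sizes below are arbitrary choices, and sentence- or token-based chunking are equally common strategies.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap              # step forward, keeping some overlap
    return chunks

doc = "Large language models process text as sequences of tokens. " * 20
pieces = chunk_text(doc, chunk_size=120, overlap=30)
print(len(pieces), len(pieces[0]))                 # number of chunks, size of the first
```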
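
For the RAG entry, the essential flow is: score stored chunks against the query, keep the most relevant ones, and prepend them to the prompt sent to the generative model. The word-overlap scorer below is a deliberately crude stand-in for the embedding similarity or vector database a production system would use, and the final generation step is not shown.

```python
def overlap_score(query: str, chunk: str) -> float:
    """Toy relevance score: fraction of query words that also appear in the chunk."""
    q_words = set(query.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / max(len(q_words), 1)

def build_rag_prompt(query: str, chunks: list[str], top_k: int = 2) -> str:
    """Pick the top_k most relevant chunks and assemble a grounded prompt."""
    ranked = sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)
    context = "\n".join(f"- {c}" for c in ranked[:top_k])
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

knowledge_base = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Photosynthesis converts sunlight into chemical energy in plants.",
    "Paris is the capital of France.",
]
print(build_rag_prompt("When was the Eiffel Tower completed?", knowledge_base))
```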
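
Finally, the Temperature entry: temperature divides the model's raw logits before the softmax that turns them into token probabilities, so low values sharpen the distribution (more deterministic output) and high values flatten it (more random output). The logit values below are made up for illustration.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw logits into probabilities, scaled by the sampling temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                             # raw scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
# t=0.5 concentrates probability on the top token; t=2.0 spreads it out
```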