Welcome to the Large Language Model (LLM) Glossary, your go-to resource for understanding the key terms and concepts in the rapidly evolving field of artificial intelligence. As LLMs become increasingly integral to applications like natural language processing, text generation, and more, this glossary aims to demystify the terminology, providing clear and concise definitions to support your learning and application of these advanced technologies.
Some of the terms below may still be missing a description; this glossary is under continuous development to keep pace with new models and terminology.
Term | Tags | Description |
---|---|---|
Alignment | Ethics | Ensuring LLM behavior aligns with human values, intentions, and ethical guidelines during training and deployment. |
Alpaca 7B | LLM | Stanford's Alpaca 7B is an open-source model fine-tuned from LLaMA 7B for instruction-following. |
Beam Search | Mechanism | Decoding algorithm for selecting the best sequence of tokens during text generation by tracking the top-scoring candidate sequences at each step (see the sketch after this table). |
Bias | Ethics | Unintended prejudices learned from biased training data. |
Chunking | Preprocessing | Dividing text into smaller, manageable pieces for efficient processing and improved context management. |
Claude 3 | LLM | Anthropic's Claude 3 focuses on ethical AI interactions, emphasizing safety and reliability in generating human-like text. |
Context Length | Limitations | The maximum amount of text an LLM can process or retain in a single input sequence. |
Context Window | Limitations | Maximum text length a model can process in a single input. |
DBRX | LLM | Databricks' DBRX is an open-source model designed for large-scale data analysis and processing tasks. |
Dropout | Training | Regularization method to prevent overfitting by randomly ignoring neurons. |
Embedding | Representations | Numerical vector representation of text to capture semantic meaning. |
Emergent Abilities | Capabilities | Unexpected skills or behaviors that arise in LLMs when scaling up model size or training data. |
Few-Shot Learning | Training | Using limited examples to guide model behavior on a task. |
Fine-Tuning | Training | Adjusting a pretrained model on specific data for a specialized task. |
GPT | Model Types | A series of LLMs using transformer architecture, pretrained on vast data for versatile text generation tasks. |
GPT-4 | LLM | Developed by OpenAI, GPT-4 is a multimodal model capable of processing both text and image inputs, excelling in complex reasoning and understanding. |
Hallucination | Limitations | When an LLM generates false or nonsensical information not grounded in its training data or context. |
Hughes Hallucination Evaluation Model (HHEM) | Evaluation Tools | A framework to assess the tendency of LLMs to produce hallucinated or incorrect outputs in generated content. Hosted on HuggingFace. |
Language Modeling | Training | Predicting the next word or sequence based on the preceding text. |
Latent Space | Representations | High-dimensional space where text is mapped to abstract features. |
Llama 3 | LLM | Meta's Llama 3 offers open-source accessibility with strong performance in text generation and coding, supporting over 30 languages. |
Mixtral 8x22B | LLM | Mistral AI's Mixtral 8x22B is a powerful open-source model known for top-tier reasoning in high-complexity tasks. |
Model Drift | Performance | Decline in model performance over time due to changing contexts. |
Multimodal Models | Model Types | Models integrating text with other data types, like images or audio. |
Open LLM Leaderboard | Evaluation Tools | The Open LLM Leaderboard from HuggingFace assesses models based on several benchmarks. |
Overfitting | Training | When a model performs well on training data but poorly on unseen data. |
PaLM 2 | LLM | Google's PaLM 2 is designed for advanced language understanding, including reasoning, coding, and multilingual capabilities. |
Pretraining | Training | Initial training phase on large datasets to learn general language patterns. |
Prompt Engineering | Usage | Crafting input prompts to elicit desired responses from a model. |
Pythia | LLM | EleutherAI's Pythia is a suite of open-source models ranging from 70 million to 12 billion parameters, suitable for various natural language processing tasks. |
RAG (Retrieval-Augmented Generation) | Mechanism | Combines external knowledge retrieval with generative models to provide accurate, context-aware responses to queries. |
Self-Attention | Mechanism | Technique allowing models to focus on relevant words in a sequence. |
StableLM 2 | LLM | Stability AI's StableLM 2 is an open-source model optimized for stability and efficiency in various language tasks. |
Temperature | Usage | Controls randomness in text generation by scaling token probabilities before sampling; lower values make output more predictable, higher values more varied (see the sketch after this table). |
Tokenization | Preprocessing | Breaking text into smaller units, like words or subwords, for model input. |
Transformer Architecture | Model Structure | Neural network design using attention mechanisms for handling sequential data. |
Transformers | Model Structure | Apply attention mechanisms to consider the importance of all words in a sentence simultaneously for better context understanding. |
Vicuna-13B | LLM | An open-source chatbot fine-tuned from LLaMA on user-shared conversations, achieving over 90% of ChatGPT's quality. |
XGen-7B | LLM | Developed by Salesforce, XGen-7B is an open-source model tailored for business applications, offering efficient performance with 7 billion parameters. |
Zero-Shot Learning | Training | Performing tasks without specific examples, relying on general language understanding. |
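
The Temperature entry above references the sketch below: a minimal illustration, assuming nothing beyond NumPy, of how temperature scaling works. Logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it. The vocabulary, logits, and function name are invented for illustration, not taken from any particular model or library.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample a token index from raw logits after temperature scaling.

    Lower temperature sharpens the distribution (more deterministic output);
    higher temperature flattens it (more random output).
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()          # subtract the max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()            # softmax over the scaled logits
    return rng.choice(len(probs), p=probs)

# Toy vocabulary and logits, invented for illustration only.
vocab = ["the", "a", "cat", "dog"]
logits = [2.0, 1.0, 0.5, 0.1]
print(vocab[sample_with_temperature(logits, temperature=0.7)])
```

With `temperature=0.1` the example almost always prints "the"; with `temperature=2.0` the distribution over the four tokens becomes noticeably flatter.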
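
Beam Search can be illustrated with a similar toy sketch. The `next_token_probs` callable, the transition table, and all token names below are hypothetical stand-ins for a real model's next-token distribution.

```python
import math

def beam_search(next_token_probs, start, beam_width=3, max_len=5):
    """Keep the `beam_width` highest-scoring partial sequences at every step,
    ranked by cumulative log-probability, and return the best one.

    `next_token_probs(seq)` returns a dict of candidate token -> probability
    given the sequence so far (a stand-in for an LLM's output distribution).
    """
    beams = [([start], 0.0)]  # each beam is (token sequence, summed log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, p in next_token_probs(seq).items():
                candidates.append((seq + [tok], score + math.log(p)))
        # Prune to the top `beam_width` sequences before the next step.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]

# Purely illustrative transition table standing in for a language model.
toy = {"a": {"b": 0.6, "c": 0.4}, "b": {"c": 0.7, "a": 0.3}, "c": {"a": 0.5, "b": 0.5}}
print(beam_search(lambda seq: toy[seq[-1]], start="a", beam_width=2, max_len=3))
```

Unlike greedy decoding, which commits to the single most likely token at every step, keeping several beams lets a slightly worse token now lead to a better overall sequence later.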