PDA Technical Glossary


PDA Technical Reports are highly valued membership benefits because they offer expert guidance and opinions on important scientific and regulatory topics and are used as essential references by industry and regulatory authorities around the world. These reports include glossary terms that explain the material and enhance the reader's understanding.

The database presented here includes the glossary terms from all current technical reports. It is searchable by keyword, topic, or technical report. Each definition includes a link to its source technical report within the PDA Technical Report Portal.


Analytical Model
A mathematical or computational framework used to represent and analyze relationships between variables in drug development and manufacturing. Unlike empirical models, which rely solely on observed data, analytical models are based on theoretical principles, scientific knowledge, and mathematical equations. These models are often used to simulate and predict complex systems such as drug-receptor interactions, pharmacokinetic processes, or biopharmaceutical properties. An application for the pharmaceutical industry is describing a unit operation, such as pumping a buffer into a reaction vessel, using precise mathematical formulations.
Anomaly Detection
The process of identifying unusual patterns or observations in data, often signaling errors, fraud, or other irregularities. Pharmaceutical industry applications include root cause analysis and transport surveillance, where comparing images of impacted versus non-impacted samples helps detect anomalies.
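A simple statistical form of anomaly detection can be sketched in plain Python using z-scores; the fill-volume readings and the threshold below are illustrative assumptions, not values from any technical report.

```python
# Minimal sketch of z-score anomaly detection (illustrative only).
# A reading is flagged when it lies more than `threshold` standard
# deviations from the mean of the dataset.

def detect_anomalies(values, threshold=2.0):
    """Return the observations whose z-score exceeds the threshold."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = variance ** 0.5
    if std == 0:
        return []
    return [v for v in values if abs((v - mean) / std) > threshold]

# Hypothetical fill-volume readings with one gross outlier
readings = [10.0, 10.1, 9.9, 10.2, 10.0, 9.8, 25.0]
print(detect_anomalies(readings))
```

Production systems typically use more robust techniques (e.g., isolation forests or model-based residuals), but the thresholding idea is the same.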
Area Under the ROC Curve (AUC-ROC)
A metric that quantifies the overall performance of a classification model by measuring the area under the ROC curve. A higher AUC value indicates better model accuracy in distinguishing between different classes. [see Receiver Operating Characteristic Curve (ROC Curve)]
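AUC-ROC has an equivalent rank-based interpretation: the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one (ties counting half). A minimal sketch, with made-up labels and scores:

```python
# Illustrative computation of AUC-ROC via the rank statistic:
# the fraction of (positive, negative) pairs where the positive
# example receives the higher score (ties count as 0.5).

def auc_roc(labels, scores):
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(auc_roc(labels, scores))
```

An AUC of 1.0 means perfect separation of the classes; 0.5 means the model scores no better than chance.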
Artificial General Intelligence (AGI)
A highly advanced form of artificial intelligence with the capability to understand, learn, and apply knowledge across a broad range of tasks at a level comparable to human intelligence. In pharmaceutical applications, AGI could revolutionize drug discovery, development, and manufacturing by autonomously designing experiments, optimizing complex processes, interpreting vast amounts of scientific data, and even generating novel hypotheses.
Auto Machine Learning (AutoML)
The automated application of machine learning techniques to design, build, and deploy AI models with minimal human intervention. AutoML platforms streamline model selection, feature engineering, hyperparameter tuning, and evaluation, making machine learning more accessible. In pharmaceuticals, AutoML accelerates drug discovery, predictive modeling of patient outcomes, and manufacturing process optimization.
Bias-Variance Tradeoff
The challenge of balancing model complexity to avoid underfitting and overfitting, ensuring optimal predictive performance. A highly complex model may capture noise (high variance), while a simplistic model may fail to learn key patterns (high bias).
Confusion Matrix
A structured table used to evaluate the accuracy of a classification model, displaying true positive, true negative, false positive, and false negative counts. It helps identify model strengths and areas for improvement.
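The four counts can be tallied in a few lines; the label vectors below are illustrative assumptions:

```python
# Minimal sketch: tallying a binary confusion matrix from actual
# and predicted labels (1 = positive class, 0 = negative class).

def confusion_matrix(actual, predicted):
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1      # correctly predicted positive
        elif a == 0 and p == 0:
            counts["TN"] += 1      # correctly predicted negative
        elif a == 0 and p == 1:
            counts["FP"] += 1      # false alarm
        else:
            counts["FN"] += 1      # missed positive
    return counts

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_matrix(actual, predicted))
```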
Cross-Validation
A validation method that splits the dataset into multiple subsets, training the model on some while testing on others. This approach ensures better generalization and reduces model bias.
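The index bookkeeping behind k-fold cross-validation can be sketched as follows; the dataset size and number of folds are illustrative assumptions:

```python
# Sketch of k-fold cross-validation splits: the data is divided
# into k folds, and each fold takes a turn as the held-out test set
# while the remaining folds form the training set.

def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for fold in range(k):
        test = indices[fold * fold_size:(fold + 1) * fold_size]
        train = [i for i in indices if i not in test]
        yield train, test

for train, test in k_fold_splits(10, 5):
    print(test)
```

In practice, libraries also shuffle the data and stratify by class before splitting; this sketch omits both for clarity.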
Data Labeling
The process of annotating datasets with meaningful tags or labels to describe their content, characteristics, or relevance, enabling supervised machine learning to identify patterns and improve predictive accuracy. In pharmaceutical applications, data labeling includes tagging images of cells with their health statuses, categorizing patient records based on treatment outcomes, and annotating chemical compounds with their properties and effects.
Digital Twin
A sophisticated digital replica of a physical manufacturing process, system, or equipment, encompassing its structure, behavior, performance, and functionality. Digital twins integrate sensor and equipment data to create real-time simulations that mirror actual system behavior, enabling companies to monitor, analyze, and optimize production processes, equipment performance, and product quality. Additional benefits include predictive maintenance, process optimization, and scenario testing without disrupting real operations.
Ensemble Learning
A technique that combines multiple models to boost overall predictive accuracy and stability, often used to reduce overfitting and enhance robustness. Common methods include bagging, boosting, and stacking.
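The voting idea can be sketched with three toy "models" (plain threshold functions, a deliberate simplification); the ensemble returns the majority prediction:

```python
# Illustrative majority-vote ensemble. The three "models" are
# hypothetical threshold classifiers standing in for real trained
# models; only the voting mechanism is the point here.

from collections import Counter

def model_a(x): return 1 if x > 0.5 else 0
def model_b(x): return 1 if x > 0.4 else 0
def model_c(x): return 1 if x > 0.7 else 0

def ensemble_predict(models, x):
    """Return the most common prediction across the models."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

models = [model_a, model_b, model_c]
print(ensemble_predict(models, 0.6))
```

Bagging and boosting differ in how the constituent models are trained (resampled data versus reweighted errors), but both aggregate predictions in a similar spirit.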
Explainability
The degree to which the decision-making process of a machine learning model can be understood and interpreted by humans. Explainability is critical for trust, transparency, regulatory compliance, and ensuring AI-driven decisions are interpretable in pharmaceutical applications.
Feature Engineering
The process of selecting, transforming, or creating input variables (features) to improve the performance of machine learning models. Well-engineered features help the model detect patterns effectively, improving predictive accuracy and robustness.
Feature Importance
An analysis that identifies the relative significance of each input variable in influencing a machine learning model’s predictions. In pharmaceutical applications, this includes recognizing critical process parameters (CPPs) amongst all process parameters, or critical material attributes (CMAs) amongst all material attributes.
Hallucination
Instances where AI systems, particularly those based on advanced models like GPT-4, generate outputs that are incorrect, misleading, or nonsensical, despite displaying high confidence. Hallucinations can result from biases in the training data, model limitations, or inherent uncertainties in the problem space.
Human-in-the-Loop (HITL)
A capability and role whereby qualified personnel can meaningfully intervene in an AI system's decision cycle during operation or oversight activities. These interventions address uncertainty and model limitations, override or adjust outputs, and provide feedback that supports continuous performance assurance and trust; professionals actively guide, review, and verify the AI output. HITL is applied in a risk-based manner, with the level and timing of oversight, controls, and documentation proportionate to the system's intended use and risk, and performance is evaluated on the human–AI team, not the model alone.
Hybrid Model
A computational model that combines elements from multiple modeling approaches, such as empirical, mechanistic, and machine learning techniques, to solve complex problems. By leveraging the strengths of each approach, hybrid models enhance predictive accuracy, robustness, and interpretability. Applications include integrating mechanistic drug metabolism models with machine learning algorithms trained on experimental data to predict the pharmacokinetic behavior of new drug candidates. Similarly, hybrid models can merge empirical data with physiological knowledge to simulate drug-disease interactions or optimize formulation designs.
Hyperparameter Tuning
The process of optimizing the configuration settings (hyperparameters) of a machine learning model to maximize performance. Proper tuning prevents underfitting and overfitting, improving overall accuracy.
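The simplest tuning strategy, grid search, can be sketched in a few lines. The quadratic "validation loss" below is a stand-in assumption for a real model's cross-validated score:

```python
# Illustrative grid search over a single hyperparameter.
# The toy loss function is an assumption standing in for the
# expensive step of training and validating a real model.

def validation_loss(learning_rate):
    # Hypothetical loss, minimized at learning_rate = 0.1
    return (learning_rate - 0.1) ** 2

grid = [0.001, 0.01, 0.1, 0.5, 1.0]
best = min(grid, key=validation_loss)
print(best)
```

Real tuning workflows extend this idea to multiple hyperparameters and often replace the exhaustive grid with random or Bayesian search.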
IT/OT (Information Technology/Operational Technology)
The merging of Information Technology (IT), which manages computing, networking, and data infrastructure, with Operational Technology (OT), which focuses on monitoring and automating physical processes. IT/OT integration enables seamless data exchange, process optimization, and enhanced system control for greater efficiency and regulatory compliance.
Model Scoring
The process of evaluating a machine learning model's performance using metrics such as accuracy, precision, recall, and F1 score. Effective model scoring ensures that predictions are valid and trustworthy.
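These metrics follow directly from confusion-matrix counts; the counts below are illustrative assumptions:

```python
# Sketch of common classification scores derived from
# confusion-matrix counts (TP, TN, FP, FN).

def model_scores(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)          # of predicted positives, how many were right
    recall = tp / (tp + fn)             # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = model_scores(tp=8, tn=5, fp=2, fn=1)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```

Precision and recall often trade off against each other, which is why the F1 score (their harmonic mean) is a common single-number summary.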
Model Training
The phase in machine learning where a model learns patterns and relationships from training data, enabling it to make predictions for unseen data. Proper model training ensures generalization and reliability in real-world applications.
Narrow Artificial Intelligence
Artificial intelligence applications and techniques designed for specific tasks, such as speech recognition, image classification, fraud detection, and predictive maintenance. Unlike artificial general intelligence (AGI), narrow AI (also known as weak AI) performs a specific task or a limited range of tasks with high efficiency and accuracy, but without human-like general reasoning. It relies on well-defined datasets, structured objectives, and clear operational contexts, a combination well suited to GMP environments where predictability, reliability, and explainability are mandatory. Some examples of pharmaceutical applications are anomaly detection in fill-finish operations, predictive maintenance for sterile manufacturing equipment, environmental monitoring data analysis, real-time process parameter optimization, automated visual inspection of parenteral products, label verification in secondary packaging, AI-driven root cause analysis (RCA) in deviation management, and process analytical technology (PAT) models for in-line monitoring.
Overfitting
A situation where a machine learning model learns the training data too well, capturing noise or random fluctuations, leading to poor performance on new, unseen data. Overfitting can be mitigated by regularization, pruning, and using more diverse data.
Pattern Recognition
The process of identifying and interpreting patterns, trends, and relationships within large and complex datasets, leveraging statistical and computational techniques such as machine learning and data mining to extract meaningful insights from diverse types of data (e.g., chemical structures, biological assays, and clinical outcomes). Pharmaceutical applications include analysis of high-throughput screening data to identify compounds with desired pharmacological properties, such as potency, selectivity, and safety profiles, and in patient data to discover biomarkers, subpopulations, or disease signatures that aid in personalized treatment approaches.
Prompt Engineering
The systematic design and refinement of instructions provided to AI models, particularly in the context of language models such as GPT (Generative Pre-trained Transformer), to optimize user interactions and generated outputs. Effective prompt engineering enhances model accuracy and relevance, improving applications such as automated text generation, conversational AI, and decision support systems.
Quantum Computing
A field of computing that utilizes quantum mechanics to process information in ways that classical computers cannot. Quantum computers can perform complex calculations exponentially faster, with potential applications in drug discovery, molecular simulations, and optimization problems. Although still in development, potential applications for the pharmaceutical industry include molecular structure simulation for complex biologics, advanced supply chain optimization, quantum-enhanced predictive modeling for deviation management, simulation of solubility and stability profiles, quantum-accelerated AI model training, and quantum cryptography for secure batch data transmission.
Receiver Operating Characteristic Curve (ROC Curve)
A graphical representation of a classification model's performance across different threshold settings, illustrating the trade-off between true positive rate and false positive rate. The curve is generated by plotting the true positive rate against the false positive rate, providing insights into the model’s discriminative ability.
Self-learning
The ability of an AI system to improve its performance over time through experience without explicit human intervention. Self-learning AI adapts continuously, refining its predictions, decision making, and behaviors based on new data and feedback. In drug manufacturing, self-learning models optimize processes, enhance efficiency, and drive innovation. For example, a model can autonomously incorporate new datasets from batch manufacture, release and/or stability testing to refine its training and improve predictive accuracy.
Supervised Learning
A type of machine learning in which a model is trained on input data (e.g., raw materials, process parameters, or patient characteristics) paired with known outcomes (e.g., successful drug formulations, process yields, or patient responses) so that it can accurately predict outcomes for new, unseen data. In pharmaceutical manufacturing, supervised learning supports predictions and decision-making that optimize drug production and efficacy and personalize treatments.
Token
The basic building blocks of natural language processing (NLP), created by splitting text into smaller components such as words, sub-words, or symbols. Tokens represent individual linguistic units, including words, punctuation, and numbers, enabling AI models to process and understand human language efficiently in tasks such as sentiment analysis, machine translation, and speech recognition.
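A minimal word-level tokenizer can be sketched with a regular expression; real NLP systems typically use sub-word schemes such as byte-pair encoding, which this sketch does not implement:

```python
# Minimal sketch of word-level tokenization. Each \w+ run (words,
# numbers) and each single punctuation character becomes a token.

import re

def tokenize(text):
    """Split text into word, number, and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Tokenize this: 2 sentences, quickly!"))
```

Sub-word tokenizers further split rare words into smaller units so that the model's vocabulary stays fixed while still covering unseen words.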
Transfer Learning
A machine learning technique where a pretrained model is adapted for a different but related task, often applied in cases with limited labeled data. This method accelerates training by reusing knowledge from previous learning, improving performance in pharmaceutical applications such as drug discovery, patient diagnosis, and molecular analysis.
Turing Test
A measure of an AI system’s ability to exhibit intelligent behavior indistinguishable from that of a human. In pharmaceutical manufacturing, passing the Turing Test would imply that an AI system can engage in complex reasoning and human-like communication, potentially transforming automated research, patient interactions, and regulatory affairs by providing expert level responses and insights.
Underfitting
A situation where a machine learning model is too simplistic, failing to identify meaningful patterns in the training data. This results in poor performance on both training and test data. Underfitting can be addressed by using more complex models and richer features.
Unsupervised Learning
A machine learning approach in which AI models analyze data without predefined labels and autonomously identify patterns, structures, or relationships. Pharmaceutical applications include clustering similar compounds, identifying anomalies in production processes, and uncovering hidden patterns in patient data to improve drug development insights.