The Role of Mathematics in Artificial Intelligence: How Advanced Mathematical Thinking Creates Strategic and Economic Value

Contents

1 Executive Introduction: Why Mathematics Is the Invisible Engine of AI
2 A Mental Model for Decision Makers: AI as a Stack of Mathematical Choices
3 Linear Algebra: The Geometry of Meaning and Scale
- 3.1 Why Linear Algebra Is Foundational to AI
- 3.2 Business Value Created
4 Calculus and Optimization: How Machines Learn and Improve
- 4.1 Advanced Optimization Concepts Explained in Words
- 4.2 Executive Implications
5 Probability Theory: Managing Uncertainty, Risk, and Confidence
- 5.1 Where Probability Drives Business Value
- 5.2 Strategic Importance
6 Statistics: Measuring Truth, Performance, and Reliability
- 6.1 Statistics as Governance Infrastructure
7 Information Theory: Turning Data into Economic Signal
- 7.1 Business Outcomes
8 Graph Theory and Discrete Mathematics: Understanding Relationships
- 8.1 Industry Applications
9 Advanced Mathematical Frameworks Behind Breakthrough AI
10 Mathematics Across AI-Driven Industries
11 Mathematics as Competitive Advantage
12 Organizational and Leadership Implications
13 The Future of AI Will Be Written in Mathematics
14 Conclusion: Seeing AI Clearly
15 References

Math in AI is how a computer decides what to do next. It helps the system count, compare, and choose when the answer is not obvious. Without math, an AI system is just a box that stores information.

Most people talk about what AI can do. They do not talk about why it fails. When AI makes the wrong choice, the cause is often not the data or the tool. It is the math used to train it.

This matters because math shapes every trade-off. It affects how fast a system learns, how often it makes mistakes, and how expensive it is to run. When math in artificial intelligence is treated as a detail, problems show up later and cost more to fix.

Executive Introduction: Why Mathematics Is the Invisible Engine of AI

AI is usually explained as data, algorithms, and computing power (Deisenroth et al., 2020).

That description skips the part that matters most. Math is the structure that tells the system how to think. Without it, data has no shape and compute have no direction.

Every AI capability is a math operation run many times. Prediction reduces error. Recommendation ranks options under limits. Generation assigns likelihoods to outputs. Optimization chooses trade-offs. These systems do not reason. They calculate (Goodfellow et al., 2016).

For executives, the role of mathematics in artificial intelligence is not about learning formulas. It is about knowing limits. Math defines what a system can do, what it cannot learn, and how it behaves when inputs change. This is where many AI projects fail. They assume scale will cover weak foundations.

Real differentiation comes from math choices. How uncertainty is handled. How learning is constrained. How error is measured. These decisions explain why some AI investments perform and others stall. Advanced mathematics is a strategic asset. Treating it as a technical detail leads to cost overruns and unstable systems.

A Mental Model for Decision Makers: AI as a Stack of Mathematical Choices

AI is not one system. It is a stack of math decisions. Each layer depends on the one below it. When a lower layer is weak, the failure shows up later as cost, risk, or poor decisions.

The First Layer Is Representation

Reality is turned into data. This step throws information away. Choices about what to measure, how to encode it, and what to ignore already limit what the system can learn. If the data misses key signals, no model can recover them (Gebru et al., 2018).

The Second Layer Is Learning

Models look for patterns using rules defined by math. These rules decide what counts as a good answer and what counts as noise. If the math favors the wrong patterns, the system learns the wrong lessons, even with large datasets (Amodei et al., 2016).

The Third Layer Is Uncertainty

Real systems never have complete information. Math is used to estimate confidence and risk. Many systems break here. They produce clean outputs without honest uncertainty, which leads to decisions that look precise but are fragile (Guo et al., 2017).

The Final Layer Is Optimization

The system chooses actions under constraints like time, cost, or accuracy. Optimization always involves trade-offs. When these trade-offs are poorly defined, the system pushes value in the wrong direction (La Cava, 2023).

Weakness at any layer moves upward. A small assumption in math becomes a large business risk later. This is why engineering execution alone is not enough. Good code cannot fix bad mathematical design.

AI systems are not smart software. They do not understand context or intent. They are formal reasoning systems built on assumptions. When those assumptions are wrong, the system fails in predictable ways.

This is why recent research on the limits of learning systems under distribution shifts and hidden assumptions matters for strategy, not just theory, as shown in this analysis of failure modes in modern machine learning systems (D’Amour et al., 2022).

Math in AI is where control lives. Leaders who understand the stack can see where systems will hold and where they will break. Those who do not are left reacting after the damage is done.

Linear Algebra: The Geometry of Meaning and Scale

Linear algebra is the math of representation and movement. It defines how raw input becomes something a system can compare, store, and reuse. In AI, this step sets the ceiling for everything that follows (Strang, 2019).

At its core:

Vectors are structured descriptions of entities like users, images, or words
Matrices are rules that transform one description into another
Tensors track relationships across many dimensions at once

This is how meaning becomes geometry. Similar things are placed close together. Dissimilar things are pushed apart. If the embedding geometry is poor, downstream performance is strongly constrained and typically degrades accordingly (Garrido et al., 2023)..

Why Linear Algebra Is Foundational to AI

Neural networks are stacks of linear transformations with simple decision checks in between. Each layer reshapes information. Nothing “intelligent” happens without this step working first.

Key mechanisms:

Neural networks apply repeated linear transformations at scale (Goodfellow et al., 2016)
Embeddings encode meaning as distance in a shared space (Devlin et al., 2019)
Dimensionality reduction removes noise before it becomes error (McInnes et al., 2018)

When these fail, models learn unstable patterns. More layers only make the failure harder to detect and more expensive to fix.

Business Value Created

Linear algebra is not an abstract value. It shows up in real systems that either hold under load or break quietly.

It directly supports:

Search relevance that reflects intent, not keywords (Karpukhin et al., 2020)
Recommendations that adapt without manual rules (Covington et al., 2016)
Language and vision systems that scale beyond templates (Radford et al., 2021)
Cost control through compact representations and faster inference (Sanh et al., 2019)

Research on representation learning and geometric structure in modern models explains why these systems scale when math is done well and degrade when it is not (Bahri et al., 2024).

This is where math in AI becomes a business concern. When representation drifts, quality drops before anyone notices. Costs rise later. Teams that understand this fix structure early. They revisit what the model is optimizing and add checks along the way. Teams that don’t, end up debugging symptoms after the damage is done, chasing errors downstream instead of correcting the math upstream.

Calculus and Optimization: How Machines Learn and Improve

AI does not learn by understanding. It learns by reducing error. The AI system makes a guess, checks how wrong it is, and adjusts. This repeats until improvement slows or cost becomes too high. That loop is calculus in practice.

Training an AI system is not a coding task. The code stays mostly fixed. What changes are numbers inside the model. Calculus provides the feedback that tells those numbers how to move. Without that feedback, the system cannot improve (Baydin et al., 2018).

A gradient is not an equation you need to memorize. It is direction. It answers one question: if the model changes slightly, does the result get better or worse. Learning works only if that direction is reliable. When it is not, progress becomes unstable (Goodfellow et al., 2016).

Advanced Optimization Concepts Explained in Words

Most real AI training problems are non-convex. There is no single best path downhill. The system can get stuck in shallow improvements or move in circles.

This creates real constraints:

Non-convex optimization means learning can stall or jump unpredictably (Jin, 2019)
Speed vs accuracy forces teams to choose between fast training and stable results (Keskar et al., 2017)
Robustness vs performance requires sacrificing peak scores to avoid failure under change (Tsipras et al., 2019)
Scaling models increases the size of the search space and the chance of instability (Kaplan et al., 2020)

As models grow, optimization becomes harder, not easier (D’Amour et al., 2022). More parameters mean more ways to fail. This is why larger systems cost more to train and are harder to control, a problem studied in work on optimization limits in deep learning systems (Kaplan et al., 2020).

Research on sharp and flat minima in large-scale training shows why some models perform well in tests but break when deployed (Keskar et al., 2017).

Executive Implications

Optimization choices show up directly in business outcomes. They determine how long AI training takes, how much it costs, and how reliable the AI system is after release.

Executives face trade-offs whether they see them or not:

Higher performance usually means higher training cost
Faster experiments reduce risk but may limit final quality
Aggressive optimization increases failure under new conditions

AI systems fail quietly here. Training metrics look good. Deployment exposes the gaps. Leaders who understand optimization can ask better questions early. Leaders who do not usually pay later, in retraining cycles, outages, and lost trust.

This is where calculus stops being theory. It becomes a control system for cost, speed, and reliability.

Probability Theory: Managing Uncertainty, Risk, and Confidence

AI does not produce facts. It produces likelihoods.

Every output is a guess with a confidence level, even when it looks certain. This is not a flaw. It reflects reality. Real data is incomplete, noisy, and always changing (Guo et al., 2017).

Uncertainty cannot be removed. It can only be managed. Probability gives AI a way to act without full information and adjust when new data arrives. When systems ignore uncertainty, they fail quietly. They make clean decisions that break under real conditions (Wilson & Izmailov, 2020).

Bayesian reasoning is a disciplined way to update the probability of an assumption (Gal & Ghahramani, 2016). The system starts with an assumption, sees new evidence, and adjusts. This matters because yesterday’s data is rarely enough. Systems that cannot update their confidence drift out of sync with reality.

Where Probability Drives Business Value

Probability shapes decisions where mistakes have cost:

Demand forecasts that must handle volatility (Salinas et al., 2020)
Risk and fraud scores that balance false alarms and missed threats (Shi et al., 2023)
Medical and financial support systems where confidence matters as much as accuracy (Guo et al., 2017)

Strategic Importance

Probability controls trust. Regulators care about confidence, not just output. Leaders need to know when a model is unsure, not just when it is right.

Risk-aware systems fail less often and fail more visibly. That transparency matters. When AI hides uncertainty, it shifts risk to the business. When uncertainty is exposed, decisions stay human where they should.

Statistics: Measuring Truth, Performance, and Reliability

AI finds patterns. Statistics check whether those patterns hold. These are not the same job. Pattern recognition can produce results quickly. Statistical inference asks if the results mean anything outside the data they came from (James et al., 2021).

Bias, variance, and sampling errors are not just academic issues. Bias means the system learns the wrong lesson. Variance means results change too much to trust. Bad sampling means the system performs well in tests and fails in use (Dockès et al., 2021). Most false confidence in AI comes from ignoring one of these.

Statistics is where math in AI becomes a discipline. It forces limits on what claims can be made and how stable a system really is.

Statistics as Governance Infrastructure

Statistics keeps systems honest after launch:

Validation checks if models still behave as expected
Monitoring detects performance decay and data drift
Metrics interpretation prevents teams from chasing the wrong numbers

When statistical thinking is weak, problems show up late and cost more to fix. When it is strong, failures are smaller, earlier, and easier to control.

Information Theory: Turning Data into Economic Signal

Information theory explains why some data matters and most of it does not. Entropy measures uncertainty. High entropy means outcomes are hard to predict. Low entropy means the system has learned structure (Polyanskiy & Wu, 2023).

Information is what reduces uncertainty. If new data does not change what the system expects, it adds little value. AI systems learn by finding signals that compress many possibilities into fewer, more useful ones (Polyanskiy & Wu, 2023).

This is why compact representations matter. They keep meaning while removing noise. When compression fails, models memorize details and break when inputs change.

Business Outcomes

Information theory shows up in real costs and limits:

Data compression reduces storage and compute spend (Ballé et al., 2018)
Efficient learning lowers data and training requirements (Jeon et al., 2022)
Better generalization improves robustness in new conditions (Alemi et al., 2017)

When AI systems cannot reduce uncertainty, they scale cost without scaling value.

Graph Theory and Discrete Mathematics: Understanding Relationships

Some problems are not about attributes. They are about connections. Graph theory models this directly. A graph represents entities as nodes and interactions as edges (Zhou et al., 2020). The value is not just in the node but also in how it links to others.

This matters because many real systems behave this way. Fraud spreads through networks (Dou et al., 2020). Influence moves through relationships. Meaning forms from context, not isolated facts. When models ignore structure, they miss these effects. Graph theory, a sub-area of discrete mathematics, makes that structure usable.

Discrete mathematics shows up in decision trees and rule systems (Chen & Guestrin, 2016). These are structured choices, not smooth averages. They force clear paths and clear limits. When the logic is wrong, errors are sharp and visible.

Industry Applications

Graph-based math supports systems where relationships drive outcomes:

Fraud and financial crime detection based on transaction networks (Dou et al., 2020)
Enterprise knowledge graphs that connect data across systems (Hogan et al., 2021)
Recommendation and influence networks shaped by user interaction (Wu et al., 2020)

Research on graph-based reasoning for fraud detection and networked systems shows why relational structure improves signal quality in high-risk environments (Dou et al., 2020).

Work on knowledge graphs and relational learning in enterprise AI explains how structured connections improve reasoning across complex datasets (Hogan et al., 2021).

Broader studies in applied discrete mathematics for machine learning systems show where graph-based approaches outperform flat models and where they fail (Yao et al., 2019).

Math in AI breaks here when relationships are flattened into features. When structure is preserved, systems see patterns others miss.

Advanced Mathematical Frameworks Behind Breakthrough AI

Most commercial AI uses familiar tools. That works until scale, safety, or reasoning becomes a limit. The next gains come from math developed to deal with structure, continuity, and logic. These frameworks reduce failure modes that data and compute alone cannot fix.

Manifold Learning and Topology

Real data looks large and messy, but it often follows hidden structure. Many variables move together. Manifold learning finds those lower-dimensional surfaces and ignores the rest (McInnes et al., 2018).

Topology is used to study the shape of data. It detects clusters, gaps, and anomalies that averages miss. When this math is absent, models fit noise and break on new inputs (Chazal et al., 2024).

This matters because structure lowers risk:

Models generalize better to unseen data (Alemi et al., 2017)
Anomalies stand out instead of blending in (Chazal et al., 2024)
Less data is needed to reach stable performance (McInnes et al., 2018)

Research on manifold structure in high-dimensional learning systems explains why understanding shape improves robustness at scale (Sekmen, 2024).

Differential Geometry and Continuous Systems

Some systems must move through space, time, or constraints. Differential geometry models smooth change. It defines how systems evolve and where they are allowed to go (Raissi et al., 2019).

This math shows up where failure is costly:

Robotics and automation (Cohn et al., 2023)
Autonomous vehicles and drones (Peng et al., 2024)
Simulation-driven systems and digital twins
Physics-informed models (Raissi et al., 2019)

Without it, behavior is brittle. With it, systems respect limits like energy, safety, and physical law. That reduces testing cycles and lowers operational risk. This is where math in AI becomes a safety tool, not just a performance tool.

Logic, Algebra, and Symbolic Reasoning

Statistical models recognize patterns. They do not reason well about rules, causality, or long chains of decisions. Logic and algebra fill that gap.

Symbolic systems represent knowledge explicitly. They enforce constraints and justify outcomes (Angelino et al., 2018). Modern systems combine learning with rules to get both flexibility and control (Delvecchio et al., 2025).

This matters in environments where answers must be explained:

Enterprise knowledge systems (DeLong et al., 2024)
Scientific and causal reasoning (Gunning & Aha, 2019)
Regulatory compliance and audit trails (Gunning & Aha, 2019)

Work on symbolic and algebraic frameworks for explainable AI systems shows how formal reasoning improves reliability in regulated settings (Angelino et al., 2018).

Advanced math does not make AI smarter. It makes it more controlled. For executives, that difference shows up as lower risk, fewer surprises, and defensible technical advantage.

Mathematics Across AI-Driven Industries

Math in AI does not create value on its own. It creates value when it reduces uncertainty, controls risk, or allocates resources better than people can at scale. The same math shows up across industries, but the pressure points differ.

Healthcare and Life Sciences

Healthcare AI works under uncertainty and high cost of error. Math manages that risk.

Linear algebra drives imaging and signal processing (Zhou et al., 2021)
Probability models confidence in diagnoses and treatments (Guo et al., 2017)
Statistics validates safety and reliability (Collins et al., 2024)

When the math is weak, systems look accurate but fail on edge cases. When it is strong, error rates drop and regulators trust the output.

Finance and Insurance

Financial AI is built to price risk and manage exposure.

Probability quantifies uncertainty (Guo et al., 2017)
Optimization balances return and loss (Xidonas et al., 2020)
Statistics tests robustness and fairness (Shi et al., 2022)

Failures here are expensive. Bad assumptions lead to mispriced risk, missed fraud, or regulatory trouble (Huang et al., 2024).

Autonomous Systems and Robotics

Autonomous systems act in real time under constraints. Math keeps them stable and secure.

Control theory maintains safety
Optimization finds efficient paths
Probabilistic models handle noisy sensors

Without this structure, systems behave unpredictably. With it, deployment cycles shorten and testing costs fall. Research on applied mathematics for autonomous and industrial systems shows how these methods reduce operational risk (Ter Beek et al., 2024).

Retail, Marketing, and Media

Consumer AI operates at scale, where small errors multiply.

Linear algebra supports similarity and personalization (Covington et al., 2016)
Statistics separates signal from noise in experiments (Deng, 2019)
Optimization manages pricing and inventory (Azadi et al., 2019)

When math is sloppy, spending rises without lift. When it is tight, efficiency improves without constant tuning.

Manufacturing, Energy, and Infrastructure

Industrial systems balance efficiency and reliability over long horizons.

Time-series statistics forecast demand and failure (Aizpurua et al., 2022)
Graph models map supply chains (Wasi et al., 2024)
Optimization plans under uncertainty (Roald et al., 2023)

Here, math failures show up as downtime and missed capacity. Strong math turns complexity into control.

Across industries, the pattern is the same. Math in AI defines where systems hold, where they break, and how much risk the business absorbs.

Mathematics as Competitive Advantage

Mathematical depth makes systems hard to copy. The advantage is not in the tools or models used, but in how problems are framed and constrained. When behavior is defined by math choices, competitors cannot replicate it by swapping software or vendors.

It also creates reliability and scalability. Strong math exposes limits early and controls error as systems grow. Weak math hides problems until scale makes them expensive. This is why tool-first teams see performance drop and costs rise as usage increases.

Math-heavy AI companies outperform because they own the core logic. Over time, that logic becomes intellectual property. It survives tool changes and market shifts. That is a moat built on understanding, not on features.

Organizational and Leadership Implications

Math in AI affects how teams are built, how decisions are made, and which systems last under pressure. These are leadership choices, not technical details.

Talent Strategy

PhD-level expertise is essential when the problem involves new structure, safety limits, or hard optimization. These are cases where existing tools break and assumptions must be rebuilt. Hiring depth matters less for routine use and more when failure is costly.

Research cannot live apart from the product. When it does, ideas stall or ship late. Teams work best when researchers help shape real systems and see where theory fails under use.

Executive Literacy

Leaders do not need math details. They need to understand what the system is optimizing, where uncertainty enters, and which assumptions are fixed. This knowledge keeps expectations grounded.

Good questions come from this base. Instead of asking when the model will be done, leaders ask how it fails, what changes break it, and what risk remains after launch.

Vendor and Platform Evaluation

Shallow AI solutions sell ease and speed. Deep ones explain limits and trade-offs. The difference shows up after deployment, not during demos.

Marketing-driven adoption leads to churn. Teams replace platforms when costs rise or control drops. Evaluating the math behind a system helps avoid that cycle and keeps capability in-house.

The Future of AI Will Be Written in Mathematics

AI progress slows when scale is the only lever. More data and compute raise cost faster than value. The next gains come from math that explains cause, not just correlation.

Key directions are clear:

Causal inference turns prediction into action by showing what changes outcomes
Decision intelligence links models to real choices and constraints
Verification and trust rely on bounds, guarantees, and testable assumptions
Safety and alignment depend on controlling failure, not hiding it
Robustness comes from understanding limits, not pushing scale

Scale alone breaks quietly. Math exposes where systems hold and where they do not.

Conclusion: Seeing AI Clearly

AI is not intuition or magic. It is formal mathematical reasoning applied at scale. Every result comes from assumptions, constraints, and trade-offs written in math.

Organizations that see this act differently:

They invest where structure matters, not where hype is loud
They build systems that hold under change, not just in tests
They avoid scale-first mistakes and control risk early

Math in AI is not optional. It is the difference between systems that perform briefly and systems that last.

If you remember one thing: scale hides problems, mathematics reveals them.

If you need to understand how mathematics can help you build robust AI system for your company, visit my page to see how I can help you.

References

The Mathematics of Artificial Intelligence (arXiv survey on math foundations in AI)
https://arxiv.org/pdf/2203.08890.pdf
Blessing of Dimensionality: Mathematical Foundations of Data (measure concentration theory with ML implications)
https://arxiv.org/abs/1801.03421
Deep Learning: An Introduction for Applied Mathematicians (applied math perspective)
https://arxiv.org/abs/1801.05894
Randomized Numerical Linear Algebra: Foundations & Algorithms (efficient linear algebra techniques)
https://arxiv.org/abs/2002.01387
MDPI — Advanced Algorithms for Multi-Modal Learning, Knowledge Graphs, and Trustworthy AI (special issue)
https://www.mdpi.com/journal/mathematics/special_issues/14U12A00N6
MDPI — Application of Knowledge Graphs in Computing and AI (context for relational models)
https://www.mdpi.com/journal/applsci/special_issues/Q3F19W098M
MDPI — Trustworthy AI for Graph Learning and Application (graph-centric research)
https://www.mdpi.com/journal/electronics/special_issues/QE275T245Q
Machine Learning and Knowledge Extraction (MDPI journal covering statistics, graph learning, etc.)
https://www.mdpi.com/journal/make
Deisenroth, M. P., Faisal, A. A., & Ong, C. S. (2020). Mathematics for machine learning. Cambridge University Press. https://mml-book.com
Ian Goodfellow, Yoshua Bengio, & Aaron Courville. (2016). Deep learning. MIT Press. https://www.deeplearningbook.org/
Sambasivan, N., Kapania, S., Highfill, H., Akrong, D., Paritosh, P., & Aroyo, L. (2021). “Everyone wants to do the model work, not the data work”: Data cascades in high-stakes AI. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (CHI ’21). Association for Computing Machinery. https://doi.org/10.1145/3411764.3445518
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2018). Datasheets for datasets. arXiv. https://arxiv.org/abs/1803.09010
Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete problems in AI safety. arXiv. https://arxiv.org/abs/1606.06565
Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning (ICML 2017). PMLR. https://proceedings.mlr.press/v70/guo17a.html
La Cava, W. G. (2023). Optimizing fairness tradeoffs in machine learning with multiobjective meta-models. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’23). Association for Computing Machinery. https://doi.org/10.1145/3583131.3590487
Strang, G. (2019). Linear Algebra and Learning from Data. Wellesley-Cambridge Press. https://math.mit.edu/~gs/learningfromdata/
Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning (ICML 2020). PMLR. https://proceedings.mlr.press/v119/chen20j/chen20j.pdf
Ian Goodfellow, Yoshua Bengio, & Aaron Courville. (2016). Deep learning. MIT Press. https://www.deeplearningbook.org/
Jacob Devlin, Ming-Wei Chang, Kenton Lee, & Kristina Toutanova. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv. https://arxiv.org/abs/1810.04805
Leland McInnes, John Healy, Nathaniel Saul, & Lukas Großberger. (2018). UMAP: Uniform manifold approximation and projection. Journal of Open Source Software, 3(29), 861. https://doi.org/10.21105/joss.00861
Covington, P., Adams, J., & Sargin, E. (2016). Deep neural networks for YouTube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys ’16). Association for Computing Machinery. https://doi.org/10.1145/2959100.2959190
Karpukhin, V., Oğuz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., & Yih, W.-t. (2020). Dense passage retrieval for open-domain question answering. In Proceedings of EMNLP 2020. https://aclanthology.org/2020.emnlp-main.550.pdf
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning transferable visual models from natural language supervision. In Proceedings of ICML 2021. https://proceedings.mlr.press/v139/radford21a/radford21a.pdf
Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv. https://arxiv.org/abs/1910.01108
Baydin, A. G., Pearlmutter, B. A., Radul, A. A., & Siskind, J. M. (2018). Automatic differentiation in machine learning: A survey. Journal of Machine Learning Research, 18(153), 1–43. https://jmlr.org/papers/v18/17-468.html
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press. https://www.deeplearningbook.org/
D’Amour, A., Heller, K., Moldovan, D., Adlam, B., Alipanahi, B., Beutel, A., Chen, C., Deaton, J., Eisenstein, J., Hoffman, M. D., Hormozdiari, F., Houlsby, N., Hou, S., Jerfel, G., Karthikesalingam, A., Lucic, M., Ma, Y., McLean, C., Mincu, D., … Sculley, D. (2022). Underspecification presents challenges for credibility in modern machine learning. Journal of Machine Learning Research, 23(86), 1–61. https://jmlr.org/papers/v23/20-1335.html
Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). Scaling laws for neural language models. arXiv. https://arxiv.org/abs/2001.08361
Keskar, N. S., Mudigere, D., Nocedal, J., Smelyanskiy, M., & Tang, P. T. P. (2017). On large-batch training for deep learning: Generalization gap and sharp minima. In International Conference on Learning Representations (ICLR). https://openreview.net/pdf?id=H1oyRlYgg
Wilson, A. G., & Izmailov, P. (2020). Bayesian deep learning and a probabilistic perspective of generalization. arXiv. https://arxiv.org/abs/2002.08791
Gal, Y., & Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML 2016). PMLR. https://proceedings.mlr.press/v48/gal16.html
Salinas, D., Flunkert, V., Gasthaus, J., & Januschowski, T. (2020). DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 36(3), 1181–1191. https://www.sciencedirect.com/science/article/pii/S0169207019301888
Shi, H., Cao, J., & Chen, S. (2023). Cost-sensitive learning for medical insurance fraud detection. https://www.sfu.ca/science/stat/cao/Papers/Fraud.pdf
Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning (ICML 2017). PMLR. https://proceedings.mlr.press/v70/guo17a.html
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning: With applications in R (2nd ed.). Springer. https://link.springer.com/book/10.1007/978-1-0716-1418-1
Dockès, J., Baratin, A., Lemaître, G., Varoquaux, G., & Cheplygina, V. (2021). Preventing dataset shift from breaking machine-learning biomarkers. GigaScience, 10(9), giab055. https://pmc.ncbi.nlm.nih.gov/articles/PMC8478611/
Polyanskiy, Y., & Wu, Y. (2023). Information theory: From coding to learning (prepublication PDF). Cambridge University Press (forthcoming). https://people.lids.mit.edu/yp/homepage/data/itbook-export.pdf
Ballé, J., Minnen, D., Singh, S., Hwang, S. J., & Johnston, N. (2018). Variational image compression with a scale hyperprior. arXiv. https://arxiv.org/abs/1802.01436
Jeon, H. J., et al. (2022). An information-theoretic framework for deep learning. In Advances in Neural Information Processing Systems (NeurIPS 2022). https://papers.neurips.cc/paper_files/paper/2022/file/15cc8e4a46565dab0c1a1220884bd503-Paper-Conference.pdf
Alemi, A. A., Fischer, I., Dillon, J. V., & Murphy, K. (2017). Deep variational information bottleneck. In International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1612.00410
Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., Wang, L., Li, C., & Sun, M. (2020). Graph neural networks: A review of methods and applications. AI Open, 1, 57–81. https://www.sciencedirect.com/science/article/pii/S2666651021000012
Dou, Y., Liu, Z., Sun, L., Deng, Y., Peng, H., & Yu, P. S. (2020). Enhancing graph neural network-based fraud detectors against camouflaged fraudsters. In Proceedings of CIKM 2020. https://penghao-bdsc.github.io/papers/cikm20.pdf
Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16) (pp. 785–794). Association for Computing Machinery. https://dl.acm.org/doi/10.1145/2939672.2939785
Cai, C., & Wang, Y. (2020). A note on over-smoothing for graph neural networks. https://grlplus.github.io/papers/23.pdf
Dou, Y., Liu, Z., Sun, L., Deng, Y., Peng, H., & Yu, P. S. (2020). Enhancing graph neural network-based fraud detectors against camouflaged fraudsters. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM ’20)
Hogan, A., Blomqvist, E., Cochez, M., d’Amato, C., de Melo, G., Gutierrez, C., Kirrane, S., Gayo, J. E. L., Navigli, R., Neumaier, S., Ngomo, A.-C. N., Polleres, A., Rashid, S. M., Rula, A., Schmelzeisen, L., Sequeda, J., Staab, S., & Zimmermann, A. (2021). Knowledge graphs. ACM Computing Surveys, 54(4), Article 71.
Wu, S., Zhang, W., & others. (2020). Graph neural networks in recommender systems: A survey. arXiv. https://arxiv.org/abs/2011.02260
McInnes, L., Healy, J., Saul, N., & Großberger, L. (2018). UMAP: Uniform manifold approximation and projection. Journal of Open Source Software, 3(29), 861. https://doi.org/10.21105/joss.00861
Chazal, F., Levrard, C., & Royer, M. (2024). Topological analysis for detecting anomalies (TADA) in time series. Journal of Machine Learning Research, 25, 1–[pages]. https://jmlr.org/papers/volume25/24-0853/24-0853.pdf
Sekmen, A. (2024). Manifold-based approach for neural network robustness (manifold curvature estimation for robustness assessment). npj Artificial Intelligence. https://www.nature.com/articles/s44172-024-00263-8
Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378, 686–707. https://www.sciencedirect.com/science/article/pii/S0021999118307125
Cohn, T., et al. (2023). Non-Euclidean motion planning with graphs of geodesically-convex sets. (Preprint/PDF). https://groups.csail.mit.edu/robotics-center/public_papers/Cohn23.pdf
Peng, Y., et al. (2024). Distributed model predictive control for unmanned aerial vehicles and vehicle platoons: A review. Intelligent Robotics. https://www.oaepublish.com/articles/ir.2024.19
Delvecchio, G. P., et al. (2025). Neuro-Symbolic Artificial Intelligence: A task-directed survey in the black-box models era. In Proceedings of IJCAI 2025. https://www.ijcai.org/proceedings/2025/1157.pdf
Angelino, E., Larus-Stone, N., Alabi, D., Seltzer, M., & Rudin, C. (2018). Learning certifiably optimal rule lists for categorical data. Journal of Machine Learning Research, 18(234), 1–78. https://www.jmlr.org/papers/volume18/17-716/17-716.pdf
Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On calibration of modern neural networks. In ICML 2017. https://proceedings.mlr.press/v70/guo17a.html
Zhou, S. K., et al. (2021). A review of deep learning in medical imaging. [journal]. https://pmc.ncbi.nlm.nih.gov/articles/PMC10544772/
Collins, G. S., et al. (2024). TRIPOD+AI statement: Updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ, 385, e078378. https://www.bmj.com/content/385/bmj-2023-078378
Huang, H., et al. (2024). Technology-driven financial risk management: Exploring machine learning approaches for risk prediction. Systems, 12(10), 416. https://www.mdpi.com/2079-8954/12/10/416
Azadi, Z., et al. (2019). Stochastic optimization models for joint pricing and [inventory/supply decisions]. Computers & Industrial Engineering. https://www.sciencedirect.com/science/article/abs/pii/S0360835218305424
Deng, A. (2019). On post-selection inference in A/B testing. arXiv. https://arxiv.org/pdf/1910.03788
Aizpurua, J. I., et al. (2022). Probabilistic forecasting informed failure prognostics framework for improved RUL prediction under uncertainty: A transformer case study. Reliability Engineering & System Safety. https://strathprints.strath.ac.uk/81355/7/Aizpurua_etal_RESS_2022_Probabilistic_forecasting_informed_failure_prognostics_framework_for_improved_RUL_prediction_under_uncertainty_A_transformer_case_study.pdf
Alemi, A. A., Fischer, I., Dillon, J. V., & Murphy, K. (2017). Deep variational information bottleneck. In ICLR. https://arxiv.org/abs/1612.00410
Chazal, F., Levrard, C., & Royer, M. (2024). Topological analysis for detecting anomalies (TADA) in time series. Journal of Machine Learning Research, 25. https://jmlr.org/papers/volume25/24-0853/24-0853.pdf
McInnes, L., Healy, J., Saul, N., & Großberger, L. (2018). UMAP: Uniform manifold approximation and projection. Journal of Open Source Software, 3(29), 861. https://doi.org/10.21105/joss.00861
Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378, 686–707. https://www.sciencedirect.com/science/article/pii/S0021999118307125
Angelino, E., Larus-Stone, N., Alabi, D., Seltzer, M., & Rudin, C. (2018). Learning certifiably optimal rule lists for categorical data. Journal of Machine Learning Research, 18(234), 1–78. https://www.jmlr.org/papers/volume18/17-716/17-716.pdf
Covington, P., Adams, J., & Sargin, E. (2016). Deep neural networks for YouTube recommendations. In Proceedings of RecSys ’16. https://doi.org/10.1145/2959100.2959190
Roald, L. A., et al. (2023). Power systems optimization under uncertainty: A review of methods and applications. Electric Power Systems Research. https://www.sciencedirect.com/science/article/abs/pii/S0378779622007842
Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On calibration of modern neural networks. In ICML 2017. https://proceedings.mlr.press/v70/guo17a.html
DeLong, L. N., Fernández Mir, R., & Fleuriot, J. D. (2024). Neurosymbolic AI for reasoning over knowledge graphs: A survey (Version 3). arXiv. https://arxiv.org/abs/2302.07200
Gunning, D., & Aha, D. W. (2019). DARPA’s Explainable Artificial Intelligence (XAI) Program. AI Magazine, 40(2). https://doi.org/10.1609/aimag.v40i2.2850
Bahri, Y., Dyer, E., Kaplan, J., Lee, J., & Sharma, U. (2024). Explaining neural scaling laws. Proceedings of the National Academy of Sciences, 121(??), e2311878121. https://doi.org/10.1073/pnas.2311878121
ter Beek, M. H., et al. (2024). Formal Methods in Industry (Manuscript submitted to ACM). Amazon Science. https://assets.amazon.science/f5/7b/9a668143460c98e8f68eae554cd8/formal-methods-in-industry.pdf
Garrido, Q., Balestriero, R., Najman, L., & LeCun, Y. (2023). RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank. In Proceedings of the 40th International Conference on Machine Learning (ICML) (PMLR). https://proceedings.mlr.press/v202/garrido23a/garrido23a.pdf
Jin, C. (2019). Machine Learning: Why Do Simple Algorithms Work So Well? (Technical Report No. UCB/EECS-2019-53). University of California, Berkeley, EECS Department. https://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-53.pdf
Yao, L., Mao, C., & Luo, Y. (2019). Graph Convolutional Networks for Text Classification. In Proceedings of the AAAI Conference on Artificial Intelligence. https://dl.acm.org/doi/10.1609/aaai.v33i01.33017370