AI Seminar @ CharMLab
Spring 2026



Time and location: biweekly on Fridays, 1:30 – 2:30pm, in Woodward Hall
Coordinator: Razvan Bunescu

Presentation schedule:
  1. January 30, 1:30 – 2:30pm, Woodward 335
    Speaker: Youssef Ait Alama
    Paper: Overtrained Language Models Are Harder to Fine-Tune, Springer et al., ICML 2025.

  2. February 13, 1:30 – 2:30pm, Woodward TBD
    Speaker: TBD
    Paper: TBD

  3. February 27, 1:30 – 2:30pm, Woodward TBD
    Speaker: TBD
    Paper: TBD

  4. March 13, Spring Break, no seminar

  5. March 27, 1:30 – 2:30pm, Woodward TBD
    Speaker: TBD
    Paper: TBD

  6. April 10, 1:30 – 2:30pm, Woodward TBD
    Speaker: TBD
    Paper: TBD

  7. April 24, 1:30 – 2:30pm, Woodward TBD
    Speaker: TBD
    Paper: TBD


Paper suggestions:
  1. Sirui Li et al., “Towards Foundation Models for Mixed Integer Linear Programming,” paper presented at the Thirteenth International Conference on Learning Representations (ICLR 2025), https://openreview.net/forum?id=6yENDA7J4G.
  2. Longxuan Yu et al., “CausalEval: Towards Better Causal Reasoning in Language Models,” in Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), ed. Luis Chiruzzo et al. (Association for Computational Linguistics, 2025), https://doi.org/10.18653/v1/2025.naacl-long.622.
  3. Zidong Wang et al., “LLM-Enhanced Score Function Evolution for Causal Structure Learning,” Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI-25), vol. 1 (September 2025): 9086–94, https://doi.org/10.24963/ijcai.2025/1010.
  4. Guangya Wan et al., “Large Language Models for Causal Discovery: Current Landscape and Future Directions,” Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI-25), vol. 2 (September 2025): 10687–95, https://doi.org/10.24963/ijcai.2025/1186.
  5. Wei Chen et al., “Causal-Aware Large Language Models: Enhancing Decision-Making Through Learning, Adapting and Acting,” Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI-25), vol. 1 (September 2025): 4292–300, https://doi.org/10.24963/ijcai.2025/478.
  6. Lars Lorch et al., “Amortized Inference for Causal Structure Learning,” Proceedings of the 36th International Conference on Neural Information Processing Systems (Red Hook, NY, USA), NIPS ’22, November 28, 2022, 13104–18.
  7. “NATURAL: End-To-End Causal Effect Estimation from Unstructured Text Data,” SlidesLive video, accessed January 25, 2026.
  8. Piotr Bojanowski et al., “Enriching Word Vectors with Subword Information,” Transactions of the Association for Computational Linguistics (Cambridge, MA) 5 (2017): 135–46, https://doi.org/10.1162/tacl_a_00051.
  9. Xinzhi Zhang et al., “OptiMind: Teaching LLMs to Think Like Optimization Experts,” arXiv:2509.22979, preprint, arXiv, January 14, 2026, https://doi.org/10.48550/arXiv.2509.22979.
  10. Jonas Peters, Dominik Janzing, and Bernhard Schölkopf, Elements of Causal Inference: Foundations and Learning Algorithms (MIT Press, 2017), book page accessed January 22, 2026, https://people.math.ethz.ch/~jopeters/elements.html.
  11. Katie Matton et al., “Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations,” paper presented at the Thirteenth International Conference on Learning Representations (ICLR 2025), https://openreview.net/forum?id=4ub9gpx9xw.
  12. Xiaoyu Ma and David Patterson, “Challenges and Research Directions for Large Language Model Inference Hardware,” IEEE Computer, early access, January 8, 2026, https://doi.org/10.1109/MC.2026.3652916.
  13. Bradley H. Theilman and James B. Aimone, “Solving Sparse Finite Element Problems on Neuromorphic Hardware,” Nature Machine Intelligence 7, no. 11 (2025): 1845–57, https://doi.org/10.1038/s42256-025-01143-2.
  14. “A Static Analysis Tool in CS1: Student Usage and Perceptions of PythonTA,” in Proceedings of the 26th Australasian Computing Education Conference (ACE ’24), https://doi.org/10.1145/3636243.3636262.
  15. Valentyn Melnychuk et al., “Causal Transformer for Estimating Counterfactual Outcomes,” arXiv:2204.07258, preprint, arXiv, June 3, 2022, https://doi.org/10.48550/arXiv.2204.07258.
  16. Changshuo Zhang et al., “Test-Time Alignment with State Space Model for Tracking User Interest Shifts in Sequential Recommendation,” Proceedings of the Nineteenth ACM Conference on Recommender Systems (New York, NY, USA), RecSys ’25, September 7, 2025, 461–71, https://doi.org/10.1145/3705328.3748060.
  17. Ziwei Fan et al., “Modeling Sequences as Distributions with Uncertainty for Sequential Recommendation,” Proceedings of the 30th ACM International Conference on Information & Knowledge Management (New York, NY, USA), CIKM ’21, October 30, 2021, 3019–23, https://doi.org/10.1145/3459637.3482145.
  18. Jawad Chowdhury and Gabriel Terejanu, “CGLearn: Consistent Gradient-Based Learning for Out-of-Distribution Generalization,” arXiv:2411.06040, preprint, arXiv, November 9, 2024, https://doi.org/10.48550/arXiv.2411.06040.
  19. Eric V. Strobl et al., “Approximate Kernel-Based Conditional Independence Tests for Fast Non-Parametric Causal Discovery,” Journal of Causal Inference 7, no. 1 (2019), https://doi.org/10.1515/jci-2018-0017.
  20. Ali Rahimi and Benjamin Recht, “Random Features for Large-Scale Kernel Machines,” Proceedings of the 21st International Conference on Neural Information Processing Systems (NIPS 2007), https://doi.org/10.5555/2981562.2981710.
  21. Benjamin Recht et al., “Do ImageNet Classifiers Generalize to ImageNet?,” Proceedings of the 36th International Conference on Machine Learning, May 24, 2019, 5389–400, https://proceedings.mlr.press/v97/recht19a.html.
  22. Vishal Misra, “Attention Is Bayesian Inference,” Medium, December 31, 2025, https://medium.com/@vishalmisra/attention-is-bayesian-inference-578c25db4501.
  23. Francesco Locatello et al., “Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations,” Proceedings of the 36th International Conference on Machine Learning, May 24, 2019, 4114–24, https://proceedings.mlr.press/v97/locatello19a.html.
  24. Roland Speicher, “Free Probability Theory,” arXiv:0911.0087, preprint, arXiv, October 31, 2009, https://doi.org/10.48550/arXiv.0911.0087.
  25. Raymond Khazoum et al., “A Deep Learning Model of Mental Rotation Informed by Interactive VR Experiments,” arXiv:2512.13517, preprint, arXiv, December 15, 2025, https://doi.org/10.48550/arXiv.2512.13517.
  26. Roger N. Shepard and Jacqueline Metzler, “Mental Rotation of Three-Dimensional Objects,” Science 171, no. 3972 (1971): 701–3, https://doi.org/10.1126/science.171.3972.701.
  27. Pengcheng Jiang et al., “Adaptation of Agentic AI,” arXiv:2512.16301, preprint, arXiv, December 18, 2025, https://doi.org/10.48550/arXiv.2512.16301.
  28. “Artificial Hivemind: The Open-Ended Homogeneity of Language Models...,” NeurIPS 2025 paper via Bytez, December 3, 2025, https://bytez.com/docs/neurips/121421/paper.
  29. Shangbin Feng et al., “Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration,” in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, ed. Yaser Al-Onaizan et al. (Association for Computational Linguistics, 2024), https://doi.org/10.18653/v1/2024.emnlp-main.240.
  30. “Heterogeneous Swarms: Jointly Optimizing Model Roles and Weights for Multi-LLM Systems,” NeurIPS 2025 paper via Bytez, December 4, 2025, https://bytez.com/docs/neurips/115041/paper.
  31. Yitong Chen et al., “All-Optical Synthesis Chip for Large-Scale Intelligent Semantic Vision Generation,” Science 390, no. 6779 (2025): 1259–65, https://doi.org/10.1126/science.adv7434.
  32. ByteDance-Seed, “Seed-Prover/SeedProver-1.5 at Main · ByteDance-Seed/Seed-Prover,” GitHub, accessed December 20, 2025, https://github.com/ByteDance-Seed/Seed-Prover/tree/main/SeedProver-1.5.
  33. Andy Zhou and Ron Arel, “Tempest: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search,” arXiv:2503.10619, preprint, arXiv, May 28, 2025, https://doi.org/10.48550/arXiv.2503.10619.
  34. Tejal Patwardhan et al., “GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks,” arXiv:2510.04374, preprint, arXiv, October 5, 2025, https://doi.org/10.48550/arXiv.2510.04374.
  35. “OpenAI Evals,” accessed December 20, 2025, https://evals.openai.com/.
  36. Jan Betley et al., “Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs,” arXiv:2512.09742, preprint, arXiv, December 10, 2025, https://doi.org/10.48550/arXiv.2512.09742.
  37. “AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?,” NeurIPS 2025 paper via Bytez, December 4, 2025, https://bytez.com/docs/neurips/121543/paper.
  38. Zhangde Song et al., “Evaluating Large Language Models in Scientific Discovery,” arXiv:2512.15567, preprint, arXiv, December 17, 2025, https://doi.org/10.48550/arXiv.2512.15567.
  39. Osman Batur İnce et al., “Sample-Efficient Integration of New Modalities into Large Language Models,” arXiv:2509.04606, preprint, arXiv, September 4, 2025, https://doi.org/10.48550/arXiv.2509.04606.
  40. “Looking Back at Speculative Decoding,” Google Research Blog, accessed December 18, 2025, https://research.google/blog/looking-back-at-speculative-decoding/.
  41. Amita Kamath et al., “Selective Question Answering under Domain Shift,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ed. Dan Jurafsky et al. (Association for Computational Linguistics, 2020), https://doi.org/10.18653/v1/2020.acl-main.503.
  42. Zhengyan Shi et al., “Ambiguity Detection and Uncertainty Calibration for Question Answering with Large Language Models,” in Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025), ed. Trista Cao et al. (Association for Computational Linguistics, 2025), https://doi.org/10.18653/v1/2025.trustnlp-main.4.
  43. Jingyu Liu et al., “Do Not Abstain! Identify and Solve the Uncertainty,” in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ed. Wanxiang Che et al. (Association for Computational Linguistics, 2025), https://doi.org/10.18653/v1/2025.acl-long.840.
  44. Tong Zhang et al., “CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models,” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ed. Lun-Wei Ku et al. (Association for Computational Linguistics, 2024), https://doi.org/10.18653/v1/2024.acl-long.578.
  45. Sewon Min et al., “AmbigQA: Answering Ambiguous Open-Domain Questions,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), ed. Bonnie Webber et al. (Association for Computational Linguistics, 2020), https://doi.org/10.18653/v1/2020.emnlp-main.466.
  46. Fengbin Zhu et al., “TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), ed. Chengqing Zong et al. (Association for Computational Linguistics, 2021), https://doi.org/10.18653/v1/2021.acl-long.254.
  47. Yilun Zhao et al., “MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data,” in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ed. Smaranda Muresan et al. (Association for Computational Linguistics, 2022), https://doi.org/10.18653/v1/2022.acl-long.454.
  48. Chanyeol Choi et al., “FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation,” arXiv:2504.15800, preprint, arXiv, September 3, 2025, https://doi.org/10.48550/arXiv.2504.15800.
  49. Alexander Novikov et al., “AlphaEvolve: A Coding Agent for Scientific and Algorithmic Discovery,” arXiv:2506.13131, preprint, arXiv, June 16, 2025, https://doi.org/10.48550/arXiv.2506.13131.
  50. Abhimanyu Das et al., “A Decoder-Only Foundation Model for Time-Series Forecasting,” arXiv:2310.10688, preprint, arXiv, April 17, 2024, https://doi.org/10.48550/arXiv.2310.10688.
  51. Joonas Lahikainen et al., Creativity and Markov Decision Processes, n.d.
  52. Masanori Kohda et al., “Cleaner Fish Recognize Self in a Mirror via Self-Face Recognition like Humans,” Proceedings of the National Academy of Sciences 120, no. 7 (2023): e2208420120, https://doi.org/10.1073/pnas.2208420120.
  53. Mingzhi Chen et al., “Stronger Normalization-Free Transformers,” arXiv:2512.10938, preprint, arXiv, December 11, 2025, https://doi.org/10.48550/arXiv.2512.10938.
  54. Zhijing Jin et al., “CLADDER: Assessing Causal Reasoning in Language Models,” Proceedings of the 37th International Conference on Neural Information Processing Systems (Red Hook, NY, USA), NIPS ’23, December 10, 2023, 31038–65, https://proceedings.neurips.cc/paper_files/paper/2023/hash/631bb9434d718ea309af82566347d607-Abstract-Conference.html.
  55. Hsiang Fu et al., “Introduction to In-Context Learning,” 2025.
  56. Simon Colton et al., Automatic Generation of Expressive Piano Miniatures, n.d.
  57. Simon Colton et al., Neuro-Symbolic Composition of Music with Talking Points, n.d.
  58. Keshav Bhandari and Simon Colton, “Motifs, Phrases, and Beyond: The Modelling of Structure in Symbolic Music Generation,” Artificial Intelligence in Music, Sound, Art and Design: 13th International Conference, EvoMUSART 2024, Held as Part of EvoStar 2024, Aberystwyth, UK, April 3–5, 2024, Proceedings (Berlin, Heidelberg), April 3, 2024, 33–51, https://doi.org/10.1007/978-3-031-56992-0_3.
  59. “Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought,” NeurIPS 2025 paper via Bytez, December 3, 2025, https://bytez.com/docs/neurips/117755/paper.
  60. LearnLM Team et al., “Towards an AI-Augmented Textbook,” arXiv:2509.13348, preprint, arXiv, September 30, 2025, https://doi.org/10.48550/arXiv.2509.13348.
  61. Jo Marchant, “Can AI Be Truly Creative?,” Nature 647, no. 8088 (2025): 24–26, https://doi.org/10.1038/d41586-025-03570-y.
  62. “Real-Time Reasoning Agents in Evolving Environments,” The Fourteenth International Conference on Learning Representations (ICLR 2026), OpenReview, https://openreview.net/forum?id=n1AvXiU2lu.
  63. Yue Yang et al., “InterIDEAS: Philosophical Intertextuality via LLMs,” in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, ed. Christos Christodoulopoulos et al. (Association for Computational Linguistics, 2025), https://doi.org/10.18653/v1/2025.emnlp-main.1180.
  64. Sasha Boguraev et al., “Causal Interventions Reveal Shared Structure Across English Filler–Gap Constructions,” in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, ed. Christos Christodoulopoulos et al. (Association for Computational Linguistics, 2025), https://doi.org/10.18653/v1/2025.emnlp-main.1271.
  65. Martin Tutek et al., “Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps,” in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, ed. Christos Christodoulopoulos et al. (Association for Computational Linguistics, 2025), https://doi.org/10.18653/v1/2025.emnlp-main.504.
  66. Hao Xu and Jiacheng Liu, “Infini-Gram Mini,” accessed December 6, 2025, http://infini-gram-mini.io/.
  67. Hao Xu et al., “Infini-Gram Mini: Exact n-Gram Search at the Internet Scale with FM-Index,” in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, ed. Christos Christodoulopoulos et al. (Association for Computational Linguistics, 2025), https://doi.org/10.18653/v1/2025.emnlp-main.1268.
  68. Michał Bortkiewicz et al., “Accelerating Goal-Conditioned RL Algorithms and Research,” arXiv:2408.11052, preprint, arXiv, November 23, 2025, https://doi.org/10.48550/arXiv.2408.11052.
  69. Kevin Wang et al., “1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities,” paper presented at the Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025), https://openreview.net/forum?id=s0JVsx3bx1.
  70. Aarohi Srivastava et al., “Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models,” Transactions on Machine Learning Research, January 19, 2023, https://openreview.net/forum?id=uyTL5Bvosj.
  71. Zihan Qiu et al., “Gated Attention for Large Language Models: Non-Linearity, Sparsity, and Attention-Sink-Free,” NeurIPS 2025 paper via Bytez, December 4, 2025, https://bytez.com/docs/neurips/120216/paper.
  72. Thomas Hubert et al., “Olympiad-Level Formal Mathematical Reasoning with Reinforcement Learning,” Nature, November 12, 2025, 1–3, https://doi.org/10.1038/s41586-025-09833-y.
  73. David H. Cropley, “‘The Cat Sat on the …?’ Why Generative AI Has Limited Creativity,” The Journal of Creative Behavior (2025), accessed December 4, 2025, https://onlinelibrary.wiley.com/doi/10.1002/jocb.70077.
  74. Zihan Qiu et al., “Gated Attention for Large Language Models: Non-Linearity, Sparsity, and Attention-Sink-Free,” paper presented at the Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025), https://openreview.net/forum?id=1b7whO4SfY.
  75. Benjamin F. Maier et al., “LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings,” arXiv:2510.08338, preprint, arXiv, October 27, 2025, https://doi.org/10.48550/arXiv.2510.08338.
  76. Nikita Dhawan et al., “End-to-End Causal Effect Estimation from Unstructured Natural Language Data,” Proceedings of the 38th International Conference on Neural Information Processing Systems (Red Hook, NY, USA), NIPS ’24, vol. 37 (December 2024): 77165–99, https://neurips.cc/virtual/2024/poster/94106.
  77. Ming-Hui Chen and Joseph G. Ibrahim, “Power Prior Distributions for Regression Models,” Statistical Science 15, no. 1 (2000): 46–60, https://doi.org/10.1214/ss/1009212673.
  78. Fraida Fund et al., “The Cost of Teaching Operational ML,” Proceedings of the SC ’25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis (New York, NY, USA), SC Workshops ’25, November 15, 2025, 393–400, https://doi.org/10.1145/3731599.3767385.
  79. Razvan-Gabriel Dumitru et al., “CopySpec: Accelerating LLMs with Speculative Copy-and-Paste,” in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, ed. Christos Christodoulopoulos et al. (Association for Computational Linguistics, 2025), https://doi.org/10.18653/v1/2025.emnlp-main.1337.
  80. Yu Sun et al., “Learning to (Learn at Test Time): RNNs with Expressive Hidden States,” arXiv:2407.04620, preprint, arXiv, August 31, 2025, https://doi.org/10.48550/arXiv.2407.04620.
  81. Ali Behrouz et al., “Nested Learning: The Illusion of Deep Learning Architectures,” paper presented at the Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025), https://openreview.net/forum?id=nbMeRvNb7A.
  82. Elliot Meyerson et al., “Solving a Million-Step LLM Task with Zero Errors,” arXiv:2511.09030, preprint, arXiv, November 12, 2025, https://doi.org/10.48550/arXiv.2511.09030.
  83. Stephen Gould et al., “Deep Declarative Networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence 44, no. 8 (2022): 3988–4004, https://doi.org/10.1109/TPAMI.2021.3059462.
  84. Aditya Kusupati et al., “Matryoshka Representation Learning,” Advances in Neural Information Processing Systems 35 (December 2022): 30233–49.
  85. Yue Wang et al., “CodeT5+: Open Code Large Language Models for Code Understanding and Generation,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, ed. Houda Bouamor et al. (Association for Computational Linguistics, 2023), https://doi.org/10.18653/v1/2023.emnlp-main.68.
  86. Hung Le et al., “CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning,” Proceedings of the 36th International Conference on Neural Information Processing Systems (Red Hook, NY, USA), NIPS ’22, November 28, 2022, 21314–28.
  87. Wenbo Hu et al., “MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models,” paper presented at the Thirteenth International Conference on Learning Representations (ICLR 2025), https://openreview.net/forum?id=Usklli4gMc.
  88. Shi Yu et al., “VisRAG: Vision-Based Retrieval-Augmented Generation on Multi-Modality Documents,” paper presented at the Thirteenth International Conference on Learning Representations (ICLR 2025), https://openreview.net/forum?id=zG459X3Xge.
  89. “Visual Haystacks,” accessed November 8, 2025, http://visual-haystacks.github.io.
  90. Cynthia Rudin et al., “Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges,” Statistics Surveys 16 (2022): 1–85, https://doi.org/10.1214/21-SS133.
  91. Ipsita Ghosh et al., “Q3R: Quadratic Reweighted Rank Regularizer for Effective Low-Rank Training,” arXiv:2511.04485, preprint, arXiv, November 6, 2025, https://doi.org/10.48550/arXiv.2511.04485.
  92. Kimi Team et al., “Kimi Linear: An Expressive, Efficient Attention Architecture,” arXiv:2510.26692, preprint, arXiv, November 1, 2025, https://doi.org/10.48550/arXiv.2510.26692.
  93. Bogdan Georgiev et al., “Mathematical Exploration and Discovery at Scale,” arXiv:2511.02864, preprint, arXiv, November 3, 2025, https://doi.org/10.48550/arXiv.2511.02864.
  94. Ning Shang et al., “rStar2-Agent: Agentic Reasoning Technical Report,” arXiv:2508.20722, preprint, arXiv, August 28, 2025, https://doi.org/10.48550/arXiv.2508.20722.
  95. Zongxi Li et al., “CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering,” in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, ed. Christos Christodoulopoulos et al. (Association for Computational Linguistics, 2025), https://aclanthology.org/2025.emnlp-main.115/.
  96. Guan Wang et al., “Hierarchical Reasoning Model,” arXiv:2506.21734, preprint, arXiv, August 4, 2025, https://doi.org/10.48550/arXiv.2506.21734.
  97. Alexia Jolicoeur-Martineau, “Less Is More: Recursive Reasoning with Tiny Networks,” arXiv:2510.04871, preprint, arXiv, October 6, 2025, https://doi.org/10.48550/arXiv.2510.04871.
  98. Greg Yang and Edward J. Hu, “Feature Learning in Infinite-Width Neural Networks,” arXiv:2011.14522, preprint, arXiv, July 15, 2022, https://doi.org/10.48550/arXiv.2011.14522.
  99. Charlie Blake et al., “u-μP: The Unit-Scaled Maximal Update Parametrization,” arXiv:2407.17465, preprint, arXiv, January 10, 2025, https://doi.org/10.48550/arXiv.2407.17465.
  100. Samuel H. King et al., “Generative Design of Novel Bacteriophages with Genome Language Models,” preprint, bioRxiv, September 17, 2025, 2025.09.12.675911, https://doi.org/10.1101/2025.09.12.675911.
  101. Wenyi Wang et al., “Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine,” arXiv:2510.21614, preprint, arXiv, October 29, 2025, https://doi.org/10.48550/arXiv.2510.21614.
  102. Jenny Zhang et al., “Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents,” arXiv:2505.22954, version 2, preprint, arXiv, September 26, 2025, https://doi.org/10.48550/arXiv.2505.22954.
  103. Liana Patel et al., “DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis,” arXiv:2508.20033, preprint, arXiv, August 27, 2025, https://doi.org/10.48550/arXiv.2508.20033.
  104. Alexander Novikov et al., “AlphaEvolve: A Coding Agent for Scientific and Algorithmic Discovery,” arXiv:2506.13131, preprint, arXiv, June 16, 2025, https://doi.org/10.48550/arXiv.2506.13131.
  105. Eser Aygün et al., “An AI System to Help Scientists Write Expert-Level Empirical Software,” arXiv:2509.06503, preprint, arXiv, September 8, 2025, https://doi.org/10.48550/arXiv.2509.06503.
  106. Juraj Gottweis et al., “Towards an AI Co-Scientist,” arXiv:2502.18864, preprint, arXiv, February 26, 2025, https://doi.org/10.48550/arXiv.2502.18864.
  107. Tri Dao et al., “FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness,” Advances in Neural Information Processing Systems 35 (December 2022): 16344–59.
  108. Jiayi Xin et al., “I²MoE: Interpretable Multimodal Interaction-Aware Mixture-of-Experts,” paper presented at the Forty-second International Conference on Machine Learning (ICML 2025), https://openreview.net/forum?id=EuJaF5QsMP.
  109. Xiaoyang Wang and Christopher C. Yang, “MoE-Health: A Mixture of Experts Framework for Robust Multimodal Healthcare Prediction,” arXiv:2508.21793, preprint, arXiv, August 29, 2025, https://doi.org/10.48550/arXiv.2508.21793.
  110. “Introducing LangExtract: A Gemini-Powered Information Extraction Library,” Google Developers Blog, accessed November 2, 2025, https://developers.googleblog.com/en/introducing-langextract-a-gemini-powered-information-extraction-library/.
  111. Akshay Goel, LangExtract, v. 1.0.9, Zenodo, released September 1, 2025, https://doi.org/10.5281/ZENODO.17015089.
  112. “LEXam: Benchmarking Legal Reasoning on 340 Law Exams,” accessed November 2, 2025, https://lexam-benchmark.github.io/.
  113. Leon Chlon et al., “Predictable Compression Failures: Why Language Models Actually Hallucinate,” arXiv:2509.11208, preprint, arXiv, September 14, 2025, https://doi.org/10.48550/arXiv.2509.11208.
  114. “Ollama,” accessed November 1, 2025, https://ollama.com.
  115. Leon Chlon et al., “LLMs Are Bayesian, in Expectation, Not in Realization,” arXiv:2507.11768, preprint, arXiv, July 15, 2025, https://doi.org/10.48550/arXiv.2507.11768.
  116. Boshi Wang et al., “Grokked Transformers Are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization,” Proceedings of the 38th International Conference on Neural Information Processing Systems (Red Hook, NY, USA), NIPS ’24, vol. 37 (June 2025): 95238–65.
  117. Shikai Li et al., “MetaShuffling: Accelerating Llama 4 MoE Inference,” PyTorch Blog, accessed November 1, 2025, https://pytorch.org/blog/metashuffling-accelerating-llama-4-moe-inference/.
  118. Chenxia Tang et al., “Top-nσ: Not All Logits Are You Need,” arXiv:2411.07641, preprint, arXiv, November 12, 2024, https://doi.org/10.48550/arXiv.2411.07641.
  119. Janos Perczel et al., “TeachLM: Post-Training LLMs for Education Using Authentic Learning Data,” arXiv:2510.05087, preprint, arXiv, October 6, 2025, https://doi.org/10.48550/arXiv.2510.05087.
  120. Shengran Hu et al., “Automated Design of Agentic Systems,” arXiv:2408.08435, preprint, arXiv, March 2, 2025, https://doi.org/10.48550/arXiv.2408.08435.
  121. Eser Aygün et al., “An AI System to Help Scientists Write Expert-Level Empirical Software,” arXiv:2509.06503, preprint, arXiv, September 8, 2025, https://doi.org/10.48550/arXiv.2509.06503.
  122. Jenny Zhang et al., “Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents,” arXiv:2505.22954, preprint, arXiv, September 26, 2025, https://doi.org/10.48550/arXiv.2505.22954.
  123. Amazon Web Services Labs, Shuttle, Rust concurrency-testing library, GitHub, March 1, 2021, latest release October 24, 2025, https://github.com/awslabs/shuttle.
  124. “Systems Correctness Practices at Amazon Web Services,” Communications of the ACM, May 29, 2025, https://cacm.acm.org/practice/systems-correctness-practices-at-amazon-web-services/.
  125. James Bornholt et al., “Using Lightweight Formal Methods to Validate a Key-Value Storage Node in Amazon S3,” Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles (New York, NY, USA), SOSP ’21, October 26, 2021, 836–50, https://doi.org/10.1145/3477132.3483540.
  126. Xudong Liao et al., “MixNet: A Runtime Reconfigurable Optical-Electrical Fabric for Distributed Mixture-of-Experts Training,” Proceedings of the ACM SIGCOMM 2025 Conference (New York, NY, USA), SIGCOMM ’25, August 27, 2025, 554–74, https://doi.org/10.1145/3718958.3750465.
  127. Mikhail Bernadskiy et al., “Accelerating Frontier MoE Training with 3D Integrated Optics,” IEEE Symposium on High-Performance Interconnects (HOTI 2025), https://doi.org/10.1109/HOTI66940.2025.0002.
  128. Yiwei Xie et al., “Complex-Valued Matrix-Vector Multiplication Using a Scalable Coherent Photonic Processor,” Science Advances 11, no. 14 (2025): eads7475, https://doi.org/10.1126/sciadv.ads7475.
  129. “Warp: The Agentic Development Environment,” accessed October 16, 2025, https://www.warp.dev/.
  130. Damai Dai et al., “DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models,” arXiv:2401.06066, preprint, arXiv, January 11, 2024, https://doi.org/10.48550/arXiv.2401.06066.
  131. Sangmin Bae et al., “Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation,” arXiv:2507.10524, preprint, arXiv, July 21, 2025, https://doi.org/10.48550/arXiv.2507.10524.
  132. Cong Li et al., “H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-Based Low-Batch LLM Inference,” Proceedings of the 52nd Annual International Symposium on Computer Architecture (New York, NY, USA), ISCA ’25, June 20, 2025, 194–210, https://doi.org/10.1145/3695053.3731008.
  133. Yanggyu Lee et al., “Improving LLM Classification of Logical Errors by Integrating Error Relationship into Prompts,” Generative Intelligence and Intelligent Tutoring Systems: 20th International Conference, ITS 2024, Thessaloniki, Greece, June 10–13, 2024, Proceedings, Part I (Berlin, Heidelberg), June 10, 2024, 91–103, https://doi.org/10.1007/978-3-031-63028-6_8.
  134. Muntasir Hoq et al., “Automated Identification of Logical Errors in Programs: Advancing Scalable Analysis of Student Misconceptions,” 2025, 90–103, https://doi.org/10.5281/zenodo.15870203.
  135. “ARC Prize,” ARC Prize, accessed October 4, 2025, https://arcprize.org/.
  136. “LLM Comparator,” Responsible Generative AI Toolkit, Google AI for Developers, accessed October 2, 2025, https://ai.google.dev/responsible/docs/evaluation/llm_comparator.
  137. K. Deb et al., “A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II,” IEEE Transactions on Evolutionary Computation 6, no. 2 (2002): 182–97, https://doi.org/10.1109/4235.996017.
  138. Samuel Daulton et al., “Differentiable Expected Hypervolume Improvement for Parallel Multi-Objective Bayesian Optimization,” Proceedings of the 34th International Conference on Neural Information Processing Systems (Red Hook, NY, USA), NIPS ’20, December 6, 2020, 9851–64.
  139. “Optuna: A Hyperparameter Optimization Framework,” Optuna 4.5.0 documentation, accessed October 2, 2025, https://optuna.readthedocs.io/en/stable/index.html.
  140. “AGI Benchmarks: Tracking Progress Toward AGI Isn’t Easy,” IEEE Spectrum, accessed October 1, 2025, https://spectrum.ieee.org/agi-benchmark.
  141. “Limits of Transformer Language Models on Learning to Compose Algorithms,” Proceedings of the 38th International Conference on Neural Information Processing Systems (NeurIPS 2024), https://doi.org/10.5555/3737916.3738161.
  142. Klaus Weihrauch, Computable Analysis: An Introduction (Springer-Verlag, 2000).
  143. Samuel J. Gershman, “The Molecular Memory Code and Synaptic Plasticity: A Synthesis,” Biosystems 224 (February 2023): 104825, https://doi.org/10.1016/j.biosystems.2022.104825.
  144. Wickliffe C. Abraham et al., “Is Plasticity of Synapses the Mechanism of Long-Term Memory Storage?,” Npj Science of Learning 4, no. 1 (2019): 9, https://doi.org/10.1038/s41539-019-0048-y.
  145. Elizabeth F. Loftus and John C. Palmer, “Reconstruction of Automobile Destruction: An Example of the Interaction between Language and Memory,” Journal of Verbal Learning and Verbal Behavior 13, no. 5 (1974): 585–89, https://doi.org/10.1016/S0022-5371(74)80011-3.
  146. “Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters,” CVPR 2024 poster, accessed September 25, 2025, https://cvpr.thecvf.com/virtual/2024/poster/31379.
  147. Yitong Ji et al., “A Critical Study on Data Leakage in Recommender System Offline Evaluation,” ACM Trans. Inf. Syst. 41, no. 3 (2023): 75:1-75:27, https://doi.org/10.1145/3569930.
  148. “Defeating Nondeterminism in LLM Inference,” Thinking Machines Lab, September 10, 2025, https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/.
  149. Biao Zhang and Rico Sennrich, “Root Mean Square Layer Normalization,” in Proceedings of the 33rd International Conference on Neural Information Processing Systems (Curran Associates Inc., 2019).
  150. “Scaling Laws for Differentially Private Language Models,” ICML 2025 paper via Bytez, July 17, 2025, https://bytez.com/docs/icml/46020/paper.
  151. “VaultGemma: The World’s Most Capable Differentially Private LLM,” Google Research Blog, accessed September 13, 2025, https://research.google/blog/vaultgemma-the-worlds-most-capable-differentially-private-llm/.
  152. L. M. Po, “Direct Preference Optimization (DPO) of LLMs: A Paradigm Shift,” Medium, September 1, 2025, https://medium.com/@lmpo/direct-preference-optimization-a-novel-approach-to-language-model-alignment-1f829d4ac306.
  153. Jinhui Ye et al., “T*: Re-Thinking Temporal Search for Long-Form Video Understanding,” arXiv:2504.02259, preprint, arXiv, August 25, 2025, https://doi.org/10.48550/arXiv.2504.02259.
  154. Brady Neal, “Introduction to Causal Inference,” accessed September 5, 2025, https://www.bradyneal.com/causal-inference-course.
  155. Kirill P. Kalinin et al., “Analog Optical Computer for AI Inference and Combinatorial Optimization,” Nature, September 3, 2025, 1–8, https://doi.org/10.1038/s41586-025-09430-z.
  156. Muntasir Hoq et al., “Pattern-Based Knowledge Component Extraction from Student Code Using Representation Learning,” arXiv:2508.09281, preprint, arXiv, August 12, 2025, https://doi.org/10.48550/arXiv.2508.09281.
  157. Zhangqi Duan et al., “Automated Knowledge Component Generation and Knowledge Tracing for Coding Problems,” arXiv:2502.18632, preprint, arXiv, May 23, 2025, https://doi.org/10.48550/arXiv.2502.18632.
  158. Nancy Otero et al., “A Benchmark for Math Misconceptions: Bridging Gaps in Middle School Algebra with AI-Supported Instruction,” Discover Education 4, no. 1 (2025): 277, https://doi.org/10.1007/s44217-025-00742-w.
  159. Dario Di Palma et al., “Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M,” Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (New York, NY, USA), SIGIR ’25, July 13, 2025, 2582–86, https://doi.org/10.1145/3726302.3730178.
  160. Manolis Kellis, “Lecture 11 - Protein Language Models - MLCB24,” YouTube video, accessed August 30, 2025, https://www.youtube.com/watch?v=uPoFdCUqBWk&list=PLypiXJdtIca4gtioEPLIExlAKvu64z7rc&index=11.
  161. David Alvarez-Melis et al., “Are GANs Overkill for NLP?,” Proceedings of the 36th International Conference on Neural Information Processing Systems (Red Hook, NY, USA), NIPS ’22, November 28, 2022, 9072–84.
  162. Jacob Mitchell Springer et al., “Overtrained Language Models Are Harder to Fine-Tune,” ICML 2025 paper via Bytez, July 17, 2025, https://bytez.com/docs/icml/44907/paper.
  163. Andreas Krause and Jonas Hübotter, “Probabilistic Artificial Intelligence,” arXiv:2502.05244, preprint, arXiv, February 7, 2025, https://doi.org/10.48550/arXiv.2502.05244.
  164. Ali Behrouz et al., “Titans: Learning to Memorize at Test Time,” arXiv:2501.00663, preprint, arXiv, accessed August 24, 2025, https://arxiv.org/abs/2501.00663.
  165. An Zhang et al., “On Generative Agents in Recommendation,” Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (New York, NY, USA), SIGIR ’24, July 11, 2024, 1807–17, https://doi.org/10.1145/3626772.3657844.
  166. Peter G. Chang et al., “Low-Rank Extended Kalman Filtering for Online Learning of Neural Networks from Streaming Data,” Proceedings of The 2nd Conference on Lifelong Learning Agents, November 20, 2023, 1025–71, https://proceedings.mlr.press/v232/chang23a.html.
  167. “Kernel density estimation,” Wikipedia, August 9, 2025, https://en.wikipedia.org/w/index.php?title=Kernel_density_estimation&oldid=1305066115.
  168. “Roll the Dice & Look Before You Leap: Going Beyond the Creative Limits of Next-Token Prediction,” ICML 2025 paper via Bytez, July 16, 2025, https://bytez.com/docs/icml/45769/paper.
  169. Gregor Bachmann and Vaishnavh Nagarajan, “The Pitfalls of Next-Token Prediction,” Proceedings of the 41st International Conference on Machine Learning, July 8, 2024, 2296–318, https://proceedings.mlr.press/v235/bachmann24a.html.
  170. Wenhan Xiong et al., “DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning,” in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, ed. Martha Palmer et al. (Association for Computational Linguistics, 2017), https://doi.org/10.18653/v1/D17-1060.
  171. Leonardo Berti et al., “Emergent Abilities in Large Language Models: A Survey,” arXiv:2503.05788, preprint, arXiv, March 14, 2025, https://doi.org/10.48550/arXiv.2503.05788.
  172. David C. Krakauer et al., “Large Language Models and Emergence: A Complex Systems Perspective,” arXiv:2506.11135, preprint, arXiv, June 10, 2025, https://doi.org/10.48550/arXiv.2506.11135.
  173. Quentin RV Ferry et al., “Emergence and Function of Abstract Representations in Self-Supervised Transformers,” arXiv:2312.05361, preprint, arXiv, December 8, 2023, https://doi.org/10.48550/arXiv.2312.05361.
  174. Junhao Chen et al., “States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly,” arXiv:2407.11421, preprint, arXiv, July 16, 2024, https://doi.org/10.48550/arXiv.2407.11421.
  175. Fengbin Zhu et al., “TAT-LLM: A Specialized Language Model for Discrete Reasoning over Financial Tabular and Textual Data,” Proceedings of the 5th ACM International Conference on AI in Finance (New York, NY, USA), ICAIF ’24, November 14, 2024, 310–18, https://doi.org/10.1145/3677052.3698685.
  176. Atticus Geiger et al., “Causal Abstractions of Neural Networks,” Proceedings of the 35th International Conference on Neural Information Processing Systems (Red Hook, NY, USA), NIPS ’21, December 6, 2021, 9574–86.
  177. Jacob Dunefsky et al., “Transcoders Find Interpretable LLM Feature Circuits,” paper presented at the Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024), https://openreview.net/forum?id=J6zHcScAo0.
  178. “What Has a Foundation Model Found? Inductive Bias Reveal...,” ICML 2025 paper via Bytez, July 15, 2025, https://bytez.com/docs/icml/44374/paper.
  179. Jacob Mitchell Springer et al., “Overtrained Language Models Are Harder to Fine-Tune,” arXiv:2503.19206, preprint, arXiv, March 28, 2025, https://doi.org/10.48550/arXiv.2503.19206.
  180. “Morph,” accessed August 23, 2025, https://www.morph.so/blog/trinity.
  181. Ke Weng et al., “Autoformalization in the Era of Large Language Models: A Survey,” arXiv:2505.23486, preprint, arXiv, July 3, 2025, https://doi.org/10.48550/arXiv.2505.23486.
  182. Sebastian Raschka, “The Big LLM Architecture Comparison,” Ahead of AI, February 5, 2025, https://magazine.sebastianraschka.com/p/the-big-llm-architecture-comparison.
  183. Roberto Cittadini et al., “Affective State Estimation Based on Russell’s Model and Physiological Measurements,” Scientific Reports 13, no. 1 (2023): 9786, https://doi.org/10.1038/s41598-023-36915-6.
  184. Mahnoor Hamid et al., “Sleep, Pericyte Subtypes and Cognitive Decline in Adults with and without Alzheimer’s Disease,” Brain, July 14, 2025, awaf161, https://doi.org/10.1093/brain/awaf161.
  185. “Formal Methods | DARPA,” accessed August 4, 2025, https://www.darpa.mil/research/research-spotlights/formal-methods.
  186. “expMath: Exponentiating Mathematics | DARPA,” accessed August 4, 2025, https://www.darpa.mil/research/programs/expmath-exponential-mathematics.
  187. “Morph,” accessed August 4, 2025, https://www.morph.so/blog/trinity.
  188. Dasha Metropolitansky and Jonathan Larson, “Towards Effective Extraction and Evaluation of Factual Claims,” in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ed. Wanxiang Che et al. (Association for Computational Linguistics, 2025), https://aclanthology.org/2025.acl-long.348/.
  189. David Basin et al., “It Takes a Village: Bridging the Gaps between Current and Formal Specifications for Protocols,” Commun. ACM 68, no. 8 (2025): 50–61, https://doi.org/10.1145/3706572.
  190. Denis Melanson et al., “Thermodynamic Computing System for AI Applications,” Nature Communications 16, no. 1 (2025): 3757, https://doi.org/10.1038/s41467-025-59011-x.
  191. Philippe Martin Wyder et al., “Robot Metabolism: Toward Machines That Can Grow by Consuming Other Machines,” Science Advances 11, no. 29 (2025): eadu6897, https://doi.org/10.1126/sciadv.adu6897.
  192. “Inspect,” Inspect, accessed July 31, 2025, https://inspect.aisi.org.uk/.
  193. Christopher McFadden, “MIT Unveils 3D Printer That Turns Food Scraps into Household Items,” Interesting Engineering, accessed July 31, 2025, https://interestingengineering.com/innovation/3d-printer-make-coffee-mug-from-banana-peels.
  194. Nasrin Mostafazadeh et al., “GLUCOSE: GeneraLized and COntextualized Story Explanations,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), ed. Bonnie Webber et al. (Association for Computational Linguistics, 2020), https://doi.org/10.18653/v1/2020.emnlp-main.370.
  195. Angelika Romanou et al., “CRAB: Assessing the Strength of Causal Relationships Between Real-World Events,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, ed. Houda Bouamor et al. (Association for Computational Linguistics, 2023), https://doi.org/10.18653/v1/2023.emnlp-main.940.
  196. Vy Vo et al., “ACCESS : A Benchmark for Abstract Causal Event Discovery and Reasoning,” in Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), ed. Luis Chiruzzo et al. (Association for Computational Linguistics, 2025), https://doi.org/10.18653/v1/2025.naacl-long.49.
  197. “What Makes a Good AI Benchmark? | Stanford HAI,” accessed July 26, 2025, https://hai.stanford.edu/policy/what-makes-a-good-ai-benchmark.