Matt Fredrikson
Associate Professor
Office: 2126 Mehrabian Collaborative Innovation Center
Email: mfredrik@cmu.edu
Department: Computer Science Department
Research Interests: Security and Privacy; Formal Methods; Programming Languages
Advisees: Saranya Vijayakumar, Andy Zou

Research/Teaching Statement

My research is directed at understanding fundamental security and privacy issues that lead to failures in real systems. Some of the key outstanding challenges in this area lie in figuring out why promising theoretical approaches often do not translate into effective defenses. Much of my work is concerned with developing formal analysis techniques that provide insight into the problems that might exist in a system, building countermeasures that give provable guarantees, and measuring the effectiveness of these solutions in real settings.

Most of my current research focuses on issues of privacy and data confidentiality. To an even greater extent than with other security issues, our scientific understanding of this area lags far behind the need for rigorous defensive strategies. I believe that to reason effectively about privacy in software systems, we need ways to characterize and limit adversarial uncertainty and inference. To that end, I am pursuing several research threads.

Privacy in machine-learning applications: Predictive models generated by machine learning methods are used extensively in modern applications. They allow analysts to refine complex data sources into succinct programs that produce valuable information about underlying trends and patterns. A large body of previous research examines the risks that arise when these data sources contain sensitive or proprietary information and are leaked, either in their original form or after "anonymization". Much less well understood are the risks that arise when machine learning models trained on these data sources are made available through applications. Although recent frameworks like differential privacy have started to shed light on this issue, it is often unclear how the underlying mathematical guarantees offered by these frameworks bear on specific, tangible privacy concerns in real applications. The goal of this work is to develop a precise characterization of this threat, so that we can identify troublesome applications before they are published, and understand how to design and apply countermeasures that prevent it.

Support for privacy-aware programming: Increasingly often, applications rely on detailed personal data collected from users, despite growing awareness among users and administrators of the risks involved in disclosing such information. A number of theoretical frameworks have emerged that give precise notions of acceptable disclosure, allowing developers to provide functionality driven by personal data while still placing hard limits on the degree to which confidentiality can be breached. The main appeal of these frameworks is their ability to provide rigorous guarantees, but subtle implementation mistakes often obviate these guarantees in practice. The goal of this work is to develop formal methods and language-based techniques that allow those without domain expertise to write correct, privacy-aware programs. Central to my approach has been the development of automated reasoning techniques for logics with counting. These logics allow us to reduce an adversary's uncertainty about confidential state to a series of constrained counting problems that are amenable to formal methods. Making effective use of this reduction in large-scale program analysis is ongoing work.
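As a toy illustration of the counting reduction (not the actual analysis technique, which discharges such counts with automated reasoning over counting logics rather than enumeration), the following Python sketch counts the secrets an adversary cannot rule out after seeing a program's output; the parity program and all names here are illustrative only.

    from itertools import product

    def count_consistent_secrets(observation, secret_bits, output_fn):
        # Enumerate all 2^secret_bits candidate secrets and count those for
        # which the (deterministic) program would have produced the observed
        # output. The count quantifies the adversary's remaining uncertainty:
        # fewer consistent secrets means more information was disclosed.
        return sum(
            1
            for bits in product([0, 1], repeat=secret_bits)
            if output_fn(bits) == observation
        )

    # Toy program: discloses the parity of a 4-bit secret.
    def parity(bits):
        return sum(bits) % 2

    # Before any observation, all 2^4 = 16 secrets are possible; seeing
    # parity 1 leaves 8 of them, i.e., exactly one bit has leaked.
    print(count_consistent_secrets(1, 4, parity))  # 8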
Analysis of probabilistic programs: Probabilistic programming languages allow developers to specify generative models of probabilistic processes, and to condition these models on observed data. The "compiler" performs inference on the model specified in such a program, computing a posterior distribution over its parameters. By separating the task of model specification from that of inference, these languages make it possible for developers who lack domain expertise to exploit sophisticated statistical techniques in their applications. As such, they are often proposed as a promising way to bring machine learning "to the masses". However, because they require developers to write probabilistic assumptions and goals in a precise language with well-defined semantics, they also make it possible to reason formally about the properties embodied by their trained models. The goal of this work is to build analysis techniques that allow developers to make strong guarantees about the results of inference, and about how those results relate to security and privacy goals in machine learning applications.
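A minimal sketch of the specification/inference separation, in plain Python: the rejection sampler below stands in for the far more sophisticated inference engines real probabilistic programming languages provide, and the coin-flip model is purely illustrative. A generative model is written once, then conditioned on observed data to approximate a posterior over its unknown parameter.

    import random

    def model():
        # Generative model: an unknown coin bias drawn from a uniform
        # prior, followed by ten flips of that coin.
        bias = random.random()
        flips = [random.random() < bias for _ in range(10)]
        return bias, flips

    def infer(observed, samples=100_000):
        # Rejection sampling: draw from the prior and keep only the runs
        # whose simulated flips exactly match the observed data. The
        # retained biases approximate the posterior distribution.
        return [
            bias
            for bias, flips in (model() for _ in range(samples))
            if flips == observed
        ]

    # Condition the model on eight heads in ten flips.
    observed = [True] * 8 + [False] * 2
    posterior = infer(observed)
    print(sum(posterior) / len(posterior))  # posterior mean, near 0.75

Because the model is an ordinary program with well-defined semantics, properties of the resulting posterior become concrete targets for the kind of formal analysis described above.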
Publications

Preprint: Hu K, Leino K, Wang Z, Fredrikson M. A Recipe for Improved Certifiable Robustness: Capacity and Data. 2023.
Preprint: Mangal R, Leino K, Wang Z, Hu K, Yu W, Pasareanu C, Datta A, Fredrikson M. Is Certifying $\ell_p$ Robustness Still Worthwhile? 2023.
Preprint: Zou A, Phan L, Chen S, Campbell J, Guo P, Ren R, Pan A, Yin X, Mazeika M, Dombrowski A-K, Goel S, Li N, Byun MJ, Wang Z, Mallen A, Basart S, Koyejo S, Song D, Fredrikson M, Kolter JZ, Hendrycks D. Representation Engineering: A Top-Down Approach to AI Transparency. 2023.
Preprint: Zhang C, Wang Z, Mangal R, Fredrikson M, Jia L, Pasareanu C. Transfer Attacks and Defenses for Large Language Models on Coding Tasks. 2023.
Preprint: Zou A, Wang Z, Carlini N, Nasr M, Kolter JZ, Fredrikson M. Universal and Transferable Adversarial Attacks on Aligned Language Models. 2023.