Trustworthy Machine Learning

Graduate course, Penn State, College of IST, 2023


Machine learning techniques are widely used to solve real-world problems. However, a key challenge is that they are vulnerable to a variety of security and privacy attacks, e.g., adversarial examples, data poisoning attacks, and membership inference attacks. In this course, we will discuss existing attacks and state-of-the-art defenses against them.
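
To make the first of these attack classes concrete, below is a minimal, self-contained sketch of a gradient-based adversarial example in the spirit of the fast gradient sign method (FGSM) — simpler than the attacks in the papers on the schedule. The logistic-regression "classifier" and all numbers are illustrative toys, not course material:

```python
import math

# Toy stand-in for an image classifier: logistic regression,
# p(y=1 | x) = sigmoid(w.x + b).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    """One FGSM step: x_adv = x + eps * sign(d loss / d x).
    For cross-entropy loss on this model, d loss / d x = (p - y) * w."""
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

w, b = [2.0, -1.0], 0.0           # illustrative "trained" weights
x, y = [0.8, 0.1], 1              # a clean input with true label 1
x_adv = fgsm(w, b, x, y, eps=0.9)

print(predict(w, b, x) > 0.5)     # True: clean input classified as 1
print(predict(w, b, x_adv) > 0.5) # False: perturbed input misclassified
```

The same signed-gradient perturbation, applied per pixel with a small eps, is what makes adversarial images look unchanged to humans while flipping a deep model's prediction.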


This course requires knowledge of basic machine learning (e.g., an undergraduate machine learning course) as well as linear algebra and calculus.


  • Instructor: Jinyuan Jia
  • Teaching Assistant: Hangfan Zhang
  • Time: TuTh 3:05 PM - 4:20 PM
  • Location: Leonhard Bldg 203
  • Office Hours: Jinyuan Jia: Wednesday 1:00 pm - 2:00 pm, E325 Westgate; Hangfan Zhang: Thursday 1:30 pm - 2:30 pm, E301 Westgate

Tentative Schedule (Subject to Change)

Week 1, 08/22: Course overview

Week 1, 08/24: Adversarial examples in image domain (white-box)
  1. Towards Evaluating the Robustness of Neural Networks

Week 2, 08/29: Adversarial examples in image domain (black-box)
  1. HopSkipJumpAttack: A Query-Efficient Decision-Based Attack
  2. (Optional) Delving into Transferable Adversarial Examples and Black-box Attacks

Week 2, 08/31: Empirical defenses against adversarial examples
  1. Towards Deep Learning Models Resistant to Adversarial Attacks

Week 3, 09/05: Certified defenses against adversarial examples
  1. Certified Adversarial Robustness via Randomized Smoothing

Week 3, 09/07: Adversarial examples in (large) language models and their defenses
  1. Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks
  2. (Optional) Certified Robustness to Text Adversarial Attacks by Randomized [MASK]

Week 4, 09/12: Adversarial examples for good use
  1. AttriGuard: A Practical Defense Against Attribute Inference Attacks via Adversarial Machine Learning
  2. (Optional) Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models

Week 4, 09/14: Data poisoning attacks to classifiers
  1. Poisoning Attacks against Support Vector Machines
  2. (Optional) Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks

Week 5, 09/19: Data poisoning attacks to foundation models
  1. PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning
  2. (Optional) Indiscriminate Poisoning Attacks on Unsupervised Contrastive Learning
  Speakers: Sai Naveen Katla, Salika Dave

Week 5, 09/21: Model poisoning attacks to federated learning
  1. Local Model Poisoning Attacks to Byzantine-Robust Federated Learning
  2. (Optional) FLTrust: Byzantine-robust Federated Learning via Trust Bootstrapping
  Speakers: Hari Pranav Arun Kumar, Manasa Pisipati

Week 6, 09/26: Certified defenses against data poisoning attacks
  1. Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks
  2. (Optional) Certified Robustness of Nearest Neighbors against Data Poisoning Attacks
  3. (Optional) Certified Defenses for Data Poisoning Attacks
  Speakers: Wei Zou, Yurui Chang

Week 6, 09/28: Backdoor attacks in image domain
  1. BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain
  2. (Optional) Trojaning Attack on Neural Networks
  Speakers: Yilong Wang, Minhua Lin

Week 7, 10/03: Backdoor attacks to pre-trained foundation models
  1. BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning
  2. (Optional) Poisoning and Backdooring Contrastive Learning

Week 7, 10/05: Defending against backdoor attacks in image domain
  1. Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks
  2. (Optional) STRIP: A Defence Against Trojan Attacks on Deep Neural Networks
  Speaker: Ruimeng (Raymond) Shao

Week 8, 10/10: Backdoor attacks to (large) language models and their defenses
  1. Backdoor Pre-trained Models Can Transfer to All
  2. (Optional) PICCOLO: Exposing Complex Backdoors in NLP Transformer Models
  Speakers: Harish Kolla, Girish Nagarajan

Week 8, 10/12: Privacy attacks to image classifiers
  1. Membership Inference Attacks against Machine Learning Models
  2. (Optional) Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures
  Speakers: Srija Akula, Hanzheng Wang

Week 9, 10/17: Privacy attacks to federated learning
  1. Deep Leakage from Gradients
  2. (Optional) Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning
  Speakers: Yaopei Zeng, Yuanpu Cao

Week 9, 10/19: Privacy attacks to (large) language models and diffusion models
  1. Extracting Training Data from Large Language Models
  2. (Optional) Extracting Training Data from Diffusion Models
  Speakers: Keaton Yukio Kraiger, Vishal Ahir

Week 10, 10/24: Defending against privacy attacks
  1. Deep Learning with Differential Privacy
  2. (Optional) SecureML: A System for Scalable Privacy-Preserving Machine Learning
  Speaker: Yanting Wang

Week 10, 10/26: Model stealing attacks
  1. Stealing Machine Learning Models via Prediction APIs
  2. (Optional) Stealing Hyperparameters in Machine Learning

Week 11, 10/31: Intellectual property protection
  1. Prediction Poisoning: Utility-Constrained Defenses Against Model Stealing Attacks
  2. (Optional) Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring
  3. (Optional) Certified Neural Network Watermarks with Randomized Smoothing

Week 11, 11/02: Image watermarking and its security
  1. HiDDeN: Hiding Data With Deep Networks
  2. (Optional) Evading Watermark based Detection of AI-Generated Content

Week 12, 11/07: Prompt injection attacks to large language models
  1. Ignore Previous Prompt: Attack Techniques For Language Models
  2. (Optional) Prompt Injection Attacks and Defenses in LLM-Integrated Applications

Week 12, 11/09: Machine-generated text detection
  1. A Watermark for Large Language Models

Week 13, 11/14: Deepfakes
  1. Generative Adversarial Nets
  2. Stable Diffusion

Week 13, 11/16: Safety of LLMs
  1. Jailbroken: How Does LLM Safety Training Fail?
  2. (Optional) Universal and Transferable Adversarial Attacks on Aligned Language Models
  Speaker: Bochuan Cao

Week 14: Thanksgiving

Week 15, 11/28: Project Presentation
  Group 1: Yurui Chang and Yuanpu Cao

Week 15, 11/30: Project Presentation
  Group 1: Yanting Wang and Wei Zou
  Group 2: Hanzheng Wang and Srija Akula
  Group 3: Yaopei Zeng

Week 16, 12/05: Project Presentation
  Group 1: Harish Kolla and Girish Nagarajan
  Group 2: Sai Naveen Katla and Salika Dave
  Group 3: Hari Pranav Arun Kumar and Manasa Pisipati

Week 16, 12/07: Project Presentation
  Group 1: Ruimeng Shao
  Group 2: Yilong Wang and Minhua Lin
  Group 3: Vishal Ahir and Keaton Kraiger

Paper Review

  • Deadline: Monday or Wednesday, 11:59 pm (EST); one paper per week. Please send your review to this email address:, and keep all of your reviews in a single thread (by replying). Note that ChatGPT must not be used to write the review.


  • Students can form groups of at most 2 students for the lecture and the class project.
  • Lecture: Choose a lecture topic, prepare the slides (or use others’ with proper citations), and give the lecture. Each group sends its three preferred dates by 11:59 pm (EST) on 08/31.

  • Class project: The project should be related to machine learning (previously published work cannot be used as the course project).

Grading Policy

  • 50% project
  • 25% reading assignment
  • 10% class participation and quiz
  • 15% class presentation

Final grade cutoff:

  • A [93%, 100%]
  • A- [90%, 93%)
  • B+ [87%, 90%)
  • B [83%, 87%)
  • B- [80%, 83%)
  • C+ [77%, 80%)
  • C [70%, 77%)
  • D [60%, 70%)
  • F [0%, 60%)
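
Read as half-open intervals, the cutoffs above amount to a simple lookup. A sketch (the boundary handling mirrors the list as written: each grade includes its lower bound and excludes the next grade's lower bound):

```python
# Letter-grade lookup for the half-open cutoffs listed above:
# 93 and up is an A, [90, 93) is an A-, and so on down to F.
CUTOFFS = [(93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
           (77, "C+"), (70, "C"), (60, "D")]

def letter_grade(percent):
    for lower, grade in CUTOFFS:
        if percent >= lower:
            return grade
    return "F"

print(letter_grade(93))    # A   (93% is inclusive)
print(letter_grade(89.9))  # B+  (just below the A- cutoff)
print(letter_grade(59.9))  # F
```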

Late Submission Policy

  • 10% deduction for every 24 hours late.
  • No late submissions are accepted after 3 days.
  • Please email the instructor regarding extensions for special cases.
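
As a worked example of the policy above (a sketch; whether the 10% deduction applies per started or per completed 24-hour period is an assumption here, noted in the code):

```python
import math

def late_penalty(hours_late):
    """Score multiplier under the late policy: 10% off per 24 hours
    late, nothing accepted after 3 days. Assumes the deduction applies
    to each *started* 24-hour period (confirm with the instructor)."""
    if hours_late <= 0:
        return 1.0            # on time: full credit
    if hours_late > 72:
        return 0.0            # more than 3 days late: not accepted
    periods = math.ceil(hours_late / 24)
    return round(1.0 - 0.10 * periods, 2)

print(late_penalty(5))    # 0.9 (first 24-hour period)
print(late_penalty(30))   # 0.8 (second 24-hour period)
print(late_penalty(100))  # 0.0 (past the 3-day cutoff)
```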


Academic Integrity

Academic integrity is the pursuit of scholarly activity in an open, honest and responsible manner. Academic integrity is a basic guiding principle for all academic activity at The Pennsylvania State University, and all members of the University community are expected to act in accordance with this principle. Consistent with this expectation, the University’s Code of Conduct states that all students should act with personal integrity, respect other students’ dignity, rights and property, and help create and maintain an environment in which all can succeed through the fruits of their efforts.

Academic integrity includes a commitment by all members of the University community not to engage in or tolerate acts of falsification, misrepresentation or deception. Such acts of dishonesty violate the fundamental ethical principles of the University community and compromise the worth of work completed by others.


Disability Accommodation

Penn State welcomes students with disabilities into the University’s educational programs. Every Penn State campus has an office for students with disabilities. The Student Disability Resources (SDR) website provides contact information for every Penn State campus. For further information, please visit the Student Disability Resources website.

In order to receive consideration for reasonable accommodations, you must contact the appropriate disability services office at the campus where you are officially enrolled, participate in an intake interview, and provide documentation (see the documentation guidelines). If the documentation supports your request for reasonable accommodations, your campus disability services office will provide you with an accommodation letter. Please share this letter with your instructors and discuss the accommodations with them as early as possible. You must follow this process for every semester that you request accommodations.


Counseling and Psychological Services

Many students at Penn State face personal challenges or have psychological needs that may interfere with their academic progress, social development, or emotional wellbeing. The university offers a variety of confidential services to help you through difficult times, including individual and group counseling, crisis intervention, consultations, online chats, and mental health screenings. These services are provided by staff who welcome all students and embrace a philosophy respectful of clients’ cultural and religious backgrounds, and sensitive to differences in race, ability, gender identity and sexual orientation.

Counseling and Psychological Services at University Park (CAPS): 814-863-0395

Counseling and Psychological Services at Commonwealth Campuses

Penn State Crisis Line (24 hours/7 days/week): 877-229-6400

Crisis Text Line (24 hours/7 days/week): Text LIONS to 741741


Reporting Bias

Consistent with University Policy AD29, students who believe they have experienced or observed a hate crime, an act of intolerance, discrimination, or harassment at Penn State are urged to report these incidents as outlined on the University’s Report Bias webpage.