cv

General Information

Name: Shuangqi LI (李 双琪)
Email: shuangqi.li@epfl.ch
Languages: Chinese (Native), English (Fluent), French (Basic)

Education

  • 2022.09 - 2027

    Lausanne, Switzerland

    Ph.D.
    EPFL (Swiss Federal Institute of Technology in Lausanne)
    Machine Learning
  • 2020.09 - 2022.07

    Lausanne, Switzerland

    Master
    EPFL (Swiss Federal Institute of Technology in Lausanne)
    Data Science
  • 2019.09 - 2020.06

    Remote

    Master
    (Withdrew due to visa issues and COVID-19.)
    University of California, San Diego
    Computer Science
  • 2015.09 - 2019.06

    Chengdu, China

    Bachelor
    University of Electronic Science and Technology of China
    Microelectronic Science and Engineering

Work Experience

  • 2021.07 - 2021.09

    Zurich, Switzerland

    Research Intern
    Oracle Labs
    Developed a time-series model that detects anomalous Linux sessions on cloud servers.
  • 2018.10 - 2019.02

    Beijing, China

    Algorithm Engineering Intern
    DiDi (China's largest taxi-hailing platform)
    Developed an algorithm for learning road segment weights from historical ride data, significantly improving route planning quality for ride-hailing services in production.

Projects

  • 2026.01 - Present

    EPFL

    Ongoing research
    Dense Credit Assignment for RL via Token-Level Data Attribution
    Proposed a novel data attribution framework for reinforcement learning to estimate the marginal contribution of individual tokens to total rewards for GRPO/DAPO-style algorithms. Achieved fine-grained dense credit assignment, effectively mitigating the reward sparsity limitations in reasoning and agentic RL training.
  • 2025.08 - 2026.01

    EPFL

    Under review for ICML 2026
    Scalable Training Data Attribution for Large Language Models
    Developed a novel, highly scalable method for training data attribution in large-scale models by exploiting the low-rank properties of gradients, cutting storage cost and query latency by 20x. Enabled, for the first time, efficient tracing of the output of a 70-billion-parameter LLM back to individual examples in its SFT training data.
  • 2025.02 - 2025.09

    EPFL

    ICLR 2026
    Learning to Weight Parameters for Training Data Attribution
    Identified the heterogeneity of attribution signal across parameters/layers in diffusion models and LLMs. Proposed a method to re-weight layers, boosting attribution accuracy and enabling interpretable attribution.
  • 2025.03 - 2025.07

    EPFL

    LLM Development from Scratch
    Collaborative project with 25 PhD students to build a large language model from scratch. Engineered the pre-training pipeline, including environment setup and an investigation of data mixing recipes for the training corpus. Implemented and validated the evaluation suite by reproducing the SmolLM2 benchmark to establish a robust performance baseline.
  • 2024.01 - 2024.10

    EPFL

    ICLR 2025 Spotlight
    Enhancing Text-to-Image Generation with Reliable Random Seeds
    Identified the significant role of initial noise in text-to-image inconsistencies for diffusion models. Proposed a method that identifies reliable random seeds to improve text-to-image generation, leveraging reliable seeds to synthesize high-quality data for fine-tuning diffusion models.
  • 2023.02 - 2024.03

    EPFL

    TMLR 2024
    Poster at ICLR 2025
    Controlling the Fidelity and Diversity of Deep Generative Models
    Proposed an approach to bias generative models towards generating data with either enhanced fidelity or increased diversity. Enabled model training with data of better fidelity or diversity.
  • 2021.09 - 2022.02

    EPFL

    Semester project
    Interlock-Free Multi-Aspect Rationalization for Text Classification
    Proposed a multi-stage training method to alleviate the interlocking issue in training interpretable models.

Teaching Experience

Honors & Awards

  • 2018.09
    National Scholarship
  • 2018.05
    China Collegiate Programming Contest - Gold Medal
  • 2018.03
    China Collegiate Computing Contest - First Prize
  • 2017.12
    First-class People's Scholarship
  • 2017.10
    ACM International Collegiate Programming Contest (Asia Regional) - Bronze Medal
  • 2017.04
    China Collegiate Computing Contest (Group Programming Ladder Tournament) - First Prize
  • 2016.12
    First-class People's Scholarship

Skills

Programming: Python, C++, CUDA, Coding competition
Frameworks & Tools: PyTorch, Docker, Git, Linux, PySpark, Cursor, Claude Code

Publications