reproducibility
Understanding Data Leakage in Machine Learning: A Focus on TF-IDF
Hello again! This is my final blog post, and I will be discussing the second set of materials I created for the 2024 Summer of Reproducibility Fellowship. As you may recall from my first post, I am working on the Exploring Data Leakage in Applied ML: Reproducing Examples of Irreproducibility project with Fraida Fund and Mohamed Saeed as my mentors.
Kyrillos Ishak
Last updated on Sep 5, 2024
SummerofReproducibility24
Reflecting on the ScaleRep Project: Achievements and Insights
Reproducing and validating fixes for throttling bugs in HDFS improved system stability and performance.
Shuang Liang
Last updated on Sep 2, 2024
SoR'24
Final Blog: ML in Detecting and Addressing System Drift
Hello! I’m Joanna! I have been contributing to the ML in Detecting and Addressing System Drift project under the mentorship of Ray Andrew Sinurat and Sandeep Madireddy. My project aims to design a pipeline to evaluate drift detection algorithms on system traces.
Joanna Cheng
Last updated on Aug 31, 2024
osre24, reproducibility
Final Blogpost: Reproducibility in Data Visualization
Hello everyone! I’m Triveni, a Master’s student in Computer Science at Northern Illinois University (NIU). I’m excited to share my progress on the OSRE 2024 project Categorize Differences in Reproduced Visualizations focusing on data visualization reproducibility.
Triveni Gurram
Last updated on Sep 4, 2024
SoR
Final Blogpost: Drift Management Strategies Benchmark
Background Hello there! I’m William and this is my final blog for my proposal “Developing A Comprehensive Pipeline to Benchmark Drift Management Approaches” under the mentorship of Ray Andrew and Sandeep Madireddy under the LAST project.
William Nixon
Last updated on Aug 27, 2024
Reproducing and addressing Data Leakage issue : Duplicates in dataset
Hello! In this blog post, I will explore a common issue in machine learning called data leakage, using an example from the paper: Benedetti, P., Perri, D., Simonetti, M., Gervasi, O.
Kyrillos Ishak
Last updated on Aug 24, 2024
SummerofReproducibility24
Final blog: Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon
The project aims to develop a service that facilitates the automated replication of COMPSs experiments within the Chameleon infrastructure.
Archit Dabral, Raül Sirvent
Last updated on Aug 24, 2024
SoR
Final Blogpost: HDEval's LLM Benchmarking for HDL Design
Introduction Hello everyone! I’m Ashwin Bardhwaj, an undergraduate student studying at UC Berkeley. As part of Micro Architecture Santa Cruz (MASC), my proposal, under the mentorship of Jose Renau and Sakshi Garg, aims to create a suite of benchmark programs for HDEval.
Ashwin Bardhwaj
Last updated on Aug 24, 2024
Deriving Realistic Performance Benchmarks for Python Interpreters
Hi, I am Mrigank. I am one of the Summer of Reproducibility fellows for 2024, and I will be working on deriving realistic performance benchmarks for Python interpreters with Ben Greenman from the University of Utah.
Mrigank Pawagi
Last updated on Aug 19, 2024
Final Blog: FEP-Bench: Benchmarking for Enhanced Feature Engineering and Preprocessing in Machine Learning
Background Hello, I’m Lihaowen (Jayce) Zhu, a 2024 SoR contributor for the FEP-bench project, under the mentorship of Yuyang (Roy) Huang. Before we start, let’s recap the goal of our project and our progress up to the midterm.
Lihaowen (Jayce) Zhu
Last updated on Aug 19, 2024