
Hi, I am Jack

Jack Stehn

Data Scientist

Data Scientist driven by a passion for using data to create positive change. My journey from homelessness to graduating at the top of my class at UC Berkeley fuels my commitment to impact-driven work. As a data scientist at a Bay Area startup, I optimized data pipelines, developed predictive models, and created in-house tools. Proficient in Python, SQL, AWS, NLP, machine learning, and data visualization. At SetSail, I cut data-processing times by 75% and scaled capacity by 400%, enabling near real-time sales insights. At UC Berkeley, I mentored data science students, built a chatbot for an NGO, and received the 2020-2021 Outstanding Data Science Undergraduate Award.

Skills

Python Programming
SQL & Database Interaction
Machine Learning & Statistics
Data Pipeline Development & ETL
Data Visualization
Communication & Collaboration

Experiences

SetSail Technologies

Sep 2021 - Jan 2023

SetSail Technologies is a sales automation platform that leverages data science to optimize sales processes. The company’s solutions provide insights into lead quality, enabling data-driven sales strategies.

Data Scientist

Sep 2021 - Jan 2023

Responsibilities:
  • Developed a product that delivered 33% faster ramp times, 16% higher revenue, and 15x ROI on sales programs.
  • Crafted NLP-powered predictive models, including custom-tuned sentiment analysis, to forecast sales lead conversion probability and expected value, empowering data-driven sales strategies.
  • Optimized model performance through iterative testing, feature engineering, and hyperparameter tuning.
  • Revamped data pipeline for near real-time analytics, slashing processing times by 75% and scaling data volume by 400%, enabling rapid, data-informed sales decision-making.
  • Architected scalable data infrastructure on AWS (S3, EC2, etc.) to support growing data needs and real-time model execution.
  • Championed data science best practices by developing well-documented, modular, and reusable Python and SQL code.
  • Collaborated effectively with engineering teams to integrate predictive models seamlessly into the sales automation platform.
  • Tools and techniques: AWS, Git, SQL, Logistic Regression, Decision Trees/Random Forests, SVMs, Naive Bayes, Linear Regression, Regularization, PCA, Neural Networks, Sentiment Analysis.

SF-YEAH Project

The SF-YEAH project is a multiphase study in the School of Public Health exploring how young people experiencing homelessness in San Francisco encounter violence and find safety and resources. The project aims to identify the places, spaces, processes, and structures of safety and violence for young people in order to inform policies and best practices for serving vulnerable youth. The study data come in multiple formats and conditions, including qualitative, geospatial, administrative, and visual data, which required organizing, curating, and converting disparate sources into formats that can be analyzed with spatial methods.

Data Science Lead Research Intern

Aug 2020 - May 2021

Responsibilities:
  • Led data science team in a comprehensive mixed-methods study of violence against homeless youth, combining geospatial analysis (ArcGIS), causal inference techniques, and qualitative analysis to drive evidence-based interventions.
  • Integrated insights from 45 in-depth walking interviews, conducting qualitative analysis to uncover lived experiences and contributing factors to violence.
  • Identified and visualized spatial patterns of violence using ArcGIS, informing targeted interventions by local government and non-profits.
  • Developed interactive dashboards (Tableau, Plotly) to communicate complex research findings, empowering policymakers, researchers, and service providers to design effective violence prevention strategies.
  • Partnered with experts in psychology and sociology to develop robust research methodologies, ensuring both rigor and relevance to the field.
DataGood @ Berkeley

Jan 2021 - Jul 2021

DataGood @ Berkeley creates an outlet for students to learn and apply their data science skills in meaningful, impact-driven sectors. Members design, develop, and ship large-scale products that promote positive socioeconomic advancement.

Consulting Project Manager

Jan 2021 - Jul 2021

Responsibilities:
  • Led the design and implementation of an educational chatbot on fire ecology for CalSAFE, increasing user engagement with fire safety content.
  • Collaborated with stakeholders at CalSAFE to define project goals, gather requirements, and ensure alignment with their educational mission.
  • Designed the chatbot’s conversational logic and used NLP techniques (spaCy) for natural language understanding.
  • Leveraged Python, pandas, BeautifulSoup, and scikit-learn to acquire, clean, process, and analyze social media data, identifying key themes and trends in fire ecology discussions.
  • Managed project scope, timelines, and effective communication with stakeholders, ensuring on-time delivery of the chatbot.

Worked independently under contract to develop a web application that helps California students identify majors and related fields at various California universities.

Full Stack Developer (Contract)

May 2019 - Aug 2019

Responsibilities:
  • Independently designed and developed a comprehensive web application to guide college students’ program selection, utilizing a Python, JavaScript, PHP, PostgreSQL stack for robust functionality.
  • Built a database of 500+ university programs using web scraping (Beautiful Soup) and Python data cleaning, enabling streamlined searches and informed decision-making.
  • Consulted directly with stakeholders (California Community College District, career counselors) to understand user needs, tailoring the application for maximum impact in student guidance.
  • Created an intuitive management dashboard, empowering non-technical staff to seamlessly maintain and update program data, ensuring the application’s long-term sustainability.

Education

B.A. in Data Science (Domain Emphasis of Quantitative Social Science)
GPA: 4 out of 4
Courses Taken:
Course | Total Credits | Credits Obtained
Data, Inference, and Decisions | 4 | 4
Data Structures and Algorithms | 4 | 4
Artificial Intelligence | 4 | 4
Game Theory in the Social Sciences | 4 | 4
Linear Algebra | 4 | 4
Language Models & Text Analysis | 4 | 4
Extracurricular Activities:
  • Data Scholars Program - The Data Scholars Program provides spaces where students from many disciplines and backgrounds learn together, develop their data science skills, and co-create a diverse community.
  • DataGood - DataGood @ Berkeley creates an outlet for students to learn and apply their data science skills in meaningful, impact-driven sectors.
  • Transfer Mentorship - Provide mentorship to incoming students in their transition to Cal.
  • Data Science Discovery Program - The Data Science Discovery Program connects undergraduates with hands-on, team-based opportunities in cutting-edge data research projects at UC Berkeley, government agencies, community groups, and entrepreneurial ventures.
A.S. Mathematics
GPA: 4 out of 4
Extracurricular Activities:
  • Academic Competition.
  • Transfer & Career Services.
  • Math Tutor.

Projects

Neural Networks By Hand
Developer Dec 2020 - Jan 2021

An exercise in building a neural network by hand to learn how neural networks work and the math behind them.

Firewatch Chatbot
Developer Jan 2021 - Jul 2021

An open-source chatbot that talks with Californians about fire ecology and fire safety over Facebook.

2018 Election Endorsements
Author May 2021

An analysis of publicly available endorsement data from the 2018 primaries.

Exploding Dice Exploration
Owner Oct 2020

A fun analysis of how the “exploding dice” rule affects the properties of dice rolls; a quick romp through statistics and probability theory.

TransferBound
Developer May 2019 - Aug 2019

A tool that helps students at California Community Colleges find which programs are available at California’s public universities.

COVID-19 Infections Analysis
Developer May 2020

A quick analysis of factors that may predict the spread of COVID-19, performed a few weeks after the pandemic started.

Supreme Court Language Model
Researcher Mar 2020

An analysis of a corpus of Warren Court decisions, used to study the language of the opinions, model topics, and map the similarity and centrality of decisions. Also built a language model trained on the corpus.

COVID-19 Time Series Analysis
Student May 2021

Performed univariate time series analysis on a toy dataset of obfuscated COVID case counts, testing standard time series techniques for modeling signal, seasonality, and noise.

Achievements

B.A. Data Science Highest Distinction (Summa Cum Laude)

2020-2021 Outstanding Data Science Undergraduate Award at the University of California, Berkeley.

Work Featured by the California Community Colleges Chancellor's Office

Winner in Academic Competition

Best Dancer Award