Summary

My experiences, including navigating homelessness before graduating at the top of my class at Cal, gave me a unique perspective and a strong desire to apply data science thoughtfully. As a Data Scientist / ML Engineer, I enjoy connecting the dots: understanding the need, designing scalable data solutions, building effective ML models (predictive, NLP, or deep learning), and deploying them responsibly. I believe the best results come from open collaboration with users and colleagues, grounded in solid engineering practices (Python, SQL, cloud, testing, etc.) and a commitment to ethical development. I find real satisfaction in untangling complexity to create useful tools.

Skills

Programming & Core Data Skills

  • Python (Pandas, NumPy, SciPy)
  • SQL (PostgreSQL, BigQuery)
  • R
  • Bash/Shell Scripting
  • Excel for Analysis
  • Statistical Analysis
  • Feature Engineering
  • Unstructured Data Analysis

Machine Learning - Predictive & Classical

  • Predictive Modeling
  • Predictive Risk Modeling
  • Classification & Regression
  • Clustering Algorithms
  • Recommender Systems
  • Bayesian Methods (PyMC3)
  • ML Frameworks (Scikit-learn)
  • Model Eval & Selection
  • Time Series Analysis/Forecasting

Deep Learning & Advanced AI

  • Deep Learning (PyTorch, TF/Keras)
  • Neural Nets (CNN, RNN, Transformers)
  • Reinforcement Learning Concepts

Natural Language Processing (NLP)

  • NLP (Sentiment, Topic Models)
  • Text Vectorization (TF-IDF, W2V)
  • NLP Libs (spaCy, NLTK, HuggingFace)
  • Info Extraction & Entity Recog.

Generative AI & LLMs

  • Prompt Engineering
  • LLM Fine-tuning
  • Retrieval Augmented Generation
  • LLM Orchestration (LangChain)
  • LLM Evaluation & Benchmarking
  • Open-Weights Models (Llama)
  • LLM API Integration (OpenAI)
  • LLM Deployment Considerations

Data Engineering & Cloud Platforms

  • AWS Cloud (EC2, S3, Lambda, Athena)
  • GCP Cloud (BigQuery, GCS)
  • Data Pipelines (Dagster, dbt, dlt)
  • Data Warehousing (BigQuery, Athena)
  • Database Modeling (Star, Snowflake)
  • ETL/ELT Design & Implementation
  • API Design & Integration
  • Docker & Containerization
  • MLOps Principles & Tools

Software Engineering & DevOps Practices

  • API Utilization
  • Production Code Quality/Practices
  • Object-Oriented Programming (OOP)
  • Software Design Patterns
  • Testing (Unit, Intg, E2E, Pytest)
  • CI/CD Pipelines (GitHub Actions)
  • Version Control (Git, GitHub)
  • Agile Methods (Scrum, Kanban)
  • Code Review & Collaboration

Data Visualization & BI Tools

  • Plotly (Dash for web apps)
  • Tableau
  • Looker Studio
  • Matplotlib & Seaborn Graphics
  • Geospatial Viz (ArcGIS, GeoPandas)
  • Interactive Dashboard Design
  • Business Intelligence Reporting

Research, Experimentation & Ethics

  • Experimental Design (A/B)
  • Causal Inference Methods
  • Quantitative & Qualitative Data
  • Survey Design & Analysis
  • Statistical Significance Tests
  • Ethical AI & Data Bias Analysis
  • Data Privacy & Security

Collaboration & Professional Skills

  • Team Leadership & Mentoring
  • Cross-functional Collaboration
  • Stakeholder Comms & Management
  • Requirements Elicitation & Definition
  • Project Management (Agile)
  • Technical Documentation/Reporting
  • Presenting (Tech & Non-Tech)
  • Problem Solving & Critical Thinking

Familiar Technologies & Other Tools

  • Java
  • JavaScript
  • HTML/CSS
  • C++ (basics)
  • PHP (basics)
  • Databases (PostgreSQL, MySQL)
  • Web Scraping (Beautiful Soup, Selenium)
  • Apache Spark (PySpark basics)
  • Jupyter Notebooks & Lab

Work Experience

Sep 2024 - Current
Data Scientist (ML, Data Engineering, MLOps)
Caliber Public Schools
Richmond, California
  • Owned full data lifecycle (DS, DE, ML) as sole data scientist, leading initiatives and working directly with heads of every department.

  • Modernized data infrastructure on GCP (BigQuery, GCS) with automated pipelines (Dagster, dbt, dlt).

  • Developed and deployed predictive risk models (Explainable ML, Deep Learning) for staff turnover.

  • Created interactive models and visualizations enabling leadership to make staffing decisions based on adjustable risk tolerance levels.

  • Updated the organization's data security policy; designed and delivered data literacy training modules for colleagues.

  • Partnered with leadership on high-impact, data-driven solutions; presented findings to stakeholders, including the board.

Aug 2021 - Feb 2023
Data Scientist (ML, Data Engineering, MLOps)
SetSail
San Mateo, California
  • Contributed to product achieving 33% faster ramp times, 16% higher revenue, & 15x ROI.

  • Developed/deployed production ML models predicting large B2B deal closure probability and forecasting value, leveraging unstructured text data (email content).

  • Performed causal analysis identifying sales behaviors that may lead to successful outcomes.

  • Led data pipeline overhaul on AWS, reducing processing times by 75% & scaling data handling 4x (TBs scale), enabling future LLM integration.

  • Architected scalable data solutions (star schema, optimized DAGs, async ingestion).

  • Championed SE best practices (testing, CI/CD, Agile, code quality).

  • Collaborated with Eng/Product/Support & worked directly with enterprise end-users.

Aug 2021 - Feb 2023
Data Science Research Team Lead
UC Berkeley School of Public Health
Berkeley, California
  • Led data science components in mixed-methods studies (e.g., violence against homeless youth), applying causal inference and qualitative analysis.

  • Analyzed diverse unstructured and non-traditional datasets (qualitative interviews, geospatial data, text corpora, hand-drawn maps) requiring development of novel data processing and analytical approaches.

  • Performed geospatial analysis (ArcGIS) identifying & visualizing spatial violence patterns.

  • Created interactive dashboards (Tableau, Plotly) to communicate findings to stakeholders.

  • Collaborated across disciplines (public health, psych, soc) ensuring ethical, robust research.

May 2019 - Aug 2019
Full Stack Engineer
Los Medanos College
Pittsburg, California

Independently designed and developed a web application guiding student program selection.

  • Developed full-stack web app (Python, JavaScript, PHP, PostgreSQL).

  • Built 500+ university program database via web scraping (Beautiful Soup) & data cleaning.

  • Consulted with college stakeholders (district, counselors) for user needs.

  • Created intuitive management dashboard for non-technical staff.

Projects

Dagster People Team Pipeline
https://github.com/jstehn/dagster-people-team-pipeline
  • Python
  • Dagster
  • Data Pipeline
  • ETL
  • GCP
  • BigQuery
  • Data Engineering
  • Built automated ETL pipeline for HR/People analytics.

  • Integrated multiple data sources using APIs or other methods.

  • Loaded processed data into BigQuery data warehouse.

Firewatch Chatbot (CalSAFE DataGood)
https://github.com/jstehn/firewatch-chatbot
  • Python
  • Chatbot
  • NLP
  • Intent Classification
  • spaCy
  • scikit-learn
  • Flask
  • Heroku
  • Facebook API
  • DataForGood
  • Jupyter
  • Built NLP intent classification model using spaCy (vectorization) & scikit-learn (Logistic Regression).

  • Manually collected & classified ~450 question dataset with team.

  • Designed bot logic providing responses based on classified intent.

  • Integrated with Facebook API & planned deployment via Flask/Heroku.

Warren Court Language Model
https://github.com/jstehn/warren-court-language-model
  • Python
  • NLP
  • Language Modeling
  • Jupyter
  • Data Analysis
  • History
  • Applied language modeling to analyze patterns in historical legal text.

  • Focused on personal learning and exploration of NLP techniques.

Neural Network By Hand (Berkeley Exercise)
https://github.com/jstehn/nn-by-hand-188
  • Python
  • Neural Networks
  • Machine Learning
  • NumPy
  • Education
  • Implemented core NN components (layers, activation, backpropagation) based on class skeleton.

  • Focused on understanding mathematical foundations over code flexibility.

  • Solidified grasp of fundamental deep learning concepts.

Volunteer

Sep 2024 - Current
Impact Fellow (Placement @ Caliber Public Schools)
Education Pioneers

Selected for national fellowship applying leadership/management skills to advance educational equity via capacity-building projects and leadership development.

  • Applying data science & leadership skills to advance educational equity.

  • Building organizational capacity through strategic data projects at placement site.

  • Engaging in rigorous leadership development programming with diverse cohort.

Dec 2023 - Current
Data Team Lead
San Francisco Gay Men's Chorus

Provide data-driven insights for policy-making and organizational growth through survey creation and analysis (qualitative & quantitative).

  • Led volunteer team providing data analysis for organizational strategy.

  • Designed & analyzed surveys (qual/quant) informing policy & growth.

  • Presented data-driven insights to chorus leadership.

Event Producer
Bearrison Street Fair
  • Co-produced large-scale (~10k attendees) LGBTQ+ community street fair, overseeing all aspects from planning through execution.

  • Managed complex logistics, ~100 vendor relations, and multi-stage entertainment programming.

  • Led fundraising efforts, securing over $90k in sponsorships & donations, contributing to event profitability.

  • Coordinated hundreds of diverse stakeholders including volunteers, performers, operations teams, city agencies, and non-profits.

Jan 2018 - Current
Mentor and Trans Support Leader
San Francisco Gay Men's Chorus

Support members through mentorship and leadership within trans support initiatives, coordinating meetings and events.

  • Provided mentorship and peer support to chorus members.

  • Led coordination for trans member support group meetings and events.

  • Contributed to fostering an inclusive environment within the organization.

Transfer Mentor
UC Berkeley Division of Computing, Data Science, and Society
  • Mentored incoming transfer students transitioning into UC Berkeley Data Science.

  • Assisted students in developing data science skills & navigating coursework.

  • Fostered community and peer networking during remote learning (pandemic).

Jul 2017 - May 2019
Student Ambassador (Transfer & Career Services)
Los Medanos College

Supported transfer/career programs through data analysis, marketing, peer training, and event coordination.

  • Analyzed student transfer data (SQL, R, Excel) to inform program development.

  • Led marketing committee managing social media, web content, and outreach.

  • Presented transfer/career information via public speaking & workshops.

  • Trained new student employees on department policies & procedures.

  • Organized large campus events coordinating multiple stakeholders.

Education

Aug 2019 - May 2021
Bachelor of Arts
Data Science
UC Berkeley
  • COMPSCI C8: Foundations of Data Science
  • COMPSCI 61A: Structure & Interpretation of Computer Programs
  • COMPSCI 61B: Data Structures
  • COMPSCI 188: Artificial Intelligence
  • DATA C100: Principles & Techniques of Data Science
  • DATA C102: Data, Inference, and Decisions
  • STAT 134: Concepts of Probability
  • STAT 153: Time Series Analysis
  • STAT 89A: Linear Algebra for Data Science
  • DEMOG 110: Population Analysis
  • HISTORY C184D: Ethics of Data
Grade: 4.00/4.00

BA, Data Science (Highest Distinction, 4.0 GPA). Domain: Quantitative Social Science.

Awards

May 2021
2020-2021 Outstanding Data Science Undergraduate Award
UC Berkeley

Recognized for excellence in Data Science undergraduate studies, research, and community contributions at UC Berkeley.

Interests

Technology & Tinkering

  • AI Research & Trends
  • Generative AI (Local Models)
  • Large Language Models (LLMs)
  • Raspberry Pi Projects

Creative & Community

  • Singing
  • Playing Instruments
  • Dancing
  • Language Learning
  • Community Engagement

References

“ Chosen from over 50 applicants and 5 finalists, Jack joined our organization at a pivotal moment and has been an invaluable team member ever since. As soon as they joined, they immediately took initiative on a complex survey design and analysis project that was critical to our success, bringing both expertise and ownership from day one. Jack’s approach is highly collaborative and mission-driven. They actively engage with departments across the organization, listen closely to their needs, and build thoughtful, scalable solutions—including dashboards, data quality reports, and automated systems that allow staff to focus on their core work with students. Jack is systems-oriented and consistently plans for long-term, sustainable outcomes. One standout example: Jack streamlined a survey and analysis process that previously took our team a month, developing a replicable system that now delivers actionable insights in just a few days. This perfectly captures their ability to problem-solve proactively and significantly boost our efficiency and decision-making. Beyond their technical and strategic skills, Jack is reliable, resourceful, and generous with their knowledge. They’ve led internal trainings to empower colleagues, handle ambiguity with ease, and bring a positive, solution-focused mindset to every challenge. Jack is an easy choice for any team seeking a results-driven, collaborative data scientist who elevates both projects and people. I recommend them without hesitation. ”
Brian Jimenez (Managed Jack directly at Caliber Public Schools) - Managing Director of People
“ I had the pleasure of interviewing Jack before they joined the SetSail team-- I gave them a 4 out of 4. It's important to note that on our hiring scale, a 4 meant "I will flip the table if you don't hire this person." One thing that stuck with me after the interview, and which was reaffirmed while we worked together at SetSail, is Jack's enthusiasm for data science and their love of learning (and sharing what is learned). Not only is Jack an extremely capable engineer and data scientist, they are also a collaborative team player who elevates everyone around them. Their contributions at SetSail were always valuable to the company-- whether it was their huge role in our data pipeline migration, or countless bug fixes and feature implementations that directly improved our user experience, you could always count on Jack to get the job done on time, with clean code, and great documentation. I wholeheartedly recommend Jack for any data science position—they would be an invaluable addition to any team. ”
Darrin Gilkerson (Worked with Jack on different teams at SetSail) - Software Engineer at QVT Financial
“ Jack is a sharp, human-first data person. They possess incredible passion for doing what is right and making good science happen. I highly recommend their work and their presence. ”
Ollie Downs (Studied with Jack at UC Berkeley) - Senior Data and Research Analyst, County of San Diego
“ Jack was an integral part of the planning and designing of data pipeline overhaul at SetSail. Even with a moving target and many dependencies, Jack was able to adjust the design of our new pipeline, maintaining conversations across the product and engineering teams as the project progressed. They are also a fast learner and willing to dig into new technologies, which I really admired as their coworker. They would be a great addition to any team looking for a fast-learning and flexible data scientist. ”
Sarah Nam (Worked with Jack on the same team at SetSail) - Senior Associate at Cancer Navigator
“ Jack is a hard-working Data Scientist with a keen eye for details. Their passion for data analytics and software development really stands out when tasked with complex problems. At SetSail, Jack worked on a variety of projects that involved teasing out actionable insights from complex data sets, enhancing modeling capabilities through feature development and algorithm development, and building out a data ETL process that transformed the data infrastructure to help SetSail scale for enterprise customer needs. In addition to these technical skills, Jack's collaborative work with the engineering and product team continually earned praises from fellow coworkers. They were never shy and were always proactive to jump in and help solve a problem. I highly recommend Jack as a Data Scientist and Data Engineer for any organization. Their technical skill and work ethic will be immediately apparent upon joining any team. Feel free to reach out to me as I am happy to provide additional reference or information as desired. ”
Danny Pan (Managed Jack directly at SetSail) - Data Science
“ I had the privilege of working alongside Jack at SetSail, and I can confidently say that they are a top-notch data scientist. Jack's expertise in data science, combined with their passion for software engineering, make them a valuable asset to any team. They have a keen ability to plan and lead complex cross functional projects and their software engineering skills are second to none. Jack's enthusiasm for learning is contagious, and they are always eager to dive into new projects and technologies. They are a great communicator and are able to explain technical concepts in a way that is easy for both technical and non-technical colleagues to understand. On top of all that, Jack is one of the kindest and most genuine people I've had the pleasure of working with. They truly care about their team and go above and beyond to support them. I highly recommend Jack for any data science or software developer role, and I have no doubt that they will excel in their next endeavor. ”
Josh Mantovani, M.A. (Senior to Jack, worked together at SetSail) - Data Scientist / Engineer
“ I am happy to recommend Jack for a variety of roles and positions. Jack is a motivated self-starter who loves to accomplish project tasks while developing and implementing smooth processes in their work environments. Jack is an accomplished leader, utilizing problem-solving skills to support their own work and the work of their colleagues and peers, taking time to ensure that their team has the skills, knowledge, and resources they need to finish their tasks and projects effectively. Jack has a wide array of skills that they readily apply to their work, and they are ready to search for answers and learn new skills to address problems that arise in their projects. Then they are ready and willing to teach peers and colleagues how to utilize those new skills, supporting team-based processes and accomplishing team projects and goals in addition to their own individual work. Jack is a leader who uses imagination, experience, and empathy to create sustainable processes and consistently complete their goals. I am happy to recommend Jack and I am confident that Jack will be a positive asset to any work that they set out to complete. ”
G. Allen Ratliff (Managed Jack directly at UC Berkeley SPH) - Assistant Professor of Social Work
“ I have had the great fortune of having Jack as project lead on the SFYEAH Research Project. One word to describe Jack, I would say "Integrity", Jack holds themself to the highest standard. It shows in the work they produce, Jack is meticulous. Jack is a skilled coder and data scientist, with a wealth of geospatial analysis knowledge. They are a first rate leader, and an exceptional communicator. Jack keeps everyone on the same page, and is incredibly thorough. It is an absolute pleasure to work with Jack. ”
Conan Minihan (Jack was Project Lead for SFYEAH Research Project) - Data Scientist, PhD Student
“ Jack and I worked on the same research team and they effortlessly evolved into a pillar of leadership and direction. It's been an absolute pleasure and relief to be able to work alongside them. Jack learned quickly and worked beyond the expected and required amount to ensure deadlines and quality were kept. It truly astounded me how enthusiastic and exceptionally intelligent Jack was as I watched them surpass most of the team in their domain of expertise and knowledge in a matter of weeks. Jack's passion for details, design, and accuracy has made them one of the strongest assets on our team. Their work ethic and energy have and are contagiously inspiring and addicting to be around. Jack is just one of those people that you want on your team in every scenario because they really own the title "jack of all trades" ”
Eva Smolentseva (Worked with Jack on same research team at UC Berkeley) - Analyzing Natural Language Models @USAA