John Geer

Raleigh-Durham-Chapel Hill Area, NC

Technical Skills

Programming Languages

R, Python, SQL, JavaScript, Stan, Rust, Lua

Infrastructure

AWS (EC2, S3, Batch, Lambda), GCP

Data Storage

PostgreSQL, BigQuery, Redshift, Parquet

Statistics

Time Series, Causal Inference, Survival Analysis, Probabilistic Programming

Triplebyte Certified Data Scientist

Top 10% of test-takers
Issued Aug 2020; No Expiration Date

Libraries

dplyr, pandas, ggplot2, D3, Keras, TensorFlow

Experience

Data Science Lead
Tuft & Needle
Remote
2018 to Present
  • Lifted revenue by over $5 million by optimizing regional marketing with experiments, causal inference, and media mix modeling
  • Accelerated financial reporting from monthly to daily by writing a program that combines several cost and revenue datasets
  • Wrote over 100 articles and presented to the CEO and Management Team weekly, clearly explaining insights about products, promotions, and marketing
  • Made accurate company data easily available to an organization of more than 100 people by building data pipelines and reporting systems
  • Prioritized projects for the four-person Data Science team and helped team members grow their skills
  • Created and maintained projects, contributing over 1,000 commits to the company's git repositories

Techniques used: Causal Inference, Media Mix Modeling with Shape and Carryover Effects, Design of Experiments

Data Scientist
Tuft & Needle
Remote
2016 to 2018
  • Produced daily sales forecasts and helped explain revenue changes with Bayesian time series analysis
  • Helped improve the conversion rate, leading to an increase of more than $20 million in annual revenue, by writing web experiment analysis software using R, Stan, and Bandit Algorithms
  • Managed the machine learning process by configuring a containerized batch system on AWS that orchestrates over 36 hours of jobs per day

Techniques used: Time Series Analysis, Hierarchical / Multilevel Models, Bayesian State Space Models, Survival Analysis, Probabilistic Programming

Data Scientist, Research Squad
Automated Insights
Durham, NC
2015 to 2016
  • Built a system to automatically optimize written content for conversion rates
  • Wrote software to catch errors in large amounts of text
  • Built a system to make generated text more varied using word embeddings

Techniques used: Natural Language Processing (NLP), Contextual Bayesian Bandit Algorithms, Word Vectors, Python

Data Scientist
Automated Insights
Durham, NC
2014 to 2015
  • Identified meaningful anomalies using time series analysis
  • Co-wrote a program that produced over 6,000 natural language medical clinic reports
  • Wrangled, analyzed, and communicated insights from data on patient care, multinational financial flow, and TV viewership

Techniques used: ARIMA Models for Prediction and Anomaly Detection in Time Series, Random Forest Supervised Learning Models, Proportional-Odds Cumulative Logistic Regression, Visualizations with D3 and R's ggplot2

Web Developer
Flying Apricot
2005 to 2012
  • Programmed web-apps and websites with Python and JavaScript
  • Improved site content with split tests

Education

Master of Applied Statistics
Pennsylvania State University
4.0 GPA
2012 to 2014
  • Focused on Predictive Analytics and Data Mining
  • Thesis: Built a predictive model of the number of views a TED talk will receive

Bachelor of Arts in Philosophy
Davidson College
2001 to 2005
  • Thesis: "Skepticism Regarding the External World"
  • Meaning: "Are we sure we know what's going on?"

Award

Google & Eyebeam's Data Visualization Challenge
2011
  • Received the "Deep Thought Badge" and Honorable Mention
  • Visualization of the connections in the US federal budget
  • Created in collaboration with Catherine Jahnes