• Data Science

    Projects

    Supporting Proactive Diabetes Screenings to Improve Health Outcomes (~2M records): Applied machine learning to identify patients at risk of developing diabetes. Tools: Python, Pandas, NumPy, Scikit-learn, SQL Server → Web

    Motivation Factors and Participation Patterns in Crowdsourcing (~2k records): Applied exploratory data analysis (EDA) to profile the participants of crowdsourced civic participation processes. Employed non-parametric statistical tests (e.g., Wilcoxon, Spearman, Chi-square, Kruskal-Wallis, Friedman) to examine changes in the motivation factors that drive people to take part in these processes. Used logistic regression to estimate the odds that participants stay engaged in crowdsourcing processes. Tools: R, dplyr, ggplot2 → GitHub

    The Effectiveness of Social Sharing Practices (~35k records): Used multivariate linear regression and parametric statistical tests (Pearson correlation, t-test) to study how effective the ubiquitous social media sharing buttons are at increasing participation in online social communities. Tools: R, ggplot2, dplyr, reshape → GitHub

    Collective and Individual Behavior in Online Communities (~300k records): Used k-means clustering and non-parametric statistical tests (Chi-square, Kruskal-Wallis) to discover patterns in the collective behavior of online innovation communities. Tools: R, ggplot2, dplyr → GitHub

    Politic Bots (~200k records): Employed exploratory data analysis (EDA) techniques (summary statistics and visualizations) to understand how social media bots and fake accounts are used to promote and manipulate information during electoral periods. Tools: Python, Pandas, NumPy, Matplotlib, MongoDB → GitHub

    UTurn (~100 records): Used non-parametric Spearman correlation to examine the sense of presence, perspective-taking, and usability of a split-sphere, first-person perspective 360-degree video. Tools: R → GitHub

    Improving Asuncion’s Open Street Map (~450k records): Employed data munging techniques to assess the validity, accuracy, completeness, consistency, and uniformity of Asuncion’s OpenStreetMap data. Tools: Python, MongoDB, ElementTree XML → GitHub

    Online Popularity of Movies (~10k records): Used exploratory data analysis (EDA) methods (summary statistics, visualizations, and correlation) to understand the factors that contribute to the online popularity of movies. Tools: Python, Pandas, NumPy, Matplotlib → GitHub

    Course

    Data Science for Social Impact. Catholic University of Asunción. October to December 2017. → GitHub