CS Night 2016

The following are some of the student projects presented at USF CS Night 2016. Projects came from the Master's and Senior Project courses, the Bioinformatics course, and some faculty-sponsored projects.

Using Animation to Alleviate Overdraw in Multi-class Scatterplot Matrices 

Presenter:  Helen Chen (hhchen@dons.usfca.edu) 

Faculty Advisors: Profs. Sophie Engle and Alark Joshi

Abstract: 
Scatterplots are a widely used technique for visualizing multivariate datasets. Even though scatterplots play an important role in data visualization,
they have known issues with overdraw. Overdraw occurs when points or glyphs are drawn on top of each other and obscure the underlying data.
Overdraw affects the ability of viewers to correctly understand the data distribution and discern relationships among subgroups of the data.
There are a variety of techniques for alleviating overdraw, none of which involve animation. Our research aims to use animation to visualize
multidimensional data in multi-class scatterplot matrices and to compare its efficacy in alleviating overdraw against that of other techniques.

Student Record Verification App - A Decentralized Application 

Presenters: Mayank Thirani, Ryan Zhu, Jakob Tarnow 

Sponsor:  Jim Huang 

Faculty Advisor: CS690 Master's Project, Prof. Olga Karpenko

Abstract: 
The Student Record Verification App uses the characteristics of Blockchain to solve a trust problem: recruiters from human resources departments
around the world need to verify the authenticity of an applicant's claimed degree without relying on an intermediary third party to perform
that verification. Each transaction in the Blockchain is verified by consensus of a majority of the participants in the network. The Blockchain contains
all verifiable student records and past transactions. Allowing a recruiter to validate via Blockchain that a student’s educational background matches the
records of the respective registrar removes the need for a central entity and leads to faster attestation of student records.

Ten-X Hackday Tool 

Presenters: Jeremiad Raymond, Teng Hu, Yi Xiao 

Sponsor: Jon Rahoi 

Faculty Advisor: CS490 Senior Project, Prof. Jeffrey Johnson 

Abstract: 
Ten-X Hackday Tool is a web application designed as a platform for the annual programming competition, called “Hackday”, run by our sponsor
company (Ten-X). This platform will be used to collect, store, and process data entered by participants and graders. The purpose of this project is to
give Hackday organizers better control over the flow of the event by automating the most repetitive tasks.

Sudokil 

Presenter: Dominic Mortlock

Sponsor: The intended clients are students interested in learning more about Unix and scripting concepts. The game also aims to be a fun challenge through which people can practice both their Unix knowledge and their problem-solving skills.

Faculty Advisor: Prof. David Wolber

Abstract: 
Sudokil is a hacking/scripting-themed puzzle game about using Unix-like commands on a terminal to control computers, robots, and various other devices.
Players progress through levels and gain access to different puzzle elements while collecting more scripts, permissions, and tools.

Customer Ticket Classification Engine - Applying Machine Learning Algorithms to SnapLogic Metadata 

Presenters: Min Chen, Shiyi Tan 

Sponsor: Prof. Gregory Benson 

Faculty Advisor:  CS690 Master's Project, Prof. Karpenko 

Abstract: 
The SnapLogic customer service team needs to prioritize customer tickets and measure customer satisfaction. This was previously done manually and
was very time consuming. To automate the process, we built two engines: one for prioritizing tickets and one for sentiment analysis of customer
comments. We first analyzed the ticket data and fit the two models, then used the models to predict the ticket priority and the sentiment of the comment
(“neutral” vs. “negative”). If a ticket is labeled “high priority” or contains a negative comment, our system sends an alert to the customer support team.
This allows the team to handle customer interactions more effectively and saves them time.
                          
Snap Recommendation Engine 

Presenter: Thanawut Ananpiriyakul 

Sponsor: Prof. Gregory Benson 

Faculty Advisor:  CS690 Master's Project, Prof. Karpenko 

Abstract: 
SnapLogic has been providing data integration services for years. A snap is a pre-built component that performs an operation on data. A pipeline is a 
graph (DAG) of snaps which executes a specific task. In order to successfully build a pipeline, the user needs to select the right snap and connect it correctly 
to the previous snap. For this project, we built an engine that recommends the most likely next snaps to users. We achieved an 88% hit rate in the final prototype,
implemented in Python. This means that 88% of the time, "deciding on the type of snap + searching for it among 100 types of snaps + dragging and dropping it onto
the canvas" is reduced to "1 click."
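
As an illustration of how a hit rate like the one above can be measured, here is a minimal Python sketch. It assumes a "hit" means the snap the user actually chose appears among the top-k recommendations; the abstract does not state the exact definition or the value of k, and the snap names below are hypothetical.

```python
def hit_rate(recommended, actual, k=3):
    """Fraction of cases where the chosen snap is in the top-k recommendations.

    recommended: list of ranked recommendation lists, one per observed step
    actual:      list of the snaps users actually picked at those steps
    """
    if not actual or len(recommended) != len(actual):
        raise ValueError("need matching, non-empty inputs")
    hits = sum(1 for recs, chosen in zip(recommended, actual) if chosen in recs[:k])
    return hits / len(actual)


# Hypothetical ranked recommendations and the snaps users actually chose.
recommended = [
    ["Mapper", "Filter", "Sort"],
    ["JSON Parser", "Mapper", "Join"],
    ["Filter", "Sort", "Union"],
    ["Mapper", "Join", "Filter"],
]
actual = ["Filter", "CSV Parser", "Filter", "Join"]

print(hit_rate(recommended, actual, k=3))  # 3 of 4 choices are in the top 3
print(hit_rate(recommended, actual, k=1))  # only 1 of 4 is the top suggestion
```

A larger k raises the hit rate but gives the user more suggestions to scan, so the two are usually reported together.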
                                         
My Smart Financial Advisor - A Mobile Application for Mutual Fund Investment Management 

Presenters: Richard Wang, Chen-Ning Chi, Kaynat Quayyum 

Sponsor: Stephen Y. Pak, The Core Group

Faculty Advisor: CS690 Master's Project, Prof. Olga Karpenko

Abstract: 
Mutual fund investment currently makes up a vast proportion of the retirement assets for Americans. At the same time, as mobile devices attain increasing 
capabilities and popularity, more people switch from PC to mobile devices such as tablet computers and smartphones. We provide a platform to buy and sell 
mutual fund shares on both iOS and Android devices. This enables users to manage mutual fund investment anywhere and anytime. Our application is 
implemented in C# using Xamarin, which allows us to build iOS and Android apps from a single shared codebase. Our app provides a good user experience
with a high level of security.
                        
                         
An Exploration of Single Nucleotide Polymorphisms on Type 2 Diabetes Outcome 

Presenters: Michael Totagrande, Irina Popova 

Sponsors: Sean Kimbro, North Carolina Central University, and La Creis Kidd, University of Louisville

Faculty Advisor:  CS640 Bioinformatics, Professor Patricia Francis-Lyon 

Abstract: 
Type 2 Diabetes (T2D) affects millions and is characterized by the inability to produce enough insulin, resulting in improper glucose regulation. With 
numerous direct risk factors, including increased body mass index (BMI), race, high blood pressure, and the presence or absence of certain single 
nucleotide polymorphisms (SNPs), T2D is a complicated disease. Herein, we explore over 600,000 SNP frequencies for more than 2,000 individuals
in order to determine their impact on T2D outcome.  
                       
Using Face Tracking for Computationally-Efficient Visualization of Large Vector Data 

Presenter: Thanawut Ananpiriyakul 

Faculty Advisor: Prof. Alark Joshi 

Abstract: 
Visualizing large vector data is computationally expensive. Given that human beings can only focus on a certain region of a screen at a time, we have developed
a novel face tracking-based technique for visualization of large vector data. This focus+context visualization of vector fields reduces visual clutter and helps the
user see features of interest. We chose to use streamline and glyph-based methods to represent the vector data. Users can interact with the data in real time,
choosing regions of interest through a mouse, a touch interface, or their face. The presented visualization technique results in a frame rate that is almost 5 times
higher than that of the full-detail visualization of the vector data.
                       

Exploring Leap Motion for Intuitive Interaction of Scientific Data 

Presenter: Shiyi Tan 

Faculty Advisor: Directed Study, Prof. Alark Joshi 

Abstract: 
We explore the use of the Leap Motion controller for intuitive interaction with medical data, aiming to help practitioners interact with large, high-resolution datasets.
We use VTK for the visualization pipeline, which includes data processing, surface extraction/volume rendering, and basic user interaction. The Leap Motion
enables freeform interaction without a mouse and keyboard. With the Leap Motion controller, users can explore the 3D data
and perform basic interactions such as rotation, translation, and zooming.
                        

       
Computational Enzymology 

Presenters: Stephanie Martin, Meriam Vejiga, Adrian Ramirez 

Sponsor: Distributed Bio is an antibody discovery, engineering, informatics, and services company focused on producing next-generation antibody libraries and revolutionary vaccines.

Faculty Advisor: CS640 Bioinformatics, Professor Patricia Francis-Lyon 

Abstract: 
Enzymes are the original organic chemists (OCC), capable of catalyzing a wide variety of reactions that have great therapeutic potential. Many enzymes 
have been cataloged and annotated using the Gene Ontology, a gene annotator's reference, and categorized by the Enzyme Commission, a database that 
classifies enzymes based on the nature of their enzymatic activity. We took advantage of these two databases to mine for homology groups with similar 
enzymatic activity, but different substrates. We characterized these enzyme groups by sequence variability and enzymatic variability. This work provides a 
foundation for the creation of a new class of enzyme replacement therapy and for the creation of a new generalized synthesis technology.  
                        

       
Mechanistic Indicators of Childhood Asthma 

Presenter: Stephanie Styx 

Sponsor: Dr. ClarLynda Williams-DeVane, North Carolina Central University.
Her objective for this project is to identify key environmental exposure contributors to asthma subtypes of varying severity.

Faculty Advisor: CS640 Bioinformatics, Professor Patricia Francis-Lyon 

Abstract: 
This project aims to understand the relationship between environmental factors and their effect on asthmatic children. Through principal component analysis,
we examined how much variance arises when asthmatic children are exposed to similar or different environmental factors. The cohort of patients analyzed in
this project consisted of asthmatic African American children from Detroit, Michigan.

 

Computational Enzymology 

Presenters: Stephanie Martin, Meriam Vejiga, Adrian Ramirez 

Sponsor: Dr. Jacob Glanville, former Principal Scientist at Pfizer, PhD in Computational and Systems Immunology at Stanford University School of Medicine 
and current Chief Science Officer of Distributed Bio. 

Faculty Advisor: CS640 Bioinformatics, Professor Patricia Francis-Lyon 

Abstract: 
Knowing the three-dimensional structure of a protein is essential to understanding how the protein functions. Currently, protein structures are determined
through X-ray crystallography, which can be difficult and laborious for some projects. Dr. Jake Glanville, CSO of Distributed Bio, created a code package to
predict the structure of B cell and T cell receptors. This code draws on probabilistic alignment and hidden Markov modeling, using HMMER 3.0 and the
NCBI BLAST toolkit to identify potential templates for homology modeling, and then generates a model using UCSF’s Modeller. We tested how accurately the script
pdb-getModels.pl produces models by using an input, self-models=0, to remove any template with more than 95% identity to the query sequence,
ensuring the program did not fetch the known crystal structure of the query for homology modeling. After generating hundreds of models, we used another script,
rmsd-Calculate.py, to calculate the root mean squared deviation (RMSD) of each generated model superimposed on the published structure, in order to validate whether
this package has the potential to accurately predict the variable regions of antibodies.
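
To illustrate the RMSD measure mentioned above, here is a minimal Python sketch. It is not the project's rmsd-Calculate.py; it assumes the two structures have already been superimposed and that atoms are paired in the same order, and the coordinates are made-up values.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean squared deviation between two equally long lists of (x, y, z) points.

    Assumes the structures are already superimposed and the atoms are paired by index.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates: the model is the reference shifted by 1 unit along z.
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

print(rmsd(model, reference))  # every atom is off by exactly 1 unit
```

A lower RMSD means the predicted model lies closer to the published structure.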
                        

                        
Ozone exposure causes differential expression of genes involved in cell growth and DNA binding 

Presenters: Chelsea Yee, Amrita Rishi

Sponsor: Dr. Mehrdad Arjomandi 

Faculty Advisor: CS640 Bioinformatics, Professor Patricia Francis-Lyon 

Abstract: 
Ozone, a gas with high oxidation potential, is a major component of air pollution and has been found to damage the respiratory tissues in humans.
To our knowledge, no one has yet published the results of an exacerbation study utilizing ozone as a model for the impact of air pollutants. An ongoing 
study by Dr. Mehrdad Arjomandi and associates at UCSF aims to establish the impact of ozone-induced injury and inflammation in asthma and other 
lung diseases. Currently, this study aims to determine the differentially expressed genes (DEGs) in subjects, both with and without asthma, that were 
exposed to medium (100ppm) and ambient (200ppm) levels of ozone. Gene expression levels for 18 subjects were determined by Affymetrix microarray 
in an ozone-exacerbation study performed by Dr. Arjomandi’s team at UCSF. In partnership with Dr. Arjomandi and Prof. Francis-Lyon (USF), our team
performed statistical analysis of the Affymetrix microarray data in R using the limma package to identify DEGs in airway epithelial tissues in response to 
ozone exposure. A total of 68 DEGs were identified from the Affymetrix microarray data for all 18 subjects. Among the 68 DEGs, 4 were more frequently
differentially expressed (adjusted p-value < 0.1): MAPRE3, HKR1, MOB3B and ZFR. Genes MAPRE3 and HKR1 were up-regulated whereas MOB3B and 
ZFR were down-regulated. Further studies providing new knowledge of the function and downstream effects of these genes can lead to the possibility of 
new gene therapy and pharmacological targets. 
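
The adjusted p-value threshold above can be illustrated with a short Python sketch. The abstract does not say which adjustment method was used; this sketch assumes the Benjamini-Hochberg procedure, which is limma's default, and the gene labels and raw p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical genes and raw p-values, for illustration only.
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
raw_p = [0.01, 0.04, 0.03, 0.20]

adjusted_p = benjamini_hochberg(raw_p)
significant = [g for g, p in zip(genes, adjusted_p) if p < 0.1]
print(significant)
```

Adjusting for multiple testing matters here because a microarray tests thousands of genes at once, so some small raw p-values are expected by chance alone.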

                                               
Muse Mobile App 

Presenter: MD Naseem Ashraf 

Faculty Advisor: CS640 Bioinformatics, Professor Patricia Francis-Lyon 

Abstract: 
Muse Mobile App is an Android app that uses Muse headbands to record EEG signals and transmit them from mobile devices easily and quickly.
                        

           
AI for Princes of California 

Presenters:  Kyle Baker, Austin Bushree, Cole Howard 

Sponsors: Jon Rahoi and Justin Sher, Noo Games

Faculty Advisor: CS490 Senior Project, Prof. Jeffrey Johnson 

Abstract: 
Princes of California is a strategic board game that is similar to a hybrid of Monopoly and Poker. A single turn consists of playing a tile on the board and 
buying up to three shares of any companies that have been built from the tiles on the board. The current built-in opponent makes random moves and is 
easily defeated by human players. Our project seeks to use multiple techniques to build a competitive AI opponent for this game. 
We will be implementing a heuristic algorithm based on the strategies we have developed while playing the game. We will also be fine-tuning a neural 
network using TensorFlow, an open source machine learning package. The network will be trained by playing against random bots. The gameplay tactics 
change based on the number of players, so our AI will be trained separately for 2-, 3-, 4-, 5-, and 6-player games. The ultimate goal of our project is to build an
AI that strategically places tiles and buys shares of companies to create an entertaining opponent for online players. 

 

Fitness App for Vue Smart Glasses 

Presenters: Scott Zhu, Ji Lu, Shengcai Cheng 

Sponsor: Jason Gui, Vigo Technologies 

Faculty Advisor: CS690 Master's Project, Prof. Olga Karpenko 

Abstract: 
Vue is a wearable device, a pair of “smart” glasses designed for everyday use. Our team developed the companion app for Vue on iOS and Android. 
The app provides fitness features such as step tracking, calorie counting, and inactivity alerts that help people lead healthier and more active lives. It
also provides some additional features such as finding the device using the app, and delivering notifications.  
                        

                                              
Visualization of Hierarchical Time-Series Data Using the Sunburst Technique 

Presenters:  Joey Estella, Marissa Masangcay, Lyndon Ong Yiu, Mohammad Bazarbay 

Sponsors: Profs. Sophie Engle and Alark Joshi 

Faculty Advisor: CS490 Senior Project, Prof. Jeffrey Johnson

Abstract: 
The Visualizing Time Series Data project explores new ways to visualize aggregations of large time-series data. Raw data is often too large
to view directly. The problem then becomes how to aggregate that data, i.e., which metrics (mean, median, etc.) and levels (days,
hours, etc.) to use to summarize the data and reveal its patterns and trends.
Our visualization tool attempts to address this problem. It features an interactive dashboard that pairs an interactive sunburst visualization
of the organized data with a complementary line chart corresponding to the data in the sunburst. This gives added context to otherwise
‘normal’ looking data, helping the user draw meaningful conclusions about the data at hand. The tool also features a non-interactive
dashboard where various static sunburst visualizations are displayed in a grid for easy comparison. These static visualizations feature
multiple metrics (mean, median, etc.) across different levels
(days, hours, etc.).
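
To make the idea of aggregating by metric and level concrete, here is a minimal Python sketch. It is not the project's code; the function name, the two levels, the two metrics, and the sample readings are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median

# Each level maps a timestamp to the bucket it belongs to.
LEVELS = {
    "day": lambda ts: ts.strftime("%Y-%m-%d"),
    "hour": lambda ts: ts.strftime("%Y-%m-%d %H:00"),
}
METRICS = {"mean": mean, "median": median}


def aggregate(readings, level, metric):
    """Group (timestamp, value) pairs by level and summarize each group with metric."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[LEVELS[level](ts)].append(value)
    return {key: METRICS[metric](values) for key, values in sorted(buckets.items())}


# Hypothetical readings, for illustration only.
readings = [
    (datetime(2016, 5, 1, 9, 15), 10.0),
    (datetime(2016, 5, 1, 9, 45), 20.0),
    (datetime(2016, 5, 1, 14, 0), 60.0),
    (datetime(2016, 5, 2, 9, 30), 30.0),
]

print(aggregate(readings, "day", "mean"))
print(aggregate(readings, "day", "median"))
print(aggregate(readings, "hour", "mean"))
```

In this sample the two days have the same daily mean but different medians, which is the kind of difference that comparing metrics side by side is meant to expose.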

                        

                       
Real Estate Recommendation Engine 

Presenters: Rob Reeves, Simon Kwong, Zhe Xu 

Sponsor:  Jon Rahoi, Ten-X 

Faculty Advisor: CS690 Master's Project, Prof. Olga Karpenko 

Abstract: 
Buying or selling a property is not something most of us do every day. But when the time comes, searching for a new home or office is exhausting. It's stressful, 
agitating, and searchers often find themselves settling for less. We developed a recommendation engine for the Ten-X real estate marketplace that helps
alleviate some of that stress. The engine recommends available properties based on past user activity. It uses a graph-based recommendation algorithm that
combines collaborative and content-based filtering. The goal is to maximize the likelihood that a user will interact with the recommended properties embedded in the ad.