Project

"Without deviation from the norm, progress is not possible."

Software Development

Personalized Job Recommendation System

View Code
Developed a job recommendation and search engine built on Amazon Web Services (AWS).
  • Front-end: designed an interactive web page (HTML, CSS, JavaScript) for users to search for positions and apply online.
  • Back-end: deployed a set of back-end services and databases, then tested and maintained them on AWS EC2.
  • REST API: used HTTP GET/POST to retrieve data from and upload data to the website; pulled job information through the APIs provided by GitHub Jobs.
  • Recommendation System: implemented a job recommendation system that promotes jobs sharing keywords with the jobs the user has saved. Keyword extraction is done by the MonkeyLearn API.
  • Database: used MySQL to store users' information, the jobs they favorited, and the information associated with those jobs.
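
The keyword-matching idea behind the recommendation step can be sketched as follows. This is a minimal illustration in Python, not the project's code: the job dictionaries, the `recommend` function, and the scoring rule (sum of how often each shared keyword appears among saved jobs) are assumptions made for the example, and keywords are supplied directly instead of coming from the MonkeyLearn API.

```python
from collections import Counter

def recommend(saved_jobs, candidate_jobs, top_n=3):
    """Rank candidate jobs by the keywords they share with the user's saved jobs."""
    # Keyword profile: how often each keyword occurs across saved jobs.
    profile = Counter(kw for job in saved_jobs for kw in job["keywords"])
    saved_ids = {job["id"] for job in saved_jobs}

    scored = []
    for job in candidate_jobs:
        if job["id"] in saved_ids:
            continue  # never recommend a job the user already saved
        score = sum(profile[kw] for kw in set(job["keywords"]))
        if score > 0:
            scored.append((score, job["id"]))

    # Highest score first; ties broken by id so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [job_id for _, job_id in scored[:top_n]]

# Hypothetical sample data.
saved = [
    {"id": "j1", "keywords": ["python", "aws", "backend"]},
    {"id": "j2", "keywords": ["python", "sql"]},
]
candidates = [
    {"id": "j1", "keywords": ["python", "aws", "backend"]},
    {"id": "j3", "keywords": ["python", "aws"]},
    {"id": "j4", "keywords": ["design", "figma"]},
    {"id": "j5", "keywords": ["sql", "reporting"]},
]
print(recommend(saved, candidates))
```

Jobs with no shared keyword are dropped, so a user with no saved jobs gets an empty list in this sketch.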

Cloud and React based Social Network

View Code
Front-end: Used the React JS framework to implement a social network, including creating and viewing posts, searching, etc.

  • Implemented basic token-based registration/login/logout flow with React Router v4 and server-side user authentication with JWT.
  • Extracted pictures/videos from the backend database and displayed them on the home page.
  • Enabled users to upload pictures/videos through a drop-box feature.
  • Used Ant Design, GeoLocation API, and Google Map API to improve the user experience.
Back-end: Designed and implemented a scalable web service in Go to handle posts and deployed it to Google Kubernetes Engine (GKE) on Google Cloud for better scaling.
  • Used Elasticsearch (on GCE) to provide geo-location-based search functions.
  • Used the Google Cloud Vision API to detect faces in images posted by users.

React JS Based NBA Player Data Visualization

View Code
Implemented a front-end web page to display the shot statistics of NBA players.

  • Created a dashboard using React, D3, and Ant Design, backed by the stats.nba.com API, to visualize an individual player’s shot data, including a shot chart and user profile view.
  • Created 4 extra filters and 2 shot themes (Hexbin and Scatter) to provide more customized visualization on the shot chart.
  • Developed an autocomplete player search bar providing a list of players (image and name) in the suggestion list.
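
The lookup behind an autocomplete search bar can be sketched as a prefix search over a sorted name list. The snippet below is an illustration in Python (the project itself is a React app); `build_index` and `suggest` are hypothetical helpers, and the player list is sample data only.

```python
from bisect import bisect_left

def build_index(names):
    """Sort names case-insensitively, keeping the original spelling for display."""
    return sorted((name.lower(), name) for name in names)

def suggest(index, prefix, limit=5):
    """Return up to `limit` names that start with `prefix` (case-insensitive)."""
    key = prefix.lower()
    # Binary search for the first entry that is >= the prefix.
    start = bisect_left(index, (key,))
    results = []
    for lowered, original in index[start:]:
        if not lowered.startswith(key) or len(results) == limit:
            break
        results.append(original)
    return results

index = build_index(["Stephen Curry", "Kevin Durant", "Seth Curry", "Stephon Marbury"])
print(suggest(index, "ste"))
```

Because the index is sorted, all matches for a prefix sit next to each other, so the scan stops at the first non-matching name.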


Machine Learning and Data Science

H-1B Visa Status Classification and Classifier Analysis

View Poster, View Abstract
Classifier Performance Analysis through H-1B Visa Certification Status Classification.

  • Data Exploration: explored how individual features (employer, wage, worksite, etc.) relate to the visa certification rate using scikit-learn.
  • Feature Engineering: removed highly correlated fields, standardized numerical data, and one-hot-encoded categorical data.
  • Classification: trained Random Forest, Nearest Neighbors, SVM, MLP, Logistic Regression, and AdaBoost classifiers on the data, then predicted the results and analyzed the performance of the models.
  • Integration: explored voting and stacking classifiers that integrate the classifiers listed above, achieving 91%+ accuracy, 90%+ precision, and 99%+ recall.
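
To make the integration step concrete, here is a minimal sketch of hard (majority) voting and of the three reported metrics, written in plain Python. It assumes binary labels where 1 means "certified"; the per-model predictions are made-up sample values, not the project's results, and `majority_vote` and `binary_metrics` are hypothetical helper names. In scikit-learn, `VotingClassifier` provides this kind of ensemble.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine several models' label lists into one list by hard voting."""
    combined = []
    for votes in zip(*predictions_per_model):
        # The label predicted by the most models wins for this sample.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Made-up labels and predictions from three hypothetical models.
y_true = [1, 1, 1, 0, 0, 1]
model_predictions = [
    [1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 1],
    [1, 1, 1, 1, 0, 1],
]
ensemble = majority_vote(model_predictions)
print(ensemble, binary_metrics(y_true, ensemble))
```

High recall with lower precision, as in the reported figures, means the ensemble rarely misses a certified case but sometimes labels a non-certified case as certified.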

Question-Answering System Word Embedding Optimization

View Code
Optimized a Question-Answering System by replacing the existing word embedding with a self-trained word embedding.

  • Datasets: used three datasets: SQuAD, NewsQA, and BioASQ. The model initially used GloVe embeddings; I trained three embeddings, one per dataset.
  • Word Extraction: extracted words from the datasets and organized them into sentences, where each sentence is a list of strings.
  • Vector Training: trained the word vectors using the Gensim Word2Vec framework, applying the skip-gram algorithm with negative sampling. Stored the word-vector mappings as a .txt file.
  • Similarity Comparison: selected five common words from each dataset and extracted the words most similar to them under each embedding. Compared the similar words generated by each model.
  • Result: decreased the negative loss by around 5 percent and the out-of-vocabulary rate from 10% to less than 0.1%.
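
The out-of-vocabulary (OOV) rate in the result above can be computed as sketched below. This assumes a token-level definition (the share of tokens with no vector in the embedding), which the project description does not spell out; the sentences and vocabularies are tiny made-up samples, and `to_sentences` and `oov_rate` are hypothetical helpers.

```python
def to_sentences(texts):
    """Turn raw strings into lists of lowercase tokens, one list per sentence."""
    return [text.lower().split() for text in texts]

def oov_rate(sentences, vocabulary):
    """Fraction of tokens that have no entry in the embedding vocabulary."""
    tokens = [tok for sentence in sentences for tok in sentence]
    if not tokens:
        return 0.0
    missing = sum(1 for tok in tokens if tok not in vocabulary)
    return missing / len(tokens)

sentences = to_sentences(["What causes hemophilia", "Who wrote Hamlet"])

# A general-purpose vocabulary may miss domain terms (assumed sample).
pretrained_vocab = {"what", "causes", "who", "wrote", "hamlet"}
# An embedding trained on the dataset itself covers the words it was trained on.
self_trained_vocab = {tok for sentence in sentences for tok in sentence}

print(oov_rate(sentences, pretrained_vocab))
print(oov_rate(sentences, self_trained_vocab))
```

The list-of-token-lists shape produced by `to_sentences` matches the "each sentence is a list of strings" format described in the Word Extraction step.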

Russell 1000 Future Income Growth Prediction

View Project Report
Estimated the net income growth rate of Russell 1000 companies for the next quarter.

  • Feature Engineering: selected 10 of 192 features using similarity matrices to find less correlated features.
  • Regression: grouped the data by industry and trained regression models (Linear Regression, Neural Networks, Support Vector Regression, Radius Nearest Neighbor Regression) to predict future net-income growth.
  • Prediction: selected the model with the lowest mean squared error (MSE) and used its prediction as the final result.
  • Evaluation: the models decreased the MSE by 70% compared to a naive prediction (outputting the average growth rate as the prediction).
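
The selection and evaluation steps can be illustrated with a short Python sketch: compute each model's MSE, keep the lowest, and compare it with the naive baseline that always predicts the average growth rate. All numbers and model names below are made up for the example and are not the project's data.

```python
from statistics import mean

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def select_best(y_true, predictions_by_model):
    """Return the name of the model with the lowest MSE, plus all scores."""
    scores = {name: mse(y_true, preds) for name, preds in predictions_by_model.items()}
    return min(scores, key=scores.get), scores

# Growth rates in percent (hypothetical values).
train_growth = [2.0, 4.0, 6.0]
y_true = [1.0, 5.0, 9.0]
predictions = {
    "model_a": [2.0, 5.0, 7.0],
    "model_b": [0.0, 4.0, 10.0],
}

best, scores = select_best(y_true, predictions)

# Naive baseline: predict the average historical growth rate for every company.
naive_mse = mse(y_true, [mean(train_growth)] * len(y_true))
reduction = 1 - scores[best] / naive_mse
print(best, scores[best], naive_mse, reduction)
```

Reporting the reduction relative to the baseline, as the evaluation bullet does, shows how much of the error the model removes beyond what a constant guess already achieves.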

Sentiment Analysis

Interpretation and classification of emotions (positive and negative) within text data using natural language processing techniques.

  • Perceptron: implemented a perceptron classifier with bag-of-words unigram and bigram features.
  • Logistic Regression: implemented Logistic Regression with the same features and analyzed the log-likelihood.
  • Neural Networks: implemented a Feedforward Neural Network with GloVe word embeddings, trained to minimize the negative log-likelihood loss.
  • Used the three models to predict the sentiment of example sentences; all of them achieved above 75% accuracy.
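
As an illustration of the first model, the sketch below trains a perceptron on unigram and bigram bag-of-words counts in plain Python. The four training sentences are made up, labels are +1 (positive) and -1 (negative), and the function names are hypothetical; the original implementation may differ in tokenization and update details.

```python
from collections import Counter

def extract_features(sentence):
    """Bag-of-words counts over unigrams and adjacent-word bigrams."""
    tokens = sentence.lower().split()
    features = Counter(tokens)
    # Bigram keys contain a space, so they never collide with unigram keys.
    features.update(f"{a} {b}" for a, b in zip(tokens, tokens[1:]))
    return features

def score(weights, features):
    return sum(weights.get(f, 0.0) * count for f, count in features.items())

def train_perceptron(examples, epochs=10):
    """Classic perceptron: update the weights only on misclassified examples."""
    weights = {}
    for _ in range(epochs):
        mistakes = 0
        for sentence, label in examples:
            features = extract_features(sentence)
            predicted = 1 if score(weights, features) > 0 else -1
            if predicted != label:
                for f, count in features.items():
                    weights[f] = weights.get(f, 0.0) + label * count
                mistakes += 1
        if mistakes == 0:
            break  # every training example is classified correctly
    return weights

def predict(weights, sentence):
    return 1 if score(weights, extract_features(sentence)) > 0 else -1

examples = [
    ("a great movie", 1),
    ("not great at all", -1),
    ("great fun", 1),
    ("not fun", -1),
]
weights = train_perceptron(examples)
print(predict(weights, "great movie"), predict(weights, "not great"))
```

The bigram "not great" receives a negative weight here, which is what lets the model separate it from "great" on its own; unigram features alone cannot capture that negation.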

Decision Tree

  • Built a decision tree for data mining in Python that predicts the result for new incoming data.
  • Calculated the entropy for each feature, found the threshold with the best information gain, and split the branches at that threshold.
  • Used 5-fold cross-validation to train and test the decision tree. The final tree reached 65% accuracy.
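
The split-selection step can be sketched as follows: compute the entropy of the labels, try each midpoint between consecutive feature values as a threshold, and keep the one with the highest information gain. This is a minimal single-feature example with made-up data; `entropy` and `best_threshold` are illustrative names, not the project's code.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) for the split `value <= threshold` with the best gain."""
    base = entropy(labels)
    best = (None, 0.0)
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = base - weighted
        if gain > best[1]:
            best = (threshold, gain)
    return best

values = [1, 2, 3, 4]
labels = ["no", "no", "yes", "yes"]
print(best_threshold(values, labels))
```

A full tree applies this search to every feature at each node and recurses on the two resulting subsets until a stopping rule is met.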

Association Analysis

  • Implemented association analysis with the Apriori algorithm to find the frequent itemsets that meet a given minimum support.
  • Used the FP-Growth algorithm to optimize the association analysis.
  • Generated rules among the frequent items and calculated the confidence of each rule. Reported the rules that meet the minimum confidence.
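
A compact sketch of the Apriori part and the rule-generation step is shown below, in plain Python with made-up transactions. It covers only Apriori (join and prune by level), not FP-Growth, and the function names and the support/confidence thresholds are assumptions for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        level = {}
        for candidate in current:
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        k += 1
        # Join: merge frequent (k-1)-itemsets whose union has exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: keep a candidate only if all of its (k-1)-subsets are frequent.
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(transactions, min_support=0.5)
rules = generate_rules(frequent, min_confidence=0.7)
print(rules)
```

The lookup `frequent[antecedent]` is safe because every subset of a frequent itemset is itself frequent, which is the same property the prune step relies on.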


Mobile Development

Crime Watch App (iOS)

Built a social application on iOS (using Swift) for users to report crimes and view posts in their community.

  • Implemented the Model-View-Controller pattern, using UINavigationController and UITabBarController for stack-based and tab-based navigation.
  • Firebase Database: managed the database that stores and serves user and community information.
  • Community Feed: users can post a quick alert when they witness a criminal event, with image/video uploading and MapKit support.
  • Messages: enabled users to message each other privately through Firebase.

Instagram Flavor News App (Android)

View Code
Designed the Instagram Flavor News app based on Google's Android Architecture Components and the MVVM pattern.

  • Implemented the bottom bar and page navigation using the Jetpack Navigation component.
  • Utilized Mindorks’s PlaceHolderView to support swipe gestures for liking/disliking the news.
  • Built the Room database with LiveData and ViewModel to support local caching and an offline mode.
  • Integrated Retrofit and RxJava to pull the latest news data from a RESTful endpoint (newsapi.org).