Lilian Sun

About

I am a Software Engineer specializing in server-side development (database-based web applications), Machine Learning Development, and MLOps.

From 2016 to 2021, I worked in small to mid-sized cross-functional R&D teams, gaining hands-on experience in DataOps (data ingestion, data processing/batch jobs/ETL, quality assurance, data modeling, data visualization, database interactions, data migration), DevOps, CI/CD, and MLOps.
My development journey has given me a deep understanding of the entire Software Development Lifecycle, Agile methodologies, and microservices development.
🏅 From 2017 to 2019, I refactored 7 enterprise web applications, saving $1.2 million for an internal enterprise web platform hosting 12 applications at MathArt Software Co., Ltd (a startup with 200+ staff).
🏅 I served as a lead developer in two traditional machine-learning projects, with one achieving significant breakthroughs in compressor development and earning the third prize in the Shanghai Electric Intelligent Manufacturing AI Competition (2020 "上海电气杯" 工业APP), out of 2,000+ entries.

Interests

Software Development

Data Engineering

Computer Vision

Natural Language Processing

Machine Learning

Visualization

Algorithms

Image Processing

Education

MS. in Computer Science | College of Computing |
GPA: 3.7/4.0 | Graduation: Dec 2022

Relevant Coursework

cs422 Data Mining
csp554 Big Data Technologies
cs528 Data Privacy and Security
cs587 Software Project Management
csp571 Data Preparation and Analysis
cs525 Advanced Database Organization
cs535 Design and Analysis of Algorithms

BS. in Computer Science and Technology |
GPA: 3.11/4.0 | Graduation: Jun 2016

Relevant Coursework

Operating Systems
Software Engineering
Discrete Mathematics
Linux Operating System
Object-oriented Technology
Software Modeling and Analysis
Software Design and Architecture

Online Certification

Machine Learning Engineering for Production (MLOps) Specialization

Issuer: DeepLearning.AI, Stanford University | Issued: Dec 2023 | Certificate

The MLOps Specialization teaches how to build and maintain integrated systems for continuous production, emphasizing efficient handling of evolving data. It explores tools and methodologies for machine learning engineering, preparing learners to contribute to cutting-edge AI solutions for real-world challenges.

Skills:

Data Pipelines, Model Pipelines, Deployment Pipelines
Machine Learning Engineering for Production
Managing Machine Learning Production Systems

What I Learned:

Design an ML production system end-to-end: project scoping, data needs, modeling strategies, and deployment requirements.
Establish a model baseline, address concept drift, and prototype how to develop, deploy, and continuously improve a productionized ML application.
Build data pipelines by gathering, cleaning, and validating datasets. Establish data lifecycle by using data lineage and provenance metadata tools.
Apply best practices and progressive delivery techniques to maintain and monitor a continuously operating production system.

Machine Learning Specialization

Issuer: DeepLearning.AI, Stanford University | Issued: Oct 2023 | Certificate

The Machine Learning Specialization, a collaboration between DeepLearning.AI and Stanford Online, led by AI visionary Andrew Ng, provides foundational machine learning knowledge. This 3-course program covers key concepts, including supervised and unsupervised learning, offering practical skills for real-world applications.

Skills:

Linear Regression, Logistic Regression, Decision Trees
Artificial Neural Network
Recommender Systems

What I Learned:

Build ML models with NumPy & scikit-learn, create & train supervised models for prediction & binary classification tasks (linear, logistic regression)
Build and train a neural network with TensorFlow for multi-class classification
Apply best practices for machine learning development for models to generalize to real-world data and tasks
Build and use decision trees and tree ensemble methods, including random forests and boosted trees
Use unsupervised learning techniques for clustering and anomaly detection
Build recommender systems with a collaborative filtering approach and a content-based deep learning method
Build a deep reinforcement learning model

Practical Data Science on the AWS Cloud Specialization

Issuer: DeepLearning.AI, Amazon Web Services | Issued: Nov 2023 | Certificate

The Practical Data Science Specialization equips Python- and SQL-proficient developers, scientists, and analysts with skills for deploying scalable ML projects using Amazon SageMaker in the AWS cloud. The program integrates domain knowledge, mathematics, statistics, data visualization, and programming to build, train, and deploy end-to-end ML pipelines. In hands-on labs, I tackled challenges using state-of-the-art algorithms such as BERT and FastText for NLP and NLU. I learned to move projects from idea to deployment quickly, handle massive datasets, tune models, and manage deployments for thousands of models or more, and I mastered the essential skills for turning ideas into scalable, operational solutions.

Skills:

Data Labeling at Scale, ML Pipelines and MLOps
Automated Machine Learning (AutoML)
A/B Testing and Model Deployment
Amazon SageMaker, AWS Glue, Amazon Athena, Amazon EMR
Natural Language Processing with BERT, Text Transformation

What I Learned:

Prepare data, detect statistical data biases, perform feature engineering at scale to train models, and train, evaluate, and tune models with AutoML
Store and manage ML features using a feature store, and debug, profile, tune, and evaluate models while tracking data lineage and model artifacts
Build, deploy, monitor, and operationalize end-to-end machine learning pipelines
Build data labeling and human-in-the-loop pipelines to improve model performance with human intelligence

Big Data Specialization

Issuer: University of California San Diego | Issued: Sep 2023 | Certificate

Skills:

SQL, Big Data, MongoDB, Neo4j
Hadoop Ecosystem, Apache Spark
Distributed Computing Architecture

NoSQL, Big Data, and Spark Foundations Specialization

Issuer: IBM | Issued: Dec 2023 | Certificate

This certificate signifies mastery in NoSQL and Big Data, encompassing CRUD operations in MongoDB, IBM Cloudant, and Apache Cassandra. It also covers foundational knowledge of Apache Hadoop, Hive, HDFS, MapReduce, Apache Spark, Spark SQL, Spark Structured Streaming, Spark ML, scaling Spark with Kubernetes, and monitoring. I gained hands-on skills by completing a project involving ETL processing and machine learning model deployment using Spark, and I am now equipped for essential big data management tasks with NoSQL and Spark.

Skills:

Big Data, Cloud Database
MongoDB, Cassandra, Cloudant
Machine Learning Pipelines
Data Engineering
Apache Spark, Spark ML, Spark SQL, Apache Hadoop

What I Learned:

Work with NoSQL databases to insert, update, delete, query, index, aggregate, and shard/partition data.
Develop hands-on NoSQL experience working with MongoDB, Apache Cassandra, and IBM Cloudant.
Develop foundational knowledge of Big Data and gain hands-on lab experience using Apache Hadoop, MapReduce, Apache Spark, Spark SQL, and Kubernetes.
Perform Extract, Transform and Load (ETL) processing and Machine Learning model training and deployment with Apache Spark.

Google Data Analytics Certificate

Issuer: Google Career Certificates | Issued: Aug 2023 | Certificate

Skills:

Data Collection, Data Cleansing, Data Visualization
Data Integrity, Data Aggregation, Data Calculations, Data Ethics
Spreadsheet, Metadata, SQL
R Markdown, R Programming, RStudio, Tableau Software
Creating case studies, Data Analysis, Presentation
Sample Size Determination, Decision-Making
Developing a portfolio, Problem Solving, Questioning

What I Learned:

Gained an immersive understanding of the practices and processes used by a junior or associate data analyst in their day-to-day job
Learned key analytical skills (data cleaning, analysis, & visualization) and tools (spreadsheets, SQL, R programming, Tableau)
Learned how to clean and organize data for analysis, and to complete analysis and calculations using spreadsheets, SQL, and R programming
Learned how to visualize and present data findings in dashboards, presentations and commonly used visualization platforms

Experience

Graduate Teaching Assistant of Data Mining

Illinois Institute of Technology | Feb 2023 - Jun 2023

Collaborated with the professor to teach Data Mining concepts to undergraduate and graduate students. Led weekly discussion sessions during office hours to reinforce course material, and conducted hands-on demonstrations of data mining techniques.
Provided personalized guidance to students, clarifying complex topics and assisting with project development, resulting in improved student performance and engagement.
Used Python and R to teach and demonstrate data mining algorithms and techniques.

System Implementation Engineer

Shanghai Highly Electric Co., Ltd | Mar 2019 - Dec 2021

Specialization: Data Engineering, Machine Learning Engineering

Collaborated with Data Engineers to streamline data extraction and transformation and to optimize ETL processes using Java, Spring, SQL, PostgreSQL, and Informatica.

Spearheaded the development of machine learning systems using algorithms such as linear regression, decision trees, random forests, support vector machines, and clustering, served through Flask RESTful APIs with MLflow on EC2. Enhanced product optimization, boosting performance by 15%.

Implemented end-to-end machine learning pipelines (preprocessing, feature engineering, model training) and MLOps practices using Python (NumPy, pandas, scikit-learn, Matplotlib), Git, CI/CD, Docker, and Kubernetes. Evaluated model performance with classification and regression metrics.

Participated in code reviews, provided constructive feedback, and followed best practices to ensure code quality, maintainability, and scalability.

Software Engineer

Mathart Inc. (big data startup) | Mar 2017 - Mar 2019

Specialization: Enterprise Application Development, Data Engineering

Led cross-functional collaboration with data scientists and analysts to define user and data requirements. Designed and implemented adaptable web services, RESTful APIs, and a microservices architecture using Java, Maven/Gradle, Spring Boot, MyBatis/Hibernate, SQL, and Git, following Agile/Scrum methodologies.
Conducted data modeling with ER diagrams. Architected scalable data pipelines/ETL using Python, Apache Airflow, Snowflake, Docker, EC2, and S3 to integrate 5+ diverse data sources, enhancing data accessibility by 40%.
Streamlined data processing and automated batch jobs with Apache Spark and Spring Batch to handle structured data for reporting and data visualizations, resulting in 40% faster report generation and deeper data insights (improved real-time charts and forecasting accuracy), and enhancing decision-making accuracy by 35%.
Maintained & optimized 12 mission-critical Java enterprise web applications deployed on a WebLogic cluster through continuous development. Streamlined 7 internal web apps (admin tools, reporting dashboards, admin portal) by refactoring code, implementing Tableau dashboards & interactive charts, boosting app responsiveness by 30%. Delivered 30% more customer-facing features. Reduced maintenance costs by 65%, saving $1.2M annually.
Enhanced the Java R&D team's DevOps efficiency by 20% by automating and optimizing documented procedures and workflows:

►Implemented automated CI/CD pipelines to streamline code integration, testing, and deployment.

►Maintained clear and up-to-date documentation for the deployment process and infrastructure configurations.
Spearheaded knowledge-sharing and boosted team collaboration. Conducted tech talks on emerging technologies and best practices, helping the team stay current. Maintained a team documentation wiki that reduced onboarding time by 40%.

Android System Development Intern

Pactera Technology International Ltd. | Feb 2016 - Jun 2016

Flashed Android OS builds onto smartphones per functional requirements. Enhanced features in native Android applications ('Settings', 'Framework') using Android Studio and the NDK, core Java, CI/CD, and Linux kernel internals.
Cooperated with the QA team to improve user interfaces, and applied Android system framework expertise to address novel challenges.
Ranked in the top 10% of interns for bug-fix releases on the Android Framework R&D team.

Projects

Intelligent Trailer Routing Optimization System

Automotive After-sales QA Management Platform

Big Data Machine Learning System for Product Review Sentiment Analysis

Languages & Scripts

Frameworks & Libraries

SQL & NoSQL

Big Data Technologies

Tools

My Address

Staten Island

New York City

New York State, 10314

Social Profiles

Email

lilian.swen@outlook.com
