Feng-Guang Su

Master’s Student

Carnegie Mellon University

Biography

I’m currently a second-year Master’s student in the Language Technologies Institute at CMU, working with Prof. Carolyn Penstein Rose. I recently completed my Siri TTS R&D summer internship at Apple.

During my undergraduate studies at National Taiwan University, my research spanned areas in computational linguistics, signal processing, and computer vision, under the supervision of Prof. Lin-shan Lee, Prof. Hung-yi Lee, and Prof. Yu-chiang Frank Wang.

I was also a Natural Language Processing Intern at DeepHow, the first AI company bridging the skills gap in manufacturing, service, and repair through an AI-powered knowledge-capture and training platform based on smart how-to videos. I was responsible for developing AI algorithms that have been deployed to more than 40 Siemens service centers worldwide.

Interests

Artificial Intelligence
Computational Linguistics
Signal Processing
Computer Vision

Education

MS in Intelligent Information Systems, 2021

Carnegie Mellon University

BS in Electrical Engineering, 2020

National Taiwan University

Research Experience

Graduate Researcher

TELEDIA, Prof. Carolyn Penstein Rose

Sep 2020 – May 2021 Pittsburgh, PA

Learned relational representations of identity labels that provide insight into which dimensions of similarity and difference are relevant to content propagation.
Developed an architecture for reblog prediction and performed a comprehensive analysis of blog descriptions, communities, and following relationships using real-world data from Tumblr.

Undergraduate Researcher

Vision and Learning Lab, Prof. Yu-chiang Frank Wang

Feb 2018 – Jul 2020 Taipei, Taiwan

Disentanglement for 3D Point Cloud | Demo | Report

Researched 3D representations and applied generative models to disentangle 3D point clouds.
Proposed an autoencoder-based model to disentangle human poses using continuous labels.

Conventional Computer Vision | Demo

Researched and implemented various applications, including segmentation, Fisherfaces, and depth map generation.

Undergraduate Researcher

Speech Processing Lab, Prof. Lin-shan Lee and Prof. Hung-yi Lee

Sep 2017 – Jan 2020 Taipei, Taiwan

Speech Disentanglement and Voice Conversion

Researched unsupervised voice conversion by extracting personality and prosody information from speech.

Personalized Dialogue Generation | Demo | Paper

Proposed a GAN-based model that generates responses for multiple personas with a single model; it is trained with unsupervised learning and places fewer constraints on the required training data.
The proposed model achieved an 18.3% increase in persona accuracy over the state-of-the-art model, and the paper was accepted at INTERSPEECH 2019.
Introduced BERT into the model, ran experiments on the discriminator architecture and on the initialization of the word embedding vectors, and analyzed the performance for each character in detail.

Large-vocabulary Speech Recognition System

Implemented a large-vocabulary speech recognition system from scratch using Kaldi.
Developed a learning-based ASR model and compared it with a rule-based method.

Work Experience

Siri TTS R&D Intern

Apple Inc.

May 2021 – Aug 2021 Seattle, WA (Telecommuting)

Gathered the requirements of the modeling teams and built robust scripts and systems to meet them.
Developed a robust system that detects anomalies in data and considerably reduces the required evaluation time.

Natural Language Processing Intern

DeepHow Inc.

Apr 2019 – Aug 2020 Detroit, MI (Telecommuting)

Unsupervised Temporal Embedding for video segmentation

Researched self-supervised multimodal networks and helped develop video recommendation systems.
Implemented an unsupervised architecture that detects and segments actions in untrimmed videos, and deployed it on the DeepHow platform (AI Stephanie), improving accuracy by 30%.

Step-embedding for video recommendation

Developed a new sentence embedding method that encodes the ASR transcripts of video clips.
Verified on real-world videos that the generated embeddings capture features of the text and can be used to recommend other video clips.

Software Engineering Intern

HTC Taiwan (DeepQ)

Jul 2018 – Mar 2019 Taipei, Taiwan


Implemented various neural architecture search and model compression methods.
Applied differentiable architecture search methods, which require about three orders of magnitude less computation than non-differentiable search, to the DeepQ AI Platform product.

Generative Model for Image Morphing

Developed a new generative model for morphing images of human facial expressions.
Experiments on real-world data showed that the proposed model generates vivid morphing images.

Teaching

Machine Learning (SPRING 2019)

Prof. Hung-yi Lee Feb 2019 – Jun 2019

I designed a homework assignment on linear regression for the whole class (ppt, Website). I also led more than 10 groups in the final project, Image Dehazing (ppt).

Machine Learning and having it deep and structured (SPRING 2019)

Prof. Hung-yi Lee Feb 2019 – Jun 2019

I led and advised the whole class of more than 30 students in building and analyzing chatbots. I taught the students how to program a chatbot and gave a short talk on recent papers. | Talk & ppt

Signal and System Processing (SPRING 2019)

Prof. Yu-chiang Frank Wang Feb 2019 – Jun 2019

Aside from setting up homework assignments and answering students' questions, I also designed some of the problems for the midterm exam.

Selected Projects

Functionally Reduced And-Inverter Graph (FRAIG) [C++]

This project implements circuit simplification. I first simplified the circuits efficiently by sweeping unused gates, applying trivial optimizations, simplifying via structural hashing, and running simulation. I then merged equivalent gates using a Boolean Satisfiability (SAT) solver. Candidate groups of functionally equivalent gates (FEC groups) were collected by circuit simulation, where each simulation round can split existing FEC groups into smaller ones. Because the number of simulation rounds is crucial for performance, I dynamically adjusted the stopping criterion of the simulation according to how often the FEC groups were split.

Code
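The simulation-based FEC grouping described above can be sketched as follows. This is a minimal Python illustration, not the project's C++ code: the gate representation, `find_fec_groups`, `split_groups`, and the `max_idle_rounds` rule (stop after a fixed number of consecutive rounds that split no group) are hypothetical stand-ins for the adaptive stopping criterion. For brevity it ignores complemented (inverted) equivalences, and every surviving group is still only a candidate that the SAT solver must prove.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical circuit model: each gate maps a tuple of input words to an
# output word. Each 64-bit word packs 64 parallel simulation patterns.
Gate = Callable[[Tuple[int, ...]], int]
MASK = (1 << 64) - 1


def split_groups(groups: List[List[str]], sigs: Dict[str, int]) -> List[List[str]]:
    """Split each candidate group by the gates' latest simulation signatures."""
    new_groups: List[List[str]] = []
    for group in groups:
        buckets: Dict[int, List[str]] = defaultdict(list)
        for name in group:
            buckets[sigs[name]].append(name)
        # A gate alone in its bucket has no equivalence candidate left.
        new_groups.extend(b for b in buckets.values() if len(b) > 1)
    return new_groups


def find_fec_groups(gates: Dict[str, Gate], num_inputs: int,
                    max_idle_rounds: int = 3, seed: int = 0) -> List[List[str]]:
    """Refine FEC groups by random simulation until splitting stalls."""
    rng = random.Random(seed)
    groups = [sorted(gates)]  # initially every gate is a candidate for every other
    idle_rounds = 0
    while groups and idle_rounds < max_idle_rounds:
        patterns = tuple(rng.getrandbits(64) for _ in range(num_inputs))
        sigs = {name: fn(patterns) & MASK for name, fn in gates.items()}
        new_groups = split_groups(groups, sigs)
        # Adaptive stop: count consecutive rounds that split nothing and
        # reset the counter whenever a round refines the groups.
        idle_rounds = idle_rounds + 1 if new_groups == groups else 0
        groups = new_groups
    return groups


# Toy circuit: "d" is the De Morgan form of "a", so a, b, d are equivalent.
gates = {
    "a": lambda p: p[0] & p[1],
    "b": lambda p: p[1] & p[0],
    "c": lambda p: p[0] | p[1],
    "d": lambda p: ~(~p[0] | ~p[1]),
    "e": lambda p: p[0] ^ p[1],
}
print(find_fec_groups(gates, num_inputs=2))  # [['a', 'b', 'd']]
```

Counting rounds without a split, rather than running a fixed number of rounds, lets easy circuits stop early while circuits whose groups keep splitting continue to be simulated.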

PokeCan [rpi][C++][Python][Node.js]

PokeCan is a trash can that automatically detects how full it is. When it is full, it moves along a user-defined path to a larger trash can and empties its trash there. After dumping all the trash, PokeCan returns to its original location.

PDF Code

TTS Without T [Pytorch]

We compare two schemes, the Multilabel-Binary Vectors (MBV) autoencoder and the Vector Quantized Variational Autoencoder (VQ-VAE), both of which discover discrete representations of subword units from speech without any text labels, phoneme labels, or alignments. By combining the two methods, we aim to exploit their respective strengths and achieve better performance in the ZeroSpeech 2019 Challenge in terms of bitrate or quality.

PDF
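The discrete bottleneck of a VQ-VAE, one of the two schemes compared above, replaces each continuous encoder output with its nearest codebook vector; the resulting index sequence plays the role of the discovered subword units. The sketch below is a plain-Python illustration with a toy, hypothetical codebook. It covers only the quantization step, not the straight-through gradient estimator or the codebook and commitment losses used in training, and it does not reproduce the MBV autoencoder or the project's PyTorch code.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vector_quantize(frames: Sequence[Vector],
                    codebook: Sequence[Vector]) -> Tuple[List[int], List[Vector]]:
    """Map each encoder frame to its nearest codebook entry (VQ-VAE bottleneck).

    Returns the discrete unit indices and the quantized vectors that the
    decoder would receive.
    """
    indices = [
        min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        for frame in frames
    ]
    return indices, [codebook[k] for k in indices]


# Hypothetical 3-entry codebook of 2-D vectors and four encoder frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7], [1.1, 0.8]]
units, _ = vector_quantize(frames, codebook)
print(units)  # [0, 1, 2, 1]
```

A smaller codebook yields fewer distinct symbols, which lowers the bitrate of the discovered units at the cost of resynthesis quality.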

Atari games using RL [pytorch]

In this project, I implemented agents that play games with deep reinforcement learning, using Policy Gradient, Deep Q-Learning (DQN), Double DQN, Dueling DQN, and A2C on environments such as LunarLander, Assault, and Mario.

Code
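Among the methods listed, DQN and Double DQN differ in how the bootstrap target is computed. DQN takes the maximum of the target network's Q-values at the next state, which tends to overestimate action values; Double DQN lets the online network choose the next action and the target network score it. The plain-Python sketch below shows only this target computation for a single transition, with hypothetical Q-value lists; the replay buffer, networks, and loss of the actual project are not reproduced.

```python
from typing import Sequence


def dqn_target(reward: float, done: bool, q_target_next: Sequence[float],
               gamma: float = 0.99) -> float:
    """DQN: bootstrap from the target network's maximum Q-value at the next state."""
    if done:
        return reward  # no bootstrapping past a terminal state
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward: float, done: bool, q_online_next: Sequence[float],
                      q_target_next: Sequence[float], gamma: float = 0.99) -> float:
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


# Hypothetical Q-values for a next state with three actions.
q_online = [1.0, 3.0, 2.0]
q_target = [5.0, 0.5, 4.0]
print(dqn_target(1.0, False, q_target, gamma=0.9))                   # about 5.5
print(double_dqn_target(1.0, False, q_online, q_target, gamma=0.9))  # about 1.45
```

Decoupling action selection from evaluation means a single overestimated entry in the target network (5.0 here) no longer inflates the target unless the online network also ranks that action highest.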

Cartoon Face Generation Using Conditional GAN [pytorch]

In this project, I implemented an ACGAN and a VAE for cartoon face generation. I also ran several experiments on the model architectures to verify the capabilities of the models.

Code

Depth Map Generation [pytorch]

To generate a disparity map from a pair of left and right images, we combine a learning-based method with our understanding of stereo geometry. The end-to-end trained model generates the disparity map from the two images alone.

PDF Code Slides
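A basic piece of stereo geometry relevant here links disparity to depth: for a rectified image pair, a point at depth Z appears shifted by d = fB/Z pixels between the left and right images, where f is the focal length in pixels and B is the camera baseline. Once a disparity map is predicted, depth therefore follows as Z = fB/d. The sketch below is a minimal plain-Python illustration of that conversion, assuming rectified cameras; the focal length and baseline values are hypothetical, and it does not reproduce the project's learned disparity model.

```python
from typing import List, Optional


def disparity_to_depth(disparity: List[List[float]], focal_px: float,
                       baseline_m: float) -> List[List[Optional[float]]]:
    """Convert a disparity map (pixels) to depth (metres), assuming a rectified pair.

    Uses Z = f * B / d; pixels with non-positive disparity get None (no valid match).
    """
    return [
        [focal_px * baseline_m / d if d > 0 else None for d in row]
        for row in disparity
    ]


# Hypothetical camera: 700 px focal length, 0.5 m baseline.
print(disparity_to_depth([[10.0, 0.0], [20.0, 5.0]], focal_px=700.0, baseline_m=0.5))
# [[35.0, None], [17.5, 70.0]]
```

Because depth is inversely proportional to disparity, a one-pixel disparity error matters far more for distant points (small d) than for nearby ones.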

Housing Agency System [Python]

As Security Token Offerings (STOs) gain more and more attention, we believe the concept can be applied to real-estate transactions. By introducing a consortium blockchain, the mechanism brings many advantages to the market.

PDF Code Slides Video

Door Friend [Rpi][Arduino][Python]

Imagine that you are busy cooking a great dinner for your party when a friend arrives at your house. Your friend rings the bell, but you can’t open the door with dirty hands! Now Door Friend comes to save the day: it recognizes your and your friends’ faces and voices and opens the door.

Code Slides Video

Fashion Ceiba [html][node.js][graphql][react.js][mongo]

Fashion Ceiba is an online teaching platform that supports classroom teaching with features such as real-time note-taking, asking questions, and updating handouts. Deployed at https://fashion-ceiba.herokuapp.com/login.

Code Slides Video

Publications

Personalized Dialogue Response Generation Learned from Monologues

Feng-Guang Su, Aliyah R. Hsu, Yi-Lin Tuan, Hung-Yi Lee

Accepted at Interspeech, Graz, Austria, September 2019.

PDF Code Dataset Poster Video
