Department of Computer Science
USC Viterbi School of Engineering
I'm an Assistant Professor in the Department of Computer Science at USC, affiliated with the USC Machine Learning Center and USC ISI. Previously, I was a visiting researcher at Stanford University collaborating with Dan Jurafsky and Jure Leskovec, and received my PhD in CS@UIUC, where I worked with Jiawei Han. I'm interested in computational methods and systems that extract machine-actionable knowledge from massive unstructured data (e.g., text data). I'm particularly excited about problems in the space of modeling sequence and graph data under weak supervision (learning with partial/noisy labels, semi-supervised learning) and indirect supervision (multi-task learning, transfer learning, reinforcement learning). My dissertation research received a Google PhD Fellowship, a Yahoo!-DAIS Research Excellence Award and a David J. Kuck Outstanding Thesis Award.
Research
- I'm co-organizing the 1st Workshop on Knowledge Base Construction, Reasoning and Mining (KBCOM'18), co-located with WSDM'18 on Feb 9, 2018. Our invited speakers include Luna Dong, Oren Etzioni, Lise Getoor, Alon Halevy, Monica Lam, Chris Ré, Xifeng Yan and Luke Zettlemoyer. See you in LA!
Blog posts: Information Extraction with Indirect Supervision and Heterogeneous Supervision, Dynamic Network Embedding.
- Learning with weak supervision: In many information extraction tasks, direct supervision in the form of manually annotated text sequences is expensive to obtain, but different kinds of weak supervision (e.g., KB facts, hand-crafted rules, crowd-sourced labels, user feedback) are much easier to collect at a large scale. Our WWW 2017 tutorial summarizes recent advances in denoising distant supervision, multi-task extraction, and leveraging QA data as indirect supervision.
- To self-learn from a few examples of given relations (and a large corpus), REPEL jointly optimizes an embedding-based discriminator and a pattern-based generator.
- Both human annotators and external knowledge bases can provide weak supervision for information extraction tasks. Such heterogeneous forms of weak supervision trade off label quality against the amount of labeled data one can obtain. How can we leverage these heterogeneous sources of supervision in a principled way?
- Indirect supervision may result in noisily and partially labeled data. This is especially challenging when dealing with a complex label space (e.g., a label hierarchy). We propose hierarchical partial-label embedding to overcome these issues.
News
Feb 2018 - To talk about Scalable Construction and Reasoning of Knowledge Bases with William Wang and Nanyun Peng at NAACL 2018.
Feb 2018 - Serving as Area Chair for COLING 2018.
Jan 2018 - Co-chairing the Data Challenge Contest for ICDM 2018. CFP will be up soon.
Jan 2018 - Doing relation extraction with very limited labeled data? See our WWW'18 paper on integrating representation learning and pattern bootstrapping.
Dec 2017 - To give a full-day tutorial at WWW 2018 about "Construction and Querying of Large-scale Knowledge Bases".
Nov 2017 - Two papers accepted to AAAI 2018. Congrats Lucas and Lekui!
Oct 2017 - Our new paper on improving relation extraction with question-answer pairs has been accepted to WSDM'18. Congrats Ellen!