On Friday, February 3, the Computer Science Department will host its first colloquium of the Spring 2017 semester. Dr. Steven Skiena, a Distinguished Teaching Professor of Computer Science at Stony Brook University, will give a talk entitled “Applications of Word Embeddings”. An abstract of his talk can be found below.
Please join CS faculty and students in Forcina 408 from 12:30 – 1:30 PM for this talk.
Light refreshments will be provided.
Abstract:
Distributed word embeddings (word2vec) provide a powerful way to reduce large text corpora to concise features readily applicable to a variety of problems in NLP and data science. I will introduce word embeddings and review several of our recent efforts in my talk, including:
(1) Multilingual NLP — Our Polyglot project employs deep learning and other techniques to build a basic NLP pipeline (including entity recognition, POS tagging, and sentiment analysis) for over 100 different languages. We train our systems over each language’s Wikipedia edition, which provides unified data resources in the absence of explicitly annotated data but poses substantial challenges in interpretation and evaluation.
(2) Detecting Historical Shifts in Word Meaning — Words like “gay” and “mouse” have substantially shifted their meanings over time in response to societal and technological changes. We use word embeddings trained over texts drawn from different time periods to detect changes in word meanings. This is part of our efforts in historical trends analysis.
(3) Deep Learning for Feature Extraction from Graphs — We present DeepWalk, a novel approach for learning latent representations of vertices in a network. DeepWalk uses local information on truncated random walks to learn embeddings, by treating walks as the equivalent of sentences in a language. It is suitable for a broad class of applications such as network classification and anomaly detection.
This is joint work with Rami al-Rfou, Bryan Perozzi, Vivek Kulkarni, Yanqing Chen, and Charles Ward.
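For readers who want a concrete picture of item (3) before the talk, the idea of treating truncated random walks as sentences can be sketched as follows. This is only an illustrative sketch, not the DeepWalk implementation: the function name `truncated_random_walks`, its parameters, and the toy graph are all assumptions made for this example, and the step that would train an embedding model on the resulting "sentences" is left out.

```python
import random


def truncated_random_walks(adjacency, walks_per_vertex, walk_length, seed=0):
    """Generate fixed-length random walks from every vertex.

    Hypothetical helper for illustration only. `adjacency` maps each
    vertex to a list of its neighbors. Each returned walk is a list of
    vertices that can be read like a sentence of "words".
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_vertex):
        vertices = list(adjacency)
        rng.shuffle(vertices)  # visit start vertices in random order
        for start in vertices:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy undirected graph (an assumption for this example).
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

walks = truncated_random_walks(graph, walks_per_vertex=2, walk_length=5)
for walk in walks[:3]:
    print(" ".join(walk))  # each walk printed as a "sentence" of vertex names
```

Each printed line is a sequence of vertex names; a word-embedding model trained on many such sequences would then assign a vector to every vertex, in the same way it assigns vectors to words in text.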
Bio:
Steven Skiena is Distinguished Teaching Professor of Computer Science at Stony Brook University. His research interests include the design of graph, string, and geometric algorithms, and their applications (particularly to biology). He is the author of five books, including “The Algorithm Design Manual” and “Who’s Bigger: Where Historical Figures Really Rank”. He was co-founder and Chief Scientist at General Sentiment, a media measurement company based on his Lydia text analysis system.
Skiena received his Ph.D. in Computer Science from the University of Illinois in 1988, and is the author of over 150 technical papers. He is a former Fulbright scholar, and recipient of the ONR Young Investigator Award and the IEEE Computer Science and Engineering Teaching Award. More info at http://www.cs.stonybrook.edu/~skiena/.