An NLP Primer for Library Technologists

This talk will introduce a number of natural language processing techniques and their applications in computational linguistics and machine learning. Attention will be given to data preparation and model building, as well as the statistical and theoretical underpinnings of many current techniques. Examples will be drawn from experiments using DPLA metadata as a document corpus. Techniques discussed will include topic modeling with Latent Dirichlet Allocation (LDA), feature vector generation using the continuous bag-of-words (CBOW) model, and semantic document vectors built with neural models such as Doc2Vec. Classification, clustering, and recommendation techniques that use the output of these models will also be examined. Current research in these areas will be explored, including applications to a variety of common text analysis problems. The presentation will conclude with a demonstration of an attempt to use Twitter profiles and recent tweets to build a search vector for querying DPLA by cosine similarity and nearest-neighbor search.
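
As a rough illustration of the closing demonstration, the sketch below ranks a few document vectors against a query vector by cosine similarity and returns the nearest neighbors. It is not the speaker's implementation: the record identifiers, the three-dimensional vectors, and the function names cosine_similarity and nearest_neighbors are all hypothetical, and real vectors would come from a trained model such as Doc2Vec rather than being typed in by hand.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, doc_vectors, k=2):
    """Brute-force search: score every document, keep the k most similar."""
    scored = [
        (doc_id, cosine_similarity(query, vec))
        for doc_id, vec in doc_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data standing in for vectors inferred from metadata records.
doc_vectors = {
    "record-a": [0.9, 0.1, 0.0],
    "record-b": [0.1, 0.8, 0.3],
    "record-c": [0.7, 0.2, 0.1],
}

# Hypothetical query vector standing in for one built from a user's tweets.
query_vector = [1.0, 0.0, 0.0]

results = nearest_neighbors(query_vector, doc_vectors, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

A brute-force scan like this is only practical for small collections; for a corpus the size of DPLA, an indexed or approximate nearest-neighbor structure would be the more likely choice.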

Speaker(s)

Corey Harper

March 8th

01:05 PM

20 minutes