Topic model
In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. An early topic model was probabilistic latent semantic indexing (PLSI), created by Thomas Hofmann in 1999. Latent Dirichlet allocation (LDA), perhaps the most common topic model currently in use, is a generalization of PLSI developed by David Blei, Andrew Ng, and Michael Jordan in 2002, allowing documents to have a mixture of topics. Other topic models are generally extensions of LDA, such as Pachinko allocation, which improves on LDA by modeling correlations between topics in addition to the word correlations that constitute topics. Although topic models were first described and implemented in the context of natural language processing, they have applications in other fields such as bioinformatics.
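
As a rough illustration of what "a mixture of topics" means, the following Python sketch samples documents from the generative process that LDA assumes: each document draws its own topic proportions from a Dirichlet distribution, and each word is then drawn from a topic chosen according to those proportions. The two topics, their word probabilities, and the names `TOPICS`, `alpha`, and `doc_len` are invented for illustration and do not come from any corpus or library. Fitting a model to real text amounts to running this process in reverse, which is not shown here.

```python
import random

# Hypothetical topics: each one is a probability distribution over words.
TOPICS = {
    "topic_a": {"gene": 0.5, "dna": 0.3, "cell": 0.2},
    "topic_b": {"editor": 0.4, "press": 0.4, "gazette": 0.2},
}


def sample_dirichlet(alpha, k, rng):
    """Draw k proportions from a symmetric Dirichlet(alpha) via normalized gammas."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(doc_len, alpha, rng):
    """Generate one document; return (topic proportions, list of words)."""
    names = list(TOPICS)
    theta = sample_dirichlet(alpha, len(names), rng)  # this document's topic mixture
    words = []
    for _ in range(doc_len):
        topic = rng.choices(names, weights=theta)[0]  # choose a topic for this word
        vocab = list(TOPICS[topic])
        probs = list(TOPICS[topic].values())
        words.append(rng.choices(vocab, weights=probs)[0])  # choose a word from it
    return dict(zip(names, theta)), words


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    for _ in range(3):
        theta, words = generate_document(doc_len=8, alpha=0.5, rng=rng)
        print({t: round(p, 2) for t, p in theta.items()}, " ".join(words))
```

A small `alpha` tends to produce documents dominated by one topic, while a larger value tends to produce more even mixtures.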

Case studies

Templeton's survey of work on topic modeling in the humanities grouped previous work into synchronic and diachronic approaches. Synchronic approaches identify topics at a single point in time. For example, Jockers used topic modeling to classify 177 bloggers writing on the 2010 'Day of Digital Humanities' and to identify the topics they wrote about that day. Meeks modeled 50 texts in the Humanities Computing/Digital Humanities genre to identify self-definitions of scholars working on digital humanities and to visualize networks of researchers and topics. Drouin examined Proust to identify topics and show them as a graphical network.

Diachronic approaches include Block and Newman's determination of the temporal dynamics of topics in the Pennsylvania Gazette during 1728–1800. Griffiths & Steyvers used topic modeling on abstracts from the journal PNAS to identify topics that rose or fell in popularity from 1991 to 2001. Nelson has been analyzing change in topics over time in the Richmond Times-Dispatch to understand social and political changes and continuities in Richmond during the American Civil War. Yang, Torget and Mihalcea applied topic modeling methods to newspapers from 1829–2008. Blevins has been topic modeling Martha Ballard's diary to identify thematic trends across the 27-year diary. Mimno used topic modeling with 24 journals on classical philology and archaeology spanning 150 years to look at how topics in the journals change over time and how the journals become more different or similar over time.
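
Diachronic analyses of this kind typically rest on one simple computation: once a model has assigned topic proportions to each document, the proportions are averaged within each time period and the averages are compared across periods. The Python sketch below shows that step only. The input format (a list of `(year, proportions)` pairs), the function name `topic_share_by_year`, and the sample numbers are assumptions made for illustration; none of them are taken from the studies mentioned.

```python
from collections import defaultdict


def topic_share_by_year(docs):
    """Average each topic's proportion over the documents dated to each year.

    docs: iterable of (year, {topic: proportion}) pairs (hypothetical format).
    Returns {year: {topic: mean proportion}} with years in ascending order.
    A topic missing from a document counts as zero for that document.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for year, proportions in docs:
        counts[year] += 1
        for topic, share in proportions.items():
            sums[year][topic] += share
    return {
        year: {topic: total / counts[year] for topic, total in sums[year].items()}
        for year in sorted(counts)
    }


if __name__ == "__main__":
    # Invented example data: two documents per year, two topics.
    docs = [
        (1900, {"topic_a": 0.25, "topic_b": 0.75}),
        (1900, {"topic_a": 0.75, "topic_b": 0.25}),
        (1901, {"topic_a": 1.0, "topic_b": 0.0}),
        (1901, {"topic_a": 0.5, "topic_b": 0.5}),
    ]
    for year, shares in topic_share_by_year(docs).items():
        print(year, shares)
```

A topic whose average share grows from one period to the next is rising in prominence, and one whose share shrinks is falling.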

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 