Archive for September, 2006

Information Extraction for DHS

Wednesday, September 27th, 2006

Slashdot has a story about a new project for information extraction for homeland security:

http://it.slashdot.org/article.pl?sid=06/09/25/0111231&from=rss

It further links to these two sites:

http://www.eurekalert.org/pub_releases/2006-09/cuns-sfa092206.php

http://blogs.zdnet.com/emergingtech/?p=364

Microsoft wants to patent verb conjugation

Sunday, September 24th, 2006

From the corpora list and from Slashdot:

http://rss.slashdot.org/~r/Slashdot/slashdot/~3/19728499/article.pl

I know of papers on automatic verb conjugation that are 15+ years old. I am not sure what Microsoft is trying to accomplish here.

List of topics related to work in CLAIR

Monday, September 18th, 2006

I have tried to prepare a list of topics that members of CLAIR will find useful. Any comments are welcome.

List of skills for NLP/IR PhD students

Saturday, September 2nd, 2006

I decided to compile a list of skills that can be used to gauge progress in one’s research career in NLP/IR.

Here is what I came up with:

http://tangra.si.umich.edu/clair/PHD-LIST.

Any comments?

Drago

New large web corpora available

Saturday, September 2nd, 2006

Finally, some useful corpora from the big search companies:

http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html

N-gram corpus from Google

http://www.aolsearchdatabase.com

Query logs from AOL (controversial). See also http://www.ugcs.caltech.edu/~dangelo/aol-search-query-logs/

List of NLP evaluations

Saturday, September 2nd, 2006

I had to quickly compile a list of existing NLP evaluations. Each of these includes a standardized task description, a corpus, and evaluation software.

NP bracketing http://www.cnts.ua.ac.be/conll99/npb/

Chunking http://www.cnts.ua.ac.be/conll2000/chunking/

Clause identification http://www.cnts.ua.ac.be/conll2001/clauses/

NER http://www.cnts.ua.ac.be/conll2002/ner/

Semantic role labeling http://www.lsi.upc.edu/~srlconll/st04/st04.html

Dependency parsing http://nextens.uvt.nl/~conll/

Summarization http://duc.nist.gov

PP attachment

Parsing

MT http://www.nist.gov/speech/tests/mt/

WSD http://nextens.uvt.nl/~conll/

Information extraction in biology http://biocreative.sourceforge.net/

Textual entailment http://www.pascal-network.org/Challenges/RTE2/

QA http://trec.nist.gov

There are many other tasks, e.g., the KDD cup.
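
To give a feel for what the "evaluation software" part usually amounts to, here is a minimal sketch of span-level scoring of the kind used for tasks such as chunking and NER: precision, recall, and F1 over exactly matching labeled spans. The `Span` representation, the function name, and the toy data are my own assumptions for illustration; this is not the scoring script of any of the evaluations listed above.

```python
from typing import Iterable, Set, Tuple

# Hypothetical span representation: (start token, end token, label).
Span = Tuple[int, int, str]


def precision_recall_f1(
    gold: Iterable[Span], predicted: Iterable[Span]
) -> Tuple[float, float, float]:
    """Score predicted spans against gold spans using exact matching.

    A predicted span counts as correct only if its boundaries and its
    label both match a gold span.
    """
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)
    correct = len(gold_set & pred_set)

    # Guard against empty inputs to avoid division by zero.
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example (made-up data): the system mislabels the second span.
gold_spans = [(0, 2, "NP"), (3, 4, "VP"), (5, 7, "NP")]
predicted_spans = [(0, 2, "NP"), (3, 4, "NP"), (5, 7, "NP")]

p, r, f = precision_recall_f1(gold_spans, predicted_spans)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Other evaluations on the list use different measures (summarization and MT in particular), so treat this only as an illustration of the general idea of scoring system output against a gold standard.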

The new CSE building at the University of Michigan

Saturday, September 2nd, 2006

The new CSE building at UM:

http://www.mlive.com/news/aanews/index.ssf?/base/news-18/115425421627800.xml&coll=2

Google to open lab in Ann Arbor

Saturday, September 2nd, 2006

Google has decided to open a lab in Ann Arbor. Two of the main directions of work will be targeted ads and library scanning.

http://www.nytimes.com/2006/07/11/technology/11google.html
http://www.mlive.com/newsflash/michigan/index.ssf?/base/news-35/115259956475940.xml&storylist=newsmichigan
http://www.mlive.com/newsflash/michigan/index.ssf?/base/business-9/1152621861309560.xml&storylist=newsmichigan
http://www.freep.com/apps/pbcs.dll/article?AID=/20060710/NEWS99/307100004/1122
http://www.mlive.com/news/aanews/index.ssf?/base/news-18/11527152409090.xml&coll=2
http://www.freep.com/apps/pbcs.dll/article?AID=2006607120342
http://www.mlive.com/news/aanews/index.ssf?/base/news-18/115444325633730.xml&coll=2

Graph-based methods for NLP (and IR)

Saturday, September 2nd, 2006

Rada Mihalcea and I recently organized a tutorial and a workshop on Graph-based methods for NLP (and IR) at HLT-NAACL 2006 in Brooklyn.