Archive for the ‘language technologies’ Category

The 2009 North American Computational Linguistics Olympiad

Thursday, December 4th, 2008

The 2009 North American Computational Linguistics Olympiad has been announced. It will be held in two rounds, on February 4, 2009 and March 11, 2009. See http://www.naclo.cs.cmu.edu for details.

NACLO 2008 announced: the North American Computational Linguistics Olympiad

Wednesday, November 14th, 2007

Registration is open for the Second Annual North American
Computational Linguistics Olympiad

Please inform high school students in your area of the Open competition
of the second annual North American Computational Linguistics Olympiad,
which will be held on February 5, 2008. Students may participate at one of
the host sites listed below or in the Internet category. The contest
targets high school students, but middle school students may also
participate.

Students can register at: http://www.naclo.cs.cmu.edu.

Top scorers in the Open competition will be eligible to compete in the
NACLO Invitational competition in March 2008. Top scorers in the
Invitational will be eligible to compete in the International Linguistics
Olympiad in Bulgaria in the summer of 2008. Two US teams competed in the
International Linguistics Olympiad in St. Petersburg in 2007 with
excellent results, achieving the top score in the individual competition
and tying for first place in the team competition.

Brandeis University
Carnegie Mellon University/University of Pittsburgh
Columbia University
Cornell University
Middle Tennessee State University
San Jose State University
University of Michigan
University of Oregon
University of Pennsylvania
University of Toronto
University of Wisconsin/Edgewood College

If your university is not listed here and you would like to host the
contest there, contact Lori Levin, lsl at-symbol cs.cmu.edu.

In addition, any student may participate in the Internet category by
finding a local high school or university teacher to facilitate the
contest.

About Linguistics Olympiads:

The North American Computational Linguistics Olympiad (NACLO) is the
direct descendant of the Olympiad in Linguistics and Mathematics
founded in 1965 in Moscow, Russia. High school students compete by
solving linguistics and logic problems based on natural
languages. This program is credited with introducing thousands of
Russian students to the field of linguistics, many of whom have gone
on to become prominent professional linguists. NACLO includes
traditional Olympiad problems as well as some computational problems.
It is not a competition about computer technology; it covers all aspects
of natural language structure and function, including computational
thinking as it relates to natural language processing.

Thank you very much for your help in raising the profile of our
discipline among secondary school students. Please contact any of the
executive team members below if you have questions or would like to be
involved, for example by hosting a competition in your area or
submitting a problem for future competitions.

Lori Levin – Co-chair
Thomas E. Payne – Co-chair
Dragomir R. Radev – Program chair and team coach

How many ways to say that the Red Sox won

Tuesday, October 30th, 2007

I collected 99 news story titles from Google News about the Red Sox
winning the World Series for the second time in four years. Here is a
network of those titles, drawn in Pajek using the IDF-weighted cosine
similarity between each pair of titles. Two titles are connected if
their similarity is above 0.7.

Network
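The construction behind the network can be sketched in a few lines of Python. This is a minimal sketch, not the script used for the post: the tokenizer, the raw term frequencies, the IDF formula log(N / df) computed over the titles themselves, and all function names are assumptions. The output is written as a minimal Pajek .net file, and the example titles are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

PUNCT = ".,:;!?'\"()"

def tokenize(title):
    """Lowercase a title and split it on whitespace, stripping edge punctuation."""
    words = (w.strip(PUNCT).lower() for w in title.split())
    return [w for w in words if w]

def idf_weights(docs):
    """Assumed IDF: log(N / df), computed over the titles themselves."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {w: math.log(n / c) for w, c in df.items()}

def tfidf_vector(tokens, idf):
    """Raw term frequency times IDF, stored as a sparse dict."""
    tf = Counter(tokens)
    return {w: count * idf[w] for w, count in tf.items()}

def cosine(u, v):
    """Cosine similarity of two sparse vectors."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # e.g. a title made only of words that occur in every title
    return dot / (norm_u * norm_v)

def similarity_edges(titles, threshold=0.7):
    """Return (i, j, sim) for every pair of titles with sim strictly above threshold."""
    docs = [tokenize(t) for t in titles]
    idf = idf_weights(docs)
    vectors = [tfidf_vector(d, idf) for d in docs]
    edges = []
    for i, j in combinations(range(len(titles)), 2):
        sim = cosine(vectors[i], vectors[j])
        if sim > threshold:
            edges.append((i, j, sim))
    return edges

def to_pajek(titles, edges):
    """Serialize the graph as a minimal Pajek .net file (vertices are 1-indexed)."""
    lines = [f"*Vertices {len(titles)}"]
    for k, title in enumerate(titles, start=1):
        label = title.replace('"', "'")  # double quotes delimit labels in .net files
        lines.append(f'{k} "{label}"')
    lines.append("*Edges")
    for i, j, sim in edges:
        lines.append(f"{i + 1} {j + 1} {sim:.4f}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Hypothetical titles, not the 99 titles from the post.
    titles = [
        "Red Sox sweep Rockies to win World Series",
        "Red Sox sweep Rockies, win World Series",
        "Rockies fall to Red Sox",
    ]
    print(to_pajek(titles, similarity_edges(titles)))
```

With IDF weighting, words that appear in nearly every title (here "red", "sox", "rockies") contribute little or nothing to the similarity, so edges reflect the wording that actually distinguishes one headline from another.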

My favorite corpora

Sunday, May 6th, 2007

Here are my favorite corpora:

Enron email
CIA World Factbook
DBLP: papers in CS
US congressional speeches
AOL queries
Netflix movie ratings
IMDb
PubMed: biomedical paper abstracts
Wikipedia
ACL Anthology
DOTGOV: crawl of .gov sites
BioCreative: biomedical papers
WT100G: 100GB download of the web
Google n-grams
webfreq
SMS corpus
CiteSeer
DMOZ
corpus of paraphrases
multilingual parallel parliamentary proceedings
textual entailment corpus
question answering corpus
summarization corpus
various text classification corpora (Reuters-21578, 20NG)
Peekaboom

Text compression as proxy for AI

Saturday, November 11th, 2006

A very interesting challenge:

http://cs.fit.edu/~mmahoney/compression/rationale.html

The goal is to compress Wikipedia losslessly. Intuitively, a
semantics-aware compressor should do very well here. The problem is
that no one seems to know how to build one. The best entries so far are
all string-based (e.g., http://www.compression.ru/ds/).
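To make "string-based" concrete, here is a small sketch that measures bits per character for three general-purpose compressors in Python's standard library (zlib, bz2, lzma) and checks that each round trip is lossless. These are not the challenge entries, the function names are hypothetical, and the sample text is a placeholder rather than Wikipedia.

```python
import bz2
import lzma
import zlib

# Three general-purpose (string-based) compressors, each at its strongest setting.
COMPRESSORS = {
    "zlib": lambda data: zlib.compress(data, 9),
    "bz2": lambda data: bz2.compress(data, 9),
    "lzma": lambda data: lzma.compress(data, preset=9),
}

DECOMPRESSORS = {
    "zlib": zlib.decompress,
    "bz2": bz2.decompress,
    "lzma": lzma.decompress,
}

def bits_per_char(text, name):
    """Compressed size in bits divided by the number of characters in text."""
    compressed = COMPRESSORS[name](text.encode("utf-8"))
    return 8 * len(compressed) / len(text)

def is_lossless(text, name):
    """Check that decompression restores exactly the original bytes."""
    data = text.encode("utf-8")
    return DECOMPRESSORS[name](COMPRESSORS[name](data)) == data

if __name__ == "__main__":
    # Placeholder sample; the challenge itself targets Wikipedia text.
    sample = "The Red Sox won the World Series. " * 100
    for name in COMPRESSORS:
        print(f"{name}: {bits_per_char(sample, name):.3f} bits/char, "
              f"lossless={is_lossless(sample, name)}")
```

A lower bits-per-character figure means the compressor predicted the text better. Compressors like these only exploit repeated byte patterns, which is why a model that understood the meaning of the text would be expected to beat them.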

The ACL wiki

Sunday, October 29th, 2006

The ACL wiki is now a reality. A large portion of the existing ACL Universe will be folded into the wiki, and the “Universe” will likely disappear :)

The netflix challenge

Friday, October 6th, 2006

According to CNET, Netflix is offering $1M to anyone who can improve the accuracy of its movie recommendation system by 10%.

I hope that many other organizations announce such contests.
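The Netflix Prize scores predicted star ratings by root mean squared error (RMSE) on held-out ratings, and the grand prize requires a 10% RMSE improvement over Netflix's own Cinematch system. A minimal sketch of that arithmetic follows; the function names are hypothetical and the ratings are made-up examples, not Netflix data.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

def improvement_percent(baseline_rmse, new_rmse):
    """Relative RMSE reduction versus a baseline, in percent."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

if __name__ == "__main__":
    # Hypothetical 1-5 star ratings and two sets of predictions.
    actual = [4, 3, 5, 2, 4]
    baseline = [3.5, 3.5, 4.0, 3.0, 3.5]
    candidate = [4.0, 3.0, 4.5, 2.5, 3.5]
    b, c = rmse(baseline, actual), rmse(candidate, actual)
    print(f"baseline {b:.3f}, candidate {c:.3f}, "
          f"improvement {improvement_percent(b, c):.1f}%")
```

Because the target is relative, a 10% gain requires the new RMSE to be at most 0.9 times the baseline RMSE on the same held-out ratings.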

More links:

http://hunch.net/?p=231
http://rss.slashdot.org/~r/Slashdot/slashdot/~3/31168783/article.pl
http://feeds.feedburner.com/~r/oreilly/radar/rss10/~3/31208774/netflixs_personalization_conte_1.html

Information Extraction for DHS

Wednesday, September 27th, 2006

Slashdot has a story about a new project for information extraction for homeland security:

http://it.slashdot.org/article.pl?sid=06/09/25/0111231&from=rss

It further links to these two sites:

http://www.eurekalert.org/pub_releases/2006-09/cuns-sfa092206.php

http://blogs.zdnet.com/emergingtech/?p=364

Microsoft wants to patent verb conjugation

Sunday, September 24th, 2006

From the corpora list and from Slashdot:

http://rss.slashdot.org/~r/Slashdot/slashdot/~3/19728499/article.pl

I know of papers on automatic verb conjugation that are more than 15
years old, so I am not sure what Microsoft is trying to accomplish here.
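To give a sense of how far simple rules go, here is a minimal rule-based sketch of English conjugation (third-person singular present and simple past). It is not Microsoft's method and is not taken from any particular paper; the irregular-verb table is a tiny hypothetical sample, and the consonant-doubling rule only looks at one-syllable verbs, so it gets words like "prefer" wrong.

```python
VOWELS = "aeiou"

# A tiny, hypothetical irregular table: verb -> (3rd person singular, simple past).
IRREGULAR = {
    "be": ("is", "was"),
    "have": ("has", "had"),
    "do": ("does", "did"),
    "go": ("goes", "went"),
    "see": ("sees", "saw"),
    "take": ("takes", "took"),
}

def _vowel_groups(word):
    """Count runs of vowels as a rough syllable estimate."""
    groups, prev_vowel = 0, False
    for ch in word:
        is_vowel = ch in VOWELS
        if is_vowel and not prev_vowel:
            groups += 1
        prev_vowel = is_vowel
    return groups

def _doubles_final_consonant(verb):
    """True for one-syllable consonant-vowel-consonant verbs like 'stop' or 'plan'."""
    if len(verb) < 3 or verb[-1] in VOWELS + "wxy":
        return False
    if verb[-2] not in VOWELS or verb[-3] in VOWELS:
        return False
    return _vowel_groups(verb) == 1

def _consonant_y(verb):
    """True for verbs ending in consonant + 'y', like 'carry'."""
    return verb.endswith("y") and len(verb) > 1 and verb[-2] not in VOWELS

def third_person_singular(verb):
    if verb in IRREGULAR:
        return IRREGULAR[verb][0]
    if verb.endswith(("s", "x", "z", "ch", "sh")):
        return verb + "es"
    if _consonant_y(verb):
        return verb[:-1] + "ies"
    return verb + "s"

def simple_past(verb):
    if verb in IRREGULAR:
        return IRREGULAR[verb][1]
    if verb.endswith("e"):
        return verb + "d"
    if _consonant_y(verb):
        return verb[:-1] + "ied"
    if _doubles_final_consonant(verb):
        return verb + verb[-1] + "ed"
    return verb + "ed"

if __name__ == "__main__":
    for v in ["walk", "stop", "carry", "watch", "bake", "go"]:
        print(v, third_person_singular(v), simple_past(v))
```

Rules of this kind, plus a list of irregular verbs, cover most everyday English verbs.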

List of topics related to work in CLAIR

Monday, September 18th, 2006

I have prepared a list of topics that I hope members of CLAIR will find useful. Any comments are welcome.