Archive for the ‘language’ Category

The 2009 North American Computational Linguistics Olympiad

Thursday, December 4th, 2008

The 2009 North American Computational Linguistics Olympiad has been announced. It will be held on February 4, 2009 and March 11, 2009. See http://www.naclo.cs.cmu.edu for details.

NACLO 2008 announced: the North American Computational Linguistics Olympiad

Wednesday, November 14th, 2007

Registration is open for the Second Annual North American
Computational Linguistics Olympiad

Please inform high school students in your area of the second annual
North American Computational Linguistics Olympiad Open competition, which
will be held on February 5, 2008. Students may participate at one of the host
sites listed below or in the Internet category. The contest targets high
school students, but middle school students may also participate.

Students can register at: http://www.naclo.cs.cmu.edu.

Top scorers in the Open competition will be eligible to compete in the NACLO
Invitational competition in March 2008. Top scorers in the Invitational
will be eligible to compete in the International Linguistics Olympiad in
Bulgaria in the summer of 2008. Two US teams competed in the International
Computational Linguistics Olympiad in St. Petersburg in 2007 with great
results, achieving the top score in the individual competition and tying for
first place in the team competition.

Brandeis University
Carnegie Mellon University/University of Pittsburgh
Columbia University
Cornell University
Middle Tennessee State University
San Jose State University
University of Michigan
University of Oregon
University of Pennsylvania
University of Toronto
University of Wisconsin/Edgewood College

If you are not listed here, and you would like to host the contest at
your university, contact Lori Levin, lsl at-symbol cs.cmu.edu.

In addition, any student may participate in the Internet category by
finding a local high school or university teacher to facilitate the
contest.

About Linguistics Olympiads:

The North American Computational Linguistics Olympiad (NACLO) is the
direct descendant of the Olympiad in Linguistics and Mathematics
founded in 1965 in Moscow, Russia. High school students compete by
solving linguistics and logic problems based on natural
languages. This program is credited with introducing thousands of
Russian students to the field of linguistics, many of whom have gone
on to become prominent professional linguists. NACLO includes
traditional Olympiad problems as well as some computational problems.
This is not a competition that deals with computer technology, but
with all aspects of natural language structure and function, including
computational thinking as it relates to natural language processing.

Thank you very much for your help in raising the profile of our
discipline among secondary school students. Please contact any of the
executive team members below if you have any questions or would like
to be involved in some way, including possibly hosting a competition
in your area and/or submitting a problem for future competitions.

Lori Levin – Co-chair
Thomas E. Payne – Co-chair
Dragomir R. Radev – Program chair and team coach

The International Linguistics Olympiad

Friday, July 27th, 2007

The International Linguistics Olympiad starts on Tuesday. I am
leaving on Sunday. The two US teams consist of eight amazingly smart
students.

http://www.ilolympiad.spb.ru/part.html

Some other references:

NAMCLO 2007:
http://namclo.linguistlist.org/

ILO 2007:
http://www.ilolympiad.spb.ru/

A message from the spelling police or “Riding the subway with Verlaine”

Thursday, November 23rd, 2006

NYC subway cars occasionally feature poetry excerpts on the inside
walls. Some are great. I was very pleased to see the beginning of
Verlaine’s “Autumn Song” (“Chanson d’Automne”). Unfortunately, the
spelling police discovered a typo: “saglots” instead of “sanglots”.
Here is the full text of this wonderful poem:

Chanson d’Automne

Les sanglots longs
Des violons
De l’automne
Blessent mon coeur
D’une langueur
Monotone.

Tout suffocant
Et blême, quand
Sonne l’heure,
Je me souviens
Des jours anciens
Et je pleure;

Et je m’en vais
Au vent mauvais
Qui m’emporte
Deçà, delà
Pareil à la
Feuille morte.

Text compression as proxy for AI

Saturday, November 11th, 2006

A very interesting challenge:

http://cs.fit.edu/~mmahoney/compression/rationale.html

The goal is to compress Wikipedia losslessly. Intuitively, a
semantics-aware compressor should do really well here. The problem is
that no one seems to know how to build one. The best entries so far are
all string-based (e.g., http://www.compression.ru/ds/).
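
To make "string-based" concrete, here is a minimal sketch that runs a piece of text through three general-purpose compressors from the Python standard library and reports compressed size relative to the original. None of these is a contest entry, and the function names (compression_ratios, roundtrip_ok) and the sample text are my own assumptions for illustration. These compressors only exploit repeated byte patterns; they know nothing about what the text means.

```python
import bz2
import lzma
import zlib
from typing import Dict


def compression_ratios(text: str) -> Dict[str, float]:
    """Return compressed size / original size for three stdlib compressors."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must not be empty")
    compressors = {
        "zlib": lambda data: zlib.compress(data, 9),  # LZ77 + Huffman coding
        "bz2": lambda data: bz2.compress(data, 9),    # Burrows-Wheeler based
        "lzma": lzma.compress,                        # LZMA (xz container)
    }
    return {name: len(fn(raw)) / len(raw) for name, fn in compressors.items()}


def roundtrip_ok(text: str) -> bool:
    """Check losslessness: decompressing must give back the exact bytes."""
    raw = text.encode("utf-8")
    return zlib.decompress(zlib.compress(raw)) == raw


if __name__ == "__main__":
    # Hypothetical sample; a real experiment would read a Wikipedia dump.
    sample = "The goal is to compress Wikipedia losslessly. " * 200
    for name, ratio in compression_ratios(sample).items():
        print(f"{name}: {ratio:.3f}")
    print("lossless:", roundtrip_ok(sample))
```

A lower ratio means better compression. Highly repetitive input like the sample above compresses far better than real prose would, so the numbers are only meaningful when comparing compressors on the same realistic text.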

EU wants Bulgarians to change the way they speak

Saturday, November 11th, 2006

According to http://www.novinite.com/view_news.php?id=72473 and http://www.novinite.com/view_news.php?id=72419,
the EU wants the pronunciation of EURO in Bulgarian to be made
consistent with the latinized pronunciation (“euro”) instead of the
currently adopted “evro”. What’s next? Change Sofia’s spelling to
Sophia and Bulgaria’s pronunciation in Bulgarian to “bulgaria”?

Microsoft wants to patent verb conjugation

Sunday, September 24th, 2006

From the corpora list and from Slashdot:

http://rss.slashdot.org/~r/Slashdot/slashdot/~3/19728499/article.pl

I know of papers on automatic verb conjugation that are 15+ years
old. I am not sure what Microsoft is trying to accomplish here.

New large web corpora available

Saturday, September 2nd, 2006

Finally, some useful corpora from the big search companies.

http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html

N-gram corpus from Google

http://www.aolsearchdatabase.com

Query logs from AOL (controversial). See also http://www.ugcs.caltech.edu/~dangelo/aol-search-query-logs/
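
For readers who have not worked with N-gram data like the Google corpus above, here is a minimal sketch of how such counts can be produced from raw text. It is not how Google built its corpus. The lowercasing, the whitespace tokenization, the max_n default, and the function names are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-token windows, in order of occurrence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(sentences: Iterable[str], max_n: int = 3) -> Counter:
    """Count every 1-gram through max_n-gram across the given sentences."""
    counts: Counter = Counter()
    for sentence in sentences:
        # Assumption: lowercase and split on whitespace; real corpora
        # use a proper tokenizer.
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts


if __name__ == "__main__":
    counts = count_ngrams(["the cat sat", "the cat ran"], max_n=2)
    for gram, freq in counts.most_common(3):
        print(" ".join(gram), freq)
```

N-grams never cross sentence boundaries here because each sentence is processed separately.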

List of NLP evaluations

Saturday, September 2nd, 2006

I had to quickly compile a list of existing NLP evaluations. Each of
these includes a standardized task description, a corpus, and
evaluation software.

NP bracketing http://www.cnts.ua.ac.be/conll99/npb/

Chunking http://www.cnts.ua.ac.be/conll2000/chunking/

Clause ident. http://www.cnts.ua.ac.be/conll2001/clauses/

NER http://www.cnts.ua.ac.be/conll2002/ner/

semantic roles http://www.lsi.upc.edu/~srlconll/st04/st04.html

dep. parsing http://nextens.uvt.nl/~conll/

summarization http://duc.nist.gov

PP attachment

parsing

MT http://www.nist.gov/speech/tests/mt/

WSD http://nextens.uvt.nl/~conll/

IE in biology http://biocreative.sourceforge.net/

entailment http://www.pascal-network.org/Challenges/RTE2/

QA http://trec.nist.gov

There are many other tasks, e.g., the KDD Cup.