New Hedge Fund Run by Natural Language Processing Researchers

By Jochen L. Leidner

Remember the group of ex-IBMers around Peter Brown and Robert Mercer who left the realm of automatic speech recognition and statistical machine translation to join the algorithmic trading hedge fund Renaissance Technologies, AKA "RenTec"?

Recently, Bloomberg reported that the firm is creating a new hedge fund, its first in five years.

J. Simons, the retired founder of the company, is ranked #3 on the 2012 rich list, quietly earning a reported $2.1 billion in a single year. The new fund's capital reportedly comes mostly from the company's own profits.

Reuters: Pulitzer Finalists

Congratulations to my brave and gifted colleagues Samia Nakhoul, Marie-Louise Gumuchian, Emma Farge, Alastair Macdonald, Oliver Holmes, Paul Taylor, Jessica Donati, Lamine Chichi, Christian Lowe, Dmitry Zhdannikov, Regan Doherty, Mohammad Abbas, Rania El Gamal, Maria Golovnina, Michael Georgy and Peter Graff from the REUTERS News Middle East team on being named finalists for the 2012 Pulitzer Prize.

The Link Not Clicked

By Jochen L. Leidner

The Link Not Clicked, or: Robert Frost in Cyberspace

Jochen Leidner (September 2010)

Two links emerged on a yellow blog,
And sorry I could not click on both
And be one surfer, long I stood
And scrolled down one as far as I could
To where the blog post ended.
Then took the other, as just as fair,
And having perhaps the better claim,
Rendered in darker lilac and wanted clicks;
Though as for that the traversal there
Had CTRs really about the same.
And both that night equally lay
Laid out, no style sheet set to black.
Oh, I kept the first for another day!
Yet knowing how click leads on to click
In cyberspace, I doubted if I'd be back in a bit.
I shall be telling this with a sigh
Somewhere between meetings and bug fixes hence:
Two hyperlinks once emerged on a blog, and I--
I took the one less often clicked on,
And that has made all the difference.

Call for Papers: Geospatial Information and Documents Workshop (GeoDoc'2012)

###################
##########      Call for Papers
#######
#####    Geospatial Information and Documents Workshop (GeoDoc'2012)
###
##       PAKDD Workshop
#    May 29 - June 1, 2012, Kuala Lumpur, Malaysia
#
#
#  Web Page: http://www.lirmm.fr/~mroche/GeoDoc2012
#  Contact: geodoc2012@lirmm.fr

Geographical or spatial information is now included in most exchanged data. Sometimes it is provided directly through metadata, but very often it is hidden, and discovering it automatically becomes crucial.

The Natural Language Processing (NLP) and Data Mining communities have thus joined efforts to extract geospatial information from textual documents, web pages, field data, and so forth. In this way, recent research takes into account the content of documents (e.g. terms) to identify geospatial data or to predict its geographic location.

Nevertheless, spatial information has some specificities that make discovering spatial information and/or spatial correlations in large amounts of data still challenging. In this context, some proposals have focused on the formalization of geospatial concepts and relationships, and on the extraction of geospatial relations (e.g. rivers / body of water, town / suburb) from free text, in order to offer the database community a unified framework for geodata discovery.

This workshop aims at discussing and assessing some of these strategies, involving NLP or Data Mining techniques, covering all or part of the issues mentioned above.

Topics of interest include, but are not limited to:
- Geospatial information retrieval in documents
- Geospatial knowledge acquisition from documents
- Classification of geospatial documents
- Geospatial analysis of textual data
- Integration of geospatial documents
- Extraction of geospatial information from documents
- Geospatial thesaurus/ontology building from documents
- Quality of extracted geospatial information
- Analysis and integration of geospatial data from web documents
- Visualization of geospatial information from documents

--------------------
SUBMISSION
--------------------

Papers, limited to 12 pages, must be submitted electronically in PDF format, using the EasyChair system (https://www.easychair.org/conferences/?conf=geodoc2012).
Authors must follow Springer's LNCS manuscript submission guidelines (http://www.springer.de/comp/lncs/authors.html).
All papers will be double-blind reviewed by the Program Committee on the basis of technical quality, originality, significance, and clarity.
Outstanding papers will be selected for publication in LNCS Post-Proceedings published by Springer.

Important dates:
- Paper submission deadline (extended): 22 January 2012
- Author notification: 10 February 2012
- Camera-ready due: 24 February 2012

--------------------
COMMITTEE
--------------------

Program Chairs:
- Maguelonne Teisseire, UMR TETIS, Cemagref, France (http://www.lirmm.fr/~teisseir/)
- Mathieu Roche, LIRMM, CNRS, University Montpellier 2, France  (http://www.lirmm.fr/~mroche/)

Program Committee:
- Masanori Akiyoshi, Osaka University, Japan
- Torben Bach Pedersen, Aalborg University, Denmark
- Jason Baldridge, University of Texas, USA
- Mete Celik, Erciyes University, Turkey
- Robert Haining, University of Cambridge, UK
- Tahar Kechadi, UCD School of Computer Science and Informatics, Ireland
- Stan Matwin, University of Ottawa, Canada
- Donato Malerba, University of Bari, Italy
- Pascal Poncelet, University of Montpellier 2, France

-- 
Mathieu Roche
LIRMM - UMR 5506                    web: www.lirmm.fr/~mroche
Université Montpellier 2               
34095 Montpellier Cedex 5 - France

Four Questions for Business Excellence

By Jochen L. Leidner

Sam Palmisano, IBM's CEO, used four questions to return the company to its roots of excellence and leadership (according to a recent New York Times article):

 

• “Why would someone spend their money with you — so what is unique about you?”

• “Why would somebody work for you?”

• “Why would society allow you to operate in their defined geography — their country?”

• “And why would somebody invest their money with you?”

If you are considering working for a company, investing your savings in one, or hiring a consulting contractor, I recommend you ask yourself (and your conversation partners at the corporate entity at hand) all four questions.

Only unique (USP), employee-valuing, ethical, profitable and sustainable businesses are worth having on this planet.

Teaching Computer Science to Eleven-Year-Olds

By Jochen L. Leidner

I am very blessed because I have always been permitted to pursue the career of my choice, and I am very grateful for that. Time to give something back! So in February 2012, I'm going to be teaching computer science to Scottish 11-year-olds for a day. I'm excited about this opportunity presented to me by the Royal Society of Edinburgh, because I love teaching, and I haven't done any in a while. Also, I've never taught such a young age group, which happens to be the age at which I personally started to develop software.

I will have 110 minutes available per class, and I will teach my little curriculum to three batches of pupils on that day. I'm thinking about splitting up the time into three parts, a presentation (50 minutes), group work exercises (50 minutes), and a questions and answers session (10 minutes).

My objective and challenge will be to ignite a passion in some of the students in order to make them pick a computing-related career.

Great Ideas of Computer Science

I have decided to pick my material from among the following Great Ideas of computer science, which are at the very core of the discipline:

  • information
  • algorithms
  • abstraction
  • automata
  • grammars
  • recursion
  • divide and conquer
  • computers

Now if you think these are boring, theoretical concepts, then you've probably had teachers who did not get the cosmic beauty of these ideas across (no pun intended). Also note that the computer ends the list, not because it isn't a fascinating device, but because it is usually over-emphasized compared to the other ideas, not least because it appears in the very name of the discipline, computer science. Perhaps "Abstractics" (hat tip to Günther Görz), "Informatics" (certainly spreading as a term), or "Automatics" might be more appropriate names, but they are even less suited to attracting tomorrow's students.
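To make two of these ideas concrete, here is a minimal sketch of recursion and divide and conquer working together, in the form of merge sort. Python is used purely for illustration, and the function name merge_sort is my own choice for this sketch rather than anything taken from the planned lesson.

```python
def merge_sort(items):
    """Sort a list by splitting it in half, sorting each half, and merging."""
    # Base case: a list with zero or one item is already sorted.
    if len(items) <= 1:
        return list(items)

    # Divide: split the problem into two smaller problems.
    middle = len(items) // 2
    left = merge_sort(items[:middle])    # recursion: same function, smaller input
    right = merge_sort(items[middle:])

    # Conquer: merge the two sorted halves by always taking the smaller front item.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two still has items left
    merged.extend(right[j:])
    return merged


print(merge_sort([5, 2, 9, 1, 5, 6]))
```

In a classroom, the same procedure could be acted out with a shuffled stack of numbered cards before any code is shown.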

Choosing a Language

One interesting question is which programming language to choose for code snippets and exercises. No introduction to the fascinating field of computing could be complete without showing some code, explaining what it does, and encouraging modifying it in order to see what might happen.

Which language is suitable for this purpose is far from obvious. The language should be easy and fast to understand (self-explanatory is ideal), and compact to write down. It should be portable and there should be free implementations widely available (in source code form and also on the major platforms, GNU/Linux, Apple Macintosh OS X, and Microsoft Windows). It should be interpreted, or at least there should be an interactive Read-Eval-Print-Loop (REPL) available to support exploratory learning. Graphics would be a plus for creating fun instead of pupil fatigue.

A few candidates come to mind: Python, Java, LOGO, LISP, Pascal, and BASIC. Python is nowadays often used at universities in undergraduate courses, and it is widely available (sometimes pre-installed). Java is the mainstream server-side application language used in industry, and one of the main languages used in research as well; however, it is not very easy to set up (beware the stony CLASSPATH, kids!) and pretty verbose (even short methods don't fit on slides well). LOGO has the turtle for interactive graphics and is particularly easy to learn. However, it is not very powerful. LISP is the oldest of these candidates, yet the most beautiful and orthogonal language. Wirth's Pascal supports structured programming, static typing and complex data structures much better than e.g. Python, and - while designed with education in mind - was used widely in industry in the 1980s (remember Borland's Turbo Pascal and its offspring Delphi?). BASIC has proven its worth in education over several decades, even if it is not particularly aesthetically pleasing. There are tons of Scheme/LISP implementations and there are some free Pascal ones, but they are not necessarily easy (enough) to install (e.g. Petit Chez Scheme didn't work immediately after a successful installation on Mac OS X, and pcom is available in source code, but you have to compile it with itself [sic] first).

Python would be a wonderful choice, if it weren't so non-obvious how to set up an interactive graphics shell (I can do it, but an eleven-year-old might just get to the text prompt). Also, Python uses indentation to indicate block scope, which is pedagogically bad and leads to errors when students type in code from a slide and mix tabs and spaces, which look the same but won't work.
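The whitespace pitfall can be sketched as follows, assuming Python 3. The helper check_indentation is a name made up for this illustration. It compiles two snippets that would look identical on a slide, one indented with spaces only and one whose second line starts with a tab.

```python
def check_indentation(source):
    """Try to compile source; return "ok" or the name of the error raised."""
    try:
        compile(source, "<slide>", "exec")
    except IndentationError as error:
        # TabError is a subclass of IndentationError, so it is caught here too.
        return type(error).__name__
    return "ok"


# Both snippets look the same when displayed with 8-column tab stops;
# only the invisible whitespace differs.
spaces_only = "if True:\n        x = 1\n        y = 2\n"
tab_then_spaces = "if True:\n\tx = 1\n        y = 2\n"

print(check_indentation(spaces_only))
print(check_indentation(tab_then_spaces))
```

The first snippet compiles, while the second is rejected before a single statement runs, which is exactly the kind of invisible error that is hard to explain to a beginner.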

[Screenshot: Chipmunk Basic REPL]

[Screenshot: Chipmunk Basic graphics]

 

Chipmunk Basic is a free implementation of BASIC that is available on all major platforms, and it has LOGO-like graphics primitives. Free documentation as well as published books are available, which makes it a hot contender - much to my own surprise!

 

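Language | Ease of Adoption        | Graphics               | Implementations                    | Orthogonality              | Interactive IDEs
---------+-------------------------+------------------------+------------------------------------+----------------------------+---------------------
LISP     | - (prefix, parentheses) | - (complicated to use) | + (DrScheme, Gambit-C, MIT Scheme) | ++++ (Scheme)              | +
Java     | - (verbose, OO)         | - (not concise)        | ++                                 | - (basic types v. classes) | --- (no REPL)
Python   | ++ (tab)                | ++                     | +++                                | -                          | +
BASIC    | +++                     | +++ (Chipmunk Basic)   | +++ (Chipmunk Basic)               | ---                        | +++ (Chipmunk Basic)
LOGO     | +++                     | +++                    | +                                  | ---                        | +++
Pascal   | ++                      | ---                    | --                                 | ++                         | ---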

 

Son of Abdulfattah Jandali Dies

By Jochen L. Leidner

The son of political scientist Abdulfattah Jandali died today aged only 56.

When Jandali, who is from Syria, was a graduate student in Wisconsin, he had a baby and gave it up for adoption in California, under the condition that any prospective adoptive parents would be able to provide a good education.

His son, eventually known as Steve Jobs, attended Reed College, but ended up dropping out. Nevertheless, he enjoyed his calligraphy classes there, which came in handy later in unexpected ways. Jobs turned out to be a passionate innovator, who showed that whenever traditional companies stop innovating, there is always much more left that can still be improved, and he did so with a focus on good taste and ease of use.

While I never met Steve Jobs in person, I have benefitted from two presentations that he gave: the passionate launch event of the Apple Macintosh personal computer and a Stanford University commencement speech.

 

My Windows 7 Installation

By Jochen L. Leidner

I'm using the following software on my Lenovo X220 Tablet Laptop under Windows 7 at work:

Chrome 14
Cygwin 1.7 (with GCC suite and other development tools)
NetBeans 7.0.1 with Java 1.6 Oracle bundle
Skype Windows/personal edition
XEmacs 21.7
Blackberry Desktop PC tools
Apache CouchDB for Windows 1.1.0
TortoiseHG 2.1.3 with Mercurial 1.9.2 x64 
msysgit
PuTTY 0.61
TortoiseGit 1.6.5 64-bit
JabRef 2.7
Notepad++ 5.9.3
R 2.13.1
Strawberry Perl 5.12.3 64-bit
GATE 6.1.b3913
MiKTeX 2.9 Complete 64-bit
WEKA 3.6.5
Apache HTTPD 2.2.21
PHP 5.3.8 thread-safe x86 (with MSVC9 C++ DLL)
Perforce P4V visual client 2010.2
TeXnicCenter 1.0 RC1
Paint.NET 3.5.8
MySQL 5.5.15 (MySQL Workbench 5.2.34)
DbVisualizer 8.0.4 x64
Dropbox 1.1.45
Ghostscript 9.0.4 Windows 64-bit/GSView 4.9_64
iTunes 10.4.1 64-bit (required by iPad)
Adobe Reader
Microsoft Project 2007
NuSphere PHP IDE
Xobni Outlook plug-in
Maven
Eclipse/UIMA

Any useful R&D tools missing?