This site exists to help researchers working on natural language processing (NLP) of articles in the biomedical literature.

This is a brand-new version of the site, simple and packed with links to resources - far different from the previous one.

This site was started by Bob Futrelle in early 2001.  Bob retired from Northeastern University in 2011 and is hard at work developing his own NLP system.

The BioNLP mailing list

The list is probably the most useful day-to-day resource. The message archive contains more than 2,000 messages starting in 2001. The list includes announcements, discussions, and pointers to resources such as software, text databases, conferences, and more.

You can join the mailing list here:

The message archive is here:

The archives are indexed by Google, and searchable here:

Additional searches
(This ACL search link was broken; it was fixed on Dec 19, 2013.)

PubMed itself supports only limited phrase search, sometimes reporting "Quoted phrase not found." even when Google finds the phrase.
See this email archive item for further comments on the PubMed search.

Google Scholar below does a good job of harvesting papers on the web, including references to them.

Scholar Home

Additional resources

GATE - General Architecture for Text Engineering (The University of Sheffield)
GATE is a mature, powerful, and widely used system for working with text.
It is free and open source.  There is substantial documentation including numerous courses.

NLTK - The Natural Language Toolkit - A free Python-based set of tools

The site has not been responsive for me (mid-May 2013).
But the following Google Site appears to have almost everything
and points to large collections of code, data, documentation, courses, and more.
There is an excellent book that leads the reader through using the system,
along with explaining numerous aspects of natural language processing.

The National Centre for Text Mining (NaCTeM) (University of Manchester)

The NaCTeM is the first publicly-funded text mining centre in the world.
The website includes links to text mining services provided by NaCTeM; software tools, developed both by the NaCTeM team and by other text mining groups; seminars, general events, conferences and workshops; tutorials and demonstrations; and text mining publications.

An annotated list of NLP and corpus resources from Stanford

The list is extensive (600 lines long) and reasonably up to date.
Stanford's own software is written in Java:

The Linguistic Data Consortium (LDC) (University of Pennsylvania)

The LDC supports language-related education, research and technology development by creating and sharing linguistic resources: data, tools and standards.  A number of their largest resources are available for a fee or through membership.

Freely available collections of biomedical papers

BioMed Central data mining site
As of 16 May 2013 BioMed Central (with Chemistry Central and SpringerOpen) has published 160,020 articles of peer-reviewed research, all of which are covered by its open access license agreement, which allows free distribution and re-use of the full-text article, including the highly structured XML version.  The entire XML set can be downloaded as a zip file.
(I use the XMLs in my personal research after applying their XSLT preview stylesheet. - Bob Futrelle)
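
For readers who want to get at the text inside such XML files, here is a minimal Python sketch using only the standard library. It does not reproduce the XSLT step mentioned above. The sample document and its element names (article-title, abstract, p) are assumptions made for illustration; the real files follow the publisher's own schema, so the tag names will need adjusting.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real article XML will use its publisher's own schema.
SAMPLE_XML = """<article>
  <front>
    <article-title>Protein names in text</article-title>
    <abstract><p>We study <italic>gene</italic> mentions.</p></abstract>
  </front>
  <body>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </body>
</article>"""


def extract_text(xml_string):
    """Return the title and a list of paragraph strings from one article."""
    root = ET.fromstring(xml_string)
    title_el = root.find(".//article-title")
    title = "".join(title_el.itertext()).strip() if title_el is not None else ""
    # itertext() flattens inline markup such as <italic> into plain text.
    paragraphs = ["".join(p.itertext()).strip() for p in root.iter("p")]
    return title, paragraphs


title, paragraphs = extract_text(SAMPLE_XML)
print(title)
for p in paragraphs:
    print("-", p)
```

Flattening inline markup with itertext() keeps sentences intact, which is usually what a downstream tokenizer or parser wants.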

The PubMed Central Open Access Subset

This contains additional articles beyond the large BioMed Central collection.
They offer four tar.gz files containing XML (and only XML) for all the articles in the PMC open access subset.
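
A tar.gz of XML files can be read one article at a time without unpacking it to disk. The sketch below shows one way to do that in Python with the standard library. The helper names (iter_xml_members, make_demo_archive) and the tiny demo archive are hypothetical, included only so the example runs without a download; the layout of the real archives may differ.

```python
import io
import tarfile
import xml.etree.ElementTree as ET


def iter_xml_members(fileobj):
    """Yield (member_name, parsed_root) for each .xml file in a tar.gz stream."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive:
            if not (member.isfile() and member.name.endswith(".xml")):
                continue
            handle = archive.extractfile(member)
            yield member.name, ET.parse(handle).getroot()


def make_demo_archive():
    """Build a tiny in-memory tar.gz so the sketch runs without a download."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, text in [("a/one.xml", "<article><p>One</p></article>"),
                           ("a/two.xml", "<article><p>Two</p></article>")]:
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


for name, root in iter_xml_members(make_demo_archive()):
    print(name, root.tag, root.findtext("p"))
```

For a downloaded archive, open the file in binary mode and pass the file object to iter_xml_members in place of the demo buffer.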

Finding BioNLP-related conferences and proceedings

A useful strategy is to search the BioNLP mail archives for terms such as 'Conference', 'Workshop', or 'Proceedings'.
Adding a year to your search term(s) can help to narrow the search.

Some notable books

Two relational database systems - MySQL and PostgreSQL

Beyond these there is a slew of NoSQL systems:

There are dozens of books about both MySQL and PostgreSQL.

All data that's worth anything needs to be persisted.


"MySQL Community Edition is a freely downloadable version of the world's most popular open source database ...."

"PostgreSQL is a powerful, open source object-relational database system. It has more than 15 years of active development and a proven architecture that has earned it a strong reputation for reliability, data integrity, and correctness."
(I now use PostgreSQL, thanks to the prompting of my son, Joe Futrelle.  Works for me.  It includes its own GUI management tool, pgAdmin3. I typically use only two fields per table, a column-oriented approach. The manual for PostgreSQL is extensive, more than 2,000 pages! There's a nice little PostgreSQL book that I find useful: PostgreSQL: Up and Running.)
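
One possible reading of the two-fields-per-table idea is sketched below in Python. This is an assumption about what such a layout might look like, not the actual schema used on this site: each attribute gets its own table holding just an identifier and a value, and a full record is put back together with a join. The standard-library sqlite3 module stands in for PostgreSQL so that the example runs anywhere, and the table and column names (title, pub_year, doc_id, value) are hypothetical.

```python
import sqlite3

# sqlite3 (standard library) stands in for PostgreSQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")

# Hypothetical attribute tables: each holds only (doc_id, value).
for table in ("title", "pub_year"):
    conn.execute(f"CREATE TABLE {table} (doc_id INTEGER PRIMARY KEY, value TEXT)")

conn.executemany("INSERT INTO title VALUES (?, ?)",
                 [(1, "Gene mention tagging"), (2, "Parsing abstracts")])
conn.executemany("INSERT INTO pub_year VALUES (?, ?)",
                 [(1, "2009"), (2, "2012")])

# A full record is reassembled by joining the attribute tables on doc_id.
rows = conn.execute(
    "SELECT t.doc_id, t.value, y.value "
    "FROM title AS t JOIN pub_year AS y ON y.doc_id = t.doc_id "
    "ORDER BY t.doc_id"
).fetchall()
print(rows)
```

The appeal of this layout is that a new attribute means a new table rather than a change to an existing one; the cost is a join for every attribute you want back.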

Site updated May 16, 2013 by Bob Futrelle - Developed using SeaMonkey and BBEdit.
Direct email:  bob then dot then futrelle at