The BioNLP mailing list

The list is probably the most useful day-to-day resource. The message archive contains more than 2,000 messages starting in 2001. The list includes announcements, discussions, and pointers to resources such as software, text databases, conferences, and more.

Additional searches
PubMed itself supports only limited phrase search, and sometimes reports "Quoted phrase not found." even when Google finds the phrase.
Google Scholar below does a good job of harvesting papers on the web, including references to them.

Additional resources

GATE - General Architecture for Text Engineering (The University of Sheffield)
GATE is a mature, powerful, and widely used system for working with text.
It is free and open source.  There is substantial documentation including numerous courses.

NLTK - The Natural Language Toolkit - A free Python-based set of tools

There is an excellent book that leads the reader through using the system
and explains numerous aspects of natural language processing.

The National Centre for Text Mining (NaCTeM) (University of Manchester)

The NaCTeM is the first publicly-funded text mining centre in the world.
The website includes links to text mining services provided by NaCTeM; software tools, both those developed by the NaCTeM team and those from other text mining groups; seminars, general events, conferences and workshops; tutorials and demonstrations; and text mining publications.

An annotated list of NLP and corpus resources from Stanford

The list is extensive (600 lines long) and reasonably up to date.
Stanford's own software is written in Java.

The Linguistic Data Consortium (LDC) (University of Pennsylvania)

The LDC supports language-related education, research and technology development by creating and sharing linguistic resources: data, tools and standards. A number of their largest resources are available for a fee or through membership.

Freely available collections of biomedical papers

BioMed Central data mining site
As of 16 May 2013, BioMed Central (with Chemistry Central and SpringerOpen) has published 160,020 peer-reviewed research articles, all covered by its open access license agreement, which allows free distribution and re-use of the full-text article, including the highly structured XML version. The entire XML set can be downloaded as a zip file.

The PubMed Central Open Access Subset

This contains additional articles beyond the large BioMed Central collection.
They offer four tar.gz files containing XML (and only XML) for all the articles in the PMC open access subset.
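
As a rough illustration of working with such an archive, the following Python sketch walks a tar.gz file and pulls the title out of each XML article. It is only a sketch under assumptions: the function name iter_article_titles is made up here, the archive path is whatever file you downloaded, and the article-title element name assumes the articles follow the usual NLM/JATS journal article markup. Files that the standard-library parser cannot read are simply skipped.

```python
import tarfile
import xml.etree.ElementTree as ET


def iter_article_titles(archive_path):
    """Yield (member_name, title) for each XML file in a tar.gz archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive:
            # Only regular files ending in .xml are of interest.
            if not (member.isfile() and member.name.endswith(".xml")):
                continue
            handle = archive.extractfile(member)
            if handle is None:
                continue
            try:
                root = ET.parse(handle).getroot()
            except ET.ParseError:
                continue  # skip files the parser cannot handle
            # Assumption: the title is stored in an <article-title> element.
            title = root.find(".//article-title")
            text = "".join(title.itertext()).strip() if title is not None else ""
            yield member.name, text
```

Because the function is a generator, the articles are read one at a time, which matters for archives of this size.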

Finding BioNLP-related conferences and proceedings

A useful strategy is to search the BioNLP mail archives for terms such as 'Conference', 'Workshop', or 'Proceedings'.
Adding a year to your search term(s) can help to narrow the search.
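
If you have saved a set of subject lines from the archive, the same strategy can be applied offline. The Python sketch below is a hedged illustration only: the function find_messages and the sample subject lines are invented for this example, and it assumes the subjects are already available as plain strings rather than showing how to fetch them from the archive.

```python
import re


def find_messages(subjects, terms, year=None):
    """Return subjects that contain any term (case-insensitive) and, if given, the year."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    matches = []
    for subject in subjects:
        if not pattern.search(subject):
            continue
        # Narrow the results by requiring the year as a whole number.
        if year is not None and not re.search(rf"\b{year}\b", subject):
            continue
        matches.append(subject)
    return matches


# Hypothetical subject lines, made up for illustration only.
subjects = [
    "CFP: Example Workshop 2012",
    "Example Conference 2013: call for papers",
    "New corpus released",
    "Proceedings of Example Workshop 2013 now online",
]
print(find_messages(subjects, ["Conference", "Workshop", "Proceedings"], year=2013))
```

Leaving out the year returns every subject that mentions one of the terms, so the year acts purely as a narrowing filter.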

Some notable books

Two relational database systems - MySQL and PostgreSQL

There are dozens of books about both of these systems.

Beyond these, there is a slew of NoSQL systems.

"MySQL Community Edition is a freely downloadable version of the world's most popular open source database ...."

"PostgreSQL is a powerful, open source object-relational database system. It has more than 15 years of active development and a proven architecture that has earned it a strong reputation for reliability, data integrity, and correctness." PostgreSQL includes its own GUI management tool, pgAdmin3. The PostgreSQL manual is extensive, at more than 2,000 pages. There is also a nice little PostgreSQL book: PostgreSQL: Up and Running.

