This is the second grant solicitation to the Knight Prototype Fund,
dated 2016-11-16. Thomas Krichel and Ross Mounce wrote it. Here we
show the questions and the answers. The sum was $35000. We did
not have to write a budget at that stage. The application was
declined.
-
1. Describe what you will make (150 words)
-
We want to build an open-access project to improve current awareness
of the biomedical literature. This includes the contents available
from the National Library of Medicine's PubMed database. This database
grows by about 20000 items a week. The type of documents described is
quite broad. It includes material that is only of interest to
specialists. But there are also papers that are of interest to the
general public.
Biomed news will be a system that contains reports. Each report will
announce new additions to PubMed on a one-issue-per-week basis. Each
report will be specific to a subject and an audience. Each report will
be headed by a selector. Selectors will make decisions on what papers
are announced in the report. Report issues will be circulated to
subscribers via email, published on the biomed news web site, put into
RSS feeds and announced via Twitter.
-
2. What problem are you trying to solve (200 words)
-
The project may appear unrealistic, because we are asking selectors to
look through thousands of documents a week. But it's not a lot of work
for them if we are using machine learning to help them. We know that
from experience. Since 1998, Thomas Krichel has been running the "NEP:
New Economics Papers" project at http://nep.repec.org. It's basically
the same system as we propose here. It uses data collected by the
RePEc digital library for economics. It takes about 10 minutes
for an editor to work through an issue. NEP produces close to 100
reports. It has over 75,000 subscriptions.
There are technical challenges. We need to take account of the fact
that PubMed is 25 times the size of RePEc. Nobody could possibly look
at all the 20000 papers we would get in a week. That means we need to
make a preselection. Before we can train for a report, we need
selectors to give us sample data. Capturing that data requires a new
system that we have to build. That said, we will definitely be able
to launch, because we can use an adapted version of the selector
interface of NEP.
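As an illustration only, the preselection step described above could be
sketched as follows. This is a minimal sketch in Python, not the method
that NEP actually uses. It assumes that paper titles are the only
feature and that a simple smoothed word-weighting scheme is good enough.
The function names (tokenize, train, preselect) and the sample titles
are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a title and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(selected, rejected):
    """Return per-word log-odds weights from a selector's sample decisions."""
    pos = Counter(w for t in selected for w in tokenize(t))
    neg = Counter(w for t in rejected for w in tokenize(t))
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    # Add-one smoothing keeps words seen on one side only from
    # producing infinite weights.
    return {
        w: math.log((pos[w] + 1) / pos_total)
        - math.log((neg[w] + 1) / neg_total)
        for w in vocab
    }


def preselect(weights, new_titles, keep):
    """Rank new titles by summed word weights and keep the top `keep`."""
    def score(title):
        # Words never seen in the sample data contribute nothing.
        return sum(weights.get(w, 0.0) for w in tokenize(title))
    return sorted(new_titles, key=score, reverse=True)[:keep]


# Hypothetical sample data from a selector of a malaria report.
selected = ["Malaria vaccine trial in children",
            "Malaria parasite drug resistance"]
rejected = ["Knee surgery outcomes", "Dental implant survival"]
weights = train(selected, rejected)

# Hypothetical new additions for the week.
new_titles = ["Hip surgery recovery",
              "New malaria vaccine candidate",
              "Dental caries in adults"]
shortlist = preselect(weights, new_titles, keep=1)
print(shortlist)
```

In a real deployment the value of keep would be a few hundred rather
than one, so that the selector sees a short ranked list instead of the
full weekly intake.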
-
3. Who do you intend to impact with the project and how do you understand their needs? (200 words)
-
We have two types of users.
The first type of users are the selectors. We expect selectors to be
academics, patients with a certain disease, or journalists. All
selectors have to stay on top of the literature in their field. PubMed
is the premier source for many of them. The traditional way to access
PubMed is through keyword searches. They are tedious to use. They do not
produce good results unless the topic has a very specific jargon. We
are convinced that machine learning will bring out more relevant
papers than, say, PubMed alerts. The first benefit to selectors is
for them to get better information about the topic. But there is a
second benefit for selectors. Selectors will be publicly
acknowledged. Thus selectors get name recognition as thought leaders
in a group of people who have a similar interest.
The second type of users are the readers of report issues. These
users can comprise anybody with an interest in a biomedical
issue. Anybody can sign up at no cost. We make all the reports
available as open data for reuse, in simple XML files.
-
4. Please list team members and their qualifications (400 words)
-
There are two team members. We both feel it is important to
understand where we are coming from.
Thomas Krichel was born in Germany in 1965. He studied economics in
France. When he worked as a research assistant in the UK, he started to
become interested in the infrastructure of publishing in economics.
In 1993, he published the first electronic research paper in economics
on a gopher server he had access to. In 1997, he founded a system
called RePEc for the publication of economics research papers. It's a
non-proprietary publishing system that over 1700 publishers of
economics papers contribute to. In 1999, he built the RePEc Author
Service. This was the first ever service to allow authors to register
their papers. In 2007 he released a co-authorship visualization site
called CollEc. Among his work in RePEc, the most salient to this
application is NEP: New Economics Papers, which he created in
1998.
Thomas Krichel has a history of creating free but self-sustaining
datasets and services. RePEc, NEP and other systems he founded are
non-proprietary and independent of external funding. They sustain
themselves by leveraging the contributions of self-interested
contributors. Thomas Krichel thus has a history of using initial grant
support to build freely available yet self-sustaining systems.
Dr Ross Mounce has a PhD in Evolutionary Biology and is a scientific
advisor with the ContentMine project (http://contentmine.org/),
mining and republishing facts from academic literature. His research
focus is on automating robust and reproducible identification and
re-use of facts from research articles to ameliorate impediments to
effective knowledge synthesis. In 2012, Ross was one of the first
awardees of a Panton Fellowship, for the promotion of open data in
science, by the Open Knowledge Foundation.
-
5. What progress, if any, have you made on this project (200 words)
-
Thomas Krichel founded the "NEP: New Economics Papers" system in 1998.
The first NEP issue had 24 papers. Manually selecting relevant papers was
adequate. In 2003, Thomas designed a purpose-built report composition
tool. The system is called ernad. It introduced machine learning into
the selection process. Thomas has been maintaining ernad ever since.
Starting in 2014, Thomas refactored ernad to make it a more
general system. It can now run several services. Each service is
customizable for look and feel. In 2015, Thomas concentrated on
improving the way the machine learning works. This is critical to
scaling it up to the level that PubMed would require. The Open Library
Society is a PubMed vendor. We have access to daily updates of PubMed
data. We have indexed the data. We find that this year, there are an
average of 3827 new papers a day. Thus we have measured the extent of
the scaling problem.
Some of our friends are ready to be selectors. But we can’t advertise the
system until we have built it and ironed out most of the bugs.
This was the body of the proposal.
Applicants were allowed to attach some media to make their point.
So Ross made this demo video.