bims: | Biomed News |
NGI Zero Discovery application 2021
•
Abstract: Can you explain the whole project and its expected outcome(s).
“Bims: Biomed News” and “NEP: New Economics Papers” are
expertise sharing systems. They support discovery without any
search. And that’s why they are very hard to understand.
Experts on a subject need to know about latest documents published on
that subject on a regular basis, say once a week. Repeated searches
are a masochist’s pleasure. Bims and NEP allow experts to maintain
reports on a subject. They select the relevant documents every week.
A bespoke piece of software called ernad processes the data about both the
selected and the non-selected documents. It then ranks next week’s
documents by likelihood of relevance. Our experts find that this is
more flexible, more precise, and more fun than searches. Yes, that is
wonderful for these experts. But the wider societal benefits only
arrive when the reports by these experts are disseminated. It’s not
much of a problem in NEP. It uses the wider array of RePEc
services. But it is an issue for Bims. Individual experts have tweeted
report issues. That’s not enough. I want a non-proprietary system
based on email. This is what this proposal intends to fund.
• Have you
been involved with projects or organisations relevant to this
project before? And if so, can you tell us a bit about your
contributions?
In 1993, I published the first online economics working paper on my
gopher server. In 1997, I led the creation of the RePEc digital
library. RePEc enabled the working paper culture in economics to
transition to digital distribution. In 1998, I created NEP. In the
early 2000s, it became clear that the growth of new items in
RePEc would mean that NEP experts would be overloaded with
work. In 2004, I pioneered machine learning in the digital
library space with the ernad software written for NEP. In 2017,
I finally found a biomedical expert to direct me for a project
to run ernad on PubMed data. Thus Bims was born. Bims’ data is
openly available in bulk via anonymous rsync.
•
Explain what the requested budget will be used for? Does the project have other funding sources, both past and present?
The budget will be spent on me working on the project for about nine
months full-time. I know it’s ridiculously little but I would rather take
little than nothing. I’m accustomed to living in poverty.
Since other work will come up, I expect to finish in about 12
to 15 months. The result will be software based on (1)
periodically appearing data stocks in XML, (2) a non-periodically
appearing set of mappings from document handles to reports in
JSON, and (3) some subscriber data in a relational
database. Subscribers will have a web-based interface to subscribe
to or unsubscribe from reports. Each email to a subscriber will be
customized. Subscribers that receive several reports will not see
the same document mentioned a second time round, if it was included
in a different report sent earlier. This is to encourage
subscriptions to many, potentially highly-specialized reports. The
system will be written in Python, JavaScript and XSLT. It
will be able to handle any type of incoming XML. The XSLT
customizations for NEP and Bims will be included as examples.
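To make the customization rule concrete, here is a minimal sketch in Python of how repeats could be suppressed for one subscriber. It is an illustration under assumptions, not the planned implementation: the JSON layout, the handles, the report names and the function names are all hypothetical.

```python
import json

# Hypothetical mapping from document handles to the reports that selected
# them. The real JSON layout is not fixed yet; this shape is an assumption.
MAPPING_JSON = """
{
  "doc:1": ["report-a", "report-b"],
  "doc:2": ["report-a"],
  "doc:3": ["report-b"]
}
"""


def handles_by_report(mapping):
    """Invert {handle: [report, ...]} into {report: [handle, ...]}."""
    by_report = {}
    for handle, reports in mapping.items():
        for report in reports:
            by_report.setdefault(report, []).append(handle)
    return by_report


def customized_issues(subscribed, send_order, mapping):
    """Build one (report, handles) pair per email for a single subscriber.

    Reports are processed in sending order. A handle that already went out
    in an earlier report is left out of the later ones.
    """
    by_report = handles_by_report(mapping)
    seen = set()
    issues = []
    for report in send_order:
        if report not in subscribed:
            continue
        fresh = [h for h in by_report.get(report, []) if h not in seen]
        seen.update(fresh)
        issues.append((report, fresh))
    return issues


mapping = json.loads(MAPPING_JSON)
issues = customized_issues(
    {"report-a", "report-b"}, ["report-a", "report-b"], mapping
)
for report, handles in issues:
    print(report, handles)
```

In this sketch the subscriber receives doc:1 in the first report only, so the second email lists just doc:3. In the real system the set of handles already sent would presumably be kept with the subscriber data in the relational database rather than in memory.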
As for other funding: An initial version of ernad was
written in 2004 using leftover funds from the JISC-funded WoPEc
project. But none of that code is left in contemporary versions that
power NEP and Bims. NEP has on occasion featured sponsored
advertising. Bims has had no funding ever. Getting some external
financial support, regardless of the amount, will boost Bims’
credibility. As it stands, it’s just quite literally incredible.
• Compare your
own project with existing or historical efforts.
The precedent for Bims is NEP. There is no precedent for NEP. When I
introduced machine learning to NEP in 2004 it was the first time that
machine learning was used on bibliographic data. Right now, the
closest there is to Bims is the LitSuggest system that the National
Library of Medicine in the US have brought out this year. But
LitSuggest features no dissemination, which is what this proposal is
about.
There are email list systems, such as Mailman and
Sympa. I have been using Mailman for NEP for over 15 years. NEP does
not use most of the features of Mailman, and many of these unused
features, such as owner addresses, are in fact a nuisance. I suspect
this is one of the reasons why smaller organizations rely on companies
like ConstantContact, MailChimp and suchlike. Basically, this proposal
aims to come up with a replacement for the software built and used by
these companies.
•
What are significant technical challenges you expect to
solve during the project, if any?
I suppose that for most people running their own email server is a
challenge. The problem with building software that runs on top of
email is working around the need to configure the mailer. For me,
though, it’s not rocket science. I’ve been running my own email
and Mailman mailing list systems for over twenty years. I will use
Django and see what I can use from the Mailman3 code. I will try
to incorporate the software that I write into Mailman3. But at
this stage, I can’t confirm that I will be able to do that with a
reasonable workload.
• Describe the
ecosystem of the project, and how you will engage with relevant
actors and promote the outcomes?
NEP and Bims are enabling blocks for reform of scholarly
communication. Scholarly communication suffers from the
stranglehold of publishers. They extract obscenely large rents
from intermediating between academics. Given that power structure,
swift change is impossible. NEP and Bims aim at
disintermediation. For Bims that effect will become rather
powerful as a preprint culture is developing in the biomedical
sciences. Up until now, the recruitment of experts for Bims has
been very slow. I believe that setting up the email system as
proposed here will be a turning point in the development of
Bims. Once we have more researchers on board we will be in a more
convincing position to attract other experts, such as journalists,
sufferers of chronic diseases and support organisations for rare
diseases. Research will get to readers much faster than before,
and the oligopoly currently enjoyed by highly visible outlets will
come under pressure.
Having said that, this proposal
is for an email system. I will keep the system configurable. It
will be able to run on various collections. How we will get others
to use its code is something that I’m not sure of at this
point.
I hope all this was not too boring a read!