bims: Biomed News
RFI: Pathways to AI-enabled research
• Title of submission
Expertise sharing systems
• Describe the AI-enabled tool or application (200 words maximum)
The tool is best understood in its historical context. In 1993, I
started working on projects to improve the dissemination of working
papers in economics. These projects seeded what would become the RePEc
digital library. In 1998, I created “NEP: New Economics Papers”. It
provides readers with subject-specific reports about new working
papers. These reports are compiled by volunteer editors. As RePEc grew,
I had to cut the workload of editors. In 2003, I created bespoke
software called ernad. It sorts the weekly additions to RePEc by
subject relevance. Ernad is, to my knowledge, the first purely AI-driven
bibliographic retrieval tool. In 2017, I found a biomedically trained
partner in Gavin McStay. He leads a PubMed-based ernad implementation
called “Bims: Biomed News”.
Keeping up with PubMed’s 30k new papers a week is notoriously hard. We
have users who have been “bimsing” for years. But uptake has been very
slow. Making the data open actually hinders uptake in two ways. First,
anyone can see how many users we have, so we can’t brag about users we
do not have. Second, we have a core area that we cover well. Potential
users from outside that core area seem to be put off, thinking that it
is not for them.
• What is the value proposition and/or potential outcomes of your AI-enabled tool for facilitating the scientific process? (200 words maximum)
For my users, I have a triple value proposition. First, users
get a tool for keeping up with the most recent literature
that no search-based tool can beat. Second, without any extra
effort, I give them a place where they can demonstrate that
they are up-to-date: their selections become a public record
of their work. Third, I manage a bespoke mailing list system
that sends weekly report issues to subscribers. Gaining
subscribers can be extra work, but once you have them, you get
more name recognition.
Bims has an additional value proposition for non-academic
users: patients with long-term diseases, their carers, and
support organizations. We see great potential for support
organizations for people with rare diseases. Bims can easily
find papers related to a disease even if the disease is not
mentioned by name.
For me personally, the systems are a losing proposition. Neither
Gavin nor I receive any financial reward. Building and further
developing them takes the bulk of my working capacity. True to the
spirit of open science, we publish our hapless
attempts at getting funding at
https://biomed.news/requests_for_support.
• How will the tool support open science and/or expand access to the scientific process? (200 words maximum)
For the economics community, NEP is one of the RePEc services that keep
its working paper culture thriving. Thus economists have been able
to keep their non-commercial publication system. Over time,
RePEc has published over 1 million working papers. Computer
science had no equivalent of RePEc, so working papers have
died out in that field.
For the biomedical community, the central idea is that of expertise
sharing. You are an expert, and you demonstrate that by staying
up-to-date. At the same time, your selections diffuse your expertise.
In a rejected funding application
(https://openlib.org/home/krichel/proposals/tiumen.pdf), I
developed the idea further, beyond a current awareness service.
The idea is to use machine learning to build machine-processable
literature review objects. These can have a simple set structure:
a set of accepted documents and a set of rejected documents.
These objects can be used on their own, but they could also be
combined with Boolean operators. This would set another example of
a sustainable open science practice, based on literature reviews.
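A review object of this kind could be modeled roughly as follows. This is a minimal Python sketch, not code from ernad or from the proposal: the class name `ReviewObject`, the use of plain string identifiers (such as PubMed IDs), and the rules for combining rejections are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewObject:
    """Hypothetical machine-processable literature review.

    Each document is accepted, rejected, or simply not assessed.
    Identifiers are opaque strings (for example PubMed IDs).
    """
    topic: str
    accepted: frozenset[str]
    rejected: frozenset[str]

    def __post_init__(self) -> None:
        # A document cannot be both accepted and rejected by one review.
        overlap = self.accepted & self.rejected
        if overlap:
            raise ValueError(f"both accepted and rejected: {sorted(overlap)}")

    def __and__(self, other: "ReviewObject") -> "ReviewObject":
        # Assumed AND rule: accept what both reviews accept;
        # a rejection by either review counts as a rejection.
        return ReviewObject(
            topic=f"({self.topic} AND {other.topic})",
            accepted=self.accepted & other.accepted,
            rejected=self.rejected | other.rejected,
        )

    def __or__(self, other: "ReviewObject") -> "ReviewObject":
        # Assumed OR rule: accept what either review accepts;
        # reject only what at least one rejects and neither accepts.
        accepted = self.accepted | other.accepted
        return ReviewObject(
            topic=f"({self.topic} OR {other.topic})",
            accepted=accepted,
            rejected=(self.rejected | other.rejected) - accepted,
        )


if __name__ == "__main__":
    # Hypothetical document identifiers.
    autophagy = ReviewObject("autophagy", frozenset({"p1", "p2", "p3"}), frozenset({"p4"}))
    ageing = ReviewObject("ageing", frozenset({"p2", "p3", "p5"}), frozenset({"p1"}))
    both = autophagy & ageing
    print(both.topic, sorted(both.accepted), sorted(both.rejected))
    # (autophagy AND ageing) ['p2', 'p3'] ['p1', 'p4']
```

Documents that a review never assessed stay in neither set. That is why, under these assumptions, the object keeps both an accepted and a rejected set rather than only a list of hits: "not selected" and "not looked at" remain distinguishable after combination.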
• How does the tool mitigate harmful uses or risk associated with the technology? (200 words maximum)
First, ernad does not use neural networks, so it does not hallucinate.
Second, yes, report editors may overlook a relevant document. But
that is an error of the editor, not of the technology. Even if there
are false negatives in the training data, machine learning has
safeguards against overfitting. Better machine learning, and better
detection of which documents editors actually look at, can reduce this
risk further.
[Warning: this paragraph is somewhat difficult to
understand.]
But there is a more profound reason why
our systems are technologically risk-free. They are
tools of human intelligence more than tools of artificial
intelligence. Yes, they need AI to run.
However, any AI can only be trained on past data, whereas the task
of editors is to find what is new. Documents that contain only
old ideas will appear at the very top of the AI-based
rankings. Editors therefore work against the AI, looking for the papers
just below the top, which contain genuinely new
ideas. This is something we cannot automate; it needs actual
people. Taking part in these projects can be a gentle
introduction to open science.
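The argument about rankings can be illustrated with a toy ranker trained only on past editor decisions. This is a minimal Python sketch and not ernad's actual method: it assumes a multinomial naive Bayes scorer over lower-cased words, and the function names `train` and `rank` and the sample titles are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    # Lower-cased word tokens; a real system would use richer features.
    return re.findall(r"[a-z0-9]+", text.lower())


def train(accepted: list[str], rejected: list[str]) -> dict[str, float]:
    """Per-token log-odds weights learned from past editor decisions
    (multinomial naive Bayes with add-one smoothing). The class prior
    is omitted because it shifts every score equally."""
    acc = Counter(t for doc in accepted for t in tokens(doc))
    rej = Counter(t for doc in rejected for t in tokens(doc))
    vocab = set(acc) | set(rej)
    n_acc = sum(acc.values()) + len(vocab)
    n_rej = sum(rej.values()) + len(vocab)
    return {
        t: math.log((acc[t] + 1) / n_acc) - math.log((rej[t] + 1) / n_rej)
        for t in vocab
    }


def rank(weights: dict[str, float], new_docs: list[str]) -> list[tuple[float, str]]:
    # Tokens never seen in training contribute nothing: the model can
    # only reward resemblance to what editors chose in the past.
    scored = [(sum(weights.get(t, 0.0) for t in tokens(d)), d) for d in new_docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical past editor decisions and this week's new titles.
    weights = train(
        accepted=["mitochondrial autophagy in neurons", "autophagy flux in ageing neurons"],
        rejected=["hospital staffing survey", "tax policy and hospital budgets"],
    )
    new_titles = [
        "a novel lysosome sensor in neurons",
        "hospital budgets again",
        "autophagy in neurons revisited",
    ]
    # The rehash of past topics ranks first, the lysosome paper second.
    for score, title in rank(weights, new_titles):
        print(f"{score:6.2f}  {title}")
```

In this toy run, the title that merely recombines past vocabulary scores highest, while the paper with a new term (`lysosome`) gets no credit for it and lands just below. That is the pattern the editors are described as working against.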
• Progress made to date (200 words maximum)
The sites https://nep.repec.org and https://biomed.news
have end-user information. Here I describe the technology
stack. There are two software components, ernad and nitpo.
Since 2018, ernad has had a clear separation between procedural code,
written in Python, Perl and JavaScript, and descriptive code written
in XSLT. NEP and bims are ernad implementations. They use the same
procedural code, but only partly overlapping descriptive code. Ernad
provides front-end services to report editors and readers, and back-end
services that run the AI. External software compiles the new
additions to RePEc and PubMed, respectively.
Nitpo goes back to a 2021 grant from NLnet. It emails report issues to
subscribers. For each subscriber, it prepares an individual email based on
that subscriber’s portfolio of reports, filtering out duplicates. This avoids
the bane of social media, where the same story is forwarded many times. Its
technology stack is similar to ernad’s, but it can be used independently of ernad.
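Per-subscriber duplicate filtering could look something like the following minimal Python sketch. It is not nitpo's code: the function names `compose_digest` and `render`, the report and paper identifiers, the address, and the rule that a paper is listed only under the first report in the subscriber's portfolio that selected it are all assumptions. It uses only the standard library's `email.message.EmailMessage` and sends nothing.

```python
from email.message import EmailMessage


def compose_digest(portfolio: list[str], issues: dict[str, list[str]]) -> dict[str, list[str]]:
    """Group this week's papers by report for one subscriber, listing each paper once.

    portfolio: the subscriber's report identifiers, in preference order.
    issues: this week's issue for each report, as lists of paper identifiers.
    """
    seen: set[str] = set()
    digest: dict[str, list[str]] = {}
    for report in portfolio:
        fresh = []
        for paper in issues.get(report, []):
            if paper not in seen:  # skip papers already listed under an earlier report
                seen.add(paper)
                fresh.append(paper)
        if fresh:  # omit reports whose papers were all duplicates
            digest[report] = fresh
    return digest


def render(subscriber: str, digest: dict[str, list[str]]) -> EmailMessage:
    # Build one plain-text message; delivery (e.g. via smtplib) is out of scope.
    lines = []
    for report, papers in digest.items():
        lines.append(f"== {report} ==")
        lines.extend(f"  {paper}" for paper in papers)
        lines.append("")
    msg = EmailMessage()
    msg["To"] = subscriber
    msg["Subject"] = "New papers in your reports"
    msg.set_content("\n".join(lines))
    return msg


if __name__ == "__main__":
    # Hypothetical reports and paper identifiers.
    issues = {
        "report-a": ["pmid:101", "pmid:102"],
        "report-b": ["pmid:102", "pmid:103"],
        "report-c": ["pmid:101"],
    }
    digest = compose_digest(["report-b", "report-a", "report-c"], issues)
    print(render("reader@example.org", digest).get_content())
```

Here `pmid:102`, selected by two reports, is listed once under `report-b`, and `report-c` disappears from this subscriber's email because its only paper was already listed.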
While NEP and bims use the same software, they live in different
environments. NEP covers most of economics in broad reports that rarely
change editorship. Bims covers a core area, its selectorships change more
often, and it has far fewer readers. It also has more potential.