bims: rfi pathways to ai enabled research
Title of submission
expertise sharing systems
Describe the AI-enabled tool or application (200 words maximum)
The tool is best understood by its historical context. In 1993, I started working on projects to improve the dissemination of working papers in economics. These projects seeded what would become the RePEc digital library. In 1998, I created “NEP: New Economics Papers”. It provides readers with subject-specific reports about new working papers. These reports are compiled by volunteer editors. As RePEc grew, I had to cut the workload of editors. In 2003, I created bespoke software called ernad. It sorts the weekly additions to RePEc by subject relevance. Ernad is the first purely AI-driven bibliographic retrieval tool. In 2017, I found a biomedically trained person in Gavin McStay. He leads a PubMed-based ernad implementation called “Bims: Biomed News”. Keeping up with PubMed’s 30k new papers a week is notoriously hard. We have users who have been “bimsing” for years. But uptake has been very slow. Making the data open actually hinders uptake in two ways. First, we can’t brag about users we do not have. Second, we have a core area that we cover well. Potential users from outside that core area seem to be put off, thinking that it is not for them.
What is the value proposition and/or potential outcomes of your AI-enabled tool for facilitating the scientific process? (200 words maximum)
For my users, I have a triple value proposition. First, users get a tool for keeping up with the most recent literature that no search-based tool can beat. Second, without any extra effort, I give them a place where they can demonstrate that they are up-to-date. I make a public record of their work available. Third, I manage a bespoke mailing list system, where weekly report issues are sent to subscribers. Gaining subscribers can be extra work but once you have them, you get more name recognition.
Bims has an additional value proposition for non-academic users. These are patients with long-term diseases, their carers and support organizations. We see great potential for support organizations for people with rare diseases. Bims can easily find papers related to a disease even if the disease is not mentioned by name.
For me personally, the systems are a loss proposition. Neither Gavin nor I get any financial reward. The systems take the bulk of my labor to build and develop further. True to the spirit of open science, we publish our hapless attempts at getting funding at https://biomed.news/requests_for_support.
How will the tool support open science and/or expand access to the scientific process? (200 words maximum)
For the economics community, NEP is part of the RePEc services that keep its working paper culture thriving. Thus economists have been able to keep their non-commercial publication system. Over time, RePEc has published over 1 million working papers. Computer science had no equivalent of RePEc, and working papers have died out there.
For the biomedical community, the central idea is that of expertise sharing. You are an expert and you demonstrate that by staying up-to-date. At the same time, your selections diffuse your expertise.
In a rejected funding application at https://openlib.org/home/krichel/proposals/tiumen.pdf I developed the idea further, i.e., beyond a current awareness service. The idea is to use machine learning to build machine-processable literature review objects. These can have a simple set structure of accepted and rejected documents. Such objects can be used on their own. They could also be combined using Boolean operators. This would set another example of sustainable open science practice based on literature reviews.
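A minimal sketch of what such a literature review object could look like, written in Python. The class name `ReviewObject`, the helper `review`, the document identifiers, and the exact meaning given to each Boolean operator are assumptions made for illustration. They are not taken from the proposal or from ernad. Documents that a review has not seen are treated as undecided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewObject:
    """Hypothetical literature review: two disjoint sets of document ids."""
    accepted: frozenset
    rejected: frozenset

    def __post_init__(self):
        # A document cannot be both accepted and rejected in one review.
        if self.accepted & self.rejected:
            raise ValueError("accepted and rejected sets must be disjoint")

    def __and__(self, other):
        # Assumed meaning of AND: accepted only if both reviews accept,
        # rejected as soon as either review rejects.
        return ReviewObject(self.accepted & other.accepted,
                            self.rejected | other.rejected)

    def __or__(self, other):
        # Assumed meaning of OR: accepted if either review accepts,
        # rejected only if both reviews reject.
        return ReviewObject(self.accepted | other.accepted,
                            self.rejected & other.rejected)

    def __invert__(self):
        # Assumed meaning of NOT: swap the two sets.
        return ReviewObject(self.rejected, self.accepted)


def review(accepted, rejected):
    """Convenience constructor taking any iterables of document ids."""
    return ReviewObject(frozenset(accepted), frozenset(rejected))


# Two made-up reviews over made-up document ids.
a = review({"doc1", "doc2"}, {"doc3"})
b = review({"doc2", "doc3"}, {"doc4"})

both = a & b      # accepted: doc2; rejected: doc3, doc4
either = a | b    # accepted: doc1, doc2, doc3; rejected: nothing
```

Under these assumptions, combining two reviews never produces a document that is both accepted and rejected, so the result is again a valid review object.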
How does the tool mitigate harmful uses or risk associated with the technology? (200 words maximum)
First, ernad does not use neural networks. It does not hallucinate.
Second, yes, report editors may overlook a document that is relevant. But that is an error of the editor, not of the technology. Even if there are false negatives in the training data, machine learning has safeguards against overfitting. Improved machine learning and improved detection of what editors look at can reduce this risk further.
[Warning: this paragraph is somewhat difficult to understand.]
But there is a more profound reason why our systems are technologically risk-free. They are more human intelligence tools than artificial intelligence tools. Yes, AI is required to make them run. However, any AI can only be trained on past data. But the task of editors is to find what is new. Documents that contain only old ideas will appear at the tippy top of the AI-based rankings. Thus editors work against the AI to find the papers that are just below the tippy top. Those papers have actually new ideas. This is something we cannot automate. We need actual people. Taking part in these projects can be a gentle introduction to open science.
Progress made to date (200 words maximum)
The sites https://nep.repec.org and https://biomed.news have end-user information. Here I talk about the technology stack. There are two software components, ernad and nitpo.
Since 2018, ernad has had a clear separation between procedural code, written in Python, Perl and JavaScript, and descriptive code written in XSLT. NEP and bims are ernad implementations. They use the same procedural code, but only partly overlapping descriptive code. Ernad provides front-end services to report editors and readers, and backend services that run the AI. External software compiles new additions to RePEc and PubMed, respectively.
Nitpo goes back to a 2021 grant from NLnet. It emails report issues to subscribers. It prepares an individual email for each subscriber, based on the subscriber’s report portfolio, to filter duplicates. It avoids the bane of social media where the same stories are forwarded many times. Its technology stack is similar to ernad’s. But it can be used independently of ernad.
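The duplicate filtering can be sketched in a few lines of Python. This is an illustration only, not nitpo's actual code. The function name `compose_email`, the report names, and the paper identifiers are all made up. The sketch assumes each report issue is a list of paper identifiers and that a paper appearing in several of a subscriber's reports should be listed once.

```python
def compose_email(portfolio, issues):
    """Merge the report issues in one subscriber's portfolio.

    portfolio: list of report names the subscriber follows (hypothetical).
    issues: dict mapping a report name to its list of paper ids this week.
    Returns a dict mapping each paper id to the reports that selected it,
    so every paper appears only once in the subscriber's email.
    """
    papers = {}
    for report in portfolio:
        for paper in issues.get(report, []):
            # First sighting creates the entry; later ones only add the report.
            papers.setdefault(paper, []).append(report)
    return papers


# Made-up weekly issues for three made-up reports.
issues = {
    "report-a": ["paper1", "paper2"],
    "report-b": ["paper2", "paper3"],
    "report-c": ["paper4"],
}

# A subscriber who follows two of the three reports.
email = compose_email(["report-a", "report-b"], issues)
```

Here `paper2` was selected in both reports but shows up once in `email`, together with the names of both reports.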
While NEP and bims use the same software, they live in different environments. NEP covers most of economics in broad reports that rarely change editorship. Bims covers a core area, has more variable selectorships and has far fewer readers. It has more potential.