bims: Biomed News
Astera Fellowship Application
1. Your professional background
I was a university professor in economics and library science. I found
my mission as a digital librarian for scholarly communication. I do
pioneering work that is hard to communicate. Thus I have to start
on my own in the hope that somebody will join once I have already done
a good chunk of unfunded work. Thus I walk the walk rather
than talk the talk. Since it’s a long walk, eventually I get something
to brag about. When I was an economics professor in 1993, I published
the first economics working paper on the Internet, at a time when my
colleagues did not know what the Internet was. Out of that grew the
RePEc project, the first distributed subject repository in 1997, at a
time when people did not know what a distributed repository was. In
1999, I created the RePEc Author service, the first author
registration service ever and the only one to date that produces open
data. In 1998, I created “NEP: New Economics Papers”. There, experts
classify new papers into subject-specific areas. As RePEc grew, I
built a bespoke machine-learning-powered presorting system called
ernad. Born in 2003, ernad was the first bibliographic AI tool. In
2017, I found a mate with biomedical expertise. I created “bims:
Biomed News”. It is an ernad implementation based on PubMed. Enhancing
ernad is now my main concern. Thus using machine teaching rather than
searching to interact with the literature is the key technical theme
for this application.
2. A problem in communication among scientists
Open Science is something that scientists do when funders
require it. Otherwise, it’s an extra hassle they can live
without. So to get people to adopt open science, you need a tool
that does better at a job they already have to do, and that throws
in the open aspect for good measure, without any extra
labour. Interaction with the literature provides a very rare sweet
spot where we can do both better and open.
So, having said this by way of introduction, the “problem in
communication among scientists” I have been working on is the
interaction with the literature, usually known as “scholarly
communication”. There are two issues. One is to survey the
literature. The other is to stay abreast of changes in the
literature. Each issue requires a different tool. bims and NEP are
confined to the second issue. You need to watch the latest
literature, so if you open a Biomed News report, you get a very
good tool for it, and the results are publicly available. You are
an expert in the topic and you can demonstrate that you are. You
share your expertise without any extra effort. The result is an
expertise sharing system.
3. Your proposed solution to this problem
A new system would use an improved end of ernad for literature
reviews. Yes, lit reviews are terribly unsexy. But everybody has
to do them. What if we could use the machine-learning approach to
make them faster and more fun? Each review would generate a triple
of (1) a set of positive examples, (2) a set of negative examples,
and (3) the extent of the dataset from which the results are
taken. We can make these triples public, with some provenance data. Then
other researchers can reuse these datasets, plug them into our
machine learning system or reuse them in competing systems. Better
still, users can take several reviews and combine them using
simple set operations. This would be another feature that would
make reusable lit reviews so much more valuable.
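As a rough illustration of how such triples might be combined, here is a minimal Python sketch. It is not ernad code. The `Review` class, its field names, the paper ids, and the rule that an acceptance in one review overrides a rejection in another are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    """One published review, stored as the triple described above."""
    positive: frozenset  # ids of papers the reviewer accepted
    negative: frozenset  # ids of papers the reviewer rejected
    extent: frozenset    # ids of all papers the review was drawn from


def union(a: Review, b: Review) -> Review:
    """Papers accepted by either review (assumed rule: acceptance wins)."""
    positive = a.positive | b.positive
    negative = (a.negative | b.negative) - positive
    return Review(positive, negative, a.extent | b.extent)


def intersection(a: Review, b: Review) -> Review:
    """Papers accepted by both reviews, limited to their shared extent."""
    extent = a.extent & b.extent
    positive = a.positive & b.positive
    negative = ((a.negative | b.negative) & extent) - positive
    return Review(positive, negative, extent)


# Hypothetical paper ids, for illustration only.
a = Review(frozenset({"p1", "p2"}), frozenset({"p3"}),
           frozenset({"p1", "p2", "p3", "p4"}))
b = Review(frozenset({"p2", "p3"}), frozenset({"p5"}),
           frozenset({"p2", "p3", "p5"}))

either = union(a, b)
both = intersection(a, b)
print(sorted(either.positive), sorted(either.negative))
print(sorted(both.positive), sorted(both.negative))
```

Keeping the extent alongside the examples matters because a paper missing from both example sets may simply never have been in the dataset the reviewer saw.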
We expect that your ideas are a starting point for future iterations.
Actually, I expect that too.
(Please let us know explicitly if you are flexible and willing to collaboratively shape your idea prior to starting a potential fellowship at Astera)
Sure. I love to get free consultancy. There is an awful lot to this that does not fit into two pages.
4. The landscape of competitive products and services, and how your solution differs
There is a graveyard of startup companies that offer scientists
ways to connect with papers. I guess the most famous is Meta,
bought and closed by Chan Zuckerberg. I suppose they figured
they could not make money with this. Some of those still alive
are Researcher-App, Research Rabbit, Scispace, and
LitSuggest. LitSuggest is the one that is closest to bims. Its
last new feature dates from October 2022. As for all the others,
they appeal to casual users. Machine teaching only works if you
have a sustained user need. And the other sites keep the usage
private. The public nature of usage is a key distinguishing
aspect of ernad implementations. In addition to being completely
original in my approach, I hope to outlive the competition by
being cheaper and more open. If we go along with the lit review
idea, then there is a chance to make it work as a
software-as-a-service business. There will be a free layer, but if
you want to get it done fast, you have to pay so that we can buy
the computing resources to serve you better.
5. Your skills, gaps in your skills that you have identified,
I am a strong lateral thinker and a decent coder. I am totally poor
at bombastic prattle. You will have noted here that I state
things as they are rather than as what I think you may want to
read. The open nature of my work prevents projects from appearing
bigger than they are.
and how these gaps might be complemented by team members or future team members.
In the past, my disciples have not been much better at promoting our work than I am.
We value your honest self-assessment.
I may well have been too honest.
6. Openness of code, data, and other resources is strongly preferred.
All the data has been available since the start. Software
written during the fellowship will be made available if
required.
If you believe that keeping some resources proprietary is
necessary for the success of your project, please explain what
you intend to keep proprietary, how doing so would increase your
project's value for the public good, and whether making it open
instead necessarily precludes that impact.
My projects are more about open data than open source. Sure, you
can do both open data and open source if you have the open pocket
of a generous funder.
7. Quarterly milestones you would like to achieve in a 1-year fellowship
Assuming that I get a coder and we work on a machine-learning-based
reviewing tool, we could do a back end for a literature
survey based on machine teaching in six months, and a no-frills
user interface in the next six months, while still debugging the
back end.
8. Budget
(include major expense categories (compute costs, equipment,
travel, etc) and personnel costs broken down by individual
employees/contractors)
Well, given that we will still be discussing the ideas, let me be
brief here. I don’t expect there to be much compute cost. I
have five sponsored servers at this time. One more would be good
but not critical. Personally, I have used the same PC for 10 years
and I don’t plan to travel anywhere. If I get anything more than my
salary, I would rather spend it on a coder. That would free me to
document the code that the coder would write.
9. Whether you prefer to be on-site, remote, or hybrid.
On site is best, but at what cost? Paying thousands in rent in
the Bay Area is OK as long as the fellowship still yields some
net savings for me. I live on $500 a month. I am
happy for every penny.