bims: Astera Fellowship Application

I was a university professor in economics and library science. I found my mission as a digital librarian for scholarly communication. I do pioneering work that is hard to communicate. Thus I have to start on my own, in the hope that somebody will join once I have already done a good chunk of unfunded work. Thus I walk the walk rather than talk the talk. Since it’s a long walk, eventually I get something to brag about. When I was an economics professor in 1993, I published the first economics working paper on the Internet, at a time when my colleagues did not know what the Internet was. Out of that grew the RePEc project, the first distributed subject repository, in 1997, at a time when people did not know what a distributed repository was. In 1999, I created the RePEc Author Service, the first author registration service ever and the only one to date that produces open data. In 1998, I created “NEP: New Economics Papers”. There, experts classify new papers into subject-specific areas. As RePEc grew, I built a bespoke machine-learning-powered presorting system called ernad. Born in 2003, ernad was the first bibliographic AI tool. In 2017, I found a mate with biomedical expertise. I created “bims: Biomed News”. It is an ernad implementation based on PubMed. Enhancing ernad is now my main concern. Thus using machine teaching rather than searching to interact with the literature is the key technical theme for this application.

Open science is something that scientists do when funders require it. Otherwise, it’s an extra hassle they can live without. So to get people to adopt open science, you need a tool that does a job they already have to do, does it better, and throws in the open aspect for good measure, without any extra labour. Interaction with the literature provides a very rare sweet spot where we can do both better and open.
So having said this by way of introduction, the “problem in communication among scientists” I have been working on is the interaction with the literature, usually known as “scholarly communication”. There are two issues. One is to survey the literature. The other is to stay abreast of changes in the literature. Each issue requires a different tool. bims and NEP are confined to the second issue. You need to watch the latest literature, so if you open a Biomed News report, you get a very good tool for it, and the results are publicly available. You are an expert in the topic and you can demonstrate that you are. You share your expertise without any extra effort. The result is an expertise sharing system.

A new system would use an improved end of ernad for literature reviews. Yes, lit reviews are terribly unsexy. But everybody has to do them. What if we can use the machine-learning approach to make them faster and more fun? Each review would generate a triple of (1) a set of positive examples, (2) a set of negative examples, and (3) the extent of the dataset from which the results are taken. We can make these triples public, with some provenance data. Then other researchers can reuse these datasets, plug them into our machine-learning system, or reuse them in competing systems. Better still, users can take several reviews and combine them using simple set operations. This would be another feature that would make reusable lit reviews so much more valuable.
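
To make the idea of combining reviews concrete, here is a minimal sketch in Python. It is an illustration only, not ernad code. The class name `Review`, the paper identifiers, and the rules for resolving conflicts (a positive vote wins, and an intersection is limited to the shared extent) are all assumptions made for this example. The extent is modelled here as a plain set of paper identifiers, although in practice it could just as well be a date range or a query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    """Hypothetical container for one literature review as three sets of paper IDs."""
    positives: frozenset  # papers the reviewer judged relevant
    negatives: frozenset  # papers the reviewer judged not relevant
    extent: frozenset     # all papers that were available to be judged

    def union(self, other):
        """Papers relevant to either review (assumed rule: a positive vote wins)."""
        positives = self.positives | other.positives
        negatives = (self.negatives | other.negatives) - positives
        return Review(positives, negatives, self.extent | other.extent)

    def intersection(self, other):
        """Papers relevant to both reviews, limited to the extent they share."""
        extent = self.extent & other.extent
        positives = self.positives & other.positives
        negatives = ((self.negatives | other.negatives) & extent) - positives
        return Review(positives, negatives, extent)


# Two made-up reviews over overlapping sets of papers.
a = Review(
    positives=frozenset({"p1", "p2"}),
    negatives=frozenset({"p3"}),
    extent=frozenset({"p1", "p2", "p3", "p4"}),
)
b = Review(
    positives=frozenset({"p2", "p3"}),
    negatives=frozenset({"p4"}),
    extent=frozenset({"p2", "p3", "p4", "p5"}),
)

either = a.union(b)        # broader topic: relevant to a or b
both = a.intersection(b)   # narrower topic: relevant to a and b
```

The combined triple has the same shape as its inputs, so it could in turn seed a new review or be combined again. Other conflict rules are equally defensible; the point is only that the published triples are enough to support such operations.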

Actually, I expect that too.

Sure. I love to get free consultancy. There is an awful lot to this that does not fit into two pages.

There is a graveyard of startup companies that offered scientists ways to connect with papers. I guess the most famous is Meta, bought and closed by Chan Zuckerberg. I suppose they figured they could not make money with this. Some of those still alive are Researcher-App, Research Rabbit, Scispace, and LitSuggest. LitSuggest is the one that is closest to bims. Its last new feature is from October 2022. As for all the others, they appeal to casual users. Machine teaching only works if you have a sustained user need. And the other sites keep the usage private. The public nature of usage is a key distinguishing aspect of ernad implementations. In addition to being completely original in my approach, I hope to outlive the competition by being cheaper and more open. If we go along with the lit review idea, then there is a chance to make it work as a software-as-a-service business. There will be a free layer, but if you want to get it done fast, you have to pay so we can buy the computing resources to serve you better.

I am a strong lateral thinker and a decent coder. I am totally poor at bombastic prattle. You will have noted here that I state things as they are rather than as what I think you may want to read. The open nature of my work keeps projects from appearing bigger than they are.

In the past my disciples have not been much better at promoting our work than me.

I may well have been too honest.

All the data has been made available since the start. Software written while at the fellowship will be made available if required.

My projects are more about open data than open source. Sure, you can do open data and open source if you have the open pocket of a generous funder.

Assuming that I get a coder and we work on a machine-learning-based reviewing tool, we could do a back end for a literature survey based on machine teaching in six months, and a no-frills user interface in the next six months, while still debugging the back end.

Well, given that we will still be discussing the ideas, let me be brief here. I don’t expect there to be much compute cost. I have five sponsored servers at this time. One more would be good but not critical. Personally, I use the same PC for 10 years and I don’t plan to travel anywhere. If I get anything more than my salary, I would rather spend it on a coder. That would free me to document the code that the coder would write.

On site is best, but at what cost? The idea of paying thousands in rent in the Bay Area is OK as long as the fellowship yields some net savings for me. I live on $500 a month. I am happy for every penny.