The Longevity Genie project aims to enhance the capacity of large language models (LLMs) to address inquiries on personal health, genetics, and longevity research. Presently, LLMs such as ChatGPT and many open-source LLAMA2-like models lack domain-specific biological knowledge, do not reference sources well, and often hallucinate and provide inaccurate responses.
We use Retrieval-Augmented Generation (RAG): passages retrieved from research papers and databases are supplied to the LLM together with the question, which extends its knowledge beyond its training data.
Additionally, we use LLM agents to search for information in longevity databases.
The project consists of the Longevity GPT part, which was deployed in spring 2023, and several open-source research modules and datasets that have not been deployed yet.
We implemented LLM agents that search for information in multiple ageing databases, and we prepared datasets of ageing-related papers.
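The RAG approach described above can be illustrated with a minimal sketch. It is not the project's implementation: the bag-of-words similarity stands in for a real embedding model and vector store, and the function names, passage texts, and source labels are all hypothetical placeholders. It shows only the two steps the text mentions, retrieving relevant passages and adding them to the prompt so the model can cite them.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=2):
    """Return up to k passages most similar to the question.

    Assumption: bag-of-words similarity replaces a real embedding model.
    Passages with zero overlap are dropped.
    """
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(p["text"]))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


def build_prompt(question, retrieved):
    """Assemble a prompt asking the model to answer from numbered sources."""
    sources = "\n".join(
        f"[{i}] ({p['source']}) {p['text']}" for i, p in enumerate(retrieved, 1)
    )
    return (
        "Answer the question using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical toy corpus; a real system would index papers and database records.
PASSAGES = [
    {"source": "paper-A", "text": "Rapamycin extends lifespan in mice by inhibiting mTOR signalling."},
    {"source": "paper-B", "text": "Telegram bots can expose chat interfaces to users."},
    {"source": "paper-C", "text": "Caloric restriction extends lifespan in several model organisms."},
]

question = "Which drugs extend lifespan in mice?"
prompt = build_prompt(question, retrieve(question, PASSAGES))
```

The resulting `prompt` string would then be sent to whichever LLM is in use; numbering the sources is one simple way to let the model reference them in its answer.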
- Develop a ChatGPT Longevity plugin, which will bring in more users and save a large part of the cost (as we will not have to pay for the ChatGPT API for part of the requests)
- Move the LLM agents that deal with longevity databases to open-source LLMs to avoid paying high ChatGPT API prices
- Implement hybrid search (a mix of vector-based and keyword-based search)
- Move new modules to production
- Improve answers to questions about:
  - Longevity activism and ageing in general
  - Genetic and drug lifespan interventions
  - Genes and gene products related to longevity
  - Ageing research articles
  - Drugs and ageing
  - Questions that require biological databases or APIs
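The hybrid search item in the list above combines results from a keyword retriever and a vector retriever. One common way to merge two such rankings is reciprocal rank fusion (RRF); the sketch below uses it purely as an illustration and is not necessarily the method the project will adopt. The document ids and the two input rankings are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of two retrievers for the same query.
keyword_ranking = ["doc3", "doc1", "doc7"]  # e.g. from a keyword index
vector_ranking = ["doc1", "doc5", "doc3"]   # e.g. from an embedding similarity search

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

RRF needs only ranks, not raw scores, so it avoids having to normalise keyword scores and embedding similarities onto a common scale.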
0. Longevity GPT creation:
In spring 2023, Nikhil Yadala deployed Longevity GPT, which used the Semantic Scholar API to augment ChatGPT answers with research papers.
The limitations of the solution were its dependence on the Semantic Scholar API (with a limit of 100 papers) and code problems that prevented open-sourcing everything. Nevertheless, many users are using Longevity GPT, and it allowed us to collect meaningful statistics on what users want.
1. Zuzalu Hackathon:
At the Zuzalu hackathon, we developed open-source agents that can use ageing research databases when answering user questions, as well as a Telegram bot interface for interacting with them.
The VitaDAO contribution of 500 VITA supported the creation of the GitHub organization and the Hugging Face datasets repository.
We have made progress in implementing vector database search with several embedding models on an ageing-related subset of the S2ORC dataset, one of the largest datasets of open-access research papers.
So far, the main blocker is the high cost of ChatGPT-powered agents, which we are trying to mitigate by switching code generation to open-source models deployed by our collaborators.
3. Collaboration with Rostock University and HEALES:
Collaboration with Rostock University may allow us to deploy open-source LLMs on Rostock GPUs, run our LLM agents on them, and start fine-tuning them to decrease our dependency on the ChatGPT API and its associated costs.