We are very fortunate to have Matthew Pye and Jules Pye providing the key learning infrastructure for climate. “Climate” is so huge that no one can understand all of it, so there’s no simple answer. Matthew and his collaborators have created a system for late high school, where learners are likely to be reasonably fluent in their first language, familiar with English as a second language, and to understand the basics of numeracy.
Matthew’s “Climate Academy” combines key Climate {facts+science} with {philosophy, politics, history}. This is unusual so it’s a great opportunity to develop a simple chatbot.
My main fear about chatbots is that the general models on the web are untrustworthy. Most of the time they’ll be reasonably OK, but occasionally they’ll hallucinate. I’ve found this especially with numbers, dates, names, references, etc. We can’t afford mistakes, so we are providing guardrails using RAG.
Wikipedia:
> Retrieval-augmented generation (RAG) is a technique that enables large language models (LLMs) to retrieve and incorporate new information from external data sources.[1] With RAG, LLMs first refer to a specified set of documents, then respond to user queries. These documents supplement information from the LLM’s pre-existing training data.[2] This allows LLMs to use domain-specific and/or updated information that is not available in the training data.[2] For example, this helps LLM-based chatbots access internal company data or generate responses based on authoritative sources.
We have started by using Matthew’s book (350 pp.) as the RAG guardrails. The first version of the chatbot explicitly shows every paragraph in the book that contributes to the answer. In this way readers can decide how good or useful the answers are. (We expect to add feedback tools soon.)
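To make the idea concrete, here is a minimal sketch of the retrieval step with traceable sources. It is not the prototype's code: the paragraph ids, the three sample paragraphs, and the helper names (`retrieve`, `build_prompt`) are all invented for illustration, and simple word-overlap cosine similarity stands in for whatever retrieval method the real system uses.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the book: paragraph id -> paragraph text.
PARAGRAPHS = {
    "ch1-p1": "Carbon dioxide traps heat in the atmosphere and warms the planet.",
    "ch2-p4": "Philosophy asks what we owe to future generations.",
    "ch3-p2": "Burning fossil fuels releases carbon dioxide into the atmosphere.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, paragraphs, top_k=2):
    """Return the ids and scores of the top_k paragraphs most similar to the question."""
    q = Counter(tokenize(question))
    scored = [
        (cosine(q, Counter(tokenize(text))), pid) for pid, text in paragraphs.items()
    ]
    scored = [(s, pid) for s, pid in scored if s > 0]  # drop paragraphs with no overlap
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, id breaks ties
    return [(pid, round(s, 3)) for s, pid in scored[:top_k]]


def build_prompt(question, paragraphs, hits):
    """Assemble a prompt that quotes each retrieved paragraph with its id."""
    sources = "\n".join(f"[{pid}] {paragraphs[pid]}" for pid, _ in hits)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"{sources}\n"
        f"Question: {question}"
    )


hits = retrieve("What does carbon dioxide do to the atmosphere?", PARAGRAPHS)
prompt = build_prompt("What does carbon dioxide do to the atmosphere?", PARAGRAPHS, hits)
```

Because every retrieved paragraph keeps its id all the way into the prompt, the same ids can be shown to the reader next to the answer, which is what makes the answer checkable against the book.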
In implementing “Responsible” I look for trust and traceability. Trust requires the original “ground truth” text. It may not be truthful in an absolute sense but it’s persistent and accessible. Traceability means we can follow how the machine arrives at its final answer.
In the current prototype (led by Aleena Harold Peter from semanticClimate) the reply from the chatbot is supplemented by "chips" with links to the original text. There can be up to six or seven of these, so the user has a chance to see how well the chatbot is able to synthesize the information. Ultimately we hope to have readers give their own feedback.
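As a rough illustration of the chips, the sketch below turns a list of retrieved paragraphs into at most seven labelled links. Everything here is assumed for the example: the `Chip` class, the `make_chips` function, the `example.org` base URL, and the anchor-per-paragraph link scheme are hypothetical and not taken from the prototype.

```python
from dataclasses import dataclass

MAX_CHIPS = 7  # assumed cap, following the "six or seven" mentioned above


@dataclass
class Chip:
    label: str  # short text shown on the chip
    url: str    # link back to the source paragraph


def make_chips(hits, base_url="https://example.org/book", max_chips=MAX_CHIPS):
    """Build chips for the best-scoring paragraphs.

    hits is a list of (paragraph_id, score) pairs in any order.
    """
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)[:max_chips]
    return [
        Chip(label=f"{pid} ({score:.2f})", url=f"{base_url}#{pid}")
        for pid, score in ranked
    ]


# Nine hypothetical hits with scores 0.9, 0.8, ..., 0.1; only the top seven become chips.
example_hits = [(f"ch1-p{i}", 1.0 - i / 10) for i in range(1, 10)]
chips = make_chips(example_hits)
```

Showing the score on each chip is one possible design choice: it gives the reader a hint of how strongly each paragraph contributed, which they can weigh when judging the answer.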