Newsletter #52 - Google extends AI models to less resourced languages

Welcome to Nural's newsletter where you will find a compilation of articles, news and cool companies, all focusing on how AI is being used to tackle global grand challenges.

Our aim is to make sure that you are always up to date with the most important developments in this fast-moving field.

New this week: a Jobs section featuring an exciting data scientist role at the startup AxionRay.
Reach out to advertise your own tech roles!

Packed inside we have

Google extends AI models to less resourced languages
DeepMind launches smaller, more transparent AI language model
and, first steps in AI-generated novel proteins

If you would like to support our continued work from £1 then click here!

Graham Lane & Marcel Hedman

Key Recent Developments

Text and image retrieval across less resourced languages

What: Google researchers have extended image and text models to less well-resourced global languages. Datasets of image-text pairs exist for the largest languages but not for less-resourced languages such as Swahili and Hindi. The researchers trained a model using both image-text datasets and translation pairs between 100+ languages. Consequently, a text search in Hindi will bring up a better-quality set of images than previous approaches. Furthermore, the model produces more accurate text descriptions of images in both less-resourced languages and better-resourced languages such as French.
Key Takeaway: A key challenge for AI is to respect and benefit from global diversity rather than creating a global mono-culture. Work to extend AI models to less-resourced languages is an important step.

DeepMind says its new language model can beat others 25 times its size

What: DeepMind has proposed a new architecture for AI language models, called Retro, that matches the performance of much larger models. Large language models ingest and model huge amounts of text, which may contain biased or harmful content. Retro, on the other hand, is a smaller neural network that makes use of a large external database of text passages. Retrieved passages are integrated with the generated responses to improve the overall quality. This also provides greater transparency about how responses are generated and an opportunity to mitigate responses that are inappropriate or incorrect.
Key Takeaway: Expensive large language models are beyond the reach of most academic or state organisations. This DeepMind work plus the Google image-text model and the Hungarian language model (also reported in this edition) all seek to address this issue.
DeepMind blog: Language modelling at scale: Gopher, ethical considerations, and retrieval
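
The retrieval idea can be illustrated with a toy sketch. This is not DeepMind's implementation: the passages, the function names and the word-overlap similarity are all assumptions made for illustration, and Retro itself works at a far larger scale with learned representations. The sketch only shows the lookup step, in which the passages most similar to a query are fetched from an external store and can be inspected to see what informed a response.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lower-cased words; a crude stand-in for a learned text embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query."""
    query_vector = bag_of_words(query)
    ranked = sorted(
        passages,
        key=lambda passage: cosine(query_vector, bag_of_words(passage)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical external database of text passages.
PASSAGES = [
    "AlphaFold predicts the 3D structure of proteins",
    "Swahili is spoken widely in East Africa",
    "Retrieval lets a small model look up passages of text",
]

if __name__ == "__main__":
    print(retrieve("how is the structure of proteins predicted", PASSAGES, k=1))
```

Because the retrieved passages are explicit, they can be logged, audited or filtered, which is where the transparency benefit comes from.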

AI used to design new proteins

What: DeepMind's AlphaFold predicted the 3D structure of human proteins, but designing new proteins with novel chemical capabilities remains a challenge. A group of researchers addressed this problem. The process starts with proteins made up of 100 random amino acids. The 3D structure is predicted and then assessed to see if it resembles known protein structures. A random tweak is then made to see whether or not this improves the similarity. The cycle is repeated about 20,000 times. The researchers succeeded in making novel AI-designed proteins. Three of these were examined in detail and found to closely match the AI-predicted structures.
Key Takeaway: This technique generates novel proteins but there is no information about their function. This is still a long way from generating useful novel proteins but it may be a first step in that direction.
Paper: De novo protein design by deep network hallucination
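
The tweak-and-assess cycle described above can be sketched as a simple search loop. This is a minimal illustration under stated assumptions, not the researchers' code: `toy_score` is a hypothetical placeholder for the "predict the structure and assess it" step (the real work uses a neural network for this), and accepting only tweaks that do not lower the score is a simplification of whatever acceptance rule the paper uses.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 standard amino acids


def toy_score(sequence):
    """Hypothetical stand-in for structure prediction plus assessment.

    Rewards sequences that alternate hydrophobic and polar residues.
    It has no biological meaning and only gives the loop something to optimise.
    """
    hydrophobic = set("AILMFVWY")
    matches = sum(
        (residue in hydrophobic) == (index % 2 == 0)
        for index, residue in enumerate(sequence)
    )
    return matches / len(sequence)


def hallucinate(length=100, steps=20000, seed=0, score=toy_score):
    """Start from a random sequence and keep random tweaks that do not lower the score."""
    rng = random.Random(seed)
    sequence = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    best = score(sequence)
    for _ in range(steps):
        position = rng.randrange(length)
        previous = sequence[position]
        sequence[position] = rng.choice(AMINO_ACIDS)  # random single-residue tweak
        candidate = score(sequence)
        if candidate >= best:
            best = candidate  # keep the tweak
        else:
            sequence[position] = previous  # undo the tweak
    return "".join(sequence), best


if __name__ == "__main__":
    designed, final_score = hallucinate()
    print(designed, final_score)
```

The defaults mirror the figures in the article (100 amino acids, about 20,000 cycles). A real score function would be passed in through the `score` parameter.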

AI Ethics

🚀 Clearview AI on track to win U.S. patent for facial recognition technology

Controversial company Clearview AI is on track to receive a US patent for its “search engine for faces”.

🚀 Crime prediction software promised to be free of biases - new data shows it perpetuates them

A report argues that residents in poorer Black and Latino areas were “relentlessly” targeted by the crime prediction software used by U.S. police.

🚀 'Worker Data Science' can teach us how to fix the gig economy

Gig workers demand access to data to understand remuneration and work allocation models.

Jobs

Data scientist - AxionRay

Axion are looking to hire a talented NLP data science lead as they enter hypergrowth. Axion is a stealth AI decision intelligence platform start-up, funded by top VCs, working with electric vehicle engineering leaders to accelerate development.

Comp: $100k – $180k, meaningful equity!

If interested contact: marcel.hedman@axionray.com

Cool companies found this week

Culture

Palm NFT Studio - one of the new NFT (non-fungible token) ecosystems, specialising in culture & creativity (currently hosting a major project by Damien Hirst). It has raised $27 million in round B funding from investors led by Microsoft’s venture fund.

Embedded AI

Edge Impulse - provides a platform for building machine learning models working with data from small sensors and IoT (Internet of Things) devices. The company has raised $34 million in round B funding.

AI Safety

Robust Intelligence - enables stress testing of AI models in order to detect vulnerabilities and automatically prevent unwanted outcomes. The company has raised $30 million in round B funding.

And Finally ...

https://arstechnica.com/science/2021/12/getting-software-to-hallucinate-reasonable-protein-structures/?comments=1

AI/ML must knows

Foundation Models - any model trained on broad data at scale that can be fine-tuned to a wide range of downstream tasks. Examples include BERT and GPT-3. (See also Transfer Learning)
Few-shot learning - Supervised learning that uses only a small dataset to master a task.
Transfer Learning - Reusing parts or all of a model designed for one task on a new task with the aim of reducing training time and improving performance.
Generative adversarial network - Generative models that create new data instances that resemble your training data. They can be used to generate fake images.
Deep Learning - Deep learning is a form of machine learning based on artificial neural networks.

Best,

Marcel Hedman
Nural Research Founder
www.nural.cc

If this has been interesting, share it with a friend who will find it equally valuable. If you are not already a subscriber, then subscribe here.

If you are enjoying this content and would like to support the work financially then you can amend your plan here from £1/month!