• Agricultural production is negatively affected by climate change
  • There could be food shortages by 2050
  • Predicting crop yield matters to farmers and businesses, and to tackling this global challenge
  • ML and DL use has been enabled by better and larger datasets
  • However, results depend on the quality of the data and on selecting the right predictors
  • PlantVillage is an example of a start-up putting this into practice

Climate change is threatening the future of food production: extreme weather events such as droughts are becoming more prevalent, and rising global temperatures are correlated with lower yields of major food crops [1]. This poses a major problem given that the global population is expected to approach 10bn people by 2050. Evidently, a lot must be done to ensure agriculture can meet this larger demand. Fortunately, methods are emerging to help combat the problem.

A major step towards mitigating these negative effects is the ability to predict crop yield ahead of time. This not only provides certainty for farmers, and thus for the supply chains of large corporations, but also enables strategic measures to be put in place in areas expected to be adversely affected by climate-driven events. These measures range from adjusting the amount of pesticide used to choosing which crop to plant to maximise output. Knowledge of yields ahead of time opens up a wealth of opportunity.

And this realm of prediction is where machine learning comes into play...


Machine learning

Numerous papers document the use of machine learning in agriculture, dating back to at least 1995 [2]. Since then, the growing number of global datasets such as MODIS has made machine learning increasingly effective.

In relation to crop yield, ML algorithms can be descriptive or predictive, and the value of descriptive models should not be overlooked [3]. Descriptive models analyse historic datasets and draw out the key features that have affected crop yield in the past. Predictive models, on the other hand, are used to estimate what crop yields will be in the future. ML algorithms vary in their interpretability (the ability to determine which factors are the most important and how they relate to the dependent variable), meaning that the best algorithm for prediction may not be the best for description.
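One way to get a descriptive reading out of a fitted model is permutation importance: shuffle one predictor at a time and see how much the error grows. The sketch below is only an illustration of that idea. The data is synthetic, the feature names are hypothetical, and ordinary least squares stands in for whichever model is actually used.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data, invented for illustration only: the target depends strongly
# on the first feature, weakly on the second, and not at all on the third.
feature_names = ["rainfall", "temperature", "soil_ph"]  # hypothetical names
n = 300
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, n)

# Fit ordinary least squares as a stand-in for any predictive model.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

def predict(features):
    return coef[0] + features @ coef[1:]

def mse(actual, predicted):
    return float(np.mean((actual - predicted) ** 2))

# For simplicity the error is measured on the training data;
# a held-out set would be the more careful choice.
baseline = mse(y, predict(X))

importances = []
for j in range(X.shape[1]):
    X_shuffled = X.copy()
    # Break the link between feature j and the target by shuffling its column.
    X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
    importances.append(mse(y, predict(X_shuffled)) - baseline)

for name, score in zip(feature_names, importances):
    print(f"{name}: increase in MSE when shuffled = {score:.3f}")
```

A large increase in error suggests the model leans heavily on that predictor. The technique does not depend on the model type, so it can also be applied to less interpretable models such as neural networks.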

Common machine learning algorithms used in crop yield prediction papers [3]
Neural Networks (deep learning)
Linear Regression (as a benchmark algorithm)
Random Forest
Support Vector Machine
Gradient Boosting Tree
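
To illustrate what using linear regression as a benchmark can look like, here is a minimal sketch that compares it against an even simpler mean-only predictor on held-out data. Everything in it is an assumption made for illustration: the data is synthetic, and the predictor names and the relationship between them and yield are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for illustration only: 200 field-seasons, 3 made-up predictors.
n = 200
rainfall = rng.normal(500, 80, n)    # mm per season (hypothetical)
temperature = rng.normal(22, 3, n)   # degrees C (hypothetical)
ndvi = rng.uniform(0.2, 0.9, n)      # vegetation index (hypothetical)
X = np.column_stack([rainfall, temperature, ndvi])
# Invented relationship plus noise; not derived from any real crop.
y = 2.0 + 0.004 * rainfall - 0.05 * temperature + 3.0 * ndvi + rng.normal(0, 0.3, n)

# Hold out the last 25% of rows to measure out-of-sample error.
split = int(0.75 * n)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

def add_intercept(features):
    return np.column_stack([np.ones(len(features)), features])

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Benchmark 1: always predict the mean yield seen in training.
rmse_mean = rmse(y_test, np.full(len(y_test), y_train.mean()))

# Benchmark 2: ordinary least squares linear regression.
coef, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)
rmse_linear = rmse(y_test, add_intercept(X_test) @ coef)

print(f"mean-only RMSE:         {rmse_mean:.3f}")
print(f"linear regression RMSE: {rmse_linear:.3f}")
```

A more complex model, such as a random forest or a neural network, would then need to beat the linear regression error on the same held-out rows to justify its extra complexity.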

A key aspect of using a machine learning algorithm is identifying which variables to use to predict crop yield. This ultimately determines the effectiveness of the modelling exercise, and the process of identifying relevant predictors is often arduous. Fortunately, techniques such as principal component analysis (PCA) reduce the dimensionality of the dataset, which can often lead to better performance.
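
As a rough sketch of how PCA reduces dimensionality, the example below standardises a set of correlated predictors and projects them onto their first two principal components using a singular value decomposition. The data is synthetic and generated only for illustration, and the choice of two components is an assumption, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data for illustration: 6 correlated predictors that are really
# driven by 2 hidden factors plus a little noise.
n = 250
latent = rng.normal(size=(n, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.1 * rng.normal(size=(n, 6))

# Standardise each column so that no predictor dominates because of its units.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# The rows of Vt are the principal directions; the squared singular values
# are proportional to the variance captured along each direction.
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)
explained = S**2 / np.sum(S**2)

# Keep the first k components (k = 2 is an arbitrary choice for this sketch).
k = 2
X_reduced = X_std @ Vt[:k].T

print("share of variance per component:", explained.round(3))
print("reduced shape:", X_reduced.shape)
```

In practice the number of components is usually chosen by looking at how much of the variance the leading components capture, and the reduced matrix is then passed to the yield model in place of the original predictors.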

Some common features (predictors) used in a representative sample of ML and crop yield papers [3]

ML and DL (e.g. LSTM) techniques outperform traditional statistical techniques. However, there is enough variability in their results to warrant a large amount of research into whether there is an optimal algorithm to use (I doubt there will be one clear winner) [4].

It must also be noted that climate factors are not the only drivers of decreasing crop yield. Much of the technology being developed in this area is therefore also valuable for mitigating other factors, such as disease, that may adversely affect crop yield. The use cases for machine learning in crop yield prediction thus extend well beyond climate alone.


A major limitation, as with most machine learning tasks, is the quality of the data that the ML models are trained on. Several models may work only on a macro or national scale and may be of less value to individual farmers looking to use the technology on a farm-by-farm basis. Furthermore, a large number of features could feasibly be used as predictors, so determining the right features to use is a big challenge in itself. However, this has not stopped start-ups such as PlantVillage from diving in to provide AI assistance to this demographic.

Company profile - ML in action

Who: PlantVillage

What: PlantVillage has created Nuru, an AI assistant for farmers. Nuru has three components to its artificial intelligence:

1) Human expert level crop disease diagnostics using computer vision;

2) Above human capabilities in anomaly detection and forecasting based on ground and satellite derived data;

3) Human language comprehension and automated responses to questions posed by farmers

Tools: TensorFlow

[1] "Temperature increase reduces global yields of major crops in four independent estimates", Chuang Zhao, Accessed Jan 2021
[2] "Applying machine learning to agricultural data", Robert J. McQueen, Accessed Jan 2021
[3] "Crop yield prediction using machine learning: A systematic literature review", Thomas van Klompenburg, Accessed Jan 2021
[4] "Integrating Multi-Source Data for Rice Yield Prediction across China using Machine Learning and Deep Learning Approaches", Juan Cao, Accessed Jan 2021