Precision Agriculture in Citrus: Non-Destructive Nutrient Analysis with Machine Learning and Hyperspectral Imaging

Revolutionizing Citrus Farming with Sustainable Nutrient Monitoring Techniques

Guillem Miralles

16 min read · Sep 17, 2024

Publication: 2023_Miralles_Estimación.pdf

GitHub Repository: Nutritional Sleuths — GitHub Repository

Part 1: Introduction: Leveraging Technology to Address Nutrient Management Challenges in Citrus Farming

1.1. The Evolution of Nutrient Management in Citrus Farming Through Precision Agriculture

Agriculture has always been at the forefront of technological innovation, with significant progress made in recent years, particularly through the implementation of Information and Communication Technologies (ICTs). These technologies allow farmers to make informed decisions based on data, thus improving productivity and reducing environmental impact. This new approach to farming is called Precision Agriculture, where data is collected and analyzed to optimize resource use while maintaining high standards of quality.

Among the key applications of Precision Agriculture are satellite crop monitoring, automated machinery, and the use of drones for irrigation control. These tools help farmers improve yields while using fewer resources. However, one critical area where technology is still underutilized is in determining the nutritional status of crops, especially in fruit trees like citrus.

1.2. The Problem: Nutrient Management in Citrus Farming, with a Focus on My Homeland, Valencia

Citrus trees, like all living organisms, require proper nutrition to thrive. Adequate nutrition ensures that the trees can resist pests and diseases, grow healthy, and produce high-quality fruit in large quantities. Currently, farmers use fertilizers or organic manure to meet the nutritional needs of their trees. However, there is a fundamental challenge: determining the exact type and amount of nutrients each tree requires. This is not an easy task, as it involves understanding the deficiencies in the soil and the trees themselves.

This challenge is particularly relevant in my homeland, Valencia, one of the world’s leading citrus-producing regions [1]. Citrus farming has been a vital part of the Valencian economy for generations. However, farmers here must strike a delicate balance between maximizing yields and protecting the environment. Traditional destructive analyses of a tree’s nutritional status, which involve taking soil or leaf samples and sending them to labs for testing, are highly accurate but come with significant costs and time delays. Given the scale of citrus production in Valencia, many farmers are unable to conduct these tests frequently, leading to inefficient fertilization practices.

As a result, over-fertilization is common, posing serious environmental risks. Excess nutrients can wash into nearby water bodies, causing eutrophication [2], a process where water systems become overloaded with nutrients, depleting oxygen levels and threatening aquatic life.

For the farmers of Valencia, this issue also represents a financial burden. Using fertilizers without precise knowledge of each tree’s specific nutrient needs can lead to unnecessary expenses. The development of more efficient, non-destructive methods to monitor nutrient levels is crucial not only for environmental protection but also for supporting the economic sustainability of local farmers.

1.3. Defining the Challenge: Moving Towards Sustainable and Non-Destructive Nutrient Analysis in Citrus Farming

Given the limitations of traditional methods, the challenge becomes clear: how can farmers assess the nutritional status of their trees in a way that is both cost-effective and non-destructive? To address this, researchers have turned to hyperspectral imaging and machine learning. These technologies offer a powerful, innovative solution to the problem by allowing the estimation of nutrient levels without the need for destructive sampling.

Hyperspectral imaging involves capturing data across a wide range of wavelengths, both in the visible spectrum (VIS) and the near-infrared spectrum (NIR) [3]. This data provides a detailed snapshot of how the leaves reflect light at different wavelengths, which can be correlated with the presence of specific nutrients. When this imaging data is combined with machine learning models, such as Support Vector Regression (SVR), Random Forest, and AdaBoost, the system can predict the concentration of key nutrients in the leaves with high accuracy.

Acquisition of Hyperspectral Images — Photo of the Author

In this study, we focus on the use of hyperspectral imaging to estimate both primary macronutrients (like Nitrogen (N), Phosphorus (P), and Potassium (K)) and micronutrients (Manganese (Mn), Copper (Cu), Boron (B), among others) in citrus leaves. The combination of hyperspectral data and advanced regression models provides an efficient, precise, and scalable method for monitoring tree nutrition, helping farmers make informed decisions about fertilizer use. This not only reduces costs but also promotes sustainable agricultural practices by minimizing unnecessary fertilizer applications.

Part 2: Theoretical Framework: Leveraging Hyperspectral Imaging and Machine Learning for Sustainable Citrus Farming

This section explains the theoretical foundation behind these technologies and how they are applied in this study to monitor and manage nutrient levels in citrus trees effectively.

2.1 Hyperspectral Imaging: A Non-Destructive Method for Nutrient Monitoring

Hyperspectral imaging is a technology that captures images across a wide range of wavelengths, from visible light (VIS) to near-infrared (NIR), offering detailed information about the object’s physical and chemical properties. In agriculture, this technology allows us to detect subtle variations in crop health that are invisible to the naked eye.

In the case of citrus farming, hyperspectral imaging provides valuable insights into nutrient levels in the leaves by analyzing how different nutrients affect the way light is absorbed and reflected. For example, nitrogen and phosphorus absorb light at specific wavelengths, which allows us to infer their concentration in the leaves based on their spectral signature.

This study uses hyperspectral imaging to monitor macronutrients such as nitrogen (N), phosphorus (P), and potassium (K), as well as micronutrients like manganese (Mn), copper (Cu), and boron (B) [4]. By collecting data from the visible and near-infrared ranges (400 nm to 1050 nm) across 65 spectral bands, we obtain a highly detailed profile of the leaf’s reflectance properties, enabling the prediction of nutrient levels.

Hyperspectral Image Cube (hypercube) — Article: LeafSpec-Dicot: An Accurate and Portable Hyperspectral Imaging Device for Dicot Leaves
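To make the hypercube idea concrete, here is a minimal sketch of how one reflectance spectrum per leaf could be extracted from such a cube. It assumes the cube is stored as a NumPy array of shape (rows, cols, bands) and that a boolean leaf mask is already available; the function name `mean_leaf_spectrum` and the synthetic data are illustrative assumptions, not the pipeline used in the study.

```python
import numpy as np

def mean_leaf_spectrum(cube, mask):
    """Average the reflectance of all leaf pixels, band by band.

    cube: array of shape (rows, cols, bands) with reflectance values.
    mask: boolean array of shape (rows, cols), True where a pixel is leaf.
    """
    leaf_pixels = cube[mask]         # shape (n_leaf_pixels, bands)
    return leaf_pixels.mean(axis=0)  # shape (bands,)

# Synthetic stand-in for a real hypercube: 4 x 5 pixels, 65 bands.
rng = np.random.default_rng(0)
cube = rng.uniform(0.0, 1.0, size=(4, 5, 65))

# Hypothetical segmentation: pretend the leaf covers a 2 x 3 pixel block.
mask = np.zeros((4, 5), dtype=bool)
mask[1:3, 1:4] = True

spectrum = mean_leaf_spectrum(cube, mask)
print(spectrum.shape)  # (65,)
```

Each leaf then becomes one row of 65 reflectance values, which is the form the regression models discussed later expect.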

2.2. Experimental Setup: Data Collection and Model Implementation

The experimental setup for this study involved the collection of citrus leaf samples from a commercial orchard in Almenara (Castellón), in the Valencian Community. A total of 33 citrus trees of the Clemenules variety were sampled, with 60 leaves collected from each tree in different orientations (east, west, and zenith). These samples were divided into young and old leaves, representing different growth stages. From each tree, three hyperspectral images were obtained per orientation, resulting in a dataset of 5940 leaves.

The reflectance spectra from these leaves were captured using a hyperspectral imaging system equipped with two liquid crystal tunable filters (LCTFs), covering a spectral range from 400 nm to 1050 nm. After image acquisition, the leaves were subjected to destructive ionomics analysis to obtain the true nutrient levels, which were then used as ground truth for training the machine learning models.

The combination of spectral data with nutrient ground truth allowed the models to learn the relationships between reflectance and nutrient content, providing a predictive tool that can be applied to new leaf samples.

Acquisition of the set of data: Experimental Development — Author Design

2.3. Data Preprocessing: Ensuring Quality and Accuracy in Hyperspectral Data

Preprocessing the hyperspectral data is crucial to remove noise and improve prediction accuracy. Several methods were used:

Standard Normal Variate (SNV) Normalization: This method removes noise caused by external factors like leaf texture or lighting variations by standardizing the spectra to have a mean of 0 and a variance of 1, allowing the model to focus on true nutrient-related differences. [5]
Savitzky-Golay (SAVGOL) Filtering: A smoothing technique that preserves key spectral features like peaks and valleys, ensuring that important data related to nutrient levels is not lost. [6]
Savitzky-Golay with First Derivative: Highlights subtle changes in the spectra by calculating the first derivative, making it easier to distinguish between different nutrient levels.
Spectral Averaging (Mean Centering): Subtracts the mean spectrum from each individual spectrum to reduce baseline shifts and emphasize relative differences, helping to detect nutrient variations more clearly.
Savitzky-Golay + SNV: This combination applies SAVGOL smoothing followed by SNV normalization, offering both noise reduction and standardization, improving the accuracy of nutrient predictions.
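The preprocessing options listed above can be sketched with NumPy and SciPy as follows. This is a minimal illustration on synthetic spectra: the window length (7) and polynomial order (2) are assumed values chosen for the example, not the settings used in the study, and the helper names are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter

def snv(spectra):
    """Standard Normal Variate: scale each spectrum (row) to mean 0, std 1."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

def savgol(spectra, window_length=7, polyorder=2, deriv=0):
    """Savitzky-Golay smoothing (deriv=0) or derivative along the band axis."""
    return savgol_filter(spectra, window_length, polyorder, deriv=deriv, axis=1)

def mean_center(spectra):
    """Subtract the mean spectrum of the dataset from every spectrum."""
    return spectra - spectra.mean(axis=0, keepdims=True)

# Synthetic stand-in for measured data: 10 leaves x 65 bands.
rng = np.random.default_rng(0)
spectra = rng.uniform(0.1, 0.9, size=(10, 65))

smoothed = savgol(spectra)                   # SAVGOL filtering
first_derivative = savgol(spectra, deriv=1)  # SAVGOL with first derivative
centered = mean_center(spectra)              # mean centering
savgol_snv = snv(savgol(spectra))            # SAVGOL followed by SNV
```

Note that SNV works per spectrum (row-wise), while mean centering works per band (column-wise) across the whole dataset, which is why the two are not interchangeable.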

Savitzky–Golay Filter Process — Wikipedia

2.4. Dimensionality Reduction

Hyperspectral images consist of a large number of spectral bands, which leads to a high-dimensional dataset. While high-dimensional data offers more detail, it also increases the risk of overfitting, where the model learns noise rather than meaningful patterns. To address this, Principal Component Analysis (PCA) is applied to reduce the number of features while retaining the most important information. Random Forest is also used, not only as a predictive model but also as a tool for band selection, which further reduces the dimensionality of the dataset.

2.4.1. With PCA: Simplifying High-Dimensional Data

PCA is a widely-used statistical technique that transforms the high-dimensional data into a smaller set of new variables known as principal components [7]. These components are linear combinations of the original variables, ordered by how much variance in the data they explain. The first few components capture the most important patterns in the data, while the remaining components represent noise or less informative aspects.

How PCA works: PCA identifies the directions (components) in which the data varies the most and projects the data onto these new axes. By selecting only the components that explain the largest amount of variance, the dimensionality of the dataset is reduced while retaining most of the valuable information.
Results: In this study, PCA was tested with different numbers of components (e.g., 5, 7, 9, 12). It was found that using 7 components explained over 99% of the variance in the spectral data, making it possible to significantly reduce the dataset’s size without losing critical information for nutrient prediction.
Benefits: PCA reduces the risk of overfitting by eliminating irrelevant features, speeds up model training, and improves the generalization of the models on new data.
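As a rough illustration of the PCA step, the sketch below fits scikit-learn's `PCA` with 7 components and inspects the cumulative explained variance. The spectra are synthetic (a few latent factors plus noise), so the variance figures it produces say nothing about the real dataset; only the workflow is the point.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic spectra: 200 leaves x 65 bands driven by 3 latent factors
# plus a small amount of noise (an assumption made for this example).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
loadings = rng.normal(size=(3, 65))
spectra = latent @ loadings + 0.01 * rng.normal(size=(200, 65))

pca = PCA(n_components=7)
scores = pca.fit_transform(spectra)  # shape (200, 7): the new features

# Cumulative share of variance explained by the first k components.
cumulative = np.cumsum(pca.explained_variance_ratio_)
for k, share in enumerate(cumulative, start=1):
    print(f"{k} components: {share:.4f}")
```

The `scores` matrix replaces the original 65 bands as model input, and the cumulative curve is what one would inspect to decide how many components to keep.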

First Component Calculation of the PCA (source: Wikipedia)

2.4.2 Through Band Selection with Random Forest: Optimizing Data for Efficiency

In addition to PCA, Random Forest was employed as a method of feature selection, focusing specifically on identifying the most relevant spectral bands. Random Forest [8] is an ensemble learning technique that creates multiple decision trees and aggregates their predictions. One of its key advantages is that it can rank the importance of each feature (in this case, each spectral band) based on how much it contributes to the accuracy of the predictions.

How it works: During model training, Random Forest assigns an importance score to each spectral band based on its contribution to reducing prediction error. Bands with higher importance scores are more influential in determining nutrient levels. The method then selects the top bands for model training, discarding less important ones.
Results: In this study, Random Forest was used to select 3, 6, and 12 bands for different tests. These subsets of bands were chosen based on their importance scores, allowing the model to focus only on the most informative spectral bands. Using just 3, 6, or 12 bands offered a good balance between prediction accuracy and data acquisition effort: only 3, 6, or 12 reflectance bands need to be measured instead of 65, which significantly reduces acquisition time. This makes band selection a practical option for real-time applications.
Why it’s useful: Band selection via Random Forest simplifies the model by eliminating redundant or less relevant features. This not only speeds up processing but also reduces model complexity and makes the approach more suitable for deployment in real-world agricultural settings.
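The band-selection idea can be sketched with scikit-learn's `RandomForestRegressor` and its `feature_importances_` attribute. The helper `select_top_bands`, the number of trees, and the synthetic target (which depends on bands 10 and 40 by construction) are assumptions made for this example rather than details taken from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def select_top_bands(X, y, n_bands, random_state=0):
    """Return the indices of the n_bands most important bands, sorted."""
    forest = RandomForestRegressor(n_estimators=100, random_state=random_state)
    forest.fit(X, y)
    # Rank bands from most to least important.
    ranked = np.argsort(forest.feature_importances_)[::-1]
    return np.sort(ranked[:n_bands])

# Synthetic data: 150 leaves x 65 bands; the target depends only on
# bands 10 and 40, so those two should rank at the top.
rng = np.random.default_rng(0)
X = rng.uniform(size=(150, 65))
y = 3.0 * X[:, 10] - 2.0 * X[:, 40] + 0.05 * rng.normal(size=150)

top_bands = select_top_bands(X, y, n_bands=3)
X_reduced = X[:, top_bands]  # keep only the selected bands
print(top_bands, X_reduced.shape)
```

In practice the selection should be fitted on training data only, so that the choice of bands does not leak information from the test set.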

2.5. K-Fold Cross-Validation and Grid Search: Optimizing Model Performance

Once the dataset has been preprocessed, the next step is to train and fine-tune the machine learning models. K-Fold cross-validation [9] is employed to ensure that the models are robust and not overfitting. This technique splits the dataset into ‘K’ subsets (or folds). The model is trained on K-1 folds and tested on the remaining fold. This process is repeated K times, so that each subset is used as a test set once. K-Fold cross-validation provides a more reliable estimate of model performance by testing it on multiple different subsets of the data.

In addition, Grid Search [10] is used to fine-tune the hyperparameters of the models. Hyperparameters are critical to a model’s performance but cannot be learned directly from the training data.
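A minimal sketch of how K-Fold cross-validation and Grid Search fit together in scikit-learn is shown below. The model (an SVR inside a scaling pipeline), the parameter grid, the choice of K = 5, and the synthetic data are all assumptions for illustration; the study's actual grids are not given in this article.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data: 120 leaves x 12 selected bands.
rng = np.random.default_rng(0)
X = rng.uniform(size=(120, 12))
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + 0.05 * rng.normal(size=120)

# Scaling lives inside the pipeline so it is re-fitted on each training fold.
pipeline = make_pipeline(StandardScaler(), SVR())

# Every combination in this grid is evaluated: 3 x 2 = 6 candidates.
param_grid = {"svr__C": [1, 10, 100], "svr__epsilon": [0.01, 0.1]}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="r2")
search.fit(X, y)

print(search.best_params_)
print(f"mean cross-validated R2: {search.best_score_:.3f}")
```

Grid Search simply tries each hyperparameter combination, scores it with the K-Fold procedure described above, and keeps the combination with the best average score.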

2.6. Machine Learning Models: Transforming Data into Accurate Nutrient Predictions

While hyperspectral imaging provides the raw data, machine learning models are essential for interpreting that data and turning it into actionable insights for farmers. The complexity of hyperspectral data, which consists of multiple wavelengths and features, requires advanced algorithms capable of handling high-dimensional data.

In this study, five machine learning regression models [11] were evaluated to estimate the nutrient content in citrus leaves:

Linear regression was used as a baseline against which the other techniques were compared. It assumes a linear relationship between the independent variables and the dependent variable, and its main advantage is interpretability.
PLS (Partial Least Squares) is a regression method especially suited to problems where the input variables are highly correlated. It models the problem by reducing the set of variables to a smaller set of uncorrelated components.
SVR is a technique based on support vector machines. It can handle nonlinear relationships between the input variables and the dependent variable, and it is robust to outliers.
Random Forest is an ensemble regression method that combines many decision trees. Aggregating their predictions improves accuracy and robustness.
AdaBoost is a boosting-based regression method that builds a strong model from several weak models.

2.7 Results and Model Performance Evaluation

To evaluate the performance of each machine learning model, several metrics were used, including the coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE) [12] [13]. These metrics provide insight into how well the model predictions align with the actual nutrient levels in the citrus leaves.

The R² value represents the proportion of the variance in the dependent variable (in this case, the actual nutrient levels) that is predictable from the independent variables (the spectral data). It provides insight into how well the model captures the relationship between input and output.
MAE measures the average magnitude of the errors in a set of predictions, without considering their direction (whether the predictions are too high or too low). Essentially, it tells us how far, on average, the model’s predictions are from the actual nutrient levels.
RMSE also measures the differences between predicted and actual values, but it squares the errors before averaging them and then takes the square root of the average. This squaring means that RMSE gives more weight to larger errors, making it useful when large errors are more problematic.
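The three metrics described above can be computed directly from their definitions. The sketch below does so with NumPy on four made-up nutrient values (not data from the study), so the arithmetic is easy to follow by hand.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (R2, MAE, RMSE) computed from their textbook definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_pred - y_true
    mae = np.mean(np.abs(errors))            # average absolute error
    rmse = np.sqrt(np.mean(errors ** 2))     # penalizes large errors more
    ss_res = np.sum(errors ** 2)             # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    r2 = 1.0 - ss_res / ss_tot
    return r2, mae, rmse

# Made-up example values (e.g. nutrient concentration in %).
y_true = [2.0, 2.5, 3.0, 3.5]
y_pred = [2.1, 2.4, 3.3, 3.4]

r2, mae, rmse = regression_metrics(y_true, y_pred)
print(round(r2, 3), round(mae, 3), round(rmse, 3))  # 0.904 0.15 0.173
```

The single 0.3 error pulls RMSE (0.173) above MAE (0.15), which shows how RMSE weights larger errors more heavily.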

Part 3: Results, Conclusions, and Future Directions

3.1 Results: Model Performance and Key Findings

The results of this study demonstrate the effectiveness of hyperspectral imaging combined with machine learning models for estimating nutrient levels in citrus leaves. The models showed varying levels of success across different nutrient categories, with the following key findings:

3.1.1. All Spectral Bands

Macronutrient Estimation: The machine learning models performed particularly well in estimating the levels of primary macronutrients such as nitrogen (N), phosphorus (P), and potassium (K). The Random Forest model achieved the highest accuracy for nitrogen estimation, with an R² value of 0.853, indicating a strong correlation between the spectral data and the actual nitrogen levels. The best R² values for phosphorus and potassium were 0.762 and 0.651, respectively.

Macronutrients Estimation Results: All the bands were used for this estimation.

Micronutrient Estimation: Predicting micronutrient levels was more challenging due to the lower concentrations and complex spectral signatures of nutrients like magnesium (Mg), calcium (Ca), and sulfur (S). However, AdaBoost achieved promising results, particularly for calcium, with an R² of 0.769.

Micronutrient Estimation Results: All the bands were used for this estimation.

3.1.2. Spectral Band Selection (3, 6, and 12)

Macronutrient Estimation: The machine learning models performed particularly well in estimating primary macronutrients such as Nitrogen (N), Phosphorus (P), and Potassium (K), with reduced band configurations offering promising results:

With 3 bands, NuSVR achieved the highest accuracy for Nitrogen, with an R² value of 0.815 for calibration and 0.675 for test, demonstrating strong predictive power with minimal data. This indicates that a small set of carefully selected bands can still provide reliable predictions.
For Phosphorus (P), using 12 bands yielded the best result, with the Extra Trees model achieving an R² of 0.762.
Potassium (K) also showed solid results with 12 bands, reaching an R² of 0.654.

Macronutrients Estimation Results: Using Bands Selection

Micronutrient Estimation (Mg, Ca, S): Micronutrient estimation was more challenging due to the lower concentrations and complex spectral signatures of nutrients like Magnesium (Mg), Calcium (Ca), and Sulfur (S). However, band selection still provided valuable insights:

For Calcium (Ca), the 12-band configuration produced the best results, with PLS achieving an R² of 0.809.
Magnesium (Mg) was best predicted with 6 bands, achieving an R² of 0.613 with the K-Neighbors model.
Sulfur (S) also showed better results with 12 bands, reaching an R² of 0.590.

Micronutrients Estimation Results: Using Bands Selection

The results of this study underscore the versatility and power of hyperspectral imaging combined with machine learning models for nutrient estimation in citrus leaves. The ability to use all spectral bands provided the highest accuracy in general. However, the exploration of reduced spectral band configurations (3, 6, and 12 bands) revealed that fewer bands can still deliver reliable predictions, especially for macronutrients, offering a more practical and efficient approach for real-time field applications. While micronutrient estimation posed more challenges due to lower concentrations and complex spectral signatures, the results show that using targeted band selection can still yield valuable insights.

Ultimately, the study demonstrates that band selection can enhance both the efficiency and accessibility of hyperspectral imaging technologies, paving the way for more sustainable and precise nutrient management in agriculture.

3.2 Conclusions: Advancing Sustainable Nutrient Management in Citrus Farming

This study highlights the potential of combining hyperspectral imaging and machine learning for addressing one of the key challenges in modern agriculture: how to monitor and manage nutrient levels in crops without resorting to destructive and costly methods. In the context of Valencia, where citrus farming plays a vital economic role, these technologies offer a promising solution to the long-standing issues of over-fertilization and nutrient imbalances.

The key conclusions of this study include:

Hyperspectral imaging provides a comprehensive, non-invasive method for capturing nutrient-related data, making it possible to monitor crop health more frequently and with greater accuracy.
Machine learning models can effectively predict the nutritional levels. However, in this project, advanced neural networks were not used due to the limited amount of data available, as nutrient analysis through destructive techniques is both time-consuming and costly. Therefore, in future projects, given the success of this approach, a larger data acquisition should be considered to enhance the results.
This combination of technologies can help farmers in regions like Valencia reduce their reliance on fertilizers, optimize resource use, and minimize their environmental footprint, promoting more sustainable agricultural practices.

3.3 Future Directions: Enhancing the Precision of Nutrient Estimation

While this study has demonstrated the feasibility of using hyperspectral imaging and machine learning to estimate nutrient levels in citrus trees, there are several areas for further research and development:

Improving Micronutrient Estimation: Future work should focus on enhancing the models’ ability to accurately predict micronutrient levels. This may involve experimenting with more advanced machine learning techniques, such as deep learning, to capture the subtle spectral differences associated with these nutrients.
Field Testing and Real-Time Monitoring: While the results of this study are promising, the next step is to implement these models in real-time field conditions. Integrating this technology into precision farming tools, such as drones or mobile platforms, could enable continuous, automated monitoring of crop health.
Expanding to Other Crop Types: Although this study focused on citrus farming, the methodology could be applied to a wide range of crops. Future research could explore the adaptation of these techniques for other high-value crops, extending the benefits of non-destructive nutrient monitoring to other agricultural sectors.

It has been a pleasure to share this project with you. You can view the complete work and all related articles at the following link:

Nutritional Sleuths — GitHub Repository

Feel free to explore the full study, presentations, and articles.

Resources

[1] FreshPlaza contributors. (2024, September 19). Spain continues to be the world’s leading trader of fresh citrus fruits. https://www.freshplaza.com/europe/article/9641975/spain-continues-to-be-the-world-s-leading-trader-of-fresh-citrus-fruits

[2] Nature Communications. (2019). Article: Evidence for multiple testing in nature and its implications. https://www.nature.com/articles/s41467-019-09100-5

[3] Specim. (2024). What is hyperspectral imaging? https://www.specim.com/technology/what-is-hyperspectral-imaging/

[4] Cleveland Clinic. (2024). Macronutrients vs. Micronutrients: What’s the Difference? https://health.clevelandclinic.org/macronutrients-vs-micronutrients

[5] STAT 500 contributors. (2024). The Standard Normal Distribution. Statistics Online. https://online.stat.psu.edu/stat500/lesson/3/3.3/3.3.2

[6] Savitzky-Golay Filtering. Stanford University Lecture Notes. https://web.stanford.edu/class/archive/cs/cs109/cs109.1202/lectureNotes/LN10_normal_gaussian.pdf

[7] Lever, J., Krzywinski, M., & Altman, N. (2017). Principal component analysis. Nature Methods, 14, 641–642. https://www.nature.com/articles/nmeth.4346

[8] O’Sullivan, S. (2019). Variable Importance in Random Forests. Towards Data Science. https://towardsdatascience.com/variable-importance-in-random-forests-20c6690e44e0

[9] Brownlee, J. (2020). A Gentle Introduction to k-Fold Cross-Validation. Machine Learning Mastery. https://machinelearningmastery.com/k-fold-cross-validation/

[10] Chatterjee, S. (2018). GridSearchCV for Beginners. Towards Data Science. https://towardsdatascience.com/gridsearchcv-for-beginners-db48a90114ee

[11] Scikit-learn contributors. (2024). Supervised Learning. Scikit-learn Documentation. https://scikit-learn.org/stable/supervised_learning.html

[12] Scribbr contributors. (2024). Coefficient of Determination (R²) | Definition, Formula, and Example. Scribbr. https://www.scribbr.com/statistics/coefficient-of-determination/

[13] Willmott, C. (2018). What are RMSE and MAE? Towards Data Science. https://towardsdatascience.com/what-are-rmse-and-mae-e405ce230383