1 Introduction

Shifting toward a sustainable energy infrastructure to mitigate the repercussions of climate change is one of the foremost challenges facing humanity in the coming decades. Solar energy is a pivotal player in this shift: it could, in principle, fulfill all our energy requirements on its own, establishing itself as a foremost energy source for the future [1]. Photovoltaics (PV), which directly converts solar radiation into electricity, stands out as a preeminent technology, with a global installed capacity exceeding 760 GW and a remarkable annual growth rate of 20% [2]. Notably, the high cost of PV, a significant obstacle a decade ago, has fallen to the point where PV is now one of the most economically viable methods of electricity generation in numerous regions around the world [3].

Solar energy continues to grow in popularity, propelled by its environmentally friendly nature, low maintenance requirements, and the falling costs of PV modules [4], as well as by the ongoing depletion of fossil fuels. This trend is expected to lead to greater penetration of PV energy into contemporary energy systems. Nonetheless, the unpredictable nature of weather introduces significant uncertainties in the power output of PV energy systems, marked by intermittency, volatility, and randomness. These uncertainties pose challenges to real-time control performance, economic viability, and the overall management of electric power and energy systems. Effective prediction of the power output of a PV power plant is therefore crucial for maximizing its economic benefits. While it is well established that solar irradiance directly affects PV output power, additional meteorological parameters, including ambient temperature, relative humidity, wind speed, and dust accumulation, are also recognized as factors affecting PV efficiency [5,6,7,8].

Furthermore, PV power forecasting becomes increasingly valuable when multiple energy sources are integrated into a hybrid energy mix. Maintaining system stability is more challenging given the inherently intermittent nature of solar energy, especially at high levels of renewable penetration [9]. Solar power forecasting addresses this challenge by providing precise predictions of upcoming power generation, giving system control engineers the insight needed to make informed decisions and oversee the integration of renewable energy sources. This predictive capability allows better management and control of the system, mitigating the variability of solar power and contributing to overall grid stability [10].

Numerous recent studies have explored methodologies to forecast and estimate PV output power. For predicting power production from PV plants, the literature outlines three distinct model types: phenomenological (deterministic), stochastic/statistical learning, and hybrid models. Deterministic approaches, rooted in physical phenomena, predict PV plant output from the electrical model of the PV devices constituting the plant. As an illustration, a deterministic approach was employed to model the electrical, thermal, and optical characteristics of PV modules [11]; such methods consider the underlying physical principles governing the PV system to improve the accuracy of power output predictions.

Various forecasting approaches, encompassing statistical and machine learning (ML) methods such as artificial neural networks (ANN), support vector machines (SVM), multiple linear regression (MLR), and adaptive neuro-fuzzy inference systems (ANFIS), function without requiring prior knowledge of the specific system. These models learn the relationship between inputs and outputs by analyzing a dataset of collected input and output variables. Statistical learning algorithms offer two distinct advantages. First, they learn from the data at hand and remain effective even with incomplete datasets. Second, once trained, they generalize, providing predictions beyond the confines of the initial training dataset. These features make statistical and ML methods suitable for various applications and contexts, and different ML techniques contribute to the versatility and adaptability of power output forecasting models for PV energy systems [12, 13]. Jawaid et al. [14] conducted a comparative analysis of solar irradiance forecasting using ANNs and standard regression algorithms. Their study revealed that integrating azimuth and zenith parameters significantly enhanced model performance, aiding the accurate sizing of solar power generation units, and emphasized the importance of precise predictions for system sizing, return-on-investment calculations, and system load measurements [14]. Numerous studies have employed ML techniques to forecast solar irradiance rather than directly predicting PV power output. Li et al. [15] developed ML-based algorithms (a hidden Markov model and SVM regression) for precise short-term solar irradiance forecasting, showcasing their effectiveness at 5–30-min intervals. Their study, based on the Australian Bureau of Meteorology dataset, highlights the importance of advanced forecasting for integrating distributed photovoltaic systems into a reliable electricity market [15]. Moosa et al.
[16] aimed to promote solar energy adoption by developing a model that assesses an individual’s potential for switching to solar energy. Their research focused on making this information widely accessible, reducing sales and marketing costs, and addressing uncertainties in solar energy production. Additionally, the study emphasized the importance of precise solar irradiance forecasting for system load calculations and advanced grid planning [16]. Kayri et al. [17] conducted a comparative analysis of estimation accuracy using multiple linear regression, random forest, and artificial neural network based on power production data from a photovoltaic module. The artificial neural network demonstrated the highest correlation (R = 0.997), with all methods recognizing global radiation as the most pivotal predictor. The significance of other predictors varied among the models [17]. Trigo-Gonzalez et al. (2020) devised a methodology to evaluate energy viability prior to the installation of PV plants. They compared three statistical techniques—artificial neural network, support vector machine, and multiple linear regression—for estimating production in PV plants spanning Spain and Chile. By integrating atmospheric variables, the models demonstrated an accuracy with a normalized root-mean-square error close to 3%. This tool assists companies in precise PV plant sizing and performance prediction, accounting for losses, and establishing a robust economic foundation for investments [18]. Theocharides et al. [19] introduced an ML-based methodology for accurate day-ahead photovoltaic power forecasting. Through the incorporation of linear regression adjustments, the model attained impressively low errors, reaching as little as 4.7%. Notably, this accuracy held consistent across diverse climatic conditions [19].

Some studies exclusively focused on the training and testing of individual ML algorithms for predicting PV power. Sheng et al. (2018) proposed an innovative approach for PV power forecasting using weighted Gaussian process regression and outlier detection. Their method, incorporating a density-based local outlier detection and a novel concept of nonlinear correlation, demonstrated higher estimation accuracy compared to typical data-based approaches in experimental analyses [20]. Syafaruddin et al. (2010) examined the effectiveness of radial basis function (RBF) and three-layered feed-forward neural networks (TFFN) in addressing partial shading challenges in PV systems. Assessing network structure, training, and validation processes, the study compared their performance with the adaptive neuro-fuzzy inference system (ANFIS) using a real-time simulator [21]. Mellit [22] proposed an innovative approach using adaptive neuro-fuzzy inference (ANFIS) to model optimal sizing parameters for stand-alone photovoltaic power systems in locations without meteorological data. The study, comparing various artificial intelligence techniques, achieved high accuracy in estimating optimal configurations for 200 locations in Algeria, demonstrating the model’s potential for global applications based solely on geographical coordinates [22].

The third category, hybrid models, can overcome the limitations of a single technique by combining different models. Wu et al. (2014) tackled short-term PV output forecasting using an experimental database from multiple locations. Their study introduced a combination model employing ARIMA, SVM, ANN, ANFIS, and genetic algorithm (GA) methods, demonstrating high precision and efficiency for ensuring stable operation of photovoltaic generation systems [23].

Moreover, “ensemble” methods represent an approach where predictive models are constructed by integrating multiple strategies to collectively enhance the overall accuracy of predictions. Yokoyama et al. [24] introduced and assessed an ensemble method for short-term load forecasting (STLF), aiming to enhance accuracy in the power industry’s planning processes. The evaluation, based on actual load and meteorological data in PJM in the USA, showed promising results [24].

1.1 Research gaps and main contributions

ML is widely used for modeling PV plants, but statistical methods remain crucial, especially for tasks like scheduling PV plant power output for the day-ahead market. PV power forecasts often rely on meteorological agencies, requiring conversion from irradiance to power forecasts by PV plant owners. In cases of limited information about distributed PV parks, the necessity of user-friendly data-driven methods becomes evident. Despite this need, the existing literature lacks dependable recommendations for selecting models, predictors, and hyperparameters. This study seeks to bridge this gap through a comparative analysis of single ML methods and hybrid models for PV power forecasting. The goal is to offer guidance for practical and research applications, aiding in the thoughtful selection of models and hyperparameters. The main contributions of the presented study lie in advancing the field of PV power forecasting through the application of innovative ML methods and hybrid models. Particularly, the study introduces and evaluates the performance of AdaBoost and HGBoost algorithms, and it is expected that these ML models can effectively predict PV power output with high accuracy. Moreover, the study delves into the integration of optimization algorithms, including the fruit-fly optimization algorithm (FOA), satin bowerbird optimizer (SBO), and particle swarm optimization (PSO), to further elevate the predictive capabilities of these models.

In addition to improving forecasting accuracy, this study contributes to the broader field of renewable energy by providing practical insights and guidance for the selection of models and hyperparameters in PV power prediction. The research fills a critical gap in the literature by offering a comparative analysis of single ML methods and hybrid models, which can serve as a valuable reference for both researchers and practitioners. By addressing the challenges associated with the variability and uncertainty of solar energy, this study supports the integration of PV systems into modern energy grids, ultimately contributing to more stable and economically viable renewable energy solutions.

The results achieved in this study demonstrate that integrating ML models with optimization algorithms significantly improves PV power forecasting accuracy. Among the models developed, the HGBoost-SBO hybrid emerged as the top performer, achieving the lowest root-mean-square error and mean absolute percentage error. It consistently demonstrated superior predictive accuracy, outperforming AdaBoost and the other HGBoost variants. These results highlight the effectiveness of hybrid optimization techniques, particularly SBO, in fine-tuning machine learning models for better performance.

1.2 Paper organization

The remainder of the presented study is structured as follows:

Section 2 describes the methodology of the presented study, including the data, the methods employed, and the performance metrics. Section 3 discusses the main results, along with comparisons against existing works and simple benchmark methods. Finally, Sect. 4 summarizes the findings and concludes the study.

2 Methodology

In this research, a comprehensive exploration of ML techniques, namely AdaBoost and HGBoost, was undertaken to forecast PV power. Hour-ahead forecasting was performed to predict PV power output, which is crucial for real-time energy management and integration into the electrical grid; the models and hybrid approaches were developed and evaluated with this forecasting horizon in mind. Initial efforts involved sourcing pertinent data from reputable outlets and preprocessing it. A stringent data quality regimen was instituted to validate the acquired datasets, ensuring their dependability for subsequent model development and performance assessment: the consistency of the recorded data was examined and any gaps were identified before further analysis. Beyond data quality verification, a second pivotal analysis appraised the significance of the input features by examining the correlations among the various parameters, using the Pearson correlation coefficient. The data were then segmented into training and testing sets, and AdaBoost and HGBoost models were formulated for power prediction. In the initial phase, these models were employed individually; in subsequent stages, hybrid models incorporating optimization algorithms, specifically particle swarm optimization (PSO), satin bowerbird optimizer (SBO), and fruit-fly optimization algorithm (FOA), were introduced. The optimization process tuned hyperparameters including maximum depth, learning rate, and the number of decision trees. All ML models were implemented with the scikit-learn library in Python, and the optimization algorithms were implemented as custom Python scripts. The experiments were conducted in a standard computational environment.
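As a brief illustration of the feature-screening step, the Pearson correlation between each candidate predictor and the PV output can be computed directly with pandas; the sample values below are hypothetical stand-ins for the actual hourly measurements:

```python
import pandas as pd

# Hypothetical sample rows; the real dataset holds hourly GHI, DNI, DHI,
# temperature, and PV energy (kWh) measurements.
df = pd.DataFrame({
    "GHI": [0, 150, 420, 680, 510, 90],
    "DNI": [0, 200, 550, 720, 480, 60],
    "DHI": [0, 80, 120, 150, 130, 50],
    "Energy": [0.0, 0.9, 2.8, 4.1, 3.2, 0.4],
})

# Pearson correlation of each candidate predictor with the PV output.
corr = df.corr(method="pearson")["Energy"].drop("Energy")
print(corr.sort_values(ascending=False))
```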

Following the prediction of PV power, the accuracy of the models and results underwent evaluation using statistical methods and parameters elaborated upon in subsequent sections. The entirety of the modeling process is visually delineated in a flowchart, as illustrated in Fig. 1.

Fig. 1
figure 1

Flowchart diagram of the current investigation

2.1 Data

Securing comprehensive, high-quality historical data is critical for effective ML applications, but the task is often impeded by financial conflicts among power producers, regulatory scheduling requirements, meteorological forecast suppliers, and national service companies. The datasets employed in this study, derived from reference [25], address these challenges with substantial and reliable information. Given the annual cyclical nature of weather patterns, at least one year of data is essential for robust training and accurate algorithm validation. The "Hour" variable represents the time of day in hours, ranging from 6 to 18. The dataset also contains measurements of solar irradiance: global horizontal irradiance (GHI), measured in W/m², quantifies the total shortwave radiation received by a surface horizontal to the ground; direct normal irradiance (DNI), also in W/m², is the solar radiation received per unit area by a surface perpendicular to the sun's rays; and diffuse horizontal irradiance (DHI) is the solar radiation received from the sky, excluding direct sunlight. Temperature variables include the wet bulb temperature, in degrees Celsius (°C), which reflects the humidity of the air, and the dew point temperature (°C), the temperature at which air becomes saturated with moisture. The dataset also includes energy readings, measured in kWh, representing the PV output.

The datasets utilized in this research span January 1990 to December 2014 for Mount Gambier, South Australia, providing a comprehensive temporal perspective. For training and testing the machine learning models, the dataset was partitioned into two sets: data from 1990 to 2009 served as the training set for model development, and data from 2010 to 2014 as the testing set for evaluating model performance. The models were designed to predict PV output on an hourly basis, with a prediction horizon of 24 h. The solar power generation data is recorded at hourly resolution, aligning with the typical requirements of electricity markets. Table 1 outlines the key parameters influencing PV predictions, encompassing temperature, GHI, DNI, DHI, and the hour of the day, with a total of approximately 97,000 records.
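The chronological split described above can be sketched as follows; the synthetic hourly index and `Energy` column are placeholders for the actual dataset, which keeps only the daytime hours (6–18):

```python
import pandas as pd

# Placeholder hourly index over the study period; the real dataset holds
# irradiance, temperature, and energy columns for hours 6-18 only.
idx = pd.date_range("1990-01-01", "2014-12-31 23:00", freq="h")
df = pd.DataFrame({"Energy": range(len(idx))}, index=idx)

# Chronological split used in the study:
train = df.loc[:"2009-12-31"]   # 1990-2009 for model development
test = df.loc["2010-01-01":]    # 2010-2014 for evaluation
print(len(train), len(test))
```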

Table 1 Comprehensive overview of the input variables along with their corresponding statistical details

2.2 Machine learning methods

This section examines the algorithms employed for predicting PV power and provides a concise overview of the optimization algorithms used alongside them: PSO, FOA, and SBO.

2.2.1 Adaptive boosting regression (AdaBoost)

Adaptive boosting regression is a sequential ensemble technique that combines multiple weak learners, trained on weighted samples from the dataset, into a resilient final model. These weak models, which may be built with diverse ML techniques, collectively contribute to the robustness of the final model. During each training iteration, distinct weights are assigned to individual data points, steering the learning of the next hypothesis: when predictions err, the inaccurately predicted points receive higher weights for the following weak learner. This iterative process persists until the algorithm attains the targeted level of accuracy. In regression tasks, where outcomes are not binary but involve quantifiable errors, the ensemble model departs from a simple right-or-wrong evaluation and instead uses metrics such as the median or a weighted average of all weak models to formulate its final prediction [26].

Consider a dataset S comprising pairs (xi, yi), where i ranges from 1 to N, each xi is an instance from the set X, and each yi is a label from the set Y. Given a specified number of rounds M, the algorithm initiates by establishing the initial distribution (or weight) D in the following manner:

$${D}_{i}^{1}=\frac{1}{N}\quad \text{for } i=1,\ldots,N$$
(1)

During each iteration j ranging from 1 through M, the AdaBoost algorithm systematically builds weak models or learners denoted as hj. These models are crafted utilizing the training dataset and the prevailing weight distribution D. The primary objective is to devise a model that not only minimizes the error εj, but also guarantees that εj remains below 0.5. This εj signifies the weighted error associated with the jth model and is precisely defined by Eq. (2).

$${\varepsilon }_{j}={\sum }_{i:{h}_{j}({x}_{i})\ne {y}_{i}}{D}_{i}^{j}$$
(2)

The “confidence” or weight \({\alpha }_{j}\) linked to the jth model is ascertained through Eq. (3).

$$\alpha_{j} = \frac{1}{2}\ln \left( {\frac{{1 - \varepsilon_{j} }}{{\varepsilon_{j} }}} \right)$$
(3)

The subsequent iteration involves adapting the weight distributions through the following modifications:

$${D}_{i}^{j+1}={D}_{i}^{j}\,{e}^{-{\alpha }_{j}{y}_{i}{h}_{j}({x}_{i})}$$
(4)
$${D}_{i}^{j+1}=\frac{{D}_{i}^{j+1}}{{\sum }_{i=1}^{N}{D}_{i}^{j+1}}$$
(5)

The forecast for a new dataset is acquired by computing the weighted average of the models hj in the subsequent manner:

$$H\left( x \right) = \mathop \sum \limits_{j = 1}^{M} \alpha_{j} h_{j} \left( x \right)$$
(6)

The AdaBoost algorithm is versatile and applies to both classification and regression problems; in this research, however, it was used exclusively for regression tasks [27].

2.2.2 Histogram gradient boosting regressor (HGBoost)

The histogram gradient boosting regressor (HGBoost) is an evolved version of the traditional gradient boosting regressor that uses histograms as its splitting method. Diverging from conventional splitting techniques, HGBoost segments the feature space into multiple bins, enhancing precision and adaptability in partitioning. This approach allows for a more detailed differentiation of the data and proves especially beneficial when dealing with both categorical and continuous variables. An outstanding feature of HGBoost is its diminished sensitivity to hyperparameter selection, which sets it apart from the standard gradient boosting regressor: the impact of hyperparameter choices is noticeably reduced, rendering it more resilient and less prone to overfitting. It is worth noting, however, that the computational demands during training may be higher due to the construction of histograms. In summary, HGBoost provides an enhanced and versatile solution for regression tasks by leveraging histograms for partitioning and delivering more accurate predictions. Its capability to handle a diverse range of variable types and its reduced sensitivity to hyperparameters make it a valuable asset in various ML applications [28].

2.3 Optimization algorithms

This section describes the optimization algorithms employed in this study: particle swarm optimization (PSO), satin bowerbird optimizer (SBO), and fruit-fly optimization algorithm (FOA).

2.3.1 Particle swarm optimization (PSO)

PSO is a stochastic optimization algorithm introduced by Kennedy and Eberhart [29]. Drawing inspiration from the social behavior of birds within a flock and the principles of swarm intelligence, the PSO algorithm orchestrates a collective effort among a swarm of particles to explore and interact, aiming to discover the optimal personal (Pbest) and global (gbest) positions in the search space. The algorithm dynamically adjusts the velocity and position of all particles through the following update mechanism:

$$V_{{{\text{new}}}} = \beta V + C_{1} r_{1} \left( {P_{{{\text{best}}}} - X} \right) + C_{2} r_{2} \left( {g_{{{\text{best}}}} - X} \right)$$
(7)
$$X_{{{\text{new}}}} = X + V_{{{\text{new}}}}$$
(8)

In the provided PSO algorithm context, X and V represent the current position and velocity of each particle, respectively. Xnew and Vnew signify the new position and velocity of each particle. The parameter β serves as the inertia weight, regulating the influence of the previous velocity history. Additionally, C1 and C2 denote the cognition learning factor and social learning factor, respectively. r1 and r2 are two independent random numbers within the [0, 1] range.

For a deeper understanding of the PSO algorithm, more extensive information can be found in [30].
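The update rules of Eqs. (7) and (8) can be sketched as a short, self-contained minimizer; the sphere test function, bounds, and parameter values below are illustrative choices, not those used in the study:

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=200, beta=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO minimizer following Eqs. (7)-(8)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5.0, 5.0, (n_particles, dim))
    V = np.zeros((n_particles, dim))
    pbest = X.copy()
    pbest_val = np.array([f(x) for x in X])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        V = beta * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)  # Eq. (7)
        X = X + V                                                     # Eq. (8)
        vals = np.array([f(x) for x in X])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = X[better], vals[better]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())

# Sphere function as an illustrative objective; the optimum is the origin.
best_x, best_f = pso(lambda x: float(np.sum(x**2)), dim=3)
print(best_x, best_f)
```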

2.3.2 Satin bowerbird optimizer (SBO)

The SBO algorithm begins by generating an initial population through uniform random sampling, establishing a set of candidate positions for optimization [31]. Each position, denoted pop(i).Pos, is defined according to the parameters being optimized, as outlined in Eq. (9), with values constrained within the pre-established minimum and maximum bounds of those parameters.

$${\text{pop}}\left( i \right).{\text{Pos}} = {\text{rand}} \left( {1,n_{{{\text{var}}}} } \right).\left( {{\text{Var}}_{{{\text{Max}}}} - {\text{Var}}_{{{\text{Min}}}} } \right) + {\text{Var}}_{{{\text{Min}}}}$$
(9)

The probability of a bower attracting a male/female (Probi) is calculated using Eqs. (10) and (11) [31].

$${\text{Prob}}_{i} = \frac{{{\text{cost}}_{i} }}{{\mathop \sum \nolimits_{k = 1}^{{n_{{{\text{pop}}}} }} {\text{cost}}_{k} }},\quad \forall i \in \{1,\ldots,n_{{{\text{pop}}}}\}$$
(10)
$${\text{cost}}_{i} = \begin{cases} \dfrac{1}{1 + f(x_{i})}, & f(x_{i}) \ge 0 \\ 1 + |f(x_{i})|, & f(x_{i}) < 0 \end{cases}$$
(11)

In line with other evolutionary optimizers, elitism is crucial for preserving the optimal solution(s) at every iteration. During the mating season, male satin bowerbirds instinctively construct and embellish bowers, and older, more experienced males attract more attention to their meticulously constructed bowers; in other words, these bowers exhibit superior fitness. Within SBO, the location of the most outstanding bower is treated as the elite of the kth iteration, denoted x(elite,k). This elite position attains the highest fitness and influences the other positions. In each iteration, the adjustment of any bower is computed using Eq. (12).

$$x_{i,k}^{{{\text{new}}}} = x_{i,k}^{{{\text{old}}}} + \beta_{k} \left[ {\left( {\frac{{x_{j,k} + x_{{{\text{elite}},k}} }}{2}} \right) - x_{i,k}^{{{\text{old}}}} } \right]$$
(12)

Roulette wheel selection is employed to pick the target bower x(j,k), so that bowers with higher probability are chosen more often.

In the SBO framework, the parameter βk plays a pivotal role in determining the step size for the selection of the target bower. This calculation is conducted individually for each variable and is adapted based on:

$${\beta }_{k}=\frac{\alpha }{1+{\text{Prob}}_{j}}$$
(13)

Stochastic modifications are applied to x(i,k) with a specified probability, using a normal distribution centered at \({x}_{i,k }^{\text{old}}\) with standard deviation σ, as delineated in Eq. (14).

$$\begin{gathered} x_{i,k}^{{{\text{new}}}} \sim x_{i,k}^{{{\text{old}}}} + \sigma \cdot N\left( {0,1} \right) \hfill \\ \sigma = Z \cdot \left( {{\text{Var}}_{{{\text{Max}}}} - {\text{Var}}_{{{\text{Min}}}} } \right) \hfill \\ \end{gathered}$$
(14)

Upon concluding each cycle, both the old population and the population derived from the above alterations are evaluated; the two populations are merged, sorted, and used to form the new population. The inspiration and rationale behind the SBO approach are explored in detail in [31].
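The SBO steps above (Eq. (9) initialization, Eqs. (10)–(11) probabilities, roulette-wheel selection, Eqs. (12)–(14) movement and mutation, and the merge-and-sort elitism) can be sketched as a minimization loop; all parameter values (alpha, Z, mutation probability, bounds) are illustrative assumptions, not the settings used in the study:

```python
import numpy as np

def sbo(f, n_var, n_pop=30, iters=200, alpha=0.94, z=0.02,
        var_min=-5.0, var_max=5.0, mut_prob=0.05, seed=0):
    """Sketch of the SBO loop for minimizing f."""
    rng = np.random.default_rng(seed)
    pos = rng.random((n_pop, n_var)) * (var_max - var_min) + var_min  # Eq. (9)
    fit = np.array([f(x) for x in pos])
    elite = pos[np.argmin(fit)].copy()
    sigma = z * (var_max - var_min)                                   # Eq. (14)
    for _ in range(iters):
        cost = np.where(fit >= 0, 1.0 / (1.0 + fit), 1.0 + np.abs(fit))  # Eq. (11)
        prob = cost / cost.sum()                                      # Eq. (10)
        new_pos = pos.copy()
        for i in range(n_pop):
            for k in range(n_var):
                j = rng.choice(n_pop, p=prob)        # roulette-wheel selection
                beta_k = alpha / (1.0 + prob[j])                      # Eq. (13)
                new_pos[i, k] += beta_k * ((pos[j, k] + elite[k]) / 2.0
                                           - pos[i, k])               # Eq. (12)
            if rng.random() < mut_prob:              # stochastic mutation
                new_pos[i] += sigma * rng.standard_normal(n_var)
        new_pos = np.clip(new_pos, var_min, var_max)
        new_fit = np.array([f(x) for x in new_pos])
        # Merge old and new populations, sort, and keep the best n_pop.
        pos = np.vstack([pos, new_pos])
        fit = np.concatenate([fit, new_fit])
        keep = np.argsort(fit)[:n_pop]
        pos, fit = pos[keep], fit[keep]
        elite = pos[0].copy()
    return elite, float(fit[0])

best_x, best_f = sbo(lambda x: float(np.sum(x**2)), n_var=3)
print(best_f)
```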

2.3.3 Fruit-fly optimization algorithm (FOA)

The fruit-fly optimization algorithm (FOA), introduced by Pan in 2011 [32], emerged as a groundbreaking swarm intelligence algorithm. Functioning as an interactive evolutionary computation method, FOA replicates the foraging behavior observed in fruit fly swarms, enabling it to converge effectively toward the global optimum. Fruit flies, inhabitants of temperate and tropical climates, exhibit a preference for decaying fruit. Renowned for their superior vision and olfactory capabilities, fruit flies employ a systematic approach to locating food. The process involves initially detecting the scent of the food source using its osphresis organ, followed by flying toward the source. As the fruit fly approaches the food location, its keen vision is employed to identify both food and the congregation of other fruit flies, guiding it in the appropriate direction [33,34,35].

Following the foraging characteristics of a fruit fly swarm, the fruit-fly optimization algorithm (FOA) can be delineated into several steps, as outlined below:

Initialization of Parameters: The essential parameters of the fruit-fly optimization algorithm (FOA) encompass the maximum iteration number (maxgen), the population size (sizepop), the initial location of the fruit fly swarm (X_axis, Y_axis), and the random flight distance range (FR).

Population Initialization: Initialize each individual fruit fly by assigning a random flight direction and distance for food finding through the process of osphresis.

$${X}_{i}={X}_{\text{axis}}+\text{rand}$$
(15)
$${Y}_{i}={Y}_{\text{axis}}+\text{rand}$$
(16)

Population Evaluation: First, calculate the distance Dist_i of each fruit fly from the origin. Second, compute the smell concentration judgment value S_i, defined as the reciprocal of Dist_i:

$${\text{Dist}}_{i}=\sqrt{{X}_{i}^{2}+{Y}_{i}^{2}}$$
(17)
$${S}_{i}=1/{\text{Dist}}_{i}$$
(18)

Subsequently, calculate Smell_i at the location of each individual fruit fly by applying the smell concentration judgment value (S_i) to the associated fitness function. Finally, identify the individual fruit fly with the highest smell concentration (maximal Smell_i) within the swarm.

$${\text{smell}}_{i}=\text{Function}({S}_{i})$$
(19)
$$[{\text{bestsmell}},\,{\text{bestIndex}}]=\text{max}({\text{smell}}_{i})$$
(20)

Selection Operation: Retain the maximal Smell_i value along with the corresponding x and y coordinates, then guide the fruit flies toward the location with the highest smell concentration using their vision. Repeat steps 2–3 iteratively until the smell concentration no longer improves on the previous iteration or the iteration count reaches the specified maximum, at which point the process concludes.

$$\text{smellbest}=\text{bestsmell}$$
(21)
$${X}_{\text{axis}}=X(\text{bestIndex})$$
(22)
$${Y}_{\text{axis}}=Y(\text{bestIndex})$$
(23)

2.4 Performance metrics

Error metrics offer an objective evaluation of a model's effectiveness, and simple benchmark models are frequently used as references against which more intricate models are compared [36]. To assess the reliability of the hybrid models utilized in this study, various indicators measure the agreement between predicted and actual values, as outlined in Table 2. The mean absolute percentage error (MAPE) is expressed as a percentage, indicating the relative difference between predicted and actual values. The root-mean-square error (RMSE) and mean bias error (MBE) are expressed in the same units as the dependent variable (here, kilowatt-hours, kWh), providing absolute measures of prediction error and bias in the original units of the data. The relative absolute error (RAE) is a dimensionless ratio comparing the total absolute error to that of a simple baseline model, while the coefficient of determination (R2), also dimensionless and ranging from 0 to 1, indicates the proportion of variance in the dependent variable explained by the model. Finally, the variance accounted for (VAF), expressed as a percentage, represents the proportion of the variance in the data accounted for by the model.

Table 2 Statistical evaluation indexes

The mentioned metrics have advantages and disadvantages, which can affect the studies. The MAPE offers the advantage of providing prediction accuracy in percentage terms, which makes it easy to interpret and independent of the data’s scale. However, it is sensitive to outliers and can produce large errors when actual values are close to zero. Similarly, RMSE is useful for measuring overall prediction accuracy, particularly by penalizing larger errors, and is expressed in the same units as the dependent variable, making it interpretable. Yet, RMSE can be disproportionately influenced by outliers and does not provide a relative error measure. MBE helps detect systematic bias in predictions but can be misleading if positive and negative errors offset each other, and it is also sensitive to outliers.

RAE allows for easy comparison across models and provides a relative measure of error, though it can be challenging to interpret without a clear baseline. R2 effectively explains the variance in the dependent variable but may be misleading with nonlinear relationships or outliers, and a high R2 does not always guarantee good predictive performance. Lastly, VAF is easy to interpret as a percentage-based measure of explained variance, but like R2, it can be affected by outliers and does not provide insight into absolute error magnitude.
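The six indicators discussed above can be written compactly as follows. The function and variable names are ours, and RAE is computed here against a mean-value baseline, a common but assumed convention (Table 2 may use an equivalent form):

```python
import math

def metrics(y_true, y_pred):
    """Standard forms of the six indicators; RAE uses the mean of y_true
    as the naive baseline (an assumption on our part)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    err = [p - t for p, t in zip(y_pred, y_true)]
    sse = sum(e * e for e in err)
    var_t = sum((t - mean_t) ** 2 for t in y_true)        # total sum of squares
    var_e = sum((e - sum(err) / n) ** 2 for e in err) / n  # variance of errors
    var_y = var_t / n                                      # variance of y_true
    return {
        "MAPE": 100.0 / n * sum(abs(e / t) for e, t in zip(err, y_true)),  # %
        "RMSE": math.sqrt(sse / n),          # same units as y
        "MBE":  sum(err) / n,                # systematic bias
        "RAE":  sum(map(abs, err)) / sum(abs(t - mean_t) for t in y_true),
        "R2":   1.0 - sse / var_t,
        "VAF":  100.0 * (1.0 - var_e / var_y),  # %
    }

# Small worked example with illustrative values
m = metrics([100, 120, 140, 160], [102, 118, 143, 158])
```

On this toy sample, MBE = 0.25 and RAE = 0.1125, illustrating how bias and relative error are read off in practice.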

2.5 PAWN sensitivity analysis index

The PAWN index serves as a powerful tool in global sensitivity analysis, offering insights into the influence of uncertain inputs on a model. Particularly useful for models characterized by diverse inputs with varying distributions, this index quantifies the statistical disparity between output distributions. It achieves this by comparing distributions generated with randomized inputs, assuming one input remains constant, against the output distribution when no inputs are held constant [37]. The general formula guiding the calculation of the PAWN index is outlined as follows:

$$\text{PAWN}(i)={\text{KS}}_{\text{Max}}\left({F}_{Y|{X}_{i}={x}_{i}^{*}},{F}_{Y}\right)$$
(24)

In the formula, \({\text{KS}}_{\text{Max}}\) represents the Kolmogorov–Smirnov statistic, measuring the maximum distance between two cumulative distributions. \({F}_{Y|{X}_{i}={x}_{i}^{*}}\) denotes the cumulative distribution of output Y when input i is fixed at \({x}_{i}^{*}\). Meanwhile, \({F}_{Y}\) represents the cumulative distribution of output Y without constraints on the inputs. Essentially, the PAWN index serves as an indicator of the model’s sensitivity to variations in input i: a higher index for a specific input indicates a greater sensitivity of the model to that input.
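A minimal Monte Carlo estimate of Eq. (24) can be sketched as follows, assuming uniformly distributed inputs and a toy two-input model; the sample sizes and conditioning scheme are illustrative, not those of [37]:

```python
import random

def ks_statistic(a, b):
    """Maximum distance between the empirical CDFs of two samples."""
    a, b = sorted(a), sorted(b)
    cdf = lambda s, x: sum(v <= x for v in s) / len(s)
    return max(abs(cdf(a, x) - cdf(b, x)) for x in set(a) | set(b))

def pawn_index(model, n_inputs, n=200, n_cond=5, seed=0):
    """PAWN sketch: for each input i, fix it at several values x_i*, compare the
    conditional output CDF with the unconditional one, and keep the max KS."""
    rng = random.Random(seed)
    sample = lambda: [rng.random() for _ in range(n_inputs)]
    y_uncond = [model(sample()) for _ in range(n)]
    idx = []
    for i in range(n_inputs):
        ks_vals = []
        for _ in range(n_cond):
            x_star = rng.random()            # conditioning value x_i*
            y_cond = []
            for _ in range(n):
                x = sample()
                x[i] = x_star
                y_cond.append(model(x))
            ks_vals.append(ks_statistic(y_cond, y_uncond))
        idx.append(max(ks_vals))             # Eq. (24)
    return idx

# Toy model: output depends strongly on input 0, weakly on input 1
pawn = pawn_index(lambda x: 10 * x[0] + 0.1 * x[1], n_inputs=2)
```

On this toy model, the index for the dominant input is far larger than for the weak one, reflecting the sensitivity interpretation above.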

3 Results and discussion

In this section, we present the results and analyses conducted for PV prediction. The algorithms include AdaBoost and HGBoost, initially in their singular forms and later in hybrid configurations optimized with the FOA, SBO, and PSO optimizers. The results section includes various charts and tables presented for the analysis of the models. The dataset underwent thorough preprocessing to handle missing values and anomalies and to select relevant features. Seasonal components, such as daily and yearly cycles, were considered and processed to improve model accuracy. Table 3 gives more detail about this process.

Table 3 Variables and their type, source, and dataset categorization

Figure 2 displays a correlation matrix constructed using input and output parameters. As illustrated in the figure, the study involves ten input variables and one output variable, denoted as “Energy.” This visual representation highlights the interrelationships among the input parameters and their connections to the output variable. The matrix values, falling within the range of − 1 to + 1, convey the strength and direction of these correlations. Positive values signify a positive correlation and a direct influence, while negative values indicate a negative correlation and an inverse impact. According to the insights gleaned from Fig. 2, the parameters GHI (global horizontal irradiance) and DNI (direct normal irradiance) exhibit a substantial positive impact, showcasing the highest positive correlation with the output variable. This suggests a significant and direct influence on the output. On the other hand, parameters DHI (diffuse horizontal irradiance) and DEW point temp (dew point temperature) demonstrate an inverse effect on the output, implying that an increase in these parameters results in an inverse impact on the output variable. The remaining parameters exhibit either a neutral or marginal effect on the output.

Fig. 2
figure 2

Correlation matrix of features
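The entries of such a correlation matrix are Pearson coefficients. A minimal sketch with illustrative (not actual) data values shows how a direct and an inverse relationship produce correlations near +1 and −1:

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a)
                           * sum((y - mb) ** 2 for y in b))

# Illustrative columns: GHI drives Energy directly, DHI inversely
ghi    = [200, 400, 600, 800, 1000]
dhi    = [120, 100, 90, 70, 50]
energy = [0.9, 2.1, 3.0, 4.2, 4.9]

corr_ghi = pearson(ghi, energy)   # strong positive correlation
corr_dhi = pearson(dhi, energy)   # strong negative correlation
```

Applying this to every pair of columns yields the full matrix visualized in Fig. 2.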

Figure 3 provides a comprehensive visualization of statistical parameters, including the minimum, maximum, mean, and coefficient of variation (CV). These values are derived through the application of the PAWN sensitivity analysis criterion. Based on Fig. 3, the parameters DNI and GHI exhibit the highest mean and median values, indicating greater sensitivity and a stronger impact on the output. The next most influential parameter is DHI, which has the strongest effect among the remaining parameters. Parameters such as hour and temperature have a relatively lower impact on the output.

Fig. 3
figure 3

Sensitivity analysis of variables based on the PAWN index

Figure 4 illustrates time series data for both observational and computational results for individual algorithms, accompanied by scatter plots. The error rates for each of the AdaBoost and HGBoost algorithms are highlighted in red, and based on the figure, the outcomes in the two plots closely align. To better assess data dispersion, scatter plots are utilized, and the R2 index is provided for both the training and testing sets. According to this index, the AdaBoost algorithm, with an R2 value of 0.8709, outperforms the HGBoost algorithm in terms of prediction accuracy.

Fig. 4
figure 4

A detailed analysis of the outcomes from employing the HGBoost and AdaBoost models

Additionally, the density distribution curve of the error is presented for the two algorithms. Based on this plot, the AdaBoost curve is narrower with a higher peak, indicating errors more tightly concentrated around zero. In contrast, the HGBoost curve is wider, exhibiting a broader distribution around zero and hence larger errors.

To conduct a comprehensive and detailed examination of the performance and accuracy of these algorithms, assessments were performed using various statistical indicators as outlined in the preceding section and subsequently compared. Based on Table 4 and considering the crucial RMSE index, it is evident that the HGBoost model has a higher error compared to AdaBoost, indicating weaker performance. Other statistical indicators also support this observation, showing that AdaBoost performs better.

Table 4 Error metrics for proposed HGBoost and AdaBoost models

In order to enhance accuracy and streamline the comparison of algorithms, hybrid models have been developed. Three optimizers, namely FOA, SBO, and PSO, have been employed for this purpose. Figure 5 illustrates a time series chart depicting observed and computed data for the hybrid models of HGBoost and AdaBoost. In this figure, the training and testing phases are distinctly delineated, and the associated error rates are highlighted. Based on Fig. 5, the results and outputs of the models are very close to each other, and all models have demonstrated good performance.
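In such hybrid configurations, the optimizer searches the booster's hyperparameter space for the settings that minimize validation error. A minimal PSO sketch illustrates the idea; the quadratic stand-in score surface, bounds, and coefficients below are assumptions for illustration, not the actual training objective:

```python
import random

def pso(score, bounds, n_particles=8, iters=30, seed=0):
    """Minimal PSO sketch: 'score' returns validation error for a hyperparameter
    vector (e.g. learning rate, tree depth); lower is better."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [score(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    w, c1, c2 = 0.7, 1.5, 1.5           # inertia and acceleration coefficients
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = score(pos[i])
            if f < pbest_f[i]:          # update personal and global bests
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f

# Stand-in score surface with a known optimum at (0.1, 6)
best, best_f = pso(lambda p: (p[0] - 0.1) ** 2 + (p[1] - 6) ** 2,
                   bounds=[(0.01, 0.5), (2, 10)])
```

In the actual hybrids, `score` would train the booster with the candidate hyperparameters and return its validation RMSE; FOA and SBO play the same role with different search dynamics.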

Fig. 5
figure 5

Evolution of observed and predicted values using hybrid models of HGBoost and AdaBoost

In order to conduct a comprehensive analysis, identify the most suitable algorithms for prediction, and evaluate their performance, scatter plots for each of the hybrid models are provided in Fig. 6, including the R2 index for both the training and testing sets. Based on Fig. 6 and the R2 values, it is evident that, among these closely performing models, the HGBoost hybrids performed slightly better. Among them, the HGBoost-SBO model exhibits the best performance with an R2 value of 0.9907, while the AdaBoost-FOA model, with R2 = 0.9897, shows the weakest performance.

Fig. 6
figure 6

Scatter plot of the observation–prediction for HGBoost and AdaBoost hybrid models

Figure 7 depicts the error distribution curves of all models, accompanied by their respective box plots, categorized by the test and train datasets. Notably, the HGBoost-SBO model stands out by exhibiting the smallest area under the error distribution curve, the most compact box area, and fewer outliers (lower standard deviation) in all subplots of the figure. This observation provides compelling evidence for the superior performance of the HGBoost-SBO model. Additionally, concerning the training dataset, the HGBoost-PSO model shows the largest area under the error distribution curve, while for the testing dataset, both the HGBoost-PSO and HGBoost-FOA models display the greatest box area and the most substantial outliers. This suggests the relatively inferior performance of these models compared to others.

Fig. 7
figure 7

Box plots of error measurements for models during the testing and training phases

Figure 8 illustrates error metric charts for the hybrid models, presented in rectangular and circular forms. Statistical indicators, including RMSE, R2, MAPE, RAE, MBE, and VAF, are included in the evaluation of the models for both the test and training sets, differentiated by distinct colors. Based on Fig. 8 and the R2 index chart, it is apparent that the AdaBoost-PSO model has been the most effective in training, while the other models show similar results. Similarly, in the testing section indicated by the orange color, except for the HGBoost-SBO model, which performs exceptionally well, the other models are close to the center of the chart, suggesting comparable results. According to the RMSE index, the HGBoost-SBO model exhibits the lowest error, indicating superior performance in testing. In the training section, the AdaBoost-PSO model has the lowest error, performing better on this dataset. The other indicators show similar results. Comprehensive details and numerical values for each indicator in the hybrid models are provided in Table 5.

Fig. 8
figure 8

Performance metrics visualization for proposed models

Table 5 Error metrics derived from the application of AdaBoost and HGBoost hybrid models

HGBoost-SBO is selected based on its consistent superiority across multiple key metrics, particularly in the testing phase, which is crucial for model generalization. As evident in Table 5, HGBoost-SBO achieves the lowest RMSE (30.6135) and MAPE (0.295764), demonstrating the smallest prediction errors and the best percentage accuracy. In addition, it attains the highest R2 (0.990608) and VAF (99.06577%), showing that it explains the most variance in the data and further reinforcing its robustness. Although the differences between the models are small, HGBoost-SBO consistently performs better than the others across all key metrics, indicating reliability and strong generalization to unseen data. These factors collectively justify its selection as the best performer.

In addition, Fig. 9 depicts the cumulative runtime of the hybrid models across iterations. It is observed that the AdaBoost hybrid models have shorter runtimes and show more fluctuations over time. Specifically, the AdaBoost-PSO model has the shortest runtime, approximately 0.187 s, while the HGBoost-SBO hybrid model exhibits the longest runtime at 6672.12 s. The runtime is influenced by several factors, including the processing system type, the search approach, and the optimization strategies, each contributing significantly to the overall runtime of these models.

Fig. 9
figure 9

Comparison of runtime for various hybrid models

Figure 10 depicts the convergence chart for the hybrid models, utilizing the mean squared error (MSE) index as the convergence metric with the number of iterations fixed at 300. The graph reveals that the HGBoost models showcase the lowest MSE values; among them, the hybrid HGBoost-PSO model displays the lowest MSE in this chart. Conversely, the hybrid AdaBoost-PSO model exhibits the highest MSE.

Fig. 10
figure 10

Convergence plots of the AdaBoost and HGBoost hybrid models

In the presented study, eight different models were employed to predict PV power output, and all models demonstrated high accuracy based on performance metrics such as R2, RMSE, and VAF. The consistently high accuracy observed across multiple, diverse models suggests a reliable predictive capability for PV power output. The similarity of the outputs obtained from these models indicates that the findings are not model-specific but hold across various modeling approaches, which improves the reliability of the results. This convergence in model performance suggests that the results are not attributable to random variation or overfitting to specific data characteristics but instead reflect a generalizable and stable pattern in the data. While a formal statistical hypothesis test could provide additional quantification, the strong agreement among the models provides compelling evidence of the reliability of the results. This high level of consistency across different models reinforces the validity of the outcomes obtained with the approaches explored for predicting PV power output.

To further demonstrate the superior performance of the presented model, a comparative evaluation was carried out, and the outputs were compared with results reported in the existing literature for other solar energy prediction methods, as depicted in Table 6. The comparison shows that the proposed integrated algorithm achieves higher prediction accuracy than the other methods.

Table 6 Comparative outputs for the presented field of study

Solar radiation predictions are widely applied in various practical photovoltaic energy situations to optimize energy production and grid integration. For instance, utility-scale solar power plants in regions such as California and Germany rely on accurate forecasts to predict power output and manage energy storage, ensuring grid stability. In Europe, solar forecasts aid in integrating distributed solar energy into the grid, reducing the need for backup fossil fuels. Smart cities, including Masdar City, utilize these predictions to optimize energy management in microgrids, while companies like Tesla Solar use them to provide accurate energy production estimates for rooftop installations. In addition, solar radiation forecasts are crucial in remote agricultural areas for managing solar-powered irrigation systems, and even in space missions like NASA’s Mars Rover, where solar energy predictions help manage the rover’s power consumption. These examples highlight the importance of solar forecasting in making solar energy more reliable and efficient across different contexts [41, 42].

4 Conclusion

In conclusion, addressing the global challenge of transitioning to a sustainable energy supply is imperative, with solar energy emerging as a key player. This study focused on the crucial task of accurately predicting photovoltaic (PV) power output, essential for maximizing economic benefits and ensuring stability in modern electric power systems. The unpredictable nature of weather introduces uncertainties in PV power, emphasizing the need for precise forecasting. The comparative analysis encompassed various forecasting models, including deterministic, statistical, and hybrid approaches. The study highlighted the significance of ML models, such as AdaBoost and HGBoost, in predicting PV power. The results demonstrated that AdaBoost outperformed HGBoost in terms of accuracy, as indicated by statistical metrics including R2, RMSE, and VAF. Furthermore, the study delved into the realm of hybrid models, incorporating optimization algorithms like FOA, SBO, and PSO. The hybrid models, particularly HGBoost-SBO, exhibited superior performance with high R2 values, lower error rates, and fewer outliers. This underscores the effectiveness of combining different models and optimization strategies for enhanced accuracy. The findings of the study reveal several key insights:

  • AdaBoost consistently demonstrated superior performance compared to HGBoost in terms of accuracy metrics.

  • The introduction of hybrid models, especially HGBoost-SBO, significantly improved predictive capabilities, achieving high accuracy and lower error rates.

  • AdaBoost hybrid models exhibited shorter runtimes, emphasizing the importance of considering both accuracy and computational efficiency in model selection.

The findings from this study hold paramount significance for stakeholders in the photovoltaic (PV) sector seeking to leverage advanced ML techniques for precise PV output forecasting.