CLDE: a competitive learning-driven differential evolution optimization for the influence maximization problem in social networks

Chai, Baoqiang; Zhang, Ruisheng; Li, Xinyue; Tang, Jianxin

doi:10.1007/s11227-025-07175-0

CLDE: a competitive learning-driven differential evolution optimization for the influence maximization problem in social networks

Published: 05 April 2025

Volume 81, article number 700, (2025)
Cite this article

Download PDF

Access provided by China Pharmaceutical University

The Journal of Supercomputing Aims and scope Submit manuscript

CLDE: a competitive learning-driven differential evolution optimization for the influence maximization problem in social networks

Download PDF

Baoqiang Chai¹^na1,
Ruisheng Zhang¹,
Xinyue Li²^na1 &
…
Jianxin Tang²^na1

150 Accesses
Explore all metrics

Abstract

The influence maximization problem focuses on identifying a small subset of influential users in social networks that can ignite maximum influence spread. The existing meta-heuristic evolutionary optimizations for the problem struggle with the threat of premature convergence into local optima easily and lack flexibility. It highlights the need for solutions with high influence spread while showing satisfactory time efficiency in large-scale networks. To address such challenges, a competitive learning-driven differential evolution (CLDE) optimization is proposed. The algorithm constructs an evolutionary population by employing a random partitioning mechanism that divides it into an excellent subpopulation and a common subpopulation. The global exploration and local exploitation operations are adopted for each subpopulation to improve the solution diversity and performance. Furthermore, an adaptive probabilistic updating-driven local search strategy is incorporated to better accommodate the topological structure of networks and enhance the robustness of the algorithm. Extensive experiments conducted on six real social networks and three different types of synthetic networks demonstrate that the proposed CLDE achieves an average improvement of 6% in influence spread compared to the state-of-the-art baselines, as well as competitive results to the well-known greedy-based cost-efficient lazy forward algorithm.

An efficient discrete differential evolution algorithm based on community structure for influence maximization

Article 07 February 2022

Improved Evolution Algorithm that Guides the Direction of Individual Mutation for Influence Maximization in Social Networks

Discrete Genetic Learning-Enabled PSO for Influence Maximization

Discover the latest articles and news from researchers in related subjects, suggested using machine learning.

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

The widespread use of social media platforms such as Facebook, Twitter and Instagram has led to the formation of complete industrial chains of viral marketing [1]. For instance, e-commerce companies are inclined to utilize viral marketing as a promotional strategy for their innovative products promotion [2]. This involves identifying high-quality users on the platforms and personalizing their advertisements in order to enhance the appeal of the products to the targeted followers. Such phenomena are termed as influence maximization (IM) in social network analysis, which has been proven to be a NP-hard problem [3]. Its goal is to identify a specific subset of influential users to maximize the influence spread of the product information, innovative ideas and even the authoritative opinion within the social networks. Given its practical and commercial significance, it has been employed widely in numerous real-world scenarios including viral marketing, rumors suppression, revenue maximization, opinion formation, collective decision-making, and even political movements [4, 5].

Kempe et al. [3] put forth a natural hill-climbing greedy strategy for the first time to address the IM problem. It selects one node with the most marginal return in each round by estimating the expected influence of the remaining nodes based on tens of thousands of Monte Carlo simulations. Theoretical outcomes indicate that the greedy-based method can yield a solution with an accuracy of not less than $(1 - 1/e - \epsilon )$ ($\approx $63%) of the optimal solution. However, the high time complexity resulted by the extensive Monte Carlo simulations limits its applications in large-scale networks. In contrast, centrality-based methods [6,7,8], which return the precise rankings of the node based on its position in the network topology, are efficient in terms of computational cost. However, such measurements fail to provide performance guarantees in influence spread because they consider solely the local or global structural features of the network or node attributes, while neglecting the overlapping influence. Over the last few years, meta-heuristic algorithms [9,10,11] have been developed to explore new efficient solutions by replacing the time-consuming Monte Carlo mechanisms with formulated fitness evaluating functions. Such algorithms employ discrete evolutionary mechanisms to identify the optimal seed set by optimizing the fitness functions in a more promising way. Yet the existing meta-heuristics tend to overly focus on a single metric such as solution accuracy while neglecting others like efficiency, especially in dealing with large-scale networks [12]. Therefore, balancing the solution quality and computational cost remains a challenge in designing meta-heuristic algorithms for the IM problem.

As one of the seminal meta-heuristics, differential evolution (DE) has demonstrated its significant advantages in solving the IM problem due to its strong global search capability and high robustness [13]. Its fundamental framework is to optimize the fitness of the expected influence spread of the candidate node sets iteratively through population evolution strategies based on mutation, crossover and selection operations. However, the performance of the algorithm is affected by its myopic evolutionary rules, and the high computational cost of fitness evaluation further limits its efficiency in large-scale networks. To address the problems, a novel competitive learning-driven differential evolution (CLDE) is proposed. Specifically, a competitive mechanism divides the individuals in the population into pairs randomly, in which individuals compete based on fitness value, with the winners forming an excellent subpopulation and the losers forming a common subpopulation. Such strategy is conceived to enhance the diversity of the solutions. It then implements the global exploration and local exploitation separately on the two subpopulations to speed up the convergence of the algorithm. Meanwhile, an adaptive probabilistic updating-driven local search strategy is designed to guide the evolution of the population to avoid being influenced by the network structure. The primary contributions of this paper can be summarized as follows:

To the best of our knowledge, the competitive mechanism of dividing the population into an excellent subpopulation and a common subpopulation by random pairing is employed in the discrete evolution optimization for the first time to address the IM problem.
A global exploration based on local-influence-descending and a local exploitation within node neighborhoods are conceived for different subpopulations to accelerate the convergence of the proposed CLDE.
An adaptive local search strategy is designed based on the updating probability for each individual in the common subpopulation to enhance the stability and robustness of the algorithm.

The remainder of the paper is organized into several sections. Section 2 gives a brief review of the related work on IM problem. Section 3 provides the preliminary information. The details of the proposed CLDE are presented in Sect. 4. Section 5 is devoted to the design and analysis of experiments. Section 6 discusses the advantages and the potential weak spots of the CLDE, which presents the future work of our work. Section 7 concludes this work.

2 Related work

Existing methods for the IM problem can be categorized into four primary groups: greedy-based, topological-based, machine learning-based, and meta-heuristic-based methods [14, 15]. Greedy-based algorithms require a substantial number of Monte Carlo simulations to evaluate the marginal gain of each node, so high computational costs and low efficiency are two typical challenges for the greedy algorithms [3, 16]. Compared to the simulation-based greedy algorithms [17], sampling-based greedy algorithms [18, 19] are more efficient; however, the time complexity of sampling-based greedy algorithms remains a concern. Topological-based methods are a class of metrics that utilize the topological structure of networks directly to assess the importance of the nodes and guide the search or optimization processes [20, 21]. Once the network topology is given, the seed node set based on topology-based methods is determined. Although such methods are efficient, they lack theoretical guarantees for their solutions. Since users in real-world networks tend to cluster, it also leads to the issue of overlapping influence inevitably [22]. In addition, machine learning-based models have demonstrated their high efficiency when dealing with large-scale networks [23, 24]. However, the high dependence on datasets leads to a decline in their performance when datasets are insufficient or of low quality. Meanwhile, such approaches tend to be often very time-consuming during the training process, which increases the computational costs [25].

As a new branch of the artificial intelligence, meta-heuristic algorithms do not rely on the mathematical model of the problem but instead seek optimal or near-optimal solutions by simulating the evolutionary principles observed in natural, social or physical phenomena [26]. In combinatorial optimization, meta-heuristic algorithms such as particle swarm optimization, genetic algorithm, simulated annealing and tabu search have been widely applied to the problems such as scheduling, routing and facility layout [27,28,29]. To address the NP-hard IM problem, Jiang et al. [30] made the first attempt by developing a metric called expected diffusion value to estimate the influence spread of candidate nodes. It was the first meta-heuristic algorithm applied to solve the IM problem, demonstrating a speed improvement of 2$\sim $3 orders of magnitude compared to traditional greedy algorithms. The meta-heuristic algorithm can find the optimal solution quickly due to the use of the optimization function. Chawla and Cheney [31] proposed a neighbor-hop mutation strategy for the genetic algorithm to enhance the optimization of influence maximization by incorporating network topology into the search process. Furthermore, Cunegatti et al. [32] and Konotopska and Iacca [33] explored graph-aware evolutionary algorithms that leverage topological information to ensure better propagation. Singh et al. [34] proposed a learning automata based discrete particle swarm optimization (LAPSO-IM) to address the issue of premature convergence of the original work on the IM problem [35]. Nevertheless, due to the need for the recalculation of probability and the reassessment of actions in the reinforcement process, it results in poor efficiency easily.

Tang et al. [36] investigated seed activation strategies for the adaptive influence maximization problem and proposed a novel bio-inspired meta-heuristic called discrete scheduled particle swarm optimization to address the challenge of determining optimal seeding times during the influence spreading. Considering the unreliable inherent to transmission in social network connections, Wang et al. [10] proposed an enhanced discrete moth-flame optimization algorithm and developed a new influence estimation model based on neighboring nodes to assess the expected spread influence of node sets effectively. While the algorithm demonstrates high influence spread and robustness, it incurs a significant time cost inevitably. To avoid being trapped into local optima in solving the IM problem, Khatri et al. [37] proposed a discrete Harris hawks optimization algorithm. For networks with community structures, the algorithm introduces a novel neighborhood scouting strategy and an adaptive random population initialization method to enhance the performance. Recently, Wang et al. [38] proposed a discrete particle swarm optimization to solve the IM problem in multilayer networks. By designing novel position updating rules and velocity updating strategies, the authors addressed the complex information propagation in multilayer networks in a more efficient way.

As an effective meta-heuristic, differential evolution operates on the differences among individuals in a population and employs mechanisms including the differential mutation, crossover and selection to generate new offsprings to achieve global optimization in the search space. The DE algorithm has achieved significant results in various fields, including large-scale optimization [39, 40] and multi-objective optimization [41, 42]. In addressing the IM problem, Qiu et al. [13] proposed a novel search strategy named local influence descending and an expected diffusion impact value function to save the running time and improve the influence spread. It symbolizes the first application of the DE algorithm to the IM problem. As for the IM problem, since the probability of most nodes in the network being included in the optimal seed set is typically zero, Biswas et al. [43] proposed a two-stage VIKOR-assisted multi-operator differential evolution algorithm. To enhance the algorithm’s capability of escaping from local optima, Zhu et al. [44] developed a phased evaluation-enhanced approach that integrates the random range search mechanism named RandRDE to enhance the diversity of solutions. Furthermore, an accelerated convergence strategy is utilized to facilitate an efficient search for optimal solutions.

Based on the existing research, it is evident that meta-heuristic algorithms demonstrate promising prospects in addressing the IM problem. However, several key challenges remain to be further explored: (1) researchers need to make a trade-off between the influence spread and computation time; (2) the scalability issues in large-scale networks need to be addressed [45]; (3) developing effective methods to model information propagation in multilayer networks; (4) improving the adaptability of parameter settings. In this context, competitive learning mechanisms offer new insights for the development of novel effective and efficient meta-heuristics. Competitive learning was first introduced by Cheng and Jin [46] to solve the continuous optimization, where a competition mechanism among solutions was implemented to promote population diversity. Du et al. [47] further combined competitive and cooperative learning to improve the performance on complex optimization problems. This mechanism can maintain the population diversity and facilitates the convergence in a better way. It demonstrates promising superiority in solving the combinatorial optimization problems. Therefore, incorporating competitive learning mechanisms into the meta-heuristic algorithms for the IM problem provides new avenues to enhance algorithm performance.

3 Preliminaries

3.1 Problem statement

A given social network is typically defined as a graph $G = (V, E)$, where V represents the nodes and E represents the connections between the nodes. The key objective of the IM problem is to identify an influential node set of size k as the initial seed set S that can maximize the influence spread through the propagation capability of the network. Specifically, it can be formulated as Eq. (1).

$$\begin{aligned} \left\{ \begin{aligned} S^{*} = \arg \max \sigma (S) \\ subject \hspace{5.0pt}to \hspace{5.0pt}|S| = k \end{aligned} \right. \end{aligned}$$

(1)

where S is the set with k candidate nodes. $\sigma (S)$ is the expected number of nodes activated by the set of candidate nodes S. $S^*$ is the optimal seed set with the maximum influence spread.

3.2 Independent cascade model

In the IC model, at any given timestamp, each node is in one of two states: inactive or active. According to the rules, when node u is activated at time $t-1$, it has a single opportunity to activate its inactive neighbor v under propagation probability $p_{uv}$ at time t. Regardless of whether node v is activated successfully by node u, node u will not have chance to activate other nodes in following steps. If node v is activated successfully, it will have an opportunity to activate its direct inactive neighbors at time $t+1$. The diffusion process terminates if no new nodes are activated at time T.

3.3 Expected diffusion estimation function

To replace the expensive Monte Carlo simulation, Qiu et al. [13] proposed a novel objective function called expected diffusion impact value (EDIV) function, which is designed to assess the potential influence spread of a set of nodes. This function integrates the local influence from the local influence value (LFV) function and incorporates the diffusion value in two hops. The formula for calculating the probability of activating a node by the candidate seed set is given in Eq. (2).

$$\begin{aligned} PS(u)=1-\prod _{v \in Pe(u) \cap S}(1-p_{vu}) \end{aligned}$$

(2)

where v and u represent two nodes, PS(u) represents the probability of node u being activated by node set S, Pe(u) denotes the one-hop neighbors of node u, $p_{vu}$ denotes the activation probability from v to u.

The objective function is comprised of the original influence: the one-hop influence and the two-hop influence. The original influence can be defined as the initial advantage offered by the selected nodes, which is represented by the size of the node set. Then, the one-hop influence is determined by the amalgamation of the diffusion value of the nodes’ direct neighbors with their respective local influence. LFV function is evaluated according to Eq. (3).

$$\begin{aligned} LFV(u) = 1+\sum _{v \in N_u^ {(1)} \backslash {\{u\}}}\left(p_{uv}+p_{uv}\cdot \sum _{s \in N_v^ {(1)} \backslash {\{u,v\}}}p_{vs}\right) \end{aligned}$$

(3)

where LFV(u) represents the local influence of node u, $N_u^{(1)}$ and $N_v^{(1)}$ denote the first-order neighbors of nodes u and v, respectively. The variables $p_{uv}$ and $p_{vs}$ correspond to the activation probabilities from node u to node v and from node v to node s, respectively. The beneficial influence that the set S exerts on its direct neighboring nodes can be evaluated by Eq. (4).

$$\begin{aligned} f_1(S)=\sum _{u \in N_S^{(1)} \backslash S}PS(u) \cdot LFV(u) \end{aligned}$$

(4)

where $N_S^{(1)}$ is the set of direct neighbor nodes of the set S.

When calculating the two-hop influence, the method takes into account the complicated interactions among the two-hop nodes, the activation status of the one-hop nodes and the diffusion capability of the two-hop nodes. It can accurately estimate the two-hop influence based on Eq. (5).

$$\begin{aligned} f_2(S)=\sum _{u \in N_S^{(2)}-N_S^{(1)}-S}\left[1-\prod _{v \in Pe(u) \cap N_S^{(1)}}(1-p_{vu} \cdot PS(v))\right] \cdot LFV(u) \end{aligned}$$

(5)

where $f_2(S)$ represents the expected influence spread of S on its two-hop nodes, $N_S^{(2)}$ represents the two-hop neighbor nodes of S.

The CLDE utilizes the three aforementioned computational components to estimate the influence spread. It employs three different parameters to depict the importance of the above three computational parts. The EDIV value is computed by Eq. (6).

$$\begin{aligned} EDIV(S)=\alpha \cdot |S|+\beta \cdot f_1(S)+\gamma \cdot f_2(S) \end{aligned}$$

(6)

where |S| represents the size of the node set S, $f_1(S)$ denotes the one-hop influence of S, $f_2(S)$ represents the two-hop influence of S. $\alpha $, $\beta $ and $\gamma $ denote the relative importance of the three parts in propagating the influence of S.

4 Proposed method

4.1 The framework of CLDE

Figure 1 shows the flowchart of the proposed CLDE for the IM problem. It comprises four main components: parameter initialization, population partitioning, subpopulation evolution, individuals recombination and selection.

The details are elaborated as follows:

(1)
At the initial stage, the algorithm generates the first generation of individuals to initialize the DE population. Each individual represents a candidate node set that symbolizes a potential solution for the influence maximization.
(2)
The population partitioning operation employs a competitive mechanism to divide the population into an excellent subpopulation and a common subpopulation in accordance with the fitness value of each individual in the current generation.
(3)
The excellent subpopulation evolves through global exploration based on the rules of differential mutation and crossover operations. The differential mutation alters specific number of individuals based on the differential set to generate the mutated individuals. The crossover process involves the mutated individuals with the original individuals of the current generation in accordance with the specified crossover probability to get a crossover subpopulation.
(4)
For each individual in the common subpopulation, it searches in the entire neighborhood or the neighborhood of a specific individual in a local exploitation way according to different local search probability, and replaces the suboptimal individuals with the ones with larger degree.
(5)
Merge the excellent subpopulation after the crossover and mutation with the common subpopulation obtained by the local search to form a recombined population for the selection process.
(6)
After the recombination operation, the change in the average fitness value of the population is calculated. If the change is less than or equal to zero, the evolutionary algorithm is considered to have converged, and the internal loop is terminated. Otherwise, return to steps (3) and (4).
(7)
The selection process compares the fitness value of the individuals from the previous generation with that of the recombined individuals. Individuals with higher fitness value are selected into the next generation.
(8)
Check the termination condition: whether the iteration counter has reached the predefined maximum iterations $g_{\max }$. If satisfied, terminate the evolution and output the optimal solution; otherwise, return to step (2).

Algorithm 1 presents the pseudocode of the CLDE for the IM problem.

4.2 Population initialization

The CLDE first utilizes the LFV mechanism, defined in Eq. (3), to assess the local influence of each node in the network and sorts the nodes in descending order based on their local influence. The algorithm selects one node every k nodes and adds it into the current candidate node set. Once the requisite size of the current individual has been reached, the selection process ceases and the next individual begins to generate. Algorithm 2 presents the pseudocode of the aforementioned process.

4.3 Population partition

To balance between the global exploration and local exploitation, the proposed CLDE employs a competition mechanism that randomly partitions the population into subpopulations with the same size. As described in [46, 47], in each iteration, individuals in the population $X=\{X_1,X_2\ldots X_{pop}\}$ are randomly divided into pop/2 pairs (assuming the population size pop is even), and the competition is held between the two individuals in each pair. The individual with better fitness is considered as the winner of the competition. For a population of size pop, competitions occur among pop/2 pairs, with all pop individuals in one single round of competitions. In CLDE, all the winning individuals form the excellent subpopulation $XE=\{XE_{1},XE_{2}\ldots XE_{pop/2}\}$, while the remaining ones form the common subpopulation $XC=\{XC_{1},XC_{2}\ldots XC_{pop/2}\}$. The mechanism helps maintain the diversity of the population, as even individuals with lower fitness have the chances to be assigned to the excellent subpopulation through random pairing, thus preventing the population from homogenization and reducing the risk of premature convergence.

4.4 Excellent subpopulation evolution

4.4.1 Local-influence-descending search strategy

To accelerate the convergence speed of the CLDE, this paper employs the local-influence-descending (LID) search strategy [13] for the mutation and crossover operations of the population. The search strategy aims at selecting a set of nodes with relatively high influence. Then it randomly selects alternative nodes from the set to update the population. Selecting nodes with largest LFV can obtain the local optimal solutions, whereas local optimality does not guarantee global optimality. The selection range of nodes at different positions should be flexible and governed by a dynamic upper bound. Consequently, taking into account the dynamic nature of this upper bound, it should be defined as a function of the index of node i. Consequently, the rule for the up-bound is defined by Eq. (7).

$$\begin{aligned} up \_ bound=\eta \cdot i+\theta \end{aligned}$$

(7)

where $up \_ bound$ denotes the upper bound of the selection range for the node. $\eta $ and $\theta $ are two control parameters. The specific process of LID search strategy is introduced in Algorithm 3.

4.4.2 Differential mutation

Specifically, the differential mutation utilizes the differences between individuals to generate mutated individuals. For the discrete IM problem, $XE_{r1}$ is generated from the excellent subpopulation as the baseline individual, with its nodes to be replaced by nodes from the set of nodes involved in the differential operation. The generating rule is defined in Eq. (8).

$$\begin{aligned} XE_{r1}=XE.get(random(pop/2)) \end{aligned}$$

(8)

$XE_{r2}$ and $XE_{r3}$ are used to generate the differential set. To prevent individuals in the population from converging to the same solution, the algorithm establishes a more rational differential strategy. The algorithm selects two distinct individuals randomly from the excellent subpopulation to serve as $XE_{r2}$ and $XE_{r3}$. The rules are described in Eq. (9).

$$\begin{aligned} \left\{ \begin{aligned} XE_{r2}&=XE.get(random(pop/2))\\ XE_{r3}&=XE.get(random(pop/2))\\&\textrm{subject} \hspace{5.0pt}to \hspace{5.0pt}XE_{r2} \ne XE_{r3} \end{aligned} \right. \end{aligned}$$

(9)

For each individual $XE_{r1}$, the algorithm replaces a certain number of nodes with the nodes in the differential set of $XE_{r2}$ and $XE_{r3}$. The number is determined by both the zoom factor F and the size of the differential set. The specific rule of the number is defined in Eq. (10).

$$\begin{aligned} N = F \cdot |XE_{r2}-XE_{r3}| \end{aligned}$$

(10)

where N represents the number of nodes to be replaced, "−" signifies the differential operation.

Furthermore, the node to be replaced is the one with less influence in the individual. Therefore, the CLDE initially employs LFV to assess the influence of each node within the present $XE_{r1}$. Subsequently, it replaces the node with the smallest influence. The specific rule for generating the index is defined in Eq. (11).

$$\begin{aligned} j=\arg \min [LFV(ME_{i,j})] \end{aligned}$$

(11)

where $ME_{i,j}$ denotes the jth ($j=1,\ldots ,k$) node to be replaced in the mutated individual $ME_{i}$ ($i=1,\ldots ,pop/2$), which is generated from the ith $XE_{r1}$.

If $ME_{i}$ is the same as the differential set of $XE_{r2}$ and $XE_{r3}$, the CLDE selects a unique node from the set obtained by the LID search strategy to replace $ME_{i,j}$. This approach ensures the continuity of the evolution while enhancing the diversity of the population. Conversely, the algorithm replaces $ME_{i,j}$ with a randomly selected node from the differential set of $XE_{r2}$ and $XE_{r3}$. The updating rules are described in Eq. (12).

$$\begin{aligned} ME_{i,j}=\left\{ \begin{aligned}&uniq(LID(G,k)),\hspace{5.0pt}if \hspace{5.0pt}ME_{i}=XE_{r2}-XE_{r3}\\&(XE_{r2}-XE_{r3}).get(random(|XE_{r2}-XE_{r3}|)),\hspace{5.0pt}otherwise \end{aligned} \right. \end{aligned}$$

(12)

Figure 2 shows an illustration of the differential mutation process. Assuming that k is 5, F = 0.6, $XE_{r1}$ = {14, 20, 7, 5, 13}, $XE_{r2}$ = {4, 3, 50, 35, 26} and $XE_{r3}$ = {35, 50, 55, 26, 3}. Firstly, the CLDE implements the differential operation on $XE_{r2}$ and $XE_{r3}$, then the difference set can be obtained. Then, parameters N and j are calculated according to Eqs. (10) and (11), respectively. Finally, it replaces the nodes at the specified indices in $XE_{r1}$ with the nodes from the differential set. After the above-mentioned process, a mutated individual $ME_{i}$ is obtained.

The specific details of the mutation are explained in Algorithm 4.

4.4.3 Crossover operation

In the CLDE, after the mutation operation, a set of targeted nodes is selected to produce the crossover individual from each original individual and its corresponding mutated individual in the current generation based on the crossover probability cr. The rules are given in Eq. (13).

$$\begin{aligned} CE_{i,j} = \left\{ \begin{aligned} ME_{i,j}&,\hspace{5.0pt}if \hspace{5.0pt}random< cr \hspace{5.0pt}and \hspace{5.0pt}ME_{i,j} \notin CE_{i} \\ uniq(LID(G,k))&,\hspace{5.0pt}if\hspace{5.0pt}random< cr \hspace{5.0pt}and \hspace{5.0pt}ME_{i,j} \in CE_{i} \hspace{5.0pt}and\hspace{5.0pt}XE_{i,j} \in CE_{i} \\ XE_{i,j}&,\hspace{5.0pt}if\hspace{5.0pt}random < cr \hspace{5.0pt}and \hspace{5.0pt}ME_{i,j} \in CE_{i} \hspace{5.0pt}and \hspace{5.0pt}XE_{i,j} \notin CE_{i} \\ XE_{i,j}&,\hspace{5.0pt}if\hspace{5.0pt}random \ge cr \hspace{5.0pt}and\hspace{5.0pt}XE_{i,j} \notin CE_{i} \\ uniq(LID(G,k))&,\hspace{5.0pt}if\hspace{5.0pt}random \ge cr \hspace{5.0pt}and\hspace{5.0pt}ME_{i,j} \in CE_{i} \hspace{5.0pt}and\hspace{5.0pt}XE_{i,j} \in CE_{i} \\ ME_{i,j}&, \hspace{5.0pt}if\hspace{5.0pt}random \ge cr \hspace{5.0pt}and\hspace{5.0pt}XE_{i,j} \in CE_{i} \hspace{5.0pt}and\hspace{5.0pt}ME_{i,j} \notin CE_{i} \end{aligned} \right. \end{aligned}$$

(13)

where $CE_{i}$, $CE_{i,j}$, $XE_{i,j}$ and $ME_{i,j}$ represent the crossover individual, the node of the crossover individual $CE_{i}$, the node of the original individual $XE_{i}$, the node of the mutated individual $ME_{i}$, respectively. cr denotes the predetermined crossover probability in the excellent subpopulation. Figure 3 shows an illustration of the crossover operation. Assuming that k is 5, $cr = 0.4. 0.32, 0.54, 0.76, 0.2$ and 0.46 represent the random numbers. During this process, if the random number is larger than cr, the node $CE_{ij}$ is filled with the node $XE_{ij}$. Otherwise, the node is filled with the node $ME_{ij}$.

4.5 Common subpopulation evolution

For the evolution of the common subpopulation, an adaptive probabilistic updating-driven local search strategy is developed to enable further exploitation in a specific local region to improve the quality and enhance the diversity of solutions. Based on the local search probability and random numbers, the range for the solution search is selected. The formula for local search probability is defined in Eq. (14).

$$\begin{aligned} LSp_{i+1}= \left\{ \begin{aligned} (1-ar) \cdot LSp_i&, \hspace{5.0pt}if \hspace{5.0pt}EDIV(XC_{i}) < 2*fitness \_ sum/pop \\ bp+(1-bp) \cdot&LSp_i,\hspace{5.0pt}otherwise \end{aligned} \right. \end{aligned}$$

(14)

where parameters ar and bp represent the reward and penalty between 0 and 1, respectively. The variable $LSp_i$ denotes the local search probability for the ith individual. $EDIV(XC_{i})$ represents the EDIV of the ith individual in the common subpopulation. The variable $fitness \_ sum$ refers to the total EDIV of all individuals in the common subpopulation. The updating formula for the common subpopulation is defined in Eq. (15).

$$\begin{aligned} v^*= \left\{ \begin{aligned}&\arg \max _{v \in V \backslash S} d(v), \hspace{5.0pt}if \hspace{5.0pt}m < LSp_i\\&\arg \max _{v \in N(XC_{i,j})} d(v), \hspace{5.0pt}otherwise \end{aligned} \right. \end{aligned}$$

(15)

where $v^*$ represents the node with the maximum degree, m is a random variable that is drawn from the interval [0, 1]. $N(XC_{i,j})$ represents the neighbors of the jth node of the ith individual in the common subpopulation. The degree of a node v is denoted by d(v). The details of the evolution of common population are explained in Algorithm 5.

4.6 Selection operation

In the final stage, the EDIV is utilized to assess the quality of the individuals and update the population by selecting better individuals. The selection operation compares $X_i$ with $P_i$, and selects the one with higher EDIV value into the next generation for further evolution, where X represents the original population of the current generation and P is the population formed by the recombination of the excellent subpopulation after the crossover and differential mutation (denoted as CE) with the common subpopulation after local search (denoted as PC). Specifically, the formula for the selection operation is defined in Eq. (16).

$$\begin{aligned} X_i=\arg \max \{EDIV(X_i),EDIV(P_i)\} \end{aligned}$$

(16)

4.7 Complexity analysis

The complexity of CLDE is determined by three main factors: pop, k, and F. First, the complexity of the initialization process in Algorithm 2 is $O(pop \cdot k)$. The process of dividing the population using a competitive mechanism needs at most pop/2 operations. Once the iteration starts, the CLDE conducts five components: differential mutation of excellent subpopulation, crossover of excellent subpopulation, local search of common subpopulation, population reorganization and population selection. Due to the size of the differential set is variable, the complexity is unclear for the differential mutation process in Algorithm 4. The size of the differential set will decrease as the iteration proceeds. Therefore, this algorithm considers only the worst-case time complexity of $O(pop \cdot F \cdot k)$. For the excellent subpopulation crossover process, it needs $pop/2 \cdot k$ operations. For the process of common subpopulation local search of Algorithm 5, it needs $pop/2 \cdot k \cdot {\hat{D}}$ operations, where ${\hat{D}}$ denotes the maximum degree of the graph. The time complexity of population reorganization is O(1). For the selection process, the complexity is O(pop). Moreover, the initialization includes LFV, while the common population local search and selection process separately includes EDIV computation with a complexity of $O(n \cdot {\hat{D}})$ and $O(k \cdot {\hat{D}})$. In summary, the complexity of time is $O(pop \cdot k^2 \cdot {\hat{D}}^2)$.

5 Experiments and analysis

5.1 Datasets and benchmarks

This paper evaluates the performance of the proposed CLDE based on the following considerations across six real-world social networks^{Footnote 1} and three synthetic networks: network density and network type. Specifically, Ca-netscience^{Footnote 2} describes a network of collaborative relationships among scientists, where the nodes represent scientists and the edges represent collaboration between scientists. Email-univ $^{2}$ describes interactions among members via email exchanges. Blog [48] is a network of communication relationships between bloggers on the MSN website. PGPgiantcompo $^{2}$ is an unweighted and undirected graph network that describes the interactions between users who exchange information using the pretty good privacy algorithm. Ca-HepTh^{Footnote 3} is from the "High Energy Physics—Theory" section. The description of Feather-deezer-social $^{3}$ pertains to the network of mutual connections among Deezer users from European countries. The synthetic small-world (Watts-Strogatz, WS) network, random (Erdös-Rényi, ER) network and scale-free (Barabási-Albert, BA) network were also generated to conduct experiments. Table 1 provides the detailed information, where |V|, |E|, $d_{ave}$ and $c_{ave}$ represent the number of vertices, the number of edges, average degree and average clustering coefficient of the networks, respectively.

Table 1 Statistical information of the three synthetic networks and the six real-world networks

Full size table

All the procedures were encoded in Python and executed on an $\hbox {Intel}^\circledR $ $\hbox {Xeon}^\circledR $ 5218R CPU @ 2.10 GHz with 64 GB memory in a Windows system. To evaluate the performance of the proposed CLDE, it is compared with the state-of-the-art algorithms.

CELF [16]: presented in 2007, it is a classic greedy-based algorithm, which selects the node with the highest marginal gain into the seed set in each iteration.
LIDDE [13]: put forward in 2021, it employs a differential evolution that utilizes the EDIV evaluation function to find the optimal seed set.
LDD [49]: introduced in 2022, it is a heuristic algorithm for identifying influential nodes based on centrality metric.
LAPSO-IM [34]: developed in 2019, it integrates a learning automata with discrete particle swarm optimization, with modifications made to the original evolutionary mechanisms.
TS-VA-MODE [43]: developed in 2022, it employs a two-stage optimization process and incorporates the multi-operator differential evolution along with an improved differential evolution featuring multiple search operators.
PHEE [44]: proposed in 2024, it utilizes two different strategies to search optimal solutions to improve the quality of the solutions for the IM problem.

5.2 Parameter settings

To balance both the influence spread and efficiency of the proposed algorithm, the orthogonal experimental test [50] was employed to determine the optimal parameter settings for the proposed CLDE. The orthogonal experimental test provides comprehensive information with fewer experimental trials. This is achieved by systematically selecting experimental combinations to reduce unnecessary repetitive experiments. It was observed that eight parameters, including population size pop, the zoom factor F, the crossover probability cr, the initial local search probability $LSp_0$, the reward parameter ar, the penalty parameter bp, two control parameters $\eta $ and $\theta $ impact the performance of the CLDE under different settings. Therefore, this paper employs the $L_{32}$ orthogonal array method, as shown in Table 2, which displays the parameter levels and their combinations, to configure the optima parameter settings. For the parameter setting experiments, this paper sets $g_{max}$ to 50, k to 60 and p to 0.05. It sets the number of Monte Carlo simulations r to 1000. $\alpha $, $\beta $ and $\gamma $ all take 1.

Each combination runs 50 times to obtain the average results. The bolded ones in Table 2 are the optimal value under the current network. Configuration number 9 is optimal in networks Email-univ, Feather-deezer-social, and BA, and number 30 is optimal in the networks Ca-netscience, Blog, PGPgiantcompo, Ca-HepTh, feather-deezer-social and WS, and number 31 is optimal only in the network ER. From Table 2, the results indicate that the proposed CLDE can obtain the optimal value in most cases when $F=0.6$, $cr=0.4$ and $LSp_0=0.6$. And when the population size pop, the reward parameter ar, the penalty parameter bp, two control parameters $\eta $ and $\theta $ are larger, the influence spread obtained is relatively larger, but they increase the time complexity of the algorithm concurrently. To achieve a balance between the efficiency and effectiveness, the parameter settings in number 30 was chosen as the optimal parameter configuration for the following experiments.

Table 2 The influence spread of the orthogonal experimental test on the nine datasets at $k=60$

Full size table

5.3 Performance experiments

5.3.1 Comparison on the influence spread

To validate the influence spread of the CLDE across different networks, comparison experiments with the six state-of-the-art algorithms were conducted on the six real-world networks and three synthetic networks. CELF and LDD require no parameter settings, while the parameter settings for the other five algorithms are shown in Table 3. The experimental parameter settings of the comparison algorithms in Table 3 follow the parameter configurations in the original literature. The comparison results of the influence spread under the IC model under propagation probability p = 0.05 and 0.1 are illustrated in Figs. 4, 5, 6 and 7.

Table 3 Parameter settings

Full size table

In the four figures, the x-axis represents the number of seed nodes, while the y-axis indicates the influence spread. It is evident that the influence spread increases gradually with the enlargement of the seed set. Specifically, as illustrated in Fig. 4, the CLDE outperforms the other six baselines on small-scale networks such as Ca-netscience, Email-univ, and Blog. However, in the other three real-world networks, its performance is only surpassed by the greedy-based CELF. Taking Fig. 4e as an example, when k = 60, the performance of the proposed CLDE surpasses that of LIDDE, PHEE, TS-VA-MODE, LAPSO-IM, and LDD by 6.9%, 1.1%, 42.3%, 1.1% and 0.8%, respectively. In Fig. 5, the influence spread of CLDE outperforms those of the other six baselines in most cases on the synthetic networks.

In Fig. 6, the influence spread of the CLDE is slightly inferior to that of CELF on the six real-world networks, while it outperforms the other five benchmarks when the propagation probability $p = 0.1$. As shown in Fig. 7, the CLDE returned larger influence spread in the random network ER and the small-world network WS. In the scale-free synthetic network BA, the performance of the CLDE is slightly inferior to CELF, while it outperforms the other five algorithms. Since the CELF is a greedy-based approach that selects seed nodes one by one, its performance is superior to the performance of the proposed CLDE due to it can estimate the influence spread in a more accurately way based on the Monte Carlo simulation. Overall, the proposed CLDE achieves comparable results to CELF and demonstrates superior performance compared to the other five algorithms in most cases. As demonstrated in the influence diffusion comparisons in Figs. 4, 5, 6 and 7, CLDE maintains superior stability across networks of varying scales compared to other algorithms. It can be primarily attributed to its competitive mechanism with random population partitioning strategy, which prevents it from converging to local optima effectively.

5.3.2 Comparison on efficiency

The running time of the CLDE was compared with the under six algorithms across the six real-world networks and three synthetic networks, with $p = 0.05$ and 0.1, and k was chosen as 60. The efficiency comparison results of the seven algorithms are shown in Figs. 8 and 9. Furthermore, to clearly illustrate the comparison results, the logarithmic value of the running time was used, instead of directly plotting the running time.

As can be seen from Figs. 8 and 9, the running time of the proposed CLDE in the six real-world and three synthetic networks is significantly lower than that of the PHEE, TS-VA-MODE, CELF, LAPSO-IM and LIDDE. Although the proposed CLDE integrates more operations than the LIDDE, the running times of CLDE and LIDDE vary across different networks, with each algorithm excelling in different scenarios. Such results are promising, as the influence spread is better than that of most baselines.

5.4 Performance analysis of seed sets

5.4.1 Comparison on the convergence speed

The convergence of CLDE is closely related to its exploration and exploitation capabilities within the search space. Theoretically, CLDE enhances the population diversity by randomly dividing the population into excellent subpopulation and common subpopulation to reduce the risk of premature convergence. The algorithm employs global exploration for extensive search, and local exploitation to further refine the solutions, with the aim of accelerating the convergence speed in a more efficiency way. The size of the seed set was set as $k = 60$ for this experiment. Figure 10 illustrates the comparisons of the convergence speeds of CLDE, LIDDE, PHEE, LAPSO-IM and TS-VA-MODE on the six real-world networks and three synthetic networks under propagation probability $p = 0.05$.

As seen in the plots, firstly, the experimental results indicate that the algorithm demonstrates superior exploration and exploitation capabilities during the optimization process, enabling it to approach to the global optimum more rapidly. Furthermore, it is evident from the figures that CLDE converges significantly faster than LIDDE while demonstrating its superior performance in optimizing the EDIV function. Compared with the TS-VA-MODE, LAPSO-IM and PHEE, CLDE converges slower in most scenarios but with superior fitness value and the final influence spread. Therefore, it is important to make a trade-off between the solution quality and the convergence speed.

5.4.2 Average shortest distance among seed nodes

To further analyze the reasons why the proposed CLDE can obtain a more influential seed set, the average shortest distance [51] of seed nodes is introduced to show the performance of the CLDE. This is expressed mathematically in Eq. (17).

$$\begin{aligned} L(S) = \frac{2}{|S|(|S|-1)} \sum _{v,u \in S} d_{uv} \end{aligned}$$

(17)

where L(S) represents the average shortest distance between nodes in the seed set S, |S| denotes the number of seed nodes, $d_{uv}$ indicates the average shortest distance between nodes u and v.

The experimental results in the nine networks at fixed $k = 30$ and $k = 60$ under the propagation probability $p=0.05$ are shown in Fig. 11. As shown in Fig. 11a, b, g, h and i, the proposed CLDE shows higher average shortest distances than that of the classical CELF and other five baselines, indicating that the seed nodes obtained by CLDE are scattered wider in the network. In Fig. 11, the proposed CLDE and CELF exhibit higher average shortest distances than that of other baselines, which indicates that the seed nodes can unfold efficient propagation capability, such evidence can be found in Fig. 10. Furthermore, it also suggests that the CLDE and CELF are capable of fully leveraging the characteristics of the network structure to enhance the propagation. The longer average shortest distance signifies that the seed nodes have superior propagation abilities, enabling them to disseminate the information to a broader range of nodes. It further reflects the importance of their dispersion in the network in allowing them to cover different areas. In addition, the average shortest path distances in Fig. 11 reveal that the seed nodes selected by CLDE are dispersed more rationally within the network, such fact indicates the algorithm’s capability to conduct effective global searches.

5.4.3 Similarity between the seed sets

For the IM problem, the selected seed sets play a crucial role. This section employs the Jaccard coefficient to quantify the similarity between the seed sets [52]. The purpose of the seeds similarity comparison is to evaluate the consistency and stability of the seeds generated by different algorithms. The definition is mathematically expressed in Eq. (18).

$$\begin{aligned} J(S_A,S_B) = \frac{|S_A \cap S_B|}{|S_A \cup S_B|} \end{aligned}$$

(18)

where $S_A$ and $S_B$ represent the seed node sets returned by algorithms A and B, respectively. $|S_A|$ denotes the size of $S_A$. The method highlights the similarities and differences in node selection strategies across various algorithms. The similarity among seed sets acquired by different algorithms on the six real networks under p = 0.1 with a seed set size of 60 is illustrated in Fig. 12. The $avg\_sim$ denotes the average similarity. The upper triangular matrix provides the average similarity value.

According to the influence spread shown in Fig. 6, the CELF achieved the best results compared with the six other algorithms. Therefore, the CELF can be selected as the groundtruth in the similarity comparisons. The similarity results shown in Fig. 12 further corroborate CLDE’s advantage in maintaining population diversity. As shown in the six subfigures, the similarities between the seed nodes of the CLDE and the CELF are the highest. Its high similarity with the CELF algorithm indicates its ability to identify high-quality solutions, while lower similarity with other algorithms suggests its capacity to explore a broader search space. The results suggest that CLDE is comparable to CELF in terms of influence spread in seed selection. In contrast, the other algorithms show lower seed set similarity compared to CLDE, which indicates a reduced overlap of nodes between the seed sets chosen by CLDE and other algorithms, demonstrating that CLDE is capable of avoiding the selection of nodes with low influence.

5.5 Statistical analysis

To validate the effectiveness of CLDE independently, rigorous statistical tests were conducted to examine whether there are significant statistical differences in the experimental results of the CLDE compared to the six other baselines across the six real networks and the three synthetic networks. The Wilcoxon rank-sum test, of which the confidence level was set at 0.05, was performed using SPSS software, and the results are presented in Table 4. The $N-$ and $N+$ represent the number of instances where the targeted algorithm outperforms the baselines and the number of instances where the targeted algorithm performs worse than the baseline, respectively. The Z-value and p-value reflect the extent of performance difference between the two algorithms and the probability of the observed difference occurring by chance, respectively. The p-value less than 0.05 typically indicates a statistically significant difference, suggesting that the observed improvement (or decline) in performance is unlikely due to random variation.

The experimental results in Table 4 reveal the following findings: (1) CLDE performs well on both the real-world networks and the synthetic networks, indicating that the type of network has a minimal impact on its performance. (2) When k is small, the performance advantage of CLDE is not pronounced such as $k = 10$. As k increases, CLDE generally outperforms other algorithms, including some scenarios of the CELF. (3) Compared to CELF, CLDE exhibits superior performance, with a p-value $<0.05$, which means a significant difference under $k=60$.

Table 4 The statistical results on CLDE and the other six algorithms at $\alpha = 0.05$ on the six real networks and three synthetic networks (IC model, $p = 0.05$)

Full size table

6 Discussion

This study proposes a competitive learning-driven differential evolution algorithm for the IM problem. CLDE shares certain similarities with LIDDE in the initialization, mutation, crossover, and selection steps. However, IT introduces key innovations in the design and application of these steps, particularly evident in the following aspects: (1) Competitive mechanism: CLDE introduces a mechanism that divides the population into "excellent subpopulation" and "common subpopulation". This division allows the algorithm to implement targeted operations based on the distinct characteristics of each subpopulation to enhance the efficiency of both global and local search operations. (2) Adaptive probabilistic updating-driven local search strategy: In CLDE, we have designed an adaptive probabilistic updating-driven local search mechanism that adjusts the intensity of local search based on the dynamic changes within the population. Compared to LIDDE, CLDE places greater emphasis on the dynamic adjustment of the search process, thereby improving the algorithm’s adaptability to complex optimization problems. (3) Innovative integration of operational processes: Although both algorithms employ mutation, crossover, and selection operations, CLDE optimizes the integration and adaptation of these operations. The excellent subpopulation and common subpopulation utilize differential mutation and local search techniques to achieve a balance between the global exploration and local exploitation.

Among the CELF, LAPSO-IM, LIDDE, PHEE, TS-VA-MODE and LDD, extensive experiments on nine datasets demonstrate that the CLDE has higher influence spread and faster convergence speed. Therefore, CLDE has achieved a substantial performance improvement. However, the running time of the CLDE is longer than that of the centrality-based LDD. To improve time efficiency, future research can further explore the use of variable neighborhood search or iterated local search as alternatives to the computationally intensive local search with node neighborhood [12].

The proposed CLDE not only offers new insights for theoretical research on the IM problem based on meta-heuristics but also provides strong support for strategy formulation in practical applications. From a theoretical perspective, we are the first to apply the competitive mechanism to solve the IM problem. Through this mechanism, we can gain a more comprehensive understanding of the complexity and dynamics of influence spread. From a practical perspective, CLDE provides a strong basis for applications such as disease prevention, social promotion and product marketing. In disease prevention, CLDE can categorize users into multiple groups based on their age. Decision-makers can adopt different disease prevention strategies for each group to control the transmission and prevalence of diseases in a more effective way. Within social media platforms, CLDE can dissect user behavior patterns and pinpoint content creators or trending topics that possess the high dissemination potential, bolstering the efficacy of platform recommendations and enhancing user engagement.

7 Conclusions

To balance the influence spread and efficiency when solving the influence maximization problem especially in large-scale social networks, this study proposes an optimization algorithm named CLDE, which is a meta-heuristic approach based on differential evolution. It employs a competitive mechanism of randomly partitioning the population to enhance the diversity of the population and reduce the risk of converging to local optima. Furthermore, the CLDE incorporates an adaptive probabilistic updating-driven local search strategy to make more effective exploration of the search space across different iterations. The method achieves a better balance between local exploitation and global exploration in accelerating the convergence speed. Experimental results demonstrate that the proposed CLDE outperforms state-of-the-art algorithms in influence spread and exhibits superior convergence speed.

Future research aims to develop more precise evolutionary mechanisms to further enhance the influence spread of the algorithm. Moreover, a more rational population partitioning mechanism and local search strategy such as variable neighborhood search or iterated local search will be designed to improve the efficiency of the evolutionary algorithm.

Availability of data

Data will be made available on reasonable request.

Availability of Material and/or Code

Code will be made available on reasonable request.

Notes

References

Barta S, Belanche D, Fernández A, Flavián M (2023) Influencer marketing on tiktok: the effectiveness of humor and followers’ hedonic experience. J Retail Consum Serv 70:103149. http://doi.org/10.1016/j.jretconser.2022.103149
Article Google Scholar
Zhao G, Jia P, Zhou A, Zhang B (2020) InfGCN: identifying influential nodes in complex networks with graph convolutional networks. Neurocomputing 414:18–26. http://doi.org/10.1016/j.neucom.2020.07.028
Article MATH Google Scholar
Kempe D, Kleinberg J, Tardos É (2003) Maximizing the spread of influence through a social network. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 137–146. http://doi.org/10.1145/956750.956769
Huang H, Pan J, Chen Y (2024) The competitive diffusion of knowledge and rumor in a multiplex network: a mathematical model. Appl Math Comput 475:128719. http://doi.org/10.1016/j.amc.2024.128719
Article MathSciNet MATH Google Scholar
Liang G, Yao X, Gu Y, Huang H, Gu C (2022) Multi-batches revenue maximization for competitive products over online social network. J Netw Comput Appl 201:103357. http://doi.org/10.1016/j.jnca.2022.103357
Article MATH Google Scholar
Azaouzi M, Mnasri W, Romdhane LB (2021) New trends in influence maximization models. Comput Sci Rev 40:100393. http://doi.org/10.1016/j.cosrev.2021.100393
Article MATH Google Scholar
Zhang Q, Shuai B, Lü M (2022) A novel method to identify influential nodes in complex networks based on gravity centrality. Inf Sci 618:98–117. http://doi.org/10.1016/j.ins.2022.10.070
Article MATH Google Scholar
Chen D, Su H (2023) Identification of influential nodes in complex networks with degree and average neighbor degree. IEEE J Emerg Sel Topics Circuits Syst 13(3):734–742. http://doi.org/10.1109/JETCAS.2023.3283680
Article MATH Google Scholar
Aghaee Z, Ghasemi MM, Beni HA, Bouyer A, Fatemi A (2021) A survey on meta-heuristic algorithms for the influence maximization problem in the social networks. Computing 103:2437–2477. http://doi.org/10.1007/s00607-021-00945-7
Article MathSciNet MATH Google Scholar
Wang L, Ma L, Wang C, Xie N-G, Koh JM, Cheong KH (2021) Identifying influential spreaders in social networks through discrete moth-flame optimization. IEEE Trans Evol Comput 25(6):1091–1102. http://doi.org/10.1109/TEVC.2021.3081478
Article MATH Google Scholar
Tang J, Zhu H, Lan J, Zhang L, Song S (2022) Maximizing the influence spread in social networks: a learning-automata-driven discrete butterfly optimization algorithm. Symmetry 15(1):117. http://doi.org/10.3390/sym15010117
Article MATH Google Scholar
Martí R, Sevaux M, Sörensen K (2025) Fifty years of metaheuristics. Eur J Oper Res 321(2):345–362. http://doi.org/10.1016/j.ejor.2024.04.004
Article MathSciNet MATH Google Scholar
Qiu L, Tian X, Zhang J, Gu C, Sai S (2021) LIDDE: a differential evolution algorithm based on local-influence-descending search strategy for influence maximization in social networks. J Netw Comput Appl 178:102973. http://doi.org/10.1016/j.jnca.2020.102973
Article MATH Google Scholar
Singh SS, Srivastva D, Verma M, Singh J (2022) Influence maximization frameworks, performance, challenges and directions on social network: a theoretical study. J King Saud Univ Comput Inf Sci 34(9):7570–7603. http://doi.org/10.1016/j.jksuci.2021.08.009
Article MATH Google Scholar
Jaouadi M, Romdhane LB (2024) A survey on influence maximization models. Expert Syst Appl 248:123429. http://doi.org/10.1016/j.eswa.2024.123429
Article MATH Google Scholar
Leskovec J, Krause A, Guestrin C, Faloutsos C, VanBriesen J, Glance N (2007) Cost-effective outbreak detection in networks. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 420–429. http://doi.org/10.1145/1281192.1281239
Cai T, Li J, Mian A, Li R-H, Sellis T, Yu JX (2022) Target-aware holistic influence maximization in spatial social networks. IEEE Trans Knowl Data Eng 34(4):1993–2007. http://doi.org/10.1109/TKDE.2020.3003047
Article MATH Google Scholar
Li D, Liu J (2019) Modeling influence diffusion over signed social networks. IEEE Trans Knowl Data Eng 33(2):613–625. http://doi.org/10.1109/TKDE.2019.2930690
Article MATH Google Scholar
Zhu Y, Tang J, Tang X, Wang S, Lim A (2023) 2-hop+ sampling: efficient and effective influence estimation. IEEE Trans Knowl Data Eng 35(2):1088–1103. http://doi.org/10.1109/TKDE.2021.3093934
Article MATH Google Scholar
Xiao Y, Chen Y, Zhang H, Zhu X, Yang Y, Zhu X (2024) A new semi-local centrality for identifying influential nodes based on local average shortest path with extended neighborhood. Artif Intell Rev 57(5):1–21. http://doi.org/10.1007/s10462-024-10725-2
Article MATH Google Scholar
Wen T, Chen Y-w, Syed T, Wu T (2024) ERIUE: evidential reasoning-based influential users evaluation in social networks. Omega 122:102945. http://doi.org/10.1016/j.omega.2023.102945d
Article Google Scholar
Guo C, Li W, Liu F, Zhong K, Wu X, Zhao Y, Jin Q (2024) Influence maximization algorithm based on group trust and local topology structure. Neurocomputing 564:126936. http://doi.org/10.1016/j.neucom.2023.126936
Article MATH Google Scholar
Ling C, Jiang J, Wang J, Thai MT, Xue R, Song J, Qiu M, Zhao L (2023) Deep graph representation learning and optimization for influence maximization. In: International Conference on Machine Learning, pp 21350–21361. http://doi.org/10.5555/3618408.3619288
Tang J, Qu J, Song S, Zhao Z, Du Q (2024) GCNT: identify influential seed set effectively in social networks by integrating graph convolutional networks with graph transformers. J King Saud Univ Comput Inf Sci 36:102183. http://doi.org/10.1016/j.jksuci.2024.102183
Article Google Scholar
Rezaei AA, Munoz J, Jalili M, Khayyam H (2023) A machine learning-based approach for vital node identification in complex networks. Expert Syst Appl 214:119086. http://doi.org/10.1016/j.eswa.2022.119086
Article MATH Google Scholar
Gui L, Li X, Zhang Q, Gao L (2024) Domain knowledge used in meta-heuristic algorithms for the job-shop scheduling problem: review and analysis. Tsinghua Sci Technol 29(5):1368–1389. http://doi.org/10.26599/TST.2023.9010140
Article MATH Google Scholar
Tong H, Minku LL, Menzel S, Sendhoff B, Yao X (2024) Evaluating meta-heuristic algorithms for dynamic capacitated arc routing problems based on a novel lower bound method. IEEE Comput Intell Mag 19(4):31–44. http://doi.org/10.1109/MCI.2024.3440213
Article Google Scholar
Srinath N, Yilmazlar IO, Kurz ME, Taaffe K (2023) Hybrid multi-objective evolutionary meta-heuristics for a parallel machine scheduling problem with setup times and preferences. Comput Ind Eng 185:109675. http://doi.org/10.1016/j.cie.2023.109675
Article Google Scholar
Riquelme F, Muñoz F, Olivares R (2023) A depth-based heuristic to solve the multi-objective influence spread problem using particle swarm optimization. Opsearch 60(3):1267–1285. http://doi.org/10.1007/s12597-023-00662-z
Article MathSciNet MATH Google Scholar
Jiang Q, Song G, Gao C, Wang Y, Si W, Xie K (2011) Simulated annealing based influence maximization in social networks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 25, pp 127–132. http://doi.org/10.1609/aaai.v25i1.7838
Chawla A, Cheney N (2023) Neighbor-hop mutation for genetic algorithm in influence maximization. In: Proceedings of the Companion Conference on Genetic and Evolutionary Computation, pp 187–190. http://doi.org/10.1145/3583133.3590755
Cunegatti E, Custode L, Iacca G (2024) Many-objective evolutionary influence maximization: Balancing spread, budget, fairness, and time. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp 655–658 . http://doi.org/10.1145/3638530.3654161
Konotopska K, Iacca G (2021) Graph-aware evolutionary algorithms for influence maximization. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp 1467–1475. http://doi.org/10.1145/3449726.3463138
Singh SS, Kumar A, Singh K, Biswas B (2019) LAPSO-IM: a learning-based influence maximization approach for social networks. Appl Soft Comput 82:105554. http://doi.org/10.1016/j.asoc.2019.105554
Article MATH Google Scholar
Gong M, Yan J, Shen B, Ma L, Cai Q (2016) Influence maximization in social networks based on discrete particle swarm optimization. Inf Sci 367:600–614. http://doi.org/10.1016/j.ins.2016.07.012
Article Google Scholar
Tang J, Song S, Lan J, Zhang L, Zhao F (2023) Steering the spread of influence adaptively in social networks via a discrete scheduled particle swarm optimization. Appl Intell 53(21):25070–25091. http://doi.org/10.1007/s10489-023-04884-y
Article Google Scholar
Khatri I, Choudhry A, Rao A, Tyagi A, Vishwakarma DK, Prasad M (2023) Influence maximization in social networks using discretized Harris’ Hawks optimization algorithm. Appl Soft Comput 149:111037. http://doi.org/10.1016/j.asoc.2023.111037
Article Google Scholar
Wang S, Liu W, Chen L, Zong S (2025) Influence maximization based on discrete particle swarm optimization on multilayer network. Inf Syst 127:102466. http://doi.org/10.1016/j.is.2024.102466
Article Google Scholar
Jiang P, Liu J, Cheng Y (2023) Bi-population enhanced cooperative differential evolution for constrained large-scale optimization problems. IEEE Trans Evol Comput 28(6):1620–1632. http://doi.org/10.1109/TEVC.2023.3325004
Article MATH Google Scholar
Liu J, Sarker R, Elsayed S, Essam D, Siswanto N (2024) Large-scale evolutionary optimization: a review and comparative study. Swarm Evol Comput 85:101466. http://doi.org/10.1016/j.swevo.2023.101466
Article MATH Google Scholar
Palakonda V, Ghorbanpour S, Kang J-M, Jung H (2024) External archive guided radial-grid multi objective differential evolution. Sci Rep 14(1):29006. http://doi.org/10.1038/s41598-024-76877-x
Article Google Scholar
Yu X, Xu P, Wang F, Wang X (2024) Reinforcement learning-based differential evolution algorithm for constrained multi-objective optimization problems. Eng Appl Artif Intell 131:107817. http://doi.org/10.1016/j.engappai.2023.107817
Article MATH Google Scholar
Biswas TK, Abbasi A, Chakrabortty RK (2022) A two-stage VIKOR assisted multi-operator differential evolution approach for influence maximization in social networks. Expert Syst Appl 192:116342. http://doi.org/10.1016/j.eswa.2021.116342
Article MATH Google Scholar
Zhu E, Wang H, Zhang Y, Zhang K, Liu C (2024) PHEE: identifying influential nodes in social networks with a phased evaluation-enhanced search. Neurocomputing 572:127195. http://doi.org/10.1016/j.neucom.2023.127195
Article MATH Google Scholar
Cunegatti E, Iacca G, Bucur D (2022) Large-scale multi-objective influence maximisation with network downscaling. In: International Conference on Parallel Problem Solving from Nature. Springer, pp 207–220. http://doi.org/10.1007/978-3-031-14721-0_15
Cheng R, Jin Y (2014) A competitive swarm optimizer for large scale optimization. IEEE Trans Cybern 45(2):191–204. http://doi.org/10.1109/TCYB.2014.2322602
Article MATH Google Scholar
Du J, Wang L, Fei M, Menhas MI (2023) A human learning optimization algorithm with competitive and cooperative learning. Complex Intell Syst 9(1):797–823. http://doi.org/10.1007/s40747-022-00808-4
Article MATH Google Scholar
Newman ME (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74(3):036104. http://doi.org/10.1103/physreve.74.036104
Article MathSciNet MATH Google Scholar
Zhong S, Zhang H, Deng Y (2022) Identification of influential nodes in complex networks: a local degree dimension approach. Inf Sci 610:994–1009. http://doi.org/10.1016/j.ins.2022.07.172
Article MATH Google Scholar
Tang J, Zhang R, Wang P, Zhao Z, Fan L, Liu X (2020) A discrete shuffled frog-leaping algorithm to identify influential nodes for influence maximization in social networks. Knowl-Based Syst 187:104833. http://doi.org/10.1016/j.knosys.2019.07.004
Article MATH Google Scholar
Li Y, Lu T, Li W, Zhang P (2024) HCCKshell: a heterogeneous cross-comparison improved Kshell algorithm for influence maximization. Inf Process Manage 61(3):103681. http://doi.org/10.1016/j.ipm.2024.103681
Article MATH Google Scholar
Soto JE, Hernández C, Figueroa M (2023) JACC-FPGA: a hardware accelerator for Jaccard similarity estimation using FPGAs in the cloud. Future Gener Comput Syst 138:26–42. http://doi.org/10.1016/j.future.2022.08.005
Article Google Scholar

Download references

Acknowledgements

This work was supported financially by the Gansu Provincial University Teachers Innovation Foundation under Grant Number 2024A-024, the Gansu Provincial Science Fund for Distinguished Young Scholars under Grant Number 23JRRA766, the National Natural Science Foundation of China under Grant Number 62162040, the Gansu Provincial Science Fund for Technological Innovation Guidance Plan under Grant Number 24CXGA046.

Author information

Baoqiang Chai, Xinyue Li, and Jianxin Tang contributed equally to this work.

Authors and Affiliations

School of Information science and Engineering, Lanzhou University, Lanzhou, 730000, China
Baoqiang Chai & Ruisheng Zhang
School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, China
Xinyue Li & Jianxin Tang

Authors

Baoqiang Chai
View author publications
Search author on:PubMed Google Scholar
Ruisheng Zhang
View author publications
Search author on:PubMed Google Scholar
Xinyue Li
View author publications
Search author on:PubMed Google Scholar
Jianxin Tang
View author publications
Search author on:PubMed Google Scholar

Contributions

BC was involved in the methodology; investigation; writing—review and editing; and project administration; RZ contributed to the software; experiments of the algorithms; writing—original draft; and conceptualization; XL assisted in the visualization and writing—review and editing; JT contributed to the resources; supervision; and formal analysis.

Corresponding author

Correspondence to Ruisheng Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Ethics approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Chai, B., Zhang, R., Li, X. et al. CLDE: a competitive learning-driven differential evolution optimization for the influence maximization problem in social networks. J Supercomput 81, 700 (2025). http://doi.org/10.1007/s11227-025-07175-0

Download citation

Accepted: 09 March 2025
Published: 05 April 2025
DOI: http://doi.org/10.1007/s11227-025-07175-0

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

CLDE: a competitive learning-driven differential evolution optimization for the influence maximization problem in social networks

Abstract

Similar content being viewed by others

An efficient discrete differential evolution algorithm based on community structure for influence maximization

Improved Evolution Algorithm that Guides the Direction of Individual Mutation for Influence Maximization in Social Networks

Discrete Genetic Learning-Enabled PSO for Influence Maximization

Explore related subjects

1 Introduction

2 Related work

3 Preliminaries

3.1 Problem statement

3.2 Independent cascade model

3.3 Expected diffusion estimation function

4 Proposed method

4.1 The framework of CLDE

4.2 Population initialization

4.3 Population partition

4.4 Excellent subpopulation evolution

4.4.1 Local-influence-descending search strategy

4.4.2 Differential mutation

4.4.3 Crossover operation

4.5 Common subpopulation evolution

4.6 Selection operation

4.7 Complexity analysis

5 Experiments and analysis

5.1 Datasets and benchmarks

5.2 Parameter settings

5.3 Performance experiments

5.3.1 Comparison on the influence spread

5.3.2 Comparison on efficiency

5.4 Performance analysis of seed sets

5.4.1 Comparison on the convergence speed

5.4.2 Average shortest distance among seed nodes

5.4.3 Similarity between the seed sets

5.5 Statistical analysis

6 Discussion

7 Conclusions

Availability of data

Availability of Material and/or Code

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Consent to participate

Consent for publication

Ethics approval

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords