1 Introduction

In today’s rapidly evolving automotive technology landscape, the integration of intelligent vehicles and inter-vehicle network connectivity has become a pivotal industry trend. In-vehicle technologies and services have amalgamated into a unified platform. With the increase in complexity of automotive internal networks, more and more electronic control units (ECUs) are installed in vehicles [1, 2]. These ECUs enable various functionalities such as autonomous driving [3] and automatic parking [4]. At the same time, the ways in which vehicles connect to the external world are becoming increasingly diverse. However, this diversity increases the risk of threats to in-vehicle systems, leading to unprecedented cyber security challenges for the automotive industry [5].

The controller area network (CAN) is the primary protocol for modern automotive communication [6]. However, the original design of the CAN protocol did not adequately account for network security. Its broadcast nature means that message transmissions are carried out publicly and lack authentication or encryption, making it vulnerable to potential malicious attacks. Through physical access points (e.g., OBD-II ports) or external connections (e.g., Wi-Fi, Bluetooth), attackers can easily launch malicious attacks that interfere with or disrupt the normal operation of vehicle systems [7]. For example, Koscher et al. [8] conducted a fuzzing attack on the CAN bus by injecting and modifying CAN packets. Miller et al. [9] used a Wi-Fi connection to hack into the in-vehicle networks (IVN) system of the 2014 Jeep Cherokee and control the vehicle via the CAN bus, successfully disabling critical functions such as braking or causing the vehicle to stall, which ultimately triggered a recall of 1.4 million vehicles. Therefore, protecting the IVN from potential threats has become a top priority, inspiring researchers to explore IVN protection countermeasures.

Currently, there are several approaches to enhancing the security of the CAN bus. One approach involves providing encryption and authentication mechanisms for CAN protocol [10,11,12,13]. Recent advancements in security mechanisms for the industrial internet have also shown potential for improving IVN security by enabling robust cross-domain authentication [14]. Another effective approach focuses on developing intrusion detection systems (IDS) [15, 16] to monitor, detect, and prevent potential network intrusions. The real-time requirements and bandwidth limitations of the CAN bus make the first approach less practical, whereas IDSs effectively address these challenges. Since the communication between ECUs typically follows specific frequencies, the statistical features of in-vehicle networks messages remain relatively stable. Consequently, researchers often treat consecutive CAN messages as windows and analyze their characteristics for intrusion detection, employing techniques such as information theory [17, 18], analysis of message sequences [19], and graph-based methods [20,21,22,23,24]. Among these, graph-based methods have gained increasing attention due to their ability to model the structural relationships between messages, providing a more comprehensive representation of network behavior.

Graph-based approaches construct a graph for a fixed number of CAN messages and extract features, such as the number of nodes, edges, and the maximum degree. These features capture the underlying patterns in the data, which are essential for accurate classification. Once features are extracted, intrusion detection is performed on each graph using either mathematical methods [20], machine learning techniques [21], or deep learning models [22,23,24]. While state-of-the-art graph-based approaches are effective, they primarily rely on basic structural features, which struggle to capture the subtle patterns of spoofing and replay attacks. By manipulating legitimate messages, these attacks introduce slight but significant changes to the network’s behavior, making them challenging to detect. Deep learning methods combined with graph theory have the potential to address some of these limitations by capturing more complex patterns. However, they also introduce higher computational complexity and reduced interpretability, making them less suitable for resource-constrained environments such as modern vehicles.

To address these limitations in existing graph-based IDSs, this paper proposes a graph-based decision tree intrusion detection system. By analyzing the characteristics of different attack types, we introduce three novel features—time difference, betweenness centrality [25], and graph density [26]—to enhance the detection of spoofing and replay attacks. Specifically, time difference detects anomalies in the timing of CAN messages, as replay attacks inject messages at irregular intervals. Betweenness centrality captures changes in node importance caused by spoofing attacks, which alter the flow of information by introducing multiple legitimate IDs. Graph density measures overall connectivity, which increases abnormally due to additional edges introduced by malicious nodes. These features improve detection accuracy by capturing subtle deviations caused by these attacks. For mixed attacks, we conducted experiments on different datasets and achieved high multi-class classification accuracy, demonstrating the strong generalization capability of our model. Furthermore, decision tree models provide a computationally efficient and interpretable solution, requiring fewer resources and making them well suited for lightweight detection systems. We also assess the importance of each feature by evaluating its impact on model performance. A significant increase in prediction error indicates that the feature is important, enhancing the model’s interpretability and reliability.

The main contributions of this research can be summarized as follows:

  • To the best of our knowledge, this is the first time a graph-based model has been introduced into a decision tree algorithm for CAN intrusion detection. Moreover, this is the first method to utilize betweenness centrality and graph density as graph features for detection. The proposed method also possesses strong interpretability.

  • The proposed method has excellent detection accuracy for most of the attacks, especially for spoofing and replay attacks, which is higher than the existing detection methods based on graph theory. Also, the proposed method demonstrates strong generalization and improved performance in multi-class classification. More importantly, the proposed method is lightweight and computationally efficient, with reduced training and prediction times, making it suitable for resource-constrained environments.

  • The proposed method can detect various types of attacks without requiring modifications to the CAN protocol. This makes it potentially applicable to a wide range of communication systems utilizing the CAN protocol, though its effectiveness may vary depending on specific system configurations and attack scenarios.

The rest of the paper is organized as follows. Section 2 reviews related work on IDSs. Section 3 provides a brief overview of the background of the research. Section 4 describes graph theory and decision trees in detail and introduces the implementation of the proposed GDT-IDS. Section 5 presents the results of GDT-IDS and compares the proposed IDS with existing methods. Finally, Sect. 6 concludes the paper.

2 Related work

IDS can monitor, detect, and defend against potential network attacks, and they are relatively easy to deploy. In recent years, extensive research has been conducted in this field, which can be categorized into two main approaches based on the CAN protocol: IDS based on physical layer characteristics and IDS based on data link layer characteristics.

2.1 IDS based on physical layer characteristics

Researchers have investigated CAN networks by leveraging their physical characteristics, such as voltage signals or clock deviations. Cho et al. [27] proposed a method called Viden, which uses voltage curves for ECU fingerprinting. This approach enables the quick and accurate detection of illegally accessed nodes. Similarly, Choi et al. [28] introduced VoltageIDS, which combines time-domain and frequency-domain characteristics of voltage signals with machine learning to perform anomaly detection. Zhao et al. [29] developed a clock-based IDS, named CIDS, which calculates clock skew as an ECU fingerprint for intrusion detection and ECU identification. Lee et al. [30] proposed an error-based IDS, which detects deviations in CAN message timestamp intervals to identify attacks. While these methods are effective at detecting physical layer intrusions, such as signal interference or voltage abnormalities, they are less effective at detecting more complex attacks in the data link layer.

2.2 IDS based on data link layer characteristics

Since communication between ECUs tends to follow a specific frequency, the features of the CAN message sequence are generally stable. When an attacker injects malicious messages, the stability of these features is disrupted, which can be detected using various methods. The most common and effective approaches employ both statistical methods and machine learning techniques to analyze these features and detect anomalies.

Statistical methods typically treat consecutive CAN messages as windows and analyze their features for intrusion detection. Müter et al. [17] introduced the first information entropy-based method for in-vehicle networks intrusion detection, demonstrating its effectiveness in detecting intrusions on the CAN bus. Marchetti et al. [18] and Wu et al. [31] further expanded on this approach by using a fixed time interval and a fixed number of messages as a sliding window for intrusion detection. Yu et al. [32] proposed a time interval conditional entropy-based IDS, which calculates the conditional entropy of CAN messages to effectively detect intrusions. Song et al. [33] introduced a similarity-based IDS that employs an improved Levenshtein distance and N-gram algorithms to enhance detection accuracy. Statistical methods are often computationally efficient and easy to implement. However, the features in the windows tend to be relatively simple, making them susceptible to interference caused by nonperiodic CAN messages, which can disrupt the stability of the calculated features.

Machine learning techniques, especially deep learning-based methods, have significantly improved the performance of IDS. These methods not only detect anomalies within fixed message windows but also enable message-by-message detection by capturing the relationships between consecutive CAN message sequences. Seo et al. [34] proposed a generative adversarial network (GAN) IDS, which is capable of detecting four types of attacks with a high detection rate. Similarly, Song et al. [35] introduced an IDS based on deep convolutional neural networks (DCNN), which transforms CAN bus data into a grid-like structure for intrusion detection. Desta et al. [36] proposed a convolutional neural network (CNN)-based IDS that trains a model using images generated from recursive graphs to detect intrusions in the vehicle network. Lo et al. [37] proposed a hybrid IDS combining CNN for spatial feature extraction and long short-term memory (LSTM) for capturing temporal dependencies, forming a CNN-LSTM architecture. While these methods have demonstrated strong performance in intrusion detection, they are generally ineffective against replay attacks due to the unique characteristics of such attacks. Additionally, they often come with a high computational cost, which may limit their deployment on resource-constrained embedded systems.

Graph-based methods have gained significant attention due to their ability to model the relationships between CAN message sequences in a more structured manner. Instead of directly analyzing the sequence, these methods convert the message sequence within a window into a graph structure, thereby diversifying the features. After extracting graph features, statistical methods or machine learning techniques can be employed for learning and detection. Islam et al. [20] introduced a graph-based intrusion detection method using a Chi-square test. In their approach, a graph is constructed for every 200 CAN messages, and features such as the number of nodes, edges, and maximum degree are extracted. The Chi-square test is then applied to detect anomalies in the graph. However, this method requires a large number of messages for attack detection, resulting in considerable detection delays. To address this issue, Islam et al. [21] proposed a graph-based Gaussian Naive Bayes (GGNB) IDS. This model demonstrated improved accuracy, reduced delays. Recently, in the field of deep learning, state-of-the-art IDSs have extended traditional neural network approaches to graph-structured data [38]. Devnath et al. [22] proposed a graph convolutional network (GCN)-based IDS to detect anomalies in graphs constructed from CAN messages. Zhang et al. [23] proposed a federated graph neural network (GNN) for fast anomaly detection in CAN bus networks, which utilizes federated learning to protect user data privacy while efficiently detecting attacks. Xiao et al. [24] proposed a graph node attention network (GNAT) to extract features from graph structures, which are then used for intrusion detection with random forest (RF), achieving high accuracy in identifying malicious messages.

Despite these advancements, existing graph-based methods still face limitations. They are often reliant on simple graph features such as the number of nodes, edges, and maximum degree, which are insufficient for capturing the complex patterns of attacks like spoofing and replay attacks. Additionally, combining graph theory with deep learning models further increases computational complexity compared to traditional neural networks, making them less suitable for lightweight systems. To overcome these shortcomings, we propose a graph-based decision tree intrusion detection system. In our approach, a graph is constructed for every 200 CAN messages, and several new features are introduced based on the unique characteristics of different attack types. Specifically, three novel features—time difference, betweenness centrality, and graph density—are designed to effectively capture the characteristics of spoofing and replay attacks, significantly enhancing detection accuracy. Finally, anomaly detection is performed using a decision tree model, which not only ensures lightweight detection but also provides interpretability, making it suitable in resource-constrained systems.

3 Background knowledge

3.1 CAN protocol and characteristics

The controller area network (CAN) is a serial communication protocol primarily used to enable communication between nodes in various embedded systems. The CAN protocol was originally developed by Robert Bosch GmbH in the early 1980 s to facilitate communication between different control units in automotive electronic systems. Due to its high performance, high reliability, and unique design, it is now widely used in many other fields. In 1991, the CAN specification (version 2.0) was developed and published [6]. Figure 1 shows the standardized format of a CAN data frame. As illustrated, the standard CAN frame consists of seven segments: Start of Frame (SOF), Arbitration Field, Control Field, Data Field, CRC Field, ACK Field, and End of Frame (EOF). Each segment is described as follows:

  • Start of Frame (SOF): It represents the beginning of the data frame, and it consists of a single ’dominant’ bit.

  • Arbitration Field: It represents the frame priority and consists of two parts: Identifier (ID) (11bit) and Remote Transfer Request (RTR) (1bit). In the arbitration process, the identifier is regarded as a priority and the RTR indicates the type of CAN frame.

  • Control Field: It consists of 6 bits, including two reserved bits, Identifier Extension (IDE) and r0, and 4 bits of Data Length Code (DLC).

  • Data Field: It contains the data content of the transmitted information, which can range from 0 to 8 bytes.

  • CRC Field: It ensures the validity of the message by checking for transmission errors using a Cyclic Redundancy Code (CRC).

  • ACK Field: It indicates whether the message has been successfully received.

  • End of Frame (EOF): It marks the end of the data frame.

Fig. 1
figure 1

Format of CAN data frame

The CAN protocol has several characteristics that make it widely used in automotive communication systems. However, these same characteristics also introduce significant security vulnerabilities [39]:

  • Broadcast Communication: The CAN protocol uses a broadcast communication mechanism, which allows all devices on the network to receive messages. However, this characteristic introduces a security risk, as an attacker who gains control of a single node can monitor and read all messages on the CAN bus. This can lead to data leakage and enable further attacks.

  • Priority-Based Arbitration: Each CAN message is assigned a priority based on its identifier, with higher-priority messages interrupting lower-priority ones during transmission. This arbitration mechanism can be exploited by attackers who continuously send high-priority messages, effectively blocking legitimate messages and causing a denial-of-service (DoS) attack.

  • Lack of Authentication: The CAN protocol lacks an authentication mechanism to verify the legitimacy of messages. This allows attackers to inject spoofed messages into the network, potentially issuing malicious commands to vehicle systems. For instance, a spoofed message could cause a vehicle to suddenly shift gears or stall, leading to dangerous situations.

  • Lack of Encryption: CAN messages are transmitted in plaintext, making them vulnerable to interception and tampering. Attackers can easily modify or forge messages, compromising the integrity of the data and enabling attacks such as replay or spoofing.

The vulnerabilities of the CAN protocol stem from its design focus on simplicity and real-time communication, which prioritizes performance over security. However, these vulnerabilities enable attackers to easily exploit the CAN bus, leading to severe security risks. The following section provides an overview of various types of CAN bus attacks.

Fig. 2
figure 2

Four types of attacks. a DoS Attacks, b Fuzzy Attacks, c Spoofing Attacks, d Replay Attacks

3.2 CAN bus attack scenarios

The CAN bus is vulnerable to a variety of attacks, each with distinct characteristics and impacts on the system. Among the most common types are denial-of-service (DoS) attacks, fuzzy attacks, spoofing attacks, and replay attacks. These attacks exploit the inherent vulnerabilities of the CAN protocol, such as its lack of authentication and encryption mechanisms. Figure 2 provides a simple illustration of these attack types, which are described in detail below:

  • DoS Attacks: As shown in Fig. 2a, the attacker exploits the CAN bus arbitration mechanism by continuously sending a large number of high-priority messages (e.g., 0x000) into the network. This interference blocks legitimate messages from being transmitted in a timely manner, causing system performance to deteriorate and potentially leading to a complete system crash.

  • Fuzzy Attacks: As shown in Fig. 2b, the attacker generates messages with randomly assigned IDs and arbitrary data, injecting them into the network. These large volumes of meaningless messages can cause the electronic control unit (ECU) to behave abnormally or even paralyze vehicle functions. For instance, this may result in abnormal engine noise or unexpected power surges.

  • Spoofing Attacks: As shown in Fig. 2c, after analyzing and obtaining details of legitimate CAN messages in the network, the attacker injects forged CAN messages with specific functions into the network to control certain vehicle operations [40]. Examples include modifying the RPM gauge, changing gears, or other malicious actions. The injected messages in this attack typically have valid IDs but altered data segment content, making this attack more difficult to detect than the previous two.

  • Replay Attacks: As shown in Fig. 2d, the attacker captures CAN messages over a certain period of time and then re-sends these messages into the network to interfere with the normal operation of the vehicle. Since replay attacks involve retransmitting legitimate messages, it becomes challenging to distinguish between normal messages and those sent by the attacker.

Our proposed method can effectively detect the above four types of attacks, with particularly strong performance against spoofing and replay attacks. Furthermore, our detection results significantly outperform other graph-based methods.

4 Proposed methodology

We propose a graph-based decision tree IDS for in-vehicle CAN bus networks. An overview of the proposed methodology is shown in Fig. 3.

We conduct experiments using a dataset generated from a real vehicle. The dataset is created by simulating an attacker launching various types of attacks. From this dataset, we extract the CAN IDs of each message to construct thousands of directed graphs, where each graph represents the communication structure of a specific window. Next, we extract features from these graphs, including the number of nodes, edges, degree, betweenness centrality, and graph density. These features capture the structural and topological properties of the CAN bus communication network, which are critical for distinguishing between normal and malicious behavior. The extracted features are then used to train decision tree classifiers. These classifiers are subsequently applied for intrusion detection, determining whether the system is under attack. For mixed attacks, the proposed system can perform multi-class classification to identify the specific type of attack. The details of each step in the proposed methodology are described in the following sections.

Fig. 3
figure 3

Overview of GDT-IDS

4.1 CAN message graph generation

In this paper, we utilize CAN bus messages to construct directed graphs. For consistency and comparison, we create a graph for every 200 messages, following the window size defined in [20]. At the end of this paper, we will justify our choice of 200 as the window size by analyzing its impact on anomaly detection performance across different window sizes. Figure 4 illustrates a portion of the message sequence from the dataset, where a simple directed graph is constructed by using the ID of each message as a vertex and the order between messages as the edge connecting two vertices. As shown in Fig. 5, the message with ID 0430 follows the message with ID 0350, forming a directed edge from vertex 0350 to vertex 0430.

Fig. 4
figure 4

Example of dataset

Fig. 5
figure 5

Example of a constructed graph

Algorithm 1 outlines the procedure for constructing directed graphs from the dataset. The algorithm processes consecutive messages by extracting their IDs as graph vertices, creating edges to represent message order, and building graphs when a specific number of messages (200) are reached. Key parameters like the graph’s adjacency list (G), the list of all graphs (GraphList), and timestamps (FirstTimestamp, LastTimestamp) are updated throughout the process. Once a graph is constructed, it is added to GraphList, and the process is reset for the next graph. After processing all messages, the algorithm returns the list of constructed graphs.

Algorithm 1
figure a

Graph generation

4.2 Graph feature and analysis of attack characteristics

In this paper, we analyze several key features of the constructed graphs to identify and characterize attacks on the CAN bus network. These features include nodes, edges, degree, time difference, betweenness centrality, and graph density. Each of these features is described in detail below.

4.2.1 Number of nodes and edges

The number of nodes and edges in a graph reflects its structural complexity. When an attacker launches attacks, such as fuzzy attacks, a large number of fake messages with randomly generated IDs are injected into the network. This significantly increases the number of nodes and edges in each graph. As illustrated in Fig. 6a and b, the distributions of nodes and edges in attacked graphs are clearly distinguishable from normal graphs. This distinction is particularly evident in the case of fuzzy attacks, where the number of nodes and edges increases substantially compared to normal graphs.

Fig. 6
figure 6

Distribution of feature values. The x-axis represents the graph number, and the y-axis denotes the feature values of graph

4.2.2 Degree

In graph theory, the degree of a node is categorized into in-degree and out-degree. In-degree is the number of edges terminating at a node, while out-degree is the number of edges originating from it. When an attacker launches a DoS attack by sending a large number of high-priority messages, such as those with ID 0x000, the arbitration mechanism of the CAN bus ensures that messages with ID 0x000 are processed first, followed by messages with other IDs. As a result, the number of edges originating from the node with ID 0x000 increases dramatically, leading to a significant rise in the maximum out-degree of the graph. As shown in Fig. 6c, the maximum out-degree of a DoS attacked graph is noticeably higher than that of a normal graph. A similar upward trend in the maximum out-degree can also be observed for other types of attacks.

While the previously discussed features are effective for detecting fuzzy and DoS attacks, they are less effective for replay and spoofing attacks. This is because these attacks do not inject high-priority or randomly generated IDs. Instead, they inject messages with multiple IDs, often using IDs that already exist in the original dataset. This behavior significantly increases the difficulty of attack detection.

To address this challenge, we analyzed the characteristics of these two attacks and propose three novel features—time difference, betweenness centrality, and graph density.

Fig. 7
figure 7

Replay attack time interval between every two messages

4.2.3 Time difference

The time difference represents the duration between the timestamp of the first message and the timestamp of the last message during the construction of a graph, which is essentially the time required to construct the graph. As shown in Fig. 7, an analysis of the time interval between consecutive messages in the replay attack dataset reveals that the time interval decreases dramatically after an attack. This observation highlights the sensitivity of replay attacks to time intervals, which inspired the use of time difference as a feature.

Under normal conditions, CAN bus messages are transmitted periodically, resulting in a relatively consistent time required to construct each graph. However, during replay attacks, the attacker injects a large number of messages within a short period of time. This behavior significantly reduces the time interval between consecutive messages. Consequently, since each graph is constructed using a fixed number of 200 messages, the overall time required to construct a graph decreases, leading to a smaller time difference.

As shown in Fig. 6d, a noticeable decreasing trend in time difference can be observed for both spoofing and fuzzy attacks. For replay attacks, the reduction in time difference is particularly significant. This makes time difference a critical feature for detecting replay attacks, as well as providing useful insights for identifying other types of attacks.

Fig. 8
figure 8

Example of betweenness centrality

4.2.4 Betweenness centrality

Betweenness centrality [25] is a measure used to identify nodes that play a critical role in a network. For every pair of nodes in a connected graph, there exists at least one shortest path between them. If a node lies on the shortest paths of many node pairs in the network, it has a higher betweenness centrality, indicating that it may serve as a key bridge in the information transfer process. Betweenness centrality is defined by the following equation,

$$\begin{aligned} {{c}_{B}}\left( v \right) =\sum \limits _{s\ne v\ne t\in V}{\frac{{{\sigma }_{st}}\left( v \right) }{{{\sigma }_{st}}}} \end{aligned}$$
(1)

where V is the set of nodes, \({{\sigma }_{st}}\) represents the number of shortest paths between nodes s and t, and \({{\sigma }_{st}}\left( v \right)\) is the number of those paths passing through node v.

The betweenness centrality of a node is influenced by the total number of nodes in the graph. To enable a fair comparison of the relative importance of nodes across graphs with different sizes, the calculated value can be normalized. This normalization is achieved by dividing the betweenness centrality by the total number of node pairs excluding the node v, ensuring that \({{c}_{B}}\left( v \right) \in \left[ 0,1 \right]\). For directed graphs, the betweenness centrality is normalized using the following formula,

$$\begin{aligned} c_B^{\text {norm}}(v) = \frac{{{c}_{B}}\left( v\right) }{\left( n - 1\right) \left( n - 2\right) } \end{aligned}$$
(2)

where n is the total number of nodes in the graph. This normalization ensures that the betweenness centrality values are comparable across graphs of varying sizes and structures.

For spoofing attacks, the attacker injects messages with an already existing ID multiple times. This injection creates new edges and paths in the graph, increasing the betweenness centrality of the affected node. As shown in Fig. 8a, consider a simple directed graph with five nodes. The betweenness centrality of a node is calculated based on the number of shortest paths that pass through the node. For example, node 1 has a betweenness centrality of 6, which is normalized to 0.5000. After a spoofing attack, when an attacker injects a message with ID 1 after the message with ID 5, the betweenness centrality of node 1 increases significantly, from 0.5000 to 0.7500, as shown in Fig. 8b. This increase occurs because the added edge (5, 1) introduces more shortest paths through node 1.

As shown in Fig. 6e, the median of the maximum betweenness centrality increases significantly for spoofing and DoS attacks. In contrast, replay attacks exhibit a notable decrease in betweenness centrality. This is because replay attacks inject CAN messages from a specific time period that was previously extracted, and these messages typically carry new IDs. As a result, the graph structure becomes more dispersed, reducing the concentration of shortest paths through specific nodes and leading to lower betweenness centrality values. The distinct differences in the betweenness centrality distributions for DoS attacks, spoofing attacks, replay attacks, and normal graphs demonstrate its effectiveness as a key feature for detecting various types of attacks.

4.2.5 Graph density

In graph theory, density [26] is defined as the ratio of the actual number of edges in a graph to the maximum possible number of edges. For directed graphs, the maximum possible number of edges is given by \(n\left( n-1 \right)\). Density can be described by the following equation,

$$\begin{aligned} \text{Density}=\frac{R}{N\left( N-1 \right) } \end{aligned}$$
(3)

where R represents the number of actual edges in the graph, and N represents the number of nodes in the graph.

Graph density quantifies the relationship between the actual number of edges and the maximum possible number of edges in a graph, with values ranging from 0 to 1. A density value close to 1 indicates a densely connected graph, where nodes are highly interconnected. As shown in Fig. 6f, the graph density increases during spoofing and replay attacks because the attacker injects messages with already existing IDs multiple times, thereby increasing the number of actual edges in the graph. Similarly, DoS attacks also lead to an increase in graph density due to the injection of a large number of messages with high-priority IDs. However, in the case of fuzzy attacks, the attacker injects messages with random IDs, significantly increasing the maximum possible number of edges in the graph. This causes a decrease in density, as the actual number of edges becomes relatively smaller compared to the maximum possible edges.

By analyzing these features, it is evident that each of the four attacks exhibits distinct behaviors across different graph-based metrics. For example, maximum out-degree provides the most significant differentiation for DoS attacks, while the number of edges is the most distinguishing feature for fuzzy attacks. For spoofing and replay attacks, features such as time difference, betweenness centrality, and density demonstrate clear differentiation. Furthermore, these features also effectively distinguish between fuzzy and DoS attacks. Overall, the proposed features capture the unique characteristics of each attack, enabling reliable detection and classification.

4.3 Graph-based decision tree IDS

4.3.1 Feature selection

The decision tree generally consists of three main components: nodes, edges, and leaf nodes. Specifically, each node represents a feature attribute used for decision-making, the edges connecting the nodes represent the divisions or conditions based on the values of the feature, and the leaf nodes represent the final classification labels or regression results.

The construction of a decision tree involves selecting the optimal features to split the data. This selection is typically guided by specific metrics, such as information gain, information gain ratio, or the Gini coefficient, with the goal of identifying features that best differentiate the data. In this paper, we adopt the CART (classification and regression trees) algorithm [41] for the classification task.

CART constructs a binary tree structure, where each node splits into two branches based on a specific value of a feature. The algorithm uses the Gini coefficient to measure impurity, where a lower Gini coefficient indicates a purer category in the dataset. A purer category implies a higher likelihood that the samples within a subset belong to the same class. At each step, the feature with the smallest Gini coefficient is selected as the best feature for splitting.

Suppose the dataset D has k classifications with a total number of samples \(\left| D \right|\), the number of samples in each classification is \(\left| {{D}_{1}} \right|\), \(\left| {{D}_{2}} \right|\),...\(\left| {{D}_{k}} \right|\). The probability that a sample belongs to class i is given by:

$$\begin{aligned} {{p}_{i}} = \frac{\left| {{D}_{i}} \right| }{\left| D \right| } \end{aligned}$$
(4)

The Gini coefficient for dataset D is defined as:

$$\begin{aligned} \text{Gini}(D) = \sum \limits _{i=1}^{k}{{{p}_{i}}}\left( 1 - {{p}_{i}} \right) = 1 - \sum \limits _{i=1}^{k}{{{p}_{i}}^{2}} \end{aligned}$$
(5)

For a given feature A, the dataset D is divided into two subsets, \({{D}_{1}}\) and \({{D}_{2}}\), based on a specific feature value. This ensures that the CART algorithm constructs a binary tree rather than a multi-way tree. The Gini coefficient for dataset D under the condition of feature A is calculated as:

$$\begin{aligned} \text{Gini}(D, A) = \frac{\left| {{D}_{1}} \right| }{\left| D \right| } \text{Gini}({{D}_{1}}) + \frac{\left| {{D}_{2}} \right| }{\left| D \right| } \text{Gini}({{D}_{2}}) \end{aligned}$$
(6)

For discrete feature variables, the decision tree enumerates all possible binary combinations of the discrete feature values and selects the combination with the smallest Gini coefficient for splitting. The remaining combinations are considered in subsequent splits. For continuous feature variables, the decision tree discretizes these features using the following method:

Assume there are m samples for a continuous feature A. First, the samples are sorted in ascending order as \({{a}_{1}}\), \({{a}_{2}}\),...\({{a}_{m}}\). The CART algorithm computes the average of each pair of adjacent values to generate \(m - 1\) division points, where the \(i-th\) division point \({{T}_{i}}\) is defined as: \({{T}_{i}} = \frac{{{a}_{i}} + {{a}_{i+1}}}{2}\). For these \(m-1\) division points, each point divides the sample into two parts, \({{D}_{1}}\) and \({{D}_{2}}\). The Gini coefficient for each division point is calculated as a binary classification point using Eq. (6). The division point with the smallest Gini coefficient is selected as the binary classification point for this continuous feature. This process is repeated for all features, and the feature with the smallest Gini coefficient across all features is chosen as the optimal feature. Then, the data is divided based on the selected division point. For example, suppose the feature with the smallest Gini coefficient is A, and its optimal division point is a. In this case, the dataset is divided into left and right subsets based on a, corresponding to the left and right child nodes of the decision tree, respectively. It is important to note that if a node is split based on a continuous feature, this feature can still participate in feature selection for subsequent nodes. This ensures that continuous features are not excluded from further splits and can continue to contribute to the construction of the tree.

4.3.2 Recursive construction

For the graph features proposed in this paper, all features are treated as continuous variables. The decision tree is constructed by recursively dividing the dataset into two subsets at each step, based on the optimal division point of a selected feature. This process continues until the tree construction is complete. For example, as shown in Fig. 9, the partial decision tree construction for spoofing attacks is illustrated. First, the Gini coefficient is calculated for all possible division points of the features. After computation, when the maximum betweenness centrality has a division point of 0.597, the Gini coefficient is found to be the smallest. Therefore, this division point is selected as the optimal split, and the dataset is divided into left and right child nodes accordingly. The same process is repeated for the left and right child nodes. Splitting continues until the Gini coefficient of a node becomes 0, indicating that all data within the node belong to the same category. At this point, the node is labeled as a leaf node, and no further splitting is performed. Once all nodes are split into leaf nodes, the decision tree construction is complete.

Fig. 9
figure 9

Example of decision tree for spoofing attack. At each node, the dataset is divided into two categories (normal and attack) based on the best division feature and its corresponding division point. The Gini coefficient is used to measure the impurity of the split, and the process continues until the Gini coefficient of a node becomes 0

Algorithm 2 outlines the proposed intrusion detection algorithm. The input to the algorithm is the list of constructed graphs generated by Algorithm 1, and the output is the final classification result. In line 1, each type of feature list is initialized. In line 2, a total feature list is initialized to store all the individual feature lists. From lines 3 to 11, the algorithm iterates over each graph, calculates each feature separately, and appends the calculated values to the corresponding feature list. Finally, all feature lists are added to the total feature list. In lines 12 to 14, the model is trained using a decision tree classifier, with 70% of the data used as the training set and the remaining 30% as the test set. The trained model is then used to return the final classification result.

Algorithm 2
figure b

Intrusion detection algorithm

5 Experimental results and evaluation

5.1 Dataset

To evaluate the performance of the proposed model, we conducted experiments on four datasets: the Car-Hacking Dataset [35], the IVN Intrusion Detection Challenge Dataset [42], the Car Hacking: Attack & Defense Challenge Dataset [43], and the OpelAstra Dataset [44]. These datasets were selected to ensure the generalization of the model and to comprehensively test its ability to detect various types of attacks. We provide a detailed description of each dataset, including the types of attacks they contain and the data distribution used in our experiments.

Car-Hacking Dataset [35] contains four types of attacks: DoS attacks, fuzzy attacks, spoofing attacks targeting the drive gear, and spoofing attacks targeting the RPM gauge. This dataset is widely used for evaluating intrusion detection systems in automotive networks.

IVN Intrusion Detection Challenge Dataset [42] was included because the Car-Hacking Dataset does not contain replay attacks. This dataset simulates replay attacks on a real car, the KIA Soul, using its OBD-II port. It provides a realistic representation of replay attacks in in-vehicle networks, making it a valuable resource for testing intrusion detection systems.

Car Hacking: Attack & Defense Challenge Dataset [43] contains four types of attacks: spoofing attacks, fuzzy attacks, flooding attacks, and replay attacks. Notably, flooding attacks are a specific type of DoS attack, where a large number of high-priority messages are injected into the network to overwhelm the system. This dataset is specifically designed to evaluate the robustness of intrusion detection systems across diverse attack scenarios.

OpelAstra Dataset [44] contains a variety of attack types, making it suitable for testing the robustness and generalization of intrusion detection systems. The dataset includes the following types of attacks: DoS, diagnostic, fuzzing CAN ID, fuzzing payload, replay, and suspension attacks. For comparison with GGNB [21], we conducted experiments on this dataset.

The experimental data used in this paper includes normal messages from all of the above datasets. In addition, attack messages from the Car-Hacking Dataset (DoS, fuzzy, spoofing targeting the drive gear, and spoofing targeting the RPM gauge) and the IVN Intrusion Detection Challenge Dataset (replay attacks) were used. Using the method described above, graphs are constructed for these messages, resulting in approximately 48K, 49K, 52K, 53K, 33K, and 68K graphs for each type of attack, respectively. The detailed distribution of the experimental data after graph construction is summarized in Table 1.

Table 1 Distribution of experimental data
Fig. 10
figure 10

Detection rate for different window sizes

5.2 Evaluation metrics

To evaluate the performance of the proposed model, we employ four widely used metrics: accuracy, precision, recall, and F1-score. These metrics are formulated as follows,

$$\begin{aligned} \text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} \end{aligned}$$
(7)
$$\begin{aligned} \text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} \end{aligned}$$
(8)
$$\begin{aligned} \text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} \end{aligned}$$
(9)
$$\begin{aligned} F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \end{aligned}$$
(10)

where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively. These metrics are crucial for evaluating classification models, as they provide a comprehensive understanding of the model’s performance in predicting both positive and negative categories. By analyzing these metrics, we can identify the strengths and weaknesses of the proposed model across different classification scenarios.

Fig. 11
figure 11

Confusion Matrix of a DoS Attacks, b Fuzzy Attacks, c Spoofing Attacks, d Replay Attacks, e Mixed Attacks, based on the Car-Hacking Dataset

5.3 Window size selection

The window size is a user-defined parameter that can be adjusted based on the robustness requirements of the design. In our experiments with the Attack & Defense Challenge Dataset [43], we evaluated window sizes of 100, 150, 200, 250, and 300 to generate graphs and extract features for detection. The results, presented in Fig. 10, show that when the window size exceeds 200, the detection accuracy for various attack types stabilizes. To ensure consistency and facilitate comparison with previous graph-based methods, we selected a window size of 200 for our experiments.

Table 2 Comparison with existing graph-based methods
Table 3 Comparison with GGNB on the OpelAstra Dataset
Table 4 Time comparison with GGNB and GNAT
Table 5 Comparison with existing deep learning methods
Table 6 Comparison of FLOPs and parameters

5.4 Comparison with existing graph-based methods

The experiments were conducted using a 12th Gen Intel(R) Core(TM) i5-12400 12-core processor. The proposed model was trained and tested using 70% and 30% of the data, respectively. Our method demonstrates excellent detection performance, achieving detection accuracies of 99.81%, 99.87%, 99.83%, 99.94%, and 99.43% for DoS, fuzzy, spoofing, replay, and mixed attacks, respectively. The confusion matrices are shown in Fig. 11a–e, respectively.

We compare our proposed model with existing graph-based methods on the Car-Hacking Dataset [35] and the IVN Intrusion Detection Challenge Dataset [42]. As shown in Table 2, our method demonstrates outstanding detection performance across all attack types. Specifically, it achieves a notably higher detection accuracy for spoofing and replay attacks compared to existing methods, demonstrating the effectiveness of three novel features—time difference, betweenness centrality, and graph density. Additionally, our approach demonstrates strong capability in handling multi-class classification problems, achieving high accuracy in detecting mixed attacks. This is attributed to the selection of the most effective features for detecting each specific attack type. To further evaluate the generalization of the model, we applied GDT-IDS to the Car Hacking: Attack & Defense Challenge Dataset [43], achieving an accuracy of 99.13% in detecting mixed attacks. The corresponding confusion matrix is presented in Fig. 12. Overall, our proposed method achieves the highest detection accuracy for all attack types, except for DoS, where it performs comparably to other methods, with all detection rates exceeding 99%.

In addition, we evaluated the proposed GDT-IDS on the OpelAstra Dataset [44], as summarized in Table 3. The detection accuracies for DoS, diagnostic, fuzzing CAN ID, fuzzing payload, replay, suspension, and mixed attacks are 100%, 99.92%, 99.92%, 100%, 100%, 96.16%, and 99.79%, respectively. While the detection accuracy for the suspension attacks is slightly lower than GGNB [21], our method outperforms GGNB in all other attack types.

Furthermore, we analyzed the time complexity of the decision tree model in both the training and prediction phases. The time complexity for the training phase is \({\mathcal {O}}(N \log N)\), where N represents the number of training samples. For the prediction phase, the time complexity for predicting a single sample is \({\mathcal {O}}(\log N)\), making the model highly efficient and well suited for real-time applications. In comparison, GGNB [21] employs Gaussian Naive Bayes (GNB), while GNAT [24] utilizes random forest (RF) to learn and detect anomalies from graph features. As shown in Table 4, our method requires slightly more training time than GGNB but is significantly faster than GNAT. Although the training time is marginally longer than GGNB, our method achieves significantly lower prediction time compared to both GGNB and GNAT, demonstrating its superior computational efficiency and suitability for real-time systems.

Fig. 12
figure 12

Confusion matrix based on the Car Hacking: Attack & Defense Challenge Dataset [43]

Fig. 13
figure 13

Feature importance measures the contribution of each feature to the model’s prediction accuracy, feature importance analysis of a DoS Attacks, b Fuzzy Attacks, c Spoofing Attacks, d Replay Attacks, e Mixed Attacks

5.5 Comparison with existing deep learning-based methods

We compare our proposed model with existing deep learning-based intrusion detection methods using the Car-Hacking Dataset [35], evaluating performance metrics such as precision, recall, and F1-score. The detection results for each type of attack are summarized in Table 5. The results demonstrate that our model outperforms most existing methods across various attack types. Although some metrics are marginally lower than those of DCNN [35] and CNN-LSTM [37] models, these models are significantly more complex, requiring substantial computational resources and extensive data. As shown in Table 6, our model requires much fewer FLOPs (floating point operations) and parameters compared to DCNN and CNN-LSTM models. These results emphasize that despite achieving competitive performance, our model is more lightweight and suitable for deployment in resource-constrained environments.

Furthermore, due to the unique characteristics of replay attacks, most deep learning-based methods struggle to detect them effectively. These methods often fail to capture the temporal dependencies and subtle patterns associated with replay attacks, resulting in poor detection performance. In contrast, our proposed model excels in detecting replay attacks, achieving outstanding results with high accuracy and reliability. This highlights the detection capability and generalization ability of our approach, particularly in scenarios where traditional deep learning methods fall short.

To further evaluate the model’s interpretability, we conducted a feature importance analysis to identify the most impactful features for each attack type. Feature importance is measured by permuting a feature’s values in the test set and observing the resulting change in prediction error. An increase in prediction error indicates that the feature is important, as the model relies on it for accurate predictions. Conversely, if the prediction error remains unchanged, the feature is considered less important, suggesting that the model does not heavily rely on it for predictions.

As shown in Fig. 13, we also incorporate features such as maximum in-degree, minimum in-degree, and minimum out-degree for comparison with GGNB [21]. For DoS attacks, as shown in Fig. 13a, the maximum out-degree is the most influential feature, followed by the proposed features—betweenness centrality, time difference, and graph density. For fuzzy attacks, as shown in Fig. 13b, graph density is the most critical feature, followed by the number of nodes, time difference, and the number of edges. For spoofing and replay attacks, as shown in Fig. 13c and d, betweenness centrality and graph density are highly impactful. In particular, for replay attacks, time difference, betweenness centrality, and graph density contribute significantly more than other features. Finally, for mixed attacks, as shown in Fig. 13e, betweenness centrality, time difference, and graph density are the most important features influencing predictions. By analyzing feature importance for each attack type, we validate the effectiveness of the three proposed features—time difference, betweenness centrality, and graph density. These features are particularly critical for detecting all attack types, especially spoofing and replay attacks. This not only enhances detection performance but also improves the interpretability of our model.

6 Conclusion

In this paper, we proposed a novel graph-based decision tree intrusion detection system, named GDT-IDS. The method constructs a graph for every 200 CAN messages, and features such as the number of nodes, edges, and maximum degree are extracted from each graph. To address the limitations of existing methods in detecting replay and spoofing attacks, we introduced three new graph-based features—time difference, betweenness centrality, and graph density—which effectively capture the unique characteristics of these attacks. These features are then input into a decision tree classifier to identify and categorize intrusions. Experimental results demonstrate that GDT-IDS achieves excellent performance across various attack types. Notably, it significantly outperforms existing graph-based methods in detecting spoofing and replay attacks, showcasing its superior capability in identifying these specific threats. Furthermore, it achieves remarkably shorter prediction times, highlighting its efficiency. The proposed method also offers competitive performance compared to deep learning-based approaches, with the advantages of lower computational cost and enhanced interpretability. Unlike many deep learning methods, which struggle to capture temporal dependencies and fail to detect replay attacks effectively, GDT-IDS maintains high accuracy and reliability in detecting such attacks. These qualities make GDT-IDS a promising approach for detecting various attacks, offering a lightweight and efficient solution for vehicle intrusion detection.

While GDT-IDS effectively detects intrusion behavior within CAN message windows, its current focus is on identifying whether a window is under attack rather than pinpointing the specific message responsible for the anomaly. In future work, we aim to enhance the system by developing a lightweight yet precise approach capable of not only detecting anomalous windows but also identifying the specific malicious messages within them. This improvement will further refine the system’s capabilities, enabling it to more fully address the security challenges of IVN more comprehensively.