GDT-IDS: graph-based decision tree intrusion detection system for controller area network

Ye, Pengdong; Liang, Yanhua; Bie, Yutao; Qin, Guihe; Song, Jiaru; Wang, Yingqing; Liu, Wanning

doi:10.1007/s11227-025-07116-x

GDT-IDS: graph-based decision tree intrusion detection system for controller area network

Published: 09 March 2025

Volume 81, article number 591, (2025)
Cite this article

Download PDF

Access provided by China Pharmaceutical University

The Journal of Supercomputing Aims and scope Submit manuscript

GDT-IDS: graph-based decision tree intrusion detection system for controller area network

Download PDF

Pengdong Ye^1,2,
Yanhua Liang^1,2,
Yutao Bie³,
Guihe Qin^1,2,
Jiaru Song^1,2,
Yingqing Wang^1,2 &
…
Wanning Liu^1,2

323 Accesses
3 Citations
Explore all metrics

Abstract

With the rapid development of automotive technology, the security of in-vehicle networks (IVN) has received increasing attention. The controller area network (CAN), widely used for in-vehicle communication, faces significant security risks due to its inherent vulnerabilities. These risks can lead to attacks, data leakage, and abnormal functioning of vehicle systems. Currently, the mainstream security approach is the intrusion detection system (IDS). Graph-based IDSs have been widely studied for their ability to represent the relationships between CAN messages through nodes and edges, providing an intuitive and structured analysis that enables effective detection of various types of attacks. However, existing graph-based methods rely on basic features, such as the number of nodes, edges, and the maximum degree, which are insufficient for capturing the complex characteristics of spoofing and replay attacks, resulting in suboptimal detection accuracy. To address this, we propose a graph-based decision tree IDS, named GDT-IDS, specifically tailored to the characteristics of spoofing and replay attacks. By analyzing these attack types, we introduce three novel graph-based features—time difference, betweenness centrality, and graph density—that significantly enhance detection accuracy. Moreover, our method can perform multi-class classification, effectively handling mixed attack scenarios. The use of a decision tree model ensures the process remains lightweight and interpretable, making it suitable for resource-constrained systems like vehicles.

Deep learning-based intrusion detection system for in-vehicle networks with knowledge graph and statistical methods

Article 18 November 2024

Intrusion Detection System for Connected Automobiles Security

Practical IDS on In-vehicle Network Against Diversified Attack Models

Discover the latest articles and news from researchers in related subjects, suggested using machine learning.

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

In today’s rapidly evolving automotive technology landscape, the integration of intelligent vehicles and inter-vehicle network connectivity has become a pivotal industry trend. In-vehicle technologies and services have amalgamated into a unified platform. With the increase in complexity of automotive internal networks, more and more electronic control units (ECUs) are installed in vehicles [1, 2]. These ECUs enable various functionalities such as autonomous driving [3] and automatic parking [4]. At the same time, the ways in which vehicles connect to the external world are becoming increasingly diverse. However, this diversity increases the risk of threats to in-vehicle systems, leading to unprecedented cyber security challenges for the automotive industry [5].

The controller area network (CAN) is the primary protocol for modern automotive communication [6]. However, the original design of the CAN protocol did not adequately account for network security. Its broadcast nature means that message transmissions are carried out publicly and lack authentication or encryption, making it vulnerable to potential malicious attacks. Through physical access points (e.g., OBD-II ports) or external connections (e.g., Wi-Fi, Bluetooth), attackers can easily launch malicious attacks that interfere with or disrupt the normal operation of vehicle systems [7]. For example, Koscher et al. [8] conducted a fuzzing attack on the CAN bus by injecting and modifying CAN packets. Miller et al. [9] used a Wi-Fi connection to hack into the in-vehicle networks (IVN) system of the 2014 Jeep Cherokee and control the vehicle via the CAN bus, successfully disabling critical functions such as braking or causing the vehicle to stall, which ultimately triggered a recall of 1.4 million vehicles. Therefore, protecting the IVN from potential threats has become a top priority, inspiring researchers to explore IVN protection countermeasures.

Currently, there are several approaches to enhancing the security of the CAN bus. One approach involves providing encryption and authentication mechanisms for CAN protocol [10,11,12,13]. Recent advancements in security mechanisms for the industrial internet have also shown potential for improving IVN security by enabling robust cross-domain authentication [14]. Another effective approach focuses on developing intrusion detection systems (IDS) [15, 16] to monitor, detect, and prevent potential network intrusions. The real-time requirements and bandwidth limitations of the CAN bus make the first approach less practical, whereas IDSs effectively address these challenges. Since the communication between ECUs typically follows specific frequencies, the statistical features of in-vehicle networks messages remain relatively stable. Consequently, researchers often treat consecutive CAN messages as windows and analyze their characteristics for intrusion detection, employing techniques such as information theory [17, 18], analysis of message sequences [19], and graph-based methods [20,21,22,23,24]. Among these, graph-based methods have gained increasing attention due to their ability to model the structural relationships between messages, providing a more comprehensive representation of network behavior.

Graph-based approaches construct a graph for a fixed number of CAN messages and extract features, such as the number of nodes, edges, and the maximum degree. These features capture the underlying patterns in the data, which are essential for accurate classification. Once features are extracted, intrusion detection is performed on each graph using either mathematical methods [20], machine learning techniques [21], or deep learning models [22,23,24]. While state-of-the-art graph-based approaches are effective, they primarily rely on basic structural features, which struggle to capture the subtle patterns of spoofing and replay attacks. By manipulating legitimate messages, these attacks introduce slight but significant changes to the network’s behavior, making them challenging to detect. Deep learning methods combined with graph theory have the potential to address some of these limitations by capturing more complex patterns. However, they also introduce higher computational complexity and reduced interpretability, making them less suitable for resource-constrained environments such as modern vehicles.

To address these limitations in existing graph-based IDSs, this paper proposes a graph-based decision tree intrusion detection system. By analyzing the characteristics of different attack types, we introduce three novel features—time difference, betweenness centrality [25], and graph density [26]—to enhance the detection of spoofing and replay attacks. Specifically, time difference detects anomalies in the timing of CAN messages, as replay attacks inject messages at irregular intervals. Betweenness centrality captures changes in node importance caused by spoofing attacks, which alter the flow of information by introducing multiple legitimate IDs. Graph density measures overall connectivity, which increases abnormally due to additional edges introduced by malicious nodes. These features improve detection accuracy by capturing subtle deviations caused by these attacks. For mixed attacks, we conducted experiments on different datasets and achieved high multi-class classification accuracy, demonstrating the strong generalization capability of our model. Furthermore, decision tree models provide a computationally efficient and interpretable solution, requiring fewer resources and making them well suited for lightweight detection systems. We also assess the importance of each feature by evaluating its impact on model performance. A significant increase in prediction error indicates that the feature is important, enhancing the model’s interpretability and reliability.

The main contributions of this research can be summarized as follows:

To the best of our knowledge, this is the first time a graph-based model has been introduced into a decision tree algorithm for CAN intrusion detection. Moreover, this is the first method to utilize betweenness centrality and graph density as graph features for detection. The proposed method also possesses strong interpretability.
The proposed method has excellent detection accuracy for most of the attacks, especially for spoofing and replay attacks, which is higher than the existing detection methods based on graph theory. Also, the proposed method demonstrates strong generalization and improved performance in multi-class classification. More importantly, the proposed method is lightweight and computationally efficient, with reduced training and prediction times, making it suitable for resource-constrained environments.
The proposed method can detect various types of attacks without requiring modifications to the CAN protocol. This makes it potentially applicable to a wide range of communication systems utilizing the CAN protocol, though its effectiveness may vary depending on specific system configurations and attack scenarios.

The rest of the paper is organized as follows. Section 2 reviews related work on IDSs. Section 3 provides a brief overview of the background of the research. Section 4 describes graph theory and decision trees in detail and introduces the implementation of the proposed GDT-IDS. Section 5 presents the results of GDT-IDS and compares the proposed IDS with existing methods. Finally, Sect. 6 concludes the paper.

2 Related work

IDS can monitor, detect, and defend against potential network attacks, and they are relatively easy to deploy. In recent years, extensive research has been conducted in this field, which can be categorized into two main approaches based on the CAN protocol: IDS based on physical layer characteristics and IDS based on data link layer characteristics.

2.1 IDS based on physical layer characteristics

Researchers have investigated CAN networks by leveraging their physical characteristics, such as voltage signals or clock deviations. Cho et al. [27] proposed a method called Viden, which uses voltage curves for ECU fingerprinting. This approach enables the quick and accurate detection of illegally accessed nodes. Similarly, Choi et al. [28] introduced VoltageIDS, which combines time-domain and frequency-domain characteristics of voltage signals with machine learning to perform anomaly detection. Zhao et al. [29] developed a clock-based IDS, named CIDS, which calculates clock skew as an ECU fingerprint for intrusion detection and ECU identification. Lee et al. [30] proposed an error-based IDS, which detects deviations in CAN message timestamp intervals to identify attacks. While these methods are effective at detecting physical layer intrusions, such as signal interference or voltage abnormalities, they are less effective at detecting more complex attacks in the data link layer.

2.2 IDS based on data link layer characteristics

Since communication between ECUs tends to follow a specific frequency, the features of the CAN message sequence are generally stable. When an attacker injects malicious messages, the stability of these features is disrupted, which can be detected using various methods. The most common and effective approaches employ both statistical methods and machine learning techniques to analyze these features and detect anomalies.

Statistical methods typically treat consecutive CAN messages as windows and analyze their features for intrusion detection. Müter et al. [17] introduced the first information entropy-based method for in-vehicle networks intrusion detection, demonstrating its effectiveness in detecting intrusions on the CAN bus. Marchetti et al. [18] and Wu et al. [31] further expanded on this approach by using a fixed time interval and a fixed number of messages as a sliding window for intrusion detection. Yu et al. [32] proposed a time interval conditional entropy-based IDS, which calculates the conditional entropy of CAN messages to effectively detect intrusions. Song et al. [33] introduced a similarity-based IDS that employs an improved Levenshtein distance and N-gram algorithms to enhance detection accuracy. Statistical methods are often computationally efficient and easy to implement. However, the features in the windows tend to be relatively simple, making them susceptible to interference caused by nonperiodic CAN messages, which can disrupt the stability of the calculated features.

Machine learning techniques, especially deep learning-based methods, have significantly improved the performance of IDS. These methods not only detect anomalies within fixed message windows but also enable message-by-message detection by capturing the relationships between consecutive CAN message sequences. Seo et al. [34] proposed a generative adversarial network (GAN) IDS, which is capable of detecting four types of attacks with a high detection rate. Similarly, Song et al. [35] introduced an IDS based on deep convolutional neural networks (DCNN), which transforms CAN bus data into a grid-like structure for intrusion detection. Desta et al. [36] proposed a convolutional neural network (CNN)-based IDS that trains a model using images generated from recursive graphs to detect intrusions in the vehicle network. Lo et al. [37] proposed a hybrid IDS combining CNN for spatial feature extraction and long short-term memory (LSTM) for capturing temporal dependencies, forming a CNN-LSTM architecture. While these methods have demonstrated strong performance in intrusion detection, they are generally ineffective against replay attacks due to the unique characteristics of such attacks. Additionally, they often come with a high computational cost, which may limit their deployment on resource-constrained embedded systems.

Graph-based methods have gained significant attention due to their ability to model the relationships between CAN message sequences in a more structured manner. Instead of directly analyzing the sequence, these methods convert the message sequence within a window into a graph structure, thereby diversifying the features. After extracting graph features, statistical methods or machine learning techniques can be employed for learning and detection. Islam et al. [20] introduced a graph-based intrusion detection method using a Chi-square test. In their approach, a graph is constructed for every 200 CAN messages, and features such as the number of nodes, edges, and maximum degree are extracted. The Chi-square test is then applied to detect anomalies in the graph. However, this method requires a large number of messages for attack detection, resulting in considerable detection delays. To address this issue, Islam et al. [21] proposed a graph-based Gaussian Naive Bayes (GGNB) IDS. This model demonstrated improved accuracy, reduced delays. Recently, in the field of deep learning, state-of-the-art IDSs have extended traditional neural network approaches to graph-structured data [38]. Devnath et al. [22] proposed a graph convolutional network (GCN)-based IDS to detect anomalies in graphs constructed from CAN messages. Zhang et al. [23] proposed a federated graph neural network (GNN) for fast anomaly detection in CAN bus networks, which utilizes federated learning to protect user data privacy while efficiently detecting attacks. Xiao et al. [24] proposed a graph node attention network (GNAT) to extract features from graph structures, which are then used for intrusion detection with random forest (RF), achieving high accuracy in identifying malicious messages.

Despite these advancements, existing graph-based methods still face limitations. They are often reliant on simple graph features such as the number of nodes, edges, and maximum degree, which are insufficient for capturing the complex patterns of attacks like spoofing and replay attacks. Additionally, combining graph theory with deep learning models further increases computational complexity compared to traditional neural networks, making them less suitable for lightweight systems. To overcome these shortcomings, we propose a graph-based decision tree intrusion detection system. In our approach, a graph is constructed for every 200 CAN messages, and several new features are introduced based on the unique characteristics of different attack types. Specifically, three novel features—time difference, betweenness centrality, and graph density—are designed to effectively capture the characteristics of spoofing and replay attacks, significantly enhancing detection accuracy. Finally, anomaly detection is performed using a decision tree model, which not only ensures lightweight detection but also provides interpretability, making it suitable in resource-constrained systems.

3 Background knowledge

3.1 CAN protocol and characteristics

The controller area network (CAN) is a serial communication protocol primarily used to enable communication between nodes in various embedded systems. The CAN protocol was originally developed by Robert Bosch GmbH in the early 1980 s to facilitate communication between different control units in automotive electronic systems. Due to its high performance, high reliability, and unique design, it is now widely used in many other fields. In 1991, the CAN specification (version 2.0) was developed and published [6]. Figure 1 shows the standardized format of a CAN data frame. As illustrated, the standard CAN frame consists of seven segments: Start of Frame (SOF), Arbitration Field, Control Field, Data Field, CRC Field, ACK Field, and End of Frame (EOF). Each segment is described as follows:

Start of Frame (SOF): It represents the beginning of the data frame, and it consists of a single ’dominant’ bit.
Arbitration Field: It represents the frame priority and consists of two parts: Identifier (ID) (11bit) and Remote Transfer Request (RTR) (1bit). In the arbitration process, the identifier is regarded as a priority and the RTR indicates the type of CAN frame.
Control Field: It consists of 6 bits, including two reserved bits, Identifier Extension (IDE) and r0, and 4 bits of Data Length Code (DLC).
Data Field: It contains the data content of the transmitted information, which can range from 0 to 8 bytes.
CRC Field: It ensures the validity of the message by checking for transmission errors using a Cyclic Redundancy Code (CRC).
ACK Field: It indicates whether the message has been successfully received.
End of Frame (EOF): It marks the end of the data frame.

The CAN protocol has several characteristics that make it widely used in automotive communication systems. However, these same characteristics also introduce significant security vulnerabilities [39]:

Broadcast Communication: The CAN protocol uses a broadcast communication mechanism, which allows all devices on the network to receive messages. However, this characteristic introduces a security risk, as an attacker who gains control of a single node can monitor and read all messages on the CAN bus. This can lead to data leakage and enable further attacks.
Priority-Based Arbitration: Each CAN message is assigned a priority based on its identifier, with higher-priority messages interrupting lower-priority ones during transmission. This arbitration mechanism can be exploited by attackers who continuously send high-priority messages, effectively blocking legitimate messages and causing a denial-of-service (DoS) attack.
Lack of Authentication: The CAN protocol lacks an authentication mechanism to verify the legitimacy of messages. This allows attackers to inject spoofed messages into the network, potentially issuing malicious commands to vehicle systems. For instance, a spoofed message could cause a vehicle to suddenly shift gears or stall, leading to dangerous situations.
Lack of Encryption: CAN messages are transmitted in plaintext, making them vulnerable to interception and tampering. Attackers can easily modify or forge messages, compromising the integrity of the data and enabling attacks such as replay or spoofing.

The vulnerabilities of the CAN protocol stem from its design focus on simplicity and real-time communication, which prioritizes performance over security. However, these vulnerabilities enable attackers to easily exploit the CAN bus, leading to severe security risks. The following section provides an overview of various types of CAN bus attacks.

3.2 CAN bus attack scenarios

The CAN bus is vulnerable to a variety of attacks, each with distinct characteristics and impacts on the system. Among the most common types are denial-of-service (DoS) attacks, fuzzy attacks, spoofing attacks, and replay attacks. These attacks exploit the inherent vulnerabilities of the CAN protocol, such as its lack of authentication and encryption mechanisms. Figure 2 provides a simple illustration of these attack types, which are described in detail below:

DoS Attacks: As shown in Fig. 2a, the attacker exploits the CAN bus arbitration mechanism by continuously sending a large number of high-priority messages (e.g., 0x000) into the network. This interference blocks legitimate messages from being transmitted in a timely manner, causing system performance to deteriorate and potentially leading to a complete system crash.
Fuzzy Attacks: As shown in Fig. 2b, the attacker generates messages with randomly assigned IDs and arbitrary data, injecting them into the network. These large volumes of meaningless messages can cause the electronic control unit (ECU) to behave abnormally or even paralyze vehicle functions. For instance, this may result in abnormal engine noise or unexpected power surges.
Spoofing Attacks: As shown in Fig. 2c, after analyzing and obtaining details of legitimate CAN messages in the network, the attacker injects forged CAN messages with specific functions into the network to control certain vehicle operations [40]. Examples include modifying the RPM gauge, changing gears, or other malicious actions. The injected messages in this attack typically have valid IDs but altered data segment content, making this attack more difficult to detect than the previous two.
Replay Attacks: As shown in Fig. 2d, the attacker captures CAN messages over a certain period of time and then re-sends these messages into the network to interfere with the normal operation of the vehicle. Since replay attacks involve retransmitting legitimate messages, it becomes challenging to distinguish between normal messages and those sent by the attacker.

Our proposed method can effectively detect the above four types of attacks, with particularly strong performance against spoofing and replay attacks. Furthermore, our detection results significantly outperform other graph-based methods.

4 Proposed methodology

We propose a graph-based decision tree IDS for in-vehicle CAN bus networks. An overview of the proposed methodology is shown in Fig. 3.

We conduct experiments using a dataset generated from a real vehicle. The dataset is created by simulating an attacker launching various types of attacks. From this dataset, we extract the CAN IDs of each message to construct thousands of directed graphs, where each graph represents the communication structure of a specific window. Next, we extract features from these graphs, including the number of nodes, edges, degree, betweenness centrality, and graph density. These features capture the structural and topological properties of the CAN bus communication network, which are critical for distinguishing between normal and malicious behavior. The extracted features are then used to train decision tree classifiers. These classifiers are subsequently applied for intrusion detection, determining whether the system is under attack. For mixed attacks, the proposed system can perform multi-class classification to identify the specific type of attack. The details of each step in the proposed methodology are described in the following sections.

4.1 CAN message graph generation

In this paper, we utilize CAN bus messages to construct directed graphs. For consistency and comparison, we create a graph for every 200 messages, following the window size defined in [20]. At the end of this paper, we will justify our choice of 200 as the window size by analyzing its impact on anomaly detection performance across different window sizes. Figure 4 illustrates a portion of the message sequence from the dataset, where a simple directed graph is constructed by using the ID of each message as a vertex and the order between messages as the edge connecting two vertices. As shown in Fig. 5, the message with ID 0430 follows the message with ID 0350, forming a directed edge from vertex 0350 to vertex 0430.

Algorithm 1 outlines the procedure for constructing directed graphs from the dataset. The algorithm processes consecutive messages by extracting their IDs as graph vertices, creating edges to represent message order, and building graphs when a specific number of messages (200) are reached. Key parameters like the graph’s adjacency list (G), the list of all graphs (GraphList), and timestamps (FirstTimestamp, LastTimestamp) are updated throughout the process. Once a graph is constructed, it is added to GraphList, and the process is reset for the next graph. After processing all messages, the algorithm returns the list of constructed graphs.

4.2 Graph feature and analysis of attack characteristics

In this paper, we analyze several key features of the constructed graphs to identify and characterize attacks on the CAN bus network. These features include nodes, edges, degree, time difference, betweenness centrality, and graph density. Each of these features is described in detail below.

4.2.1 Number of nodes and edges

The number of nodes and edges in a graph reflects its structural complexity. When an attacker launches attacks, such as fuzzy attacks, a large number of fake messages with randomly generated IDs are injected into the network. This significantly increases the number of nodes and edges in each graph. As illustrated in Fig. 6a and b, the distributions of nodes and edges in attacked graphs are clearly distinguishable from normal graphs. This distinction is particularly evident in the case of fuzzy attacks, where the number of nodes and edges increases substantially compared to normal graphs.

4.2.2 Degree

In graph theory, the degree of a node is categorized into in-degree and out-degree. In-degree is the number of edges terminating at a node, while out-degree is the number of edges originating from it. When an attacker launches a DoS attack by sending a large number of high-priority messages, such as those with ID 0x000, the arbitration mechanism of the CAN bus ensures that messages with ID 0x000 are processed first, followed by messages with other IDs. As a result, the number of edges originating from the node with ID 0x000 increases dramatically, leading to a significant rise in the maximum out-degree of the graph. As shown in Fig. 6c, the maximum out-degree of a DoS attacked graph is noticeably higher than that of a normal graph. A similar upward trend in the maximum out-degree can also be observed for other types of attacks.

While the previously discussed features are effective for detecting fuzzy and DoS attacks, they are less effective for replay and spoofing attacks. This is because these attacks do not inject high-priority or randomly generated IDs. Instead, they inject messages with multiple IDs, often using IDs that already exist in the original dataset. This behavior significantly increases the difficulty of attack detection.

To address this challenge, we analyzed the characteristics of these two attacks and propose three novel features—time difference, betweenness centrality, and graph density.

4.2.3 Time difference

The time difference represents the duration between the timestamp of the first message and the timestamp of the last message during the construction of a graph, which is essentially the time required to construct the graph. As shown in Fig. 7, an analysis of the time interval between consecutive messages in the replay attack dataset reveals that the time interval decreases dramatically after an attack. This observation highlights the sensitivity of replay attacks to time intervals, which inspired the use of time difference as a feature.

Under normal conditions, CAN bus messages are transmitted periodically, resulting in a relatively consistent time required to construct each graph. However, during replay attacks, the attacker injects a large number of messages within a short period of time. This behavior significantly reduces the time interval between consecutive messages. Consequently, since each graph is constructed using a fixed number of 200 messages, the overall time required to construct a graph decreases, leading to a smaller time difference.

As shown in Fig. 6d, a noticeable decreasing trend in time difference can be observed for both spoofing and fuzzy attacks. For replay attacks, the reduction in time difference is particularly significant. This makes time difference a critical feature for detecting replay attacks, as well as providing useful insights for identifying other types of attacks.

4.2.4 Betweenness centrality

Betweenness centrality [25] is a measure used to identify nodes that play a critical role in a network. For every pair of nodes in a connected graph, there exists at least one shortest path between them. If a node lies on the shortest paths of many node pairs in the network, it has a higher betweenness centrality, indicating that it may serve as a key bridge in the information transfer process. Betweenness centrality is defined by the following equation,

$$\begin{aligned} {{c}_{B}}\left( v \right) =\sum \limits _{s\ne v\ne t\in V}{\frac{{{\sigma }_{st}}\left( v \right) }{{{\sigma }_{st}}}} \end{aligned}$$

(1)

where V is the set of nodes, ${{\sigma }_{st}}$ represents the number of shortest paths between nodes s and t, and ${{\sigma }_{st}}\left( v \right)$ is the number of those paths passing through node v.

The betweenness centrality of a node is influenced by the total number of nodes in the graph. To enable a fair comparison of the relative importance of nodes across graphs with different sizes, the calculated value can be normalized. This normalization is achieved by dividing the betweenness centrality by the total number of node pairs excluding the node v, ensuring that ${{c}_{B}}\left( v \right) \in \left[ 0,1 \right]$. For directed graphs, the betweenness centrality is normalized using the following formula,

$$\begin{aligned} c_B^{\text {norm}}(v) = \frac{{{c}_{B}}\left( v\right) }{\left( n - 1\right) \left( n - 2\right) } \end{aligned}$$

(2)

where n is the total number of nodes in the graph. This normalization ensures that the betweenness centrality values are comparable across graphs of varying sizes and structures.

For spoofing attacks, the attacker injects messages with an already existing ID multiple times. This injection creates new edges and paths in the graph, increasing the betweenness centrality of the affected node. As shown in Fig. 8a, consider a simple directed graph with five nodes. The betweenness centrality of a node is calculated based on the number of shortest paths that pass through the node. For example, node 1 has a betweenness centrality of 6, which is normalized to 0.5000. After a spoofing attack, when an attacker injects a message with ID 1 after the message with ID 5, the betweenness centrality of node 1 increases significantly, from 0.5000 to 0.7500, as shown in Fig. 8b. This increase occurs because the added edge (5, 1) introduces more shortest paths through node 1.

As shown in Fig. 6e, the median of the maximum betweenness centrality increases significantly for spoofing and DoS attacks. In contrast, replay attacks exhibit a notable decrease in betweenness centrality. This is because replay attacks inject CAN messages from a specific time period that was previously extracted, and these messages typically carry new IDs. As a result, the graph structure becomes more dispersed, reducing the concentration of shortest paths through specific nodes and leading to lower betweenness centrality values. The distinct differences in the betweenness centrality distributions for DoS attacks, spoofing attacks, replay attacks, and normal graphs demonstrate its effectiveness as a key feature for detecting various types of attacks.

4.2.5 Graph density

In graph theory, density [26] is defined as the ratio of the actual number of edges in a graph to the maximum possible number of edges. For directed graphs, the maximum possible number of edges is given by $n\left( n-1 \right)$. Density can be described by the following equation,

$$\begin{aligned} \text{Density}=\frac{R}{N\left( N-1 \right) } \end{aligned}$$

(3)

where R represents the number of actual edges in the graph, and N represents the number of nodes in the graph.

Graph density quantifies the relationship between the actual number of edges and the maximum possible number of edges in a graph, with values ranging from 0 to 1. A density value close to 1 indicates a densely connected graph, where nodes are highly interconnected. As shown in Fig. 6f, the graph density increases during spoofing and replay attacks because the attacker injects messages with already existing IDs multiple times, thereby increasing the number of actual edges in the graph. Similarly, DoS attacks also lead to an increase in graph density due to the injection of a large number of messages with high-priority IDs. However, in the case of fuzzy attacks, the attacker injects messages with random IDs, significantly increasing the maximum possible number of edges in the graph. This causes a decrease in density, as the actual number of edges becomes relatively smaller compared to the maximum possible edges.

By analyzing these features, it is evident that each of the four attacks exhibits distinct behaviors across different graph-based metrics. For example, maximum out-degree provides the most significant differentiation for DoS attacks, while the number of edges is the most distinguishing feature for fuzzy attacks. For spoofing and replay attacks, features such as time difference, betweenness centrality, and density demonstrate clear differentiation. Furthermore, these features also effectively distinguish between fuzzy and DoS attacks. Overall, the proposed features capture the unique characteristics of each attack, enabling reliable detection and classification.

4.3 Graph-based decision tree IDS

4.3.1 Feature selection

The decision tree generally consists of three main components: nodes, edges, and leaf nodes. Specifically, each node represents a feature attribute used for decision-making, the edges connecting the nodes represent the divisions or conditions based on the values of the feature, and the leaf nodes represent the final classification labels or regression results.

The construction of a decision tree involves selecting the optimal features to split the data. This selection is typically guided by specific metrics, such as information gain, information gain ratio, or the Gini coefficient, with the goal of identifying features that best differentiate the data. In this paper, we adopt the CART (classification and regression trees) algorithm [41] for the classification task.

CART constructs a binary tree structure, where each node splits into two branches based on a specific value of a feature. The algorithm uses the Gini coefficient to measure impurity, where a lower Gini coefficient indicates a purer category in the dataset. A purer category implies a higher likelihood that the samples within a subset belong to the same class. At each step, the feature with the smallest Gini coefficient is selected as the best feature for splitting.

Suppose the dataset D has k classifications with a total number of samples $\left| D \right|$, the number of samples in each classification is $\left| {{D}_{1}} \right|$, $\left| {{D}_{2}} \right|$,...$\left| {{D}_{k}} \right|$. The probability that a sample belongs to class i is given by:

$$\begin{aligned} {{p}_{i}} = \frac{\left| {{D}_{i}} \right| }{\left| D \right| } \end{aligned}$$

(4)

The Gini coefficient for dataset D is defined as:

$$\begin{aligned} \text{Gini}(D) = \sum \limits _{i=1}^{k}{{{p}_{i}}}\left( 1 - {{p}_{i}} \right) = 1 - \sum \limits _{i=1}^{k}{{{p}_{i}}^{2}} \end{aligned}$$

(5)

For a given feature A, the dataset D is divided into two subsets, ${{D}_{1}}$ and ${{D}_{2}}$, based on a specific feature value. This ensures that the CART algorithm constructs a binary tree rather than a multi-way tree. The Gini coefficient for dataset D under the condition of feature A is calculated as:

$$\begin{aligned} \text{Gini}(D, A) = \frac{\left| {{D}_{1}} \right| }{\left| D \right| } \text{Gini}({{D}_{1}}) + \frac{\left| {{D}_{2}} \right| }{\left| D \right| } \text{Gini}({{D}_{2}}) \end{aligned}$$

(6)

For discrete feature variables, the decision tree enumerates all possible binary combinations of the discrete feature values and selects the combination with the smallest Gini coefficient for splitting. The remaining combinations are considered in subsequent splits. For continuous feature variables, the decision tree discretizes these features using the following method:

Assume there are m samples for a continuous feature A. First, the samples are sorted in ascending order as ${{a}_{1}}$, ${{a}_{2}}$,...${{a}_{m}}$. The CART algorithm computes the average of each pair of adjacent values to generate $m - 1$ division points, where the $i-th$ division point ${{T}_{i}}$ is defined as: ${{T}_{i}} = \frac{{{a}_{i}} + {{a}_{i+1}}}{2}$. For these $m-1$ division points, each point divides the sample into two parts, ${{D}_{1}}$ and ${{D}_{2}}$. The Gini coefficient for each division point is calculated as a binary classification point using Eq. (6). The division point with the smallest Gini coefficient is selected as the binary classification point for this continuous feature. This process is repeated for all features, and the feature with the smallest Gini coefficient across all features is chosen as the optimal feature. Then, the data is divided based on the selected division point. For example, suppose the feature with the smallest Gini coefficient is A, and its optimal division point is a. In this case, the dataset is divided into left and right subsets based on a, corresponding to the left and right child nodes of the decision tree, respectively. It is important to note that if a node is split based on a continuous feature, this feature can still participate in feature selection for subsequent nodes. This ensures that continuous features are not excluded from further splits and can continue to contribute to the construction of the tree.

4.3.2 Recursive construction

For the graph features proposed in this paper, all features are treated as continuous variables. The decision tree is constructed by recursively dividing the dataset into two subsets at each step, based on the optimal division point of a selected feature. This process continues until the tree construction is complete. For example, as shown in Fig. 9, the partial decision tree construction for spoofing attacks is illustrated. First, the Gini coefficient is calculated for all possible division points of the features. After computation, when the maximum betweenness centrality has a division point of 0.597, the Gini coefficient is found to be the smallest. Therefore, this division point is selected as the optimal split, and the dataset is divided into left and right child nodes accordingly. The same process is repeated for the left and right child nodes. Splitting continues until the Gini coefficient of a node becomes 0, indicating that all data within the node belong to the same category. At this point, the node is labeled as a leaf node, and no further splitting is performed. Once all nodes are split into leaf nodes, the decision tree construction is complete.

Algorithm 2 outlines the proposed intrusion detection algorithm. The input to the algorithm is the list of constructed graphs generated by Algorithm 1, and the output is the final classification result. In line 1, each type of feature list is initialized. In line 2, a total feature list is initialized to store all the individual feature lists. From lines 3 to 11, the algorithm iterates over each graph, calculates each feature separately, and appends the calculated values to the corresponding feature list. Finally, all feature lists are added to the total feature list. In lines 12 to 14, the model is trained using a decision tree classifier, with 70% of the data used as the training set and the remaining 30% as the test set. The trained model is then used to return the final classification result.

5 Experimental results and evaluation

5.1 Dataset

To evaluate the performance of the proposed model, we conducted experiments on four datasets: the Car-Hacking Dataset [35], the IVN Intrusion Detection Challenge Dataset [42], the Car Hacking: Attack & Defense Challenge Dataset [43], and the OpelAstra Dataset [44]. These datasets were selected to ensure the generalization of the model and to comprehensively test its ability to detect various types of attacks. We provide a detailed description of each dataset, including the types of attacks they contain and the data distribution used in our experiments.

Car-Hacking Dataset [35] contains four types of attacks: DoS attacks, fuzzy attacks, spoofing attacks targeting the drive gear, and spoofing attacks targeting the RPM gauge. This dataset is widely used for evaluating intrusion detection systems in automotive networks.

IVN Intrusion Detection Challenge Dataset [42] was included because the Car-Hacking Dataset does not contain replay attacks. This dataset simulates replay attacks on a real car, the KIA Soul, using its OBD-II port. It provides a realistic representation of replay attacks in in-vehicle networks, making it a valuable resource for testing intrusion detection systems.

Car Hacking: Attack & Defense Challenge Dataset [43] contains four types of attacks: spoofing attacks, fuzzy attacks, flooding attacks, and replay attacks. Notably, flooding attacks are a specific type of DoS attack, where a large number of high-priority messages are injected into the network to overwhelm the system. This dataset is specifically designed to evaluate the robustness of intrusion detection systems across diverse attack scenarios.

OpelAstra Dataset [44] contains a variety of attack types, making it suitable for testing the robustness and generalization of intrusion detection systems. The dataset includes the following types of attacks: DoS, diagnostic, fuzzing CAN ID, fuzzing payload, replay, and suspension attacks. For comparison with GGNB [21], we conducted experiments on this dataset.

The experimental data used in this paper includes normal messages from all of the above datasets. In addition, attack messages from the Car-Hacking Dataset (DoS, fuzzy, spoofing targeting the drive gear, and spoofing targeting the RPM gauge) and the IVN Intrusion Detection Challenge Dataset (replay attacks) were used. Using the method described above, graphs are constructed for these messages, resulting in approximately 48K, 49K, 52K, 53K, 33K, and 68K graphs for each type of attack, respectively. The detailed distribution of the experimental data after graph construction is summarized in Table 1.

Table 1 Distribution of experimental data

Full size table

5.2 Evaluation metrics

To evaluate the performance of the proposed model, we employ four widely used metrics: accuracy, precision, recall, and F1-score. These metrics are formulated as follows,

$$\begin{aligned} \text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} \end{aligned}$$

(7)

$$\begin{aligned} \text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} \end{aligned}$$

(8)

$$\begin{aligned} \text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} \end{aligned}$$

(9)

$$\begin{aligned} F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \end{aligned}$$

(10)

where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively. These metrics are crucial for evaluating classification models, as they provide a comprehensive understanding of the model’s performance in predicting both positive and negative categories. By analyzing these metrics, we can identify the strengths and weaknesses of the proposed model across different classification scenarios.

5.3 Window size selection

The window size is a user-defined parameter that can be adjusted based on the robustness requirements of the design. In our experiments with the Attack & Defense Challenge Dataset [43], we evaluated window sizes of 100, 150, 200, 250, and 300 to generate graphs and extract features for detection. The results, presented in Fig. 10, show that when the window size exceeds 200, the detection accuracy for various attack types stabilizes. To ensure consistency and facilitate comparison with previous graph-based methods, we selected a window size of 200 for our experiments.

Table 2 Comparison with existing graph-based methods

Full size table

Table 3 Comparison with GGNB on the OpelAstra Dataset

Full size table

Table 4 Time comparison with GGNB and GNAT

Full size table

Table 5 Comparison with existing deep learning methods

Full size table

Table 6 Comparison of FLOPs and parameters

Full size table

5.4 Comparison with existing graph-based methods

The experiments were conducted using a 12th Gen Intel(R) Core(TM) i5-12400 12-core processor. The proposed model was trained and tested using 70% and 30% of the data, respectively. Our method demonstrates excellent detection performance, achieving detection accuracies of 99.81%, 99.87%, 99.83%, 99.94%, and 99.43% for DoS, fuzzy, spoofing, replay, and mixed attacks, respectively. The confusion matrices are shown in Fig. 11a–e, respectively.

We compare our proposed model with existing graph-based methods on the Car-Hacking Dataset [35] and the IVN Intrusion Detection Challenge Dataset [42]. As shown in Table 2, our method demonstrates outstanding detection performance across all attack types. Specifically, it achieves a notably higher detection accuracy for spoofing and replay attacks compared to existing methods, demonstrating the effectiveness of three novel features—time difference, betweenness centrality, and graph density. Additionally, our approach demonstrates strong capability in handling multi-class classification problems, achieving high accuracy in detecting mixed attacks. This is attributed to the selection of the most effective features for detecting each specific attack type. To further evaluate the generalization of the model, we applied GDT-IDS to the Car Hacking: Attack & Defense Challenge Dataset [43], achieving an accuracy of 99.13% in detecting mixed attacks. The corresponding confusion matrix is presented in Fig. 12. Overall, our proposed method achieves the highest detection accuracy for all attack types, except for DoS, where it performs comparably to other methods, with all detection rates exceeding 99%.

In addition, we evaluated the proposed GDT-IDS on the OpelAstra Dataset [44], as summarized in Table 3. The detection accuracies for DoS, diagnostic, fuzzing CAN ID, fuzzing payload, replay, suspension, and mixed attacks are 100%, 99.92%, 99.92%, 100%, 100%, 96.16%, and 99.79%, respectively. While the detection accuracy for the suspension attacks is slightly lower than GGNB [21], our method outperforms GGNB in all other attack types.

Furthermore, we analyzed the time complexity of the decision tree model in both the training and prediction phases. The time complexity for the training phase is ${\mathcal {O}}(N \log N)$, where N represents the number of training samples. For the prediction phase, the time complexity for predicting a single sample is ${\mathcal {O}}(\log N)$, making the model highly efficient and well suited for real-time applications. In comparison, GGNB [21] employs Gaussian Naive Bayes (GNB), while GNAT [24] utilizes random forest (RF) to learn and detect anomalies from graph features. As shown in Table 4, our method requires slightly more training time than GGNB but is significantly faster than GNAT. Although the training time is marginally longer than GGNB, our method achieves significantly lower prediction time compared to both GGNB and GNAT, demonstrating its superior computational efficiency and suitability for real-time systems.

5.5 Comparison with existing deep learning-based methods

We compare our proposed model with existing deep learning-based intrusion detection methods using the Car-Hacking Dataset [35], evaluating performance metrics such as precision, recall, and F1-score. The detection results for each type of attack are summarized in Table 5. The results demonstrate that our model outperforms most existing methods across various attack types. Although some metrics are marginally lower than those of DCNN [35] and CNN-LSTM [37] models, these models are significantly more complex, requiring substantial computational resources and extensive data. As shown in Table 6, our model requires much fewer FLOPs (floating point operations) and parameters compared to DCNN and CNN-LSTM models. These results emphasize that despite achieving competitive performance, our model is more lightweight and suitable for deployment in resource-constrained environments.

Furthermore, due to the unique characteristics of replay attacks, most deep learning-based methods struggle to detect them effectively. These methods often fail to capture the temporal dependencies and subtle patterns associated with replay attacks, resulting in poor detection performance. In contrast, our proposed model excels in detecting replay attacks, achieving outstanding results with high accuracy and reliability. This highlights the detection capability and generalization ability of our approach, particularly in scenarios where traditional deep learning methods fall short.

To further evaluate the model’s interpretability, we conducted a feature importance analysis to identify the most impactful features for each attack type. Feature importance is measured by permuting a feature’s values in the test set and observing the resulting change in prediction error. An increase in prediction error indicates that the feature is important, as the model relies on it for accurate predictions. Conversely, if the prediction error remains unchanged, the feature is considered less important, suggesting that the model does not heavily rely on it for predictions.

As shown in Fig. 13, we also incorporate features such as maximum in-degree, minimum in-degree, and minimum out-degree for comparison with GGNB [21]. For DoS attacks, as shown in Fig. 13a, the maximum out-degree is the most influential feature, followed by the proposed features—betweenness centrality, time difference, and graph density. For fuzzy attacks, as shown in Fig. 13b, graph density is the most critical feature, followed by the number of nodes, time difference, and the number of edges. For spoofing and replay attacks, as shown in Fig. 13c and d, betweenness centrality and graph density are highly impactful. In particular, for replay attacks, time difference, betweenness centrality, and graph density contribute significantly more than other features. Finally, for mixed attacks, as shown in Fig. 13e, betweenness centrality, time difference, and graph density are the most important features influencing predictions. By analyzing feature importance for each attack type, we validate the effectiveness of the three proposed features—time difference, betweenness centrality, and graph density. These features are particularly critical for detecting all attack types, especially spoofing and replay attacks. This not only enhances detection performance but also improves the interpretability of our model.

6 Conclusion

In this paper, we proposed a novel graph-based decision tree intrusion detection system, named GDT-IDS. The method constructs a graph for every 200 CAN messages, and features such as the number of nodes, edges, and maximum degree are extracted from each graph. To address the limitations of existing methods in detecting replay and spoofing attacks, we introduced three new graph-based features—time difference, betweenness centrality, and graph density—which effectively capture the unique characteristics of these attacks. These features are then input into a decision tree classifier to identify and categorize intrusions. Experimental results demonstrate that GDT-IDS achieves excellent performance across various attack types. Notably, it significantly outperforms existing graph-based methods in detecting spoofing and replay attacks, showcasing its superior capability in identifying these specific threats. Furthermore, it achieves remarkably shorter prediction times, highlighting its efficiency. The proposed method also offers competitive performance compared to deep learning-based approaches, with the advantages of lower computational cost and enhanced interpretability. Unlike many deep learning methods, which struggle to capture temporal dependencies and fail to detect replay attacks effectively, GDT-IDS maintains high accuracy and reliability in detecting such attacks. These qualities make GDT-IDS a promising approach for detecting various attacks, offering a lightweight and efficient solution for vehicle intrusion detection.

While GDT-IDS effectively detects intrusion behavior within CAN message windows, its current focus is on identifying whether a window is under attack rather than pinpointing the specific message responsible for the anomaly. In future work, we aim to enhance the system by developing a lightweight yet precise approach capable of not only detecting anomalous windows but also identifying the specific malicious messages within them. This improvement will further refine the system’s capabilities, enabling it to more fully address the security challenges of IVN more comprehensively.

Data availability

The datasets used in this research are all publicly available and have been appropriately referenced. The rest of the data will be made available upon reasonable request.

References

Tuohy S, Glavin M, Hughes C, Jones E, Trivedi M, Kilmartin L (2014) Intra-vehicle networks: a review. IEEE Trans Intel Transp Syst 16(2):534–545
Article MATH Google Scholar
Nilsson DK, Phung PH, Larson UE (2008) Vehicle ECU classification based on safety-security characteristics. In: IET Road Transport Information and Control-RTIC 2008 and ITS United Kingdom Members’ Conference, pp 1–7. IET
Wang J, Liu J, Kato N (2018) Networking and communications in autonomous driving: a survey. IEEE Commun Surv Tutor 21(2):1243–1274
Article MATH Google Scholar
Siegel JE, Erb DC, Sarma SE (2017) A survey of the connected vehicle landscape-architectures, enabling technologies, applications, and development areas. IEEE Trans Intel Transp Syst 19(8):2391–2406
Article MATH Google Scholar
Liu J, Zhang S, Sun W, Shi Y (2017) In-vehicle network attacks and countermeasures: challenges and future directions. IEEE Netw 31(5):50–58
Article MATH Google Scholar
Bosch R et al. (1991) Can specification version 2.0. Rober Bousch GmbH, Postfach 300240
Checkoway S, McCoy D, Kantor B, Anderson D, Shacham H, Savage S, Koscher K, Czeskis A, Roesner F, Kohno T (2011) Comprehensive experimental analyses of automotive attack surfaces. In: 20th USENIX Security Symposium (USENIX Security 11)
Koscher K, Czeskis A, Roesner F, Patel S, Kohno T, Checkoway S, McCoy D, Kantor B, Anderson D, Shacham H et al. (2010) Experimental security analysis of a modern automobile. In: 2010 IEEE Symposium on Security and Privacy, pp 447–462 . IEEE
Miller C, Valasek C (2014) A survey of remote automotive attack surfaces. Black hat USA 2014:94
MATH Google Scholar
Van Herrewege A, Singelee D, Verbauwhede I (2011) Canauth-a simple, backward compatible broadcast authentication protocol for can bus. In: ECRYPT Workshop on Lightweight Cryptography, vol 2011, p 20. ECRYPT
Woo S, Jo HJ, Lee DH (2014) A practical wireless attack on the connected car and security protocol for in-vehicle can. IEEE Trans Intel Transp Syst 16(2):993–1006
MATH Google Scholar
Mundhenk P, Paverd A, Mrowca A, Steinhorst S, Lukasiewycz M, Fahmy SA, Chakraborty S (2017) Security in automotive networks: lightweight authentication and authorization. ACM Trans Des Autom Electron Syst 22(2):1–27
Article Google Scholar
Cui J, Shen Y, Zhong H, Zhang J, Liu L (2023) A multilevel electronic control unit re-encryption scheme for autonomous vehicles. IEEE Trans Intel Transp Syst 25:104–119
Article MATH Google Scholar
Chen H, Han X, Zhang Y (2023) Endogenous security formal definition, innovation mechanisms, and experiment research in industrial internet. Tsinghua Sci Technol 29(2):492–505
Article MATH Google Scholar
Wu W, Li R, Xie G, An J, Bai Y, Zhou J, Li K (2019) A survey of intrusion detection for in-vehicle networks. IEEE Trans Intel Transp Syst 21(3):919–933
Article MATH Google Scholar
Young C, Zambreno J, Olufowobi H, Bloom G (2019) Survey of automotive controller area network intrusion detection systems. IEEE Des Test 36(6):48–55
Article Google Scholar
Müter M, Asaj N (2011) Entropy-based anomaly detection for in-vehicle networks. In: 2011 IEEE Intelligent Vehicles Symposium (IV), pp 1110–1115. IEEE
Marchetti M, Stabili D, Guido A, Colajanni M (2016) Evaluation of anomaly detection for in-vehicle networks through information-theoretic algorithms. In: 2016 IEEE 2nd International Forum on Research and Technologies for Society and Industry Leveraging a Better Tomorrow (RTSI), pp 1–6. IEEE
Marchetti M, Stabili D (2017) Anomaly detection of can bus messages through analysis of id sequences. In: 2017 IEEE Intelligent Vehicles Symposium (IV), pp 1577–1583. IEEE
Islam R, Refat RUD, Yerram SM, Malik H (2020) Graph-based intrusion detection system for controller area networks. IEEE Trans Intel Transp Syst 23(3):1727–1736
Article Google Scholar
Islam R, Devnath MK, Samad MD, Al Kadry SMJ (2022) GGNB: graph-based Gaussian naive Bayes intrusion detection system for can bus. Veh Commun 33:100442
Google Scholar
Devnath MK (2023) Gcnids: graph convolutional network-based intrusion detection system for can bus
Zhang H, Zeng K, Lin S (2023) Federated graph neural network for fast anomaly detection in controller area networks. IEEE Trans Inf Forensics Secur 18:1566–1579
Article MATH Google Scholar
Xiao J, Chen H, Zhong F (2024) A novel feature extraction framework using graph node attention network for in-vehicle networks intrusion detection. IEEE Syst J 18:150
Article MATH Google Scholar
Brandes U (2008) On variants of shortest-path betweenness centrality and their generic computation. Soc Netw 30(2):136–145
Article MATH Google Scholar
Van Wijk BC, Stam CJ, Daffertshofer A (2010) Comparing brain networks of different size and connectivity density using graph theory. PloS One 5(10):13701
Article MATH Google Scholar
Cho K-T, Shin KG (2017) Viden: attacker identification on in-vehicle networks. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp 1109–1123
Choi W, Joo K, Jo HJ, Park MC, Lee DH (2018) Voltageids: low-level communication characteristics for automotive intrusion detection system. IEEE Trans Inf Forensics Secur 13(8):2114–2129
Article MATH Google Scholar
Zhao Y, Xun Y, Liu J (2022) Clockids: a real-time vehicle intrusion detection system based on clock skew. IEEE Internet Things J 9(17):15593–15606
Article Google Scholar
Lee S, Choi W, Jo HJ, Lee DH (2023) Errids: an enhanced cumulative timing error-based automotive intrusion detection system. IEEE Trans Intel Transp Syst 18:12406
Article MATH Google Scholar
Wu W, Huang Y, Kurachi R, Zeng G, Xie G, Li R, Li K (2018) Sliding window optimized information entropy analysis method for intrusion detection on in-vehicle networks. IEEE Access 6:45233–45245
Article MATH Google Scholar
Yu Z, Liu Y, Xie G, Li R, Liu S, Yang LT (2022) TCE-IDS: time interval conditional entropy-based intrusion detection system for automotive controller area networks. IEEE Trans Ind Inf 19(2):1185–1195
Article MATH Google Scholar
Song J, Qin G, Liang Y, Yan J, Sun M (2024) Sidildng: a similarity-based intrusion detection system using improved Levenshtein distance and n-gram for can. Comput Secur 142:103847
Article MATH Google Scholar
Seo E, Song HM, Kim HK (2018) Gids: Gan based intrusion detection system for in-vehicle network. In: 2018 16th Annual Conference on Privacy, Security and Trust (PST), pp 1–6. IEEE
Song HM, Woo J, Kim HK (2020) In-vehicle network intrusion detection using deep convolutional neural network. Veh Commun 21:100198
MATH Google Scholar
Desta AK, Ohira S, Arai I, Fujikawa K (2022) Rec-CNN: in-vehicle networks intrusion detection using convolutional neural networks trained on recurrence plots. Veh Commun 35:100470
Google Scholar
Lo W, Alqahtani H, Thakur K, Almadhor A, Chander S, Kumar G (2022) A hybrid deep learning based intrusion detection system using spatial-temporal representation of in-vehicle network traffic. Veh Commun 35:100471
Google Scholar
Chen H, Li X, Liu W (2024) Multivariate time series anomaly detection by fusion of deep convolution residual autoencoding reconstruction model and Convlstm forecasting model. Comput Secur 137:103581
Article Google Scholar
Rouf I, Miller R, Mustafa H, Taylor T, Oh S, Xu W, Gruteser M, Trappe W, Seskar I (2010) Security and privacy vulnerabilities of $\{$In-Car$\}$ wireless networks: A tire pressure monitoring system case study. In: 19th USENIX Security Symposium (USENIX Security 10)
Yang Y, Duan Z, Tehranipoor M (2020) Identify a spoofing attack on an in-vehicle can bus based on the deep features of an ECU fingerprint signal. Smart Cities 3(1):17–30
Article Google Scholar
Loh W-Y (2011) Classification and regression trees. Wiley Interdiscip Rev Data Min Knowl Discov 1(1):14–23
Article MATH Google Scholar
Han ML, Kwak BI, Kim HK (2018) Anomaly intrusion detection method for vehicular networks based on survival analysis. Veh Commun 14:52–63
MATH Google Scholar
Kang H, Kwak B, Lee YH, Lee H, Lee H, Kim HK (2021) Car hacking: attack and defense challenge 2020 dataset. IEEE Dataport
Dupont G, Lekidis A, Hartog JJ, Etalle SS (2019) Automotive Controller Area Network (CAN) Bus Intrusion Dataset v2. 4TU.Centre for Research Data . http://doi.org/10.4121/UUID:B74B4928-C377-4585-9432-2004DFA20A5D

Download references

Acknowledgements

We would like to thank the Jilin Scientific and Technological Development Program and the Changsha Automobile Innovation Research Institute Free Exploration Program for their support of this work.

Funding

This work was supported in part by the Jilin Scientific and Technological Development Program under Grant 20210101166JC, in part by the Changsha Automobile Innovation Research Institute Free Exploration Program under Grant CAIRIZT20220104.

Author information

Authors and Affiliations

College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, Jilin, China
Pengdong Ye, Yanhua Liang, Guihe Qin, Jiaru Song, Yingqing Wang & Wanning Liu
Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, Jilin, China
Pengdong Ye, Yanhua Liang, Guihe Qin, Jiaru Song, Yingqing Wang & Wanning Liu
Center for Computer Fundamental Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, Jilin, China
Yutao Bie

Authors

Pengdong Ye
View author publications
Search author on:PubMed Google Scholar
Yanhua Liang
View author publications
Search author on:PubMed Google Scholar
Yutao Bie
View author publications
Search author on:PubMed Google Scholar
Guihe Qin
View author publications
Search author on:PubMed Google Scholar
Jiaru Song
View author publications
Search author on:PubMed Google Scholar
Yingqing Wang
View author publications
Search author on:PubMed Google Scholar
Wanning Liu
View author publications
Search author on:PubMed Google Scholar

Contributions

Pengdong Ye proposed the main idea, designed the datasets, conducted the experiments, and wrote the manuscript. Yanhua Liang supervised the overall structure of the paper. Yutao Bie organized discussion meetings. All authors actively participated in discussion of the work and refine the results presented in the paper.

Corresponding authors

Correspondence to Yanhua Liang or Yutao Bie.

Ethics declarations

Conflict of interest

The authors declare no Conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Ye, P., Liang, Y., Bie, Y. et al. GDT-IDS: graph-based decision tree intrusion detection system for controller area network. J Supercomput 81, 591 (2025). http://doi.org/10.1007/s11227-025-07116-x

Download citation

Accepted: 22 February 2025
Published: 09 March 2025
DOI: http://doi.org/10.1007/s11227-025-07116-x

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

GDT-IDS: graph-based decision tree intrusion detection system for controller area network

Abstract

Similar content being viewed by others

Deep learning-based intrusion detection system for in-vehicle networks with knowledge graph and statistical methods

Intrusion Detection System for Connected Automobiles Security

Practical IDS on In-vehicle Network Against Diversified Attack Models

Explore related subjects

1 Introduction

2 Related work

2.1 IDS based on physical layer characteristics

2.2 IDS based on data link layer characteristics

3 Background knowledge

3.1 CAN protocol and characteristics

3.2 CAN bus attack scenarios

4 Proposed methodology

4.1 CAN message graph generation

4.2 Graph feature and analysis of attack characteristics

4.2.1 Number of nodes and edges

4.2.2 Degree

4.2.3 Time difference

4.2.4 Betweenness centrality

4.2.5 Graph density

4.3 Graph-based decision tree IDS

4.3.1 Feature selection

4.3.2 Recursive construction

5 Experimental results and evaluation

5.1 Dataset

5.2 Evaluation metrics

5.3 Window size selection

5.4 Comparison with existing graph-based methods

5.5 Comparison with existing deep learning-based methods

6 Conclusion

Data availability

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords