Abstract
Data Warehouses (DWs) require storing and analyzing time-varying data to reflect changes that occur in the business world. Solutions to this problem build on the field of temporal databases and adopt the tuple-timestamping approach, where tuples are timestamped with their validity interval. Alternatively, the attribute-timestamping approach represents a time-varying attribute with a list of its evolving values and the times when these changes occurred. The SQL:2011 standard has favored the tuple-timestamping approach, which has also been used for temporal DWs, even though it yields very long and complex SQL queries. This paper aims at reconciling both approaches and advocates for a database that can support both models so that they complement each other. We show that, to efficiently operate with tuple timestamping, we need appropriate time data types and operations for representing and manipulating temporal elements. We also show that many applications are more naturally and efficiently modeled and implemented using attribute timestamping. To prove the feasibility of our proposal, we implemented a portion of the TPC-DS benchmark using three alternative approaches, two of them based on classic tuple timestamping (including the well-known slowly-changing dimensions model), and a third one based on our proposal. For the latter, we used MobilityDB, a novel spatiotemporal database built on top of PostgreSQL that integrates both models in a natural way. Experiments showed that our proposal outperformed the other two, in many cases by orders of magnitude.
1 Introduction and motivation
Business Intelligence (BI) systems aim at transforming large volumes of business data into knowledge that can be used for decision making. Typically, BI systems are supported by a Data Warehouse (DW), which integrates data coming from different sources [37]. A DW is typically represented using a multidimensional (MD) model, where data are perceived as a collection of so-called facts (quantified by measures) and dimensions, along which facts are analyzed by means of aggregation across hierarchies. Although it is usual to consider that facts are the only dynamic part of a DW, dimensions may change over time in many different ways [18]. Thus, an MD model must be able to handle time-varying data while also allowing non-temporal objects [17]. It must also include a collection of OLAP (Online Analytical Processing) operations to analyze both non-temporal and temporal data. Finally, all of the above should be implementable using existing technologies, mainly relational databases.
In DW practice, changes in the instances of the dimensions are usually handled using the concept of slowly changing dimensions (SCDs) [21], which extend a dimension table with two columns usually denoted FromTime and ToTime, representing, respectively, the start and end of the interval of validity of each tuple in the table. Since SCDs are an ad-hoc solution to the problem and do not consider most of the research in the domain of temporal databases [31], Ahmed et al. [1] proposed a temporal MD data model that allows storing non-temporal and temporal data and showed that it can be implemented in standard SQL.
Temporal relational databases usually follow the tuple-timestamping model [10], where tuples are timestamped with their validity interval, so that, whenever the value of some attribute changes, a new tuple is created in a table, allowing keeping track of the history of the database. This is basically the SCD approach. On the other hand, in the attribute-timestamping approach [12], when an attribute value changes, the new value is inserted together with its timestamp, thus producing a list of (value, timestamp) pairs that represent the evolution of the value of the attribute across time. In spite of the many years of research in the field of temporal databases, tuple timestamping has proved very difficult to put into practice so far [11]. Nevertheless, the adoption of this solution was fueled by the fact that most databases on the market followed the first normal form (1NF) data model, and therefore tuple timestamping was basically the only alternative for implementation. However, in a data warehousing scenario, tuple timestamping yields very complex and long SQL OLAP queries [1], as we show in Sect. 5 of this paper. On the other hand, attribute timestamping not only allows saving storage space in the database, but also promises a better performance for a wide range of queries. For example, in a stock-exchange database, the history of the value of a stock in the attribute-oriented approach is encoded in a single tuple. Thus, a table with the values of the stocks in the NASDAQ-100 will have just one hundred tuples, each one containing the historical time series of a stock. On the other hand, a table using a tuple-timestamping encoding will contain, for each stock, as many tuples as values have been registered for the stock. Clearly, the computation of the maximum value of a stock across time would be much simpler using the first encoding. In light of the above discussion, the question that arises is: when would we prefer one approach over the other? We argue that the choice between attribute and tuple timestamping depends on the type of operation we face. We address this problem in this paper and advocate for a database that can support both models, in a way that they can complement each other.
In this paper, we show that, to efficiently operate with tuple timestamping, we need appropriate time data types for representing timestamps, sets of timestamps, periods, and sets of periods (called temporal elements). Further, we also need operations on these data types to implement a closed interval algebra allowing the representation of the results of union, intersection, and difference of time intervals. For example, suppose an employee \(e_1\) worked on a project \(p_1\) during the intervals [1, 4] and [7, 18], and an employee \(e_2\) worked on the same project during the intervals [3, 4], [8, 9], and [12, 14]. Now, we want to ask for the periods when they worked together on the same project. With tuple timestamping, the input to the query would need five tuples, one per employee and per period, and these tuples would be combined to obtain the result. In contrast, with attribute timestamping, the input consists of two tuples and the result is contained in a single tuple.
The data types and operations mentioned above are provided by MobilityDB [41], a novel database that builds on PostgreSQL and its spatial extension PostGIS. MobilityDB extends the type system of PostgreSQL and PostGIS with data types for representing spatiotemporal data. These data types are based on the notion of temporal types and their associated operations. Although MobilityDB was originally aimed at mobility analysis, its temporal types can represent the evolution over time of any kind of data, like integer, float, Boolean, and text. Thus, new data types like temporal integer (tint), temporal float (tfloat), and temporal Boolean (tbool) are defined, along with functions to manipulate them. This makes MobilityDB a temporal database supporting both tuple and attribute timestamping. In this paper, we show that we can exploit this capability to express all temporal algebra operations [29, 40] in a concise, natural way, producing queries that outperform the traditional approach in many cases. This is relevant in heavy aggregation scenarios, like DW environments. Our work aims at showing that many applications (in particular the ones requiring temporal aggregation) can be more naturally and efficiently modeled and implemented using attribute timestamping. Thus, supporting both approaches opens a wide range of design and implementation possibilities, closing the gap produced so far by the lack of alternatives to tuple timestamping. We illustrate this idea through an example, shown in Fig. 1, which we will also use in Sect. 2. For clarity, a tabular version of this figure is shown in Fig. 2.
Figure 1 shows a portion of three fictitious time series representing, for instance, the value of stocks in the NASDAQ-100. The upper part of the figure shows an attribute-timestamped representation in a table Stocks(StockId, Price). Attribute Price is of type tfloat; therefore, the table contains three records that represent the twenty-five values, each record containing the full time series of one stock’s price. The lower part of the figure shows an equivalent tuple-timestamped representation, which requires twenty-five timestamped records. Clearly, the representation in the upper part of the figure is more compact and also more appropriate to compute, for example, the maximum or average of stock prices. However, combining both approaches may be beneficial for some other operations, as we discuss next.
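As a concrete sketch of the two encodings (the tuple-timestamped table StocksTT and the maxValue function are illustrative names, not necessarily those used in the rest of the paper):

-- Attribute timestamping: one row per stock, the whole price history in a tfloat
CREATE TABLE Stocks (
  StockId text PRIMARY KEY,
  Price   tfloat            -- MobilityDB temporal float
);

-- Tuple timestamping: one row per (stock, instant) observation
CREATE TABLE StocksTT (
  StockId text,
  Price   float,
  T       timestamptz,
  PRIMARY KEY (StockId, T)
);

-- Historical maximum price per stock in each encoding
SELECT StockId, maxValue(Price) AS MaxPrice FROM Stocks;               -- one row per stock
SELECT StockId, max(Price) AS MaxPrice FROM StocksTT GROUP BY StockId; -- aggregates many rows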
Suppose now that we want to partition the Stocks table, for example, to distribute records into three nodes or cores for parallel computation. We want each partition to contain approximately the same number of values (e.g., partitions P1 and P2 will contain eight values and partition P3 nine). We thus need an adaptive partition, not one based on equal time periods. This operation is called adaptive time binning. A key step in the binning computation is finding the time limits of each partition. Performing this computation over the attribute-timestamped Stocks table would require a sweepline algorithm that scans all the time instants, which would be very inefficient. Instead, we can unnest the table into a tuple-timestamped representation and then compute the time limits of each partition. Finally, with the partitions already defined, we can go back to the original representation and use MobilityDB functions to efficiently find the parts of each series that fall into each partition. We leave the details to Sect. 2.
Tabular representation of Fig. 1. Left: Tuple timestamping; Right: Attribute timestamping
Paper Organization. Section 2 provides a theoretical framework for studying the problem, where we define a formal abstract model that supports attribute and tuple timestamping. In Sect. 3 we present the MobilityDB database as a concrete implementation of the abstract model. We also show the plausibility and benefits of the reconciliation approach. Section 4 discusses temporal DWs and presents the running example we use in the paper, which we implement in two classic temporal DW models and also using MobilityDB. In Sect. 5 we study how the classic temporal algebra operations, namely, temporal selection, projection, join, union, difference, and aggregation, are expressed over the three implementations. Since the implementation in SQL of these operations was studied in [29, 40], we take them as the basis of our study. Section 6 presents the semantics of the temporal version of the most used classic OLAP operations, namely, roll-up and dice. Over the three models above, in Sect. 7 we define six representative queries, implementing the temporal OLAP operations previously mentioned. We perform experiments using the temporal algebra and OLAP queries and report the results in Sect. 8. Section 9 discusses related work in the context of the paper. We conclude in Sect. 10.
2 An abstract model for temporal databases
In this section we present an abstract model for temporal databases that supports both tuple and attribute timestamping. Following Güting [16], the term abstract refers to the fact that the definitions are made in terms of infinite sets. Therefore, we can get rid of the nuances of the implementation, which we address in the concrete model given in Sect. 3.
Temporal databases associate time with real-world objects and events in several ways [30]. Valid time (VT) refers to the time when a piece of information is considered valid in the real world. Transaction time (TT) represents the time when a piece of information is stored in the system. Transaction time represents the state of the database at different points in time and it is mainly used for roll-back or auditing purposes. Valid and transaction times are orthogonal, and can be used either individually or combined to define bitemporal time (BT). Further, lifespan (LS) represents the time when an object exists. The lifespan of an object can be seen as the valid time of a fact telling that the object exists. In the remainder, for the sake of simplicity and without loss of generality, we only consider valid time and lifespan.
2.1 Modeling time
Definition 1
(Time domain) The time domain \(\mathcal {T}\) is a discrete and linearly ordered set of elements called instants. Instants are isomorphic (i.e., structurally similar) to the natural numbers. A special instant now in \(\mathcal {T}\) denotes the current time instant. \(\square \)
Definition 2
(Time values) A time value \(\tau \subset \mathcal {T}\) is a finite subset of the time domain \(\mathcal {T}\). \(\square \)
Definition 3
(Set operations on time values) Let \(\tau _1,\tau _2\subset \mathcal {T}\) be time values. The union (\(\cup \)), intersection (\(\cap \)), and difference (−) of time values \(\tau _1\) and \(\tau _2\) are defined as the traditional set operators. For example, given a set of time values \(\{\tau _1,\ldots ,\tau _n\}\) their union is defined as \(\textstyle \bigcup _{i=1}^n \tau _i\). \(\square \)
Figure 3a illustrates the operations on time values.
We define next a set of base data types, which have their usual interpretation except that their domain is extended with the value \(\bot \) (undefined), which corresponds to the NULL value in standard SQL.
Definition 4
(Base types) We assume a set \(\mathcal {B}\) of base types that includes the types bool, int, real, and string. The domains of these base types are as follows.
\(\square \)
2.2 Tuple timestamping
We now formally define tuple-timestamped relations and the relational algebra operations on them. In what follows, capital letters denote schema elements and lowercase letters refer to instances. Further, \(\bar{A}\) represents schema tuples while \(\bar{a}\) represents instance tuples.
Definition 5
(Tuple-timestamped temporal relation) Let \(R(\bar{A}:\bar{B}) =\) \(R(A_1\!:\!B_1,\ldots ,A_n\!:\!B_n)\) be a relation scheme where values of attribute \(A_i\) are defined over \(\textrm{dom}(B_i), B_i~\in \mathcal {B}\), for \(i=1,\ldots ,n\). A tuple-timestamped temporal relation r on R is defined as:
\(\square \)
For a tuple \(\bar{a}/\tau \), we denote the value component by \(\nu (\bar{a}/\tau ) = \bar{a}\) and the time component by \(\tau (\bar{a}/\tau ) = \tau \). Similarly, for a temporal relation r we denote \(\nu (r) = \{\bar{a} \mid \bar{a}/\tau \in r\}\) and \(\tau (r) = \{\tau \mid \bar{a}/\tau \in r\}\).
A temporal relation r may have a set of value-equivalent tuples \(\{\bar{a}/\tau _1,\ldots ,\) \(\bar{a}/\tau _n\}\) for \(n>1\), such that their value components \(\bar{a}\) coincide but their temporal components \(\tau _i\) do not. In this case, a coalescing operation must be applied [8] to remove the redundancy by replacing the above set of tuples with a single one \(\bar{a}/\tau \), where \(\tau \) is the union of the time values \(\tau _i\). We define next the COALESCE operation, following the ideas in [39].
Definition 6
(Coalesce operation) Given a temporal relation r, we define \(\textit{COALESCE}\,(r)=r^c\), where
\(\square \)
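Intuitively, and using the notation of Def. 5, the coalesced relation can be written as \(r^c=\{\bar{a}/\tau \mid \bar{a}\in \nu (r)\wedge \tau =\bigcup \{\tau '\mid \bar{a}/\tau '\in r\}\}\): the time component of each resulting tuple is the union of the time values of its value-equivalent tuples in r.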
In what follows, we assume that relations are always coalesced. We define now the algebraic operators over temporal relations, which generalize the classical ones. Redundancies in the resulting relation are removed by means of the COALESCE operation.
Definition 7
(Temporal selection) Let r be a temporal relation of schema \(R(\bar{A})\) and \(\varphi \) be a Boolean selection formula on the attributes in \(\bar{A}\) and/or the time component \(\tau \) of the tuples in r. Then,
\(\square \)
Definition 8
(Temporal projection) Let r be a temporal relation of schema \(R(\bar{A},\bar{B})\). Then,
\(\square \)
Definition 9
(Temporal join) Let \(r_1\) and \(r_2\) be two temporal relations of schemas \(R_1(\bar{A})\) and \(R_2(\bar{B})\), and \(\varphi \) be a Boolean join formula involving the attributes in \(\bar{A}\) and \(\bar{B}\). Then,
\(\square \)
Definition 10
(Temporal union) Let \(r_1\) and \(r_2\) be two domain-compatible temporal relations. Then,
\(\square \)
Definition 11
(Temporal difference) Let \(r_1\) and \(r_2\) be two domain-compatible temporal relations. Then, \(r_1-r_2=\textit{COALESCE}\,( \{\bar{c}/\tau \mid \bar{c}/\tau _1 \in r_1 \wedge ([\bar{c}/\tau _2 \in r_2\) \(~~\qquad \wedge \tau =\tau _1 - \tau _2]\vee [\bar{c}\not \in \nu (r_2) \wedge \tau =\tau _1]\,)\})\) \(\square \)
We define next the notion of temporal aggregation, which extends traditional aggregation. In this paper, we consider the five typical SQL aggregation functions, namely, count, min, max, sum, and avg, and their generalization to the temporal domain. Other aggregation functions can be generalized similarly.
Consider, for example, the non-temporal relation
and the query “Number of employees and maximum salary by department.” This query can be expressed by the following aggregation
Here, DNo is the grouping attribute, which partitions the relation Employee into as many relations as there are distinct DNo values. Then, the expressions count(SSN) and max(Salary) are computed in each partition.
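Using the notation of Def. 12 below, and assuming the relation is Employee(SSN, Name, Salary, DNo), this aggregation can be written as \(\mathcal {A}_{\textit{DNo},(\textsf{count}(\textit{SSN}),\textsf{max}(\textit{Salary}))}(\textit{Employee})\).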
The following definition formalizes the above. We assume that every base type \(B\in \mathcal {B}\), has a set of aggregate functions \(\mathcal {F}_B\) such as \(\textsf{min}(\cdot )\), \(\textsf{max}(\cdot )\), \(\textsf{sum}(\cdot )\), etc., that operate over sets of values in \(\textrm{dom}(B)\).
Definition 12
(Aggregation) Let r be a non-temporal relation over a schema \(R(\bar{A})\), and let \(\bar{B},\bar{C}\subset \bar{A}\), \(\bar{B} \cap \bar{C} = \varnothing \). \(\bar{B}\) is a set of grouping attributes that defines a partition \(\mathcal {P}_{\bar{B}}(r)=\{P_1,\ldots ,P_m\}\) of r, where
Also, let \(\bar{f}(\bar{C})=(f_1(C_1),\ldots ,f_n(C_n))\), \(f_i\in \mathcal {F}_{\textrm{dom}(C_i)}\), be aggregate functions over the attributes \(C_i\). An aggregation of r is an operation of the form \(\mathcal {A}_{\bar{B},\bar{f}(\bar{C})}(r)\) defined as follows.
\(\square \)
We now generalize Def. 12 to the temporal case.
Definition 13
(Temporal aggregation) Let r be a temporal relation of schema \(R(\bar{A})\), and let \(\bar{B}\), \(\bar{C}\), and \(\bar{f}(\bar{C})\) be as in Def. 12, except that r is partitioned by the value component \(\nu (r)\) of r, where each partition \(P_i\) has a time component \(\tau (P_i) = \{\tau _{1}^i,\ldots ,\tau _{m_i}^i\}\) that defines a timeline \(\mathcal {T}(P_i) = \textstyle \bigcup _{j=1}^{m_i}\tau _{j}^i\) for each partition. Then, a temporal aggregation \(\mathcal {A}_{\bar{B},\bar{f}(\bar{C})}(r)\) is defined by
\(\square \)
As shown above, the temporal aggregation of each partition is obtained by applying the non-temporal aggregation to each instant in the timeline of the partition. This corresponds to the sequenced operations in [29].
Example 1
Consider a temporal version of the Employee table above, now with three tables:
where Employee is a non-temporal table and both EmpSal and EmpDept are temporal tables whose temporality is kept in the Time column.
The query “Time when a department has at least one employee” can be expressed as follows:
The result of this aggregate operation for a single partition is illustrated in Fig. 3b, where the \(\tau _i\) are the time components of the partition and thus each one represents the period during which an employee was affiliated to the given department. The union of the intervals is shown below \(\tau _3\). \(\square \)
2.3 Attribute timestamping
We define next the attribute timestamping temporal model. The model builds on the definition of temporal attributes, which are attributes whose value evolves over time. The value of a temporal attribute is a partial function that maps instants of the time domain to values of the domain of its base type. A non-temporal attribute can be considered a particular case of a temporal one, whose values are represented as a constant function that maps all instants of the time domain to a single value in the domain of its base type. More formally, for each base type \(B\in \mathcal {B}\) in Def. 4, the corresponding temporal type is defined using a temporal type constructor \(\theta (\cdot )\). Examples are \(\theta (\textsf {int})\) and \(\theta (\textsf {real})\). We denote temporal types by adding the prefix ‘t’ to their base type, such as tint or treal.
Definition 14
(Temporal domain of attributes) The temporal domain of an attribute A with base type B, denoted \(\text {tdom}(A)\), is defined as follows:
\(\square \)
Intuitively, the temporal domain of an attribute A defines all possible functions f that assign to each element in the time domain \(\mathcal {T}\) a value in the domain of the base type B. The notion of temporal domain can be easily generalized for constant (i.e., non-temporal) attributes, as shown next.
Consider table Employee(SSN, Name, Salary), where Salary is the only temporal attribute. Suppose that an employee John has a salary of 30K during the interval \([t_1,t_2)\), 35K during the interval \([t_2,t_3)\), and undefined elsewhere. The value of this attribute for John can be defined as a function \(f_1: \mathcal {T}\rightarrow \mathbb {R}\) as follows
On the other hand, the temporal domain of the Name attribute can be defined as the set of constant functions \(\mathcal {T}\rightarrow \mathbb {S}\) that assign a single value \(s\in \mathbb {S}\) for every element in \(\mathcal {T}\). In the case of John, we can define his name as a constant function \(f_2\) as follows
Therefore, without loss of generality, we define temporal relations in the attribute timestamping model considering only the temporal domains of the attributes.
Definition 15
(Attribute-timestamped temporal relation) Let \(R(\bar{A}\!:\!\bar{B}) = R(A_1:B_1,\ldots ,A_n:B_n)\) be a relation scheme where attribute \(A_i\) is defined over the base type \(B_i\in \mathcal {B}\), for \(i=1,\ldots ,n\). Then, an attribute-timestamped temporal relation r on R is defined as follows:
\(\square \)
Next, we define the range type constructor \(\rho (\cdot )\) that allows representing sets of disjoint intervals of time or base values. The range constructor is only applicable to types that have a total order <. Intervals have a lower and an upper bound, which can be either open or closed and have their usual meaning. We denote open bounds by ‘(’ and ‘)’, and closed bounds by ‘[’ and ‘]’. Examples of intervals are \([l_1, u_1)\) or \((l_2, u_2]\).
Definition 16
(Range domain) Let \(B\in \mathcal {B}\) be a base type to which the range type constructor \(\rho \) is applicable. The range domain is defined as follows.
\(\square \)
We are now ready to define a collection of temporal operations over the abstract model. Following Güting [16], we define these operations by means of signatures over infinite sets. In Sect. 3 we will show that the MobilityDB database provides a concrete straightforward implementation of the operations defined in this way, where only finite representations are used.
Table 1 shows a set of operations associated with temporal types grouped in several classes, while Table 2 depicts their signatures. Some of these operations are discussed next.
Operations getTime and getValues return, respectively, the projection of a temporal value into its domain and range, both of which result in a range value. Operations atTime and atValues restrict the function to a given subset of the time or base domain defined by a range value. Operations atMin and atMax restrict the function to the points in time when its value is minimal or maximal, respectively. Operation valueAtTimestamp gets the base value of the function at a given timestamp.
Generalizing operations on base types to temporal types is called lifting [16]. As illustrated in Table 2, an operation for base types (e.g., \(+\)) is lifted by allowing any of the arguments to be a temporal type and by returning a temporal type. The semantics of lifted operations is that the result is computed at each time instant using the non-lifted operation. Lifted operations correspond to sequenced operations [29] in tuple timestamping.
Two interpretations may be considered when applying a lifted operation to two temporal values defined over different time extents. The first one considers that the result is defined in the intersection of both extents and it is undefined elsewhere. The second interpretation considers that the result is defined in the union of the two extents and a default value (e.g., 0 for \(+\)) is used for combining over the extents that belong to only one temporal value. We apply the first interpretation when combining two temporal values and apply the second one to temporal aggregation.
Finally, aggregate operations may also be lifted. Examples are tCount, tMin, tMax, and tAvg, which combine several temporal values, yielding a new temporal value where the traditional aggregate functions count, min, max, and avg are computed at each instant. We illustrate this with the tCount operation in Fig. 3b, explained in the next example.
Example 2
As a follow-up of Ex. 1, consider the query “Evolution over time of the number of employees by department.” This query can be expressed by the following aggregation:
Figure 3b shows the result of this aggregation represented as a temporal value (a tint). Since the timeline is discrete (recall Def. 1), each \(\tau _i\) is actually a set of instants; thus, the traditional count is applied at each instant of the timeline of the result. \(\square \)
2.4 Reconciling tuple and attribute timestamping
We explain next how the tuple- and attribute-timestamping approaches can be reconciled, so that users can choose the approach that best fits the application requirements, while being able to easily switch between them. Figure 1 shows a concrete example of two different ways of encoding the same information using either attribute or tuple timestamping. At a more conceptual level, Fig. 4 depicts a commutativity diagram illustrating the fact that we can encode temporal information using either tuple or attribute timestamping while leveraging efficient mechanisms to transform between the two representations. For example, we can start with a tuple-timestamped representation, transform it into an attribute-timestamped one, apply a temporal operation, and finally transform the result back into a tuple-timestamped representation. In this way, users can choose the most appropriate representation on a case-by-case basis. We explain next how the tuple- and attribute-timestamping representations complement each other by showing how some operations are more efficiently computed in one of the representations.
Consider temporal aggregation, which is a costly operation [6, 7]. Example 2 shows that, starting from a tuple-timestamping representation, we can represent the result of a temporal aggregation as a temporal value. The reason for this is that temporal aggregation is more efficiently computed using attribute timestamping. To take advantage of the above, we need to be able to switch between representations when required. Figure 5 shows how we can use the unnest operation to transform from an attribute- to a tuple-timestamping representation.
Transforming from attribute to tuple timestamping the result of temporal count in Fig. 3b
As another example, we show next how both representations can be used to solve the partition problem introduced in Sect. 1 and Fig. 1, where we have table Stocks(StockId, Price) with three time series of stock prices represented using attribute timestamping. To partition this table into three tables with approximately the same number of (value, timestamp) pairs, we use tuple timestamping to compute the period covered by each one of the three partitions. For this, we first create the table AdaptiveBins as follows.
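A possible formulation of this step is sketched below, following the description that comes next; the timestamp-extraction call (unnest over timestamps) and the bound handling of the span constructor are approximations of the actual MobilityDB functions.

CREATE TABLE AdaptiveBins(BinId, Period) AS
WITH Times(T) AS (
  -- unnest each attribute-timestamped series into its individual timestamps
  SELECT unnest(timestamps(Price)) FROM Stocks ),
MaxTime(MaxT) AS (
  SELECT max(T) FROM Times ),
Bins1(T, BinId) AS (
  -- split the ordered timestamps into three bins of roughly equal size
  SELECT T, NTILE(3) OVER (ORDER BY T) FROM Times ),
Bins2(T, BinId, RowNo) AS (
  SELECT T, BinId, ROW_NUMBER() OVER (PARTITION BY BinId ORDER BY T) FROM Bins1 )
SELECT BinId,
  -- a bin covers the period from its first timestamp up to the first timestamp
  -- of the next bin; the last bin ends at the overall maximum timestamp
  span(T, COALESCE(LEAD(T) OVER (ORDER BY T), (SELECT MaxT FROM MaxTime)))
FROM Bins2
WHERE RowNo = 1;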

Table Times extracts all timestamps in the Stocks table. Table MaxTime obtains the maximum timestamp in the table. Table Bins1 applies the NTILE window function to divide the timestamps, sorted in ascending order, into three bins with approximately the same number of records. Table Bins2 adds the row number to the rows in each bin. Finally, the main query keeps the rows with row number equal to 1 and uses the span function to create the periods covered by each partition. For this, the LEAD window function adds to each row the timestamp of the next consecutive row, and COALESCE assigns to the last row the maximum timestamp in the table. As a result, table AdaptiveBins contains three rows composed of the BinId and the associated Period. Note that most of the work is done over the timestamped representation produced by the unnest operation that creates the table Times. Now we can create the partitioned table.

Finally, we fill the partitions as follows.

The query above uses the atTime function to restrict the temporal float attribute Price in table Stocks to the time period indicated in the second argument.
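A sketch of the partition filling could be as follows, assuming a partitioned table StocksPart(BinId, StockId, Price):

-- Restrict each stock's series to the period of every bin; atTime returns NULL
-- when the series has no data in that period, and such rows are discarded
INSERT INTO StocksPart (BinId, StockId, Price)
SELECT b.BinId, s.StockId, atTime(s.Price, b.Period)
FROM Stocks s, AdaptiveBins b
WHERE atTime(s.Price, b.Period) IS NOT NULL;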
In summary, the tuple versus attribute timestamping debate, dating from the 1990s, is still open. SQL has adopted the tuple-timestamping approach and provides data modification operations (insert, delete, update) taking into account the temporal semantics. However, SQL does not provide a temporal generalization of the relational algebra. As a consequence, to implement the relational operators the user needs to write complex SQL queries, as we discuss in Sect. 5.
3 A concrete model for temporal databases
This section presents a concrete model corresponding to the abstract model defined in Sect. 2, as implemented in MobilityDB. A complete description of the model can be found in [41].
MobilityDB provides collection types, namely, set, span, and span set types, for representing finite subsets of the domains of base or time types. Set types represent a set of distinct values. Examples are intset or dateset, which are defined over the int and date types provided by PostgreSQL. Span types represent ranges of base or time values and are defined by lower and upper bounds, which can be inclusive or exclusive. Examples are intspan and datespan. Span set types represent sets of disjoint spans. Examples are intspanset and datespanset.
3.1 Tuple timestamping
MobilityDB implements tuple timestamping by means of set operations on time types, which are defined at two granularities: date or timestamptz (timestamp with time zone). Four time types are used for defining finite subsets of the time domain at each granularity: date, dateset, datespan, and datespanset, as well as timestamptz, tstzset, tstzspan, and tstzspanset. We describe next these types using the date granularity.
A date value represents a time instant at a day granularity. A dateset value represents a set of distinct date values. It must contain at least one element and the elements must be ordered. An example is

A value of the datespan type has two bounds, the lower and the upper bounds, which are date values. The bounds can be inclusive (represented by ‘[’ and ‘]’), or exclusive (represented by ‘(’ and ‘)’). A datespan value with equal and inclusive bounds corresponds to a date value. An example of a datespan value is

A datespanset value represents a set of disjoint datespan values. It must contain at least one element and the elements must be ordered. An example is:

MobilityDB provides an efficient implementation of the time types and their operations (see Def. 3) based on a sweepline algorithm. As discussed in Sect. 2 (we will give more details in Sect. 5), the intersection and the difference of two time types are used for the temporal join and the temporal difference operations, respectively (Defs. 3, 9, and 11). We next give an example of a table representing the variation of prices of items, using the tuple-timestamping model in MobilityDB.

The attribute VT represents the valid time of tuples and is represented with values of type datespanset. An instance of this table is given next.
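A sketch of such a table and of a possible instance is given below; the table and column names, as well as the sample values, are illustrative.

CREATE TABLE Items (
  ItemId    text,
  UnitPrice float,
  VT        datespanset   -- valid time of the tuple
);

INSERT INTO Items VALUES
  ('i1', 100, datespanset '{[2021-01-01, 2021-07-01)}'),
  ('i1', 110, datespanset '{[2021-08-01, 2022-01-01)}'),
  ('i2',  80, datespanset '{[2021-01-01, 2022-01-01)}');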

3.2 Attribute timestamping
MobilityDB provides temporal types for representing values that evolve across time. The temporal types tbool, tint, tfloat, and ttext are, respectively, based on the PostgreSQL types bool, int, float, and text. Temporal types may be discrete or continuous depending on their base type. Discrete temporal types (such as tbool, tint, or ttext) evolve in a stepwise manner, while continuous temporal types (such as tfloat) evolve in either a linear or stepwise manner. The duration of a temporal value states the time extent at which the evolution of values is recorded. Temporal values come in three durations, namely, instant, sequence, and sequence set. A temporal instant value represents the value at a time instant, such as

A temporal sequence value represents the evolution of the value during a sequence of time instants, where the values between these instants are interpolated using either a discrete, stepwise, or linear function. An example with discrete interpolation is

where the value is defined at the given timestamps and undefined everywhere else. An example of a temporal sequence value with step interpolation is

Finally, a temporal sequence value with linear interpolation is shown next:

The value of a temporal sequence is interpreted assuming that the time period defined by every pair of consecutive values v1@t1 and v2@t2 is lower inclusive and upper exclusive, unless they are the first or the last instants of the sequence and, in that case, the bounds of the whole sequence apply. Furthermore, the value of the temporal sequence between two consecutive instants depends on whether the data type is discrete or continuous. For example, the tint sequence above represents that the value is 10 during (2022-01-01 08:00:00, 2022-01-01 08:05:00), 20 during [2022-01-01 08:05:00, 2022-01-01 08:10:00) and 15 at the end instant 2022-01-01 08:10:00. On the other hand, the tfloat sequence above tells that the value evolved linearly from 10 to 20 during (2022-01-01 08:00:00, 2022-01-01 08:05:00) and from 20 to 15 during [2022-01-01 08:05:00, 2022-01-01 08:10:00]. MobilityDB also allows representing sequences with stepwise interpolation when the type is continuous, for example:

Finally, a temporal sequence set value represents the evolution of the value over a set of sequences, where the values between them are unknown, for example:

Temporal types have a rich set of operations corresponding to the ones defined for the abstract model in Sect. 2 (Tables 1 and 2). As discussed in [41], to ensure the closure of operations, when the operands of a lifted operation have a linear interpolation, the result of the operation must also be represented using linear interpolation.
The table containing the evolution of item prices, shown in Sect. 3.1 in the tuple-timestamped model, is modeled as follows in the attribute-timestamped approach, where the unit price of an item is represented as a temporal float (tfloat).

The corresponding instance is shown next.
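A sketch of the corresponding attribute-timestamped table, reusing the Items_AS name mentioned in Sect. 3.3 and mirroring the illustrative instance above (the tfloat literal syntax is approximated):

CREATE TABLE Items_AS (
  ItemId    text PRIMARY KEY,
  UnitPrice tfloat          -- the whole price history of the item
);

INSERT INTO Items_AS VALUES
  ('i1', tfloat '{[100@2021-01-01, 100@2021-07-01), [110@2021-08-01, 110@2022-01-01)}'),
  ('i2', tfloat '{[80@2021-01-01, 80@2022-01-01)}');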

Over this table, the query

returns the following values

where the NULL value above represents the fact that the item’s price for i1 is undefined on 2021-07-15. Consider now the following query and its result:


Here, the temporal attributes have been restricted to the period given in the query.
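Sketches of the two queries just discussed could look as follows (the restriction period in the second query is an arbitrary choice):

-- Price of each item on a given date; NULL where the price is undefined (e.g., i1)
SELECT ItemId, valueAtTimestamp(UnitPrice, timestamptz '2021-07-15') AS Price
FROM Items_AS;

-- Restrict the price history of each item to a period of interest
SELECT ItemId, atTime(UnitPrice, tstzspan '[2021-01-01, 2021-07-01)') AS Price
FROM Items_AS;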
As an example of temporal aggregation, the next query asks for the average unit price across time.
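A minimal sketch of this temporal aggregation, assuming MobilityDB's tAvg aggregate over the Items_AS table above:

-- Evolution of the average unit price over all items, returned as a single tfloat
SELECT tAvg(UnitPrice) AS AvgPrice
FROM Items_AS;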

Below we show the result, and Fig. 6 shows such result graphically.

To compute the temporal aggregate operations, a skiplist is used. We explain this implementation using the example of Fig. 3b that shows how to compute the temporal count (tCount) at the abstract level. Figure 7a shows three records and their timespans, and, at the bottom, the count of the number of records at any time instant. Figure 7b shows, schematically, the iterative procedure to compute the aggregation using the skiplist. The list at the top contains the interval for record r1. When record r2 arrives, the list is split to accommodate the intervals for this record: between \(t_2\) and \(t_4\) there are two valid records (count = 2) and only one in the other intervals. When r3 arrives, we must make room for another element, since now there are three elements between \(t_3\) and \(t_4\). The interval \([t_6, t_8]\) is also split. The bottom of the figure shows the skiplist after processing r3.
3.3 Reconciliation in the concrete model
We conclude the section showing how we can use MobilityDB to apply the most appropriate representation for different operations, as discussed in Sect. 2.4. Initially, the user chooses the design that best fits the application purposes, in this case, tuple timestamping. The query below shows how we use both approaches to compute the tMax operation, that is, the evolution of the maximum price of the items across time.

The result of this query is shown next:

The query first performs a transformation from the original tuple-timestamping representation to the attribute-timestamping one, yielding the table TempPrice (which is, basically, table Items_AS above). Table TempMaxPrice contains the result of the tMax temporal operation. The main query transforms this result back into a tuple-timestamping form. Note that these transformations between models and the application of the temporal operation correspond to the operations in the commutativity diagram in Fig. 4. Also note that, as in relational databases, the user may decide to physically materialize frequently used views in both representations. However, this would be redundant in most cases, since MobilityDB is very efficient in performing these transformations.
4 Temporal data warehouses
In this section we first explain the basic notions of temporal data warehousing, and then propose the three implementations over which we develop our study.
4.1 Non-temporal data warehouses
DWs are typically represented using a multidimensional model, where data are perceived as an n-dimensional space composed of facts and dimensions. A fact is a subject of interest, e.g., a business event. Each observation in a fact is called a fact member. A fact is quantified by one or more measures, usually numerical quantities. Dimensions provide context to facts, e.g., sales events can be represented as a three-dimensional Sales fact with dimensions Product, Time, and City, and a measure Amount, where the dimensions represent information on when and where a product was sold at a certain price. Dimensions are organized in levels, which are described by attributes. For example, a level Day provides all possible values for dimension Time. Instances of a level are called level members. A DW may contain multiple levels and multiple facts may share these levels. The bottom level in a dimension determines the granularity of the dimension, that is, the level of detail at which measures are recorded. An aggregation relationship relates a child level and a parent level and enables the aggregation of measures at various granularities, e.g., dimension Time may have an aggregation relationship between Day and Month. These relationships define dimension hierarchies. As usual, the cardinality of a relationship between dimension levels or between a fact and a level can be one-to-one (1-1), many-to-one (m-1), or many-to-many (m-m).
4.2 Temporal data warehouses
In real-world applications, DW objects may change in content and/or structure. Content changes (e.g., a modification in the price of a product) are due to routine business operations or to correction of existing data, while schema changes may occur because of changes in the modeled reality, for example, a modification of the cardinality from m-1 to m-m in the assignment of products to categories. A DW should be able to store such time-varying data. This requires extending the classic DW model, to capture time-varying dimensional data, for example, along the lines of temporal database theory [31]. We explain this next, remarking that in this paper we only consider content changes, not schema changes.
A temporal attribute keeps track of the changes in its value and the time when these changes occurred. Note that a level may be composed of both, temporal and non-temporal attributes.
A temporal level is a level for which the application requires storing the lifespan of its members, that is, the time during which they exist(ed). The lifespan of a temporal member is a subset of the time domain, while the lifespan of a non-temporal member covers all the time domain. The lifespan can be an interval (a set of consecutive instants), or it may have time gaps, making it a set of intervals.
A temporal aggregation relationship keeps the evolution of the links between child and parent members in dimension hierarchies. It is a function that maps each instant in the time domain to a relation between the level members. For a non-temporal aggregation relationship, this function maps all the instants in the time domain to a fixed relation between level members. A synchronization constraint must guarantee that a parent and a child in a relationship instance coexist at each instant in the validity interval of the relationship, and that a child member is assigned to some parent member throughout its lifespan.
Throughout the paper we will use, as a running example, a portion of the DW of the TPC-DS benchmark [26]. Figure 8 shows the schema of the example in the MultiDim conceptual model [25, 37] where the temporality of the elements is represented with pictograms (for brevity, not all pictograms are shown in the figure). The DW represents sales of items occurring at a certain date. The items’ lifespan indicates the periods when items were available for sale. Further, the items’ price, brand, and category change across time. We explain the model’s notation next.
An instant pictogram represents a single point in time, an interval pictogram denotes a set of consecutive instants between two time instants, an instant set pictogram represents a set of (distinct) time instants, and an interval set pictogram represents a set of disjoint intervals. The temporal pictogram next to the level item indicates that it is a temporal level and thus the lifespan of its members is kept. The pictogram next to attribute i_current_price indicates that this is a temporal attribute and the evolution of its values is kept. The pictogram next to the relationships relating item to brand and category indicates that the evolution of the assignment of child members to their parent members is kept.
4.3 Temporal data warehouse implementations
This section compares alternative implementations of temporal data warehouses using as example the conceptual schema in Fig. 8. The usual approach favored by practitioners is referred to as slowly changing dimension (SCD), shown in Fig. 9. As an alternative, Ahmed et al. [1] proposed a temporal DW model (denoted TDW), depicted in Fig. 10, and associated OLAP operations. Finally, we also discuss the MobilityDB DW model (MobDB) based on time types shown in Fig. 11.
4.3.1 The slowly changing dimension model
In a DW, the values of some dimension attributes may change across time. Kimball and Ross [21] denoted the dimensions containing such attributes as slowly changing dimensions (SCD). They proposed three basic implementation techniques to track the changes in attribute values and denoted them “types.” Type 1 refers to changes that occur when an error is found in the data. In this case, the errors are fixed by overwriting the existing data, and thus no evolution is maintained. In Type 2 (the model most usually followed and the one we use in this paper), a new version of a level member is created for every change in any of its attribute values. A flag attribute or two time attributes representing a validity interval are used to identify the current version of level members. In this way, an unlimited number of versions of a level member can be created, but a surrogate key is required to uniquely identify each version. Finally, in Type 3, only the last two values of an attribute are kept, in two different columns. For example, in a Product dimension, we may have two attributes, namely Category and newCategory. When a product’s category changes, the new value is stored in the newCategory column. If a subsequent change occurs, the values are shifted: the value currently in newCategory overwrites the one in the Category column, and the new value is stored in the newCategory column.
Figure 9 shows an SCD Type 2 implementation of the conceptual model of Fig. 8, based on the so-called star schema, where dimension tables are denormalized to favor query performance. An alternative design, based on the snowflake schema, would normalize the dimension tables, leading to a higher number of joins. To keep track of the items’ evolution, attributes i_rec_start_date and i_rec_end_date are added to the dimension table scd_item. These attributes represent an interval when all other attribute values in a row are constant. More precisely, they represent an interval during which (1) an item exists; (2) an item’s current price is valid; (3) an item–brand assignment exists; and (4) an item–category assignment exists. The value of the attribute i_rec_end_date of the current record is set to 9999-12-31. Although i_item_id is a business key of the level item, a surrogate key is required to identify the various versions of the same item. Thus, i_item_sk is added to the table and serves as its primary key.
4.3.2 The temporal data warehouse model
The Temporal DW model was proposed as an alternative to the SCD approach. Figure 10 shows the temporal DW implementation of the conceptual schema in Fig. 8. We briefly discuss the model next.
An item’s lifespan is stored in table tdw_item_vt, which comprises the item’s identifier and a pair of attributes FromDate and ToDate. Table tdw_item_price stores the items’ price, and includes attributes FromDate and ToDate. Attributes i_item_id and FromDate compose the primary key for each table above. The evolution of the item–brand assignments is kept in table tdw_item_brand, which contains the item identifier, the brand identifier, and the period during which the assignment was valid. Finally, the assignment of items to categories is kept in table tdw_item_category, which depicts the interval during which an item was associated to a category. In this model, there is no need to create versions of records belonging to a table, and thus surrogate keys are not introduced. Attributes i_item_id, i_category_id, and FromDate compose the primary key of the table tdw_item_category. Table tdw_item_brand follows a similar rationale.
4.3.3 The MobilityDB data warehouse model
As an alternative to the previous classic temporal DW implementations, Fig. 11 shows a representation of the conceptual schema in Fig. 8, using the MobilityDB time data types explained in Sect. 3. We can see that the lifespan of items is represented in the table mobdb_item using an attribute of type datespanset. The same type is used in the three other tables that keep the evolution of items’ features.
The relationship between items and brands helps us to explain the rationale of our proposed approach. Notice that table mobdb_item_brand is designed using tuple timestamping, where the periods of validity of each item–brand combination are represented by means of a datespanset type (i.e., a set of intervals with granularity date). As a first example, suppose that over this representation, we want to compute the temporal count (i.e., the evolution of the count over time) of the number of items of a brand. Figure 12a shows that this can be done by stacking over time the periods when any item belongs to a brand and counting, at each point in time, how many periods overlap. The figure shows the result for brand b1, which across time had between one and three items. The result is concisely represented in one single tuple using attribute timestamping. In contrast, the tuple-timestamping representation of the result in this example would require at least three tuples (one for each item in the brand), or even more if the database could not handle temporal elements. Figure 12b shows another example, where we want to compute the evolution over time of the maximum number of items of any brand. This can be computed by applying a temporal maximum operation (tMax) over the result of the temporal count previously computed. We remark that in Fig. 12a we have shown the computation of the temporal count for brand b1, while, for clarity, in Fig. 12b we show the temporal count of three brands, namely b1, b2, and b3. It can be intuitively seen that reusing the previous computation is more easily done over the attribute-timestamped representation of the result.
5 Temporal algebra operators in SQL
Querying and updating time-varying information using standard SQL is challenging. Snodgrass [28] proposed a temporal extension to SQL, called TSQL2. Some of its features are part of the SQL standard [22] and have also been incorporated into major DBMSs. Since database practitioners still use standard SQL for manipulating time-varying information, Snodgrass [29] has shown how most temporal relational operations can be written in standard SQL. Zimányi [40] then showed how to implement temporal aggregates and temporal universal quantifiers using standard SQL. From these works, it follows that in order to foster massive use of temporal features in databases, a different approach is still needed.
In this section, we study how to express temporal relational algebra operations over the three alternative tuple-timestamping implementations of the TPC-DS conceptual schema depicted in Fig. 8. We remark that we start with the algebra operations because, as we will see in Sect. 7, they are a core part of the OLAP queries over DWs, which typically aggregate over the results of basic algebra operations [37]. We compare the three approaches in terms of conciseness and simplicity, leaving performance to Sect. 8. For the sake of brevity, in this paper we only use the portion of the schema related to items. In Sect. 7 we apply these ideas to address OLAP queries in a DW environment.
The reader may be asking why we design the whole schema using the tuple-timestamping approach instead of attribute timestamping, for example, representing the i_item_price attribute as a temporal float data type (tfloat). Both representations would be equivalent. The choice between the two depends on the application requirements. A typical query in our case would ask for sales with respect to the product price. Such a query would take each price of an item and compute the sales of the item at that price. Implementing such an operation over an attribute-timestamping schema would require performing the unnest operation depicted in Fig. 5 to disassemble the evolution of the price of an item into pairs of the form (value, datespanset), and then aggregating the sales for that value. This query would be more efficiently evaluated using tuple timestamping, which would not require unnesting. On the other hand, Fig. 12a, b show that temporal aggregations are computed more efficiently using attribute timestamping and unnesting the result into tuple timestamping, which is straightforward, as shown in Sect. 2.4.
In what follows, we carry out our study using the classes of temporal queries proposed in [29, 40], namely, temporal selection, projection, join, union, difference, and aggregation. These queries are shown in Table 3.
Temporal Selection As an example of this operation (Def. 7), we want to obtain the time when an item has a given brand.
Query 1
“Time when an item has brand B.”
In the TDW implementation, the SQL expression for the query would read as follows.

First, the query filters the rows in the WITH clause; then, the result is coalesced in the main query. The coalesce operation combines into one row a set of value-equivalent rows (i.e., rows that are equal on all their columns except for FromDate and ToDate), if their time periods overlap. This is a demanding operation that requires three nested NOT EXISTS predicates. We select the period [FromDate, ToDate) obtained from two rows f and l such that there is no gap during the period; that is, for every row m1 valid within the selected period there is a row m2 that can extend m1 to the left. This is implemented by the first two NOT EXISTS predicates in the query above. The third NOT EXISTS predicate ensures that no other row m can extend the selected period to the left or to the right.
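A sketch of this query, combining the filtering WITH clause with the classic coalescing pattern of [29] (column names follow Fig. 10 and the brand value is illustrative), is:

WITH ItemBrandBAll(i_item_id, FromDate, ToDate) AS (
  SELECT i_item_id, FromDate, ToDate
  FROM tdw_item_brand
  WHERE i_brand_id = 'B' )
SELECT DISTINCT f.i_item_id, f.FromDate, l.ToDate
FROM ItemBrandBAll f, ItemBrandBAll l
WHERE f.i_item_id = l.i_item_id AND f.FromDate < l.ToDate
  -- no gap: every row m1 starting within the selected period can be extended
  -- to the left by some row m2
  AND NOT EXISTS (
    SELECT 1 FROM ItemBrandBAll m1
    WHERE m1.i_item_id = f.i_item_id
      AND f.FromDate < m1.FromDate AND m1.FromDate <= l.ToDate
      AND NOT EXISTS (
        SELECT 1 FROM ItemBrandBAll m2
        WHERE m2.i_item_id = f.i_item_id
          AND m2.FromDate < m1.FromDate AND m1.FromDate <= m2.ToDate ) )
  -- maximality: no row extends the selected period to the left or to the right
  AND NOT EXISTS (
    SELECT 1 FROM ItemBrandBAll m
    WHERE m.i_item_id = f.i_item_id
      AND ( (m.FromDate < f.FromDate AND f.FromDate <= m.ToDate)
         OR (m.FromDate <= l.ToDate AND l.ToDate < m.ToDate) ) );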
Next, we show how the query is expressed over the SCD Type 2 implementation of Fig. 9. To avoid unnecessary repetition, in what follows we only show the portions of the queries that differ from those in the TDW implementation.

We can see that all temporal aspects are kept in a single table scd_item. Thus, the only difference with the TDW implementation is the table from which ItemBrandBAll obtains its content. The situation is analogous for all queries in the SCD implementation.
Finally, we show how the query is expressed over the schema in Fig. 11 (the MobilityDB DW model) using the spanUnion operation over time types, which automatically performs the coalescing operation.

To obtain exactly the same result as the query above, the datespanset in the second column is decomposed into two columns FromDate and ToDate as follows:

The expression unnest(spans(\(\cdot \))) transforms the argument datespanset into several rows, one for each composing period, and the functions lower and upper extract the corresponding bounds from each period.
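A sketch of the MobilityDB formulation and of this decomposition, assuming the datespanset column of mobdb_item_brand is named VT:

-- Coalesced time when each item has brand B
SELECT i_item_id, spanUnion(VT) AS VT
FROM mobdb_item_brand
WHERE i_brand_id = 'B'
GROUP BY i_item_id;

-- Same result decomposed into FromDate/ToDate rows
SELECT i_item_id, lower(p) AS FromDate, upper(p) AS ToDate
FROM ( SELECT i_item_id, spanUnion(VT) AS VT
       FROM mobdb_item_brand
       WHERE i_brand_id = 'B'
       GROUP BY i_item_id ) t,
     unnest(spans(t.VT)) AS p;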
Temporal Projection We now show how the temporal projection (Def. 8) is expressed in SQL. We omit the details, since the query is similar to the previous one: first, the table ItemAnyBrandAll is computed and then this table is coalesced.
Query 2
“Time when an item was assigned to any brand.”
The query over the TDW implementation reads:

As mentioned, the SCD implementation is similar to the TDW one, as shown next.

In MobilityDB the query is written as follows.

Temporal Join We now compute the time when an item had a given brand and a given price. This corresponds to a temporal join (Def. 9). Given one row from each table whose validity periods intersect, a temporal join will return the brand and price values together with the intersection of the two validity periods. As shown in [29], expressing a temporal join in SQL requires four SELECT statements and complex inequality predicates in order to verify that the validity periods of the rows to be combined intersect. This is illustrated in Fig. 13. Assuming that there are no duplicate rows in the tables (that is, at each point in time an item has one brand and one price), a UNION ALL operation can be used to combine the four cases. Finally, the result is, again, coalesced. The SQL:2023 standard provides two functions, greatest and least, that return, respectively, the maximum and minimum of their two arguments. Thus, we can write the temporal join in a simpler way. We refer to [29, 37] for details.
Query 3
“Time when an item has brand B and its price is greater than €80.”
The SQL solution for this query, over the classic TDW, reads:

The SQL query for the SCD implementation reads

In the TDW implementation, the evolution of brands and prices of items is kept in different tables and thus, a temporal join is needed to combine them, while this information is kept together in the SCD approach. Thus, the tables in the TDW model are much smaller than the ones in the SCD one, since the table scd_item keeps the temporal Cartesian product of all the corresponding tables in the TDW approach.
The same query is expressed in MobilityDB using the intersection (*) operation on two sets of intervals, represented using the datespanset type. After that, a coalescing using spanUnion is performed.
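A sketch of this query; the price table name mobdb_item_price, its columns, and the && (overlaps) predicate used to discard non-intersecting pairs are assumptions:

SELECT b.i_item_id, spanUnion(b.VT * p.VT) AS VT
FROM mobdb_item_brand b
     JOIN mobdb_item_price p ON b.i_item_id = p.i_item_id
WHERE b.i_brand_id = 'B'
  AND p.i_current_price > 80
  AND b.VT && p.VT      -- keep only pairs whose validity periods intersect
GROUP BY b.i_item_id;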

Temporal Union Our next query asks for the times when an item has a given brand or has a given price. This is an example of a temporal union operation (Def. 10). To compute this operation in SQL, we first obtain the tuples that satisfy each condition and perform their union. Finally, we coalesce the result of the previous operation.
Query 4
“Time when an item has brand B or its price is greater than €80.”
The query over TDW is written as:

Over SCD, the SQL would be expressed as follows, omitting, as before, repeated parts.

The corresponding MobilityDB query is given next.

Temporal Difference We now give an example of the temporal difference operation (Def. 11). To express this operation in SQL, we must consider the four possible cases shown in Fig. 14, where an item satisfies the first condition but not the second one.
Query 5
“Time when an item has brand B and its price is not greater than €80.”
Over the TDW implementation, we have:

The query performs a temporal difference between tables ItemBrandB and ItemPriceGT80, which are the result of coalescing ItemBrandBAll and ItemPriceGT80All. The difference is computed in table ItemBrandBAndNotPriceGT80All. The four inner queries implement (using the NOT EXISTS predicate) the four cases in Fig. 14. The result is coalesced in the main query.
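As an illustration, the sketch below shows one of the four NOT EXISTS cases (a brand period that no price period overlaps at all), over assumed tables ItemBrandB(ItemID, FromDate, ToDate) and ItemPriceGT80(ItemID, FromDate, ToDate); the remaining three cases, their UNION ALL, and the final coalescing are analogous and omitted.

-- Hedged sketch: one of the four temporal difference cases of Fig. 14.
SELECT b.ItemID, b.FromDate, b.ToDate
FROM   ItemBrandB b
WHERE  NOT EXISTS (
         SELECT 1
         FROM   ItemPriceGT80 p
         WHERE  p.ItemID = b.ItemID
           AND  p.FromDate < b.ToDate   -- no price period overlaps the brand period,
           AND  b.FromDate < p.ToDate   -- so the whole brand period is kept
       );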
Over the SCD implementation we have:

Again, this query and the one corresponding to the TDW implementation differ in how the tables ItemBrandBAll and ItemPriceGT80All are computed.
In the MobilityDB DW model the query is written:

The difference in this case is computed using the difference (-) operation on two interval sets in the SELECT clause of the inner query, which makes this query easier to express than in the TDW or SCD approaches.
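A minimal sketch of this formulation follows, with assumed relations ItemBrandB(ItemID, Period) and ItemPriceGT80(ItemID, Period) whose Period columns are of type datespanset; a complete query would use a LEFT JOIN (and a NULL check) so that items with no matching price row keep their full brand period.

-- Hedged sketch: temporal difference using the '-' operation on datespanset values.
SELECT b.ItemID,
       b.Period - p.Period AS Period   -- remove the times when the price condition held
FROM   ItemBrandB b
JOIN   ItemPriceGT80 p ON b.ItemID = p.ItemID;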
Temporal Aggregation The temporal version of the five classic SQL aggregate operations, as stated in Def. 13, requires a three-step process: (1) split the timeline into periods of time during which all values are constant, (2) compute the aggregation over these periods, and (3) coalesce the result. This is illustrated next.
Query 6
“Time when a category has more than three items.”
The aggregation over the TDW implementation is:

Table CategoryChanges gathers, for each category, the dates at which a category assignment started or ended. Table CategoryPeriods constructs the periods from such dates. Table CategoryGT3ItemsAll filters the previous table, keeping the categories that have more than three items assigned to them. Finally, this table is coalesced in the main query.
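The sketch below illustrates the three-step pattern just described, assuming a TDW table ItemCategory(ItemID, CategoryID, FromDate, ToDate) that stores the history of item-category assignments; the table and column names are assumptions, and the final coalescing step is omitted.

-- Hedged sketch of the timeline-split-then-aggregate pattern for Query 6.
WITH CategoryChanges AS (            -- dates at which some assignment starts or ends
  SELECT CategoryID, FromDate AS Date FROM ItemCategory
  UNION
  SELECT CategoryID, ToDate FROM ItemCategory
), CategoryPeriods AS (              -- consecutive periods built from those dates
  SELECT CategoryID, Date AS FromDate,
         lead(Date) OVER (PARTITION BY CategoryID ORDER BY Date) AS ToDate
  FROM CategoryChanges
), CategoryGT3ItemsAll AS (          -- keep the periods with more than three items
  SELECT p.CategoryID, p.FromDate, p.ToDate
  FROM   CategoryPeriods p
  JOIN   ItemCategory i
    ON   i.CategoryID = p.CategoryID
   AND   i.FromDate <= p.FromDate AND p.ToDate <= i.ToDate
  WHERE  p.ToDate IS NOT NULL
  GROUP  BY p.CategoryID, p.FromDate, p.ToDate
  HAVING count(*) > 3
)
SELECT CategoryID, FromDate, ToDate  -- the result would still need to be coalesced
FROM   CategoryGT3ItemsAll
ORDER  BY CategoryID, FromDate;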
The SCD implementation reads:

We can see that, with respect to the query for the TDW implementation, the query above must compute the table ItemCategory before continuing with the aggregation that starts from the computation of the table CategoryChanges. We will see in Sect. 8 that this has a strong impact on the query performance.
Before showing the MobilityDB solution, we introduce the whenTrue function, which returns the time when a predicate is satisfied. For example, the query

returns the time when the value of the temporal float is greater than three, shown next:

To express our query, we first compute the temporal count (Fig. 12a) and then use the whenTrue function to obtain the periods when the temporal count is greater than three.
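A rough sketch of this idea is given below. It assumes a relation ItemCategory(ItemID, CategoryID, Assigned) where Assigned is a temporal boolean (tbool) that holds while the item is assigned to the category, that tCount aggregates such values into a temporal count, and that #> is the temporal greater-than comparison; these assumptions about the schema of Fig. 11 and the exact signatures may not match the paper's implementation.

-- Hedged sketch: temporal count per category, then whenTrue to extract the periods.
SELECT CategoryID,
       whenTrue(tCount(Assigned) #> 3) AS Period  -- time when the count exceeds three
FROM   ItemCategory
GROUP  BY CategoryID;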

In conclusion, the queries discussed in this section illustrate the benefits of the temporal types approach, implemented in the MobilityDB DW. The temporal types allow expressing the queries very concisely. In contrast, in classic temporal databases, most queries require writing long pieces of SQL code, in particular due to the coalesce operation and the timeline splitting for temporal aggregation. Indeed, in the classic temporal approach, we need to write, in every query, the implementation of the temporal operations and the required coalescing operation that follows, while in the temporal types approach, this functionality is provided by the system. Sections 7 and 8 show that similar results are obtained in a DW environment, also reducing the execution times of the queries.
6 Temporal OLAP operators
We now define the temporal equivalents of the classic (non-temporal) roll-up and dice OLAP operators, for the three implementations in Sect. 4. In what follows, we assume the reader is familiar with the non-temporal operations. We will use the example illustrated in Fig. 15. There is an item \(\textsf {i}_1\) introduced at instant \(t_{1}\) with unit price €10 belonging to brand \(\textsf {b}_1\). At \(t_{2}\), its unit price changed to €15. At \(t_{3}\) it was assigned to brand \(\textsf {b}_2\), and at \(t_{4}\) it was discontinued from selling. Finally, the item was reintroduced at \(t_{5}\) with unit price €18 and brand \(\textsf {b}_2\), and it remains available for selling until today. The evolution of \(\textsf {i}_1\) and the corresponding instance of dimension Item are shown in Fig. 15a, b, respectively. We also assume that item \(\textsf {i}_2\) belonged to brand \(\textsf {b}_2\) throughout its entire lifespan, and item \(\textsf {i}_3\) to brand \(\textsf {b}_1\), also during its whole lifespan (Fig. 15b).
Fig. 16 a Fact Sales where the brands of items at the time of the sale and now are shown in gray; b Instances of the Brands hierarchy valid now; c Fact Sales after a temporally-consistent roll-up to level Brand using the state of the hierarchy valid at Sales.Date; d Fact Sales after a time-sliced roll-up to level Brand using the state of the hierarchy valid now; e Result of a time-sliced dice operation on the fact Sales for current brand \(\textsf {b}_2\) and current price greater than €10
Temporal Roll-up A non-temporal roll-up operation aggregates fact data up to a dimension level using the parent–child relations defined in the dimension hierarchies. The temporal roll-up operation is similar, except that it must consider the state of the roll-up hierarchy at a specific instant. Depending on the time instant used to obtain the state of the hierarchy, the temporal roll-up can be temporally-consistent or time-sliced. Figure 16 depicts an example scenario to explain these operations. Figure 16b shows the current state of the aggregation hierarchy from items to brands.
In a temporally-consistent roll-up, the state of the hierarchy is obtained at the time when each fact instance occurred. For example, if a user wants to know the total quantity sold for each brand, considering the item-brand assignment valid at the time of the sales, she must perform a temporally-consistent roll-up. Figure 16c shows the result of a temporally-consistent roll-up on the Sales fact from Fig. 16a. The sales of each product are aggregated to the brand to which it belonged at the time indicated in column Date.
A time-sliced roll-up aggregates measures using the state of the hierarchy at a fixed instant t. For example, to compute the total quantity sold for each brand, using the item-brand assignment valid now, we perform a time-sliced roll-up considering the current state of the dimension (depicted in Fig. 16b). The time-sliced roll-up on fact Sales from Fig. 16a is shown in Fig. 16d. For example, as depicted in Fig. 16b, the sales of item \(\textsf {i}_1\) are aggregated to brand \(\textsf {b}_2\). We remark that both kinds of roll-up operations are valid in any of the implementations defined in Sect. 4.3.
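The two variants can be sketched as follows over assumed tables Sales(ItemID, Date, Quantity) and ItemBrand(ItemID, BrandID, FromDate, ToDate); these names are illustrative and merely anticipate the SQL formulations of Sect. 7.

-- Hedged sketch: temporally-consistent roll-up (assignment valid at the sale date).
SELECT ib.BrandID, sum(s.Quantity) AS TotalQuantity
FROM   Sales s
JOIN   ItemBrand ib
  ON   ib.ItemID = s.ItemID
 AND   s.Date >= ib.FromDate AND s.Date < ib.ToDate
GROUP  BY ib.BrandID;

-- Hedged sketch: time-sliced roll-up (assignment valid at a fixed instant, here "now").
SELECT ib.BrandID, sum(s.Quantity) AS TotalQuantity
FROM   Sales s
JOIN   ItemBrand ib
  ON   ib.ItemID = s.ItemID
 AND   current_date >= ib.FromDate AND current_date < ib.ToDate
GROUP  BY ib.BrandID;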
Temporal Dice A non-temporal dice operator filters instances of a fact based on a condition over measure values and attribute values of related dimension levels. Analogously to the temporal roll-up, we can have a temporally-consistent or a time-sliced dice. For example, to obtain the sales of items that are assigned to brand \(\textsf {b}_2\) and have a price equal to or higher than €10, the operator must consider whether we want the sales of the products that were assigned to brand \(\textsf {b}_2\) at the time of the sales or of those that belong to this brand now. Since the price of an item may also change over time, the same considerations apply to the price. For example, Fig. 16e shows the result of a time-sliced dice that keeps the items that currently have brand \(\textsf {b}_2\) and a price equal to or higher than €10. Obviously, we can mix the two interpretations, for example, by asking for the sales of items currently assigned to brand \(\textsf {b}_2\) that had a price equal to or higher than €10 at the time of the sale.
7 Temporal OLAP operators in SQL
In this section, we show how to express the temporal OLAP operations defined in Sect. 6 in the three implementations presented in Sect. 4.3. For this, we extended the temporal algebra queries in Sect. 5 with the aggregation of fact data, yielding temporal OLAP queries. The queries are shown in Table 4, indicating also the query type. Notice that while algebra queries have the template “Time when \(\ldots \),” the corresponding OLAP queries have the template “Total sales and time when \(\ldots \)” For the sake of clarity, we have created views for each of the temporal algebra queries of Sect. 5. These views (named, e.g., Q1_MobDB, Q3_TDW, or Q5_SCD) are used in the temporal OLAP queries below. In Sect. 8 we compare their performance over the three implementations.
Temporal OLAP Selection This operator implements a temporal dice. As explained in Sect. 6, a dice can be temporally-consistent or time-sliced. The query below corresponds to the former type.
Query 1
“Total sales and time when an item has brand B.”
In the TDW approach, the query is written as:

Notice that the query above uses the view Q1_TDW, which corresponds to the TDW version of the temporal selection in Query 1. Further, the WHERE clause ensures that the date of sale is included in the period obtained in the temporal selection.
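A minimal sketch of this pattern is shown below; it assumes the view Q1_TDW(ItemID, FromDate, ToDate) and a fact table store_sales(ss_item_id, ss_sold_date, ss_quantity), whose column names are assumptions rather than the paper's exact schema.

-- Hedged sketch: aggregate fact data over the period returned by the temporal selection.
SELECT q.ItemID,
       sum(s.ss_quantity) AS TotalSales,
       q.FromDate, q.ToDate
FROM   store_sales s
JOIN   Q1_TDW q ON s.ss_item_id = q.ItemID
WHERE  s.ss_sold_date >= q.FromDate     -- the sale date falls within the
  AND  s.ss_sold_date <  q.ToDate       -- period of the temporal selection
GROUP  BY q.ItemID, q.FromDate, q.ToDate;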
In the SCD implementation, the query reads:

In this case, since the link between the scd_store_sales and item tables is made through the surrogate key ss_item_sk, the SCD version has one more join than the TDW one; this is also the case for the MobilityDB version of this query, shown next.

Here, the intersection (*) predicate tests whether the date of the sale intersects the interval set during which the item satisfies the query condition.
It can be seen that the three versions of the query are very similar. Thus, in what follows, we only show the TDW version of them. In Sect. 8, we study the performance of both the temporal algebra queries in Sect. 5 and the temporal OLAP queries in this section, in the three implementation schemes.
Temporal OLAP Projection In an OLAP query, a temporal projection computes the time when a condition over an evolving feature is satisfied independently of the specific values that made the condition true. This is illustrated in the next query.
Query 2
“Total sales and time when an item has any brand.”
This query implements a temporal roll-up to brand, taking into account that the assignment of items to brands evolves over time. In the TDW implementation the query reads:

Temporal OLAP Join A temporal join in an OLAP query computes the joint evolution over time of two evolving features stored in different tables. This is illustrated in the following query.
Query 3
“Total sales and time when an item has brand B and its price is greater than €80.”
Over the TDW model, the query reads as follows.

Temporal OLAP Union An OLAP query requires a temporal union to compute the time when at least one of two conditions over evolving features is satisfied.
Query 4
“Total sales and time when an item has brand B or its price is greater than €80.”

Temporal OLAP Difference An OLAP operation requires a temporal difference to obtain the time when a condition over an evolving feature is satisfied and another condition is not.
Query 5
“Total sales and time when an item has brand B and its price is not greater than €80.”
This query performs a temporal roll-up operation to the brand level with an additional condition on the item price; both the brand assignment and the price of an item evolve over time. To express this query in the TDW model, we proceed as follows.

Temporal OLAP Aggregation An OLAP operation requires a temporal aggregation to obtain summarized values from an evolving feature, as illustrated next.
Query 6
“Total sales and time when a category has assigned to it more than 3 items.” This is a temporal roll-up operation, with an additional condition over the number of items of a category.

8 Experiments
In this section, we compare the performance of the temporal algebra queries in Sect. 5 (Table 3) and the temporal OLAP queries in Sect. 7 (Table 4) in the three implementations, namely SCD (Fig. 9), temporal DW (Fig. 10), and MobilityDB (Fig. 11). We ran the six queries for both cases and compared their performance using the TPC-DS Scale Factors (SF) 1, 10, 50, and 100. The number of rows and the size of each table are given in Table 5. The SQL scripts and the data for reproducing the benchmark are available online.
The store_sales and item tables produced by the TPC-DS generator had data integrity issues, so the following actions were taken: (a) Remove sales without order date; (b) Remove sales whose date does not intersect the item’s validity interval; (c) Remove sales with no item reference; (d) Remove items without a valid interval and their corresponding sales records; (e) Remove sales whose item has no price.
The three implementations were deployed on a PostgreSQL database, version 16.1, installed on a desktop computer with an AMD Ryzen 9 3900X 12-Core processor at 3,793 MHz and 64 GB of memory, running WSL 2 on Windows 10.
Figure 17 shows (in logarithmic scale) the execution times (in seconds) of the temporal algebra and OLAP queries at the various scale factors (SF). As a general comment, querying temporal data often involves computing the interval during which an object maintains its state, e.g., the interval during which an item's price remains unchanged. Since in the SCD implementation a new record is inserted every time any attribute value changes, temporal coalescing is required to obtain the interval associated with a given value, which is an expensive operation. In the TDW implementation, each temporal attribute is stored in a separate table, and thus a new record is inserted only when that attribute's value changes; therefore, temporal coalescing is not required. Finally, the TDW implementation is highly normalized, leading to a higher number of joins compared with the SCD implementation, where all the required attributes are present in the same table.
We first discuss the results of the temporal algebra queries. The left-hand side of Fig. 17 shows that the MobilityDB implementation outperforms the other two (in some cases, by orders of magnitude) for the four SFs, except for Q3 (where performance is similar for all cases and SFs) and Q4 (where performance is similar at SF100). The reason follows from the discussion in Sect. 5: the coalescing operation in MobilityDB takes advantage of the fact that the time intervals are temporally ordered, while in the other models several nested NOT EXISTS predicates must be computed. We can observe the very poor performance of the TDW and SCD implementations for Q2, and of the SCD implementation for Q6 (temporal aggregation), for all SFs. Recall from Sect. 5 that, for Q6, the SCD implementation requires additional computations. For Q3 (temporal join), the performance of the three alternatives is similar, most likely due to the cost of the greatest and least functions used in the TDW implementation, which encode, as explained in Sect. 5, the four possible temporal join cases. Also note that, except for Q3, the TDW implementation outperforms SCD for all SFs and queries.
The right-hand side of Fig. 17 shows the results for the temporal OLAP queries. MobilityDB outperforms the other two solutions for all queries and SFs, except for Q6 at SF100. Also, for all queries except Q2 and Q6, the TDW and SCD alternatives behave similarly, while for Q2 and Q6, TDW outperforms SCD. In the case of Q6, MobDB runs faster than SCD and TDW except at SF100, where all three implementations time out after running for one hour on our hardware. We discuss these results below.
We can see that, for all algebra queries, TDW outperformed SCD. However, for the OLAP queries (which include the algebra queries as views), both implementations show, in most cases, similar performance. The reason is that the execution time of the OLAP queries is dominated by the final OLAP aggregation. This is not the case for the MobDB implementation, where the aggregation is performed more efficiently by taking advantage of the time types.
The case of Q6 (temporal aggregation) is worth further elaboration. First, our approach achieves efficient performance, as the results for the algebra queries on the left-hand side of Fig. 17 show, even though temporal aggregation is known to be an expensive operation [6, 7]. Second, analyzing the query processing strategies, we see that PostgreSQL's query planner produces different strategies for the TDW and MobDB implementations (the query plans are available at the GitHub site). For the latter, a bottleneck arises when performing two hash joins. This bottleneck does not appear in the strategy that PostgreSQL proposes for TDW, since it materializes intermediate results. The reason is that PostgreSQL uses a default selectivity estimate for the result of the aggregation which, in the case of MobilityDB, deviates by orders of magnitude from the actual number of tuples in the result; thus, PostgreSQL chooses an inefficient order for the joins required by the OLAP operation. Although this occurs for all the SFs in Fig. 17, the problem only shows at SF 100 due to the size of the store_sales table. MobilityDB provides PostgreSQL with statistics for estimating the selectivity of operations on time and temporal types, in terms of histograms of lower and upper bounds. These statistics are accurately computed when at least one of the arguments of the operation is a base-table column, while a default selectivity applies to derived expressions such as the one used in Query 6. Thus, although the temporal aggregation itself is computed efficiently in MobilityDB, the overall results are strongly affected by the processing strategy explained above. We will address this problem in future work.
9 Related work
We now review literature on temporal data management, relevant to our proposal.
Temporal Databases There is a large corpus of work on temporal databases [31] and they still capture the interest of researchers and practitioners [13]. The two main approaches for extending relational databases with temporal semantics are tuple timestamping and attribute timestamping. The former yields first normal form relations (1NF), while the latter produces non-first normal form relations (N1NF). Attribute timestamping was initially proposed by Clifford and Tansel [10]. Later, in his seminal work, Gadia [12] introduces a temporal algebra and calculus, proving their equivalence. Jensen et al. [19] propose a temporal conceptual model that captures the temporal semantics of data in a unified way, timestamping the tuples of relations with sets of two-dimensional so-called chronons, thus supporting valid and transaction time. For mapping between models, the notion of snapshot equivalence is used. Both tuple- and attribute-timestamped representations are studied, mapping one to another using a conceptual model. To the best of our knowledge, this proposal has not been implemented. Another discussion in temporal databases refers to the representation of time, which can be point based or interval based. This is studied in [32]. A point-based temporal extension to SQL is later proposed by the same author [33]. Using the point-based approach, Chen et al. [9] propose a temporal extension to SQL. However, in practice, the interval-based approach is easier to implement and understand.
The large number of temporal query languages proposed in the early 1990s raised the need for a standardized temporal relational query language. As a result, TSQL2 [28] is proposed as a temporal extension to the SQL-92 standard. TSQL2 introduces a new period datetime data type. Valid, transaction, and user-defined times are supported in TSQL2. There are two types of tables, namely, state and event tables. In the former, each tuple is timestamped with a time element (a union of periods), while in the latter each tuple is timestamped with an instant set. The classic CREATE TABLE, UPDATE, and ALTER statements in standard SQL are modified to allow the specification of temporal elements. For data manipulation, the FROM clause allows coalescing the tuples that have identical values of non-temporal attributes. Finally, the VALID() and TRANSACTION() functions are introduced to access the valid and transaction times in the WHERE clause.
The SQL:2011 standard [22] includes some temporal features like a period data type, temporal primary keys, temporal referential integrity constraints, temporal predicates, valid, transaction, and bitemporal tables, as well as temporal insertions, updates, and deletions. However, operators like temporal coalescing [38], temporal joins, and temporal aggregation are not included in the standard, and the most popular database vendors provide limited support for temporal features in their RDBMSs. Therefore, practitioners still use standard SQL to manipulate time-varying information. Snodgrass [29] shows how most relational operations can be written in standard SQL. Zimányi [40] then shows how to implement temporal aggregates and temporal universal quantifiers using standard SQL. Tuple timestamping is adopted in those proposals.
The work by Dignös et al. [11] is closely related to the present paper, in particular for the case of so-called sequenced temporal queries, namely, queries that are evaluated at each time point (e.g., Query 3 in Sect. 5 and Query 6 in Sect. 7). The idea is to reduce a temporal query to non-temporal operators. For this, the authors follow a strategy composed of four steps: (a) interval propagation; (b) interval adjustment; (c) scaling; and (d) evaluation of non-temporal operators. Interval propagation refers to the duplication of the original intervals, which are later used for scaling. Interval adjustment comprises two operations: normalization and alignment. Normalization splits the timeline according to the intervals of the tuples in a relation and is used for group-based operators (projection, difference, union, intersection); it requires extending SQL with a NORMALIZE operation. Alignment is used in tuple-based operations (e.g., joins, selection, Cartesian product) and splits a tuple of a relation according to the time intervals of value-equivalent tuples in another relation. It is analogous to the cases explained in Sect. 5 (Queries 3 and 5). The implementation of temporal alignment is similar to the MobilityDB implementation of the set operations on time values (Sect. 3), except that in MobilityDB these operations are handled by the time types, while in [11] a sweep-plane algorithm is implemented as a PostgreSQL function. Further, the skiplist-based implementation in MobilityDB makes these operations very efficient. Finally, a query plan is produced by applying a set of reduction rules that transform a query expressed using temporal relational operators into one expressed in terms of relational operators and the two new operators. In summary, the proposal in [11] follows the classic idea of interval-based implementation and tuple timestamping. The approach requires extending SQL with the operations mentioned above, which must appear explicitly in the queries (e.g., ALIGN, NORMALIZE, SCALE). In contrast, MobilityDB provides them built-in, through two mechanisms: extended data types and attribute timestamping. As a result, sequenced queries in MobilityDB can be expressed very concisely using temporal operations like tSum, tCount, or tUnion, without requiring query rewriting. The authors propose a proof-of-concept implementation of their approach using PostgreSQL 9.5; as far as we know, it has not been integrated into PostgreSQL.
Tsikoudis et al. [35] introduce RQL (Retrospective Query Language), a language for specifying SQL computations over collections of snapshots of past states of a data store like Berkeley DB. RQL iteratively runs an SQL query on each snapshot, collects the results, and performs computations over them. As a follow-up, the authors extend this work with an optimization framework called RID [34] that detects and eliminates redundant computations in queries. RQL can address some of the queries presented here, and it is somewhat related to [11] since it can be used to address sequenced queries. However, the sequenced semantics of a temporal query, which views a temporal database as a sequence of snapshot databases and evaluates the query at each of these snapshots, provides a clear semantics for theoretical studies, whereas a practical implementation needs additional considerations [5]. Therefore, we do not include snapshot-based databases in this work. Finally, Lu et al. [23] present a temporal implementation in TDSQL, Tencent's distributed DBMS, keeping the traditional data types and the tuple timestamping approach and focusing on data storage and query rewriting.
MobilityDB [41] is a spatiotemporal DBMS that extends the type system of PostgreSQL and PostGIS with abstract data types to represent spatiotemporal data. MobilityDB seamlessly extends the DBMS with temporal data types, without requiring any additional software architecture. Although MobilityDB was initially designed to manage and query mobility data, since it provides temporal types over both alphanumeric and spatial base types, it is actually a temporal DBMS that can handle both tuple and attribute timestamping. To address scalability, a distributed version of MobilityDB is also available [2, 3]. We do not elaborate further here, since MobilityDB was covered in detail in Sect. 3.
Temporal Data Warehousing The most popular approach for tackling changes in the attribute values of dimension tables in a DW is slowly changing dimensions (SCDs) [21]. Since this is a well-established solution adopted by practitioners, we chose it as one of the models to compare against our proposal. We have extensively covered SCDs in previous sections.
The temporal star schema [4] aims at overcoming the limitations of the SCD solution. Instead of using a time dimension table, timestamps are defined for each row in every table, allowing time calculations to be performed without joining the fact table with the time dimension. A single timestamp in a table means that the data are event-oriented and represents the time when an event occurred, while a pair of timestamps (called period timestamps) indicates that the data in the table are state-oriented and denotes the period during which a state persisted. Handling periods raises issues which, in our approach, are solved by the MobilityDB time data types. Further, omitting the time dimension makes it difficult to aggregate data over time and to attach particular information to certain instants (like holidays, weekdays, and so on). Finally, temporal joins and temporal aggregation are not considered, and the discussion is limited to just four kinds of queries. Since the SQL expressions of these queries are not shown, we do not use this approach to compare against our proposal.
The starnest schema [15], based on the nested relational model [27], combines the advantages of the star and snowflake schemas, avoiding join operations by storing each dimension in a single nested table that preserves the dimension hierarchy. The levels of a dimension hierarchy are recursively nested, from the most generic level to the most specific one. Each parent-level tuple nests the sub-tuples of its child members as a value of one of its attributes. If a dimension contains multiple hierarchies, tables must be duplicated. Based on this work, the temporal starnest schema is proposed in [14], addressing time-evolving dimension members. Timestamps are used to track the evolution of dimension members. As in [4], the time dimension is not made explicit; instead, valid and transaction time attributes are included in the fact table and the temporal dimensions. Temporal nested queries are processed in a non-standard query language called BTN-SQL. The main drawback of this model is its complex structure. No implementation is reported, so we do not compare this model against our proposal.
Mahlknecht et al. [24] represent facts timestamped with intervals and present three implementations: (a) the instant model, where each fact is linked to the time dimension through a foreign key; (b) the period model, where intervals are represented as two foreign keys to the time dimension; and (c) the \(period^*\) model, where a fact is associated with a period stored in a separate table, linked to the fact table through a foreign key. The authors present a set of queries that perform aggregation at every time instant, showing that they are easier to express in SQL using the \(period^*\) model, since periods are explicitly stored. In our approach, these queries are solved by the built-in datespanset data type, without the need to implement foreign keys for each fact. Further, our approach can handle sets of periods in a native fashion. Aggregation along the time dimension is also performed in [24]; we have shown that this operation can be done natively in our approach. The experiments show a clear edge in favor of the \(period^*\) model, which is in fact aligned with our approach. Compared to our work, a limited variety of temporal queries is supported; for example, temporal joins and temporal difference are not studied. Also, unlike the TDW model mentioned below, only atelic [20] measures are addressed.
To overcome the problems of the works discussed above, Ahmed et al. [1] introduced the TDW model. We have extensively discussed this model in previous sections. Interested readers can find detailed information in [37]. Finally, Vaisman and Zimányi [36, 37] show that MobilityDB can also be used as a solution for (spatio)temporal DW.
10 Conclusion
In this paper we revisit the long-studied tuple versus attribute timestamping dichotomy in temporal databases and DWs. The main reason for the popularity of tuple timestamping is that, so far, 1NF databases were the only implementation alternative. The release of the MobilityDB database opens up the possibility of efficiently implementing attribute timestamping and, at the same time, enhancing tuple timestamping by taking advantage of its rich collection of time data types and operations on these types.
To show the feasibility of the MobilityDB approach, we implement a portion of the TPC-DS benchmark using three alternative implementations: the SCD Type 2 model, the temporal DW model, and the MobilityDB DW model. We define six representative queries, implementing the temporal selection, projection, union, difference, join, and aggregation operations. Then, we use these six queries as the basis for providing a temporal extension of the classic roll-up and dice OLAP operations. Our experiments show that the MobilityDB approach outperforms the other two, except for the temporal OLAP aggregation query at SF 100, where the query plan produced by PostgreSQL is suboptimal and thus, although the temporal aggregation is computed efficiently in MobilityDB, the execution of the query is strongly affected by the processing strategy. We will study this in future work.
Another future perspective is to generalize the results of this paper to a distributed setting, using a distributed version of MobilityDB based on Citus, an open-source distributed extension for PostgreSQL, along the lines of the approach shown in Sect. 2.4. MobilityDB partitions spatiotemporal data using multidimensional tiling, where the n-dimensional space is split into grid tiles that are assigned to cluster nodes. With respect to traditional hash-based partitioning, this preserves data locality, which is essential for location-based queries. Since metadata about the spatiotemporal extent covered by the cluster nodes are kept by the query planner, only the nodes that cover the extents needed by a query are requested to work. We plan to generalize multidimensional tiling for OLAP, which requires supporting the high number of dimensions in a DW, instead of the one to four dimensions of spatiotemporal data.
Notes
MobilityDB, the database we use later to implement the abstract model, also includes geometry and geography as base types but they are not considered in this paper.
The span and span set types in MobilityDB correspond to the range and multirange types in PostgreSQL, but they have a more efficient implementation.
Currently, MobilityDB provides temporal types only at the timestamptz granularity.
MobilityDB also provides the temporal types tgeompoint and tgeogpoint, which are based on the PostGIS types geometry and geography restricted to 2D and 3D points.
Kimball also proposed SCD Types 4 through 7, but we omit them since they are particular cases not related to this paper.
References
Ahmed, W., Zimányi, E., Vaisman, A.A., Wrembel, R.: A temporal multidimensional model and OLAP operators. Int. J. Data Warehous. Min. 16(4), 112–143 (2020)
Bakli, M., Sakr, M., Zimányi, E.: Distributed mobility data management in MobilityDB. In: Proceedings of the 21st IEEE international conference on mobile data management, MDM 2020, pp. 238–239. IEEE Computer Society Press (2020)
Bakli, M., Sakr, M., Zimányi, E.: Distributed spatiotemporal trajectory query processing in SQL. In: Proceedings of the 28th international conference on advances in geographic information systems, SIGSPATIAL’20. ACM (2020)
Bliujute, R., Saltenis, S., Slivinskas, G., Jensen, C.S.: Systematic change management in dimensional data warehousing. Technical Report TR-23, Time Center (1998)
Böhlen, M.H., Dignös, A., Gamper, J., Jensen, C.S.: Temporal data management: an overview. In: Proceedings of the 7th european summer school on business intelligence and big data, eBISS 2017, 324, 51–83 (2017)
Böhlen, M.H., Dignös, A., Gamper, J., Jensen, C.S.: Database technology for processing temporal data. In: Proceedings of the 25th Int. Symp. on Temporal Representation and Reasoning, TIME 2018 (2018)
Böhlen, M., Gamper, J., Jensen, C.S.: Towards general temporal aggregation. In: Proceedings of the 25th british national conference on databases, BNCOD08, pp. 257–269 (2008)
Böhlen, M.H., Snodgrass, R.T., Soo, M.D.: Coalescing in temporal databases. In: Proceedings of 22th international conference on very large data bases, VLDB’96, pp. 180–191. Morgan Kaufmann (1996)
Chen, C.X., Kong, J., Zaniolo, C.: Design and implementation of a temporal extension of SQL. In: Proceedings of the 19th international conference on data engineering, 2003, pp. 689–691. IEEE Computer Society (2003)
Clifford, J., Tansel, A.U.: On an algebra for historical relational databases: two views. In: Proceedings of the 1985 ACM SIGMOD international conference on management of data, pp. 247–265. ACM Press (1985)
Dignös, A., Böhlen, M.H., Gamper, J., Jensen, C.S.: Extending the kernel of a relational DBMS with comprehensive support for sequenced temporal queries. ACM Trans. Database Syst. 41(4), 26:1-26:46 (2016)
Gadia, S.K.: A homogeneous relational model and query languages for temporal databases. ACM Trans. Database Syst. 13(4), 418–448 (1988)
Gamper, J., Ceccarello, M., Dignös, A.: What’s new in temporal databases? In: Proceedings of the 26th European Conference on Advances in Databases and Information Systems, ADBIS 2022, LNCS 13389, pp. 45–58. Springer (2022)
Garani, G., Adam, G.K., Ventzas, D.: Temporal data warehouse logical modelling. Int. J. Data Min. Model. Manag. 8(2), 144–159 (2016)
Garani, G., Helmer, S.: Integrating star and snowflake schemas in data warehouses. Int. J. Data Warehous. Min. 8(4), 22–40 (2012)
Güting, R.H., Böhlen, M.H., Erwig, M., Jensen, C.S., Lorentzos, N.A., Schneider, M., Vazirgiannis, M.: A foundation for representing and querying moving objects. ACM Trans. Database Syst. 25(1), 1–42 (2000)
Hümmer, W., Lehner, W., Bauer, A., Schlesinger, L.: A decathlon in multidimensional modeling: Open issues and some solutions. In: Proceedings of the 4th international conference on data warehousing and knowledge discovery, DaWaK, LNCS 2454, pp. 275–285. Springer (2002)
Hurtado, C.A., Mendelzon, A., Vaisman, A.A.: Maintaining data cubes under dimension updates. In: Proceedings of the 15th international conference on data engineering, ICDE’99, pp. 346–355 (1999)
Jensen, C.S., Soo, M.D., Snodgrass, R.T.: Unifying temporal data models via a conceptual model. Inf. Syst. 19(7), 513–547 (1994)
Khatri, V., Ram, S., Snodgrass, R.T., Terenziani, P.: Capturing telic/atelic temporal data semantics: generalizing conventional conceptual models. IEEE Trans. Knowl. Data Eng. 26(3), 528–548 (2014)
Kimball, R., Ross, M.: The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd edn. John Wiley & Sons (2013)
Kulkarni, K., Michels, J.-E.: Temporal features in SQL:2011. SIGMOD Record 41(3), 34–43 (2012)
Lu, W., Zhao, Z., Wang, X., Li, H., Zhang, Z., Shui, Z., Ye, S., Pan, A., Du, X.: A lightweight and efficient temporal database management system in TDSQL. Proc. VLDB Endow. 12(12), 2035–2046 (2019)
Mahlknecht, G., Dignös, A., Kozmina, N.: Modeling and querying facts with period timestamps in data warehouses. Int. J. Appl. Math. Comput. Sci. 29(1), 31–49 (2019)
Malinowski, E., Zimányi, E.: A conceptual model for temporal data warehouses and its transformation to the ER and the object-relational models. Data & Knowl. Eng. 64(1), 101–133 (2008)
Nambiar, R.O., Poess, M.: The making of TPC-DS. In: Proceedings of the 32nd international conference on very large data bases, VLDB, pp. 1049–1058. ACM (2006)
Paredaens, J., De Bra, P., Gyssens, M., Van Gucht, D.: The nested relational database model. In: The structure of the relational database model, pp. 177–201. Springer (1989)
Snodgrass, R.T. (ed): The TSQL2 Temporal Query Language. Kluwer Academic (1995)
Snodgrass, R.T.: Developing Time-Oriented Database Applications in SQL. Morgan Kaufmann (2000)
Snodgrass, R.T., Ahn, I.: A taxonomy of time databases. ACM SIGMOD Rec. 14, 236–246 (1985)
Tansel, A.U., Clifford, J., Gadia, S., Jajodia, S., Segev, A., Snodgrass, R.T.: Temporal Databases: Theory, Design, and Implementation. Benjamin Cummings (1993)
Toman, D.: Point vs. interval-based query languages for temporal databases. In: Proceedings of ACM PODS, pp. 58–67. ACM Press (1996)
Toman, D.: Point-based temporal extension of temporal SQL. In: Proceedings of the 5th international conference on deductive and object-oriented databases, DOOD’97, LNCS 1341, pp. 103–121. Springer (1997)
Tsikoudis, N., Shrira, L.: RID: deduplicating snapshot computations. In: Proc. of the 2020 Int. Conference on Management of Data, SIGMOD 2020, pp. 2087–2101. ACM (2020)
Tsikoudis, N., Shrira, L., Cohen, S.: RQL: retrospective computations over snapshot sets. In: Proceedings of the 21st international conference on extending database technology, EDBT 2018, pp. 600–611 (2018)
Vaisman, A.A., Zimányi, E.: Mobility data warehouses. ISPRS Int. J. Geo-Inf. 8(4), 170 (2019)
Vaisman, A.A., Zimányi, E.: Data Warehouse Systems: Design and Implementation, 2nd edn. Springer (2022)
Zhou, X., Wang, F., Zaniolo, C.: Efficient temporal coalescing query support in relational database systems. In: Proceedings of the 17th conference on database and expert systems applications, DEXA, pp. 676–686 (2006)
Zimányi, E.: Query evaluation in probabilistic relational databases. Theoret. Comput. Sci. 171(1–2), 179–219 (1997)
Zimányi, E.: Temporal aggregates and temporal universal quantifiers in standard SQL. SIGMOD Record 32(2), 16–21 (2006)
Zimányi, E., Sakr, M., Lesuisse, A.: MobilityDB: a mobility database based on PostgreSQL and PostGIS. ACM Trans. Database Syst. 45(4), 19:1-19:42 (2020)
Acknowledgements
A. Vaisman and L. Gómez were partially supported by Project PICT 2017-1054, from the Argentinian Scientific Agency. The authors also thank Maxime Schoemans for his invaluable help in the analysis and optimization of the queries studied in this paper.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.