Scheduling Notebook Tasks in Pipeline Jobs in DataArts Factory
After creating a notebook task on the DataArts Factory console, you can create a batch processing pipeline job and associate the notebook task with a Notebook operator so that the task can be scheduled in DataArts Factory.
Creating a Notebook Task
- In the navigation pane on the DataArts Factory console, choose Data Development > Notebook.
- Locate a running notebook instance.
- Click Open to go to the notebook development page.
- Click the icon for creating a file and select Notebook to create an .ipynb notebook file.
- On the notebook development page, enter and debug your development code. (A sample cell is shown after these steps.)
- Click the run icon to run the code.
- View the code execution result.
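For example, a notebook cell might contain a small, self-contained snippet such as the following. This is an illustrative sketch only: it assumes pandas is available in the notebook image, and your actual development code will differ.
import pandas as pd

# Illustrative cell: build a tiny sample dataset and print summary statistics.
df = pd.DataFrame({'id': [1, 2, 3], 'value': [10.0, 20.5, 30.1]})
print(df.describe())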
Uploading the Notebook Task to an OBS Bucket
- Method 1: Download the notebook task to a local path and then upload it to an OBS bucket (parallel file bucket).
- In the JupyterLab file list, right-click the file to be downloaded and select Download from the shortcut menu.
The file will be downloaded to your browser's downloads folder.
Figure 1 Downloading a file
- Upload the local file to an OBS bucket. For details about how to upload an object, see Uploading an Object.
- Method 2: Upload the notebook task to an OBS bucket directly. (The OBS bucket must be a parallel file bucket.)
Use the ModelArts SDK to upload the target file from the notebook instance to OBS. A sketch for verifying the upload follows the sample code below.
- Call the ModelArts SDK to upload a file to OBS.
Sample code: Upload file1.txt in the notebook instance to the OBS path obs://bucket-name/dir1/. The bucket name, folder name, and file name are all customizable.
from modelarts.session import Session

# Create a session and upload a single file from the notebook instance to OBS.
session = Session()
session.obs.upload_file(src_local_file='/home/ma-user/work/file1.txt', dst_obs_dir='obs://bucket-name/dir1/')
- Call the ModelArts SDK to upload a folder to OBS.
Sample code: Upload the /home/ma-user/work/ folder in the notebook instance to obs://bucket-name/dir1/work/. The bucket name and folder names are customizable.
from modelarts.session import Session

# Create a session and upload an entire folder from the notebook instance to OBS.
session = Session()
session.obs.upload_dir(src_local_dir='/home/ma-user/work/', dst_obs_dir='obs://bucket-name/dir1/')
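To confirm that an upload succeeded, you can download the object back into the notebook instance. The following is a minimal sketch that reuses the same ModelArts SDK Session object (download_file is the SDK's counterpart to upload_file); the bucket, folder, and file names are the same placeholders as above.
from modelarts.session import Session

# Download the previously uploaded file back into the notebook workspace
# to verify that it exists in OBS.
session = Session()
session.obs.download_file(src_obs_file='obs://bucket-name/dir1/file1.txt', dst_local_dir='/home/ma-user/work/')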
Scheduling a Notebook Task Through DataArts Factory
- On the DataArts Studio console, locate a workspace and click DataArts Factory.
- In the left navigation pane of the DataArts Factory console, choose Data Development > Develop Job.
- In the job directory list, right-click a directory and select Create Job.
- Create a batch processing pipeline job and click OK to open the job development page.
- Drag a Notebook operator to the canvas.
- Click the Notebook operator and configure parameters.
- Configure the scheduling mode.
- Save and submit the job version.
- Click Execute to start the job.
- Choose Job Monitoring and view the job execution result on the Batch Jobs page.
Figure 3 Viewing the execution result of the batch processing job