Scheduling Notebook Tasks in Pipeline Jobs in DataArts Factory
After creating a notebook task on the DataArts Factory console, you can create a batch processing pipeline job and associate the notebook task with a Notebook operator so that the task can be scheduled in DataArts Factory.
Creating a Notebook Task
- In the navigation pane on the DataArts Factory console, choose Data Development > Notebook.
- Locate a running notebook instance.
- Click Open to go to the notebook development page.
- Click the icon for creating a file and select Notebook to create an .ipynb notebook file.
- On the notebook development page, enter and debug your development code. (A sample cell is shown after these steps.)
- Click the run icon to run the code.
- View the code execution result.
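For example, a notebook cell might contain a small, self-contained snippet such as the following. This is an illustrative sketch only: it assumes pandas is available in the notebook image, and your actual development code will differ.
import pandas as pd

# Illustrative cell: build a tiny sample dataset and print summary statistics.
df = pd.DataFrame({'id': [1, 2, 3], 'value': [10.0, 20.5, 30.1]})
print(df.describe())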
Uploading the Notebook Task to an OBS Bucket
- Method 1: Download the notebook task to a local path and then upload it to an OBS bucket (parallel file bucket).
- In the JupyterLab file list, right-click the file to be downloaded and select Download from the shortcut menu.
The file will be downloaded to your browser's downloads folder.
Figure 1 Downloading a file
- Upload the local file to an OBS bucket. For details about how to upload an object, see Uploading an Object.
- Method 2: Upload the notebook task to an OBS bucket directly. (The OBS bucket must be a parallel file bucket.)
Use the ModelArts SDK to upload the target file from the notebook instance to OBS. A sketch for verifying the upload follows the sample code below.
- Call the ModelArts SDK to upload a file to OBS.
Sample code: Upload file1.txt in the notebook instance to the OBS path obs://bucket-name/dir1/. The bucket name, folder name, and file name are all customizable.
from modelarts.session import Session

# Create a session and upload a single file from the notebook instance to OBS.
session = Session()
session.obs.upload_file(src_local_file='/home/ma-user/work/file1.txt', dst_obs_dir='obs://bucket-name/dir1/')
- Call the ModelArts SDK to upload a folder to OBS.
Sample code: Upload the /home/ma-user/work/ folder in the notebook instance to obs://bucket-name/dir1/work/. The bucket name and folder names are customizable.
from modelarts.session import Session

# Create a session and upload an entire folder from the notebook instance to OBS.
session = Session()
session.obs.upload_dir(src_local_dir='/home/ma-user/work/', dst_obs_dir='obs://bucket-name/dir1/')
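To confirm that an upload succeeded, you can download the object back into the notebook instance. The following is a minimal sketch that reuses the same ModelArts SDK Session object (download_file is the SDK's counterpart to upload_file); the bucket, folder, and file names are the same placeholders as above.
from modelarts.session import Session

# Download the previously uploaded file back into the notebook workspace
# to verify that it exists in OBS.
session = Session()
session.obs.download_file(src_obs_file='obs://bucket-name/dir1/file1.txt', dst_local_dir='/home/ma-user/work/')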
Scheduling a Notebook Task Through DataArts Factory
- On the DataArts Studio console, locate a workspace and click DataArts Factory.
- In the left navigation pane of the DataArts Factory console, choose Data Development > Develop Job.
- In the job directory list, right-click a directory and select Create Job.
- Create a batch processing pipeline job and click OK to open the job development page.
- Drag a Notebook operator to the canvas.
- Click the Notebook operator and configure parameters.
- Configure the scheduling mode.
- Save and submit the job version.
- Click Execute to start the job.
- Choose Job Monitoring and view the job execution result on the Batch Jobs page.
Figure 3 Viewing the execution result of the batch processing job