Building a Regression Model
One of the great things about the PySpark ML module is that most algorithms can be tried and tested without changing much code. Random Forest Regression is a fairly simple ensemble model that is fit using bagging. Gradient Boosted Trees is another tree-based ensemble model, which is fit using a different approach called boosting. In this exercise, let's train a GBTRegressor.
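To make the bagging/boosting distinction concrete, here is a minimal pure-Python sketch (no Spark) on a hypothetical 1-D dataset. Both ensembles use depth-1 regression trees ("stumps"). Bagging fits each stump independently on a bootstrap sample and averages their predictions. This is the core idea of random forests, which additionally subsample features (irrelevant here with a single feature). Boosting fits stumps one after another, each to the residuals of the ensemble so far, scaled by a learning rate. With squared-error loss those residuals are the negative gradient, which is where the "gradient" in Gradient Boosted Trees comes from. The helper names (avg, fit_stump, bagging, boosting), the data, and the hyperparameters are illustrative assumptions; this is not how Spark implements either algorithm.

```python
import random


def avg(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def make_data(n=100, seed=42):
    """Hypothetical toy data: a step at x = 5 plus a mild slope and Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [(2.0 if x < 5 else 8.0) + 0.5 * x + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree by trying every split and minimising squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue
        left_mean, right_mean = avg(left), avg(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every x is identical, so no split exists: predict the mean
        overall = avg(ys)
        return lambda x: overall
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def bagging(xs, ys, n_trees=20, seed=0):
    """Bagging: independent stumps on bootstrap samples, predictions averaged."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: avg(s(x) for s in stumps)


def boosting(xs, ys, n_trees=50, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = avg(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient of squared loss
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    """Mean squared error of a model (a callable x -> prediction) on the given data."""
    return avg((model(x) - y) ** 2 for x, y in zip(xs, ys))


xs, ys = make_data()
baseline = avg(ys)
bag_model = bagging(xs, ys)
gbt_model = boosting(xs, ys)
print(f"predict-the-mean MSE: {mse(lambda x: baseline, xs, ys):.3f}")
print(f"bagged stumps MSE:    {mse(bag_model, xs, ys):.3f}")
print(f"boosted stumps MSE:   {mse(gbt_model, xs, ys):.3f}")
```

The printed values are training errors on the toy data only. The structural difference is the point: the bagged stumps are independent of one another and could be trained in parallel, whereas each boosted stump depends on the ones before it, so boosting is inherently sequential.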
This exercise is part of the course Feature Engineering with PySpark.
Instructions
- Import GBTRegressor from pyspark.ml.regression, which you will notice is the same module as RandomForestRegressor.
- Instantiate GBTRegressor with featuresCol set to the vector column of our features, named features; labelCol set to our dependent variable, SALESCLOSEPRICE; and the random seed set to 42.
- Train the model by calling fit() on gbt with the imported training data, train_df.
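from pyspark.ml.regression import GBTRegressor
# Instantiate a Gradient Boosted Trees (GBT) regressor.
gbt = GBTRegressor(featuresCol="features",
                   labelCol="SALESCLOSEPRICE",
                   predictionCol="Prediction_Price",
                   seed=42
                   )
# Train the model on the training DataFrame provided by the exercise.
model = gbt.fit(train_df)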