In this blog, we will look at more features of the Orange tool: how to split our data into training data and testing data, and how to use cross-validation.
Open Orange and add the File widget to the workspace; it loads the iris data set by default. Next, add the Data Sampler widget. Data Sampler selects a subset of data instances from an input data set and outputs both the sampled data and the complementary (remaining) data. Here I sampled 70% of the data, so 70% is output as the data sample and the other 30% as the complementary data set.
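To make the 70/30 sampling concrete, here is a minimal plain-Python sketch of a fixed-proportion split. It is not Orange's implementation; the function name sample_split, the seed and the use of row numbers in place of the real iris rows are assumptions made for illustration.

```python
import random

def sample_split(rows, proportion=0.7, seed=42):
    """Return (sample, remaining): a random `proportion` of rows and the rest."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)   # shuffle the row indices reproducibly
    cut = round(len(rows) * proportion)    # number of rows that go into the sample
    sample = [rows[i] for i in indices[:cut]]
    remaining = [rows[i] for i in indices[cut:]]
    return sample, remaining

rows = list(range(150))                    # stand-in for the 150 iris instances
sample, remaining = sample_split(rows)
print(len(sample), len(remaining))         # 105 45
```

Every row ends up in exactly one of the two outputs, which mirrors the sampled and complementary outputs of Data Sampler.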
Now send the sampled data from Data Sampler to the Test and Score widget. This widget tests learning algorithms. Different sampling schemes are available, including using separate test data. The widget does two things. First, it shows a table with different classifier performance measures, such as classification accuracy and area under the curve. Second, it outputs evaluation results, which can be used by other widgets for analyzing the performance of classifiers, such as ROC Analysis or Confusion Matrix.
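Cross-validation is one of the sampling schemes mentioned above. The idea can be sketched in plain Python as follows. This is only an illustration under assumptions: the helper k_fold_indices, the choice of 5 folds and the seed are hypothetical, and the sketch does not stratify by class, which a real tool may do.

```python
import random

def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each row is in exactly one test fold."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i, test_idx in enumerate(folds):
        # Training rows are all rows from the other k - 1 folds.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(150, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))        # 120 30 for each of the 5 folds
```

Each model is trained k times, each time on k - 1 folds and scored on the fold left out, and the scores are then averaged.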
Three different learning algorithms, namely Neural Network, Naive Bayes and Logistic Regression, are then connected to Test and Score, which trains and scores each of them on the data it receives.
Split data into training data and testing data in Orange
To split the data into train and test datasets, we send the 70% sample from Data Sampler as the train data and the remaining 30% as the test data. Click on the link between Data Sampler and Test and Score, and in the dialog that opens connect the Data Sample box to the Data box and the Remaining Data box to the Test Data box, as shown in the figure below.
As you can see in the image below, there are now two flows from Data Sampler to Test and Score.
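In Test and Score, choose the Test on test data option so that the models are trained on the 70% sample and scored on the remaining 30%, and read off the scores for all three algorithms. (The Test on train data option would instead score the models on the same data they were trained on.)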
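The difference between scoring on the training data and scoring on separate test data can be sketched in plain Python. A 1-nearest-neighbour classifier is used here only as a simple stand-in; it is not one of the three learners in the workflow, and the toy rows below are hypothetical, not the iris measurements.

```python
def nearest_neighbour_predict(train, point):
    """Predict the label of the closest training row (squared Euclidean distance)."""
    def dist(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], point))
    return min(train, key=dist)[1]

def accuracy(train, test):
    """Fraction of rows in `test` whose predicted label matches the true label."""
    hits = sum(nearest_neighbour_predict(train, x) == y for x, y in test)
    return hits / len(test)

# Hypothetical toy data: (features, label) pairs.
train = [((1.0, 1.0), "a"), ((1.2, 0.9), "a"), ((3.0, 3.1), "b"), ((3.2, 2.9), "b")]
test = [((1.1, 1.1), "a"), ((1.9, 1.8), "b"), ((3.1, 3.0), "b")]

train_score = accuracy(train, train)   # scored on the rows the model was fit on
test_score = accuracy(train, test)     # scored on rows the model has not seen
print(train_score)                     # 1.0
print(test_score)                      # 2 of 3 test rows are classified correctly
```

Scoring on the training rows gives a perfect result here because each row is its own nearest neighbour, which is why a score on held-out test data is usually the more honest estimate.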
In this blog we saw how to use the Data Sampler widget and the Test and Score widget for train/test splitting and cross-validation.