Sequence Models
Sequence Analysis describes a chain of events, or states, to provide a concise summary of the sequence-generating process. This summary allows us to predict the next likely state in the sequence, determine the likelihood of occurrence for a particular sequence, and generate a new sequence having the same statistical properties as the original.
For a brief introduction to data-mining concepts, as well as general guidance on navigating the ML Toolkit, please refer to the respective .
A Markov Chain (MC) is a graphical representation of a sequence of states and the relationship among them. At its most basic, an MC model shows the probability of transitioning from one state to another.
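To make the idea of transition probabilities concrete, here is a minimal sketch that estimates them by counting adjacent pairs of states in a sequence. It is an illustration only, not SymetryML's implementation; the function name and the sample weather sequence are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """Estimate first-order transition probabilities by counting adjacent pairs."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    # Normalize each row so the outgoing probabilities of a state sum to 1.
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


# Hypothetical daily weather sequence, for illustration only.
weather = ["sunny", "sunny", "rainy", "sunny", "rainy", "rainy", "sunny", "sunny"]
probs = transition_probabilities(weather)
print(probs["sunny"])  # probabilities of the state that follows a sunny day
```

Each row of the result describes one state, and each entry in that row is the estimated probability of moving to the named next state.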
Like the MC, a Hidden Markov Model (HMM) is a graphical model describing the probabilities of transitioning from one state to another. It extends the Markov process by working with two sets of states:
Observed
Hidden
The observed states, as the name implies, are visible at any given time. The hidden states are not directly visible, but the user knows that they exist. The most common use case for an HMM is to infer the sequence of underlying hidden states from a sequence of observed states.
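Inferring the most likely hidden sequence from the observed one is commonly done with the Viterbi algorithm. The sketch below is a generic log-space version and is not taken from SymetryML; the state names, sensor readings, and all probabilities are hypothetical values chosen for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a list of observations."""

    def log(p):
        # Guard against log(0) for impossible transitions or emissions.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s] is the log-probability of the best path ending in state s at time t.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical model: activity is hidden, a motion-sensor reading is observed.
states = ["resting", "walking"]
start_p = {"resting": 0.6, "walking": 0.4}
trans_p = {
    "resting": {"resting": 0.7, "walking": 0.3},
    "walking": {"resting": 0.4, "walking": 0.6},
}
emit_p = {
    "resting": {"low": 0.9, "high": 0.1},
    "walking": {"low": 0.2, "high": 0.8},
}
path = viterbi(["low", "low", "high"], states, start_p, trans_p, emit_p)
print(path)  # ['resting', 'resting', 'walking']
```

Working in log space avoids numeric underflow on long sequences, which is why the sketch adds log-probabilities instead of multiplying probabilities.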
In the context of SymetryML, the output of a sequence-generating process is represented as a dataset containing an array of states in discrete time.
For an MC model, the sequence can be as simple as an array of comma-separated values.
In the following figure, the sequence represents a daily weather pattern, which we will learn.
Hidden Markov Models require two sets of sequences:
One for the hidden state.
One for the observed state.
We will use the activity dataset, which contains a labeled sequence of human activity (the hidden state) and the readings from various sensors around the room (the observed state).
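When both sequences are available and aligned in time, as in a labeled dataset like this one, the link between hidden and observed states can be summarized by counting how often each reading occurs with each label. The following sketch assumes two equal-length lists; the function name and the sample values are hypothetical and do not come from the activity dataset.

```python
from collections import Counter, defaultdict


def emission_probabilities(hidden, observed):
    """Estimate P(observed | hidden) by counting time-aligned pairs."""
    if len(hidden) != len(observed):
        raise ValueError("hidden and observed sequences must have equal length")
    counts = defaultdict(Counter)
    for h, o in zip(hidden, observed):
        counts[h][o] += 1
    # Normalize per hidden state so each row sums to 1.
    return {
        h: {o: n / sum(row.values()) for o, n in row.items()}
        for h, row in counts.items()
    }


# Hypothetical aligned sequences, for illustration only.
hidden = ["resting", "resting", "walking", "walking", "resting"]
observed = ["low", "low", "high", "low", "low"]
emit = emission_probabilities(hidden, observed)
print(emit["walking"])  # share of each reading seen while walking
```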
To work with MC models:
Create a new project of type Sequence.
Select the weather.seq file as your data source.
Verify the format of the file.
Verify the types. For this example, the absence of a header in the CSV file forced the program to generate header names automatically. Leave the attribute types as String. Click Finish to continue.
After the project is learned, the following univariate statistics appear.
Click Build Model > Sequence > Markov Chain.
Enter a name, and then click Finish.
After the model is built, you can view its properties by double-clicking the corresponding model icon.
Prediction works the same way as for every other model. Select the input file that you want to predict on and the destination file where the predictions will be written.
The result should resemble the following:
In this figure:
res_Result = the next most likely step in the sequence.
prob_Result = the probability of that res_Result.
seq.p_Result = the likelihood of the sequence given your MC Model.
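The three quantities above can be illustrated with a small sketch that works from a table of transition probabilities. This is an assumption about how such values are typically computed, not a description of SymetryML's internals: in particular, the likelihood here is conditioned on the first state, and the transition values are hypothetical.

```python
def next_state(trans_p, current):
    """Return the most likely next state and its probability."""
    state, prob = max(trans_p[current].items(), key=lambda item: item[1])
    return state, prob


def sequence_likelihood(trans_p, sequence):
    """Multiply the transition probabilities along the sequence.

    The result is conditioned on the first state; unseen transitions count as 0.
    """
    likelihood = 1.0
    for current, nxt in zip(sequence, sequence[1:]):
        likelihood *= trans_p.get(current, {}).get(nxt, 0.0)
    return likelihood


# Hypothetical transition probabilities for a two-state weather chain.
trans_p = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}
sequence = ["sunny", "sunny", "rainy"]

res, prob = next_state(trans_p, sequence[-1])      # analogous to res_Result, prob_Result
seq_p = sequence_likelihood(trans_p, sequence)     # analogous to seq.p_Result
print(res, prob, seq_p)
```

For long sequences, summing log-probabilities is usually preferable to multiplying raw probabilities, because the product quickly becomes very small.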
To work with a Hidden Markov model:
Create a new project. Verify that the project type is Sequence.
Select the input data source.
Verify your data.
Convert the attribute types to Categorical.
Click Finish to complete the wizard.
After the learning finishes, you should see the following attributes in the univariate statistics.
To build the HMM model:
Click Create Model > Sequence > HMM.
Enter a name for the model.
Select the appropriate category for the hidden and the observed state. Click Build Model to start the model build process.
After the model is complete, you should see the following Model Info.
You can now use your HMM to predict the sequence of hidden states given the set of observed states:
Right-click the model icon, and then click Predict.
Select an input data source.
Select an output data source.
Verify that your data and the type mapping are valid.
The output of the prediction should be the pred_Result column, a sequence of inferred hidden states, and the columns from the original file.
To perform an assessment:
Right-click the model, and then click Assessment.
Select the input data source. In this example, we’re using a smaller subsample of the original data source.
Verify that your data is valid.
Verify the correct type mapping of attributes.
Be sure there is a match between the model attribute name and the name in the test file. Click Finish to start the assessment job.
After the assessment completes, you should see a confusion matrix that compares the predicted hidden sequence to the actual sequence present in the file.
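For readers who want to reproduce this kind of comparison outside the tool, a confusion matrix can be sketched as a count of (actual, predicted) pairs over two aligned label sequences. The function name and the sample labels below are hypothetical and are not output from SymetryML.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs for two aligned label sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted sequences must have equal length")
    return Counter(zip(actual, predicted))


# Hypothetical actual and predicted hidden states, for illustration only.
actual = ["resting", "resting", "walking", "walking", "resting"]
predicted = ["resting", "walking", "walking", "walking", "resting"]

matrix = confusion_matrix(actual, predicted)
labels = sorted(set(actual) | set(predicted))

# Rows are actual labels, columns are predicted labels.
for a in labels:
    print(a, [matrix[(a, p)] for p in labels])

# Accuracy is the share of positions where prediction and label agree.
accuracy = sum(matrix[(label, label)] for label in labels) / len(actual)
print(accuracy)
```

Entries on the diagonal are correct predictions; off-diagonal entries show which hidden states the model confuses with one another.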