Getting started

This document is a lightweight introduction to Moonsense data and the ways data science teams can leverage it.

The information here complements the Moonsense demos, which include Jupyter notebooks with sample data, code, analysis, and modeling.

What data does Moonsense collect?

Once deployed, Moonsense collects a wide range of data on user interactions - key press data, accelerometer data, gyroscopic data, and much more.

The types of data collected differ based on what devices interact with Moonsense and the manner in which the Moonsense client SDK is implemented.

For additional details about the data that gets collected, check out the Data Model documentation.

Data structures

Data collected from devices over discrete periods of time is organized into sessions. Each session includes both metadata and raw sensor data, all structured as JSON. Each session is uniquely identified by a numerical identifier, and sessions that span different but related interaction points (e.g., a sign-in page that leads into a checkout page) may be linked together with a Client Session Group identifier.

The metadata includes information on identifiers that can be used to associate sessions, the device used, and labels that distinguish the source from which data was collected (such as login pages or checkout pages), among other details.

The raw data for each session is broken up into smaller chunks of data referred to as bundles. Those bundles contain the sensor data collected by Moonsense.

The data collected within each bundle is time-bound. The contents of each bundle depend on how the Moonsense client SDK is configured and on the types of data points that arise during the period of data collection.

For instance, if there is no typing activity during a given bundle's period of data collection, there would not be any key press data. However, there might be other data associated with movement, such as accelerometer data if the device happens to be moving at the time.

Each individual session potentially includes many bundles.

Introduction to Behavioral Biometrics

Much of the data that Moonsense collects is behavioral in nature. It reflects how people behave when they interact with a device: the cadence at which they type, the way they move the mouse, how their fingers touch a mobile device screen, the position in which they hold their devices, and so on.

That sort of behavior is unique to each person. In a way, it’s like a distinguishing fingerprint. No two people are exactly alike when it comes down to how they interact with computers and devices, and that distinctiveness makes it possible to use the sort of data collected by Moonsense to perform behavioral biometrics.

Behavioral biometrics is the practice of analyzing human interactions with technology to identify patterns. Those patterns can be used to distinguish between normal and abnormal activity and to highlight cases of possible concern (e.g., fraud).


Behavioral biometrics is oftentimes applied in the domain of identity authentication.

Authentication methods can traditionally be separated into three major types:

  • Knowledge - authentication that is based on restricted knowledge, such as usernames and passwords.
  • Possession - authentication that is based on what is physically possessed. Common variants include physical hardware keys or mobile devices configured to receive SMS messages containing unique numerical combinations.
  • Inherence - authentication based on the identity of a person. Common types of inherence-based authentication include things like fingerprint scanners, retinal scanners, or facial recognition.

Behavioral data can be regarded as another variant of inherence authentication.

By examining the volumes of data around how people interact with devices, we can extract signals in the data that can reveal whether the person trying to access a service or perform some action is the legitimate user or a fraudulent actor.

Moonsense is a mechanism that enables the collection of such behavioral data that can in turn be utilized for inherence authentication purposes.


You can interact with Moonsense data programmatically using the Moonsense Python SDK, which can be installed from PyPI with the following command:

pip install moonsense

To access the data using the Python SDK, first we'll have to generate a secret authentication token through the Moonsense Console by navigating to Apps, selecting the appropriate App, and creating a token.

[Screenshot: tokens create]

Once the token creation process is completed, a secret token and a public token will appear.

[Screenshot: tokens generated]

We recommend saving the tokens and then assigning the secret token to an environment variable. The default token name that the Moonsense Python SDK searches for is called MOONSENSE_SECRET_TOKEN.

You can confirm that the token is accessible by running the following command within Python:

import os
print(os.environ['MOONSENSE_SECRET_TOKEN'])
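Printing the secret writes it to the terminal or notebook output, where it may be saved or shared by accident. As a more cautious alternative, the sketch below only reports whether the variable is set. The helper name token_is_set is ours for illustration and is not part of the Moonsense Python SDK.

```python
import os


def token_is_set(name: str = "MOONSENSE_SECRET_TOKEN") -> bool:
    """Return True if the environment variable exists and is not blank."""
    return bool(os.environ.get(name, "").strip())


# Report the status without echoing the secret itself.
print("secret token found" if token_is_set() else "secret token missing")
```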

Other modules that are commonly used to interact with Moonsense data for data science applications include:

Interacting with the Data

In this section, we'll go into more detail on how we can utilize the Moonsense Python SDK to interact with the collected data.

Bulk Download

For convenience, we've provided a script that can programmatically download Moonsense data in bulk. It accepts three optional arguments:

  • A start date
  • An end date
  • A .csv file with a group_id that will filter the returned data to only include data with matching Client Session Group IDs.

For instance, we can run the script with the following:

python3 --since 2022-08-03 --until 2022-08-06 --filter_by filtered_list.csv

Where filtered_list.csv is a .csv containing Client Session Group IDs.

Successful execution of the script downloads a mass of data:

> tree data -l 3
data
├── 2022-08-04
│   └── 33643170
│       ├── eHZQczKU7Kcj57g586kZda
│       │   ├── metadata.json
│       │   └── raw_sealed_bundles.json
│       └── mdHAbnakubRW5NMdbii8g8
│           ├── metadata.json
│           └── raw_sealed_bundles.json
└── 2022-08-05
    ├── 06136989
    │   ├── B8qd5qLumW6c4NQs9zj9xY
    │   │   ├── metadata.json
    │   │   └── raw_sealed_bundles.json
    │   ├── NUtMTn28SqVZSxbt2zFopN
    │   │   ├── metadata.json
    │   │   └── raw_sealed_bundles.json
    │   ├── QJEknkpX6HhUCj2rq7eWp7
    │   │   ├── metadata.json
    │   │   └── raw_sealed_bundles.json
    ...

The path structure is of the form: ./data/<date>/<Client Session Group>/<session>/*.json.
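Because the date, Client Session Group, and session are all encoded in the path, the downloaded tree can be enumerated with the standard library alone. The following is a minimal sketch under that assumption; the names SessionDir and iter_session_dirs are ours and not part of any Moonsense tooling.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class SessionDir(NamedTuple):
    date: str
    group_id: str
    session_id: str
    path: Path


def iter_session_dirs(root: Path) -> Iterator[SessionDir]:
    """Yield one entry per <root>/<date>/<group>/<session>/ directory."""
    for path in sorted(root.glob("*/*/*")):
        if path.is_dir():
            # The three path components below root are date, group, session.
            date, group_id, session_id = path.relative_to(root).parts
            yield SessionDir(date, group_id, session_id, path)
```

For example, `for s in iter_session_dirs(Path("data")): print(s.date, s.group_id, s.session_id)` lists every downloaded session.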

The metadata.json files contain the metadata associated with each session, and the raw_sealed_bundles.json files contain all the raw sensor data, organized as multiple lines of valid JSON where each line corresponds to a single bundle of data.
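Since the bundle file holds one JSON document per line, it has to be parsed line by line rather than with a single json.load call. A minimal sketch, assuming only the two file names described above (the function name load_session is ours):

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple


def load_session(session_dir: Path) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Read one session directory into (metadata, list of bundles)."""
    metadata = json.loads(
        (session_dir / "metadata.json").read_text(encoding="utf-8")
    )
    bundles = []
    with (session_dir / "raw_sealed_bundles.json").open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines between bundles
                bundles.append(json.loads(line))
    return metadata, bundles
```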

Reshaping the Data

The raw data collected by Moonsense is rich in detail, but will likely have to be reshaped based on your intended use case.

Oftentimes, it’s preferable to work with data structured as a singular, two-dimensional dataframe that contains details of multiple different observations, where each row is a distinct unit of observation.

For example, if we’re looking for key press data for multiple different sessions, we might want to structure the data as follows:


Here, each row represents a distinct key press event: determined_at indicates the time of the event, type indicates the type of key press event, masked_key is an obfuscated representation of the key that was pressed, and GID is a distinct identifier for a particular session.

To arrive at this desired data structure, we'll have to:

  1. Filter all the metadata to identify the sessions that contain the desired data. In this example, we would filter for a label called key_press_data.

  2. Specify the corresponding raw data files associated with the metadata that have already been filtered.

  3. Extract the desired features across the different bundles associated with the different sessions.

  4. Combine all the extracted data into a singular dataframe.
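The four steps above can be sketched roughly as follows. This is an illustration only: the metadata and bundle layouts used here (a "labels" list of strings in the metadata and a "key_press_data" list inside each bundle) are simplifying assumptions, not the actual Moonsense schema. Consult the Data Model documentation for the real field names.

```python
from typing import Any, Dict, Iterable, List, Tuple

# (gid, metadata, bundles) for one session; the layout is hypothetical.
Session = Tuple[str, Dict[str, Any], List[Dict[str, Any]]]


def key_press_rows(
    sessions: Iterable[Session], label: str = "key_press_data"
) -> List[Dict[str, Any]]:
    """Flatten key press events from labeled sessions into row dicts."""
    rows = []
    for gid, metadata, bundles in sessions:
        # Step 1: keep only sessions that carry the desired label.
        if label not in metadata.get("labels", []):
            continue
        # Steps 2-3: walk that session's bundles and extract the events.
        for bundle in bundles:
            for event in bundle.get("key_press_data", []):
                rows.append(
                    {
                        "determined_at": event.get("determined_at"),
                        "type": event.get("type"),
                        "masked_key": event.get("masked_key"),
                        "GID": gid,
                    }
                )
    # Step 4: a single flat list, one row per key press event.
    return rows
```

If pandas is installed, `pandas.DataFrame(rows)` turns the result into the two-dimensional dataframe described earlier.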

For a detailed demonstration and some sample code of how to collect the data and combine it, check out the Sign In demo.

Exploratory Analysis

It's common practice to start data science projects with some preliminary exploratory analysis, especially if the dataset is new or unfamiliar.

The different methods of data exploration are broad and might include, but are not limited to:

  • Visual examination of the raw data itself
  • Aggregate summaries of numerical fields of data, such as means, medians, and standard deviation calculations
  • Counts of categorical fields to determine relative frequencies
  • Search for any missing values interspersed in the dataset

For instance, to get a snapshot view of the relative prevalence of different key press types, we can generate a basic count of the distinct categories from a specified data field:

[Figure: key press count]
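A count like this needs nothing more than collections.Counter. The sketch below assumes the key press events are available as a list of dictionaries with a "type" field; the function names and the sample type values are ours.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def count_values(rows: Iterable[Dict[str, Any]], field: str = "type") -> Counter:
    """Count how often each distinct value of `field` occurs."""
    return Counter(row.get(field) for row in rows)


def relative_frequencies(counts: Counter) -> Dict[Any, float]:
    """Convert raw counts into shares that sum to 1."""
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}
```

A value of None in the result is a quick hint that some rows are missing the field entirely.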

We can also examine histograms of the time differences between key presses for interactions labeled as “Legitimate” and “Illegitimate” as one means of attaining a better familiarity with the overall distribution of a feature derived from the data.

[Figure: legit histogram]
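The time differences behind such a histogram can be derived from consecutive timestamps. The sketch below assumes the timestamps are plain numbers in a single unit (for example milliseconds) and that they all belong to one session. Differences computed across session boundaries would be meaningless.

```python
from collections import Counter
from typing import Dict, List, Sequence


def time_deltas(timestamps: Sequence[float]) -> List[float]:
    """Differences between consecutive timestamps, after sorting."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def histogram(values: Sequence[float], bin_width: float) -> Dict[float, int]:
    """Count values per bin; each key is the lower edge of its bin."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))
```

Running this separately for sessions labeled "Legitimate" and "Illegitimate" gives two sets of bin counts that can be plotted side by side with any charting library.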

This sort of exploratory analysis oftentimes reveals unexpected patterns or gaps in the data, which can help prevent missteps later on when it comes time to perform further analysis or to build a model.

Training a Model

The precise type of model that will be trained will be dependent on a broad range of considerations, including the type of data available, performance requirements, and computational cost, amongst other factors.

Supervised vs. Unsupervised

Oftentimes, the first consideration is whether or not there is data relating to some target attribute, commonly referred to as a dependent variable. A dependent variable might be data that indicates "Yes" or "No", reflects some sort of grouping or categorization, or represents some form of quantifiable measure. The specified target variable is highly dependent on the particular use case under consideration.

If there is a dependent variable, then the model can be a supervised learning model. A supervised learning model learns the patterns between different attributes - or independent variables - that can inform what the dependent variable is likely to be.

In general, Moonsense data does not in and of itself provide a usable target attribute. It is instead a rich source of independent variables that can be leveraged to identify patterns associated with some other dependent variable.

For instance, a payment transaction company can combine their own data on chargeback activity with the raw Moonsense behavioral data. Chargebacks are oftentimes indicative of some form of fraud. By coupling that data with Moonsense data, it is possible to build a supervised learning model that can indicate when a transaction is likely to result in a chargeback based on such possible indicators as typing patterns, motions, and screen interactions.
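Coupling the two sources amounts to a join on a shared identifier. The sketch below assumes that the business can map each chargeback to a session identifier and that any session absent from that list did not result in a chargeback. Both assumptions, along with the function and field names, are ours.

```python
from typing import Any, Dict, List, Set


def attach_labels(
    features_by_session: Dict[str, Dict[str, Any]],
    chargeback_session_ids: Set[str],
) -> List[Dict[str, Any]]:
    """Combine per-session behavioral features with a chargeback label."""
    rows = []
    for session_id, features in features_by_session.items():
        row = dict(features)  # copy so the caller's data is left untouched
        row["session_id"] = session_id
        # Assumption: not listed as a chargeback means no chargeback.
        row["chargeback"] = session_id in chargeback_session_ids
        rows.append(row)
    return rows
```

Each resulting row holds the independent variables derived from Moonsense data plus the dependent variable supplied by the business.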

On the other hand, if there is no dependent variable available, then the machine learning model is likely an unsupervised model that aims to define groups or clusters based on the available attribute data. Possible use cases might be to derive groupings of users based on the manner in which they interact with their mobile devices. For instance, a grouping criterion might be to separate users into groups based on the range of physical motion.

While this alone might not convey much utility, any groupings can subsequently be used for further analysis or modeling purposes. As an example, it might turn out that different ranges of physical motion become a useful indicator for some outcome of interest.
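As a toy illustration of such a grouping, the sketch below runs k-means on a single numeric feature per user. The feature (a per-user "range of motion" number) is hypothetical, and a real project would more likely use a library implementation over several features.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(
    values: Sequence[float], k: int, iterations: int = 20
) -> Tuple[List[float], List[int]]:
    """Cluster one-dimensional values; return (centers, cluster index per value)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted values.
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]

    def nearest(v: float) -> int:
        return min(range(k), key=lambda i: abs(v - centers[i]))

    for _ in range(iterations):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            clusters[nearest(v)].append(v)
        # Move each center to the mean of its members; keep it if empty.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, [nearest(v) for v in values]
```

The returned cluster index can then be stored alongside each user and used as an input to later analysis.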

Qualitative vs. Quantitative

In the case of supervised learning, the dependent variable can be qualitative (sometimes referred to as categorical) or quantitative (sometimes referred to as numerical).

A qualitative dependent variable can manifest as a binary variable where there are two possible outcomes, or a multi-class variable, where there are more than two possible outcomes.

Moonsense data can be leveraged as independent variables in qualitative modeling, where the target variable might be:

  • Fraud or not
  • Bot or not
  • Likely to lead to chargeback or not
  • Whether a given interaction is similar to the legitimate user

A quantitative dependent variable can be a measure of some form, and can be combined with Moonsense data to derive such values as:

  • A probability value in the range from 0 to 1, indicating the likelihood that some event is fraudulent
  • The likely age of a device user

Feature Engineering

Generally, we do not recommend utilizing all the raw data collected by Moonsense in a model training process without adequate consideration.

The nature of Moonsense data is highly detailed and voluminous, and might not be appropriate for modeling in its untransformed state.

It’s often preferable to engage in feature engineering based on the Moonsense data prior to use. Feature engineering is the process of applying domain knowledge to extract particular transformations of the raw data that are of greater utility than just the data in its raw form.

There are quite a few benefits to feature engineering, including:

  • Reduction in the data size. Raw Moonsense data can get quite large. By condensing or aggregating the data through feature engineering, the size of the data can be reduced prior to transmission.

  • Avoiding overfitting. There is a lot of noise and variation in the raw data itself, and training a model based on noisy raw data might lead the model to adapt to all the idiosyncrasies of the data. When a model starts relying on irrelevant details in the data, it may overfit to the data. Overfitting is a scenario where a model becomes well trained to a specific dataset, but in doing so, becomes a poor performer when applied to previously unseen data.

  • Focus on signals. By applying domain knowledge to the process of extracting useful details out of the raw data, we can actually derive data that is more useful and informative and can result in better model performance.

There are countless ways to engage in feature engineering on Moonsense data. The precise feature engineering approaches applied are highly dependent on the specific use case.

One example of how feature engineering can be used is to compare the cadence of typing in a username field against the cadence of typing in a password field. The difference in cadence between the two signals is not a feature directly captured by Moonsense, but can easily be calculated and can reveal a signal useful to detecting when an account may have been compromised. For a detailed example, check out the Sign In demo.
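A minimal sketch of that cadence comparison follows. It assumes the key press timestamps have already been separated by input field and share one time unit. How to tell the fields apart depends on your SDK configuration and is not shown here. The function names are ours.

```python
from statistics import mean
from typing import Optional, Sequence


def mean_interval(timestamps: Sequence[float]) -> Optional[float]:
    """Average gap between consecutive key presses, or None if fewer than two."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return mean(gaps) if gaps else None


def cadence_difference(
    username_times: Sequence[float], password_times: Sequence[float]
) -> Optional[float]:
    """Password cadence minus username cadence; None if either is undefined."""
    username_cadence = mean_interval(username_times)
    password_cadence = mean_interval(password_times)
    if username_cadence is None or password_cadence is None:
        return None
    return password_cadence - username_cadence
```

A positive value means the password was typed more slowly than the username. Whether that is unusual for a given account is something a model can learn from labeled examples.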

Tuning the Model

Developing a machine learning model is typically an iterative process. Oftentimes, the initial models are rough and unrefined and require cycles of tuning until the model performs at a desired level.

Models can be tuned by changing the model technique applied, parameters associated with different modeling techniques (such as the number of trees in a random forest method or perhaps the seed specification in K-means clustering), and the elements in the dataset that get incorporated into the model.

For instance, an initial version of a model intended to identify chargebacks in transactions might perform only slightly better than a coin flip. After some iteration, involving further feature engineering based on Moonsense data, it may be possible to dramatically improve the model's ability to identify transactions that are likely to result in a chargeback.

How much tuning is performed can be informed by the assessed performance of a model.

Evaluating the Model

The performance of a machine learning model can be evaluated using a number of commonly applied metrics in the space of data science.

The techniques we discuss here are primarily applicable to supervised learning methods for categorical targets, where there is a clearly identifiable target variable.

Fundamentally, evaluating model performance is based on comparing a model's predictions of the target outcome for a set of data against the known target outcomes for that same set of data. In effect, it's an attempt to figure out how well the model did at guessing the true state of the data.

Oftentimes, evaluating a supervised model requires segmenting the underlying data into three distinct components:

  • Training - the data used to actually train the model
  • Validation - the data used to evaluate the model at intermediate stages, which informs the tuning process
  • Testing - the data used to make a final evaluation of the overall performance of the model. This set of data is never involved in the tuning process and allows for an objective assessment of the model's performance against previously unseen data. Any significant divergence in model performance between validation and testing might suggest overfitting.
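A three-way split can be sketched with the standard library as below. The fractions, the seed, and the function name are illustrative choices, and the three returned lists are named by role: training, tuning, and final evaluation. In practice it may be worth splitting by user or by Client Session Group rather than by individual session, so that closely related sessions do not end up on both sides of a split.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def three_way_split(
    records: Sequence[T],
    tune_fraction: float = 0.2,
    final_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle records and return (train, tune, final) without overlap."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_final = int(len(shuffled) * final_fraction)
    n_tune = int(len(shuffled) * tune_fraction)
    final = shuffled[:n_final]
    tune = shuffled[n_final:n_final + n_tune]
    train = shuffled[n_final + n_tune:]
    return train, tune, final
```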

Depending on the volume of data available and the complexity of the model, additional evaluation techniques may be applied. If there are few samples available but the model complexity is low, a leave-one-out cross-validation approach might be suitable. If there are too few samples and the model complexity is high, then a k-fold cross-validation approach could be used.
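Both cross-validation approaches come down to the same index bookkeeping: the data is cut into folds, and each fold takes one turn as the held-out set. The sketch below is a plain-Python illustration (the function name is ours). Setting k equal to the number of samples gives leave-one-out.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training indices, held-out indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        held_out = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, held_out
        start += size
```

The indices are not shuffled here, so shuffle the records first if their order carries meaning (for example, if they are sorted by date).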

Once the data is properly segmented for evaluation purposes, a number of different assessment calculations can be made, including:

  • Accuracy - a measure of how well the model did at predicting the correct outcomes. In a case where there are two possible outcomes, a model is correct when it predicts true when the actual value is true, and also when it predicts false when the actual value is false.
  • Precision - the fraction of cases predicted to be true that were actually true.
  • Recall - the fraction of actually true cases that were predicted to be true.
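For a binary target, all three measures follow from four counts: true positives, false positives, false negatives, and true negatives. The sketch below computes them from two equally long sequences of booleans. Returning 0.0 when a denominator is zero is a convention we chose for this example.

```python
from typing import Sequence, Tuple


def confusion_counts(
    actual: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[int, int, int, int]:
    """Return (true_pos, false_pos, false_neg, true_neg)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)
    return tp, fp, fn, tn


def accuracy(actual: Sequence[bool], predicted: Sequence[bool]) -> float:
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def precision(actual: Sequence[bool], predicted: Sequence[bool]) -> float:
    tp, fp, _, _ = confusion_counts(actual, predicted)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(actual: Sequence[bool], predicted: Sequence[bool]) -> float:
    tp, _, fn, _ = confusion_counts(actual, predicted)
    return tp / (tp + fn) if (tp + fn) else 0.0
```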

As an example, imagine a case where the goal is to identify all transactions that will result in chargebacks. Let's also assume that 99% of all transactions do not involve chargebacks, and only 1% do.

Since chargebacks are fairly uncommon, a naive approach might be to simply claim that no transaction will result in a chargeback (i.e., predict false for every transaction). Because the data is unbalanced, with very few chargebacks, this model would have a very high accuracy of 99%. But as far as modeling goes, this is pretty useless, since it doesn't predict based on signals in the underlying data.
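The arithmetic is easy to check on made-up numbers. The sketch below uses 1,000 synthetic transactions, 10 of which are chargebacks, and a "model" that always predicts no chargeback.

```python
# Synthetic data: 1% chargebacks, and a prediction of "no chargeback" for all.
actual = [True] * 10 + [False] * 990
predicted = [False] * 1000

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

caught = sum(a and p for a, p in zip(actual, predicted))
recall = caught / sum(actual)

print(accuracy)  # 0.99
print(recall)    # 0.0
```

The accuracy looks excellent, yet not a single chargeback is caught, which is why accuracy alone is a poor guide on unbalanced data.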

A more refined approach would involve a machine learning model that is trained on historical data. That historical data might include distinct records that include an indicator of whether or not chargebacks occurred, some feature derived from Moonsense behavioral data, and any other set of informative features derived from other pertinent sources. In such a case, it is possible to train a model to yield predictions that are informed by signals in the data.

Once the model is trained and predictions are generated for a set of data, the set of predictions can be compared against the known chargeback state (whether they happened or not).

The ratio of the cases predicted to have chargebacks that truly involved chargebacks compared against all predictions of chargebacks would be the precision measure.

The ratio of the cases predicted to have chargebacks that truly involved chargebacks compared against all actual cases of chargebacks would be the recall measure. This recall measure can be improved by reducing the number of cases where actual chargebacks were incorrectly predicted to not be chargebacks.

In the context of chargebacks, the monetary and overall business cost of failing to identify actual cases of chargebacks tends to be higher than inaccurately flagging benign transactions that do not involve chargebacks. It's therefore generally preferable to emphasize recall as a measure in the context of trying to predict chargebacks.

Deploy the Model

Once you’ve sufficiently trained a model and are comfortable with its performance, you can persist the model and deploy it into an operational process so that it can be used to trigger any range of actions.
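How a model is persisted depends on the library that produced it. Many Python libraries rely on pickle-style serialization, while simple models can be stored as plain parameters. The sketch below shows the latter, writing a dictionary of parameters to JSON and reading it back. The parameter names in the usage note are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def save_model_params(params: Dict[str, Any], path: Path) -> None:
    """Write model parameters to a JSON file."""
    path.write_text(json.dumps(params, indent=2), encoding="utf-8")


def load_model_params(path: Path) -> Dict[str, Any]:
    """Read model parameters back from a JSON file."""
    return json.loads(path.read_text(encoding="utf-8"))
```

For example, `save_model_params({"feature": "cadence_difference", "threshold": 180.0}, Path("model.json"))` stores a one-feature threshold rule that a serving process can later reload with `load_model_params`.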

For instance, let’s say you have a well-performing chargeback detection model that includes Moonsense behavioral data, and you’d like to deploy the model such that it can monitor for future behaviors at checkout that might reveal chargeback fraud.

The model can be deployed such that it ingests a comparable set of independent variables and yields predictions of chargebacks in near real time. This could allow for a scenario where abnormal behavior causes the model to predict a chargeback. The overall operational process can be configured such that a chargeback prediction results in immediate remediation, such as blocking the transaction.

By predicting a chargeback and blocking the transaction immediately, the business can avoid the cost of a chargeback event.

Such models should be periodically evaluated and re-trained with new data to ensure that the model continues to perform reliably.

Next steps

We hope this guide helps you understand how to utilize Moonsense data in data science projects.

You can learn more by exploring the rest of our documentation.