No account is needed to install and use the application. The application is installed as a browser extension: simply download one of the supported browsers and install the extension through that browser's extensions store. You can find the corresponding links for each supported browser on the home page.
No, your data are safe. The data are stored only locally on your computer, and the application does not send any data about your browsing activity to anyone.
The application only captures requests made by the browser in which it is running, so requests made by other browsers or by the system (e.g. system updates) are not captured. Furthermore, the application only captures requests made after it has started executing.
If your question is missing from this section, feel free to send it to us here. We will try to reply as soon as possible and will consider adding your question to this section.
A root request is a request made directly by the user, such as when the user enters a new URL in the browser bar or clicks a link on a page. An embedded request is made as a dependency of a root request. For instance, when we visit a website, multiple requests are made for the images, scripts and stylesheets the page needs. All these requests are considered embedded requests, triggered by the root request of the main page.
A first-party domain is a domain that has received at least one root request at one of its resources. A first-party cookie is a cookie sent with a root request.
A third-party domain is a domain that has received only embedded requests at its resources. A third-party cookie is a cookie sent with an embedded request.
Monitorito captures your browsing activity and represents it in the form of a graph. This graph consists mainly of circular nodes, which correspond to the domains you have visited, and edges between these nodes. An edge may correspond to requests made from one domain to another, or to redirects from one domain to another. The color and style of the edges reveal the events they contain. The user can expand domain nodes, generating a diamond node for each resource of the domain; these resource nodes are linked to their domain and to each other, depending on the requests and redirects made. The user can also cluster multiple domain nodes into a single cluster node, which is a circular node bigger than the domain nodes.
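As a rough illustration, the graph described above can be modelled as follows. This is a minimal sketch, not Monitorito's actual source types; all names here are illustrative.

```typescript
// Minimal sketch of the graph model described above. All names are
// illustrative, not the actual types used in Monitorito's source code.
type NodeKind = "domain" | "resource" | "cluster"; // circle, diamond, bigger circle
type EventKind = "request" | "redirect";

interface GraphNode {
  id: string;          // e.g. "example.com" or "example.com/script.js"
  kind: NodeKind;
  firstParty: boolean; // whether the node received at least one root request
}

interface GraphEdge {
  from: string;        // id of the source node
  to: string;          // id of the destination node
  events: { kind: EventKind; hasReferer: boolean }[];
}
```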
Domain nodes have different sizes, depending on the way you accessed them. First-party domains are bigger than third-party domains (see the question above for the difference between first-party and third-party domains). The same logic applies to resource nodes as well.
Edges of the graph have different colors and styling depending on their properties. If an edge is colored red (or orange for edges between resource nodes), at least one of the requests or redirects included in this edge contains an HTTP Referer header. If an edge is colored grey (or brown-green for edges between resource nodes), no request or redirect in this edge contains a Referer header. So, the coloring visualises whether the destination node of an edge has acquired knowledge that you have visited the source domain (in other words, is "tracking" you). Regarding the styling, an edge containing only requests is solid, an edge containing only redirects is dotted, and an edge containing both requests and redirects is dashed.
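These rules can be summed up in a short function. The following is a sketch assuming a simple event representation; the color names are taken from the description above, but the actual implementation may differ.

```typescript
// Illustrative only: derive an edge's color and line style from the
// events it contains, following the rules described above.
type EdgeEvent = { kind: "request" | "redirect"; hasReferer: boolean };

function edgeStyle(events: EdgeEvent[], betweenResources: boolean) {
  // Color: referral edges are red/orange, non-referral edges grey/brown-green.
  const hasReferral = events.some(e => e.hasReferer);
  const color = hasReferral
    ? (betweenResources ? "orange" : "red")
    : (betweenResources ? "brown-green" : "grey");

  // Style: only requests => solid, only redirects => dotted, both => dashed.
  const hasRequests = events.some(e => e.kind === "request");
  const hasRedirects = events.some(e => e.kind === "redirect");
  const style =
    hasRequests && hasRedirects ? "dashed" : hasRedirects ? "dotted" : "solid";

  return { color, style };
}
```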
The default mode of the graph displays the domain nodes and the connections between them. This mode is suitable when one wants a high-level view of the browsing activity, such as the visited domains and the connections between them. However, if one wants to investigate a specific event at a lower level, this mode is not so helpful. For instance, a user might want to investigate the workflow of a specific web protocol, such as OAuth. To achieve this, the user can expand the domain nodes of interest and observe the resource nodes and the edges between them.
You can use clustering to aggregate multiple domains into a new node. A common case is when you want to handle several sub-domains as a single node. For instance, you might want to cluster all sub-domains of a tracking domain (say, tracking.com) to aggregate the acquired knowledge of all the sub-domains into a single node, since all the knowledge naturally belongs to the root domain. You can achieve this by providing the root domain in the clustering form. Note that statistics are also calculated for cluster nodes, which is how you can aggregate the metrics of all the included sub-domains into a single node. You can also define a list of domains to cluster, if you want to cluster unrelated domains. For instance, you can provide the domains google.com and youtube.com to aggregate all their sub-domains into a single cluster node, since they belong to the same organisation.
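For the sub-domain case, the matching works roughly as in the sketch below; the function is illustrative and not Monitorito's actual implementation.

```typescript
// Illustrative only: given all domain names in the graph, collect those
// that equal the given root domain or are one of its sub-domains.
function clusterByRootDomain(domains: string[], rootDomain: string): string[] {
  return domains.filter(d => d === rootDomain || d.endsWith("." + rootDomain));
}

// clusterByRootDomain(["tracking.com", "ads.tracking.com", "example.com"], "tracking.com")
// => ["tracking.com", "ads.tracking.com"]
```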
Each time you filter the graph, you provide a set of conditions related to nodes and define whether you want to show or hide the nodes matching the criteria. During filtering, the provided criteria are tested against each node. In the end, the matched nodes are shown (or hidden), depending on your choice. An edge is shown after filtering only if both the source and destination nodes of this edge are visible.
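In code terms, the filtering semantics look roughly like this; the sketch below is illustrative and assumes a generic node type with an id field.

```typescript
// Illustrative only: a predicate is tested against each node, matched
// nodes are shown (or hidden), and an edge remains visible only if both
// of its endpoints remain visible.
interface Edge { from: string; to: string }

function applyFilter<N extends { id: string }>(
  nodes: N[],
  edges: Edge[],
  matches: (n: N) => boolean,
  showMatched: boolean
) {
  const visibleIds = new Set(
    nodes.filter(n => matches(n) === showMatched).map(n => n.id)
  );
  const visibleEdges = edges.filter(
    e => visibleIds.has(e.from) && visibleIds.has(e.to)
  );
  return { visibleIds, visibleEdges };
}
```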
If there are active clusters, filters or expanded domain nodes while the user is still browsing to new websites, new incoming nodes are added to the graph without considering them. For instance, a node that would normally belong to a cluster is not added to it if the specific domain is visited after the cluster has been created. Similarly, incoming nodes are always visible and are not checked against the currently applied filters. As a result, it is recommended that the user applies these functionalities (filtering, clustering, expanding) after having visited the whole dataset of interest. Alternatively, the user can reset filters and clusters and collapse all expanded domain nodes before visiting new websites.
The statistics calculated about the nodes of the graph are the following:
- percentage of first-party domains (visited directly by the user)
- percentage of third-party domains (visited in the background, as dependencies)
There are 4 different categories of edges in the graph, based on two criteria: incoming vs. outgoing, and referral vs. non-referral. For each category, the following statistics are calculated across all the nodes of the graph (a computation sketch follows the list):
- maximum
- minimum
- average
- standard deviation
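For reference, the sketch below shows how such statistics can be computed for one edge category, given the per-node edge counts; it is an illustration, not the actual implementation.

```typescript
// Compute max, min, average and (population) standard deviation over the
// per-node edge counts of one category, e.g. incoming referral edges.
function edgeStats(counts: number[]) {
  const max = Math.max(...counts);
  const min = Math.min(...counts);
  const avg = counts.reduce((sum, c) => sum + c, 0) / counts.length;
  const variance =
    counts.reduce((sum, c) => sum + (c - avg) ** 2, 0) / counts.length;
  return { max, min, avg, stdDev: Math.sqrt(variance) };
}
```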
Regarding phishing websites, we have observed that a significant percentage of phishing domains do not have many connections to other domains (so that they are not easily identified through logs etc.). So, the phishing metric classifies domains according to their degree of connectivity to the rest of the graph. Regarding tracking, we considered the HTTP Referer header the main tracking mechanism used by websites. When a request includes a Referer header, the destination resource's domain acquires the knowledge that the user associated with the specific cookie has also previously visited the resource denoted by the header value. As a result, the more incoming referral (red) edges a domain node has, the more tracking capabilities this domain possesses. Following the same approach, the more outgoing referral (red) edges a domain node has, the more vulnerable to tracking this domain is. We call these kinds of domains leaking domains, because they "leak" information to tracking domains.
Currently, there are 4 defined per-node metrics, calculated as explained below (see the sketch after this list):
- Phishing Metric: 1 divided by the total number of outgoing and incoming edges
- Tracking Metric: ratio of incoming referral edges of the selected domain to the maximum number of incoming referral edges of a node across the whole graph
- Tracking Cookies Metric: percentage of third-party cookies to the total number of cookies of the selected domain
- Leaking Metric: sum of the squares of the Tracking Metric values of all neighbour nodes reached by outgoing referral edges from the selected domain, divided by the number of these nodes
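Expressed in code, the four definitions translate roughly to the following sketch; the input shape is an assumption made for illustration.

```typescript
// Illustrative per-node inputs; not Monitorito's actual data shape.
interface NodeCounts {
  incomingEdges: number;
  outgoingEdges: number;
  incomingReferralEdges: number;
  thirdPartyCookies: number;
  totalCookies: number;
  // Tracking Metric values of neighbours reached via outgoing referral edges:
  referredNeighbourTracking: number[];
}

const phishingMetric = (n: NodeCounts) =>
  1 / (n.incomingEdges + n.outgoingEdges);

const trackingMetric = (n: NodeCounts, maxIncomingReferralInGraph: number) =>
  n.incomingReferralEdges / maxIncomingReferralInGraph;

// Expressed as a ratio; multiply by 100 for a percentage.
const trackingCookiesMetric = (n: NodeCounts) =>
  n.thirdPartyCookies / n.totalCookies;

const leakingMetric = (n: NodeCounts) =>
  n.referredNeighbourTracking.reduce((sum, t) => sum + t * t, 0) /
  n.referredNeighbourTracking.length;
```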
Currently, there is no capability to add new user-defined metrics from the user interface. However, one can easily define a new metric by slightly modifying the source code, and the new metric will accordingly be included in all other functionalities, such as filtering. The source code is available here. To add a new metric, just define a new metric here, complying with the interface of this class.
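As a purely hypothetical illustration of what such a metric could look like, consider the sketch below; the interface name and signature are assumptions, and the real interface in the source code will differ.

```typescript
// Hypothetical: the real metric interface is defined in Monitorito's
// source code; this only illustrates the general idea of a per-node metric.
interface Node { id: string }
interface Graph { nodes: Node[] }

interface Metric {
  name: string;
  calculate(node: Node, graph: Graph): number;
}

const myMetric: Metric = {
  name: "My Custom Metric",
  calculate(node, graph) {
    // any computation over the node and the surrounding graph
    return 0;
  },
};
```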
The ideal browser for the application is Chrome, because it provides the full set of functionalities. All functionalities are also provided in Opera. In Firefox, the only difference is that the body of HTTP requests cannot be captured, due to some APIs not yet implemented by the Firefox team. In our experience, the application performs best when executed in Chrome.
The data are exported in .csv format, since it is a standard format widely accepted by many analytics tools. You can import and analyse the exported graph using several analytics tools, such as Apache Spark, MapReduce etc. We have experimented with importing and analysing the data using the Neo4j graph database. You can use these scripts as a reference for importing the exported graph data from Monitorito into a Neo4j database.
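As one possible starting point (the linked scripts remain the authoritative reference), the exported CSV can be loaded into Neo4j with a LOAD CSV query. In the sketch below, the connection details and the column names (source, target) are placeholders; check the exported file for the actual columns.

```typescript
import neo4j from "neo4j-driver";

// Placeholders: adjust the connection URL, credentials, file name and
// CSV column names to match your setup and the actual export format.
const driver = neo4j.driver(
  "bolt://localhost:7687",
  neo4j.auth.basic("neo4j", "password")
);

async function importGraph() {
  const session = driver.session();
  try {
    await session.run(`
      LOAD CSV WITH HEADERS FROM 'file:///monitorito-export.csv' AS row
      MERGE (src:Domain { name: row.source })
      MERGE (dst:Domain { name: row.target })
      MERGE (src)-[:REQUEST]->(dst)
    `);
  } finally {
    await session.close();
    await driver.close();
  }
}
```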
Monitorito has 2 modes of operation: the online and the offline mode. In the offline mode, the graph is generated and stored but not visualised, so memory consumption is significantly reduced. For this reason, if you want to visit bigger datasets with Monitorito, the offline mode is the most suitable, and you can export the graph in the end to import it into a more powerful analytics tool. You can also automate this process, so that you do not have to manually visit all the websites in your browser. We have already done this using Selenium and you can find a sample script as a reference here.
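The linked sample is the reference; as a rough illustration of the idea, a script along the following lines visits a list of URLs in a browser that has the extension loaded. The extension path, the URL list and the wait time are placeholders.

```typescript
import { Builder } from "selenium-webdriver";
import * as chrome from "selenium-webdriver/chrome";

// Illustrative automation sketch: visit each URL so that the extension
// loaded into the browser records the generated requests.
async function visitAll(urls: string[]) {
  const options = new chrome.Options()
    .addExtensions("/path/to/monitorito.crx"); // placeholder path
  const driver = await new Builder()
    .forBrowser("chrome")
    .setChromeOptions(options)
    .build();
  try {
    for (const url of urls) {
      await driver.get(url);
      await driver.sleep(3000); // give embedded requests time to complete
    }
  } finally {
    await driver.quit();
  }
}
```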