University Paris-Sud
Interactive Visualization for IMDb database
Interactive Information Visualization
Walter FERREIRA
2/18/2014

Dataset description:
This version of the IMDb database consists of two separate files. The first is the movie database itself. It holds the main characteristics of each movie, such as title, budget, genre, rating and crew. The crew information in this file is incomplete, however: it contains only the id code of each person, which refers to an entry in the second file. The second file holds specific information about actors, directors, producers, writers and some other professional categories. To obtain the complete information about a movie, both files must be merged programmatically; the merged data gives richer information about movies and actors.

The focus of this visualization is on actors, directors, producers and writers and on how they relate to one another. They are represented as a social network, a type of visualization with a few known issues. The first is scalability when displaying hundreds of thousands of nodes. The crew database contains around 250,000 people, which makes it impossible to represent everybody at once in the same visualization. I therefore preprocessed the crew dataset before building the social network. Most of those 250,000 people are unknown to the general public, and since the visualization is about famous people and stars, little information is lost when they are left out. For the scope of this visualization, 100 people among actors, directors, producers and writers were chosen according to the IMDb popularity rank.

Data encoding:
The main encoding property of this visualization is the node-link graph representation for social networks.
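Before any encoding is applied, the preprocessing from the dataset description reduces the crew file to the people who become nodes. The following is a minimal sketch of that step, not the project's code. The class `CrewPreprocessing`, the `Person` fields and both method names are hypothetical, and it assumes that a lower popularity rank means a more popular person.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: class, field and method names are assumptions,
// not taken from the project or from the IMDb files.
final class CrewPreprocessing {

    /** One row of the crew file (assumed fields). */
    static final class Person {
        final int id;
        final String name;
        final int popularityRank; // assumption: 1 is the most popular

        Person(int id, String name, int popularityRank) {
            this.id = id;
            this.name = name;
            this.popularityRank = popularityRank;
        }
    }

    /** Keeps the n people with the best (lowest) popularity rank, keyed by id. */
    static Map<Integer, Person> selectTop(List<Person> crew, int n) {
        List<Person> sorted = new ArrayList<>(crew);
        sorted.sort(Comparator.comparingInt((Person p) -> p.popularityRank));
        int limit = Math.max(0, Math.min(n, sorted.size()));
        Map<Integer, Person> kept = new LinkedHashMap<>();
        for (Person p : sorted.subList(0, limit)) {
            kept.put(p.id, p);
        }
        return kept;
    }

    /** Resolves a movie's crew ids to kept people; ids of dropped people are skipped. */
    static List<Person> resolveCrew(List<Integer> crewIds, Map<Integer, Person> kept) {
        List<Person> crew = new ArrayList<>();
        for (Integer id : crewIds) {
            Person p = kept.get(id);
            if (p != null) {
                crew.add(p);
            }
        }
        return crew;
    }
}
```

With this shape, `selectTop(crew, 100)` would produce the set of displayed people, and `resolveCrew` would perform the merge between a movie's id codes and that set.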
I built the layout with the LinLog force-directed algorithm with clustering, so that it shows which people cooperate most frequently with one another. Each node represents one person. Nodes are drawn as circles, and the area of a circle represents how many movies the person has made with the other people displayed in the social network. By default all nodes have the same color, but the clusters can be highlighted with a key command. The edges represent cooperation between people. All cooperation between two people is merged into a single edge, and its extent is reflected in the weight of that edge and in the clusters. The weights cannot be seen in the basic visualization: the user must hover over a node to see all of its connections highlighted together with their weights. It is nevertheless always possible to get an idea of which people often work together by looking at the cluster formation.

Colors and transparency help identify nodes and highlight information. All nodes are initially blue. When the user hovers over a node, it turns red, all the nodes connected to it turn black, and the nodes unrelated to it become slightly transparent. After a click on a node, the view is locked to that node, and if the user then hovers over one of the related nodes, shown in black, that node turns red to help identify the connection between the two. Information about movies appears by default next to the mouse cursor, as if the cursor were a magic lens, but it can also be shown at the corner of the screen. For more details on the available interaction techniques, please refer to the following sections.

Technical aspects:
The project was developed in the Processing environment (v2.1), which provides tools that make graphical representation easier while still allowing the use of Java libraries.
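The encoding described under Data encoding (one weighted edge per pair of people, circle area proportional to the number of movies) can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the project's implementation. The class `CollaborationGraph` and its methods are hypothetical, a movie is assumed to count only when at least two displayed people worked on it, and `areaPerMovie` is an arbitrary scaling parameter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Illustrative sketch only: names and counting rules are assumptions.
final class CollaborationGraph {
    // Edge weights keyed by "smallerId|largerId", so each pair has one edge.
    private final Map<String, Integer> edgeWeights = new HashMap<>();
    // Number of counted movies per displayed person.
    private final Map<Integer, Integer> movieCounts = new HashMap<>();

    /** Registers one movie, given the ids of the displayed people who worked on it. */
    void addMovie(List<Integer> displayedCrewIds) {
        // Remove duplicate ids (a person may hold several roles) and sort them.
        List<Integer> ids = new ArrayList<>(new TreeSet<Integer>(displayedCrewIds));
        if (ids.size() < 2) {
            return; // assumption: no cooperation among displayed people, so skip
        }
        for (int i = 0; i < ids.size(); i++) {
            movieCounts.merge(ids.get(i), 1, Integer::sum);
            for (int j = i + 1; j < ids.size(); j++) {
                // Every shared movie adds 1 to the single edge of this pair.
                edgeWeights.merge(key(ids.get(i), ids.get(j)), 1, Integer::sum);
            }
        }
    }

    /** Weight of the merged edge between two people, 0 if they never cooperated. */
    int weight(int a, int b) {
        return edgeWeights.getOrDefault(key(a, b), 0);
    }

    /** Number of counted movies for one person. */
    int movieCount(int id) {
        return movieCounts.getOrDefault(id, 0);
    }

    /** Radius of a circle whose area is movieCount * areaPerMovie. */
    static double radiusFor(int movieCount, double areaPerMovie) {
        return Math.sqrt(movieCount * areaPerMovie / Math.PI);
    }

    private static String key(int a, int b) {
        return Math.min(a, b) + "|" + Math.max(a, b);
    }
}
```

The radius is derived from the area rather than set directly from the count, so that a person with four times as many movies gets a circle with four times the area and twice the radius.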
To build the node-link layout, I used a modified version of LinLogLayout for Java Swing, developed by A. Noack and distributed as free software under the GNU free software license.

Interactions:
A few interaction techniques are available in this project to help the user explore the data:
- Key commands to filter the data encoding
- Pan
- Zoom
- Brushing by hovering and clicking
- Augmentation on mouse hover, similar to a magic lens
The next section explains these interactions with pictures and gives more details on how to explore the dataset.

How to use:
The following picture shows the default visualization presented to the user once the program has finished loading. The user can hide the links by pressing the “L” key and can toggle cluster highlighting by pressing the “C” key. The next picture shows the visualization with clusters highlighted and without links. Pressing “I” and “O” zooms in and out, respectively, and clicking and dragging pans the visualization. Below is a representative image of the zoomed-in graph.

To discover all the movies a particular person has made, the user can press and hold the CTRL key and then hover over that person’s node. By default, the list of movies appears right next to the mouse cursor. If the user prefers that it not follow the mouse, pressing the “F” key displays the list at the top left of the screen instead. The list is sorted by rating in descending order. In the images below, we are looking at Richard Gere’s movies.

To see the connections of a particular actor, it suffices to hover the mouse over the node representing that actor. Once the nodes are highlighted, clicking on the node fixes the highlighting, and you can then move through the peer actors to see the connections between them.
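Panning, zooming and hovering all rely on one mapping between screen coordinates and layout coordinates. The sketch below shows one common way to keep that mapping explicit; it is not the project's code. It assumes the view is drawn with a pan offset followed by a uniform scale, and the class name `ViewTransform` and its methods are hypothetical.

```java
// Illustrative sketch only: assumes screen = world * zoom + pan on both axes.
final class ViewTransform {
    double zoom = 1.0; // uniform scale factor
    double panX = 0.0; // screen-space offset of the world origin
    double panY = 0.0;

    double toScreenX(double worldX) { return worldX * zoom + panX; }
    double toScreenY(double worldY) { return worldY * zoom + panY; }

    // Inverse mapping: undo the pan first, then the scale.
    double toWorldX(double screenX) { return (screenX - panX) / zoom; }
    double toWorldY(double screenY) { return (screenY - panY) / zoom; }

    /** Moves the view by a drag distance measured in screen pixels. */
    void panBy(double dx, double dy) {
        panX += dx;
        panY += dy;
    }

    /** Multiplies the zoom by factor while keeping the given screen point fixed. */
    void zoomAt(double screenX, double screenY, double factor) {
        double worldX = toWorldX(screenX);
        double worldY = toWorldY(screenY);
        zoom *= factor;
        // Choose the pan so the same world point stays under the cursor.
        panX = screenX - worldX * zoom;
        panY = screenY - worldY * zoom;
    }

    /** Tests whether a mouse position lies inside a node given in world units. */
    boolean hits(double mouseX, double mouseY,
                 double nodeX, double nodeY, double nodeRadius) {
        double dx = toWorldX(mouseX) - nodeX;
        double dy = toWorldY(mouseY) - nodeY;
        return dx * dx + dy * dy <= nodeRadius * nodeRadius;
    }
}
```

In a Processing sketch, the same state would presumably be applied in `draw()` with `translate((float) view.panX, (float) view.panY)` followed by `scale((float) view.zoom)`, and `hits` would be called with `mouseX` and `mouseY` for each node.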
If you hover over a peer, the movies shared with that peer are displayed next to the mouse by default, or at the top left corner if the user prefers. Below, see all of Woody Allen’s connections and then his movies with Scarlett Johansson.

Challenges and limitations:
At first, I tried to create my own layout method using a circular layout and animations, but it turned out to be too hard and scaled very poorly, so I decided to tackle the layout with the force-directed approach. The algorithm that computes the node-link layout has limitations of its own: it scales only to a few thousand nodes. I tried running it on the whole crew dataset, but it crashed after more than 10 minutes. I was therefore obliged to reduce the dataset.

One current issue of the visualization is the zoom. Coordinate handling in Processing is not straightforward: when the user zooms in or out, the correspondence between screen coordinates and scaled coordinates is lost, and I did not find a way to convert between the two. As a result, hovering over objects does not work while the view is zoomed.

Improvements and future work:
A good improvement would be to let users decide which people are meaningful to them. A small set of the top 100 people from IMDb may be meaningful, but not everybody has the same taste in movies. It would be interesting to add a first screen, shown before the visualization itself, where users choose who they want to see, or to provide a search tool inside the visualization where users can look for someone and add them dynamically.

References:
- Processing references, examples, tutorials and forum: http://processing.org/
- LinLogLayout for Java Swing, by A. Noack: https://code.google.com/p/linloglayout/
- The Aviz class website, particularly the lessons about Graphs, Interactions and Visual Variables: http://www.aviz.fr/Teaching2013/Schedule