Information design to support the analysis of organized crime in Northern Italy

Abstract
The phenomenon of organized crime in Italy is current and urgent.
This article presents one way in which design can contribute to the study of such a complex issue.
The project described here, a visual analysis of organized crime in Northern Italy, aims to be a valuable tool for studying the phenomenon: it gives its users (journalists and academics) the ability to analyze data extracted from the annual reports of the Direzione Nazionale Antimafia. The data extraction was made possible by a collaboration with the ItaliaNLP Lab. The research covers the period from 2000 to 2012 and allows the analysis of trends and changes over time in nine provinces of Northern Italy. The design process was characterized by a continuous dialogue with the users, in order to evaluate the usefulness, clarity and value of the project in all its phases. The result of this research and design process is an interface that allows users to explore the visualized data.
1. Supporting the study of the phenomenon. The users’ demands.
Organized crime is a serious problem in Italy, and its presence in Northern Italy has been a controversial issue for decades.
Faced with such a complex problem, I wondered how a designer could support the study of the phenomenon. One possible answer came from the field of information design.
For these reasons I decided to devote my M.Sc. thesis to an information design project dealing with organized crime in Northern Italy.
The target audience of the project is specific: journalists, academics and experts. These users already know the issue, so I decided to design a tool to support them in their studies and research.
During all the design phases I was in contact with journalists and experts, in order to constantly evaluate the usefulness of the project. Every visualization was checked by them and by my supervisor, Paolo Ciuccarelli.{{1}}
In particular, in the first phase of the project I contacted some journalists to understand their actual needs.
A few common requests emerged from these meetings:
– visualizing the names linked to the phenomenon;
– identifying and geolocating the main categories of crimes committed;
– linking the names and the crimes.
2. Extracting the data from the sources
To build an information visualization tool that answers these demands, I had to identify a source from which to extract data.
I decided to use an official source: the annual reports of the Direzione Nazionale Antimafia.{{2}}
These reports contain a chapter dealing with the situation in the main Italian cities: I decided to analyze the paragraphs regarding the Northern cities.
In this phase I needed a tool to automatically extract the data. For this reason I contacted the “Antonio Zampolli” Institute of Computational Linguistics and started a collaboration with the ItaliaNLP Lab{{3}}, a research laboratory that “gathers researchers, postdocs and students from computational linguistics, computer science and linguistics who work on developing resources and algorithms for processing and understanding human languages, with particular attention to the Italian language.”
I selected and prepared the files for the data extraction, namely the paragraphs dealing with the situation in nine cities in Northern Italy: Bologna, Brescia, Firenze, Genova, Milano, Torino, Trento, Trieste and Venezia. I used the annual reports for the period 2000-2012.
The ItaliaNLP Lab researchers performed the extraction for my analysis. They used T2K (Text-To-Knowledge){{4}}, a tool that allows you to automatically extract linguistic and domain-specific information from text.
During this phase, 3,999 pages were analyzed and 25,935 words were extracted. I personally inspected and cleaned the files using Microsoft Excel.
The extracted data were divided into three main categories: named entities (the names), domain terminology (the specific terms) and a matrix of distances (the proximity in the text between names and terms) [Fig.2]. From the domain terminology I used only the terms referring to the crimes committed.
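To make the idea of a matrix of distances more concrete, here is a minimal Python sketch. It is not how T2K computes proximity, which the reports and the tool description do not detail here. It assumes that distance is measured as the smallest number of tokens separating a name from a term, and the sample sentence, the placeholder names and the function names are all hypothetical.

```python
def min_token_distance(tokens, a, b):
    """Smallest gap, in tokens, between any occurrence of a and any of b."""
    pos_a = [i for i, tok in enumerate(tokens) if tok == a]
    pos_b = [i for i, tok in enumerate(tokens) if tok == b]
    if not pos_a or not pos_b:
        return None  # one of the two never occurs in the text
    return min(abs(i - j) for i in pos_a for j in pos_b)


def distance_matrix(tokens, names, terms):
    """Nested dict: matrix[name][term] -> minimum token distance."""
    return {
        name: {term: min_token_distance(tokens, name, term) for term in terms}
        for name in names
    }


# Hypothetical, already tokenized text (not taken from the reports).
tokens = (
    "name_a was investigated for extortion while name_b "
    "was charged with money_laundering and extortion"
).split()

matrix = distance_matrix(
    tokens,
    names=["name_a", "name_b"],
    terms=["extortion", "money_laundering"],
)
```

In this toy example a smaller value means that a name and a crime term appear closer together in the text, which is the kind of relation the Proximities section is meant to expose.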
For every term, the frequency in the documents was recorded; for the named entities and the specific terms, the relevance in the document was also recorded, calculated as a tf-idf value.
“The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general.”{{5}}
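As an illustration of the definition quoted above, the following Python sketch computes tf-idf in one common textbook form: term frequency normalized by document length, multiplied by the logarithm of the inverse document frequency. The exact weighting used by T2K is not stated here, so this variant is an assumption, and the three tiny documents are invented for the example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document (lists of tokens)."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            # tf = share of the document taken by the term;
            # idf = log(total documents / documents containing the term).
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical mini corpus, one token list per document.
docs = [
    ["estorsione", "riciclaggio", "estorsione", "milano"],
    ["riciclaggio", "torino"],
    ["riciclaggio", "usura", "genova"],
]
scores = tf_idf(docs)
```

Here "riciclaggio" occurs in every document, so its score is zero everywhere, while "estorsione" occurs twice in a single document and gets the highest score there. This is the offsetting effect the quotation describes.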
3. The role of design: the visualizations and the interface
At this stage I had all the data I needed, so I began to design the final project.
I decided to design an interface that allows the users to explore the visualized data. [Fig.3]
This phase, too, was characterized by a constant dialogue with a few potential users, in order to validate the clarity of the project and of the visualizations.
The interface is divided into three sections: Persone (People), Vicinanze (Proximities in the text) and Crimini (Crimes). Each section has different views.
All the visualizations have a few common characteristics:
– each term is represented by a geometrical element;
– the size of the element indicates the frequency in the document;
– regarding the names and the crimes, the distance of the element from the center indicates the relevance of the term in the document.
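The visual encoding listed above can be sketched in Python as follows. Since the interface has not been implemented, this is only an illustration under stated assumptions: the area of each element is taken to be proportional to frequency, more relevant terms are placed closer to the center, and elements are spread evenly around the circle. The function names, the pixel values and the sample terms are hypothetical.

```python
import math


def element_radius(frequency, max_frequency, max_radius=40.0):
    # Assumption: area (not radius) is proportional to frequency.
    return max_radius * math.sqrt(frequency / max_frequency)


def distance_from_center(relevance, max_relevance, outer_radius=300.0):
    # Assumption: the most relevant term sits at the center.
    return outer_radius * (1.0 - relevance / max_relevance)


def layout(terms):
    """terms: list of (label, frequency, relevance) with positive maxima."""
    max_freq = max(freq for _, freq, _ in terms)
    max_rel = max(rel for _, _, rel in terms)
    placed = []
    for index, (label, freq, rel) in enumerate(terms):
        angle = 2 * math.pi * index / len(terms)  # evenly spaced angles
        dist = distance_from_center(rel, max_rel)
        placed.append({
            "label": label,
            "x": dist * math.cos(angle),
            "y": dist * math.sin(angle),
            "radius": element_radius(freq, max_freq),
        })
    return placed


# Hypothetical terms: (label, frequency, tf-idf relevance).
placed = layout([
    ("term A", 12, 0.30),
    ("term B", 3, 0.10),
    ("term C", 6, 0.20),
])
```

Scaling area rather than radius is a common choice in information design, because readers tend to compare the surfaces of shapes and not their diameters.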
The People section contains:
All the Names view: all the extracted names, organized in alphabetical order or ordered by frequency; [Fig.4]
Geographical view: the names geolocated in the selected cities and divided by year; [Fig.5,6]
Temporal view: a comprehensive view that shows all the geolocated names and all the years. [Fig.7]
The Proximities section is composed of:
All the Groups view: this view shows all the groups that were identified during the extraction. Every group contains the terms close to each other within the text; [Fig.8]
Groups by Year view: the groups divided by year. [Fig.9]
The Crimes section contains:
All the Crimes view: this view shows all the crimes extracted from the documents, ordered by frequency or by category. I grouped the terms into 12 categories: public procurement, money laundering, extortion, gambling, illegal immigration, enslavement, kidnapping, prostitution, arms trafficking, human trafficking, waste trafficking, and drug trafficking; [Fig.10]
Geographical view: the crimes geolocated in the cities and divided by year; [Fig.11,12]
Temporal view: a comprehensive view with all the geolocated crimes and all the years. [Fig.13]
This interface allows the user to explore and analyze the extracted data.
When the user hovers the mouse pointer over an element, all the information about the term is displayed. [Fig. 14] In addition, clicking on the element gives access to further information and lets the user continue the exploration without interrupting the workflow.
The project has not been implemented yet.
Nevertheless, I showed all the screens of the interface to some potential users: their comments were extremely useful in reaching a clear final result. Most of the journalists I met were not accustomed to working with data visualization projects, so it was very important to me to focus on the immediacy and clarity of the visualizations.
Many projects already use complex data to analyze and understand crime; this is one of many examples of how information design can support the analysis and study of complex phenomena.
 
Bibliographical references
Bonin F., Dell’Orletta F., Montemagni S., Venturi G. (2012). Lessico settoriale e lessico comune nell’estrazione di terminologia specialistica da corpora di dominio. In Ferreri, S. (edited by), Lessico e lessicologia. Società di linguistica italiana.
Dell’Orletta F., Lenci A., Marchi S., Montemagni S., Pirrelli V., Venturi G. (2008). Dal testo alla conoscenza e ritorno: estrazione terminologica e annotazione semantica di basi documentali di dominio. AIDAinformazioni: Rivista di Scienze dell’informazione, vol. 26 (1-2), 197-218.
Dell’Orletta F., Venturi G., Cimino A., Montemagni S. (2014). T2K2: a System for Automatically Extracting and Organizing Knowledge from Texts. Proceedings of 9th International Conference on Language Resources and Evaluation (LREC-2014), Reykjavik, Iceland, 26-31 May 2014. Curran Associates, Inc.
 
[[1]] www.densitydesign.org/person/paolo-ciuccarelli/[[1]]
[[2]] www.giustizia.it/giustizia/it/mg_2_10_1.wp[[2]]
[[3]] www.italianlp.it[[3]]
[[4]] www.italianlp.it/demo/t2k-text-to-knowledge[[4]]
[[5]] en.wikipedia.org/wiki/Tf-idf[[5]]
