From Words to Networks – Information and Relation Extraction from Text Data and Network Analysis

Institution: see Organisers & Acknowledgements

Program of study: International Research Workshop

Lecturer: Jana Diesner, PhD, Assistant Professor at the iSchool/Graduate School of Library and Information Science (GSLIS), University of Illinois at Urbana-Champaign (UIUC)

Date:

30.09.2013, 14:00 – 17:30
01.10.2013, 14:00 – 17:30
02.10.2013, 14:00 – 17:30

Room: n.s.

Max. number of participants: 20

Semester periods per week: n.s.

Credit Points: 5 CP for participating in the whole IRWS

Language of instruction: English

Contents:

1. What is covered in the workshop? What will you learn?

This interdisciplinary workshop introduces you to selected fundamental theories, concepts, methods and applications for bringing together text analysis and network analysis. You will learn how to conduct data analysis at the nexus of these areas in an informed, systematic and efficient fashion, and how to:

  • Construct semantic networks and socio-technical networks from unstructured, natural language text data.
  • Visualize and analyze network data.
  • Interpret network analysis results.

Throughout the workshop, we will discuss practical applications from the academic, administrative and business domains. By the end of the workshop, you will be able to design and conduct research projects in these fields for scholarly and commercial use.

Semantic networks are structured representations of information and knowledge. Socio-technical networks represent interactions between social agents, infrastructures and information. The functioning and dynamics of these networks involve the continuous production, processing and flow of information. This information is often available as text data, which can serve as the sole or a complementary source of information about networks. Examples of data sources include newswire data, scientific information such as publications and patents, communication data such as conversation transcripts and emails, self-presentations such as mission statements and annual reports, and social media data such as tweets and wikis. Text data has been used to construct or enhance network data in order to answer questions such as:

  • Who is talking to whom, and about what?
  • What are the mental models of individuals or groups about certain topics?
  • How do memes and innovations emerge and spread in society and online?
  • Who are the key entities in a network?
  • What benefits and risks result from an observed network structure for an organization and its wider context?
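To make the first question concrete, here is a minimal Python sketch, assuming email-like records of (sender, recipients, subject): each message adds weight to a directed sender-to-recipient edge, and the subject words attached to that edge give a rough answer to "about what". The records, the stopword list and the function name `build_comm_network` are hypothetical and chosen only for illustration; they are not the workshop's tools or data.

```python
from collections import Counter, defaultdict

# Hypothetical toy email records: (sender, recipients, subject).
# In practice these would be parsed from a mail archive; this field
# layout is an assumption made for illustration.
EMAILS = [
    ("alice", ["bob", "carol"], "budget review"),
    ("bob", ["alice"], "budget draft"),
    ("carol", ["alice"], "hiring plan"),
    ("alice", ["bob"], "budget final"),
]

STOPWORDS = {"the", "a", "of", "and"}

def build_comm_network(emails):
    """Return directed edge weights and the subject words per edge."""
    weights = Counter()            # (sender, recipient) -> message count
    topics = defaultdict(Counter)  # (sender, recipient) -> word counts
    for sender, recipients, subject in emails:
        words = [w for w in subject.lower().split() if w not in STOPWORDS]
        for recipient in recipients:
            edge = (sender, recipient)
            weights[edge] += 1
            topics[edge].update(words)
    return weights, topics

weights, topics = build_comm_network(EMAILS)
for (s, r), n in sorted(weights.items()):
    top = ", ".join(w for w, _ in topics[(s, r)].most_common(2))
    print(f"{s} -> {r}: {n} message(s) about {top}")
```

Subject lines are a crude stand-in for message content; the same edge structure could instead carry concepts extracted from message bodies.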

The main goal of this workshop is to teach you practical, hands-on skills in working with text analysis and network analysis tools. You will learn to apply basic natural language processing techniques at the lexical, syntactic and semantic levels. Specifically, you will:

  • Pre-process texts with techniques such as reference resolution, stemming and part-of-speech tagging.
  • Identify salient concepts and themes from single documents and entire text collections.
  • Create and apply codebooks, which are also known as dictionaries or thesauri.
  • Locate and classify entities that can serve as nodes for networks. We will move beyond the classic set of entity classes (people, organizations, locations) to also consider other classes that are relevant for studying social processes and culture, e.g. tasks, resources and knowledge.
  • Extract relations, i.e., link entities into edges based on various criteria.
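The steps above can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not the workshop's tooling: the codebook entries, the naive regex sentence splitter and tokenizer, and the choice of sentence-level co-occurrence as the linking criterion are all hypothetical. The codebook maps surface forms (including inflected forms such as "trucks") to a normalized concept and an entity class, standing in for stemming and entity classification; every pair of concepts found in the same sentence becomes a weighted edge.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical codebook (also called a dictionary or thesaurus): maps a
# surface form to (normalized concept, entity class). Entries are made up.
CODEBOOK = {
    "acme": ("Acme Corp", "organization"),
    "acme corp": ("Acme Corp", "organization"),
    "jane": ("Jane Doe", "person"),
    "berlin": ("Berlin", "location"),
    "trucks": ("truck", "resource"),
    "truck": ("truck", "resource"),
    "logistics": ("logistics", "knowledge"),
}
MAX_PHRASE_LEN = 2  # longest codebook entry, in tokens

def split_sentences(text):
    # Naive split after ., ! or ?; real pipelines use trained sentence splitters.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Lowercase and keep alphabetic runs only.
    return re.findall(r"[a-z]+", sentence.lower())

def find_entities(tokens):
    """Greedy longest-match lookup of token n-grams in the codebook."""
    found, i = [], 0
    while i < len(tokens):
        for n in range(MAX_PHRASE_LEN, 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in CODEBOOK:
                found.append(CODEBOOK[phrase])
                i += n
                break
        else:  # no codebook entry starts at this token
            i += 1
    return found

def text_to_network(text):
    """Nodes: concept -> entity class. Edges: concept pair -> co-occurrence count."""
    nodes, edges = {}, Counter()
    for sentence in split_sentences(text):
        entities = find_entities(tokenize(sentence))
        nodes.update(entities)
        concepts = sorted({concept for concept, _ in entities})
        for a, b in combinations(concepts, 2):
            edges[(a, b)] += 1
    return nodes, edges

TEXT = ("Jane joined Acme Corp in Berlin. Acme buys trucks. "
        "Jane knows logistics and trucks.")
nodes, edges = text_to_network(TEXT)
print(nodes)
for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b} (weight {weight})")
```

Sentence-level co-occurrence is only one possible criterion; a stricter one might, for example, link two entities only when a verb from a relation codebook appears between them.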

You will also perform basic network analysis techniques, including:

  • Manipulate and visualize network data.
  • Compute basic network metrics at the graph and node levels.
  • Identify meaningful groups and clusters of nodes.
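As a minimal sketch of the graph- and node-level metrics mentioned above, the following computes density, normalized degree centrality and connected components for a small undirected graph stored as an adjacency dictionary. The edge list is hypothetical, self-loops are assumed absent, and connected components serve here as the simplest notion of "groups"; community detection methods such as modularity-based clustering go further.

```python
from collections import defaultdict

# Hypothetical undirected edge list (assumed free of self-loops),
# e.g. the output of a text-to-network step.
EDGES = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E")]

def adjacency(edges):
    """Build an undirected adjacency dict: node -> set of neighbors."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def density(adj):
    """Share of possible undirected edges that are present (graph level)."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge counted twice
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0

def degree_centrality(adj):
    """Degree divided by n - 1, the maximum possible degree (node level)."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

def connected_components(adj):
    """Group nodes that are reachable from one another (iterative DFS)."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adj[node] - group)
        seen |= group
        groups.append(group)
    return groups

adj = adjacency(EDGES)
print("density:", density(adj))  # 4 of 10 possible edges -> 0.4
print("degree centrality:", degree_centrality(adj))
print("groups:", connected_components(adj))
```

Dedicated network analysis tools provide these and many more metrics; writing them out once by hand mainly helps in interpreting what the numbers mean.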

Going from texts to networks involves principles and strategies from computer science that apply not only to the task at hand but to a wide range of problems. These principles and strategies are referred to as “Computational Thinking”: a basic skill, like reading, writing and arithmetic, that is crucial for solving problems and understanding human behavior across fields (Wing 2006). In this workshop, you will be introduced to Computational Thinking and will practice applying this way of thinking.

3. Who should attend?

This is an interdisciplinary and interactive workshop designed to benefit from the participation of attendees from different backgrounds. The material, exercises and mode of delivery are suitable for researchers and practitioners alike. No specific prior knowledge or computational skills are required. The delivery focuses on building an understanding of fundamental concepts and gaining hands-on experience with text analysis and network analysis methods and tools.

4. What to bring to the workshop?

Software: Prior to the workshop, we will email confirmed participants links to the software tools that we will use. You are invited to bring a laptop to the workshop. If you cannot bring one, you will still fully benefit from the workshop, as we project all live walk-through exercises on screen. At the workshop, we will provide you with a tutorial document and further learning resources.

Data: You can work with the sample data that we provide and/or bring your own data. If you bring your own text data, we recommend a sample of no more than 20 documents, each less than two pages long; for network data, we recommend no more than 200 nodes. The tools we use scale up to larger data sets, but large-scale data might not be practical for training purposes.

5. Readings

Prior to the workshop, we ask participants to read the following overviews of the concepts and methods addressed in the workshop (copies of both papers will be emailed to confirmed participants beforehand):

  • Diesner, J., Carley, K. M. (2011): Semantic Networks. In G. Barnett (Ed.), Encyclopedia of Social Networking (pp. 595-598). Sage Publications.
  • Diesner, J., Carley, K. M. (2011): Words and Networks. In G. Barnett (Ed.), Encyclopedia of Social Networking (pp. 958-961). Sage Publications.

All further readings are optional. The instructor is happy to point participants to further readings in their areas of interest.

6. Information about the instructor

Jana Diesner is an Assistant Professor at the iSchool (a.k.a. Graduate School of Library and Information Science) at the University of Illinois at Urbana-Champaign. Jana conducts research at the nexus of network science, natural language processing and machine learning. With her work, she aims to advance the understanding and computational analysis of the interplay and co-evolution of information and socio-technical networks. She develops and analyzes methods and technologies for extracting information about networks from text data and for taking the substance of information into account in network analysis. In her empirical work, she studies networks from the business, science and geopolitical domains. She is particularly interested in covert information and covert networks. Jana obtained her PhD from Carnegie Mellon University, School of Computer Science. She has taught the “Words to Networks” workshop 24 times at various institutions, and also teaches courses on Social Computing, Network Analysis and Digital Humanities. For more information about Jana, see http://people.lis.illinois.edu/~jdiesner/.

7. Questions?

Contact Jana with any questions about the workshop.

You have to register for the 7th International Research Workshop to participate in this course.