From Words to Networks: Information and Relation Extraction from Text Data and Analysis of Socio-technical Networks

Institution: see Organisers & Acknowledgements

Program of study: International Research Workshop

Lecturer: Jana Diesner (The iSchool at the University of Illinois at Urbana-Champaign)

Date:

01.10.2012, 14:00 – 17:30
02.10.2012, 14:00 – 17:30
04.10.2012, 14:00 – 17:30

Room: n.s.

Max. number of participants: 20

Semester periods per week: n.s.

Credit Points: 5 CP for participating in the whole IRWS

Language of instruction: English

Contents:

1. What is covered in the workshop, what will you learn?

In this workshop, you will learn how to extract information about socio-technical networks from unstructured, natural language text data, how to analyze network data, and how to use the results for practical purposes. We will discuss practical applications from academia, administration and business, such as answering substantive questions about online and offline networks, designing policies and interventions, and tracking trends and opinions.

Socio-technical networks represent interactions between people and organizations, infrastructures and information in systems such as corporations, countries and communities of practice. The functioning and dynamics of these networks involve the continuous production, processing and flow of information. This information is often available as text data and can serve as the sole source of information about a network or as a complement to other sources. Examples of data sources include communication data such as conversation transcripts and emails, newswire data, scientific information such as publications and patents, self-presentations such as mission statements and annual reports, and social media data such as blogs and wikis. Using text data to construct or enhance network data has helped people to answer questions like:

  • Who talks to whom, and about what?
  • What are the mental models of individuals or groups about certain topics?
  • How do memes and innovations emerge, spread, change and vanish in society?
  • Who are the key players in a network? What tasks, resources and knowledge are they linked to?
  • What benefits and risks result from an observed network structure for the network and its wider context?

In this workshop, you will learn how to do the following:

  • Put the network analysis process into action to answer questions and solve problems.
  • Extract relevant information and network data from text data in an informed, systematic and efficient fashion.
  • Use network analysis software to visualize and analyze network data, including grouping and simulating what-if scenarios.
  • Develop actionable interpretations of text analysis and network analysis results.
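
To give a rough idea of what analyzing a network and simulating a what-if scenario can look like, here is a minimal Python sketch. It is not the software used in the workshop; the edge list and all names in it are hypothetical, and the two measures shown (degree centrality and the number of connected groups after removing a node) are just one simple way to approach the "key players" question.

```python
from collections import defaultdict

# Hypothetical toy network: each pair is an undirected tie between two people.
EDGES = [("Ann", "Ben"), ("Ann", "Cem"), ("Ann", "Dana"), ("Dana", "Eli")]

def build_graph(edges):
    """Turn an edge list into an adjacency map (node -> set of neighbours)."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def degree_centrality(graph):
    """Share of the other nodes that each node is directly tied to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

def components(graph, removed=frozenset()):
    """Connected groups that remain when the nodes in `removed` are taken out."""
    seen, groups = set(removed), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            group.add(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        groups.append(group)
    return groups

graph = build_graph(EDGES)
centrality = degree_centrality(graph)
key_player = max(centrality, key=centrality.get)

# What-if scenario: how fragmented is the network without the key player?
groups_before = components(graph)
groups_after = components(graph, removed={key_player})
print(key_player, len(groups_before), len(groups_after))
```

In this toy example the most central node also holds the network together, so removing it splits the remaining nodes into several groups. In real data the two properties do not have to coincide.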

You will be introduced to a selected set of theories, concepts and methods from text analysis and network analysis, and you will gain practical, hands-on experience in working with text mining and network analysis software. You will apply basic natural language processing techniques on the lexical, syntactic and semantic levels, including:

  • Identification of key concepts and themes from single documents and text collections.
  • Creation and application of codebooks and thesauri.
  • Identification and classification of entities. We will move beyond the classic set of named entities to detect nodes from other categories that are also relevant for studying socio-technical networks, such as tasks, resources and knowledge.
  • Filtering and pre-processing techniques such as stemming, part-of-speech tagging, and N-gram detection.
  • Relation Extraction, i.e. distilling socio-technical networks from text data.
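
The steps listed above can be illustrated with a minimal Python sketch. It is not the method or software taught in the workshop: the thesaurus entries, the example text and the window size are all hypothetical, and linking concepts by proximity is only one simple way to approximate relation extraction. The sketch also ignores sentence boundaries to stay short.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical thesaurus: maps a term in the text to (node label, node category).
THESAURUS = {
    "alice": ("Alice", "agent"),
    "bob": ("Bob", "agent"),
    "budget": ("budget", "resource"),
    "audit": ("audit", "task"),
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def bigrams(tokens):
    """N-gram detection for N = 2: all pairs of adjacent tokens."""
    return list(zip(tokens, tokens[1:]))

def extract_edges(text, window=5):
    """Link two thesaurus concepts whenever they occur fewer than `window` tokens apart."""
    tokens = tokenize(text)
    hits = [(i, THESAURUS[t][0]) for i, t in enumerate(tokens) if t in THESAURUS]
    edges = Counter()
    for (i, a), (j, b) in combinations(hits, 2):
        if a != b and j - i < window:
            edges[tuple(sorted((a, b)))] += 1
    return edges

text = "Alice asked Bob to finish the audit. Bob needs the budget."
edges = extract_edges(text)
for (a, b), weight in sorted(edges.items()):
    print(a, "-", b, "weight", weight)
```

The result is a weighted edge list that connects agents with each other and with tasks and resources, which is the kind of multi-mode network data that network analysis software can then visualize and analyze.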

Going from texts to networks involves some basic principles and strategies originating from computer science that are applicable not only to the task at hand, but also to a wide range of other problems. These principles and strategies are referred to as Computational Thinking, a basic skill like reading, writing and arithmetic that is essential for solving problems and understanding human behavior (Wing 2006). You will be introduced to the concept of Computational Thinking and learn how to apply this way of thinking to the problems addressed in the workshop.

Summary of learning goals:

1. Information and Relation Extraction: Gain theoretical, methodological and practical experience in distilling relevant information and network data from text data. Learn how this process can be used for practical purposes in academia and business.

2. Network Analysis: Gain theoretical, methodological and practical experience in visualizing, analyzing and interpreting network data. Learn how the results can be used for practical purposes in academia and business.

3. Computational Thinking: Be introduced to a fundamental approach to problem solving and actively apply your expertise.

2. Schedule

Day 1 – 01.10.2012:

Introduction:

  • Analysis of socio-technical networks
  • Semantic Networks and Information Networks
  • Intersection of Text Analysis and Network Analysis

Hands-on training:

  • Information Extraction
  • Relation Extraction
  • Network Visualization

Day 2 – 02.10.2012:

Hands-on training:

  • Network Analysis
  • Node-level and graph-level analysis
  • Grouping
  • Simulation of what-if scenarios
  • One-mode and multi-mode networks

Introduction and hands-on exercises:

  • Interpretation and actionable use of results
  • Evaluation of data and results
  • Data privacy and data security issues related to network analysis

3. Who should attend?

This is an interdisciplinary and interactive workshop designed to benefit from the participation of people from different backgrounds. The material, exercises and mode of delivery are suitable for researchers and practitioners alike. No specific prior knowledge or computational skills are required. The delivery is geared towards forming an understanding of fundamental concepts and gaining hands-on experience with relational data analysis methods and tools.

We will email you additional material prior to the workshop.

4. What to bring to the workshop?

Software: The software used in this workshop runs on Windows. Please download and install the following tools prior to the workshop:

Data: You can work with the sample data that we provide for the workshop and/or bring your own data. We provide two small sample datasets: one with email data and one with news articles. If you bring your own text data, we recommend a sample of no more than 20 texts of no more than 5 pages each. If you bring your own network data, we recommend data with no more than 200 nodes. The tools we use scale up to larger datasets, but those might not be practical for training purposes.

5. Readings

For everybody:

Overview of the concepts and methods addressed in the workshop:

Diesner, J., & Carley, K. M. (2010). Extraktion relationaler Daten aus Texten. In C. Stegbauer & R. Häußling (Eds.), Handbuch Netzwerkforschung (pp. 507-524). VS Verlag. (We will email registered participants a copy.)

For more specialized information:

New to Information Extraction/text mining?

McCallum, A. (2005). Information extraction: distilling structured data from unstructured text. ACM Queue, 3(9), 48-57. URL: http://www.cs.umass.edu/~mccallum/papers/acm-queue-ie.pdf

New to Social Network Analysis?

Hanneman, R. A., & Riddle, M. (2005). Introduction to social network methods. Riverside, CA: University of California. URL: http://www.faculty.ucr.edu/~hanneman/nettext/

Easley, D., & Kleinberg, J. (2010). Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press. URL: http://www.cs.cornell.edu/home/kleinber/networks-book/

New to Computational Thinking?

Wing, J. M. (2006). Computational thinking. Communications of the ACM, 49(3), 33-35. URL: http://www.cs.cmu.edu/afs/cs/usr/wing/www/publications/Wing06.pdf

6. Information about the instructor

Jana Diesner is an Assistant Professor at the University of Illinois at Urbana-Champaign, The iSchool/Graduate School of Library and Information Science. She conducts research at the nexus of machine learning, natural language processing and network analysis. She develops and analyzes methods and technologies for extracting information about networks from text data and for taking the content of information into account in network analysis. Her goal is to contribute to a better understanding and rigorous computational analysis of the interplay and co-evolution of information and the structure and functioning of socio-technical networks. In her empirical work, Jana studies networks from the geopolitical, business and science domains. She is particularly interested in covert information and covert networks.

7. Questions?

Contact Jana with any questions about the course at jdiesner@illinois.edu.

You have to register for the 6th International Research Workshop to participate in this course.