
From Words to Networks – Information and Relation Extraction from Text Data and Network Analysis

Institution: see Organisers & Acknowledgements

Program of study: International Research Workshop

Lecturer: Jana Diesner, PhD, Assistant Professor at the iSchool/Graduate School of Library and Information Science (GSLIS), University of Illinois at Urbana-Champaign (UIUC)

Date:

30.09.2013, 14:00 – 17:30
01.10.2013, 14:00 – 17:30
02.10.2013, 14:00 – 17:30

Room: n.s.

Max. number of participants: 20

Semester periods per week: n.s.

Credit Points: 5 CP for participating in the whole IRWS

Language of instruction: English

Contents:

1. What is covered in the workshop? What will you learn?

This interdisciplinary workshop introduces you to selected fundamental theories, concepts, methods and applications for bringing together text analysis and network analysis. You will learn how to conduct data analysis at the nexus of these areas in an informed, systematic and efficient fashion, and how to:

  • Construct semantic networks and socio-technical networks from unstructured, natural language text data.
  • Visualize and analyze network data.
  • Interpret network analysis results.

Throughout the workshop, we will discuss practical applications from the academic, administrative and business domains. By the end of the workshop, you will be able to design and conduct research projects for scholarly and commercial use in these fields.

Semantic networks are structured representations of information and knowledge. Socio-technical networks represent interactions between social agents, infrastructures and information. The functioning and dynamics of these networks involve the continuous production, processing and flow of information. This information is often available as text data and can serve as a single or complementary source of information about networks. Examples of data sources include newswire data, scientific information such as publications and patents, communication data such as conversation transcripts and emails, self-presentations such as mission statements and annual reports, and social media data such as tweets and wikis. Text data has been used to construct or enhance network data in order to answer questions such as:

  • Who is talking to whom, and about what?
  • What are the mental models of individuals or groups about certain topics?
  • How do memes and innovations emerge and spread in society and online?
  • Who are the key entities in a network?
  • What benefits and risks result from an observed network structure for an organization and its wider context?

The main goal of this workshop is to teach you practical, hands-on skills in working with text analysis and network analysis tools. You will apply basic natural language processing techniques at the lexical, syntactic and semantic levels, including:

  • Pre-process texts with techniques such as reference resolution, stemming and part-of-speech tagging.
  • Identify salient concepts and themes from single documents and entire text collections.
  • Create and apply codebooks, which are also known as dictionaries or thesauri.
  • Locate and classify entities that can serve as nodes for networks. We will move beyond the classic set of entity classes (people, organizations, locations) to also consider other classes that are relevant for studying social processes and culture, e.g. tasks, resources and knowledge.
  • Extract relations, i.e. link entities into edges based on various criteria.
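To make the path from text to network more concrete, here is a minimal sketch in plain Python. It assumes a tiny hypothetical codebook that maps surface forms to node labels and entity classes, uses simple dictionary lookup in place of full entity recognition, and links two entities whenever they appear in the same sentence (co-occurrence), which is only one of the many possible criteria for relation extraction. The names `CODEBOOK`, `find_entities` and `cooccurrence_network` are illustrative and are not part of the software used in the workshop.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical codebook: lowercase surface form -> (node label, entity class).
CODEBOOK = {
    "alice": ("Alice", "agent"),
    "bob": ("Bob", "agent"),
    "acme corp": ("Acme Corp", "organization"),
    "budget": ("Budget", "resource"),
    "report": ("Report", "task"),
}

def find_entities(sentence):
    """Return {label: class} for codebook entries found in one sentence.

    Two-word entries are checked before single words, so "acme corp"
    is matched as one entity."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    found, i = {}, 0
    while i < len(tokens):
        bigram = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and bigram in CODEBOOK:
            label, cls = CODEBOOK[bigram]
            i += 2
        elif tokens[i] in CODEBOOK:
            label, cls = CODEBOOK[tokens[i]]
            i += 1
        else:
            i += 1
            continue
        found[label] = cls
    return found

def cooccurrence_network(text):
    """Collect classified nodes and link entities that share a sentence.

    Edge weights count how many sentences a pair of entities shares."""
    nodes, edges = {}, Counter()
    for sentence in re.split(r"[.!?]+", text):
        entities = find_entities(sentence)
        nodes.update(entities)
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return nodes, edges

text = ("Alice sent the report to Bob. "
        "Bob discussed the budget with Acme Corp. Alice and Bob met.")
nodes, edges = cooccurrence_network(text)
print(nodes)
print(edges.most_common(1))  # → [(('Alice', 'Bob'), 2)]
```

Sentence-level co-occurrence is easy to explain but noisy; windows of a fixed number of words, or syntactic patterns, are common alternatives.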

You will also perform basic network analysis techniques, including:

  • Manipulate and visualize network data.
  • Compute basic network metrics on the graph and node level.
  • Identify meaningful groups and clusters of nodes.
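A few of the basic metrics listed above can be sketched with the Python standard library alone. For an undirected network with n nodes and m edges, density is 2m / (n(n - 1)), and normalized degree centrality divides each node's degree by n - 1. Connected components are used here as the simplest form of grouping; the workshop tools may offer other clustering methods. The toy edge list and function names are hypothetical.

```python
from collections import deque

def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def density(graph):
    """Share of all possible undirected edges that are present."""
    n = len(graph)
    if n < 2:
        return 0.0
    m = sum(len(nbrs) for nbrs in graph.values()) / 2
    return m / (n * (n - 1) / 2)

def degree_centrality(graph):
    """Node degree divided by the maximum possible degree, n - 1."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

def connected_components(graph):
    """Group nodes that can reach one another (breadth-first search)."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.add(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        groups.append(group)
    return groups

edges = [("Alice", "Bob"), ("Alice", "Report"), ("Bob", "Report"),
         ("Carol", "Dave")]
g = build_graph(edges)
print(density(g))                     # → 0.4
print(degree_centrality(g)["Alice"])  # → 0.5
print(connected_components(g))
```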

Going from texts to networks involves some principles and strategies originating from computer science that are applicable not only to the task at hand but also to a wide range of problems. These principles and strategies are referred to as "Computational Thinking" – a basic skill like reading, writing and arithmetic that is crucial for solving problems and understanding human behavior across fields (Wing 2006). In this workshop, you will be introduced to Computational Thinking and practice applying this way of thinking.

3. Who should attend?

This is an interdisciplinary and interactive workshop designed to benefit from the participation of attendees from different backgrounds. The material, exercises and mode of delivery are suitable for researchers and practitioners alike. No specific prior knowledge or computational skills are required. The delivery focuses on building an understanding of fundamental concepts and gaining hands-on experience with text analysis and network analysis methods and tools.

4. What to bring to the workshop?

Software: Prior to the workshop, we will send confirmed participants an email with links to the software tools that we will use. You are invited to bring a laptop to the workshop. If you cannot bring a laptop, you will still fully benefit from the workshop, as we project all live walk-through exercises on screen. At the workshop, we will provide you with a tutorial document and further learning resources.

Data: You can work with the sample data that we provide and/or bring your own data. If you bring your own text data, we recommend a sample of no more than 20 text documents, each less than two pages long. If you bring your own network data, we recommend no more than 200 nodes. The tools we use scale up to larger data sets, but large-scale data might not be practical for training purposes.

5. Readings

Prior to the workshop, we ask participants to read the following overviews of the concepts and methods addressed in the workshop (copies of both papers will be emailed to confirmed participants in advance):

  • Diesner, J., & Carley, K. M. (2011). Semantic Networks. In G. Barnett (Ed.), Encyclopedia of Social Networking (pp. 595-598). Sage Publications.
  • Diesner, J., & Carley, K. M. (2011). Words and Networks. In G. Barnett (Ed.), Encyclopedia of Social Networking (pp. 958-961). Sage Publications.

All further readings are optional.

The instructor is available to point participants to further readings in their areas of interest.

6. Information about the instructor

Jana Diesner is an Assistant Professor at the iSchool (a.k.a. Graduate School of Library and Information Science) at the University of Illinois at Urbana-Champaign. Jana conducts research at the nexus of network science, natural language processing and machine learning. With her work, she aims to advance the understanding and computational analysis of the interplay and co-evolution of information and socio-technical networks. She develops and analyzes methods and technologies for extracting information about networks from text data and for considering the substance of information in network analysis. In her empirical work, she studies networks from the business, science and geopolitical domains. She is particularly interested in covert information and covert networks. Jana obtained her PhD from Carnegie Mellon University, School of Computer Science. She has taught the "Words to Networks" workshop 24 times at various institutions, and also teaches courses on Social Computing, Network Analysis and Digital Humanities. For more information about Jana see http://people.lis.illinois.edu/~jdiesner/.

7. Questions?

Contact Jana with any questions about the workshop.

You have to register for the 7th International Research Workshop to participate in this course.

From Words to Networks: Information and Relation Extraction from Text Data and Analysis of Socio-technical Networks

Institution: see Organisers & Acknowledgements

Program of study: International Research Workshop

Lecturer: Jana Diesner (The iSchool at the University of Illinois at Urbana-Champaign)

Date:

01.10.2012, 14:00 – 17:30
02.10.2012, 14:00 – 17:30
04.10.2012, 14:00 – 17:30

Room: n.s.

Max. number of participants: 20

Semester periods per week: n.s.

Credit Points: 5 CP for participating in the whole IRWS

Language of instruction: English

Contents:

1. What is covered in the workshop? What will you learn?

In this workshop, you will learn how to extract information about socio-technical networks from unstructured, natural language text data, how to analyze network data, and how to use the results for practical purposes. We will discuss practical applications from academia, administration and business, such as answering substantive questions about online and offline networks, designing policies and interventions, and tracking trends and opinions.

Socio-technical networks represent interactions between people and organizations, infrastructures and information in systems such as corporations, countries and communities of practice. The functioning and dynamics of these networks involve the continuous production, processing and flow of information. This information is often available as text data and can serve as a single or complementary source of information about networks. Examples of data sources include communication data such as conversation transcripts and emails, newswire data, scientific information such as publications and patents, self-presentations such as mission statements and annual reports, and social media data such as blogs and wikis. Constructing or enhancing network data with text data has helped people answer questions such as:

  • Who talks to whom, and about what?
  • What are the mental models of individuals or groups about certain topics?
  • How do memes and innovations emerge, spread, change and vanish in society?
  • Who are the key players in a network? What tasks, resources and knowledge are they linked to?
  • What benefits and risks result from an observed network structure for the network and its wider context?

In this workshop, you will learn how to do the following:

  • Put the network analysis process into action to answer questions and solve problems.
  • Extract relevant information and network data from text data in an informed, systematic and efficient fashion.
  • Use network analysis software to visualize and analyze network data, including grouping and simulating what-if scenarios.
  • Develop actionable interpretations of text analysis and network analysis results.
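One simple kind of what-if scenario is asking how a network would fragment if a node were removed, for example if a key person left an organization. The following is a minimal standard-library sketch under that assumption; it is not how the workshop software necessarily runs simulations, and the toy network and function names are hypothetical.

```python
from collections import deque

def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def without_node(graph, removed):
    """Return a copy of the graph with one node and all its edges deleted."""
    return {node: nbrs - {removed}
            for node, nbrs in graph.items() if node != removed}

def count_components(graph):
    """Count groups of mutually reachable nodes using breadth-first search."""
    seen, count = set(), 0
    for start in graph:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
    return count

def removal_scenarios(graph):
    """What-if analysis: number of components left after removing each node."""
    return {node: count_components(without_node(graph, node)) for node in graph}

# Hypothetical toy network in which Ben links otherwise separate actors.
g = build_graph([("Ann", "Ben"), ("Ben", "Cem"), ("Ben", "Dia"), ("Dia", "Eva")])
print(count_components(g))          # → 1
print(removal_scenarios(g)["Ben"])  # → 3
```

Nodes whose removal splits the network into many components are candidates for brokers or single points of failure, which is one way such a scenario can feed into an actionable interpretation.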

You are introduced to a selected set of theories, concepts and methods from text analysis and network analysis. You gain practical, hands-on experience in working with text mining and network analysis software. You will apply basic natural language processing techniques at the lexical, syntactic and semantic levels, including:

  • Identification of key concepts and themes from single documents and text collections.
  • Creation and application of codebooks and thesauri.
  • Identification and classification of entities. We will move beyond the classic set of named entities to facilitate the detection of nodes that represent categories that are also relevant for studying socio-technical networks, such as tasks, resources and knowledge.
  • Filtering and pre-processing techniques such as stemming, part-of-speech tagging, and N-gram detection.
  • Relation Extraction, i.e. distilling socio-technical networks from text data.
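N-gram detection, one of the pre-processing steps listed above, can be approximated by counting adjacent word pairs (bigrams) across a text collection and keeping the frequent ones. This sketch assumes a tiny illustrative stop-word list and a hypothetical `min_count` threshold; dedicated tools often rank candidate N-grams with statistical association measures instead of raw counts.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "for", "with"}

def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def frequent_bigrams(texts, min_count=2):
    """Count adjacent word pairs across texts, skipping pairs that contain
    a stop word, and keep pairs seen at least min_count times."""
    counts = Counter()
    for text in texts:
        words = tokenize(text)
        for w1, w2 in zip(words, words[1:]):
            if w1 not in STOP_WORDS and w2 not in STOP_WORDS:
                counts[(w1, w2)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}

texts = [
    "The social network of the board changed.",
    "Analysts studied the social network and the annual report.",
    "The annual report mentions a new social network strategy.",
]
print(frequent_bigrams(texts))
# → {('social', 'network'): 3, ('annual', 'report'): 2}
```

Detected bigrams such as "social network" can then be treated as single concepts, so they become one node rather than two in the resulting network.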

Going from texts to networks involves some basic principles and strategies originating from computer science that are applicable not only to the task at hand but also to a wide range of problems. These principles and strategies are referred to as Computational Thinking – a basic skill like reading, writing and arithmetic that is essential for solving problems and understanding human behavior (Wing 2006). You are introduced to the concept of Computational Thinking and learn how to apply this way of thinking to the problems addressed in the workshop.

Summary of learning goals:

1. Information and Relation Extraction: Gain theoretical, methodological and practical experience in distilling relevant information and network data from text data. Learn how this process can be used for practical purposes in academia and business.

2. Network Analysis: Gain theoretical, methodological and practical experience in visualizing, analyzing and interpreting network data. Learn how the results can be used for practical purposes in academia and business.

3. Computational Thinking: Be introduced to a fundamental approach to problem solving and actively apply your expertise.

2. Schedule

Day 1 – 01.10.2012:

Introduction:

  • Analysis of socio-technical networks
  • Semantic Networks and Information Networks
  • Intersection of Text Analysis and Network Analysis

Hands-on training:

  • Information Extraction
  • Relation Extraction
  • Network Visualization

Day 2 – 02.10.2012:

Hands-on training:

  • Network Analysis
  • Node-level and graph-level analysis
  • Grouping
  • Simulation of what-if scenarios
  • One-mode and multi-mode networks

Introduction and hands-on exercises:

  • Interpretation and actionable use of results
  • Evaluation of data and results
  • Data privacy and data security issues related to network analysis

3. Who should attend?

This is an interdisciplinary and interactive workshop designed to benefit from the participation of people from different backgrounds. The material, exercises and mode of delivery are suitable for researchers and practitioners alike. No specific prior knowledge or computational skills are required. The delivery focuses on building an understanding of fundamental concepts and gaining hands-on experience with relational data analysis methods and tools.

We will email you additional material prior to the workshop.

4. What to bring to the workshop?

Software: The software used in this workshop runs on Windows. Please download and install the following tools prior to the workshop:

Data: You can work with the sample data that we provide for the workshop and/or bring your own data. We provide two small sample datasets: one with email data and one with news articles. If you bring your own text data, we recommend a sample of no more than 20 texts of at most 5 pages each. If you bring your own network data, we recommend data with no more than 200 nodes. The tools we use scale up to larger data sets, but such data sets might not be practical for training purposes.

5. Readings

For everybody:

Overview on the concepts and methods addressed in the workshop:

Diesner, J., & Carley, K. M. (2010). Extraktion relationaler Daten aus Texten [Extraction of relational data from texts]. In C. Stegbauer & R. Häußling (Eds.), Handbuch Netzwerkforschung [Handbook of network research] (pp. 507-524). VS Verlag. (We will email registered participants a copy.)

For more specialized information:

New to Information Extraction/text mining?

McCallum, A. (2005). Information extraction: distilling structured data from unstructured text. ACM Queue, 3(9), 48-57. URL: http://www.cs.umass.edu/~mccallum/papers/acm-queue-ie.pdf

New to Social Network Analysis?

Hanneman, R. A., & Riddle, M. (2005). Introduction to social network methods. Riverside, CA: University of California. URL: http://www.faculty.ucr.edu/~hanneman/nettext/

Easley, D. & Kleinberg, J. (2010). Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press. URL: http://www.cs.cornell.edu/home/kleinber/networks-book/

New to Computational Thinking?

Wing, J. M. (2006). Computational thinking. Communications of the ACM, 49(3), 33-35. URL: http://www.cs.cmu.edu/afs/cs/usr/wing/www/publications/Wing06.pdf

6. Information about the instructor

Jana Diesner is an Assistant Professor at the University of Illinois at Urbana-Champaign, The iSchool/Graduate School of Library and Information Science. She conducts research at the nexus of machine learning, natural language processing and network analysis. She develops and analyzes methods and technologies for extracting information about networks from text data and for considering the content of information in network analysis. Her goal is to contribute to a better understanding and rigorous computational analysis of the interplay and co-evolution of information and the structure and functioning of socio-technical networks. In her empirical work, Jana studies networks from the geopolitical, business and science domains. She is particularly interested in covert information and covert networks.

7. Questions?

Contact Jana with any questions about the course at jdiesner@illinois.edu.

You have to register for the 6th International Research Workshop to participate in this course.