Introduction to Networks
Background
This course contains multiple exercises that address different
aspects of complex networks. These are:
- Introduction to Networks (below):
a prerequisite to the remaining exercises, introducing basic aspects
of network construction and traversal.
- Small World Networks:
explores the statistics of path lengths in random networks, and the
importance of a small number of long-range shortcuts in creating
small worlds.
- Percolation:
explores the connectivity of random networks on regular lattices,
and the universality of critical behavior near the percolation
transition.
- Exploration of network growth, structure, etc.
Introduction to Networks
This is the first of several exercises on networks. A network (or
graph, as it is often referred to in mathematics) is a
data structure in which nodes are connected by edges, which provides
a very general concept that plays a role in many scientific
problems. This first exercise is focused on defining
networks and on implementing algorithms to compute some of their properties.
As such, it lays the groundwork for particular
scientific applications in subsequent exercises, aimed at analyzing
"small worlds" and percolation in networks.
In particular, in this exercise we will generate some random networks
and calculate the distribution of distances (path lengths) between
pairs of nodes. We will study small world networks, a
theoretical model that suggests how a small number of shortcuts (e.g.,
unusual international and intercultural friendships in a social
network) can dramatically shorten the tyical path lengths. Finally
we will study how a simple, universal scaling behavior emerges for
large networks with few shortcuts.
Learning Goals
Science: You will learn what a graph/network is.
This exercise in largely about
building infrastructure for use in more interesting scientific problems,
such as the study of small worlds and percolation.
Computation:
You will learn some basics of object-oriented programming: what a
class is, how to use and manipulate them, and how they capture
abstract relationships among data.
Procedure
This introductory exercise is focused on defining
undirected graphs/networks.
The UndirectedGraph Class
- Create a new directory
- Download the following files:
- Hints file for
general network algorithms
-
Network graphics software
- NOTE: If you encounter errors regarding fonts in the
Network Graphics software, download this file instead and save it as NetGraphics.py.
- NOTE: If you are running these programs on your own machine
(i.e, not on the course lab machines), you will also need to download
font file Vera.ttf and save it
in the same directory that the Network graphics software is installed.
- If you are not familiar with graphs and networks, read
Jon Kleinberg and David Easley,
"Handout on graph theory", Networks:
Economics 204 / Sociology 209 / Information Science 204.
- Rename NetworkHints.py as
Networks.py, and open it in a text editor (kate or emacs -- we
recommend against using kedit). In this file, you will notice Python
code that has already been written, but it mostly consists of hints to
help you flesh out the code, i.e., comments
(lines that begin with #) and documentation strings (material enclosed in
triple quotes """ that document what each module, class, and function is
about and can be queried with the Python help() function).
You will also
notice the keyword pass embedded within each function body: pass is a
legal Python expression that means "do nothing". The first thing you will
do when you start to define a function is to remove
the pass statement and replace it with new code.
- Open a terminal window, move to the correct directory and start ipython
- Define a class called UndirectedGraph, which will create
objects that represents graphs.
We suggest that you store the
core information about the graph in a dictionary named connections.
The keys in the dictionary will be the nodes of the graph, the
values associated with those keys will be the nodes that are
connected to that one. [If you are unfamilliar with python dictionaries,
we recommend the Random Text
exercise and NanoTutorial 1.]
- Define the following Method for your class.
- __init__
- This is the method or
"subprogram" that is run everytime you
create a new instance of the class. For this class, the initialization
routine should take one object (called self) which represents the
instance. We want this initialization routine to create the attribute
self.connections, and set it equal to the empty dictionary {} .
- Test your routine by saving your "Networks.py", and
typing in your ipython session %run Networks.py.
Create an instance of
your graph class by typing testgraph1=UndirectedGraph().
- Type testgraph1.connections to see if your
initialization worked right. testgraph1
is now an instance of your class, although at this point it is a
rather uninteresting instance, since it only contains an initializer
that creates an empty dictionary.
- Go back to your text editor and complete your
UndirectedGraph class by defining the following methods.
Following each definition test the routine. This strategy of testing
every routine immediately after writing it is the key to rapid software
development. Ideally you should never have more than one routine which
you are not certain works correctly. The hints file gives a good
description of what each of the methods should do.
- HasNode
- This is hard to test fully
until you have defined AddNode. Without any nodes added,
however, HasNode
should return False for any node that is queried.
- AddNode
- Test by typing testgraph1.AddNode(1),
then look at the dictionary by typing testgraph1.connections
(or, if you enjoy extra typing, print testgraph1.connections).
The command testgraph1.HasNode(1) should now be True while
testgraph1.HasNode(2) should be False.
It should be clear how to test the rest of the methods.
- AddEdge
- GetNodes
- GetNeighbors
- Next it would be nice to visualize the graph. We have
written some visualization software. To see a graphical representation of
your test graph type: NetGraphics.DisplayCircleGraph(testgraph1).
-
Here we've introduced the notion of accessing a function from another
module. See the notes below to learn more about
modules and namespaces.
DisplayCircleGraph
will work on any object which has the following methods:
GetNodes, GetNeighbors.
It doesn't care about the internal implementation.
- For further testing, your "Networks.py" file contains
a predefined function called pentagraph. Type
testgraph2=pentagraph() to run the function and assign the
output to a new object called testgraph2.
This should create a graph
and display a figure shaped like a pentagram.
- You now have a working class, and we can start to use it for
something. Proceed to either the
Small Worlds Exercise or the
Percolation Exercise.
Useful Constructions/Syntax
Science Concepts
Files
References
- Duncan Watts and Steven Strogatz,
Collective Dynamics of "small world" Networks
- Jon Kleinberg and David Easley,
"Handout on graph theory", Networks:
Economics 204 / Sociology 209 / Information Science 204
- Jon Kleinberg and David Easley,
"Handout on the small-world
phenomenon",
Networks: Economics 204 / Sociology 209 / Information Science 204
- Mark Newman,
The structure and function of complex networks
- Mark Newman,
Models of the Small World
- Mark Newman,
Scientific collaboration networks II: shortest paths, weighted
networks, and centrality.
- Mark Newman and Duncan Watts,
Renormalization--group
analysis of the small world network model
- Michelle Girvan and Mark Newman,
Community
structure in social and biological networks
- Yook, Oltvai, and Barabasi,
Comparing Different Protein Interaction Networks
Links
Notes
Networks and graphs come in many flavors. Undirected graphs involve
edges that have no intrinsic directionality, i.e., the two nodes
connected by the edge are symmetric. In directed graphs, edges have
directionality: the edge points from one node to the other,
introducing an asymmetry. We will be studying edges without weights
(i.e., all edges are equivalent), but in many important problems, edges have
weights. In this exercise, we're using the
specific name UndirectedGraph to be clear and avoid confusion.
- Objects represent as dramatic a conceptual leap in programming as
functions. A good way to think about it is that functions are verbs,
and objects are nouns. Functions are designed so that all you need to
know about them is what the form of the input is and what the form of
the output is -- you do not need to know what the implementation is to
use a function. Well, objects are the same way. An object has some
attributes which allows it to represent something useful (such as a
graph). The object carries with it some methods of accessing and
changing its properties (these are functions called "methods"). As
long as you only talk to the object through those methods, you do not
need to know anything about the internal implementation.
A class can either be thought of as a template for an object, or
perhapse more concretely as a function which creates an object. When
you call a class you create an instance of the class. This instance
is the object.
A quick introduction to classes can be found in
NanoTutorial 1. In particular,
if you are confused about the syntax of method (function) definitions in
classes and the seemingly mysterious self variable, you should reach
that NanoTutorial or the textbook to find out more.
- We use a class to represent our graphs, since this
lets us manipulate our computer representation of the graph just as we
think about manipulating real graphs. A network has nodes and edges.
An UndirectedGraph
object should carry with it the information needed to tell you what nodes
and edges it has. It should also provide tools for adding nodes and edges.
- Encaspulation is an important concept in object-oriented
programming, and in flexible software design generally.
The idea with objects is that the user of the class should
not need to know how things are implemented. As the programer, however,
you have to choose some implementation.
- A class is used to define the space of all possible
networks (UndirectedGraphs),
while a network instance (or object) is a
concrete entity that represents a particular network.
The methods of the network class are
the public face of the object. If you want to know what nodes the
object testgraph1 contains
you call the method GetNodes by typing
testgraph1.GetNodes().
If you whant to know if there is a node
labeled by the numeral 1,
you would type testgraph1.HasNode(1). In
cleanly written object-oriented programs, objects should always
be manipulated through their methods, and the external interface embodied
in the methods should be
independent of the internal representation of data. For example, you
could have written your network object so that the neighbors were
stored in an adjacency matrix (table).
The matrix would have as many columns and
rows as the network has nodes, and it would have 1's and 0's representing
which nodes were connected and which were not. Such a representation
of the network should have the exact same methods. The user of the
class should not see a difference between the behavior of the two
objects.
- By using this idea of encapsulation you can change
the implementation at a later time without changing the rest of your program.
- Another advantage of this approach is that suppose
at some later time you work with a related object, say a "directed graph"
(which means that the edges have directions, pointing from a source node
to a destination node). This new object will have many of the same methods.
Any tools that you write which interface with the object by using the
common methods should work without any change.
- [At some later point in your programing career
you can learn about "inheritance" which lets you reuse methods written
for one object in another object.]
A module is a way of grouping together a set of classes and functions
that are used for a particular purpose. For example, in this exercise
you generate a networks module that has functions and classes used for
manipulating networks. In python, you load a module with the command
import, although the ipython interpreter provides the shortcut
function %run that imports everything in the specified file into
the __main__ namespace.
Namespaces are a mechanism for organizing the names of variables
(objects, functions, classes, etc.), so as to avoid giving two
variables/functions/classes/objects the same name. Suppose you are
writing a program and you defined a function f. Suppose
later you load the usefulstuff module (by typing import
usefulstuff) and usefulstuff happens to also define a
function f. There could be a conflict here. You don't want
the f in usefulstuff to overwrite the f
that you defined. The way
python avoids this problem is that the two f's are in different
namespaces. To access the usefulstuff function "f", you must type
usefulstuff.f. You can think of the .
in the same way you think
of "/" when you are navigating directories. You can change the
namespace that you import a module into by typing import
usefulstuff as u -- which would put usefulstuff in the namespace
u. This saves typing, but in the end can lead to more confusion.
(You can also circumvent the protection that namespaces provide by
typing from usefulstuff import *, in which case everything
defined in usefulstuff will be imported into the space doing
the importing, which could result in your previously defined functions
being overwritten.)
Note that the syntax for calling the display function contained in
the NetGraphics module is
very similar to the syntax
used to call methods of a class. This similarity is intentional:
NetGraphics is a module, DisplayCircleGraph
is a function inside that module, and the "dot" operator . is
used to access the function within the module's namespace, just as
the dot operator is used to access a method within an object's namespace.
As an aside, one could add to UndirectedGraph
a method called Display,
so that if you typed testgraph1.Display() it would call
DisplayCircleGraph, creating a picture of the graph.
One would say that the object knows how to display itself.
In certain contexts this might be what you want. There are, however,
several downsides:
- Conceptually, displaying a graph is a complicated
process. For example, you need to decide how to lay out the points.
It is possible that in the future some user of your network class will
want to play with different ways of laying out the points. To the extent
that a network is usually only concerned with topology (how things are
connected) and not geometry (where things are), requiring this layout
information to be inside the graph object is undesirable. [On the other
hand, one might at a later time create a subclass which knows about
geometry, and that subclass might want to know how to display itself --
or perhaps not -- the display function could also simply ask the object
for the geometry and use it.]
- It makes for a clean program to have one object and
feed it into different display functions.
- You may be programing on different platforms -- s
ay one runs X-windows, and another runs Microsoft Windows. You may
need to use different display functions on the different machines.
- By separating the object and how it is displayed,
you are more likely to produce reusable code.
Many object-oriented languages have a syntactic notion of privacy:
data in a class that are declared private, for example, are incapable of being
modified from outside the class.
Python does not contain such constructs, generally relying instead on
programmers' recognizing the value of encapsulating internal data and
working instead through method interfaces.
Although we have created
the pentagraph using the externally defined methods like AddEdge,
we could instead break the encapsulation of internal
data and manipulate the connections dictionary directly.
(If you're feeling adventurous, go ahead and try it: after
creating the graph, now type testgraph2.connections[2].append(3).
Any idea what this will do? Run
NetGraphics.DisplayCircleGraph(testgraph2) to see. Looks can
be deceiving, though. Even though node 3 is now a neighbor of node 2,
is the converse true? Use GetNeighbors to check.)
Breaking the data encapsulation
makes for a fragile program. In scientific computing
this may not be a bad thing. If you are only going to use a function once,
then it might not be worth the time to figure out how to better
design things. It is a good exercise to think of alternative ways to
implement this
function -- for example, you could add a SetNeighborDict method to
write the entire neighbor dictionary by hand, or a SetNeighbors(node)
to set the neighbors of a given node.
Here is a useful construction that you might want to learn.
Create a "tuple" of nodes: newnodes=(4,5).
Suppose you wanted to add these nodes (and an edge between them) to
testgraph1. The obvious way is to type
testgraph1.AddEdge(newnodes[0],newnodes[1]).
The shorter way is testgraph1.AddEdge(*newnodes).
The asterix essentially "strips" the elements of the tuple so you can
apply the function to them; that is, a function that accepts N
arguments can be fed instead a tuple containing N elements.
(This is equivalent to the Apply function in Mathematica.)
James P. Sethna,
Christopher R. Myers,
Erich J. Mueller.
Last modified: August 25, 2008