From Microsoft Research:
We present a study of anonymized data capturing high-level communication activities within the Microsoft Instant Messenger network. We analyze properties of the communication network defined by user interactions and demographics, as reported and as derived from one month of data collected in June 2006. The compressed dataset occupies 4.5 terabytes, compiled from 1 billion conversations per day (150 gigabytes per day) over one month of logging. The dataset contains more than 30 billion conversations among 240 million people. We focus on analyses of high-level characteristics and patterns that emerge from the collective dynamics of large numbers of people, rather than the actions and characteristics of individuals. Analyses center on numbers and durations of conversations; the content of communications was neither available nor pursued. From the data we construct a communication graph with 190 million nodes and 1.3 billion undirected edges. We find that the graph is well connected, with an effective diameter of 7.8, and is highly clustered, with a clustering coefficient that decays slowly with node degree, with exponent -0.4. We also find strong influences of homophily in activities, where people with similar characteristics overall tend to communicate more with one another, with the exception of gender, where we find cross-gender conversations are both more frequent and of longer duration than same-gender conversations.
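To make the two graph statistics in the abstract concrete, here is a minimal Python sketch on a five-node toy graph. It is not the authors' code, and every name in it (`graph`, `effective_diameter`, `local_clustering`, `clustering_by_degree`) is hypothetical. It assumes the common definition of effective diameter as the smallest hop count within which 90% of connected node pairs can reach each other. The reported value of 7.8 is fractional, which suggests the study interpolates between integer hop counts; this sketch skips that step and returns a whole number of hops.

```python
from collections import defaultdict, deque
from itertools import combinations

# Toy undirected graph (hypothetical data): a triangle 0-1-2 with a tail 2-3-4.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)


def bfs_distances(adj, source):
    """Hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def effective_diameter(adj, quantile=0.9):
    """Smallest hop count h such that at least `quantile` of all
    connected ordered pairs are within h hops (no interpolation)."""
    lengths = []
    for u in list(adj):
        dist = bfs_distances(adj, u)
        lengths.extend(h for v, h in dist.items() if v != u)
    lengths.sort()
    total = len(lengths)
    for count, hops in enumerate(lengths, start=1):
        if count / total >= quantile:
            return hops
    return None  # graph has no connected pairs


def local_clustering(adj, u):
    """Fraction of pairs of u's neighbors that are themselves linked."""
    neighbors = adj[u]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def clustering_by_degree(adj):
    """Average local clustering coefficient for each node degree."""
    buckets = defaultdict(list)
    for u in list(adj):
        buckets[len(adj[u])].append(local_clustering(adj, u))
    return {k: sum(vals) / len(vals) for k, vals in buckets.items()}


print(effective_diameter(graph))
print(clustering_by_degree(graph))
```

Plotting the output of `clustering_by_degree` on log-log axes and fitting a line would give the kind of decay exponent the abstract reports. Running all-pairs breadth-first search is only feasible on small graphs; at the scale of 190 million nodes, path lengths would have to be estimated, for example from a sample of source nodes.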