As communication and information provision increasingly migrate to the web, new research methods are emerging to help understand the implications of this change. Link analysis is the study of hyperlinks between websites in order to discover (1) why they were created and what they are used for, (2) online networks or patterns of behavior, or (3) patterns of offline behavior that are reflected online.
Hyperlink analyses have been conducted on themed collections of websites or web pages, such as those owned by a selected group of politicians, universities, or nongovernmental organizations campaigning around a specific issue. Link analysis is particularly suitable for gaining rapid initial insights into offline topics that have an online reflection (e.g., left-right political debates) and for analyzing how the web is used for communication in particular situations (e.g., elections). Its main strength is the speed and ease with which practitioners can gain insights into a topic of interest, especially with the new generation of free link analysis software. Its main drawback is that links are often created (or not created) casually, which makes it difficult to draw strong inferences, form generalizations, or make predictions.
It is useful to distinguish between different types of links. A "site outlink" is a hyperlink that points to a page in a different website, whereas a "site self-link" points to a page within the same website. Site self-links are usually ignored because the connections between different sites are more interesting. Site outlinks may expose such things as the site owner's collaborators and information sources. Conversely, the "site inlinks" of a website are the links pointing to it from other websites. Site inlinks may reveal the importance or uses of a website and, by extension, how important the organization owning it is and which types of other organizations link to it. The links within a collection of websites may reveal informative patterns of interconnectivity, such as which site is the most central in the network and whether there are any clear subgroups. Potential applications include revealing the interconnection structure of related pressure groups on the web, or comparing politicians' websites for differences in online networks and uses of the web across the political spectrum.
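The distinction between site self-links, site outlinks, and site inlinks can be made concrete with a short sketch. It assumes, hypothetically, that each website in the study is identified by a domain suffix (so www.scit.wlv.ac.uk belongs to the site wlv.ac.uk) and that links are available as (source URL, target URL) pairs; the names `SITES`, `site_of`, `classify`, and `site_counts` are illustrative, not part of any of the tools named in this entry. Counting inlinks from other sites in the set gives each site's in-degree, one of the simplest indicators of which site is most central.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical study set: each website is identified by its domain suffix,
# so www.scit.wlv.ac.uk counts as part of the site "wlv.ac.uk".
SITES = ["wlv.ac.uk", "ox.ac.uk", "mit.edu"]


def site_of(url, sites=SITES):
    """Map a URL to the study site that owns it; other hosts map to themselves."""
    host = (urlsplit(url).hostname or "").lower()
    for site in sites:
        if host == site or host.endswith("." + site):
            return site
    return host


def classify(source, target):
    """Label a link as a site self-link or a site outlink."""
    if site_of(source) == site_of(target):
        return "site self-link"
    return "site outlink"


def site_counts(links):
    """Count site inlinks and site outlinks per site, skipping site self-links."""
    inlinks, outlinks = Counter(), Counter()
    for source, target in links:
        if classify(source, target) == "site self-link":
            continue  # usually ignored in site-level analyses
        outlinks[site_of(source)] += 1
        inlinks[site_of(target)] += 1
    return inlinks, outlinks


# Hypothetical crawl output: (source page, target page) pairs.
links = [
    ("http://www.wlv.ac.uk/", "http://www.scit.wlv.ac.uk/staff"),  # site self-link
    ("http://www.scit.wlv.ac.uk/x.html", "http://www.ox.ac.uk/"),
    ("http://www.ox.ac.uk/y.html", "http://www.wlv.ac.uk/"),
    ("http://www.mit.edu/z.html", "http://www.wlv.ac.uk/research"),
]

inlinks, outlinks = site_counts(links)
print(inlinks.most_common(1))  # [('wlv.ac.uk', 2)]
```

Treating hosts outside the study set as their own "site" keeps the classification defined for external links too; a real study would need its own, carefully justified definition of where one website ends and another begins.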
For a hyperlink analysis, relevant links must first be identified. The most straightforward data collection methods are to browse websites and identify links manually, or to use a link search command in a commercial search engine. For example, the query linkdomain:wlv.ac.uk -site:wlv.ac.uk in Yahoo! returns a list of web pages that link to any page in the University of Wolverhampton website (www.wlv.ac.uk and all other domains ending in wlv.ac.uk, such as www.scit.wlv.ac.uk) but excludes pages within the University of Wolverhampton website itself. If the set of websites to study is not too large, the links between all possible pairs of websites can be counted by submitting a series of similar queries. A second way to obtain link data is to use software that automatically downloads the web pages from a list of URLs (uniform resource locators), or downloads all of the pages in one or more websites. Free online software for this includes Issue Crawler, LexiURL, SocSciBot, and VOSON. LexiURL Searcher and VOSON can also submit queries to search engines automatically, which is useful for counting links to or between a large set of websites.
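The pairwise counting strategy grows quickly with the size of the study set, which is why it suits only modest collections. The sketch below only builds query strings; it does not contact any search engine. The single-site query reproduces the Yahoo! example above, while the pairwise form `linkdomain:TARGET site:SOURCE` (pages in SOURCE that link to TARGET) is an assumption about how the same operators combine, and the function names are hypothetical.

```python
from itertools import permutations


def self_excluding_inlink_query(site):
    """Pages linking to `site` from outside it, as in the Yahoo! example."""
    return f"linkdomain:{site} -site:{site}"


def interlink_queries(sites):
    """One query per ordered (source, target) pair of distinct sites.

    Assumption: `linkdomain:TARGET site:SOURCE` returns pages in SOURCE
    that link to TARGET.
    """
    return [
        f"linkdomain:{target} site:{source}"
        for source, target in permutations(sites, 2)
    ]


sites = ["wlv.ac.uk", "ox.ac.uk", "mit.edu"]
queries = interlink_queries(sites)
print(len(queries))  # 6, i.e. n * (n - 1) for n = 3
print(queries[0])    # linkdomain:ox.ac.uk site:wlv.ac.uk
```

Because links are directed, each ordered pair needs its own query, so n sites require n(n - 1) queries: 3 sites need 6, but 50 sites already need 2,450, which is where automated submission tools become useful.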
Once the link data has been downloaded, there are four main investigative methods: visualizations, summary statistics, comparison with other data sources, and content analysis of link types. The software mentioned above typically produces visualizations, such as network diagrams of the interlinking websites or pages, and summary statistics, such as the total number of links to and from each website and the most common sources and targets of links, broken down by website, page, or top-level domain (e.g., .edu, .de). The data can also be imported into social network analysis software such as UCINET to obtain a wider range of statistics. Moreover, UCINET and standard statistical software can compare the link count data with other sources of data (e.g., via correlation tests). For large-scale counting studies, particularly of academic websites, more sophisticated link counting methods (Alternative Document Models) have been developed to prevent results from being dominated by spurious or repeated links.
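Two of the summary statistics mentioned above can be sketched in a few lines: a breakdown of link targets by top-level domain, and a simplified domain-level count in the spirit of Alternative Document Models, in which all pages in one domain are treated as a single document, so repeated links between the same pair of domains count only once. This is an illustrative simplification under stated assumptions, not the exact procedure used by any of the tools named here; the link list and function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical crawl output: (source page, target page) pairs.
links = [
    ("http://www.scit.wlv.ac.uk/a.html", "http://www.ox.ac.uk/"),
    ("http://www.scit.wlv.ac.uk/b.html", "http://www.ox.ac.uk/news"),
    ("http://www.scit.wlv.ac.uk/c.html", "http://www.ox.ac.uk/"),
    ("http://www.wlv.ac.uk/", "http://www.uni-heidelberg.de/"),
    ("http://www.ox.ac.uk/", "http://www.mit.edu/"),
]


def host(url):
    """Full domain name of a URL, e.g. www.scit.wlv.ac.uk."""
    return (urlsplit(url).hostname or "").lower()


def tld(url):
    """Last label of the domain name, e.g. 'uk', 'de', or 'edu'."""
    return host(url).rsplit(".", 1)[-1]


def tld_breakdown(links):
    """Count link targets by top-level domain."""
    return Counter(tld(target) for _, target in links)


def domain_adm_count(links):
    """Simplified domain-level count: each (source domain, target domain)
    pair contributes at most one link, however many pages repeat it."""
    return len({(host(source), host(target)) for source, target in links})


print(tld_breakdown(links)["uk"])            # 3
print(len(links), domain_adm_count(links))   # 5 3
```

Here three separate pages in www.scit.wlv.ac.uk link to www.ox.ac.uk; the raw count records three links, but the domain-level count records one, which damps the effect of, for example, a link repeated in a site-wide navigation bar.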
In contrast to the quantitative approaches described above, a descriptive or comparative content analysis of links may reveal why they have been created and what they are used for. This is particularly useful because the web and links are used for such a wide variety of purposes that intuition can easily fail when hypothesizing about why hyperlinks are used in any given context, e.g., on politicians' websites. The content analysis typically classifies each link by examining the page containing it and perhaps also the page it targets. The categories are often defined inductively, drawing partly on a pilot study and partly on the research question or theoretical perspective underpinning the research. One finding from this type of research is that surprisingly many links are created primarily to acknowledge a relationship, such as funding or collaboration, rather than to be used by visitors. In addition, although similar websites show some copying and uniformity in the types of links they use, there are still significant variations, suggesting that in many cases strategies for creating and using websites are still evolving.
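Once links have been coded by hand, a comparative content analysis usually reports how the categories are distributed within each group of websites. The sketch below tabulates category proportions per group; the groups ("left", "right") and categories are hypothetical placeholders loosely modeled on the examples in this entry, and the coding itself is assumed to have already been done by human coders.

```python
from collections import Counter, defaultdict

# Hypothetical coded sample: (group of the linking site, category assigned by a coder).
coded_links = [
    ("left", "acknowledgement"),
    ("left", "information source"),
    ("left", "acknowledgement"),
    ("right", "navigation"),
    ("right", "acknowledgement"),
]


def category_shares(coded):
    """For each group, return the proportion of its links in each category."""
    by_group = defaultdict(Counter)
    for group, category in coded:
        by_group[group][category] += 1
    return {
        group: {cat: n / sum(counts.values()) for cat, n in counts.items()}
        for group, counts in by_group.items()
    }


shares = category_shares(coded_links)
print(round(shares["left"]["acknowledgement"], 2))  # 0.67
print(shares["right"])  # {'navigation': 0.5, 'acknowledgement': 0.5}
```

Reporting proportions rather than raw counts keeps groups with different numbers of coded links comparable.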