Improved data visualisation aids genetic research

23 Feb 2011 by Qlucore

With such vast amounts of data to consider, it can be difficult for scientists to understand the true biological meaning of their research, says Carl-Johan Ivarsson, CEO of Qlucore.

However, new data visualisation techniques are now making it much easier to uncover new and unexpected results.

As recently as 10 years ago, many biologists were still working with glass slides that revealed a few thousand features of the genes that they were studying.

That number has grown dramatically in recent years, thanks to advances in technology.

As such, it has become more difficult for biologists to identify which genes are being expressed and to what level.

With so much data to consider, it is often impossible for scientists to derive any real biological meaning from their findings by eye alone, so sophisticated algorithms are needed to interpret the data effectively.

As a result, much of the computer software that has been designed for use in this area has focussed on being able to handle increasingly vast amounts of data.

Unfortunately, this shift in focus has unintentionally pushed scientists and researchers to one side, since a lot of data analysis must now be performed by specialist bioinformaticians and biostatisticians, especially when complicated algorithms are required for the analysis.

This model has several drawbacks, however, since it is typically the scientist who knows the most about the specific subject area being studied.

The good news for scientists is that the latest data visualisation techniques and imaging technologies are already making it much easier for the researchers themselves to examine this enormous quantity of data, to test different hypotheses and to explore alternative scenarios within seconds, since important findings can now be displayed in an easy-to-interpret graphical form.

During the last decade, research into molecular biology has helped to identify a large number of disease-associated genes and is therefore helping researchers to unpick the fundamental biology of major illnesses.

Gene expression profiling, for example, is now regularly being used for the study of many serious diseases.

Gene expression experiments help to measure the activity of tens of thousands of genes at once, in order to create a global picture of cellular function.

These findings can then be used to distinguish between cells that are actively dividing, for example, or to show how the cells react to a particular treatment.

As part of this process, researchers must often consider sub-groups (such as patients who are in remission versus patients who have suffered a relapse), while also examining the different types of cell abnormalities related to clinical conditions such as diabetes and cancer.

Difficulties can arise, however, as a result of the vast amount of data that is created by experiments like these.

This data overload can present a serious problem for researchers, since it is essential to capture, explore and analyse this kind of data effectively in order to obtain the most meaningful results.

To address this issue, a new generation of data visualisation tools has been designed to take full advantage of the most powerful pattern recogniser that exists: the human brain.

Indeed, powerful software engines are already being used to help researchers to visualise their data in 3D, so that they can identify hidden structures and patterns more easily and therefore identify any interesting and/or significant results by themselves, without having to rely on specialist bioinformaticians and biostatisticians.

Data visualisation works by projecting high-dimensional data down to lower dimensions, which can then be plotted in 3D on a computer screen, rotated manually or automatically, and examined by eye.

With the benefit of instant user feedback on all of these actions, scientists studying diseases like diabetes and leukaemia can now easily analyse their findings in real time, directly on their computer screen.

Scientists are already making use of this exciting new technology in a real-world setting.

New imaging functions contained within the latest data analysis applications are currently allowing scientists to analyse very large data sets by using a combination of different visualisation techniques, such as heat maps and Principal Component Analysis (PCA).

With visualisation tools like these, it is possible to investigate large and complex data sets without being a statistics expert.

The process begins by reducing high-dimensional data down to lower dimensions so that it can be plotted in 3D.

Principal Component Analysis (PCA) is often used for this purpose: it is a mathematical procedure that transforms a number of possibly correlated variables into a set of uncorrelated variables (called principal components), ordered by how much of the variation in the data each one captures.
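As a rough illustration of this step, the sketch below projects a synthetic expression matrix onto its first three principal components using NumPy. The function name `pca_project`, the matrix dimensions and the random data are all assumptions made for the example; it is not a description of how any particular software package implements PCA.

```python
import numpy as np

def pca_project(data, n_components=3):
    """Project samples (rows) onto the first n_components principal components."""
    centered = data - data.mean(axis=0)           # centre each variable (column)
    # SVD of the centred matrix: the rows of vt are the principal directions
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T       # sample coordinates in the new space
    explained = (s ** 2) / np.sum(s ** 2)         # fraction of variance per component
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
# Hypothetical data: 40 samples x 500 genes, driven by two hidden factors plus noise
factors = rng.normal(size=(40, 2))
loadings = rng.normal(size=(2, 500))
expression = factors @ loadings + 0.1 * rng.normal(size=(40, 500))

scores, explained = pca_project(expression, n_components=3)
print(scores.shape)   # one 3D point per sample, ready to plot
print(explained)      # share of the total variance captured by each component
```

Each row of `scores` is a point that could be plotted in 3D, and the columns of `scores` are uncorrelated with one another, which is the property described above.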

One of the key breakthroughs in the latest generation of bioinformatics software is the introduction of dynamic PCA, an innovative way of combining PCA with immediate user interaction.

This unique feature allows scientists to manipulate different PCA plots interactively and in real time, directly on the computer screen, while working with all annotations and other links in a fully integrated way.

With this approach researchers are given full freedom to explore all possible versions of the presented view and are therefore able to visualise, analyse and explore a large dataset easily.

By using a tool known as a 'heat map' alongside dynamic PCA, scientists have yet another way of visualising their data: a heat map lays out the values of a variable in a two-dimensional grid and represents each value as a colour.

Because modern heat maps use sophisticated mapping techniques to represent this data, they can provide a view of the data that cannot be achieved with basic charts and graphs.
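The basic idea of turning values into colours can be sketched in a few lines. The example below assumes a simple blue-white-red scale centred on zero and a hypothetical helper called `to_colours`; real heat map software typically offers many more colour scales and normalisation options.

```python
import numpy as np

def to_colours(matrix, low=(0, 0, 255), mid=(255, 255, 255), high=(255, 0, 0)):
    """Map each value to an RGB colour on a blue-white-red scale centred on zero."""
    m = np.asarray(matrix, dtype=float)
    limit = np.max(np.abs(m)) or 1.0              # symmetric scale; avoid dividing by zero
    t = m / limit                                  # values now lie in [-1, 1]
    low, mid, high = (np.array(c, dtype=float) for c in (low, mid, high))
    # Negative values blend from white towards blue, positive values towards red
    neg = mid + (-t)[..., None] * (low - mid)
    pos = mid + t[..., None] * (high - mid)
    rgb = np.where((t < 0)[..., None], neg, pos)
    return np.rint(rgb).astype(np.uint8)

# Hypothetical 2 x 2 grid of values (for example, genes in rows and samples in columns)
colours = to_colours([[-2.0, 0.0], [1.0, 2.0]])
print(colours.shape)   # rows x columns x 3 colour channels
print(colours[0, 0])   # the most negative value maps to pure blue
print(colours[1, 1])   # the most positive value maps to pure red
```

The resulting array can be handed to any image display routine; the grid layout and the colour scale are the two ingredients that make up the heat map.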

In biology, heat maps are often used to represent the level of expression of many genes across a number of comparable samples, such as cells in different states or samples from different patients, with the underlying data often obtained from DNA microarrays.

Heat maps are also popular for their ability to be dynamically updated when any filter parameters are changed.
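To give a feel for what updating on a filter change involves, the sketch below recomputes which genes would appear in a heat map each time a threshold is adjusted. Filtering by variance across samples is an assumption chosen for the example, as are the function name `filtered_view`, the gene names and the values; the article does not specify which filters a given tool provides.

```python
import numpy as np

def filtered_view(expression, gene_names, min_variance):
    """Return the rows (genes) whose variance across samples meets the threshold."""
    variances = expression.var(axis=1)            # one variance per gene
    keep = variances >= min_variance
    kept_names = [g for g, k in zip(gene_names, keep) if k]
    return expression[keep], kept_names

# Hypothetical data: 3 genes (rows) measured in 4 samples (columns)
expression = np.array([
    [1.0, 1.1, 0.9, 1.0],   # geneA: nearly constant across samples
    [0.0, 2.0, 0.0, 2.0],   # geneB: moderate variation
    [0.0, 4.0, 0.0, 4.0],   # geneC: strong variation
])
genes = ["geneA", "geneB", "geneC"]

# Each change of the threshold yields a new matrix to redraw as a heat map
for threshold in (0.0, 0.5, 2.0):
    view, kept = filtered_view(expression, genes, threshold)
    print(threshold, kept, view.shape)
```

In an interactive tool the loop would be replaced by a slider or similar control, with the heat map redrawn from `view` after every change.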

As computer technology improves - with greater processing power, better graphics applications and more sophisticated analysis software - data visualisation will continue to develop as well.

As such, these new methods of visualising data are likely to make traditional forms of data presentation, such as spreadsheets and basic graphics, obsolete in the future.

Already, a team of scientists at the Institute of Human Genetics of the Christian-Albrechts-University in Kiel, Germany, is using data visualisation to support a number of national and international projects on the epigenetic alterations associated with several cancers, including malignant lymphoma, colorectal cancer, and hepatocellular carcinoma, as well as developmental disorders and other diseases.

Even though the exploration and analysis of large data sets can be challenging, the use of tools like PCA and heat maps can provide a powerful way of identifying important structures and patterns very quickly, especially as visualisation typically provides the user with instant feedback, and with results that present themselves as they are being generated.

The latest technological advances in this area are already making it much easier for scientists to compare the vast quantity of data generated by epigenetic studies and to test different hypotheses very quickly.

As a result, the latest generation of data analysis software is helping scientists to regain control of this analysis, and to realise the true potential of the important research being conducted in this area.
