In 2017, I am shifting towards the analytics and visualization space, which is a nice segue to working on the Machine Learning / NLP that generates the data. One of the key features in Watson for Genomics is the ability to recommend clinical trials to patients based on their genetic variations and clinical conditions, an area where visualization can be of enormous assistance to both internal test and development teams verifying the overall quality of the data in the system.
Eventually, better analytics and visualizations can also help researchers in both industry and government agencies better understand the ever-shifting landscape of clinical trials.
Disclaimer: none of the content of this blog or GitHub project represents actual data from a production system or from individual patient data. The visualizations below do not represent current or upcoming features for the actual product and solely represent my personal experiments with the various technologies and techniques that could be used in the space.
Leaving aside the clinician/patient use cases for a moment, which are largely centered around matching patients to trials, it is the general browsing and navigation aspects that caught my attention at first.
I started a small GitHub side project (https://github.com/nastacio/clinical-viz) over the weekend to explore these visualizations, with the intention of expanding them into more powerful browsing capabilities in the future.
To explore the visualization potential, I started with a small data sample, querying for any clinical trials that covered retinal cancers (under 200 results) and modeling the resulting data as a graph containing nodes for "clinical trials", "sponsors", "conditions", and "locations". I then exported the graph to GraphML format and imported it into the (excellent) Gephi visualization tool.
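For readers who want to try the same modeling step, here is a minimal sketch of turning trial records into a GraphML document. It is not the code from my project: the record layout (`id`, `sponsors`, `conditions`, `locations`), the `kind` node attribute, and the sample entries are all made up for illustration, and it only relies on the Python standard library.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

# Made-up records for illustration only; the field names are assumptions.
SAMPLE_TRIALS = [
    {"id": "TRIAL-1", "sponsors": ["Hospital A"],
     "conditions": ["Retinoblastoma"], "locations": ["Boston"]},
    {"id": "TRIAL-2", "sponsors": ["Hospital A", "Pharma B"],
     "conditions": ["Retinoblastoma", "Retinal Neoplasms"],
     "locations": ["Toronto"]},
]


def trials_to_graphml(trials):
    """Return a GraphML string with one node per trial, sponsor,
    condition, and location, and one edge from each trial to the
    things it references."""
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # Declare a string attribute so each node can carry its kind.
    ET.SubElement(root, "key", {"id": "kind", "for": "node",
                                "attr.name": "kind", "attr.type": "string"})
    graph = ET.SubElement(root, "graph",
                          {"id": "G", "edgedefault": "undirected"})
    seen = set()

    def add_node(kind, name):
        # Prefix ids with the kind so a sponsor and a location
        # sharing a name do not collapse into one node.
        node_id = f"{kind}:{name}"
        if node_id not in seen:
            seen.add(node_id)
            node = ET.SubElement(graph, "node", {"id": node_id})
            ET.SubElement(node, "data", {"key": "kind"}).text = kind
        return node_id

    fields = (("sponsor", "sponsors"), ("condition", "conditions"),
              ("location", "locations"))
    for trial in trials:
        trial_id = add_node("trial", trial["id"])
        for kind, field in fields:
            for name in trial.get(field, []):
                target_id = add_node(kind, name)
                ET.SubElement(graph, "edge",
                              {"source": trial_id, "target": target_id})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(trials_to_graphml(SAMPLE_TRIALS))
```

Saving the printed output to a file with a `.graphml` extension should give something Gephi can open, with the `kind` attribute available for coloring and filtering nodes.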
In its full form, and after a few minutes of appearance customizations, the complete graph took on an interesting shape (for those familiar with Gephi, there are not many things more entertaining than watching the Fruchterman-Reingold algorithm do its magic).
Once you apply some light querying within Gephi, the potential for these types of visualization starts to become even clearer. As one example, I wanted to get an idea of the density and commonality of conditions covered in each clinical trial, which gives a sense of how overwhelming the complete results may be for a given set of patients with a certain condition. To do that, I created a filter removing all location and sponsor nodes, then added the labels for gender, phase, id, status, enrollment target, and starting year.
This visualization quickly surfaced clusters of conditions that are more researched than others, such as in the snapshot below:
With the "Degree Range" filter set to 1 or more edges, so that only interconnected conditions, trials, and sponsors are listed, I arrived at a much more compact representation of the initial graph, now further filtered to contain only interventional clinical trials that were recruiting new patients:
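Outside of Gephi, the idea behind that degree filter can be approximated in a few lines. This is a rough sketch of the concept rather than Gephi's actual implementation; the function name and the node labels are hypothetical.

```python
from collections import Counter


def degree_range_filter(nodes, edges, min_degree=1):
    """Keep nodes with at least min_degree edges, plus the edges
    whose two endpoints both survive the filter."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    kept_nodes = {n for n in nodes if degree[n] >= min_degree}
    kept_edges = [(s, t) for s, t in edges
                  if s in kept_nodes and t in kept_nodes]
    return kept_nodes, kept_edges


if __name__ == "__main__":
    # Hypothetical graph: "S2" has no edges and is dropped.
    nodes = {"T1", "T2", "C1", "S1", "S2"}
    edges = [("T1", "C1"), ("T2", "C1"), ("T1", "S1")]
    print(degree_range_filter(nodes, edges, min_degree=1))
```

Degrees are computed once on the input graph, so raising `min_degree` can leave behind nodes that no longer meet the threshold among the surviving edges; applying the filter repeatedly would be one way to tighten that.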
Another visualization, which is more interesting to health industry wonks than it is to doctors and patients, is the degree and nature of collaboration amongst sponsors for a given set of trials, with the size of each node proportional to the degree of collaboration (the nodes are colored according to the type of organization).
Then it is possible to zoom into areas of interest and start to notice the relationship between hospitals, government agencies, pharmaceutical companies and others. On a larger data set, the thickness of an edge between two organizations would be proportional to the number of common clinical trials cosponsored by these organizations.
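The edge weighting described above boils down to counting, for every pair of organizations, how many trials they sponsor together. A minimal sketch, assuming a hypothetical mapping from trial id to its list of sponsors (the names below are invented):

```python
from collections import Counter
from itertools import combinations


def cosponsor_weights(trial_sponsors):
    """Count the trials shared by each pair of sponsors.

    Keys are alphabetically ordered (sponsor, sponsor) tuples so that
    each pair is counted once regardless of listing order.
    """
    weights = Counter()
    for sponsors in trial_sponsors.values():
        # set() guards against a sponsor listed twice on one trial.
        for pair in combinations(sorted(set(sponsors)), 2):
            weights[pair] += 1
    return weights


if __name__ == "__main__":
    sample = {
        "TRIAL-1": ["Hospital A", "Pharma B"],
        "TRIAL-2": ["Pharma B", "Hospital A", "Agency C"],
        "TRIAL-3": ["Agency C"],
    }
    for (left, right), count in sorted(cosponsor_weights(sample).items()):
        print(left, "-", right, count)
```

The resulting counts could be written out as an edge weight attribute, which Gephi can then map to edge thickness.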
The next areas of focus for these visualizations involve dealing with much larger sets, at which point the examples above become really useful. I also need to work on better normalization of condition names, which often span multiple UMLS semantic types (neoplastic process, diseases or syndromes, findings, symptoms or signs, and many others), and on some form of dashboarding capability that would allow people to interact more directly with clinical trial data without having to generate a GraphML file and import it into an external visualization tool.