Research is increasingly driven by large-scale data. Yet, generating data at high-throughput sometimes carries negative connotations, as reflected in this quote from Sydney Brenner:
“Low input, high throughput, no output science”
Indeed, high-throughput data collection has sometimes been associated with poor experimental design and data quality. Sometimes, the problem is that treatment groups coincide with analysis batches, which makes it impossible to distinguish between biological effects and technological artifacts (batch effects). Another common problem is that high-throughput data often focus on what can be easily measured at high-throughput rather that what should be measured to answer scientific questions. Thus, high data volumes are accumulated even though the data are not the most direct and useful measurement for the investigated questions. A further problem is the fact that sometimes large numbers of assays make it difficult to evaluate the quality of individual assays. A related problem is that optimizing for high-throughput may directly undermine other important aspects of the data. For example, increasing the number of single cells analyzed by RNA-seq may decrease the copies of transcripts sampled per single cell.
Despite these problems, large volumes of data generated by high-throughput omics may support the publication of the results in influential journals. Sometimes, the above problems manifest in incorrect inferences that create confusion and take years to sort out. Such flawed inferences shape a negative connotation of high-throughput methods.
While the problems described above happen, they are not inevitable. They are not fundamentally inherent to every high-throughput measurement. For example, a high-throughput measurement is not necessarily indirect. Analysis batches can and often are randomized with respect to biological treatments. The accuracy and depth of single-cell analysis can be preserved while increasing the throughput, i.e., number of single cells analyzed per unit time and cost. Furthermore, high-throughput methods may facilitate the development of standardized and systematic analytical workflow that substantially improves data quality.
A way forward
How to avoid the bad and take advantage of the good? I think a key ingredient is suggested by the Brenner quote. We should modify ‘Low input’ to ‘High input’, which hopefully modifies the rest of the quote as well to:
“High input, high throughput, high output science”
What I mean here by high input is high intellectual input. I mean starting with a solid experimental design, compelling questions, and measurements that are worth doing, not only easy to do at high throughput.
I think high-throughput has much to offer if imbedded in well-motivated research focused on thoughtful scientific questions, not merely on maximizing data volumes.
Please leave comments as responses to the tweet below:
Indirect & easy omics <---> Researchers— Prof. Nikolai Slavov (@slavov_n) November 13, 2022
Halloween Candies <---> Children
Some thoughts on preventing obesity in scientific researchhttps://t.co/Vz8WU12moQ