Computation and Me

Thoughts from a student of biology and computer science.
http://kevinformatics.com

Mar 20

An Introduction to the Biological Basis of Cancer

I love teaching and education. If you know of my history of teaching experiences, it’s easy to see this. There is a dearth of introductory, well-cited pieces on scientific topics for people who have a basic knowledge of science and want to learn more. Here I present an introductory piece on the biology of cancer, with a focus on its molecular basis. This article assumes basic knowledge of biology.

What is cancer?

Cancer is a general term for a broad range of diseases where cell growth is unchecked due to breakage in regulatory mechanisms. [1] [2] This is why diagnosis for some cancers is as simple as feeling for lumps; uncontrolled growth leads to masses in tissues.

Why does this occur?

Cancer occurs due to mutations in DNA. These mutations can arise from a number of factors, including chemicals, viruses, radiation, and heredity. [3] Using the central dogma of biology, we can then conclude that mutations in DNA can lead to the creation of proteins which are not normally found in the cell. These mutations either augment or diminish protein functions within a cell, leading to an abnormal cell. While there are safeguards within the cell to discover and repair these errors in the DNA, when those mechanisms are themselves mutated, the abnormal cells duplicate and propagate via mitosis. [4] This then leads to an accumulation of mutations in DNA.

Cancer Treatments

Upon accumulation of these mutations, cancer cells adopt new functions which, among many things, allow them to divide endlessly. Most cancer treatments target this narrow margin of functional difference that cancer cells acquire. [5] Chemotherapy is a general class of drugs which interfere with cell division, preventing not only cancer cells but also normal cells from dividing. This is why patients undergoing chemotherapy lose their hair; cells within the hair follicle stop dividing and creating hair cells. Chemotherapy is akin to carpet bombing: the specificity of the drugs is very low, and they affect cells throughout the body. The hope is that chemotherapy will affect cancer cells more than normal cells and so treat the disease. Unfortunately, cancer cells can also acquire new mutations which make them resistant to drugs. [6]

Chronic myelogenous leukemia (CML), a Case Study

One of the greatest success stories in cancer research is the discovery and successful treatment of CML, a cancer of white blood cells. As mentioned earlier, cancer is a disease where mutations create or diminish cellular function. CML arises from a mutation in which a translocation between chromosomes 9 and 22 occurs. That is, pieces of chromosomes 9 and 22 get swapped in one’s genome. This swap creates a new protein, called the BCR-ABL fusion protein, which inhibits DNA repair and promotes cell growth. [7] As explained earlier, cancer treatments are created to target differences between cancer and normal cells. Gleevec was created to target this BCR-ABL protein and inhibit its abnormal function, killing the cell. Since the protein is a product of an abnormal chromosomal rearrangement, it is not seen in regular cells. Before the discovery of Gleevec, only 30% of patients diagnosed with CML survived for five years after being diagnosed. [8] An independent study in 2011 found that after 6 years of treatment with imatinib (the generic name for Gleevec), 94.9% of patients remained in remission.

The State of Cancer Research

Currently, the largest funder of cancer research is the NCI, which has a budget of $4.9 billion per year. Due to the complexity of cancer, there are various potential targets for cancer drugs. For example, Gleevec is a form of targeted therapy that acts on a protein product. However, again referring to the central dogma of biology, there are also gene therapy treatments which act at the DNA level and miRNA drugs which act at the RNA level. While there will probably never be a single drug to cure cancer, since cancer is an entire class of diseases, individual advances which target different areas can be used together in combination therapy.

Afterword

This post was inspired by Professor Clodagh O’Shea after she gave an excellent guest lecture for a genetics class. Her lab works on a novel form of cancer treatment where they engineer viruses to target and destroy cancer cells. This post is also dedicated to my mom.


Mar 13

Three Simple Steps to Becoming Successful

Yesterday, I attended my Plant Systems Biology class where Professor Yunde Zhao gave a talk on tryptophan-dependent auxin biosynthesis. As expected, I had to wikipedia almost every biomolecule and tissue mentioned due to my lack of plant biology education. However, I didn’t expect Yunde to tell a profound and inspiring story about how he became successful in the world of research.

He explains that his work on yucca, an Arabidopsis mutant, was initially pursued due to ignorance; the work on this mutant had been mostly abandoned due to the difficulty of finding a novel and publishable aspect of the study. Not only did he succeed in using this mutant to revolutionize the knowledge of auxin, but his project also eventually landed him his current faculty position at UCSD.

Yunde breaks success down into a simple, three-step process:

  1. Find something you love.
  2. Work as hard as you can on it.
  3. Work as fast as you can on it.

Easier said than done, right? In addition to this, there is always some factor of luck. Since luck isn’t something that can be controlled, this factor can’t knowingly be increased. Therefore, the only solution is to follow the three easy steps above.

Don’t believe me? I’m sure you’ve heard of the recent success of the social photo sharing site Pinterest.com, which started making headlines around six months ago.

Little do most people know, the company was actually founded in 2008 and launched in 2009. Nine months later, the site still had fewer than 10,000 users. Why did the founders continue working on Pinterest even after seeing such low user engagement? Co-founder Ben Silbermann says it’s because “I think the idea of telling people, ‘We blew it,’ was just too embarrassing.” Now the site has 11 million unique monthly visitors.

Now get back to work!


Oct 6

Steve Jobs Returns 0;

Steve Jobs’ influence on technology is widespread and probably everlasting. His minimalistic designs and strict standards have been the cornerstone of Apple. However, it wasn’t clear to me what a respected figure he was in the tech world until his passing. His death brought Twitter down and elicited a tribute on the Google homepage. However, the most surprising thing was an entire front page of posts honoring Steve Jobs on Hacker News, a social news site about startups and programming. Generally, there is a lot of infighting within the tech community regarding the merit of Apple products; however, it seems like in his passing, there is a universal realization of his achievements.

All things considered, Steve Jobs is not the first brilliant person in the past few years to die young due to pancreatic cancer. The bright side to this is increased awareness of cancer and scientific research. This was evident from a comment on a Hacker News post about Steve Jobs, which asked what organization one could donate to in order to advance cancer research. Public awareness is especially important in light of recent budget cuts which have greatly affected scientific funding. In 2011 alone, Congress voted to cut the NIH budget by $1.6 billion and the NSF budget by $162 million, with further funding cuts occurring in 2012. Not only will these cuts affect American prominence in the global community, but they will also slow the rate of human advancement. The attention his death has brought to cancer is welcome publicity for the scientific community.


Sep 27

Science in Cinema: Contagion

I rarely visit the movie theater due to outrageous ticket prices. However, since my friend invited me with some free movie vouchers, I eagerly took this chance to see Contagion. While films are generally rife with scientific inaccuracies, I was extremely impressed with what I found in Contagion.

Disclaimer: I have no formal training in infectious diseases or epidemiology. However, I have previous research experience with influenza.

The film does a great job describing the difficulties of vaccine development and science without watering down the scientific terminology. It walks the viewers through the general steps of vaccine development, from finding the correct cells to culture the virus, through the difference between a killed and an attenuated vaccine, to its final distribution. The film also clearly defines the R0 of the virus and displays the potential for reassortment which, in reality, has been the cause of many deadly influenza strains.

The movie also explores the potential repercussions of such an event. After the virus is declared to be an epidemic, society falls into chaos. Social order is lost and mass hysteria occurs. Additionally, the movie features a freelance journalist who, through a blog, claims that there is a homeopathic cure for the disease featured in the movie. I was happy to see that his claims are proved false near the conclusion and are shown to be a result of greed. This was particularly satisfying to me; the recent reports of vaccine controversy are appalling. I’ve been surprised and outraged at parents who delay or deny vaccination of their children based on what seems to be poor science or myth altogether.

I’m happy to see that there are major media sources as impressed as I am with this movie. Further positive scientific viewpoints can be seen at New Scientist, Slate, and Medscape.


Aug 25

A Summer at Cold Spring Harbor Laboratory

This past summer, I was accepted into an NSF-REU program at Cold Spring Harbor Laboratory (CSHL), a research institution in New York, where I worked in Dr. Doreen Ware’s lab on maize de novo transcriptome assembly and functional annotation in the cloud. In short, I reassembled maize transcriptomes from RNA-seq data using Amazon Web Services and assigned function using Ensembl Plants’ gene ontology terms. Doreen is a wonderful advisor, with insights into research that are unparalleled. I also really appreciated input from Jer-Ming, Shiran, and Andrew. While working with leading researchers in the field was definitely one of the highlights of the experience, that was only one of the reasons why I’d recommend this specific REU program to others.

CSHL is a unique campus. While its main focus is discovering and developing novel research, CSHL is also known to host many well-known conferences and courses which allow scientists to meet and learn about cutting-edge research and techniques. CSHL is also fairly isolated from its surroundings; getting anywhere off the campus requires a short taxi or car ride. Therefore, around meal times, Blackford Hall is packed with scientists and researchers relaxing and discussing their latest findings from not only CSHL, but all over the world.

The REU program at CSHL allows undergraduates to gain a realistic feel for what life as a research scientist is like. Housing on campus encourages late hours at the lab, and fully paid room and board relieves students of the stress of finding food and paying for living expenses. Personally, due to the computational nature of my work, I would work anytime there was free time on the weekdays since there was not much to do on campus and it was difficult to travel into the city. This experience was especially revealing of what life is like as a graduate student. In addition, there were lectures and workshops every week from leading scientists in order to expose students to various areas of research. We even had the opportunity not only to have dinner with Dr. James Watson and see the inside of his house but also to have dinner with Dr. Bruce Stillman! There were also scheduled volleyball games with the graduate students and faculty and even a trip into the city to see a Broadway show!

All in all, I’m very thankful for the illuminating experience I’ve had this summer with Dr. Doreen Ware and her lab as well as the guidance and careful planning of program coordinators Dr. Keisha John and Dr. Zach Lippman. Also, thanks to the NSF for funding this experience. I encourage undergraduates to apply to this program, as well as other summer research programs.

If you’d like to read more about the work I did there, you can read my final report.


Apr 26

lonelyape asked: Interested in your experience using GNU parallel in conjunction with EC2 as a very lightweight job distribution framework. Is it hard to set up?

Nope, not hard to set up at all! Installing GNU parallel is trivial, just a “sudo yum install parallel”. Since you can access each node via its internal IP address (which is static), you can set up your hosts file so that you can refer to these nodes by name. Then, using parallel, you can specify that jobs run on multiple nodes, given that the software to be run is already installed on all nodes. Then use the “--trc” option to transfer your input files, return your results to your master node, and clean up the remote copies.
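
As a rough illustration of the steps above, here is a minimal Python sketch that assembles such a parallel command line and prints it rather than running it. The node names, the compare_proteins program, and the .out suffix are hypothetical placeholders; --sshlogin and --trc are GNU parallel options.

```python
import shlex


def build_parallel_command(nodes, command, inputs, result_suffix=".out"):
    """Assemble a GNU parallel invocation that spreads jobs over remote nodes.

    nodes:   host names as listed in the hosts file (hypothetical here)
    command: command template; parallel replaces {} with each input
    inputs:  input files, one job per file
    """
    return [
        "parallel",
        # Comma-separated list of machines to run jobs on via ssh.
        "--sshlogin", ",".join(nodes),
        # Transfer each input, return the named result file, then clean up.
        "--trc", "{}" + result_suffix,
        command,
        ":::",
    ] + list(inputs)


if __name__ == "__main__":
    cmd = build_parallel_command(
        nodes=["node1", "node2"],                # assumed host names
        command="compare_proteins {} > {}.out",  # hypothetical program
        inputs=["a.fasta", "b.fasta"],
    )
    # Print a copy-pasteable shell command instead of executing it.
    print(shlex.join(cmd))
```

Printing the command first makes it easy to check the host list and file names before committing a cluster to a long run.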


Dec 24

Mass Homology Screening

An update on my previously blogged project. I’ve decided that computers within the lab do not have enough computing power to accomplish these comparisons in a reasonable amount of time, so I’ve recently turned to Amazon Web Services (AWS) for additional computing power. I eventually found that GNU parallel did not offer enough flexibility for my uses.

To replace the previous software, I’ve used a combination of Amazon’s SimpleDB (SDB), Elastic Compute Cloud (EC2), and Simple Storage Service (S3). I’m using SDB to keep track of what jobs need to be run and have been run, EC2 as extra compute nodes in my makeshift cluster, and S3 as a safe place to store results to prevent accidental deletion. I’m using boto, a very convenient and easy-to-use Python library that interfaces with AWS.

I’ve set up a git repo that hosts my project. You can find it here if you’re interested in taking a look!
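
The bookkeeping described above can be sketched roughly as follows. This is not the project’s actual code, and it does not call boto or AWS: a plain in-memory dictionary stands in for the SDB domain, and the class, method, and state names are all hypothetical. It only illustrates the idea of recording which jobs are pending, running, or done so that several compute nodes can share one work list.

```python
class JobTable:
    """In-memory stand-in for a job-tracking table (hypothetical, not SDB)."""

    def __init__(self, job_ids):
        # Every job starts out pending, with no result location yet.
        self.jobs = {
            job_id: {"state": "pending", "worker": None, "result": None}
            for job_id in job_ids
        }

    def claim_next(self, worker):
        """Mark the first pending job as running for this worker; return its id."""
        for job_id in sorted(self.jobs):
            record = self.jobs[job_id]
            if record["state"] == "pending":
                record["state"] = "running"
                record["worker"] = worker
                return job_id
        return None  # nothing left to run

    def mark_done(self, job_id, result_location):
        """Record that a job finished and where its output was stored."""
        record = self.jobs[job_id]
        record["state"] = "done"
        record["result"] = result_location

    def remaining(self):
        """Return the ids of all jobs that have not finished, in sorted order."""
        return sorted(j for j, r in self.jobs.items() if r["state"] != "done")
```

A real shared store would also need some way to keep two nodes from claiming the same job at once, which this sketch ignores.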


Dec 8

Cluster-like Computing Using GNU Parallel

Recently I joined a lab to begin my master’s work. My first project was to run a protocol to determine protein similarity. However, the scripted workflow to determine this was slow (12-15 hours per job). The lab had no access to supercomputing resources, but had many lab computers that were never used to their full potential. I looked around the web for grid software that would allow me to deploy jobs to these computers with little trouble. There was a complex distributed system already in place, but there was little documentation on how to use it.

Eventually, I found GNU Parallel, which allows execution of parallel jobs locally and remotely. Setup was painless: it only required installing the necessary software on all computers and setting up password-less ssh keys so that jobs and data could be sent without intervention.

There are two introductory YouTube videos featuring the author describing the essential and basic features. After watching both, I didn’t need to read any documentation to use the software! Although the videos are dry, they’re still much more entertaining than reading documentation!
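
Before sending jobs out, it can help to confirm that password-less login really works on every machine. The following is a minimal Python sketch of such a check, not part of GNU Parallel itself; the function names and host names are hypothetical. It relies on the ssh option BatchMode=yes, which makes ssh fail instead of prompting for a password.

```python
import subprocess


def ssh_check_command(host):
    """Build an ssh command that succeeds only if key-based login works."""
    return [
        "ssh",
        "-o", "BatchMode=yes",     # never prompt for a password
        "-o", "ConnectTimeout=5",  # give up quickly on unreachable hosts
        host,
        "true",                    # trivial remote command
    ]


def hosts_needing_setup(hosts, run=subprocess.call):
    """Return the hosts where the non-interactive login check fails.

    run: callable that executes a command list and returns its exit code;
         it defaults to subprocess.call and can be swapped out for testing.
    """
    return [host for host in hosts if run(ssh_check_command(host)) != 0]
```

Calling hosts_needing_setup(["lab-pc-1", "lab-pc-2"]) with your own machine names would list the computers that still need a key installed.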

Introductory Video 1

Introductory Video 2

I only have one small feature request. GNU Parallel allows cleanup of generated and transferred files on remote computers, but it will only remove files and not folders. Otherwise, GNU Parallel has saved me endless hours of headaches!