Rules for becoming a better bioinformatician in 2023

Some observations as a trainee

Guangyuan (Frank) Li
5 min read · Jan 30, 2023

Rule 1: Triangle of bioinformatics

Taking at least an introductory course in each of the topics shown below will help you understand what you are doing and why, and will help you come up with innovative ideas. Otherwise, you are just performing tasks you've been told to do instead of practicing science.

Triangle of bioinformatics (image by author)

Rule 2: Reading more papers

I remember a survey on Twitter asking what the No. 1 trait of people who succeed in academia is, and the answer was that they tend to read more papers. This matches my own observation: people who read more papers become adept at their subjects more quickly.

Rule 3: Don’t take shortcuts

For example, your lab mate gives you a script with five different parameters. Although you can just run it as is, you’d better spend some time understanding what those five parameters do and how changing them will change the results. Running it blindly may save you time in the short term, but it will hurt you in the long run. Likewise, try to read the original research paper instead of reading a blog post or watching a YouTube video about it. Finally, if you encounter a problem, don’t just ask for help right away; first try to google it and figure it out yourself. Shortcuts that seem to save you time cultivate very bad habits and are not good practice for a bioinformatician.
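As a small illustration of the script example, here is one way to learn what a parameter does: change it and watch the result. Everything in this sketch is hypothetical (the `filter_genes` function, its `min_count` and `min_samples` parameters, and the toy counts); it stands in for whatever script you were actually handed.

```python
# Hypothetical stand-in for a lab mate's script; the function name,
# its parameters, and the data are made up for illustration.
def filter_genes(counts, min_count=10, min_samples=2):
    """Keep genes with at least `min_count` reads in at least `min_samples` samples."""
    kept = {}
    for gene, values in counts.items():
        n_passing = sum(1 for v in values if v >= min_count)
        if n_passing >= min_samples:
            kept[gene] = values
    return kept


# Toy read counts: gene -> counts across three samples.
counts = {
    "geneA": [0, 3, 5],
    "geneB": [12, 15, 9],
    "geneC": [50, 60, 55],
    "geneD": [10, 0, 11],
}

# Vary one parameter at a time and see which genes survive the filter.
for min_count in (1, 10, 20):
    kept = filter_genes(counts, min_count=min_count)
    print(f"min_count={min_count}: kept {sorted(kept)}")
```

A sweep like this takes a few minutes and tells you which parameters your results are sensitive to, which is exactly what you need to know before trusting the defaults.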

Rule 4: Have the courage to read the documentation from the first page

If you don’t know a package, start from page 1 of the documentation. If you don’t know deep learning, find a textbook and read from page 1. Don’t just keep saying, “Well, I am not an expert on that.” The time you spend saying so would be enough to improve yourself and become an expert.

Rule 5: If you don’t put effort into asking for help, don’t expect people to put effort into answering your question

Whenever you ask your colleagues a question or ask them for help, show that you’ve already tried something, and specify what kind of help you want. People are busy and have no obligation to help you, so be apologetic and respect others’ time and effort.

Rule 6: Don’t be defensive when asking for feedback

Whenever you ask for feedback, be open to any critique and don’t argue with it, as long as the critique is not personal or abusive. Being mentally strong is important; don’t always expect applause from others.

Rule 7: Critique others with empathy

This is the flip side of Rule 6: as a receiver, you should be open to critique. When you are the one giving feedback, however, be empathetic. Doing science is hard, so please be nice and encourage people whenever you can.

Rule 8: It is always the presenter’s fault if the audience doesn’t get the point

When giving a presentation, focus on explaining what you’ve done instead of trying to impress anyone. Your job is to make sure the audience understands what you are trying to convey.

Rule 9: Take reproducibility seriously

Make sure you can reproduce every single result you generate. In this regard, I personally think Excel is not an ideal choice for bioinformaticians. The problem is not that it isn’t coding, but that the work is hard to reproduce. After a series of clicks and drags, will you really remember what you did five years from now?
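To make the contrast with click-and-drag concrete, here is a minimal sketch of a scripted step that can be rerun and checked. The function names (`subsample`, `run_analysis`), the sample IDs, and the seed are all made up for illustration; the idea is only that every choice (here, the sample size and the random seed) is written down next to the result it produced.

```python
import hashlib
import json
import random


def subsample(items, n, seed):
    """Draw n items reproducibly: the same seed gives the same draw."""
    rng = random.Random(seed)  # local generator, independent of global state
    return rng.sample(items, n)


def run_analysis(sample_ids, n, seed):
    """Return the result together with the parameters that produced it."""
    chosen = subsample(sample_ids, n, seed)
    record = {"n": n, "seed": seed, "chosen": chosen}
    # A checksum of the record makes silent changes easy to spot later.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["sha256"] = hashlib.sha256(payload).hexdigest()
    return record


# Hypothetical sample IDs, for illustration only.
sample_ids = [f"sample_{i:02d}" for i in range(20)]

first = run_analysis(sample_ids, n=5, seed=2023)
second = run_analysis(sample_ids, n=5, seed=2023)
print(first["chosen"])
print("identical rerun:", first == second)
```

Saving a record like this alongside each output file (for example as JSON) gives you something to check against when you rerun the analysis years later. Note that the exact draw is only expected to match when the Python version is the same, so it is worth recording that too.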

Rule 10: Don’t cross your fingers

I frequently see people crossing their fingers and hoping their code will work, which is not a productive mindset for programming. Instead, if the code doesn’t work, take the time to identify and resolve the issue. The worst-case scenario is that your code appears to work correctly, but you do not understand why.

Aside from the above ten rules for all bioinformaticians, I also have a few additional thoughts that I believe will make you more competent in this field.

Rule 11: Avoid using Jupyter Notebook

I mean, Jupyter Notebook is good for demonstration or teaching, but I don’t think it should be your go-to code editor when developing a program. You should be comfortable working with bare scripts and know in your head what the output of each step should look like, rather than relying on Jupyter to show you. Imagine coding in C or Fortran, where you have no such convenience: would you still be able to code? Also, Jupyter is a program built on top of the Python core, which means you are not interacting with Python directly. This has a few downsides. Sometimes Jupyter crashes not because the logic of your code is wrong, but because of some quirk or constraint in Jupyter itself. It also changes how some Python objects, strings for example, are displayed when it renders output as HTML, which is not how they look when you work with Python directly.
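A small illustration of the display point, using only plain Python. It does not reproduce Jupyter’s HTML rendering; it only shows that the same string has an echoed form and a raw form, and that you should know which one you are looking at. The `line` value is made up for illustration.

```python
# A made-up line of tab-separated output.
line = "geneA\t12\n"

# repr() is the form an interactive front end echoes for a bare expression:
# quotes are added and control characters are spelled out as escapes.
print(repr(line))

# print() writes the actual characters: a real tab and a real newline.
print(line, end="")

# The two forms even differ in length, because the escapes in repr()
# take two characters each and the quotes are included.
print(len(line), len(repr(line)))
```

Running this as a bare script and reading the output in a terminal is a quick way to check what your program really emits, independent of any front end.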

Rule 12: Understand the algorithms before you run the tool

People may think this is overkill: “I just want to apply a tool to my dataset, so why can’t I just follow the tutorial?” Well, think about these questions. How do you know whether the official tutorial is correct? When the results do not look good, how are you going to debug, or will you just abandon the tool and try another one? After all, we consider ourselves scientists. If you do not even understand what you are running, what kind of science are you doing?

Rule 13: Stop complaining about installation errors

I understand installing a package can be a pain, but after all, we are bioinformaticians and we are trained to deal with these problems. I personally think every bioinformatician should be capable of installing any software except in two scenarios:

[1] The source code or configuration file (e.g., the Makefile) the author provides is problematic. In this case, you should contact the author by email or through the GitHub issue page.

[2] There is a hardware or OS constraint: the software requires a certain OS or hardware such as a GPU. In this case, although ideally you should find a device that meets the requirement, I think that goes beyond what can reasonably be asked of a bioinformatician.

Rule 14: Visualization

A good visual can serve at least two roles: one is to look deeply into the data itself before building models or drawing conclusions, and the other is to communicate very complex, even abstract, concepts to other people. To achieve this, you need to be adept at all the basic plot types and understand their pros and cons. Also keep an eye out for interesting plots when reading other papers, and think deeply about why a plot is attractive and how you might use it in your own work.

Rule 15: Think beyond 2 dimensions

Well, not all computational problems can be defined in a data frame. If you confine yourself to the mindset that bioinformatics is just playing with a bunch of 2D objects (data frames or matrices), I personally think it will hinder you from reaching higher goals and ideas. Many models involve four or five dimensions, and being able to visualize those higher dimensions in your head will offer some very novel ways to approach problems and write code.
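As a minimal sketch of what stepping beyond two dimensions can look like, here is a three-axis example in plain Python: genes by conditions by time points. The gene names, conditions, and values are all made up, and a real analysis would more likely use an array library. The same idea carries over to four or five axes; three is just the smallest case that no longer fits in a single table.

```python
# Hypothetical expression tensor: genes x conditions x time points.
genes = ["geneA", "geneB"]
conditions = ["control", "treated"]
timepoints = [0, 6, 24]  # hours

# tensor[g][c][t] holds one measurement; values are made up.
tensor = [
    [[1.0, 1.2, 1.1], [1.0, 2.5, 4.0]],  # geneA: control, treated
    [[3.0, 3.1, 2.9], [3.0, 2.0, 1.0]],  # geneB: control, treated
]


def mean_over_time(tensor):
    """Collapse the time axis, leaving a genes x conditions table."""
    return [[sum(series) / len(series) for series in gene] for gene in tensor]


def fold_change_at(tensor, t_index):
    """Slice one time point and compare treated to control for each gene."""
    return [gene[1][t_index] / gene[0][t_index] for gene in tensor]


# Two different 2D or 1D views of the same 3D object.
print(mean_over_time(tensor))
print(dict(zip(genes, fold_change_at(tensor, t_index=2))))
```

Thinking of the data as one object with named axes, and of each table as a slice or a summary along one axis, often suggests analyses that are hard to see when everything is flattened into a data frame up front.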
