Bayes Theorem for PCR Tests

Cem Pekturk
Nov 24, 2020
Photo by United Nations COVID-19 Response on Unsplash

Unfortunately, with the arrival of winter, many countries are facing a second wave of the pandemic. There is some good news, though: Pfizer and BioNTech announced that their vaccine is 90% effective, which is amazing. Congratulations to the scientists Prof. Dr. Ozlem Tureci and Prof. Dr. Ugur Sahin on this success.

Even though the vaccine has reached 90% effectiveness, PCR tests are still far from perfectly accurate. Just a week ago, for example, Elon Musk complained about the results of PCR tests.

Sensitivity & Specificity of PCR tests

There are four possible outcomes of a PCR test. True positives and true negatives are accurate results; the other two are errors. A Type I error (false positive) means testing positive even though you don't have Covid-19. A Type II error (false negative) means testing negative even though you actually have Covid-19. Here, the Type II error is the dangerous one for healthy people, because infected people who test negative, especially asymptomatic ones, can unknowingly pass the virus on to them.

Type I and Type II Error

There are several ways to evaluate a test, and sensitivity and specificity are two common ones. Sensitivity is the probability of testing positive if you truly have Covid-19. Specificity is the probability of testing negative if you don't have Covid-19.
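In terms of the four outcomes above, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). A minimal sketch, using hypothetical counts chosen to reproduce the 70% and 95% figures (they are not real study data):

```python
def sensitivity(true_pos, false_neg):
    """P(test positive | has Covid-19) = TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """P(test negative | no Covid-19) = TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Hypothetical counts: 100 infected and 100 healthy people.
    tp, fn = 70, 30  # infected: 70 test positive, 30 test negative
    tn, fp = 95, 5   # healthy: 95 test negative, 5 test positive
    print(f"sensitivity = {sensitivity(tp, fn):.0%}")
    print(f"specificity = {specificity(tn, fp):.0%}")
```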

According to the BMJ article, PCR tests have a sensitivity of 70% and a specificity of 95%.

Sensitivity & Specificity of PCR Test

Let's say 100 people are tested with a PCR test, and we know that 50 of them truly have Covid-19 while the other 50 don't. As the table above shows, with the sensitivity and specificity from the BMJ research, 15 of the 50 infected people (50 × 30%) test negative. Those 15 false negatives can go on to spread the virus and infect more people.
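The numbers in the table follow directly from the definitions: multiply the infected group by the sensitivity and the healthy group by the specificity, and the remainders are the errors. A small sketch (the helper name is hypothetical) that reproduces the 100-person scenario:

```python
def expected_outcomes(n_people, prevalence, sensitivity, specificity):
    """Expected counts of the four test outcomes in a group of n_people."""
    infected = n_people * prevalence
    healthy = n_people - infected
    true_pos = infected * sensitivity
    true_neg = healthy * specificity
    return {
        "true positive": true_pos,
        "false negative": infected - true_pos,  # Type II error
        "true negative": true_neg,
        "false positive": healthy - true_neg,   # Type I error
    }


if __name__ == "__main__":
    # 100 people, half of them infected, 70% sensitivity, 95% specificity.
    for outcome, count in expected_outcomes(100, 0.5, 0.70, 0.95).items():
        print(f"{outcome:>15}: {count:g}")
```

With these inputs the sketch gives 35 true positives, 15 false negatives, 47.5 true negatives, and 2.5 false positives (expected values, hence the fractions).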

What is the probability of having Covid-19?

Some countries, such as the US, India, Brazil, and France, are suffering from the pandemic more than others; these are the four countries with the most reported Covid-19 cases. However, they also have large populations, so what is the prevalence of Covid-19 in each country? In other words, what is the probability of having Covid-19 in a given country?

The European Centre for Disease Prevention and Control (ECDC) publishes daily updated case numbers for each country along with its population, so I wrote a Python script to calculate the prevalence for all countries and all US states.
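A minimal sketch of the prevalence step, not the author's script. It assumes prevalence means total reported cases divided by population, and it assumes the ECDC CSV's one-row-per-country-per-day layout with columns `cases`, `countriesAndTerritories`, and `popData2019` (my reading of the file at the time; check the real header). The rows, country names, and function name are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Toy rows in the assumed ECDC layout. All numbers are illustrative only.
SAMPLE_CSV = """dateRep,cases,countriesAndTerritories,popData2019
23/11/2020,120,CountryA,1000000
22/11/2020,80,CountryA,1000000
23/11/2020,30,CountryB,50000
22/11/2020,20,CountryB,50000
23/11/2020,5,Unassigned,
"""


def prevalence_by_country(csv_text):
    """Return {country: total reported cases / population}."""
    total_cases = defaultdict(int)
    population = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["popData2019"]:
            continue  # skip rows with no population to divide by
        country = row["countriesAndTerritories"]
        total_cases[country] += int(row["cases"])
        # Population is repeated on every daily row; keep one copy.
        population[country] = int(row["popData2019"])
    return {c: total_cases[c] / population[c] for c in total_cases}


if __name__ == "__main__":
    prevalence = prevalence_by_country(SAMPLE_CSV)
    # Highest prevalence first, like the right-hand table.
    for country, p in sorted(prevalence.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{country}: {p:.2%}")
```

The same function works for US states if the CDC case counts and Census population figures are first reshaped into rows with the same three columns.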

The Prevalence of Covid-19 in the World
The Probability of Having Covid-19 by Country

The charts above show the prevalence of Covid-19.

The table on the left-hand side is sorted by the total number of cases. As it shows, over 3% of the population in the USA, France, Spain, and Argentina has been diagnosed with Covid-19.

The table on the right-hand side, by contrast, is sorted by prevalence. Andorra has the highest proportion in the world, with 8.21% of its population diagnosed with Covid-19, but its population is very small compared to other countries.

As the table on the right shows, most of the countries with the highest prevalence have small populations, but unfortunately Belgium and Czechia have both a relatively large population and a high prevalence of Covid-19, 4.88% and 4.62% respectively.

Prevalence of Covid-19 by State

The map above shows the prevalence of Covid-19 by state. The prevalence is higher in the states in the middle of the country than in the states on the East and West Coasts.

The table on the left side shows that California has the most cases in the US, but the prevalence of Covid-19 is much higher in Illinois, Wisconsin, and Tennessee: 5.18%, 6.46%, and 5.99% respectively.

The table on the right-hand side shows that the prevalence of Covid-19 is almost 10% in North Dakota and 8.26% in South Dakota, the highest proportions in the US.

Bayes Theorem

In statistics, Bayes' theorem is used to calculate a new probability of an event (the posterior probability) when new information becomes available.

We now know the probability of having Covid-19 by country and by US state, and we will use this information in Bayes' theorem. First, let me explain Bayes' theorem with a short example.

Anyone who has taken a statistics course has probably heard of the Monty Hall problem from their professor; it is a great example of Bayes' theorem. It is based on an old game show. There are 3 doors, and behind one of them is a brand new car. The contestant picks one of the doors and wins the car if it is the right one, so there is a 1/3 chance of picking the right door at the beginning. Let's call the doors A, B, and C. You pick door A, and then the host (Monty Hall), who knows where the car is and always opens an empty door you did not pick, opens door C. There is no car behind C, so you still have a chance to win. Then the host asks: do you want to switch from door A to door B? Is the probability still 1/3? No, it is not. Switching to B now wins with probability 2/3, while staying with A still wins with probability 1/3. We know this thanks to Bayes' theorem.
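A small check of the Monty Hall argument, computed exactly with Bayes' theorem and then by simulation. It assumes the standard rules: the host knows where the car is, never opens your door or the car's door, and picks at random when both other doors are empty. The function names are my own, for illustration.

```python
import random
from fractions import Fraction


def monty_hall_posterior():
    """Exact P(car behind each door | you picked A and the host opened C)."""
    prior = {door: Fraction(1, 3) for door in "ABC"}
    # Likelihood of the host opening C, given where the car is.
    likelihood = {"A": Fraction(1, 2), "B": Fraction(1), "C": Fraction(0)}
    evidence = sum(prior[d] * likelihood[d] for d in "ABC")  # P(host opens C)
    return {d: prior[d] * likelihood[d] / evidence for d in "ABC"}


def simulate_switch_wins(trials, seed=0):
    """Fraction of games won by always switching (Monte Carlo check)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.choice("ABC")
        pick = "A"
        # Host opens an empty door that is not the contestant's pick.
        opened = rng.choice([d for d in "ABC" if d not in (pick, car)])
        switched = next(d for d in "ABC" if d not in (pick, opened))
        wins += switched == car
    return wins / trials


if __name__ == "__main__":
    for door, p in monty_hall_posterior().items():
        print(f"P(car behind {door} | host opened C) = {p}")
    print(f"switching won {simulate_switch_wins(100_000):.3f} of simulated games")
```

The exact calculation gives 1/3 for A, 2/3 for B, and 0 for C, and the simulated win rate for switching comes out close to 2/3.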

Bayes Theorem
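For reference, the standard statement of Bayes' theorem for two events A and B, where P(A) is the prior and P(A | B) is the posterior after observing B, is:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad P(B) > 0
```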

Let's apply Bayes' theorem to our case. The new information is your PCR test result, so we can ask: if I test positive, do I really have Covid-19?

Covid-19 Bayes Theorem
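Written out with the test characteristics, and expanding the denominator with the law of total probability, the posterior after a positive test is as follows. Here C means "has Covid-19", + means a positive test, p is the prevalence used as the prior, and sens = 0.70 and spec = 0.95 are the BMJ figures; the notation is my own.

```latex
P(C \mid +) = \frac{P(+ \mid C)\,P(C)}{P(+ \mid C)\,P(C) + P(+ \mid \neg C)\,P(\neg C)}
            = \frac{\mathrm{sens}\cdot p}{\mathrm{sens}\cdot p + (1 - \mathrm{spec})(1 - p)}
```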

Using Bayes' theorem as shown above, I calculated in Python the probability of truly having Covid-19 after a positive PCR test, for all countries and all US states.
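A minimal sketch of that calculation for a single region, with the BMJ sensitivity and specificity as defaults. This is not the author's script (that is in the GitHub repository linked at the end); the function name is hypothetical, and the priors are the prevalence figures quoted in this article.

```python
def p_covid_given_positive(prevalence, sensitivity=0.70, specificity=0.95):
    """Posterior P(Covid-19 | positive PCR) via Bayes' theorem."""
    true_pos = sensitivity * prevalence               # P(+ | C) P(C)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(+ | not C) P(not C)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Prevalence figures from this article, as fractions.
    priors = {"Spain": 0.0332, "US": 0.0372, "Florida": 0.0430, "Wisconsin": 0.0646}
    for region, prior in priors.items():
        posterior = p_covid_given_positive(prior)
        print(f"{region:>9}: prior {prior:.2%} -> posterior {posterior:.2%}")
```

The results agree with the article's posterior figures to within a few hundredths of a percentage point; the small gaps presumably come from the prevalence inputs being rounded to two decimals here.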

Probability of having Covid-19 if I am being tested positive

Time to answer the question. Let's say you live in Spain and feel a little tired. What is the probability that you have Covid-19? It is 3.32%. You then decide to go to the hospital to get tested and, unfortunately, the result is positive, although you no longer feel sick. So what is the probability that you have Covid-19, given the positive PCR result? In Spain, it is 32.44%.
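Plugging Spain's numbers into the formula (sensitivity 0.70, specificity 0.95, prior 0.0332) gives the arithmetic below. The result, about 32.5%, differs slightly from the quoted 32.44%, presumably because the script used the unrounded prevalence.

```latex
P(C \mid +) = \frac{0.70 \times 0.0332}{0.70 \times 0.0332 + 0.05 \times 0.9668}
            = \frac{0.02324}{0.02324 + 0.04834}
            = \frac{0.02324}{0.07158}
            \approx 0.325
```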

In the same situation, the probability is 35.11% in the US and 17.08% in Russia.

Probability of having Covid-19 if I am being tested positive in the US

The probability of having Covid-19 is 3.72% for the US as a whole, but the US is a big country, and this figure varies from state to state.

For example, if you live in Florida, the probability that you have Covid-19 is 4.30%. If you then test positive with a PCR test, the probability increases to 38.64%. If you test positive and live in Wisconsin, the probability that you have Covid-19 is 49.17%.

Bonus: Prevalence of Covid-19 vs. the US 2020 Election

Maps: Prevalence of Covid-19 and the USA 2020 Election Results (from Google)

After I created the map of Covid-19 prevalence in the US, it reminded me of the 2020 US presidential election results map.

Judging by the maps above, the prevalence of Covid-19 by state looks strikingly similar to the state election results, which surprised me a lot.

Python Code

If you would like to see the results for all countries and all states, my GitHub repository is linked below. Feel free to use it. Thanks for reading!

https://github.com/Cempek/COVID_19_Bayes_Analysis

References

https://www.bmj.com/content/bmj/369/bmj.m1808.full.pdf

https://opendata.ecdc.europa.eu/covid19/casedistribution/csv

https://data.cdc.gov/api/views/9mfq-cb36/rows.csv?accessType=DOWNLOAD

http://www2.census.gov/programs-surveys/popest/datasets/2010-2019/national/totals/nst-est2019-alldata.csv

https://gist.githubusercontent.com/mshafrir/2646763/raw/8b0dbb93521f5d6889502305335104218454c2bf/states_hash.json

https://brilliant.org/wiki/monty-hall-problem/
