In the following, I am going to tackle the most frequently asked question these days in the simplest possible way, using principles from the world of computer science. Don’t worry: even complete beginners in computer science should be able to follow along easily, since I will first explain the principles in the simplest terms before comparing them with the matter of the vaccine.

So, fill up your cup of coffee and let us start.

What is a hash function?

First of all, a hash function is, as the name says, a function! Yes, nothing more than a mathematical function that takes an input, computes something and gives back an output.

Well, and what does the hash function take as input and what does it compute and give as output?

It takes any data as input and gives a number as output that can be used as a fingerprint for this data. Just like a human fingerprint, this output can be used to tell the given input apart from other inputs.

A function that takes “data” as input may sound a little strange compared to the typical mathematical functions you probably remember from math classes at school. Those functions usually took only numbers as inputs and produced numbers as outputs. Well, it suffices here to know that numbers and data are just two sides of the same coin: texts, pictures, huge computer files, etc. are at the end of the day nothing other than bunches of “data”, and these bunches of data can be represented as numbers. So, for example, the text “Hello” is numerically represented as: 072101108108111. A picture is also represented as a number (this number would be very long and fill several pages, so for the sake of readability it is not shown here). How texts or other kinds of data are turned into numbers (i.e. how the numerical representation of the text “Hello” has been computed as 072101108108111) is out of our scope here. All you need to know is that any data can be represented as numbers.
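For readers who are curious anyway, here is a minimal Python sketch of one possible way to get from a text to such a number. It assumes the ASCII table and writes every character code with three decimal digits; the function name text_to_digits is made up for this illustration.

```python
def text_to_digits(text: str) -> str:
    """Write each character's ASCII code as three decimal digits and join them."""
    return "".join(f"{byte:03d}" for byte in text.encode("ascii"))


print(text_to_digits("Hello"))  # 072101108108111
print(text_to_digits("hello"))  # 104101108108111
```

Note that upper and lower case letters have different codes: a capital “H” is 072, while a small “h” is 104.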

By storing these numbers electronically on some electronic device, we get the magical thing that everyone knows: the file.

Thus, hash functions can even take files as input, since files are nothing other than numbers, as we have just learned. The output is then called the hash value of the file. So, hash values of files are used as fingerprints for the files.

But why do we need fingerprints for the files?

Just as with humans, we use a small piece of information to distinguish one from another. Files can be huge, and consequently their numerical representations can be numbers with billions of digits, but hashes are much shorter. So, if we want to say that this exact file is harmful or that exact file is so and so, we can use only its hash and say that the file with this hash value is harmful or the file with that hash value is so and so.
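To make this tangible, here is a small Python sketch using SHA-256 from Python’s standard hashlib module. SHA-256 is only one example of a hash function, chosen here as an assumption; the helper name fingerprint is made up. It shows that a tiny input and a huge input both get a fingerprint of the same short length.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash value of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


small = b"hello"                 # 5 bytes
huge = b"hello" * 1_000_000      # 5 million bytes

print(fingerprint(small))
print(fingerprint(huge))
print(len(fingerprint(small)), len(fingerprint(huge)))  # 64 64
```

Both fingerprints are 64 hexadecimal characters long, no matter how big the input is, and changing even a single character of the input gives a different fingerprint.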

So, are these hash values unique, i.e. are there no two files that have the same hash value?

To be honest, no. They are not unique. There are far more possible files than possible hash values, so many different files must lead to the same hash value. However, the hash functions used for this purpose (so-called cryptographic hash functions) have specific mathematical properties that make it extremely hard to deliberately create another file that gives the same hash value as a given one. For this reason, among others, hash values are used for identification as if they were unique, although they are, in fact, not.
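Why two files must sometimes share a hash value can be seen with a deliberately tiny fingerprint. The sketch below is an illustration under assumptions: tiny_hash is a made-up toy function that keeps only the first byte of a SHA-256 value, so it has just 256 possible outputs. Feed it 257 different inputs and at least two of them must collide.

```python
import hashlib


def tiny_hash(data: bytes) -> int:
    """Toy fingerprint with only 256 possible values (first byte of SHA-256)."""
    return hashlib.sha256(data).digest()[0]


seen = {}          # maps a tiny hash value to the first input that produced it
collision = None
for i in range(257):               # 257 inputs, but only 256 possible outputs
    data = f"file-{i}".encode()
    value = tiny_hash(data)
    if value in seen:
        collision = (seen[value], data)
        break
    seen[value] = data

print(collision)   # two different inputs with the same tiny hash value
```

A real hash value is much longer, so the same thing still happens in theory, but finding such a pair on purpose becomes practically impossible.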

How does the antivirus work?

There are many approaches to antivirus software, but the main ones are:

  • With each update, the antivirus program receives from its developers lists of hash values of files that have been analyzed by the developing company and classified as harmful. The antivirus continuously scans the files on the device it is installed on in the background, computes their hash values and compares them to the lists it has received. If a suspicious file is found (suspicious meaning that its hash value is listed on one of these blacklists), the antivirus deletes the file.
  • Advanced antiviruses don’t depend only on hash comparisons but also observe the behavior of files and try to detect files that behave suspiciously. For example, suppose a file has the extension “.jpg” in its name but does not contain the numerical representation of a picture, as it is supposed to. Instead, it contains the numerical representation of a computer program (a computer program is a list of instructions to be executed). Moreover, this file tries to insert itself into the list of programs that run in the background when the computer starts, tries to connect to the internet and works completely in the background (it has no graphical window). The antivirus analysis can flag it as a suspicious file, even if its hash value is not on any of the blacklists that the antivirus receives through updates.

The antivirus may also add the hash value of this file to its blacklists so that it detects this file when it is seen again somewhere else later.

  • More advanced antiviruses can send the hashes of these suspicious files to the developing company to help with the analysis. If many reports with the same hash value later arrive from different devices, the company looks into the matter, tries to find this file, analyzes it and may add its hash value to the blacklists in the next update, so that all antiviruses can immediately detect this file as a threat in the future without having to analyze it.
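The first two approaches above can be sketched in a few lines of Python. This is a toy model under assumptions, not how any real antivirus product is built: the names KNOWN_BAD, looks_suspicious and scan are made up, and the “behavior analysis” is reduced to a single check, namely whether a “.jpg” file starts with the two bytes that real JPEG pictures start with.

```python
import hashlib

# Hypothetical blacklist, as it might arrive with an update.
KNOWN_BAD = {hashlib.sha256(b"pretend this is a virus").hexdigest()}


def looks_suspicious(name: str, content: bytes) -> bool:
    """Toy analysis: a '.jpg' file that does not start like a JPEG picture."""
    return name.endswith(".jpg") and not content.startswith(b"\xff\xd8")


def scan(name: str, content: bytes, blacklist: set) -> str:
    digest = hashlib.sha256(content).hexdigest()
    if digest in blacklist:                 # approach 1: hash comparison
        return "known threat"
    if looks_suspicious(name, content):     # approach 2: analysis
        blacklist.add(digest)               # remember the hash for next time
        return "suspicious, hash added to blacklist"
    return "clean"


blacklist = set(KNOWN_BAD)
print(scan("notes.txt", b"shopping list", blacklist))             # clean
print(scan("virus.exe", b"pretend this is a virus", blacklist))   # known threat
print(scan("cat.jpg", b"MZ not really a picture", blacklist))     # suspicious, hash added to blacklist
print(scan("copy.jpg", b"MZ not really a picture", blacklist))    # known threat
```

The last line is the interesting one: the second time the same content shows up, no analysis is needed any more, because its hash value is already on the blacklist.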

What is the Scheduler?

How do computers do the magic that they do?

The magic they do is nothing other than executing programs. A program does not necessarily mean a typical program with an icon that you click on to run it. Even things that you don’t explicitly see as programs, such as copying and pasting, resizing a window or connecting to a Wi-Fi network, are behind the scenes nothing other than small programs being executed.

Well, and what do programs consist of?

Programs are nothing other than lists of instructions that are stored in the memory of the computer. Again, the numerical representations of the instructions are stored, because electronic memories understand only numbers and store only numbers.

Computers contain processors, which are complex electronic circuits that read these lists of instructions, understand the numbers and execute the instructions one after another.

In the old days, a computer contained only one processor and only one program stored in the memory. Once power was connected to the processor, the electronic circuit started working and did the only thing it was built to do, namely reading the numbers from the memory and executing the instructions.

But things didn’t stay that simple 😊 Computer scientists wanted to get crazier!

So, more advanced ideas came along. We wanted to have not just one fixed program in the memory, but many programs, and we wanted to enable the user to select which one to execute. We wanted computers with not just one processor but multiple processors, to execute many programs at the same time. We wanted to use the same memory to store programs and data (texts, pictures, etc.). Thus, the idea of files emerged: every bunch of numbers that belong together is packed into one unit called a file, and the memory is organized so that every file has a specific address in it. By doing this, we achieved some kind of organization in the memory and were able to store every program in a file.

Well, by doing that, we got a computer that has multiple processors, memory organized in files and multiple stored programs. So, we had the musical instruments, but how could the musicians start playing? We needed an organizer: a main program that lets the user select which programs to run and when. Based on these wishes, it knows where each program is stored in the memory and decides which program shall be read and executed by which processor. This good guy is called the operating system. If you use Windows, for example, then your clever guy is called Windows. Windows is the clever guy that delivers your wishes to the processor and tells it which program to execute to fulfill them.

To be precise, the operating system has many other jobs: it manages many different things and even helps with things like creating new files or deleting files. But the part that is responsible for deciding which program is given to which processor is called the “scheduler”.
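As a very rough picture of what “giving programs to processors” means, here is a toy Python sketch. It is an assumption-laden simplification: the function assign and the program names are made up, and it simply hands out the waiting programs to the processors in turn, whereas real schedulers also deal with priorities, waiting times and much more.

```python
def assign(programs: list, processor_count: int) -> list:
    """Hand out the waiting programs to the processors in turn."""
    return [(program, index % processor_count)
            for index, program in enumerate(programs)]


print(assign(["browser", "music player", "antivirus"], 2))
# [('browser', 0), ('music player', 1), ('antivirus', 0)]
```

With two processors (numbered 0 and 1), the third program has to go back to processor 0.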

What is a virus?

A virus is a program (stored in a file, as we have just learned) that does harmful things to the computer when executed.

What is a batch script?

A batch script is a file that contains a list of instructions.

So, it is a program, isn’t it?

Yes.

And why do you call it by this strange name instead of simply saying “a program”?

Hmmm, it is actually a special kind of program, and we in computer science like to be precise. The difference between this kind of program and “real” programs is that a batch script does not contain instructions that can be read and executed directly by the processor, as we described above. It contains instructions that tell the operating system to do things like creating files, deleting files, executing programs, ending programs, shutting down the PC, etc. In other words, it contains what we called the wishes of the user, but here they are written as a list of instructions instead of being given to the operating system directly by the user in the form of mouse clicks.

Now let’s imagine the following scenario:

We have a known virus which spreads from one computer to another and causes harmful things when executed.

We wrote a batch script that tells the operating system to create some new files.

As we already learned, files are nothing other than bunches of numbers which represent either data (a picture, a text, a video … etc) or a program (a list of instructions).

These files that our batch script tells the operating system to create have two special properties:

  • They contain programs that behave a little bit suspiciously.
  • The numerical representation of the file is deliberately manipulated so that its hash value is equal to the hash value of the virus file.

We succeeded in injecting this batch script into the scheduler of the operating system.

What happens?

At some point, the scheduler decides to execute the batch script. When it is executed, these new files are created. Sooner or later, the antivirus comes to scan them, as it always does. Their hashes are not previously known to it. It analyzes them, detects something suspicious in them and decides that they are threats. It computes their hash values and adds them to its blacklists for the future.

Some time later, the real virus somehow arrives on this computer. The antivirus comes to scan it. It detects it immediately because it already knows its hash value, so there is no need to waste time on analysis. It decides immediately to delete it.
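The whole scenario can be played through in a short Python sketch. Everything in it is a toy under loudly stated assumptions: toy_hash is a deliberately weak, made-up hash function (the sum of all bytes), chosen only because it makes crafting a second file with the same hash value easy, which is not feasible with the strong hash functions real antiviruses use. The names make_decoy and Antivirus, the file contents and the “SUSPICIOUS” marker are made up as well.

```python
def toy_hash(data: bytes) -> int:
    """Deliberately weak fingerprint: sum of all byte values, modulo 65536."""
    return sum(data) % 65536


def make_decoy(target_hash: int, body: bytes) -> bytes:
    """Append padding bytes to body so that its toy hash equals target_hash."""
    missing = (target_hash - toy_hash(body)) % 65536
    padding = bytes([255]) * (missing // 255) + bytes([missing % 255])
    return body + padding


class Antivirus:
    def __init__(self):
        self.blacklist = set()
        self.analyses = 0   # how often the slow analysis had to run

    def looks_suspicious(self, content: bytes) -> bool:
        self.analyses += 1
        return b"SUSPICIOUS" in content   # toy stand-in for behavior analysis

    def scan(self, content: bytes) -> str:
        digest = toy_hash(content)
        if digest in self.blacklist:
            return "deleted immediately (hash already known)"
        if self.looks_suspicious(content):
            self.blacklist.add(digest)
            return "deleted after analysis (hash remembered)"
        return "clean"


virus = b"REAL VIRUS: does harmful things"
# The file created by the batch script: suspicious-looking, same toy hash as the virus.
decoy = make_decoy(toy_hash(virus), b"SUSPICIOUS but harmless ")

antivirus = Antivirus()
print(antivirus.scan(decoy))   # deleted after analysis (hash remembered)
print(antivirus.scan(virus))   # deleted immediately (hash already known)
print(antivirus.analyses)      # 1
```

The analysis ran only once, for the harmless decoy. When the real virus arrived, its hash value was already on the blacklist.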

Well, the batch script is the Pfizer/BioNTech Covid-19 vaccine! 😉

