The processor, or central processing unit (CPU), manages the execution of all instructions and is often referred to as the brain of the computer.
In this article, we take a closer look at the processor and how it works.
The central processing unit (CPU) is a vital element of any computer. It performs the calculations and manages the instructions that are passed on to the other components and peripherals.
Almost every electronic device you use, from desktops, laptops, and phones to gaming consoles and smartwatches, is equipped with a central processing unit. Without it, the system cannot be turned on or used.
How fast the CPU works depends on the instructions it is given, and the other components of the computer can only do useful work when they are connected to it.
Today, the most common central processing units consist of semiconductor components on integrated circuits and are sold in many variants. Since the CPU handles data from all parts of the computer at the same time, it may slow down as the volume of calculations and processes increases, or even fail or crash under a heavy workload. AMD and Intel, the leading manufacturers in this industry, have been competing for more than 50 years.
What is a processor?
To get to know the central processing unit (CPU), we first briefly introduce a part of the computer called the SoC. An SoC, or system on a chip, integrates all the components a computer needs for processing on a single silicon chip. The SoC contains various modules, of which the CPU is the main one; the GPU, memory, USB controller, power management circuits, and wireless radios (WiFi, 3G, 4G LTE, etc.) are other components that may or may not be present on a given SoC.
The central processing unit, which we will call the processor for short in the rest of this article, cannot do its work in isolation from the other chips; building a complete computer requires the rest of the SoC as well.
The SoC is slightly larger than the CPU yet offers much more functionality. In fact, despite the great emphasis placed on the technology and performance of the processor, this part is not a computer in itself. It can be thought of as a very fast calculator that is part of the SoC: it retrieves data from memory and then performs some kind of arithmetic (addition, multiplication) or logical (and, or, not) operation on it.
Processing an instruction in the processor involves four main steps that are executed in order:
Fetching instructions from memory (Fetch): The processor first reads the instruction from memory so that it knows what to do with its input. A program may contain one instruction or a great many, each stored at its own address. A unit called the program counter (PC) holds the address of the next instruction and so keeps the instructions in order; the processor is in constant communication with RAM to read the instruction at that address (reading from memory).
Decoding instructions (Decode): The fetched instruction, which is already in machine language (binary), is interpreted by the processor to determine which operation to perform and on which data. Writing programs directly in binary is difficult, so code is written in simpler programming languages and converted ahead of time into executable machine code by tools such as a compiler or an assembler.
Executing the decoded instructions (Execute): The most important step is the execution of the instruction. At this stage, the decoded instruction is carried out, with arithmetic and logical operations performed by the ALU (Arithmetic & Logic Unit).
Storing the results (Store): The results of the instruction are written to the processor's registers or back to memory, so that later instructions can refer to them quickly (writing to memory).
The process described above is called the fetch-execute cycle, and it happens millions of times per second (billions of times on modern processors). Each time these four steps are completed, it is the turn of the next instruction, and the steps are repeated from the beginning until all the instructions have been processed.
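The four steps above can be sketched with a toy machine. This is a minimal illustration, not a model of any real processor: the opcode names (`LOAD`, `ADD`, `HALT`), the tuple layout, and the four-register file are all assumptions made for the example.

```python
# Toy machine: each instruction is (opcode, destination register, operand a, operand b).
# The opcodes and the register layout are invented for this sketch.
PROGRAM = [
    ("LOAD", 0, 5, None),        # r0 = 5
    ("LOAD", 1, 7, None),        # r1 = 7
    ("ADD", 2, 0, 1),            # r2 = r0 + r1
    ("HALT", None, None, None),  # stop the machine
]


def run(program):
    registers = [0, 0, 0, 0]
    pc = 0  # program counter: index of the next instruction
    while True:
        instruction = program[pc]         # fetch
        pc += 1
        opcode, dest, a, b = instruction  # decode
        if opcode == "HALT":
            break
        if opcode == "LOAD":              # execute
            result = a
        elif opcode == "ADD":
            result = registers[a] + registers[b]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        registers[dest] = result          # store
    return registers


print(run(PROGRAM))  # [5, 7, 12, 0]
```

The loop body runs once per instruction, which mirrors how the cycle repeats until the program ends.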
Operating units of processors
Each processor consists of three operational units that play a role in the process of processing instructions:
Arithmetic & Logic Unit (ALU): A complex digital circuit that performs arithmetic and comparison operations. In some processors, the ALU is divided into two sections: the AU (for arithmetic operations) and the LU (for logical operations).
Control Unit (CU): This circuit directs and manages operations within the processor and tells the arithmetic and logic unit and the input and output devices how to respond to instructions. The operation of the control unit can differ from processor to processor depending on its design architecture.
Register unit (Register): The register unit is responsible for the temporary storage of processed data, instructions, addresses, bit sequences, and output, and must have sufficient capacity to hold this data. Processors with a 64-bit architecture have 64-bit registers, and processors with a 32-bit architecture have 32-bit registers.
The relationship between the instructions and the processor's hardware design forms the processor architecture. But what is a 64-bit or 32-bit architecture, and what are the differences between the two? To answer this question, we must first become familiar with the instruction set and how its calculations are carried out:
Instruction set
An instruction set is the set of operations that a processor can execute natively. It consists of simple, basic instructions (such as addition, multiplication, data transfer, etc.), which may number from a few dozen to over a thousand, and whose implementation is defined in advance for the processor. If an operation is outside the scope of this instruction set, the processor cannot execute it directly.
As mentioned, the processor is responsible for executing programs. A program is a set of instructions written in a programming language that must be executed step by step in a logical order.
Since computers do not understand programming languages directly, these instructions must be translated into machine language, or binary form, which is the only form the processor can execute. Binary consists of just two digits, zero and one, which represent the two possible states of a transistor: on (one) or off (zero), that is, whether or not it lets electricity pass.
In fact, each processor can be seen as a collection of electrical circuits, each of which implements one of the instructions in the set. When an instruction arrives, the circuits for that operation are activated by an electrical signal and the processor executes it.
Instructions consist of a certain number of bits.
For example, in an 8-bit instruction, the first 4 bits may hold the operation code and the next 4 bits the data to be used. The length of an instruction can vary from a few bits to more than a hundred bits, and in some architectures instructions have different lengths.
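The 8-bit example can be made concrete with a few bit operations. The sketch below assumes the layout described in the text (upper 4 bits for the operation code, lower 4 bits for the data); the function name `decode` and the sample value are illustrative only.

```python
def decode(instruction):
    """Split an 8-bit instruction into a 4-bit opcode and a 4-bit operand."""
    if not 0 <= instruction <= 0xFF:
        raise ValueError("expected an 8-bit value")
    opcode = instruction >> 4     # upper 4 bits
    operand = instruction & 0x0F  # lower 4 bits
    return opcode, operand


opcode, operand = decode(0b0011_0101)
print(opcode, operand)  # 3 5
```

With 4 bits for the opcode, such a format can distinguish at most 16 operations, which is one reason real instruction sets use longer instructions.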
In general, instruction sets are divided into the following two main categories:
- Reduced instruction set computer (RISC): For a RISC-based processor (pronounced "risk"), the defined operations are simple and basic. This approach is optimized to reduce execution time, so processing is faster and more efficient; RISC does not need complex circuits, and its design cost is low. RISC-based processors aim to complete each instruction in a single cycle and operate only on data stored in registers. Because the instructions are simple, these processors can run at a higher frequency, their data paths are more efficient, and memory is accessed only through explicit load and store operations on registers.
- Complex instruction set computer (CISC): CISC processors have an additional layer of microcode, or microprogramming, which converts complex instructions into simple ones (such as addition or multiplication). The microcode is stored in fast memory and can be updated. This type of instruction set can include more instructions than RISC, and their format can be of variable length. In fact, CISC is almost the opposite of RISC: CISC instructions can span multiple processor cycles, and data routing is not as efficient as in RISC processors. In general, CISC-based processors can perform multiple operations in a single complex instruction, but take multiple cycles to do so.
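To illustrate the idea of breaking one complex instruction into simpler steps, here is a small sketch. The instruction name `ADD_MEM`, the micro-operation names, and the two scratch registers are hypothetical; real microcode is specific to each processor and is not exposed in this form.

```python
def expand(instruction):
    """Expand one hypothetical CISC-style instruction into RISC-like steps."""
    name, dst, src = instruction
    if name == "ADD_MEM":  # memory[dst] = memory[dst] + memory[src]
        return [
            ("LOAD", "r0", dst),   # r0 = memory[dst]
            ("LOAD", "r1", src),   # r1 = memory[src]
            ("ADD", "r0", "r1"),   # r0 = r0 + r1
            ("STORE", dst, "r0"),  # memory[dst] = r0
        ]
    raise ValueError(f"unknown instruction: {name}")


def execute(micro_ops, memory):
    """Run the simple steps; each one touches either memory or registers."""
    registers = {}
    for op, x, y in micro_ops:
        if op == "LOAD":
            registers[x] = memory[y]
        elif op == "ADD":
            registers[x] = registers[x] + registers[y]
        elif op == "STORE":
            memory[x] = registers[y]
    return memory


memory = {0x10: 4, 0x14: 6}
micro_ops = expand(("ADD_MEM", 0x10, 0x14))
execute(micro_ops, memory)
print(len(micro_ops), memory[0x10])  # 4 10
```

One complex instruction becomes four simple steps here, which is why it needs several cycles, while a RISC program would spell out the same four steps explicitly.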
RISC vs. CISC or ARM vs. x86
RISC and CISC are the two ends of the instruction set spectrum, and various combinations of the two also exist. First, let's state the basic differences between RISC and CISC:
|RISC (Reduced Instruction Set Computer)||CISC (Complex Instruction Set Computer)|
|RISC instructions are simple; each performs only one operation, and the processor can typically process it in one cycle.||CISC instructions can perform multiple operations, but the processor cannot process them in a single cycle.|
|RISC-based processors have simpler, more optimized data paths; the instructions are simple enough to be executed in separate stages.||CISC-based processors are complex in nature, and their instructions are more difficult to execute.|
|RISC-based processors execute instructions only on data that has first been loaded into registers.||CISC-based processors can have instructions work directly on data in RAM, with no need for separate load operations.|
|RISC does not require complex hardware; more of the work is done in software.||CISC has higher hardware requirements. Complex instructions are implemented in hardware, so the software is often simpler than on RISC. This is why programs for a CISC design require less code: the instructions themselves do a large part of the work.|
As mentioned, today's modern processors use a combination of these two approaches (CISC and RISC). For example, the x86 architecture used by Intel and AMD is based on a CISC instruction set, but uses microcode to break complex instructions down into simpler, RISC-like ones. Now that we have explained the differences between the two main categories of instruction sets, we will examine their application in processor architecture.
If you pay attention to the processor architecture when choosing a phone or tablet, you will notice that some models use Intel processors, while others are based on ARM architecture.
Imagine that every processor had its own instruction set, so that programs had to be compiled separately for each one. A separate version of Windows would have to be developed for each processor in the AMD family, for example, or thousands of versions of Photoshop would have to be written for different processors.
For this reason, standard architectures based on RISC or CISC categories or a combination of the two were designed and the specifications of these standards were made available to everyone. ARM, PowerPC, x86-64 and IA-64 are examples of these architectural standards, and below we introduce two of the most important ones and their differences:
A brief history of processor architecture
In 1823, Baron Jöns Jacob Berzelius discovered the chemical element silicon (symbol Si, atomic number 14). Due to its abundance and its semiconductor properties, this element is the main material used in making processors and computer chips. More than a century later, in 1947, John Bardeen, Walter Brattain, and William Shockley invented the first transistor at Bell Labs, for which they later received the Nobel Prize.
The first working integrated circuit (IC) was demonstrated in September 1958, and two years later IBM developed the first automated mass production facility for transistors in New York. Intel was founded in 1968, and AMD was founded a year later.
The first commercial microprocessor was introduced by Intel in the early 1970s. It was called the Intel 4004; with 2,300 transistors, it performed 60,000 operations per second. The 4004 sold for $200 and had only 640 bytes of memory.
After Intel, Motorola introduced its first 8-bit processor (the MC6800) with a frequency of one to two MHz. MOS Technology then introduced a processor that was faster and cheaper than the existing ones; it was used in gaming consoles of the time, such as the Atari 2600 and Nintendo systems, and in computers such as the Apple II and Commodore 64.
The first 32-bit processor was developed by Motorola in 1979, although it was used mainly in Apple's Macintosh and in Amiga computers. A little later, National Semiconductor released the first 32-bit processor for public use.
In 1993, the first PowerPC processor, based on a 32-bit instruction set, was released. It was developed by the AIM consortium (Apple, IBM, and Motorola), and Apple migrated its computers from Motorola's 68000 series to PowerPC at that time. Intel and PowerPC went on to publish promotional videos highlighting each other's weaknesses.
Difference between 32-bit and 64-bit processors (x86 vs. x64)
Simply put, x86 refers to a family of instruction sets that began with one of Intel's most successful processors, the 8086. A processor compatible with this architecture is described as x86-64 (64-bit) or x86-32 (32-bit, the variant used by the 32-bit and 16-bit versions of Windows). In everyday usage, 64-bit processors are called x64 and 32-bit processors are called x86.
The biggest difference between 32-bit and 64-bit processors is their different access to RAM:
The maximum physical memory of the x86 architecture (32-bit processors) is limited to 4 GB, because a 32-bit address can distinguish only 2^32 bytes. The x64 architecture (64-bit processors) can address far more, so systems with 8, 16, 32 GB or more of physical memory are possible. A 64-bit computer can run both 32-bit and 64-bit programs; in contrast, a 32-bit computer can only run 32-bit programs.
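The 4 GB figure follows directly from the address width. The sketch below assumes one byte per address and simply counts how many distinct addresses fit in a given number of bits; the function name is illustrative, and real 64-bit processors typically implement fewer than 64 physical address bits, so the second figure is a theoretical ceiling.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


print(addressable_bytes(32) // GIB)  # 4
print(addressable_bytes(64) // GIB)  # 17179869184
```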
In most cases, 64-bit processors are more efficient than 32-bit processors when processing large amounts of data. To find out whether your operating system is 32-bit or 64-bit (and therefore which programs it supports), follow one of these two paths:
- Press Win + X to bring up the context menu and click System. In the window that opens, find System type under Device specifications. This shows whether your Windows is 64-bit or 32-bit.
- Type msinfo32 in the Windows search box and click System Information in the results. In the pane on the right, find System type and check whether your system is x64-based or x86-based.
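The same check can also be scripted. The snippet below is a minimal sketch using Python's standard library; note that the pointer width it reports is that of the running Python interpreter, which is an assumption worth keeping in mind because a 32-bit interpreter can run on a 64-bit operating system.

```python
import platform
import struct


def pointer_width_bits():
    """Width of a native pointer in the running interpreter, in bits."""
    return struct.calcsize("P") * 8


# Machine type as reported by the operating system (for example AMD64 or arm64).
print(platform.machine())
# 32 or 64, depending on how this Python interpreter was built.
print(pointer_width_bits())
```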
ARM is a processor architecture that Acorn introduced in the 1980s. At the time, AMD and Intel both used Intel's x86 architecture, based on CISC computing, and IBM used RISC computing in its workstations. Acorn was the first company to develop a home computer based on RISC computing, and it named the architecture ARM: Acorn RISC Machine. The company did not manufacture processors itself and instead sold licenses for the ARM architecture to other processor manufacturers. A few years later, the "Acorn" in the name was changed to "Advanced".
The ARM architecture originally processed 32-bit instructions, and a processor core based on it requires at least 35,000 transistors. Processors based on Intel's x86 architecture, which follows the CISC approach, require millions of transistors. In fact, the low energy consumption of ARM-based processors, and their suitability for devices such as phones and tablets, is related to their low transistor count compared with Intel's x86 architecture.
In 2011, ARM introduced the ARMv8 architecture with support for 64-bit instructions, and a year after that, Microsoft also launched a version of Windows compatible with the ARM architecture along with the Surface RT tablet.
ARM and X86-64 architecture differences
The ARM architecture is designed to be as simple as possible while keeping power consumption to a minimum. Intel, on the other hand, uses a more complex design in the x86 architecture, which is better suited to more powerful desktop and laptop processors.
Computers moved to 64-bit after AMD introduced the modern x86-64 architecture (also known as x64), which Intel also adopted. A 64-bit architecture allows larger values and more memory to be handled at once, which benefits demanding tasks such as 3D rendering and encryption. Today, both architectures support 64-bit instructions, but this technology reached mobile devices later.
When ARM implemented 64-bit support in ARMv8, it defined two execution states: AArch32 and AArch64. The first is used to run 32-bit code, and the second is used to run 64-bit code.
The ARM architecture is designed so that it can switch between the two modes very quickly. This means the 64-bit instruction decoder does not need to be compatible with 32-bit instructions, while the processor as a whole remains backward compatible. However, ARM has announced that from 2023, processors based on the ARMv9 Cortex-A architecture will only support 64-bit instructions, and that support for 32-bit applications and operating systems will end in next-generation processors.
The differences between the ARM and Intel architectures largely reflect the achievements and challenges of the two companies. ARM's low-power approach suits the sub-5-watt power budgets of mobile phones, yet it also allows performance to scale up to the level of Intel's laptop processors. Intel's Core i7 and Core i9 processors, which consume around 100 watts, and their AMD counterparts excel in high-end desktops and servers, but historically it has been difficult to scale these designs down below 5 watts.
Processors built with smaller, more advanced transistors consume less power, and Intel has long been trying to move from its 14nm manufacturing process to a more advanced one. The company recently managed to produce processors on its 10nm process, but in the meantime mobile processors have moved from 20nm to 14nm, 10nm, and 7nm designs as a result of competition between Samsung and TSMC. AMD, for its part, unveiled 7nm processors in the Ryzen series and moved ahead of its x86-64 competitors.
Nanometer: A meter divided by a thousand is a millimeter, a millimeter divided by a thousand is a micrometer, and a micrometer divided by a thousand is a nanometer. In other words, a nanometer is one billionth of a meter.
Lithography or manufacturing process: Lithography comes from Greek words meaning "writing on stone". Here it refers to the way components are placed in processors, that is, the process of producing and forming the circuits. This process is carried out by specialized manufacturers such as TSMC. From the first processors until a few years ago, the nanometer figure indicated the distance between processor components; for example, the 14nm process of the Skylake series in 2015 meant that the components of that processor were 14nm apart. At that time, it was believed that the smaller the manufacturing process, the lower the energy consumption and the better the performance.
Today the spacing of components is no longer so closely tied to these figures, and process names have become more of a convention, because the distances cannot be reduced beyond a certain limit without sacrificing efficiency. In general, as technology has advanced, transistor designs have changed, and the number of transistors per processor has grown, manufacturers have adopted other solutions, such as 3D stacking, to fit transistors onto processors.
The most distinctive capability of the ARM architecture is arguably how it keeps power consumption low when running mobile applications. This comes from ARM's heterogeneous processing: the architecture allows work to be divided between powerful cores and low-power cores, so energy is used more efficiently.
ARM's first attempt in this area was the big.LITTLE architecture in 2011, which paired large Cortex-A15 cores with small Cortex-A7 cores. The idea of using powerful cores for heavy applications and low-power cores for light and background processing may be taken for granted today, but it took ARM several unsuccessful attempts to get it right. Today, ARM is the dominant architecture in the market; for example, iPads and iPhones use the ARM architecture exclusively.
In the meantime, Intel's Atom processors, which did not benefit from heterogeneous processing, could not match the performance and power efficiency of ARM-based processors, which left Intel behind ARM.
Finally, in 2020, Intel used a hybrid architecture with one powerful core (Sunny Cove) and four low-power cores (Tremont) in its 10nm Lakefield processors, alongside integrated graphics and connectivity features. However, this product was made for laptops with a power consumption of 7 watts, which is still considered high for phones.
Another important distinction between Intel and ARM is how they use their designs. Intel uses the architecture it develops in the processors it manufactures and sells that architecture only as part of its products. ARM, in contrast, sells its designs and customizable architecture licenses to other companies, such as Apple, Samsung, and Qualcomm, and these companies can modify the instruction set and the design to suit their goals.
Custom processors are expensive and complicated for the companies that make them, but if done right, the end products can be very powerful. For example, Apple has repeatedly shown that customizing the ARM architecture can bring its processors on par with x86-64 or beyond.
Apple eventually plans to remove all Intel-based processors from its Mac products and replace them with ARM-based silicon. The M1 chip is Apple’s first attempt in this direction, which was released along with MacBook Air, MacBook Pro and Mac Mini. After that, the M1 Max and M1 Ultra chips also showed that the ARM architecture combined with Apple’s improvements could challenge the x86-64 architecture.
As mentioned earlier, standard architectures based on RISC, CISC, or a combination of the two were designed, and their specifications were made available to everyone. Applications and software must be compiled for the processor architecture on which they run. This was not a big concern in the past, when platforms and architectures were few, but today the number of applications that need separate builds for different platforms has increased.
ARM-based Macs, Google's Chrome OS, and Microsoft's Windows are all current examples of platforms that need software to run on both ARM and x86-64 architectures. Native compilation of the software is one solution in such situations.
Alternatively, these platforms can emulate each other's code, so that code compiled for one architecture can be executed on another. Compared with developing an application natively for each platform, this approach comes with a loss of performance, but the very possibility of emulating code is promising for now.
After years of development, the Windows emulator for ARM-based platforms now provides acceptable performance for most applications, Android applications run more or less satisfactorily on Intel-based Chromebooks, and Apple, which has developed its own code translation tool (Rosetta 2), supports older Mac applications that were written for the Intel architecture.
But as mentioned, all three run programs more slowly than if the program had been written natively for each platform. In general, the ARM and Intel x86-64 architectures can be compared as follows:
|Criterion||ARM||x86|
|Origin and usage||ARM is a processor architecture that is licensed to others and therefore does not have a single manufacturer. It is used in the processors of Android phones and iPhones.||The x86 architecture was developed by Intel and is used mainly in desktop and laptop processors.|
|Complexity of instructions||The ARM architecture aims to execute each instruction in a single cycle, which makes processors based on it more suitable for devices that require simpler processing.||The Intel architecture (or the x86 architecture associated with 32-bit Windows applications) follows the CISC approach, so it has a somewhat more complex instruction set, and instructions can require several cycles to execute.|
|Mobile CPUs vs. desktop CPUs||The ARM architecture relies more on software, which is why it is used more in phone processors; in general, ARM works better in smaller devices that do not have constant access to a power supply.||Because Intel's x86 architecture relies more on hardware, it is typically used in processors for larger devices such as desktops; Intel focuses more on performance, and its architecture is considered better for a wider range of technologies.|
|Energy consumption||The ARM architecture not only consumes less energy thanks to its single-cycle instructions, but also runs cooler than Intel's x86 architecture. ARM is well suited to phone processors because it reduces the energy required to keep the system running and execute the user's commands.||Intel's architecture is focused on performance, so power draw is not a problem for desktop or laptop users who have access to a mains power source.|
|Processor speed||CPUs based on the ARM architecture have usually been slower than their Intel counterparts, because they run at lower power to save energy.||Processors based on Intel's x86 architecture are used for faster computing.|
|Operating system||ARM is the more efficient choice for Android phone processors and is the dominant architecture in that market. Devices based on the x86 architecture can also run the full range of Android applications, but these applications must be translated before running, which costs time and energy, so battery life and overall performance may suffer.||The Intel architecture is dominant in Windows tablets and the Windows operating system. In 2019, however, Microsoft released the Surface Pro X, whose ARM-based processor could run the full version of Windows. If you are a gamer, or if you expect more from your tablet than running the full version of Windows, the Intel architecture is still the better choice.|
Looking at the competition between ARM and x86 over the past ten years, ARM can be considered the winning architecture for low-power devices such as phones. It has also made great strides in laptops and other devices that require efficient energy use. On the other hand, although Intel has lost the phone market, its efforts to improve power efficiency have brought significant gains over the years, and with hybrid designs such as Lakefield and Alder Lake, its processors now have more in common with ARM-based processors than ever before. ARM and x86 are distinctly different from an engineering point of view, and each has its own strengths and weaknesses. Today, however, it is no longer easy to separate their use cases, as both architectures are increasingly supported across ecosystems.
Processor performance indicators
The performance of the processor has a great impact on how quickly programs load and how smoothly they run. There are various measures of processor performance, of which frequency (clock speed) is one of the most important. Note, however, that while the frequency of each core can be taken as a measure of its processing power, it does not necessarily represent the overall performance of the processor. Many other factors, such as the number of cores and threads, the internal architecture (synergy between cores), cache capacity, overclocking capability, thermal design power, power consumption, and IPC, must also be considered when judging overall performance.
Synergy is an effect that results from the flow or interaction of two or more elements. If this effect is greater than the sum of the effects that each of those individual elements could produce, then synergy has occurred.
In the following, we will explain more about the factors influencing the performance of the processor:
One of the most important factors in choosing and buying a processor is its frequency (clock speed), which is usually quoted as a single figure for all its cores. The number of cycles the processor performs per second is known as its speed and is expressed in hertz: megahertz (MHz) for older processors and gigahertz (GHz) for current ones.
More precisely, frequency refers to the number of computing cycles that the processor cores perform per second, and it is measured in GHz (billions of cycles per second).
For example, a 3.2 GHz processor performs 3.2 billion cycles per second. In the early 1970s, processors passed a frequency of one megahertz (MHz), or one million cycles per second, and around 2000 they reached one gigahertz (GHz), equal to one billion hertz.
Sometimes multiple instructions are completed in one cycle, and in other cases a single instruction may take multiple cycles. Since each architecture and design handles instructions differently, the processing power of the cores can differ depending on the architecture. In fact, without knowing the number of instructions processed per cycle (IPC), comparing the frequencies of two processors tells you very little.
Suppose we have two processors, one made by company A and the other by company B, both running at one GHz. With no other information, we might consider the two to be equal in performance. But if company A's processor completes one instruction per cycle and company B's processor completes two instructions per cycle, the second processor will obviously be faster than the first.
In simpler terms, at the same frequency, a processor with a higher IPC can do more processing and is more powerful. So, to properly evaluate the performance of each processor, in addition to the frequency, you will also need the number of instructions it performs in each cycle.
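The comparison between the two hypothetical processors comes down to one multiplication: instructions per second are roughly frequency times IPC. The sketch below uses the figures from the example (1 GHz, IPC of 1 and 2); the function name is illustrative, and real IPC varies with the workload rather than being a fixed number.

```python
def instructions_per_second(frequency_hz, ipc):
    """Rough throughput estimate: cycles per second times instructions per cycle."""
    return frequency_hz * ipc


processor_a = instructions_per_second(1e9, 1)  # company A: 1 GHz, IPC 1
processor_b = instructions_per_second(1e9, 2)  # company B: 1 GHz, IPC 2
print(processor_b / processor_a)  # 2.0
```

By the same estimate, a 2 GHz processor with an IPC of 1 would only match processor B, which is why frequency alone is a poor basis for comparison.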
Therefore, it is best to compare a processor’s frequency only with that of processors from the same series and generation. A newer processor with a lower frequency may well outperform a processor from five years ago with a higher frequency, because newer architectures handle instructions more efficiently.
Intel’s X-series processors may outperform higher-frequency K-series processors because they split tasks between more cores and have larger caches. On the other hand, within the same generation of processors, a processor with a higher frequency usually performs better than one with a lower frequency in many applications. That is why the manufacturer and the processor generation are very important when comparing processors.
Base frequency and boost frequency: The base frequency of a processor is the guaranteed frequency it runs at under ordinary, lighter workloads; the boost frequency, on the other hand, indicates how far the processor can raise its clock speed when performing heavier calculations or more demanding processes. Boost frequencies are applied automatically and are limited by the heat generated under heavy load, so the processor backs off before it reaches unsafe temperatures.
In practice, the frequency of a processor cannot be raised indefinitely because of physical limitations (mainly power and heat); beyond about 3 GHz, power consumption increases disproportionately.
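To see why power can grow faster than frequency, one common textbook approximation (not taken from this article) is that the switching power of a chip is roughly proportional to capacitance times voltage squared times frequency. The sketch below assumes that model; the capacitance and voltage figures are invented for illustration and do not describe any real processor.

```python
def dynamic_power(capacitance_f: float, voltage_v: float, frequency_hz: float) -> float:
    """Approximate switching power in watts: P = C * V^2 * f."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


C = 1e-9  # hypothetical effective switched capacitance, in farads

# Assumption: reaching the higher clock also requires a higher supply voltage.
low = dynamic_power(C, voltage_v=1.0, frequency_hz=3.0e9)
high = dynamic_power(C, voltage_v=1.2, frequency_hz=4.0e9)

print(f"3.0 GHz at 1.0 V: {low:.2f} W")
print(f"4.0 GHz at 1.2 V: {high:.2f} W")
print(f"frequency ratio: {4.0e9 / 3.0e9:.2f}, power ratio: {high / low:.2f}")
```

Because higher frequencies usually need a higher voltage as well, and voltage enters the formula squared, the power ratio in this model is larger than the frequency ratio.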
Another factor that affects processor performance is the capacity of the processor’s cache memory. This type of memory works much faster than the system’s main RAM because it is located close to the processor, which uses it to temporarily store data and reduce the time spent transferring data to and from system memory.
Therefore, cache can also have a large impact on processor performance; in general, the more cache the processor has, the better it performs. Fortunately, nowadays all users can access benchmark tools and evaluate the performance of processors themselves, regardless of manufacturers’ claims.
Cache memory can have multiple levels, indicated by the letter L. Processors usually have up to three or four levels of cache: the first level (L1) is faster than the second (L2), the second is faster than the third (L3), and the third is faster than the fourth (L4). Cache usually offers up to several tens of megabytes of storage, and the larger it is, the higher the price of the processor.
The cache memory holds data the processor is likely to need; it is faster than the computer’s RAM and therefore reduces the delay in executing commands. The processor first checks the cache for the data it needs, and only if the data is not there does it go to RAM.
- Level one cache (L1), also called the primary or internal cache, is the memory closest to the processor core. It is faster and smaller than the other cache levels and stores the most important data needed for processing, because the processor checks the L1 cache first when processing an instruction.
- Level two cache (L2), historically called the external cache, is slower and larger than L1 and, depending on the processor design, may be shared between cores or dedicated to each one. Unlike L1, L2 was placed on the motherboard in old computers, but in modern processors it sits on the processor itself and has a lower latency than the next cache level, L3.
- Level three cache (L3) is shared by all the cores in the processor. It has a larger capacity than the L1 or L2 cache but is slower than both.
- Level four cache (L4), like L3, has a larger capacity and lower speed than L1 or L2; L3 and L4 are usually shared.
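The lookup order described above (L1 first, then L2, then L3, and finally RAM) can be sketched as a small simulation. The latencies below are made-up round numbers in CPU cycles, and the `lookup` function and the cache contents are purely illustrative; real processors differ in both numbers and behavior.

```python
# Hypothetical latencies in CPU cycles; real values vary by processor.
LEVELS = [("L1", 4), ("L2", 12), ("L3", 40)]
RAM_LATENCY = 200


def lookup(address, caches):
    """Search each cache level in order and fall back to RAM if every level misses.

    `caches` maps a level name to the set of addresses it currently holds.
    Returns (where the data was found, total cycles spent looking).
    """
    cycles = 0
    for name, latency in LEVELS:
        cycles += latency
        if address in caches[name]:
            return name, cycles
    return "RAM", cycles + RAM_LATENCY


# Larger levels hold more data: here each level contains everything in the one above it.
caches = {"L1": {0x10}, "L2": {0x10, 0x20}, "L3": {0x10, 0x20, 0x30}}

for addr in (0x10, 0x20, 0x30, 0x40):
    print(hex(addr), lookup(addr, caches))
```

In this toy model, data found in L1 costs only a few cycles, while a miss at every level pays for all three checks plus the trip to RAM, which is why a larger cache tends to help performance.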
The core is the processing unit of the processor that can independently perform or process all computing tasks. From this point of view, a core can be considered a small processor within the whole central processing unit. It consists of the same arithmetic logic unit (ALU), control unit (CU), and registers, which process instructions through the fetch-execute cycle.
In the beginning, processors worked with only one core, but today processors are mostly multi-core, with two or more cores on an integrated circuit processing two or more processes simultaneously. Note that each core can only work on one stream of instructions at a time. Processors equipped with multiple cores execute sets of instructions or programs faster than before by using parallel processing (parallel computing). Of course, having more cores does not automatically increase the overall performance of the processor, because many programs do not yet use parallel processing.
- Single-core processors: The oldest type of processor is the single-core processor, which can execute only one command at a time and is not efficient for multitasking. In this processor, a process can only start once the previous operation has ended, and if more than one program is running, performance drops significantly. The performance of a single-core processor is judged mainly by its frequency.
- Dual-core processors: A dual-core processor consists of two strong cores and performs like two single-core processors. Where a single-core processor must switch back and forth between a changing set of data streams, a dual-core processor can handle multiple threads and processing tasks more efficiently.
- Quad-core processors: A quad-core processor is a refined multi-core design that divides the workload between four cores and provides more effective multitasking; hence, it is more suitable for gamers and professional users.
- Six-core processors (Hexa-Core): Another type of multi-core processor is the six-core processor, which performs processes faster than quad-core and dual-core types. For example, some of Intel’s Core i7 processors have six cores and are suitable for everyday use.
- Eight-core processors (Octa-Core): Octa-core processors have eight independent cores and offer better performance than the previous types. These processors often combine two sets of four cores and divide different activities between them. In many cases only the minimum required cores are used for processing, and when there is a need, the other four cores also take part in the calculations.
- Ten-core processors (Deca-Core): Ten-core processors consist of ten independent cores and are more powerful than the previous types at executing and managing processes. These processors are faster, handle multitasking very well, and more and more of them are being released to the market.
Difference between single-core and multi-core processing
In general, the choice between a powerful single-core design and a multi-core processor of ordinary power depends entirely on how it will be used, and there is no one-size-fits-all answer. Strong single-core performance matters for software that does not need or cannot use multiple cores. Having more cores does not necessarily mean faster, but if a program is optimized to use multiple cores, it will run faster with more of them. If you mostly use applications that are optimized for single-core processing, you probably won’t benefit from a processor with a large number of cores.
Say you want to take two people from point A to point B: a Lamborghini will do just fine. But if you want to transport 50 people, a bus can be a faster solution than multiple Lamborghini trips. The same goes for single-core versus multi-core processing.
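One standard way to put numbers on this trade-off is Amdahl’s law, which is not mentioned in the article itself but is a well-known model: the speedup from extra cores is limited by the part of a program that cannot run in parallel. The sketch below assumes that model; the parallel fractions are arbitrary examples.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's law for a program whose
    `parallel_fraction` (0.0 to 1.0) can be spread across `cores` cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# A mostly serial program (20% parallel) versus a well-optimized one (90% parallel).
for fraction in (0.2, 0.9):
    for cores in (1, 2, 4, 8, 16):
        speedup = amdahl_speedup(fraction, cores)
        print(f"parallel={fraction:.0%} cores={cores:2d} speedup={speedup:.2f}x")
```

In this model the mostly serial program can never exceed a 1.25x speedup no matter how many cores are added, while the well-optimized one keeps gaining, which matches the point that more cores only help software written to use them.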
In recent years, with the advancement of technology, processor cores have become increasingly smaller, so more cores can be placed on a processor chip. The operating system and software must also be optimized to use more cores, dividing instructions and assigning them to different cores so that they execute simultaneously. If this is done correctly, we will see impressive performance.
In traditional multi-core processors, all cores were implemented the same way and had the same performance and power rating. The problem with these processors was that when the processor was idle or doing light processing, its energy consumption could not be lowered beyond a certain limit. This is not a concern where power is effectively unlimited, but it can be a problem when the system relies on batteries or another limited power source.
This is where the concept of asymmetric processor design was born. For smartphones, chip designers such as ARM quickly adopted a solution in which some cores are more powerful and deliver better performance, while others are implemented in a low-power way; the low-power cores are only suited to running background tasks or basic applications such as reading and writing email or browsing the web.
High-powered cores automatically kick in when you launch a video game or when a heavy program needs more performance to do a specific task.
Although the combination of high-power and low-consumption cores in processors is not a new idea, using this combination in computers was not so common, at least until the release of the 12th generation Alder Lake processors by Intel.
In each model of Intel’s 12th generation processors, there are E cores (low power) and P cores (powerful). The ratio between these two types of cores can differ, but in Alder Lake Core i9 series processors, for example, eight cores are intended for heavy processing and eight cores for light processing. The i7 and i5 series have 8+4 and 6+4 designs for P and E cores, respectively.
There are many advantages to having a hybrid architecture approach in processor cores, and laptop users will benefit the most, because most daily tasks such as web browsing, etc., do not require intensive performance. If only low-power cores are involved, the computer or laptop will not heat up and the battery will last longer.
Low-power cores are easy and cheap to produce, so using them to offload light work and free up the powerful, advanced cores seems like a smart idea.
Even if you have your system connected to a power source, the presence of low-power cores will be efficient. For example, if you are engaged in gaming and this process requires all the power of the processor, powerful cores can meet this need, and low-power cores are also responsible for running background processes or programs such as Skype, etc.
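The division of labor between the two kinds of cores can be pictured with a toy placement policy. Everything in this sketch is hypothetical: the `Task` class, the `demanding` flag, and the round-robin rule are invented for illustration, and real operating-system schedulers are far more sophisticated and decide at runtime.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    demanding: bool  # hypothetical flag; real schedulers infer this at runtime


def assign_cores(tasks, p_cores=8, e_cores=8):
    """Toy policy: demanding tasks go to P cores, everything else to E cores,
    cycling through each group in round-robin order."""
    placement = {}
    p_next = e_next = 0
    for task in tasks:
        if task.demanding:
            placement[task.name] = f"P{p_next % p_cores}"
            p_next += 1
        else:
            placement[task.name] = f"E{e_next % e_cores}"
            e_next += 1
    return placement


tasks = [
    Task("game", demanding=True),
    Task("chat client", demanding=False),
    Task("mail sync", demanding=False),
    Task("video export", demanding=True),
]
print(assign_cores(tasks))
```

The default of eight P cores and eight E cores mirrors the Core i9 layout mentioned earlier; the point is only that background work never has to compete with the game for a powerful core.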
At least in the case of Intel’s Alder Lake processors, the P and E cores are designed not to interfere with each other, so that each can perform its tasks independently. Unfortunately, since combining different types of cores is a relatively new concept for x86 processors, this fundamental change has brought some problems.
Before hybrid cores (a combination of powerful P cores and low-power E cores) were introduced, software developers had no reason to make their products compatible with this architecture, so their software was not aware of the difference between the two types of core. As a result, some software (such as Denuvo) has been reported to crash or behave strangely.
Processing threads are sequences of instructions that are sent to the processor for processing. Each core can normally process one thread at a time, called the main thread, and if two threads are sent to it, the second runs only after the first has finished. This can reduce the speed and performance of the processor. To address this, processor manufacturers divide each physical core into two virtual cores (threads); each virtual core can run a separate processing thread, so a core with two threads can run two processing threads at the same time.
Active processing versus passive processing
Active processing refers to work that requires the user’s continuous, hands-on input to complete; common examples include motion design, 3D modeling, video editing, and gaming. In this type of processing, single-core performance and high clock speed are very important, so fewer but more powerful cores give smoother performance.
Passive processing, on the other hand, refers to tasks that can usually be run in parallel easily and left unattended, such as 3D rendering and video rendering. Such processing benefits from processors with a large number of cores and a higher base frequency, such as AMD’s Threadripper series processors.
One of the influential factors in passive processing is a high number of threads and the ability to use them. In simple terms, a thread is a sequence of instructions that an application sends to the processor for processing; threads allow the processor to perform several tasks at the same time efficiently and quickly. In fact, it is thanks to threads that you can listen to music while surfing the web.
Threads are not physical components of the processor, but represent the amount of processing that the processor cores can do, and to execute several very intensive instructions simultaneously, you will need a processor with a large number of threads.
The number of threads in each processor is directly related to the number of cores; each core can usually run two threads, every core has at least one, and at least one thread is assigned to each running process.
What is hyperthreading or SMT?
Hyperthreading in Intel processors and simultaneous multithreading (SMT) in AMD processors both refer to dividing physical cores into virtual cores; in effect, these two features are a way of scheduling and executing the instructions sent to the processor without interruption.
Today, most processors are equipped with hyperthreading or SMT and run two threads per core. However, some low-end processors, such as Intel’s Celeron series or some models in AMD’s Ryzen 3 series, do not support this feature and only have one thread per core. Even some high-end Intel processors come with hyperthreading disabled for various reasons such as market segmentation, so it is generally better to check the cores and threads section of the specifications before buying any processor.
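The relationship between cores and threads is simple arithmetic, sketched below. The helper `logical_processors` is a made-up name for this example; `os.cpu_count()` is a standard Python call that reports the number of logical processors the operating system sees, which on a hyperthreading or SMT machine is typically the thread count rather than the physical core count.

```python
import os


def logical_processors(physical_cores: int, threads_per_core: int = 2) -> int:
    """Number of hardware threads: physical cores times threads per core."""
    return physical_cores * threads_per_core


# An eight-core processor with hyperthreading or SMT enabled (two threads per core).
print(logical_processors(8))

# A four-core processor without it (one thread per core).
print(logical_processors(4, threads_per_core=1))

# What the current machine reports; the value depends on the hardware it runs on.
print("logical CPUs reported by the OS:", os.cpu_count())
```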
Hyperthreading or simultaneous multithreading helps schedule instructions more effectively and uses parts of the core that would otherwise be idle. At best, the extra threads provide about 50% more performance than the physical cores alone.
In general, if you only run active processing such as 3D modeling during the day, you probably won’t be using all of your CPU’s cores, because this type of processing usually runs on only one or two cores. For processing such as rendering, which requires all the power of the processor cores and available threads, hyperthreading or SMT can make a significant difference in performance.
CPU in gaming
Before the introduction of multi-core processors, computer games were developed for single-core systems. After AMD introduced its first dual-core processor in 2005, followed by four-, six-, and eight-core processors, games were no longer limited to a single core and could benefit from additional cores, because processors could now execute several different operations at the same time.
To have a satisfactory gaming experience, every gamer should choose a processor and a GPU that are well matched (we will examine the GPU and its function in a separate article). If the processor is weak or slow and cannot issue commands fast enough, the graphics unit cannot use its full power; of course, the opposite is also true. In such a situation, we say that a bottleneck has occurred.
What is a bottleneck?
In computing, a bottleneck is a limit on the performance of one component caused by the difference between the maximum capabilities of two hardware components. Simply put, if the graphics unit can process instructions faster than the processor can send them, the unit will sit idle until the next set of instructions is ready, rendering fewer frames per second; in this situation, graphics performance is limited by the processor.
The same may happen in the opposite direction: if a powerful processor sends commands faster than the graphics unit can handle them, the processor’s capabilities are limited by the weaker graphics unit.
In fact, a system with a well-matched processor and graphics unit gives the user better and smoother performance. Such a system is called a balanced system. In general, a balanced system is one in which the hardware does not create bottlenecks for the user’s desired processes and provides a better experience without any component being used disproportionately (too much or too little).
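The idea of a bottleneck can be reduced to a very simple model: the frame rate you actually get is set by whichever component is slower. The function name and the frame-rate figures below are invented for illustration, and real games depend on many more factors than these two numbers.

```python
def delivered_fps(cpu_fps: float, gpu_fps: float):
    """Return the achievable frame rate and the component that limits it.

    `cpu_fps` is how many frames per second the processor can prepare,
    `gpu_fps` is how many the graphics unit can render.
    """
    if cpu_fps < gpu_fps:
        return cpu_fps, "CPU"
    if gpu_fps < cpu_fps:
        return gpu_fps, "GPU"
    return cpu_fps, "balanced"


# Hypothetical pairings of processor and graphics unit.
print(delivered_fps(cpu_fps=70, gpu_fps=140))   # strong graphics held back by the processor
print(delivered_fps(cpu_fps=200, gpu_fps=60))   # strong processor held back by the graphics
print(delivered_fps(cpu_fps=120, gpu_fps=120))  # neither component waits for the other
```

In the first two cases, the money spent on the faster component is partly wasted, which is the practical argument for a balanced system.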
It is better to pay attention to a few points to set up a balanced system:
- You can’t set up a balanced system for an ideal gaming experience just by buying the most expensive processor and graphics available in the market.
- A bottleneck is not necessarily caused by the quality or age of the components; it is directly related to the performance of the system hardware.
- Graphics bottlenecks are not specific to high-end systems, and balance is also very important in systems with low-end hardware.
- Bottlenecks are not exclusive to the processor and graphics unit, but a good match between these two components prevents the problem to a large extent.
Setting up a balanced system
In gaming or graphics processing, when the graphics unit is not running at its maximum power, extra processor power will noticeably improve the gaming experience, provided the graphics unit and the processor are well matched. The type and style of game are also important factors in choosing hardware. Currently, quad-core processors can still run a wide range of games, but processors with six or more cores will generally give you smoother performance. Today, a multi-core processor is a must for any gaming system, especially for games such as first-person shooters (FPS) or online multiplayer titles.