PART 5. DUAL CORE CPU Q&A AND TWEAKS
Q&A on parameters and factors that control the performance, throughput and battery-life delivered by GS2's dual-core CPU, and some CPU tweaks
Q. "What is the basic hardware of GS2 that makes all of us enjoy this phone so much and boast about benchmark scores to office-mates and friends?"
Processor: ARM Cortex-A9 MPCore processor on the Exynos 4210 SoC (System on a Chip: an IC where all components are integrated into a single chip), built on 45nm semiconductor technology. Exynos 4210 is supposed to give 6.4GB/s memory bandwidth for heavy-weight operations such as full HD video encoding.
GPU: ARM Mali-400
Memory: LPDDR2 (may be DDR3)
Q. "What is the significance of bus frequency?"
A. Bus speed, in its simplest form, determines how fast data travels to and from memory. Memory throughput is directly proportional to bus frequency. In tasks that do a small amount of work on every element of a data set, a lower bus speed means the CPU has to wait longer for data to arrive from memory, because the CPU spends very little time on each element and a slow bus cannot keep up.
Advanced Microcontroller Bus Architecture (AMBA) is used as the on-chip bus in system-on-a-chip designs like our device.
Q. "What is modifying bus frequency? How do I do it? Advantages?"
A. Stock behavior is dynamic bus frequency scaling, wherein the operating bus speed is calculated dynamically for each CPU frequency depending on the application/process requirement. We can modify this behavior by setting static bus frequency scaling, which specifies the bus speed at which each CPU frequency should operate. Three values/levels are possible:
0 400 MHz
1 266 MHz
2 133 MHz
Sample bus frequency modification:
echo "0 0 0 1 1 1 2 2" > /sys/devices/system/cpu/cpu0/cpufreq/busfreq_static
echo "enabled" > /sys/devices/system/cpu/cpu0/cpufreq/busfreq_static
This means the first three (highest) CPU frequency steps use the 400 MHz bus, the next three use 266 MHz, and the last two use 133 MHz.
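The mapping can be sanity-checked before it is written to sysfs. Below is a minimal sketch, assuming the level table given above (0 = 400, 1 = 266, 2 = 133). The function name bus_levels_to_mhz is made up for illustration and is not part of any kernel interface.

```shell
#!/bin/sh
# Translate a busfreq_static level string into bus speeds in MHz,
# one per CPU frequency step (highest CPU step first).
# Hypothetical helper; the level table is taken from the text above.
bus_levels_to_mhz() {
    out=""
    for level in $1; do
        case "$level" in
            0) mhz=400 ;;
            1) mhz=266 ;;
            2) mhz=133 ;;
            *) echo "invalid bus level: $level" >&2; return 1 ;;
        esac
        # Append with a separating space except for the first entry.
        out="${out:+$out }$mhz"
    done
    echo "$out"
}

bus_levels_to_mhz "0 0 0 1 1 1 2 2"
```

For the sample string this prints 400 400 400 266 266 266 133 133, which makes a typo in the level string easy to spot before it reaches the kernel.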
Advantages of bus frequency modification: i) saves battery by using low bus speeds on low CPU frequencies and ii) prevents overheating.
Q. "I sometimes experience lag while playing HD videos or heavy 3D games when using static bus frequencies. Why?"
A. HD videos and some games require a minimum bus speed of 400/266 MHz irrespective of the CPU frequencies being used during the run. To resolve this, set a higher bus speed for 500 MHz and higher CPU frequencies, or simply disable static bus frequency scaling to switch back to the default:
echo "disabled" > /sys/devices/system/cpu/cpu0/cpufreq/busfreq_static
Q. "Our phone CPU has two cores. How are they utilized? Are the two cores ON all the time?"
A. The stock behavior is Dynamic Hot Plug Mode, where the second core is turned on depending on the load. If the load can be handled by a single core, the second core is turned off dynamically. This behavior can be controlled with the Tegrak Second Core app from the market if your kernel supports it (Siyah, Lulz, etc. support this). Using this app you can set three modes:
Dynamic Hot Plug Mode: Default mode. The second core is kicked in depending on the load, and kicked out when the first core can handle the load alone.
Single Core Mode: Irrespective of the load, only the first core is ever used. This can increase battery life, but reduces performance.
Dual Core Mode: Even at low loads, both cores are always active. Increased performance, but reduced battery life.
Recommendation: Use the stock hotplug mode during normal use. Switch to dual core mode only for benchmarking or playing some heavy 3d games.
Q. "OK, I'm using hot plug mode, but I still want to control how often the second core kicks in, to make it more aggressive or milder depending on my usage."
A. You can set UP & LOW thresholds for second core in Screen-On and Screen-Off states.
echo "70" > /sys/module/pm_hotplug/parameters/loadh
echo "25" > /sys/module/pm_hotplug/parameters/loadl
echo "90" > /sys/module/pm_hotplug/parameters/loadh_scroff
echo "35" > /sys/module/pm_hotplug/parameters/loadl_scroff
As you can see, when load > 70% the second core becomes active, and when load drops below 25% the second core is turned off.
During screen-off, these values are 90 and 35 respectively. This helps reduce unwanted kick-ins of the second core during the screen-off state when music is playing, a download is running, etc.
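The effect of the two thresholds can be pictured as a small state machine. The sketch below is only a model of the behavior described above, not the kernel's actual pm_hotplug code. The function name next_core1_state is hypothetical, and the sample loads are made up for illustration.

```shell
#!/bin/sh
# Decide the second core's next state from its current state and the load.
# Hypothetical model of the loadh/loadl hysteresis; the real logic lives
# inside the kernel's pm_hotplug driver.
# Args: current state (on|off), load in percent, up threshold, low threshold.
next_core1_state() {
    cur=$1
    load=$2
    loadh=$3
    loadl=$4
    if [ "$load" -gt "$loadh" ]; then
        echo on
    elif [ "$load" -lt "$loadl" ]; then
        echo off
    else
        echo "$cur"    # between the thresholds: keep the current state
    fi
}

# Walk a few sample loads through the screen-on thresholds (70 / 25).
state=off
for load in 50 80 40 20; do
    state=$(next_core1_state "$state" "$load" 70 25)
    echo "load=${load}% -> second core $state"
done
```

The gap between the two thresholds is what stops the second core from flapping on and off when the load hovers around a single value.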
Q. "Like governors, is there a sampling rate/interval also at which the load on CPU is checked for crossing thresholds to turn second core ON?"
A. Yes, there is. But it is set at kernel level in most kernels and cannot be controlled at user level. As you guessed, a higher sampling rate value (that is, a longer interval between load checks) could cause the second core to kick in less often and thus save a little battery. In Siyah kernel, though, these values are configurable.
Q. "Advantages/Disadvantages of switching to Single Core/Dual Core modes?"
A. Using a single core can save some battery, but can have adverse effects if heavy tasks need both cores too often: 3D games, full HD videos, etc. So use it wisely.
Using dual core mode can reduce latency by a tiny bit under high loads compared to hot plugging. But hot plugging is intelligent enough to turn the second core ON really fast when the load demands it. Only the first core (cpu0) can enter deep idle (LPA), so using dual core mode on an idle system causes unwanted excess power consumption.
Recommendation: Use Hot Plugging and tune the thresholds (as mentioned above) for a better experience.
Q. "What are these modes: IDLE, LPA and AFTR?"
A. Between the screen-off and deep sleep states, there are some idle modes supported by the cpuidle driver. They are IDLE aka Normal Idle, LPA aka Deep Idle, and AFTR aka ARM Off Top Running. Race-to-idle by the CPU is implemented for power management.
In the IDLE state, the CPU is no longer clocked, but no hardware is powered down.
In deep idle (LPA), a state after IDLE, the CPU is again not clocked, and in addition some parts of the hardware are powered down. Deep idle brings real power savings, so there is no need to put a hard limit on frequency during screen-off using a screen-off profile. (Good practice is to use a governor with a built-in screen-off profile rather than a user-configured screen-off profile that puts a hard limit on frequency.) Deep idle is not used when the device is entering deep sleep, nor when the device is waking from suspend/deep sleep. While entering/exiting deep idle, the CPU is set statically to SLEEP_FREQ and is not clocked below or above that until it exits this state.
AFTR is a patch to support Top=Off mode for deep idle. The Level 2 cache keeps its data during this mode.
We can have IDLE or AFTR modes with LPA enabled or disabled. (Obviously it is not possible to have IDLE and AFTR together.)
Q. "What idle modes are recommended for power saving? How do I change them?"
A. The recommended setting for power saving is to enable AFTR and LPA, i.e. value 3:
echo "3" > /sys/module/cpuidle/parameters/enable_mask
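The text only confirms that value 3 means AFTR plus LPA. The sketch below decodes the other values under the ASSUMPTION that enable_mask is a bit mask in which bit 0 selects AFTR instead of plain IDLE and bit 1 enables LPA. Check this against your kernel's source before relying on it. The function name describe_idle_mask is made up for illustration.

```shell
#!/bin/sh
# Describe a cpuidle enable_mask value.
# ASSUMPTION: bit 0 selects AFTR (instead of plain IDLE), bit 1 enables LPA.
# Only the value 3 (AFTR + LPA) is confirmed by the text above.
describe_idle_mask() {
    mask=$1
    case "$mask" in
        0|1|2|3) ;;
        *) echo "unsupported mask: $mask" >&2; return 1 ;;
    esac
    if [ $((mask & 1)) -ne 0 ]; then base=AFTR; else base=IDLE; fi
    if [ $((mask & 2)) -ne 0 ]; then lpa="LPA enabled"; else lpa="LPA disabled"; fi
    echo "$base, $lpa"
}

for m in 0 1 2 3; do
    echo "$m: $(describe_idle_mask "$m")"
done
```

Under that assumption the four values cover exactly the combinations mentioned earlier: IDLE or AFTR, each with LPA on or off.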
Q. "What is sched_mc?"
A. sched_mc (Schedule Multi Core) is a scheduler setting, worked on by the Linaro team for ARM, that makes process scheduling multi-core aware, i.e. it utilizes both cores wisely to save power and balance performance. Even though sched_mc is a sort of alternative to CPU hot plugging, we can use sched_mc with the default hot plug mode. Possible values:
0 : No power-saving load balance; default on our Exynos 4210 SoC.
1 : Fill one thread/core/package first for long-running threads. On our single-CPU dual-core device, multithreading does not come into the picture, so load balancing is almost redundant with hotplugging.
2 : Also bias task wake-ups to the semi-idle CPU package for power savings. (Bias new tasks to cpu1 if cpu0 is mostly filled with running tasks.) This is 'overloading' CPU0 first.
Q. "What value is recommended for sched_mc?"
A. 1) If you find advantages to sched_mc, use sched_mc=1 for a possible battery saving. However, since load balancing is redundant with hotplugging, it may not have any advantage on the Exynos chip.
2) For performance use 2. But remember that loading CPU0 and leaving CPU1 idle does not help in hitting deep idle states sooner, since the second core cannot enter deep idle. So, extra performance or not, value 2 will drain somewhat more battery because deep idle is delayed.
3) To do justice to hotplugging, use value 0.
echo "0" > /sys/devices/system/cpu/sched_mc_power_savings
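Not every kernel exposes every tunable mentioned in this guide, so it can help to check the file before writing to it. Here is a minimal sketch of such a wrapper. The function name set_tunable is hypothetical. It only uses standard shell tests and works on any writable file.

```shell
#!/bin/sh
# Write a value to a tunable and read it back.
# Refuses to touch files that are missing or not writable.
# Hypothetical helper, not part of any kernel or app.
set_tunable() {
    path=$1
    value=$2
    if [ ! -w "$path" ]; then
        echo "skip: $path is missing or not writable" >&2
        return 1
    fi
    echo "$value" > "$path" || return 1
    # Read back, since the kernel may reject or adjust the value.
    echo "$path = $(cat "$path")"
}

set_tunable /sys/devices/system/cpu/sched_mc_power_savings 0 ||
    echo "sched_mc is not available on this kernel"
```

The same wrapper can be reused for the other sysfs paths in this part, which avoids silent failures in an init script.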
Q. "What is MALI aggressive policy on GPU?"
A. The Mali aggressive scaling policy simply lowers the up-threshold of the GPU so that the GPU jumps to the second frequency step sooner, at a lower load. This makes more sense if the lower step is under-clocked. In one release of Siyah, the threshold was changed from the default 65 to 55.
Q. "What is tree rcu, fast nohz, jrcu?"
A. Read-Copy Update (RCU) is a synchronization mechanism in the Linux kernel. RCU improves scalability by allowing readers to execute concurrently with writers.
Tree RCU is a newer implementation of the original classic RCU that scales better as the number of CPUs increases. Tree RCU fixes a performance bug in classic RCU that results in massive lock contention on the internal RCU lock on systems with a large number of CPUs.
Fast NoHz is an optimized variant of Tree RCU. Many new kernels use the tickless NoHz design, and this RCU option is tailored to work with the NoHz kernel system.
The JRCU mechanism, in its simplest form, runs batch operations from a single CPU, relieving the other CPUs of this periodic responsibility. This is important for real-time applications that require full use of dedicated CPUs. For our dual-core single CPU, JRCU can conflict with hot-plugging, hence we will have Tree RCU (with or without CONFIG_RCU_FAST_NO_HZ) in our kernels.
Q. "What are SLAB, SLUB, SLQB?"
A. They're three memory allocation mechanisms.
Slab allocation is a memory management mechanism intended for efficient allocation of kernel objects; it has the desirable property of eliminating fragmentation caused by allocations and de-allocations. SLAB retains allocated memory that contains a data object of a certain type for reuse by subsequent allocations of objects of the same type.
The SLUB allocator promises better performance and scalability by dropping most of the queues and related overhead and simplifying the slab structure in general, while retaining the current slab allocator interface. SLUB makes alignment of objects and cleaning up of caches easier compared to SLAB.
SLQB: SLAB allocator with Queue. This is a slab allocator that focuses on per-CPU scaling. It is designed to be simple and suits systems with a small number of CPUs.
Note that SLUB matters most on a system with a large number of CPUs. SLAB has the advantage of being simple.
Q. "Can I change the RCU synchronization mechanism & memory allocators?"
A. NO. They are set at compile time at kernel level, and are not configurable from user space.
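Although these cannot be changed, you can sometimes see what your kernel was built with. A minimal sketch follows, assuming the kernel exposes its build config at /proc/config.gz (only present when built with CONFIG_IKCONFIG_PROC). CONFIG_TREE_RCU, CONFIG_RCU_FAST_NO_HZ, CONFIG_SLAB and CONFIG_SLUB are mainline option names. CONFIG_JRCU and CONFIG_SLQB are ASSUMED names for the out-of-tree patches. The function name show_rcu_and_allocator is made up for illustration.

```shell
#!/bin/sh
# Print the RCU and slab-allocator options enabled in a kernel config
# read from stdin. CONFIG_JRCU and CONFIG_SLQB are assumed option names.
show_rcu_and_allocator() {
    grep -E '^CONFIG_(TREE_RCU|RCU_FAST_NO_HZ|JRCU|SLAB|SLUB|SLQB)=y$'
}

# /proc/config.gz exists only if the kernel was built with CONFIG_IKCONFIG_PROC.
if [ -r /proc/config.gz ]; then
    gzip -dc /proc/config.gz | show_rcu_and_allocator ||
        echo "no matching options found"
else
    echo "this kernel does not expose /proc/config.gz"
fi
```

If the file is missing, the kernel developer's release notes or defconfig are the only places to find this information.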