The problem of energy optimization in multi-core systems (such as single-chip multiprocessors) where the individual energy demands of various processing elements are governed by instantaneous workload requirements is well defined in literature. The significance of the problem is underlined by the increasing prominence of multi-core systems that must operate under strict power/energy budget constraints, both in mobile applications and in cases where special cooling arrangements can be very expensive. A range of solutions have been proposed over the last few years, which are mostly based on static, off-line calculation of a limited set of operating points in the form of optimum voltage and frequency assignments, that are subsequently chosen according to actual demands. Still, to our best knowledge, none of these studies have demonstrated an on-line solution to complex, multi-variable energy optimization problem which allows dynamic adjustment of individual operating frequencies and supply voltages of multiple processing elements. This work presents the design and silicon implementation of an analog-based energy optimizer unit, which is capable of dynamically adjusting power supply and clock frequencies of multiple embedded cores, tailored to the instantaneous workload information (computational task) and fully adaptive to variations in process and temperature.
Our approach borrows from the basic principles of analog computation to continuously optimize the system-wide energy dissipation of multiple processing elements, converging on the global minima of the constrained optimization problem which are represented as stable operating points of a simple feedback loop. It is already well known that stable, approximate solutions of multi- variable optimization problems (such as gradient descent) can be obtained by using very compact analog circuits, e.g. resistive networks. The analogy between the energy minimization problem under timing constraints in a general task graph and the power minimization problem under Kirchhoff's current law constraints in an equivalent resistive network is exploited. The implementation of the on-line analog optimizer is then discussed. The realization of the blocks composing the system architecture is described, and circuit design issues are studied thoroughly.