Professional Documents
Culture Documents
HOME COMPUTE STORE CONNECT CONTROL CODE ANALYZE HPC ENTERPRISE HYPERSCALE CLOUD ISC16
Eventually, The Machine will move to a mix of DRAM and some form of non-volatile memory
when it moves from prototype to production, with DRAM being close to a system on chip (SoC)
compute complex and a massive pool of non-volatile memory being shared across thousands of
these SoCs using an optical interconnect that makes all of this memory look more or less local to
those SoCs.
While HP no doubt wants its own memristor memory technology to be used as the so-called
universal memory in the system, a kind of hybrid between memory and storage as we know it in
systems today, it cannot wait for that. And while HP chief technology officer Martin Fink has called
out phase change memory as a possible replacement for memristor memory until HP gets it sown
memristors ramped up, there is another option: The 3D XPoint memory that Intel and Micron
Technology unveiled three weeks ago has some of the same properties of memristors and PCM, in
that they all have persistent memory (unlike DRAM, which has to be powered up to keep its data)
and that they offer higher and cheaper capacity than DRAM even if they are a bit slower. As we
have learned throughout this Future Systems series, most of the people talking about future system
design expect NAND flash and maybe PCM memory to play a big part of the storage hierarchy over
the coming years, along with DRAM in various form factors and proximities to the processor. HP
would like to have a machine that just uses memristor technology, top to bottom. And it may even
get there.
The point is, and one that HP made clear last year, it is putting memory at the heart of the design of
The Machine, not compute, and making it suitable for a wide range of data-intensive applications,
both in terms of the amount of data and the bandwidth needed. But, as Faraboschi explained to the
ISC crowd, there is more to it than just slapping together a big pool of memory to make The
Machine, and some extra things will need to be done to make it suitable for HPC work.
The Machine is not being designed for HPC, per se, but for large data analytics, graph analytics,
1 of 7 07/22/2016 01:31 PM
Future Systems: How HP Will Adapt The Machine... http://www.nextplatform.com/2015/08/17/future-s...
and similar workloads, which happens to be converging to HPC to a certain extent, Faraboschi Similar Vein
said, adding that ultimately, the way The Machine will be architected might not be that different
from where HPC systems are already evolving to.
This is a curve that cannot hold, obviously. We could hold back Moores Law on processors, but Steps Toward
that seems unlikely because it does not make much business sense for the chip makers or their Programmability for
system customers, who rely on upgrades every year to keep the money flowing in. Non-Volatile Memory
This is why HP has been focusing on memristors, and it is also why the company is willing to
embrace other non-volatile memories in the short-term to get The Machine off the ground.
The big problem that HP is trying to solve with The Machine is that DRAM main memory is the
only part of the system that is held captive by the processor, the only element that has not been
disaggregated. (We would argue that putting PCI-Express controllers and Ethernet controllers onto
the die is perhaps not as useful in the disaggregated and hyperconverged system world that seems
to be emerging.) Companies dont want islands of volatile DRAM where they have to move data What The HP Breakup
around all the time, and they do want to be able to scale compute, memory, I/O, networking, and Means For Enterprises
storage all independently of each other, and swap them out on different upgrade schedules that are
tuned to their own technology changes if DRAM upgrades happen less frequently than CPU
upgrades, so be it. Design a machine that can accommodate that. The hyperconvergence part
means bringing all of the exploded pieces back into a rack and allowing customers to think at the
scale of one or multiple racks, not nodes.
2 of 7 07/22/2016 01:31 PM
Future Systems: How HP Will Adapt The Machine... http://www.nextplatform.com/2015/08/17/future-s...
If you look at the trajectory of this rack-scale computing, in a couple of years we are if you look
at our current Moonshot machines, you can pack 20,000 cores into a rack with a significant
amount of storage and memory And if you look at the projects for a couple of years, you will be
able to have 100,000 cores a combination of general purpose and acceleration cores inside of a
rack. This rack is going to become the unit that people will use as the building block for an exascale
computer rather than an individual server because when you are thinking at the rack scale there is
a lot more optimization that you can do than within an individual server. The problem remains
that memory is the last resource that is not dynamically reprovisioned.
With universal memory as HP is conceiving it, The Machine is talking some bits of scale up system
design and mixing it with scale out design.
The Machine has what HP is for the moment calling a shared something architecture, and like
modern cluster architectures for data analytics, data would be replicated around the nodes The
Machine. The optical interconnect HP is creating could possibly be used to lash SoCs together into
a vast shared memory machine, but that is not the point. Such a NUMA beast would be wildly
inefficient and would spend all of its time thrashing, moving data around for nodes that would
never get any work done because they would be waiting all the time. Instead, the machine has each
SoC run its own operating system in this case, a variant of Linux and it has been tweaked to
work with this universal memory and, importantly, has had a lot of its guts removed that are no
longer necessary because the architecture is cleaner than a modern NUMA server.
The funny bit is that Faraboschi said that persistence of data in this universal memory pool and
sharing across nodes is not just about breaking down the islands of memory in modern systems.
The reason you want persistence is not because you want to store you data and pick it up a year
later this is not about capacity but you want to be able to tolerate faults, he explained. There
are several technologies on the roadmap today that will be able to tolerate node rebooting or
hardware failure so you can restart your computation.
Next year, with the first prototype of The Machine, HP will be using DRAM technology for the
memory on the SoC and as the universal memory layer. This DRAM in the universal memory layer
will be on a separate power domain from the CPUs on the SoCs, so a power outage on the CPU (for
example, during a reboot) will not cause memory loss. When a non-volatile memory layer is added
in, this is when we can expect HP to start delivering its own Linux variant. This prototype will be
3 of 7 07/22/2016 01:31 PM
Future Systems: How HP Will Adapt The Machine... http://www.nextplatform.com/2015/08/17/future-s...
based on the Apollo system enclosures, and the compute and memory are physically distributed
but logically pooled. The SoC is in the front of the mock up shown above, which has its own
private memory; eventually this will be stacked high bandwidth memory, but for now it is just
DRAM. But the architecture is not the same as a normal SoC. The difference is that we have freed
memory from the CPU, and now one CPU can issue a memory operation that traverses the bridge
chip and the fabric, including optical interconnects, and can hit memory that sits in another part of
the physical infrastructure but belongs to the same logical pool, explains Faraboschi.
If you are coming from the scale out world, it is reasonable to think of universal storage as regular
storage that is 10X to 100X faster. If you come from the scale up world, you can think of this as a
little bit slower memory, but much larger than what you are used to.
This is exactly how other alternatives, including the new Memory1 flash-based and bit addressable
DIMMs from Diablo Technologies, are being described, generally speaking. And as we pointed out,
this is how the 3D XPoint memory which could be a variant of phase-change memory is also
being characterized.
The thing to remember is that universal memory is not just memory, but storage, and that means
you can do things like replicate data for high availability, use RAID data protection on chunks of it,
do hot swapping of components, and so forth. There is no reason why it cannot run the Hadoop
Distributed File System, Lustre, or GPFS, by the way.
Because The Machine is not a scale up box, a memory error in a chunk of this universal memory
does not bring the whole machine down. The architecture is more like a shared-nothing cluster in
that if a node fails, the cluster heals around it. In the case of The Machine, a node will just point to a
replica dataset and continue processing. The code for doing this is not yet done, of course, and this
is where modifications to the Linux kernel are necessary. And Faraboschi is pulling no punches
about why he thinks Linux needs to evolve.
One is the Linux community and please forgive me if you are hardcore Linux but Linux is
usually targeted at the lowest common denominator, he said. Anything that doesnt fit that model
has been really, really hard to get in. And I am talking about things like multicore, 64 bits, and you
name it. It went there, but it took a long time. We might to evolve at a faster pace than Linux
evolves. The second reason is that when you look at the data-centric world, things are different. I
will give you an example of protection. We have this big bubble of memory, and we dont want to
trust the individual operating system to access that memory, I dont want a malicious bug in an OS
or even a bug that a programmer did to go around and trash 300 TB of memory. That would be
kind of awkward. We want protection to move to the memory side and we want the location to
move memory side, and we want the memory to include a resource handler that is trusted at the
higher level entity. This notion just does not exist in Linux. We can morph Linux by doing some
magic with the page table location, and we are doing that next year.
Am architectural hint: The Machine will have a memory broker that controls all of the memory,
and each individual compute node will have to ask permission to exchange some keys and get
access to a big chunk of memory 1 TB, 10 TB, whatever that will be mapped into the local node
memory space, and then the node will parcel it out into smaller chunks.
4 of 7 07/22/2016 01:31 PM
Future Systems: How HP Will Adapt The Machine... http://www.nextplatform.com/2015/08/17/future-s...
heard this again and again at ISC and even earlier this year when IBM told us about how memory
bandwidth and integer performance were becoming more important to exascale systems than
floating point.
The biggest reason why The Machine might appeal to HPC users is that this persistent storage will
have very high bandwidth. The one thing that nobody wants to talk about, and that is the unloved
child of HPC, is defensive I/O, said Faraboschi, referring to the checkpointing at intervals during
the simulation and the restarting of calculations once this checkpointing is done. As we move to
larger and larger systems, our meantime to interrupt (MTTI) is going to go down to numbers that
are incompatible with the rates that we can do checkpointing. In supercomputers today, the file
systems goes into 1 TB/sec just to be able to do a dump in a time that would not completely
overwhelm the computation. It will need to rise to tens of terabytes per second with larger
machines, he said something that The Machine, with the right configuration, could handle.
While Faraboschi did not say this explicitly, the architecture of that optical interconnect that
creates the universal pool of memory and that gives access to it to SoCs hanging off it, will lend
itself to MPI operations and similar global array functions used in certain HPC codes. (Much as a
theoretical exascale system proposed by Intel researchers based on a massively scaled out
architecture with hierarchical storage and simple compute nodes does.) This is technical, but
Faraboschi explains it well:
Even if you are using MPI, at least logically you have the notion of very large shared data. Many
applications use a global array abstraction (quantum mechanics simulations is one example) that
end up replicating hundreds of gigabytes of data in all of the nodes. If you had universal memory,
it could be in one place, accessible to all nodes, and you could replicate it for performance. For
reverse time migration and other types of seismic analysis, you are also forced to do data domain
decomposition because you have to make sure you bring the data close to the CPU when you want
to process it. If you have very large universal memory, you can possibly store your propagation
and accumulation data in one place and bring in all the other data you care about.
Moreover, MPI could be tweaked so that collective operations (which account for up to 80 percent
of communication on the MPI protocol, could be done as in-memory shuffles on that universal
memory. Collectives are kind of a pain for everybody, even moreso when you are doing certain
kinds of algorithms like all-gather, which unfortunately is fairly common in matrix multiplication
and sparse matrices. Now if you have universal memory, you can think of doing in-memory
shuffles. Rather than having to do complicated things like recursive doubling, you can imagine
everybody posting their data in universal memory location and everyone else pulling it out
wherever it is needed.
Similarly, one-sided communication, a feature of MPI running over networks and servers
equipped with Remote Direct Memory Access (RDMA), can be tweaked on The Machines universal
memory such that you dont actually have to move any data, but rather just change pointers in the
universal memory. You might want to move data closer to a particular compute node for
performance reasons, but you wont have to.
There are a lot of details to be worked out, and HP is looking for supercomputer centers that want
to play with The Machines architecture on prototypes based on DRAM so when the real machine
comes along, with vastly more memory and node expansion, both the company and prospective
users will have some idea how HPC codes might do on such an architecture.
Share this:
5 of 7 07/22/2016 01:31 PM
Future Systems: How HP Will Adapt The Machine... http://www.nextplatform.com/2015/08/17/future-s...
Similar Vein
Do the SOCs on the machine have to be based on just a modified Harvard architecture, or can they
be based on a stack architecture. Im referring to the Burroughs stack architecture of days past
where everything ran in a stack, data in the data stack code in the code stack, and built into and
intrinsic to the design of the stack architecture based CPUs where hardware registers that
specifically pointed to the top, and the bottom of these respective stacks, and the Burroughs CPU
had plenty of these hardware pointers/registers . Any addressing out of the bounds of these CPU
pointers, top or bottom resulted in a hardware interrupt and the MCP promptly flagging and
flushing the offending code. All this continuous need to provide the necessary security protection
by purely software means has been unsuccessful as these security problems(overflow
vulnerabilities, etc.) appear to be endemic on the modified Harvard architecture based processors
that dominate the current market. Id love to see if anyone has taken the time to research the old
stack architectures on a FPGA and maybe have done some vulnerabilities stress testing using
Burroughs extended ALGOL, or some newer objected oriented languages and compared the results
of the stack architectures versus the modified Harvard architecture. If any architecture lent itself to
running object oriented code securely it was the stack architecture.
Reply
HP has a serious credibility problem. Lots of talk, no product. I dont want to hear anything more
about that company until they are actually shipping something of interest.
Reply
6 of 7 07/22/2016 01:31 PM
Future Systems: How HP Will Adapt The Machine... http://www.nextplatform.com/2015/08/17/future-s...
Leave a Reply
Your email address will not be published. Required fields are marked *
Comment
Name *
Email *
Website
Post Comment
7 of 7 07/22/2016 01:31 PM