A Mainframe Grid Computing Infrastructure

Based on the IBM Grid Toolbox

In an on-demand business, the computing environment is integrated to allow systems to be seamlessly linked across the enterprise and across its entire range of customers, partners, and suppliers. It uses open standards so that different systems can work together and link with devices and applications across organizational and geographic boundaries. It is virtualized to hide the physical resources from the application level and to make the best use of technology resources, thereby minimizing complexity for users.

A good example of a virtualized model is a grid computing model. In this article, we describe the design of a grid based on IBM mainframes. The system under consideration is an experimental grid built using the IBM Grid Toolbox version 3 (IGT), which is a commercial derivative of the Globus Toolkit version 3 (GT3). The grid provides interoperability among zSeries and S/390 processor families and creates an environment that enables the sharing of underutilized computational resources. Typical mainframe systems are designed and provisioned with an average utilization target that always reserves extra capacity for future growth and for handling unexpected business-critical job load demands. The unused capacity on these systems is designated as white space and can be leveraged for on-demand use.

A number of clustering solutions for zSeries and S/390 systems, such as Sysplex and OS/390 Workload Manager, have been developed to share these underutilized resources. However, these clustering solutions work only among coupled systems that belong to a single family of processors. The zSeries and the earlier S/390 systems belong to multiple generations of processors, such as Generation 5 (G5), Generation 6 (G6), and zServer, and a typical data center includes a combination of those systems. A method to offer interoperability among systems of different generations is needed. The design we offer provides this capability.

Mainframe systems provide a level of isolation and security that is unparalleled on other platforms. For example, z/OS supports logical partitions and guarantees there is no leakage of information or execution resources between them. This provides the opportunity for better isolation between tasks - something not available in grids based on other platforms, where the isolation between grid tasks is limited to whatever isolation mechanism the operating system can provide. Usually, grid tasks are run as separate processes within the operating system, thus sharing resources controlled by the OS. This situation might result in intentional or accidental exposure, or corruption, of the data of one task by another task. The design we propose exploits the capability of running multiple concurrent virtual machines on zSeries and S/390 systems to further isolate task execution in separate virtual machines.

Overview of the System
Figure 1 shows an infrastructure built on the Open Grid Services Architecture (OGSA). It offers services for automated setup and management of users via the Web.

As you can see, Figure 1 is a topological view of various data centers geographically dispersed around the world. Grid users interact with the system through a grid management center. The grid management service includes a job management service, as well as general administration and user management services. Each data center includes one or more zSeries or S/390 nodes. The nodes at a given data center may be homogeneous or heterogeneous. In Figure 1, each data center has a combination of different generation machines.

Figure 2 shows the mainframe nodes are partitioned via logical partitioning. Each logical partition, or LPAR, functions as a separate system with a host operating system and one or more applications. Further, each LPAR has one or more logical processors each of which represents all or part of a physical processor allocated to the partition. The logical processors of a partition may be either dedicated to the partition so the underlying processor resource is reserved for that partition, or shared with another partition so the underlying processor resource is potentially available to another partition.

In general, one or more of the LPARs are dedicated to applications regularly hosted at the data center. However, a special partition can be created for use by the grid infrastructure using the white space. This partition, the Grid LPAR, shares its logical processors with the other partitions at a lower priority than the other LPARs. This ensures that the grid use of the node does not impede the regular operation of the mainframe; it only uses the excess capacity of the mainframe.

Each Grid LPAR runs multiple Linux Virtual Machines (VM). One of those Linux VMs, the Manager Linux VM, acts as a manager and interfaces with the grid management center. The grid management system allocates jobs to the Grid LPARs based on the resources available in that LPAR. The grid management system also migrates jobs from one machine to another when the resources available on a given node fall below the requirements of the job(s) running on it.

A Grid Service hosting environment is configured to run in this Manager Linux VM. In our experimental system, we used the OGSI (Open Grid Services Infrastructure) runtime environment provided by the IBM Grid Toolbox. In addition to the standard grid services, a modified version of the Globus Master Managed Job Factory Service is deployed. Besides the Manager Linux VM, additional Linux VM instances, or Job VMs, are configured. The IBM Grid Toolbox with a modified Local Managed Job Factory Service is deployed on these VMs.

The Globus Toolkit and the IBM Grid Toolbox
The Globus Toolkit version 3 (GT3) is an open source, open architecture project that provides a platform for developing grid services and grid applications, as well as a grid service runtime environment based on the GGF (Global Grid Forum) OGSI standard. It also provides a set of tools for a grid administrator to manage grid systems.

IBM Grid Toolbox for Multiplatforms v3.0 provides a fully integrated alternative to the open source distribution of Globus Toolkit 3.0 with additional IBM value-adds. It provides a GGF OGSI grid service runtime environment based on the embedded WebSphere 5.0.2 server. The embedded WebSphere server provides a robust and scalable environment to run grid services. IGT also provides additional features in both usability and services. It uses a simplified wizard-based installation mechanism, thereby decreasing the cost of building and deploying grid infrastructures. It also provides an information center and a rich set of tutorials and samples to assist in the design of grid infrastructure and services.

IGT also includes additional grid services, such as an enhanced Registry service, a Common Resource Management (CRM) service, and a Policy Management service. A Web-based management UI (user interface) that facilitates the task of a grid administrator is also included.

The Policy Services in IGT enable administrators to define a set of business goals and to enforce a set of rules that allow their grid to meet these goals. In the IBM Grid Toolbox, a policy identifies the desired outcome for the interactions between different elements in the grid environment.

Components
Most of the components within the Manager Linux VM are based on the corresponding IGT/GT3 components and services as shown in Figure 3.

The Manager VM Hosting Environment is the equivalent of the Master Hosting Environment. The Virtual Host Environment Redirector is the same as the Globus Virtual Host Environment Redirector. It accepts all incoming SOAP messages and redirects them to the appropriate Job VM. The Job VM Factory Service is a modification of the Globus Master Managed Job Factory Service. The PortType of this service is an extension of the OGSI Factory PortType. The JobVMFactory PortType has an additional shutdownService operation. The Job VM Factory Service is responsible for exposing the virtual GRAM service to the outside world. It configures the Redirector to direct createService and shutdownService calls sent to it through the Job VM Controller. The Job VM Launcher is the equivalent of the Globus Hosting Environment Starter service. Instead of starting a new process, it starts a new Job VM and communicates with that VM.

The grid-mapfile is used to obtain the username corresponding to a particular subject DN (distinguished name). The Job VM Controller ensures that one Job VM is run for each subject DN on a node. When a request to resolve a URL is received, the Job VM Controller searches the Job VM Registry for a corresponding subject. The Job VM Registry is a database where information about active Job VMs on the node is maintained. It replaces the jobmanPortMapping file of Globus. If an entry is found in the Job VM Registry, the target URL is constructed and returned to the Redirector. If an entry does not exist in the Job VM Registry, an idle Job VM is selected and the launch module is used to allocate the required resources to it and start it up. The target URL is returned to the Redirector after ensuring the Job VM is running. The active Job VM Registry is updated with this entry.
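For illustration, the Controller's resolution flow can be sketched as follows. This is a minimal sketch only: the real component is implemented as a grid service, and lookup_registry, select_idle_vm, launch_job_vm, wait_until_running, register_vm, and SERVICE_PATH are hypothetical stand-ins for the registry queries, the Job VM Launcher invocation, and the deployment-specific service path.

# Sketch of the Job VM Controller's URL-resolution logic (illustrative only).
# All helper names below are hypothetical.
resolve_target_url() {
    subject_dn=$1                              # subject DN from the grid-mapfile lookup

    # 1. Look for an active Job VM already assigned to this subject DN.
    entry=$(lookup_registry "$subject_dn")     # returns "host:port" or nothing
    if [ -z "$entry" ]; then
        # 2. No entry: pick an idle, preconfigured Job VM and start it.
        vm=$(select_idle_vm) || return 1
        launch_job_vm "$vm" || return 1        # Job VM Launcher: allocate resources, IPL
        wait_until_running "$vm"
        # 3. Record the new Job VM in the Job VM Registry.
        entry=$(register_vm "$subject_dn" "$vm")
    fi

    echo "http://${entry}/${SERVICE_PATH}"     # target URL returned to the Redirector
}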

The Job VM Launcher is a shell script that implements a sequence of operations described later. It is invoked by the Job VM Controller to allocate resources to start a new Job VM. The Job VM Stopper is a shell script that implements a sequence of operations described later. It is invoked by the Job VM Controller to stop a Job VM and deallocate the resources assigned to it. The Node Resource Information Provider Service is a specialized notification service providing raw data about resource characteristics such as the file system, the host system, etc.

Most of the components in a Job VM are based on the corresponding GT3 components and services as shown in Figure 4.

The Job VM Hosting Environment is the equivalent of the Globus User Hosting Environment (UHE), with the modification that it runs in its own Linux VM rather than as a process in the same system as the Manager VM Hosting Environment.

The Managed Job Factory Service (MJFS) is the same as the Globus MJFS; it is responsible for instantiating a new Managed Job Service (MJS) when it receives a CreateService request. The MJFS stays up for the life of the Job VM. The MJS is the same as the Globus MJS; it is an OGSI service that, when given a job request specification, can submit a job to a local scheduler, monitor its status, and send notifications. The MJS will start two File Streaming Factory Services (FSFS), one for the job's standard output (stdout) and one for the job's standard error (stderr). The MJS starts the initial set of File Stream Service (FSS) instances as specified in the job specification. The Grid Service Handles (GSHs) of the FSFSs are available in the SDE (service data element) of the MJS, which enables the client to start additional FSS instances for stdout/stderr or terminate existing FSS instances. The FSS is an OGSI service that, when given a destination URL, streams the local file for which its factory was created (stdout or stderr) to that destination URL. The VM Resource Information Provider Service is a specialized notification service providing raw data about the VM's file system, host system, etc.

Operations
The overall sequence of operations is as follows: the Manager VM exposes the capabilities and characteristics of the resources allocated to the LPAR and the current state of the Job VMs running within the LPAR. When a new job request is received, the Manager VM allocates the necessary resources to a predefined Linux VM and starts the VM. It passes the job request to this new Linux VM and returns to the grid management service a handle to communicate with this VM. The Linux VM executes the tasks. During this period, the grid management service can query for the status of the job and can retrieve the results of the job when ready. Upon completion of the job, the Manager VM shuts down the Job VM, cleans up the used resources, and reclaims those resources.

Figure 5 further details these interactions and relates them to the grid user actions. The user submits a job request to the grid management service; the job request contains information about the program to be executed, the data that the program must operate on, and the resources needed to execute the program. The grid management service queries the Manager VMs running on the different nodes for the availability of the required resources. Each Manager VM responds with the available resources on the corresponding node that it manages. The Grid Portal selects the appropriate node and submits the job request to the corresponding Manager VM. The Manager VM activates one of the preconfigured Linux VMs on the node and allocates the necessary resources to that VM.

The Manager VM then passes the job request details to the Job VM and starts execution of the job. The Manager VM returns a handle to the Job VM to the Grid Portal. The handle enables the Grid Portal to communicate directly with the Job VM. At this point, the job submission process is completed and the user is informed of the successful submission of the job.

The Job VM continues the execution of the job. At any time during this process, the user can query the Grid Portal about the status of the submitted job. The Grid Portal in turn queries the Job VM for the status of the job.

The Job VM notifies the Grid Portal when the job is completed. The Grid Portal retrieves the results and instructs the Manager VM on the same S/390 node to shut down the Job VM. The Manager VM deactivates the VM, cleans up the allocated resources, and puts them in the available list. Future extensions of this design will handle more elaborate cases. For example, the user can be replaced by an automated service or program, and instead of the execution of a single job on a single node, the task at hand might require multiple jobs that run simultaneously on multiple nodes.

Starting a Job VM

The startup command for a Job VM is as follows:

rexec -l vm_userid -p vm_password vm_hostname start target_userid {-mem mem_size} {-proc proc_num}

This command executes a start script, passing it the specified arguments. The first argument specifies the user ID of the target Job VM. The subsequent arguments are optional and are used to indicate that additional resources are needed to process the request. That is, the Manager VM checks the resources defined for the Job VM to ensure that there are sufficient resources to process the request. If additional resources are desired, then those resources are requested in this command. For example, -mem specifies the memory size to be allocated, and -proc specifies the number of virtual processors to be allocated.
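For example, a request to start the Job VM defined for user ID LNXJOB01 with 1G of memory and two virtual processors might look like the following (the Manager VM user ID, password, and host name are placeholders):

rexec -l GRIDMGR -p xxxxxxxx zvmhost01.example.com start LNXJOB01 -mem 1G -proc 2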

The start script autologs the specified user ID, issues the appropriate commands to add resources if the rexec command indicates that they are needed, and then IPLs (initial program load) the Job VM. For example, if virtual storage is to be added to the Job VM, a DIRMAINT command with a storage operand, such as DIRM FOR userid STORAGE 1G, is issued. As another example, if the virtual machine needs the maximum virtual storage size available, a DIRMAINT command with a maximum storage operand, such as DIRM FOR userid MAXSTOR 2048M, is issued.

Should a virtual processor be needed, a DIRMAINT command with a CPU operand, such as DIRM FOR userid CPU cpuaddr, is issued. If filesystem space is to be added, a DIRMAINT command with an AMDISK operand, such as DIRM FOR userid AMDISK vaddr xxx, is issued. In this case, a RACF command is also used to define the disk to RACF; for instance, RAC RDEFINE VMMDISK userid.vaddr OWNER(userid).

In addition to adding the resources, the Job VM is IPLed. In one example, this includes reading a name file that is kept for the Job VM instance, autologging the Job VM instance based on that information, and booting up any disks relating to that instance. This completes the startup of the Job VM.
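Putting these steps together, the Job VM Launcher might look roughly like the following sketch. The dirm, rac, and autolog_vm helpers are hypothetical stand-ins for however the installation issues DIRMAINT, RACF, and CP commands; only the overall sequence is taken from the description above.

#!/bin/sh
# Job VM Launcher sketch: allocate resources to a Job VM, then IPL it.
# dirm, rac, and autolog_vm are hypothetical wrappers; the real mechanism
# for issuing DIRMAINT/RACF/CP commands is installation-specific.

target=$1; shift                        # user ID of the target Job VM
while [ $# -gt 0 ]; do
    case "$1" in
        -mem)  mem=$2;   shift 2 ;;     # extra virtual storage requested
        -proc) procs=$2; shift 2 ;;     # extra virtual processors requested
        *)     shift ;;
    esac
done

# Add resources only if the request asked for them.
[ -n "$mem" ] && dirm FOR "$target" STORAGE "$mem"
if [ -n "$procs" ]; then
    i=0
    while [ $i -lt "$procs" ]; do
        dirm FOR "$target" CPU 0$((i+1))   # assumed CPU addresses 01, 02, ...
        i=$((i+1))
    done
fi

# Optional minidisk for additional filesystem space, plus its RACF definition,
# as in the text (operands abbreviated here):
#   dirm FOR userid AMDISK vaddr xxx
#   rac RDEFINE VMMDISK userid.vaddr OWNER(userid)

# IPL the Job VM: autolog the user ID so its Linux guest boots from its disks.
autolog_vm "$target"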

Starting the Job VM also starts the Linux instance configured in the VM. The Linux instance is itself configured to start the Globus container with a predefined set of grid services that includes the modified LMJFS. At this point, the Job VM is operational and ready to accept jobs.
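As a rough sketch of that boot-time configuration (assuming the standard GT3 container startup script and a typical install location; the exact script name, path, and port in an IGT installation may differ), the Linux guest's startup scripts could end with something like:

# Boot-time fragment (sketch): start the OGSI container with the predefined
# set of services, including the modified LMJFS. Path and port are assumptions.
export GLOBUS_LOCATION=/opt/IBMGrid
cd "$GLOBUS_LOCATION" && ./bin/globus-start-container -p 8080 &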

Stopping a Job VM

The shutdown command is as follows:

rexec -l vm_userid -p vm_password vm_hostname shutdown target_userid

In response to receiving the command, a shutdown command is issued to the Job VM. Any additional resources allocated to the Job VM are returned. For example, this can be accomplished by issuing the appropriate DIRMAINT/RACF commands, which depend on the type of resources to be returned. If the resource to be returned is virtual storage, then the DIRM FOR userid STORAGE 512M command is issued to return the virtual storage to its original amount. Similarly, if virtual processors are to be returned, then a DIRM FOR userid CPU cpuaddr DELETE command is issued to delete a virtual processor. To delete the filesystem space, a DIRM FOR userid DMDISK vaddr command is issued.

Additionally, cleanup of the Job VM is performed. The cleanup includes removing old files and restoring the Job VM to its original image.
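Analogously to the launcher, the Job VM Stopper might look roughly like the following sketch; signal_shutdown, dirm, and restore_image are hypothetical wrappers, and only the sequence of steps (shut the guest down, return the extra resources, clean up the image) follows the description above.

#!/bin/sh
# Job VM Stopper sketch: shut down a Job VM and return its extra resources.
# signal_shutdown, dirm, and restore_image are hypothetical wrappers for the
# CP, DIRMAINT, and image-restore mechanisms of the installation.

target=$1                                  # user ID of the Job VM to stop

# 1. Shut down the Linux guest running in the Job VM.
signal_shutdown "$target"                  # e.g., via CP SIGNAL SHUTDOWN

# 2. Return any additional resources that were allocated at startup.
dirm FOR "$target" STORAGE 512M            # restore the original virtual storage size
#   dirm FOR userid CPU cpuaddr DELETE       (for each added virtual processor)
#   dirm FOR userid DMDISK vaddr             (for added filesystem space)

# 3. Clean up: remove old files, restore the Job VM to its original image,
#    and mark it idle again in the Job VM Registry.
restore_image "$target"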

Benefits
Our design addresses the need for a grid solution for the hosted environment using technologies available today. It also addresses the strategic need to converge autonomic computing, OGSA, and systems design, providing the optimizations and efficiencies that help data centers reduce costs through a pooled set of resources and the ability to host on-demand capacity for a select set of grid applications. The design is intended to link mainframe servers from across the company (or from company to company) into one highly utilized grid, helping to cut application runtimes and deliver results faster. Our design also makes it possible to reach out and use mainframe white space from around the world when it is needed and to give it back when it is not. Finally, it provides better isolation between grid jobs, thus enhancing privacy and security in a grid environment.

Increase in Utilization

Our design provides interoperability among the different processor families. It uses unused or excess mainframe computing resources and operates independently while the system is in use. While mainframe Sysplex works only within the same family of processors and operating systems, ours works across all families and operating systems. Thus it goes beyond the capabilities that Sysplex and Geoplex provide in clustering S/390 systems belonging to only one family of processors.

A vast quantity of computing power is wasted due to the underutilization of resources. In general, planning and sizing for computing requirements are based on peak demand, and statistically the actual utilization is on the order of 60%. Harnessing the unutilized compute power will provide immediate economic benefits to any organization that has a large installed base of servers. White space is defined as the unutilized capacity or cycles on S/390 or zSeries machines. Basically, users on a VM or MVS system only use part of the maximum capacity of the systems, so there is room for more workload. In our proposed design, white space is utilized by adding Linux virtual machines.

Isolation Between Jobs

In general, on other platforms the isolation between grid tasks is based on whatever isolation mechanism the operating system can provide. Usually grid tasks are run as separate processes within the operating system, thus sharing resources controlled by the OS. This situation might result in intentional or accidental exposure or corruption of the data of one task by another task.

Our design exploits the capability of running multiple concurrent virtual machines on an LPAR of a mainframe node to execute each task in an individual virtual machine.

In our design, the isolation mechanisms of GT3 have been replaced by a different model. In GT3, the isolation is based on running, for each user, a separate Java VM that hosts a Local Managed Job Factory Service (LMJFS). In this design, instead of running multiple User Hosting Environment (UHE) instances in the same Linux instance, each UHE instance is executed in a separate Linux instance. Consequently, jobs belonging to different users never share resources.

Summary
Our design provides interoperability among the different mainframe processor families. It uses a part of the mainframe's (S/390 and zSeries families) unused or excess computing resources via logical partitioning (LPARs), and operates independently while the system is in use.

The design is based on porting existing grid computing standard architecture components to the zVM virtual Linux environment, while adding automated setup/configuration features and exposing the resources available on the IBM mainframe nodes through the grid environment's information indexing infrastructure.

The porting of the Globus Toolkit to the zLinux environment has been completed. The automated configuration of each Manager Linux VM through a registration portal has been accomplished. A number of business and engineering applications have been tested on the prototype system. The mechanisms to dynamically start and allocate resources to the Job VM have been designed, implemented, and tested. The next step is to integrate these mechanisms with the modified Grid Resource Management services to complete the implementation of the design.

This mainframe grid infrastructure provides IBM with increased flexibility and enables the user to tap some of the data center white space computing power by interconnecting mainframes in Poughkeepsie, New York; Boulder, Colorado; London; Tokyo; and Sydney, Australia.

References

  • I. Foster, C. Kesselman, S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations," International Journal Supercomputer Applications, 15(3), 2001.
  • IBM Grid Toolbox version 3 for Multiplatforms: www-1.ibm.com/grid/solutions/grid_toolbox.shtml
  • Globus Toolkit Version 3: www-unix.globus.org/toolkit/download.html
  • IBM, z/Architecture Principles of Operation, SA22-7832-00, December 2000: www-1.ibm.com/servers/eserver/zseries/zos/bkserv/zswpdf/zarchpops.html
  • I. Foster, C. Kesselman, J. Nick, S. Tuecke, "The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration," Open Grid Service Infrastructure WG, Global Grid Forum, June 22, 2002, www.globus.org/research/papers/ogsa.pdf
  • I. Foster, C. Kesselman, J. Nick, S. Tuecke, "Grid Services for Distributed System Integration," IEEE Computer Magazine, 35(6), 2002.
  • GT3 GRAM Architecture: www-unix.globus.org/developer/gram-architecture.html
  • IBM zVM Directory Maintenance Facility Function Level 410 Command, Version 4, Release 3.0, SC24-60025-03, October 2002, www.vm.ibm.com/pubs/pdf/hcsl4a20.pdf
  • IBM Program Directory for Resource Access Control Facility (RACF) Feature for z/VM, Version 1 Release 10.0, May 2002, www-1.ibm.com/servers/eserver/zseries/zos/racf/pdf/RACFZVM_430.PDF
About the Authors

Moon J. Kim is an IBM senior technical staff member and program director responsible for the development of IGA strategic infrastructures and businesses. He has developed products and solutions such as grid technology, the advanced Web system, and broadband systems and devices. He was a key architect of memory and I/O systems and holds many patents in this area. He is an IBM Master Inventor and has published numerous books and papers.

Dikran S. Meliksetian works with the Internet Technology team at IBM and is involved in the design and development of advanced content management applications. He is a senior technical staff member and is engaged in the design and development of the IBM internal grid based on industry standards.

Colm Malone is a researcher in the IS department at the Thomas J. Watson Research Center. He has a BS in computer engineering from Trinity College in Dublin. He received a Chairman's Award for developing the Linux Client for e-business (an image used throughout IBM) and transitioning this effort to IBM's Global Services division. Recently his work has involved the area of grid computing, specifically leveraging the computing potential of the many Linux platforms within IBM.
