Thursday, October 6, 2016

VMware Validated Design for SDDC 3.0 – Now Available!

Reposting from VMware Consulting Blog:

I mentioned all the fun details on the VMware Validated Design in my previous blog post. I am happy to report that we have just released the next revision of it, version 3.0. This release takes what everyone already knew and loved about the previous version and makes it better!

In case you have not heard of VMware Validated Designs, they are a construct used to build a reference design that:

  • Is built by expert architects who have many years of experience with the products, as well as integrations
  • Allows repeatable deployment of the end solution, which has been tested to scale
  • Integrates with the development cycle, so that if an issue is found during integration and scale testing, the developers can fix it quickly before the products are released

All in all, this is an amazing project that I am excited to have worked on, and I am happy to finally talk about it publicly!

What’s New with the VMware Validated Design for SDDC 3.0?

There are quite a lot of changes in this version of the design. I am not going to go into every detail in this blog, but here is an overview of the major ones:
  • Full Dual Region Support: Previous versions of the VMware Validated Design mentioned dual sites but provided implementation guidance for a single site only. This release includes full guidance and support for configuring a dual region environment.
  • Disaster Recovery Guidance: With the addition of dual region support, guidance is needed for disaster recovery. This includes installation, configuration, and operational guidance for VMware Site Recovery Manager and vSphere Replication. Operationally, plans are created not only to fail over and fail back the management components between sites, but also to test those plans.
  • Reduced minimum footprint with a 2-pod design: Prior versions of the VMware Validated Design focused on a 3-pod architecture. That architecture recommended a minimum of 12 ESXi hosts:
    • 4 for management
    • 4 for compute
    • 4 for the NSX Edge cluster

      In this release the default configuration is to use a 2-pod design which collapses the compute and Edge clusters. This allows for the minimum footprint to be 8 ESXi hosts:
    • 4 for management
    • 4 for shared Edge and compute functions

      This marks a significant reduction in size for small or proof-of-concept installations, which can be later expanded to a full 3-pod design if required.
  • Updated bill of materials: The bill of materials has been updated to include new versions of many software components, including NSX for vSphere and vRealize Log Insight. In addition, Site Recovery Manager and vSphere Replication have been added to support the new design.
  • Upgrade Guidance: Because the bill of materials has been upgraded, guidance is provided for every component that needs upgrading in this revision. This guidance will continue to grow as products are released and incorporated into the design.
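
To make the footprint arithmetic from the 2-pod bullet above concrete, here is a minimal Python sketch. The host counts come straight from the list above; the dictionary and function names are only illustrative and are not part of the design.

```python
# Minimum ESXi hosts per cluster, as listed in the post.
THREE_POD = {"management": 4, "compute": 4, "edge": 4}
TWO_POD = {"management": 4, "shared_edge_and_compute": 4}


def minimum_hosts(pods):
    """Return the total minimum ESXi host count for a pod layout."""
    return sum(pods.values())


three_pod_total = minimum_hosts(THREE_POD)
two_pod_total = minimum_hosts(TWO_POD)

print(f"3-pod minimum: {three_pod_total} hosts")
print(f"2-pod minimum: {two_pod_total} hosts")
print(f"Hosts saved by collapsing edge and compute: {three_pod_total - two_pod_total}")
```

Collapsing the edge and compute clusters removes one cluster of four hosts, which is where the drop from 12 to 8 comes from.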

The good news is that the actual architecture has not changed significantly. As always, if a particular component design does not fit the business or technical requirements for whatever reason, it can be swapped out for another similar component. Remember, the VMware Validated Design for SDDC is one way of putting an architecture together that has been rigorously tested to ensure stability, scalability, and compatibility. Our design has been created to ensure the desired outcome will be achieved in a scalable and supported fashion.

Let’s take a more in-depth look at some of the changes.

Virtualized Infrastructure

The SDDC virtual infrastructure has not changed significantly. Each site consists of a single region, which can be expanded. Each region includes:

  • A management pod
  • A shared edge and compute pod

    This is a standard design practice that has been tested in many customer environments. The purpose of each pod is as follows.

    Management Pod

    Management pods run the virtual machines that manage the SDDC. These virtual machines host:
    • vCenter Server
    • NSX Manager
    • NSX Controller
    • vRealize Operations
    • vRealize Log Insight
    • vRealize Automation
    • Site Recovery Manager
    • And other shared management components

    All management, monitoring, and infrastructure services are provisioned to a vCenter Server High Availability cluster which provides high availability for these critical services. Permissions on the management cluster limit access to only administrators. This limitation protects the virtual machines that are running the management, monitoring, and infrastructure services.

    Shared Edge and Compute Pod

    The shared edge and compute pod runs the required NSX services to enable north-south routing between the SDDC and the external network and east-west routing inside the SDDC. This pod also hosts the SDDC tenant virtual machines (sometimes referred to as workloads or payloads). As the SDDC grows, additional compute-only pods can be added to support a mix of different types of workloads for different types of SLAs.

    Disaster Recovery and Data Protection

    Nobody wants a disaster to occur, but if the worst does happen, you need to be prepared. The VMware Validated Design for SDDC 3.0 includes guidance on using VMware products and technologies for both data protection and disaster recovery.

    Data Protection Architecture

    VMware vSphere Data Protection is used as the backup solution for the architecture. It allows the virtual machines involved in the solution to be backed up and restored, which helps you meet many company policies for recovery as well as data retention. The design spans both regions and looks as follows:

    Disaster Recovery

    In addition to backups, the design includes guidance on using Site Recovery Manager to back up the configuration. This includes a design that is used for both regions, and guidance on using vSphere Replication to replicate the data between sites. It also details how to create protection groups and recovery plans to ensure the management components are failed over between sites, including vRealize Automation and vRealize Operations Manager VMs where appropriate.

    The architecture is shown as follows:

    The Cloud

    Of course, no SDDC is complete without a cloud platform, and the design still includes the familiar guidance on installing the cloud components. vRealize Automation is definitely a part of the design and has not significantly changed, other than adding multiple region support. It is a big piece, but I did want to show the conceptual design of the architecture here because it provides a high-level overview of the components, user types, and operations in workload provisioning.

    The beauty here is that the design has been tried and tested to scale in the Validated Design. This allows issues to be identified and fixed before the platform is deployed.

    Monitoring and Operational Procedures

    Last but not least, what design is complete without proper monitoring and operational procedures? The VMware Validated Design for SDDC includes a great design for both vRealize Operations Manager and vRealize Log Insight. In addition, it goes into all the different practices needed to back up, restore, and operate the actual cloud that has been built. It doesn’t go as far as a formal operational transformation for the business, but it does a great job of showing many standard practices that can be used as a basis for defining what you, as a business owner, need in order to operate a cloud.

    To show a bit of the design, vRealize Operations Manager contains functional elements that collaborate for data analysis and storage, and supports the creation of clusters of nodes with different roles:

    Overall, this is a really powerful platform that revolutionizes the way that you see the environment.

    Download It Now!

    Hopefully, this overview of the changes in the new VMware Validated Design for SDDC 3.0 has been useful. There is much more to the design than just the few items I’ve told you about in this blog, so I encourage you to check out the Validated Designs webpage for more details.

    If you are interested, VMware Professional Services are also available to help with the installation and configuration of a VMware Validated Design.

    I hope this helps you in your architectural design discussions to show that integration stories are not only possible, but can make your experience deploying an SDDC much easier.

    Look for me and other folks from the Professional Services Engineering team and Integrated Systems Business Unit from VMware at VMworld Europe. We are happy to answer any questions you have about VMware Validated Designs!

    Friday, July 22, 2016

    VMware Validated Design for SDDC 2.0 – Now Available!!

    Posted originally on the VMware Consulting blog:


    Recently I have been involved in a rather cool project inside VMware, aimed at validating and integrating all the different VMware products. The most interesting customer cases I see are related to this work because oftentimes products work independently without issue but together can create unique problems.

    To be honest, it is really difficult to solve some of the problems when integrating many products together. Whether we are talking about integrating a ticketing system, building a custom dashboard for vRealize Operations Manager, or even building a validation/integration plan for Virtual SAN to add to existing processes, there is always the question, “What would the experts recommend?”

    The goal of this project is to provide a reference design for our products, called a VMware Validated Design. The design is a construct that:
    • Is built by expert architects who have many years of experience with the products as well as the integrations
    • Allows repeatable deployment of the end solution, which has been tested to scale
    • Integrates with the development cycle, so if there is an issue with the integration and scale testing, it can be identified quickly and fixed by the developers before the products are released

    All in all, this has been an amazing project that I’ve been excited to work on, and I am happy to be able to finally talk about it publicly!

    Introducing the VMware Validated Design for SDDC 2.0

    The first of these designs—under development for some time—is the VMware Validated Design for SDDC (Software-Defined Data Center). The first release was not available to the public and only internal to VMware, but on July 21, 2016, version 2.0 was released and is now available to everyone! This design builds not only the foundation for a solid SDDC infrastructure platform using VMware vSphere, Virtual SAN, and VMware NSX, but it builds on that foundation using the vRealize product suite (vRealize Operations Manager, vRealize Log Insight, vRealize Orchestrator, and vRealize Automation).

    The VMware Validated Design for SDDC outcome requires a system that enables an IT organization to automate the provisioning of common, repeatable requests and to respond to business needs with more agility and predictability. Traditionally, this has been referred to as Infrastructure-as-a-Service (IaaS); however, the VMware Validated Design for SDDC extends the typical IaaS solution to include a broader and more complete IT solution.

    The architecture is based on a number of layers and modules, which allows interchangeable components to be part of the end solution or outcome, such as the SDDC. If a particular component design does not fit the business or technical requirements for whatever reason, it should be able to be swapped out for another similar component. The VMware Validated Design for SDDC is one way of putting an architecture together that has been rigorously tested to ensure stability, scalability, and compatibility. Ultimately, however, the system is designed to ensure the desired outcome will be achieved.

    The conceptual design is shown in the following diagram:

    As you can see, the design brings a lot more than just implementation details. It includes many common “day two” operational tasks such as management and monitoring functions, business continuity, and security.

    To simplify such a complex design, it has been broken up into:
    • A high-level Architecture Design
    • A Detailed Design with all the design decisions included
    • Implementation guidance

    Let’s take an in-depth look.

    Virtualized Infrastructure

    The SDDC virtual infrastructure consists of a single region, which can be expanded. Each region includes a management pod, an edge pod, and a compute pod.

    This is a standard design practice and has been tested in many customer environments. The purpose of each pod is as follows.

    Management Pod

    Management pods run the virtual machines that manage the SDDC. These virtual machines host vCenter Server, NSX Manager, NSX Controller, vRealize Operations, vRealize Log Insight, vRealize Automation, Site Recovery Manager, and other shared management components. All management, monitoring, and infrastructure services are provisioned to a vCenter Server High Availability cluster, which provides high availability for these critical services. Permissions on the management cluster limit access to administrators only. This protects the virtual machines running the management, monitoring, and infrastructure services.

    Edge Pod

    Edge pods provide these main functions:

    • Support on-ramp and off-ramp connectivity to physical networks
    • Connect with VLANs in the physical world
    • Optionally host centralized physical services
    Edge pods connect the virtual networks (overlay networks) provided by NSX for vSphere and the external networks. Using edge pods reduces costs and scales well as demands for external connectivity change.

    Compute Pod

    Compute pods host the SDDC tenant virtual machines (sometimes referred to as workloads or payloads). An SDDC can mix different types of compute pods and provide separate compute pools for different types of SLAs. 

    Software-Defined? Yes, please! (Virtual SAN and VMware NSX Included) 

    As you can see from the above, the design is truly software defined, with both VMware NSX and Virtual SAN as parts of it. I am not going to lie, I am passionate about Virtual SAN as I have been working with it for some time and, to be frank, it is amazing. Here are some details about the Virtual SAN and NSX pieces that are included in the design:

    Virtual SAN

    Virtual SAN is a new technology compared to vSphere. Over the releases, some amazing features have been added, and it is included here due to the benefits it gives to the operational structure. The shared storage design selects the appropriate storage device for each type of cluster:
    • Management clusters use Virtual SAN for primary storage and NFS for secondary storage.
    • Edge clusters use Virtual SAN storage.
    • Compute clusters can use FC/FCoE, iSCSI, NFS, or Virtual SAN storage. At this stage, this design gives no specific guidance for the compute cluster.
    This allows for flexibility rather than a blanket solution for each cluster. The following depicts the logical design:
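
Expressed as data rather than a diagram, the per-cluster storage choices listed above might look like the following rough Python sketch. The storage names come from the list above; the dictionary layout and the helper name are my own illustrative assumptions, not artifacts of the Validated Design.

```python
# Storage options per cluster type, as listed in the post.
# The dictionary layout and helper name are illustrative assumptions.
STORAGE_BY_CLUSTER = {
    "management": {"primary": ["Virtual SAN"], "secondary": ["NFS"]},
    "edge": {"primary": ["Virtual SAN"], "secondary": []},
    # The design gives no specific guidance for compute clusters,
    # so every option named in the post is listed as a candidate.
    "compute": {
        "primary": ["FC/FCoE", "iSCSI", "NFS", "Virtual SAN"],
        "secondary": [],
    },
}


def storage_options(cluster_type):
    """Return the storage options recorded for a cluster type."""
    try:
        return STORAGE_BY_CLUSTER[cluster_type]
    except KeyError:
        raise ValueError(f"unknown cluster type: {cluster_type!r}") from None


for name in STORAGE_BY_CLUSTER:
    print(name, storage_options(name))
```

Keeping the compute entry as a list of candidates mirrors the point above: the choice is left open per cluster instead of being a blanket solution.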

    VMware NSX

    The VMware Validated Design for SDDC implements software-defined networking by using VMware NSX for vSphere. What I like a lot about NSX is that, in much the same way that server virtualization revolutionized how virtual machines are managed, it is doing the same thing for virtual networks.

    This results in a transformative approach to networking that not only enables data center managers to achieve orders of magnitude better agility and economics, but also supports a vastly simplified operational model for the underlying physical network. NSX for vSphere is a non-disruptive solution because it can be deployed on any IP network, including existing traditional networking models and next-generation fabric architectures, from any vendor.

    The design looks like the following:

    From my experience, when administrators provision workloads, network management is one of the most time-consuming tasks. Most of the time spent provisioning networks is consumed configuring individual components in the physical infrastructure and verifying that network changes do not affect other devices that are using the same networking infrastructure.

    The need to pre-provision and configure networks is a major constraint to cloud deployments where speed, agility, and flexibility are critical requirements. Pre-provisioned physical networks allow for the rapid creation of virtual networks and faster deployment times of workloads utilizing the virtual network. This works well as long as the physical network you need is already available on the host where the workload is to be deployed. However, if the network is not available on a given host, you must find a host with the available network and spare capacity to run your workload in your environment.
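
To illustrate the placement constraint described above, here is a minimal Python sketch of the search you end up doing when networks are tied to hosts. Everything in it (the `Host` class, the host names, the VLAN labels, and the capacity figures) is hypothetical and only serves to show the logic.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A hypothetical host with its pre-provisioned networks and spare capacity."""
    name: str
    networks: set = field(default_factory=set)
    free_cpu_mhz: int = 0
    free_memory_mb: int = 0


def find_host(hosts, network, cpu_mhz, memory_mb):
    """Return the first host that has the network and enough spare capacity, else None."""
    for host in hosts:
        if (network in host.networks
                and host.free_cpu_mhz >= cpu_mhz
                and host.free_memory_mb >= memory_mb):
            return host
    return None


hosts = [
    Host("esxi-01", {"vlan-10"}, free_cpu_mhz=8000, free_memory_mb=32768),
    Host("esxi-02", {"vlan-10", "vlan-20"}, free_cpu_mhz=1000, free_memory_mb=2048),
    Host("esxi-03", {"vlan-20"}, free_cpu_mhz=6000, free_memory_mb=16384),
]

# esxi-01 lacks the network and esxi-02 lacks capacity, so only esxi-03 qualifies.
chosen = find_host(hosts, "vlan-20", cpu_mhz=2000, memory_mb=4096)
print(chosen.name if chosen else "no suitable host")
```

The host with the most free capacity (esxi-01) cannot be used because the network is not there, which is exactly the coupling that becomes a constraint in cloud deployments.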

    Getting around this bottleneck requires a decoupling of virtual networks from their physical counterparts. This, in turn, requires that you programmatically recreate all physical networking attributes required by workloads in the virtualized environment. You can provision networks more rapidly because network virtualization supports the creation of virtual networks without modification of the physical network infrastructure. 

    The Cloud

    Of course, no SDDC is complete without a cloud platform. vRealize Automation is definitely a part of the design. It is a big piece, so I wanted to show the conceptual design of the architecture here because it provides a high-level overview of the components, user types, and operations in workload provisioning.

    For anyone who is unfamiliar with it, the Cloud Management Platform consists of the following design elements and components.

    Design Element: Design Components

    • Users:
      • Cloud administrators: Tenant, group, fabric, infrastructure, service, and other administrators as defined by business policies and organizational structure.
      • Cloud (or tenant) users: Users within an organization who can provision virtual machines and directly perform operations on them at the operating system level.
    • Tools and supporting infrastructure: Building blocks that provide the foundation of the cloud:
      • VM templates and blueprints: VM templates are used to author the blueprints that tenants (end users) use to provision their cloud workloads.
    • Provisioning infrastructure: On-premises and off-premises resources, which together form a hybrid cloud:
      • Internal Virtual Resources: Supported hypervisors and associated management tools
      • External Cloud Resources: Supported cloud providers and associated APIs
    • Cloud management portal: A portal that provides self-service capabilities for users to administer, provision, and manage workloads:
      • vRealize Automation portal, Admin access: The default root tenant portal URL used to set up and administer tenants and global configuration options.
      • vRealize Automation portal, Tenant access: Refers to a subtenant and is accessed using an appended tenant identifier.
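
As a small illustration of the difference between admin access and tenant access to the portal, here is a hedged Python sketch. It assumes the `/vcac` and `/vcac/org/<tenant>` path convention that I know from vRealize Automation 6.x and 7.x, which is not stated in this post, so check the documentation for your version. The appliance name and tenant name are made up.

```python
from typing import Optional
from urllib.parse import quote


def portal_url(appliance_fqdn: str, tenant: Optional[str] = None) -> str:
    """Build a portal URL for the default root tenant or for a named tenant.

    Assumption: the portal lives under /vcac and tenants are reached by
    appending /org/<tenant identifier>. Verify this for your version.
    """
    base = f"https://{appliance_fqdn}/vcac"
    if tenant is None:
        return base  # admin access via the default root tenant
    return f"{base}/org/{quote(tenant, safe='')}"  # tenant access


print(portal_url("vra.example.local"))
print(portal_url("vra.example.local", "finance"))
```

The only difference between the two URLs is the appended tenant identifier, which is what the table above describes.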

    The advantage here is that it has been tried, tested, and loaded into the validated design to ensure issues are correctly identified and fixed before the platform is deployed.

    Monitoring and Operational Procedures

    Finally, having new monitoring and operational procedures in place is becoming a hard requirement for many businesses. The VMware Validated Design for SDDC includes a great design for both vRealize Operations Manager as well as vRealize Log Insight. In addition, it goes into all the different practices to back up, restore, and operate the actual cloud that has been built. It doesn’t go as far as a formal operational transformation for the business, but it does a great job showing many standard practices that can be used as a basis for defining what you, as a business owner, need in order to operate the cloud.

    The following illustrates part of the design showing how vRealize Operations Manager contains functional elements that collaborate for data analysis and storage, and support creating clusters of nodes with different roles: 

    Overall, this is a really powerful platform that will revolutionize the way you see the environment.

    Download It Now!

    Of course, there is much more to the design than just the few pieces I have mentioned, and I encourage you to look here for more details. To download documentation, visit:

    If you are interested, VMware Professional Services are also available to help with the installation and configuration of a VMware Validated Design.

    I look forward to future updates that further expand this design (including use cases that allow for granular customization of the design), and also for other designs that address different IT outcomes. Look for those being released, as well.

    I hope this helps you during your architectural design discussions and has demonstrated that the integration story is not only possible, but can make your experience deploying an SDDC much easier.

    Look for me and other folks on the VMware Professional Services Engineering team and the Integrated Systems Business Unit at VMworld, as well as at other customer events such as VMUGs and vForums. We are happy to answer any questions you may have about the VMware Validated Designs!

    Thursday, May 5, 2016

    Virtualization and VMware Virtual SAN … the Old Married Couple

    Don’t Mistake These Hyper-Converged Infrastructure Technologies as Mutually Exclusive



    I have not posted many blogs recently as I’ve been in South Africa. I have, however, been hard at work on the latest release of VMware vSphere 6.0 Update 2 and VMware Virtual SAN 6.2. Some amazing features are included that will make life a lot easier and add some exciting new functionality to your hyper-converged infrastructure. I will not get into these features in this post, because I want to talk about one of the bigger non-technical questions that I get from customers and consultants alike. This is not one that is directly tied to the technology or architecture of the products. It is the idea that you can go into an environment and just do Virtual SAN, which from my experience is not true. I would love to know if your thoughts and experiences have shown you the same thing.

    Let me first tell those of you who are unaware of Virtual SAN that I am not going to go into great depth about the technology. The key is that, as a platform, it is hyper-converged, meaning it is included with the ESXi hypervisor. This makes it radically simple to actually configure—and, more importantly, use—once it is up and running.

    My hypothesis is that 80 to 90% of what you have to do to design for Virtual SAN focusses on the Virtualization design, and not so much on Virtual SAN.  This is not to say the Virtual SAN design is not important, but virtualization has to be integral to the design when you are building for it. To prove this, take a look at what the standard tasks are when creating the design for the environment:

    1. Hardware selection, racking, configuration of the physical hosts
    2. Selection and configuration of the physical network
    3. Software installation of the VMware ESXi hosts and VMware vCenter Server
    4. Configuration of the ESXi hosts
      • Networking (For management traffic, and for VMware vSphere vMotion, at a minimum)
      • Disks
      • Features (VMware vSphere High Availability, VMware vSphere Distributed Resource Scheduler, VMware vSphere vMotion, at a minimum)
    5. Validation and testing of the configuration
    If I add the Virtual SAN-specific tasks in, you have a holistic view of what is required in most greenfield configurations:

    1. Configuration of the Virtual SAN network 
    2. Turning on Virtual SAN 
    3. Creating new policies (optional, as the default is in place once configured)
    4. Testing Virtual SAN
    As you can see, my first point shows that the majority of the work is actually virtualization and not Virtual SAN. In fact, as I write this, I am even more convinced of my hypothesis. The first three tasks alone are really the heavy hitters for time spent. As a consultant or architect, you need to focus on these tasks more than anything. Notice above where I mention “configure” with regard to Virtual SAN, and not installation; this is because it is already a hyper-converged element installed with ESXi. Once you get the environment up and running with ESXi hosts installed, Virtual SAN needs no further installation, simply configuration. You turn it on with a simple wizard, and, as long as you have focused on the supportability of the hardware and the underlying design, you will be up and running quickly. Virtual SAN is that easy.

    Many of the arguments I get are interesting as well. Some of my favourites include:

    • “The customer has already selected hardware.”
    • “I don’t care about hardware.”
    • “Let’s just assume that the hardware is there.”
    • “They will be using existing hardware.”
    My response is always that you should care a great deal about the hardware. In fact, this is by far the most important part of a Virtual SAN engagement. With Virtual SAN, if the hardware is not on the VMware compatibility list, then it is not supported. By not caring about hardware, you risk data loss and the loss of all VMware support.

    If the hardware is already chosen, you should ensure that the hardware being proposed, added, or assumed as in place is proper. Get the bill of materials or the quote, and go over it line-by-line if that’s what’s needed to ensure that it is all supported.
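
The line-by-line review can be partly mechanical. Below is a minimal Python sketch of the idea: compare each line of the bill of materials against a list of supported parts. All of the data is hypothetical. The vendor and model names are invented, and the `SUPPORTED` set is a stand-in for whatever you copy out of the VMware compatibility list by hand; it is not an API to it. A real check would also cover firmware and driver versions.

```python
# Hypothetical data: neither the part names nor the list contents come from
# the real VMware compatibility list.
SUPPORTED = {
    ("controller", "ExampleCo HBA-1000"),
    ("ssd", "ExampleCo SSD-400"),
    ("hdd", "ExampleCo HDD-1200"),
}

bill_of_materials = [
    {"line": 1, "type": "controller", "model": "ExampleCo HBA-1000"},
    {"line": 2, "type": "ssd", "model": "ExampleCo SSD-400"},
    {"line": 3, "type": "hdd", "model": "OtherCo HDD-900"},
]


def unsupported_lines(bom, supported):
    """Return the BOM entries whose (type, model) pair is not in the supported set."""
    return [item for item in bom if (item["type"], item["model"]) not in supported]


problems = unsupported_lines(bill_of_materials, SUPPORTED)
for item in problems:
    print(f"Line {item['line']}: {item['model']} ({item['type']}) is not on the list")
```

Anything this flags still needs a human to confirm it against the actual compatibility list before the hardware is ordered or reused.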

    Although the hardware selection is slightly stricter than with an average design, the way you approach it is much the same as in any traditional virtualization engagement. Virtual SAN Ready Nodes are a great approach and make this much quicker and simpler, as they offer a variety of pre-configured hardware to meet the needs of Virtual SAN. Along with the Virtual SAN TCO Calculator, they make the painful process of hardware selection a lot easier.

    Another argument I hear is “If I am just doing Virtual SAN, that is not enough time.”

    Yes, it is. It really, really is. I have been a part of multiple engagements for which the first five tasks above are already completely done. All we have to do is come in and turn on Virtual SAN. In Virtual SAN 6.2, this is made really easy with the new wizard:

    Even with the inevitable network issues (not lying here; every single time there is a problem with networking), environmental validation, performance testing, failure testing, and testing of virtual machine creation workflows, I have never seen it take more than a week to do this piece for a single cluster, regardless of the size of the configuration. In many cases, after three days, everything is up and running and it is purely customer validation that is taking place. As a consultant or architect, don’t be afraid of the questions customers ask with regard to performance and failures. Virtual SAN provides mechanisms to easily test the environment as well as to see what “normal” is.

    Finally, here are two other arguments I hear frequently:

    • “We have never done this before.”
    • “We don’t have the skillset.”
    These claims are probably not 100% accurate. If you have used VMware, or you are a VMware administrator, you are probably aware of the majority of what you have to do here. Virtual SAN specifically is where the knowledge needs to be grown. I suggest training, or a review of VMworld presentations on Virtual SAN, to get familiar with this piece of technology and its related terminology. VMware offers training that will get you up to speed on hyper-converged infrastructure technologies and the new features of VMware vSphere 6.0 Update 2 and Virtual SAN 6.2.
    For more information about free learning resources, check out the courses below:

    In addition, most of the best practices you will see are not unfamiliar, since they are vCenter- or ESXi-related. Virtual SAN Health gives an amazing overview that is frequently refreshed, so any issues you may be seeing are reported here. This also takes a lot of the guesswork out of the configuration tasks, as you can see from the screenshot below, since many, if not all, of the common misconfigurations are shown.

    In any case, I hope I have made the argument that Virtual SAN is mostly a virtualization design that just doesn’t use traditional SANs for storage.  Hyper-converged infrastructure is truly bringing change to many customers. This is, of course, just my opinion, and I will let you judge for yourself.

    Virtual SAN has quickly become one of my favourite new technologies that I have worked with in my time at VMware, and I am definitely passionate about people using it to change the way they do business. I hope this helps with any engagements that you are planning, helps you prioritize, and gives a new perspective on how infrastructure is being designed.

    Wednesday, January 27, 2016

    A constant state of change...

    I struggled with whether I wanted to write something formally on this or not, but I think the benefits outweigh the drawbacks at this point. As I am sure has flooded most people's Facebook feeds in the last couple of days if you know me, and the IT industry news if you don't, there have been lots of changes in the industry due to a multitude of factors: the economy, strategy, consolidation, the market being bullish, or whatnot.

    Whether there is truth to this or not really makes no difference. The fact is that the office I have personally worked out of for over 10 years now was announced as closing after a 12-year run. I have been a part of this community at my office for the majority of my IT career, minus a couple of years out of college at Microsoft. I call it a community because that is truly what it came to be to me after all these years... definitely much more than a workplace. Out of the staff that was here, the majority of people were let go as a result of this action. I don't know any details, but from what I have heard everyone was treated incredibly well, which is great to hear.

    For me personally, I was lucky enough to have the opportunity to move, around three years ago now, to a team that works remotely from home offices. I did still maintain a desk at the office, which meant that I also maintained a lot of the relationships that I had prior to moving. Now I get to move home and have the shortest commute ever: 15 feet from my bed. Seeing all of this happen in front of me was by far one of the hardest things I have gone through in my career. I honestly think I have a case of survivor's guilt as a result; however, the initial shock is gone and I was finally able to get a full night's sleep last night. I am sure I will be in need of human interaction sooner rather than later.

    I know that there have been way too many inspirational quotes posted, at least to my feeds, so I won't preach. What I will say is that any time I have seen an event like this, although it is painful at the time, it ends up working out. Take some time to let it sink in, recharge, and use the amazing amount of knowledge that you gained from the experiences to go forward.

    To everyone from the office, I will miss everything from the numerous charity events we did together as a community - Big Bike being one of our favourites:

    To things such as winning Diane the Dragon as a team at our last Christmas party:

    I definitely will miss it all.  I am sure that we will remain friends long after this week... I learned so much from some of you, which has led to me growing personally and technically.  I wish everyone the best... and never stop growing!

    So I guess I will end this post by saying ... "So Long, and Thanks for all the Fish!"

    Tuesday, January 19, 2016

    Virtual SAN Stretch Clusters – Real World Design Practices (Part 2)

    This is the second part of a two blog series as there was just too much detail for a single blog. For part 1 see (

    As I mentioned at the beginning of the last blog, I want to start off by saying that all of the details here are based on my own personal experiences. It is not meant to be a comprehensive guide to setting up stretch clustering for Virtual SAN, but rather a set of pointers to show the type of detail most commonly asked for. 
    Hopefully it will help you prepare for any projects of this type.

    Continuing on with the configuration, the next set of questions regarded networking!

    Networking, Networking, Networking

    With sizing and configuration behind us, the next step was to enable Virtual SAN and set up the stretch clustering. As soon as we turned it on, however, we got the infamous “Misconfiguration Detected” message for the networking.

    In almost all engagements I have been a part of, this has been a problem, even though the networking team said it was already set up and configured. This always becomes a fight, but it gets easier with the new Health UI and its multicast checks. Generally, when multicast is not configured properly, you will see something similar to the screenshot shown below.

    It definitely makes the process of going to the networking team easier. The added bonus is there are no messy command line syntaxes needed to validate the configuration. I can honestly say the health interface for Virtual SAN is one of the best features introduced for Virtual SAN!

    Once we had this configured properly the cluster came online and we were able to configure the cluster, including stretch clustering, the proper vSphere high availability settings and the affinity rules.

    The final question that came up on the networking side was about the recommendation that L3 is the preferred communication mechanism to the witness host. The big issue when using L2 is the potential that traffic between the data sites could be redirected through the witness in the case of a failure, and the witness link is sized for a substantially lower bandwidth requirement. A great description of this concern is in the networking section of the Stretched Cluster Deployment Guide.

    In any case, the networking configuration is definitely more complex in stretched clustering because the networking spans multiple sites. Therefore, it is imperative that it is configured correctly, not only to ensure that performance is at peak levels, but to ensure there is no unexpected behavior in the event of a failure.

    High Availability and Provisioning

    All of this talk finally led to the conversation about availability. The beautiful thing about Virtual SAN is that with the “failures to tolerate” setting, you can tolerate between one and three failures, which means keeping between two and four copies of the data, depending on what is configured in the policy. Gone are the long conversations of trying to design this into a solution with proprietary hardware or software.

    A difference with stretch clustering is that the maximum “failures to tolerate” is one. This is because we have three fault domains: the two sites and the witness. Logically, when you look at it, it makes sense: more than that is not possible with only three fault domains. The idea here is that there is a full copy of the virtual machine data at each site. Because components are stored according to site boundaries, this allows for failover in case an entire site fails.
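
    The arithmetic behind that limit can be sketched in a few lines of Python. This is only an illustration, assuming the usual mirroring rule that tolerating n failures needs 2n + 1 fault domains; the function name is made up for this example and is not part of any VMware tool.

```python
def max_failures_to_tolerate(fault_domains: int) -> int:
    """Largest n such that 2n + 1 <= fault_domains (assumed mirroring rule)."""
    if fault_domains < 1:
        raise ValueError("need at least one fault domain")
    return (fault_domains - 1) // 2


# A stretch cluster has three fault domains: two data sites plus the witness.
print(max_failures_to_tolerate(3))  # prints 1
```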

    Of course, high availability (HA) needs to be aware of this. The way this is configured from a vSphere HA perspective is to use the admission control policy that reserves a percentage of cluster resources, and set both CPU and memory to 50 percent:

    This may seem like a LOT of resources, but when you think of it from a site perspective, it makes sense; if you have an entire site fail, the virtual machines from the failed site will be able to restart on the surviving site without issues.

    The question came up as to whether we allow more than 50 percent of resources to be consumed. Yes, it is possible to configure utilization above 50 percent, but it is not recommended: if a site fails, not all virtual machines may start back up. This is why it is recommended that 50 percent of resources be reserved. If you do run above 50 percent, the configuration generally consists of setting a priority on the most important virtual machines so HA will start up as many as possible, starting with the most critical ones. Personally, I recommend not setting above 50 percent for a stretch cluster.
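
    To make the 50 percent reasoning concrete, here is a rough sketch of the check I do in my head. It assumes two equally sized sites and ignores things like virtualization overhead and per-VM reservations; the function names and the numbers are hypothetical.

```python
def surviving_capacity(total_capacity: float) -> float:
    """Capacity left after losing one of two equally sized sites."""
    return total_capacity / 2


def all_vms_can_restart(total_capacity: float, vm_demand: float) -> bool:
    """True if the whole VM demand fits on the surviving site."""
    return vm_demand <= surviving_capacity(total_capacity)


# Hypothetical cluster: 1,000 GB of memory across both sites.
print(all_vms_can_restart(1000, 450))  # True: demand is under 50 percent
print(all_vms_can_restart(1000, 600))  # False: some VMs would not restart
```

    The second case is where restart priorities start to matter, because HA can only bring back what fits.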

    An additional question came up about using host and virtual machine affinity rules to control the placement of virtual machines. Unfortunately, the assignment to these groups is not easy during the provisioning process and did not fit easily into the virtual machine provisioning practices that were used in the environment. vSphere Distributed Resource Scheduler (DRS) does a good job ensuring balance, but more control was needed rather than just relying on DRS to balance the load. The end goal was that during provisioning, placement in the appropriate site could be done automatically by staff.

    This discussion boiled down to the need for a change to provisioning practices. Currently, it is a manual configuration change, but it is possible to use automation such as vRealize Orchestrator to automate deployment appropriately. This is something to keep in mind when working with customers to design a stretch cluster, as changes to provisioning practices may be needed.

    Failure Testing

    Finally, after days of configuration and design decisions, we were ready to test failures. This is always interesting because the conversation always varies between customers. Some require very strict testing and want to test every scenario possible, while others are OK doing less. After talking it over we decided on the following plan:
    • Host failure in the secondary site
    • Host failure in the primary site
    • Witness failure (both network and host)
    • Full site failure
    • Network failures
      • Witness to site
      • Site to site
    • Disk failure simulation
    • Maintenance mode testing 

    This was a good balance of tests to show exactly what the different failures look like. Prior to starting, I always go over the health status windows for Virtual SAN as it updates very quickly to show exactly what is happening in the cluster.

    The customer was really excited about how seamlessly Virtual SAN handles errors. The key is to operationally prepare and ensure the comfort level is high with handling the worst-case scenario. When starting off, host and network failures are always very similar in appearance, but showing this is important; so I suggested running through several similar tests just to ensure that tests are accurate.

    As an example, one of the most common failure tests requested (which many organizations don’t test properly) is simulating what happens if a disk fails in a disk group. Simply pulling a disk out of the server does not replicate what would happen if a disk actually fails, as a completely different mechanism is used to detect this. You can use the following commands to properly simulate a disk actually failing by injecting an error.  Follow these steps:
      1. Identify the disk device in which you want to inject the error. You can do this by using a combination of the Virtual SAN Health User Interface, and running the following command from an ESXi host and noting down the naa.<ID> (where <ID> is a string of characters) for the disk:

        esxcli vsan storage list

      2. Navigate to /usr/lib/vmware/vsan/bin/ on the ESXi host.
      3. Inject a permanent device error to the chosen device by running:

        python vsanDiskFaultInjection.pyc -p -d <>

      4. Check the Virtual SAN Health User Interface. The disk will show as failed, and the components will be relocated to other locations.
      5. Once the re-sync operations are complete, remove the permanent device error by running:

        python vsanDiskFaultInjection.pyc -c -d <>

      6. Once completed, remove the disk from the disk group and uncheck the option to migrate data. (This is not a strict requirement because data has already been migrated as the disk officially failed.)
      7. Add the disk back to the disk group.
      8. Once this is complete, all warnings should be gone from the health status of Virtual SAN.

        Note: Be sure to acknowledge and reset any alarms to green.
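
    If you run this test on several hosts, it can help to generate the two command lines up front so nobody mistypes a device ID. The sketch below only builds and prints the strings; it does not run anything. It assumes the empty <> placeholder in the commands above stands for the naa.<ID> noted in step 1, and the helper name and the sample ID are made up for illustration.

```python
SCRIPT_DIR = "/usr/lib/vmware/vsan/bin"  # path from step 2
SCRIPT = "vsanDiskFaultInjection.pyc"    # script name from steps 3 and 5


def fault_commands(device: str) -> dict:
    """Return the inject and clear command lines for one device."""
    if not device.startswith("naa.") or len(device) <= len("naa."):
        raise ValueError(f"expected an naa.<ID> device name, got {device!r}")
    base = f"cd {SCRIPT_DIR} && python {SCRIPT}"
    return {
        "inject": f"{base} -p -d {device}",  # step 3: permanent device error
        "clear": f"{base} -c -d {device}",   # step 5: remove the error
    }


# "naa.EXAMPLE0001" is a made-up ID; use the one noted in step 1.
for name, cmd in fault_commands("naa.EXAMPLE0001").items():
    print(f"{name}: {cmd}")
```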

    After performing all the tests in the above list, the customer had a very good feeling about the Virtual SAN implementation and their ability to operationally handle a failure should one occur.

    Performance Testing

    Last, but not least, was performance testing. Unfortunately, while I was onsite for this one, the 10G networking was not available. I would not recommend using a gigabit network for most configurations, but since we were not yet in full production mode, we did go through many of the performance tests and got an excellent baseline of what the performance would look like with the gigabit network.

    Briefly, because I could write an entire book on performance testing, the quickest and easiest way to test performance is with the Proactive Tests menu which is included in Virtual SAN 6.1:

    It provides a really good mechanism to test different types of workloads that are most common – all the way from a basic test, to a stress test. In addition, using IOmeter for testing (based on environmental characteristics) can be very useful. 

    In this case, to give you an idea of performance test results, we were pretty consistently getting a peak of around 30,000 IOPS with the gigabit network with 10 hosts in the cluster. Subsequently, I have been told that once the 10G network was in place, this actually jumped up to a peak of 160,000 IOPS for the same 10 hosts. Pretty amazing to be honest.
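
    For a sense of scale, the same numbers can be broken down per host. This is just arithmetic on the peak figures quoted above, not an additional measurement.

```python
hosts = 10
iops_1g = 30_000    # peak with the gigabit network
iops_10g = 160_000  # peak reported once 10G was in place

per_host_1g = iops_1g / hosts
per_host_10g = iops_10g / hosts
speedup = iops_10g / iops_1g

print(per_host_1g)        # 3000.0
print(per_host_10g)       # 16000.0
print(round(speedup, 1))  # 5.3
```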

    I will not get into the ins and outs of testing, as it very much depends on the area you are testing. I did want to show, however, that it is much easier to perform a lot of the testing this way than it was using the previous command line method. 

    One final note I want to add in the performance testing area is that one of the key things (other than pure “my VM goes THISSSS fast” type tests), is to test the performance of rebalancing in the case of maintenance mode, or failure scenarios. This can be done from the Resyncing Components Menu: 

    Boring by default perhaps, but when you either migrate data in maintenance mode, or change a storage policy, you can see what the impact will be to resync components. It will either show when creating an additional disk stripe for a disk, or when fully migrating data off the host when going into maintenance mode. The compliance screen will look like this:

    This represents a significant amount of time, and is incredibly useful when testing normal workloads such as when data is migrated during the enter maintenance mode workflow. Full migrations of data can be incredibly expensive, especially if the disks are large, or if you are using gigabit rather than 10G networks. Oftentimes, convergence can take a significant amount of time and bandwidth, so this allows customers to plan for the amount of data to be moved while in maintenance mode, or in the case of a failure.
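
    For planning purposes, a back-of-the-envelope estimate of how long a full data migration might take can be sketched like this. It is a simplification that only looks at network bandwidth; the efficiency factor (the share of line rate actually available for resync traffic) is my own assumption, and real resync times also depend on disk speed and competing workloads.

```python
def resync_hours(data_gb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Rough hours needed to move data_gb gigabytes over a link_gbps link.

    efficiency is an assumed fraction of line rate usable for resync traffic.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    gigabits = data_gb * 8
    seconds = gigabits / (link_gbps * efficiency)
    return seconds / 3600


# Hypothetical host holding 2 TB of data, on gigabit versus 10G networking.
for gbps in (1, 10):
    print(f"{gbps}G network: about {resync_hours(2000, gbps):.1f} hours")
```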

    Well, that is what I have for this blog post. Again, this is obviously not an exhaustive list of all decision points or anything like that; it’s just where we had the most discussions that I wanted to share. I hope this gives you an idea of the challenges we faced, and can help you prepare for the decisions you may face when implementing stretch clustering for Virtual SAN. This is truly a pretty cool feature and will provide an excellent addition to the ways business continuity and disaster recovery plans can be designed for an environment.

    Thursday, January 7, 2016

    Virtual SAN Stretch Clusters – Real World Design Practices (Part 1)

    (Also available on the VMware Consulting Blog:

    This is part one of a two blog series as there was just too much detail for a single blog. I want to start off by saying that all of the details here are based on my own personal experiences. It is not meant to be a comprehensive guide for setting up stretch clustering for Virtual SAN, but a set of pointers to show the type of detail that is most commonly asked for. Hopefully it will help prepare you for any projects that you are working on.

    Most recently in my day-to-day work I was asked to travel to a customer site to help with a Virtual SAN implementation. It was not until I got on site that I was told that the idea for the design was to use the new stretch clustering functionality that VMware added to the Virtual SAN 6.1 release. This functionality has been discussed by other folks in their blogs, so I will not reiterate much of the detail from them here. In addition, the implementation is very thoroughly documented by the amazing Cormac Hogan in the Stretched Cluster Deployment Guide.

    What this blog is meant to be is a guide to some of the most important design decisions that need to be made. I will focus on the most recent project I was part of; however, the design decisions are pretty universal. I hope that the detail will help people avoid issues such as the ones we ran into while implementing the solution.

    A Bit of Background

    For anyone not aware of stretch clustering functionality, I wanted to provide a brief overview. Most of the details you already know about Virtual SAN still remain true. What it really amounts to is a configuration that allows two sites of hosts connected with a low latency link to participate in a Virtual SAN cluster, together with an ESXi host or witness appliance that exists at a third site. This cluster is an active/active configuration that provides a new level of redundancy, such that if one of the two sites has a failure, the other site will immediately be able to recover virtual machines from the failed site using vSphere High Availability.

    The configuration looks like this:

    This is accomplished by using fault domains and is configured directly from the fault domain configuration page for the cluster: 

    Each site is its own fault domain which is why the witness is required. The witness functions as the third fault domain and is used to host the witness components for the virtual machines in both sites. In Virtual SAN Stretched Clusters, there is only one witness host in any configuration. 

    For deployments that manage multiple stretched clusters, each cluster must have its own unique witness host. 

    The nomenclature used to describe a Virtual SAN Stretched Cluster configuration is X+Y+Z, where X is the number of ESXi hosts at data site A, Y is the number of ESXi hosts at data site B, and Z is the number of witness hosts at site C. 

    Finally, with stretch clustering, the current maximum configuration is 31 nodes (15 + 15 + 1 = 31 nodes). The minimum supported configuration is 1 + 1 + 1 = 3 nodes. This can be configured as a two-host virtual SAN cluster, with the witness appliance as the third node.
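
    As a small illustration of the X+Y+Z nomenclature, the sketch below parses such a string and checks it against the limits just described (one witness, 1 to 15 hosts per data site). The function name is hypothetical and this is not how Virtual SAN itself validates a configuration.

```python
def parse_stretch_config(spec: str) -> tuple:
    """Parse 'X+Y+Z' and check it against the limits described above."""
    parts = [p.strip() for p in spec.split("+")]
    if len(parts) != 3:
        raise ValueError("expected three numbers in the form X+Y+Z")
    x, y, z = (int(p) for p in parts)
    if z != 1:
        raise ValueError("a stretched cluster has exactly one witness host")
    if not (1 <= x <= 15 and 1 <= y <= 15):
        raise ValueError("each data site needs between 1 and 15 hosts")
    return x, y, z


print(sum(parse_stretch_config("15+15+1")))  # 31 nodes, the current maximum
print(sum(parse_stretch_config("1+1+1")))    # 3 nodes, the minimum
```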

    With all these considerations, let’s take a look at a few of the design decisions and issues we ran into.

    Hosts, Sites and Disk Group Sizing

    The first question that came up, as it almost always does, is about sizing. This customer initially used the Virtual SAN TCO Calculator for sizing and the hardware was already delivered. Sounds simple, right? Well perhaps, but it does get more complex when talking about a stretch cluster. The questions that came up regarded the number of hosts per site, as well as how the disk groups should be configured.

    Starting off with the hosts, one of the big things discussed was the possibility of having more hosts in the primary site than in the secondary. For stretch clusters, an identical number of hosts in each site is a requirement. This makes it a lot easier from a decision standpoint, and when you look closer the reason becomes obvious: with a stretched cluster, you have the ability to fail over an entire site. Therefore, it is logical to have identical host footprints. 

    With disk groups, however, the decision point is a little more complex. Normally, my recommendation here is to keep everything uniform. Thus, if you have 2 solid state disks and 10 magnetic disks, you would configure 2 disk groups with 5 disks each. This prevents unbalanced utilization of any one component type, regardless of whether it is a disk, disk group, host, network port, etc. To be honest, it also greatly simplifies much of the design, as each host/disk group can expect an equal amount of love from vSphere DRS. 

    In this configuration, though, it was not so clear because one additional disk was available, so the division of disks cannot be equal. After some debate, we decided to keep one disk as a “hot spare,” so there was an equal number of disk groups—and disks per disk group—on all hosts. This turned out to be a good thing; see the next section for details.
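
    The uniform layout rule is easy to sketch: split the magnetic disks evenly across the solid state disks (one per disk group) and treat whatever is left over as spares. The helper name is made up, and the 2 + 11 case is a hypothetical stand-in for a host with one extra disk.

```python
def plan_disk_groups(ssds: int, magnetic: int) -> tuple:
    """Return (magnetic disks per group, spare disks), one SSD per group."""
    if ssds < 1:
        raise ValueError("need at least one solid state disk")
    per_group, spares = divmod(magnetic, ssds)
    return per_group, spares


print(plan_disk_groups(2, 10))  # (5, 0): two groups of five, no spare
print(plan_disk_groups(2, 11))  # (5, 1): two groups of five, one hot spare
```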

    In the end, much of this is the standard approach to Virtual SAN configuration, so other than site sizing, there was nothing really unexpected. 

    Booting ESXi from SD or USB 

    I don’t want to get too in-depth on this, but briefly, when you boot an ESXi 6.0 host from a USB device or SD card, Virtual SAN trace logs are written to RAMdisk, and the logs are not persistent. This actually serves to preserve the life of the device as the amount of data being written can be substantial. When running in this configuration these logs are automatically offloaded to persistent media during shutdown or system crash (PANIC). If you have more than 512 GB of RAM in the hosts, you are unlikely to have enough space to store this volume of data because these devices are not generally this large. Therefore, logs, Virtual SAN trace logs, or core dumps may be lost or corrupted because of insufficient space, and the ability to troubleshoot failures will be greatly limited.

    So, in these cases it is recommended to configure a drive for the core dump and scratch partitions. This is also the only supported method for handling Virtual SAN traces when booting ESXi from a USB stick or SD card.

    That being said, when we were in the process of configuring the hosts in this environment, we saw the “No datastores have been configured” warning message pop up – meaning persistent storage had not been configured. This triggered the whole discussion; the error is similar to the one in the vSphere Web Client in this screenshot:

    In the vSphere Client, this error also comes up when you click to the Configuration tab:

    The spare disk turned out to be useful because we were able to use it to configure the ESXi scratch and core dump partitions. This is not to say we were seeing crashes, or even expected to; in fact, we saw no unexpected behavior in the environment up to this point. Rather, since this was a new environment, we wanted to ensure we’d have the ability to quickly diagnose any issue, and having this configured up-front saves significant time in support. This is of course speaking from first-hand experience.

    In addition, syslog was set up to export logs to an external source at this time. Whether using the syslog service that is included with vSphere, or vRealize Log Insight (amazing tool if you have not used it), we were sure to have the environment set up to quickly identify the source of any problem that might arise. 

    For more details on this, see the following KB articles for instructions:

    I guess the lesson here is that when you are designing your Virtual SAN cluster, make sure you remember that having persistence available for logs, traces and core dumps is a best practice. If you have a large memory configuration, the easiest way to do this is to install ESXi and the scratch/core dump partitions to a hard drive. This also simplifies post-installation tasks, and will ensure you can collect all the information support might require to diagnose issues.

    Witness Host Placement

    The witness host was the next piece we designed. Officially, the witness must be in a distinct third site in order to properly detect failures. It can either be a full host or a virtual appliance residing outside of the virtual SAN cluster. The cool thing is that if you use an appliance, it actually appears differently in the Web client:

    For the witness host in this case, we decided to use the witness appliance rather than a full host. This way, it could be migrated easily because the networking was not set up to the third site yet. As a result, for the initial implementation while I was onsite, the witness was local to one of the sites, and would be migrated as soon as the networking was set up. This is definitely not a recommended configuration, but for testing, or for a non-production proof-of-concept, it does work. Keep in mind that a site failure may not be properly detected unless the cluster is properly configured.

    With this, I conclude Part 1 of this blog series; hopefully, you have found this useful. Stay tuned for Part 2!