Thursday, December 4, 2014

Software-Defined Storage – A new way of thinking

There are certainly a ton of blogs out there that talk about Software-Defined Storage, and in particular Virtual SAN. My goal in this post is not to rehash the same old information, but to share insights from my own experiences.

Most recently I was challenged with getting up to speed on Virtual SAN and developing an architecture design for it. Since I had only heard the marketing details at first, it seemed pretty intimidating; however, it truly does live up to the hype about being “radically simple”. It most definitely changed the way I think about storage. What I have found is that the more I work with Virtual SAN, the less concerned I become with the underlying storage.

This is a bit foreign to me, because if being part of support has taught me anything (other than that following instructions to the letter is important), it is that the array needs to be configured correctly for your environment to perform at its best. All sorts of issues can occur otherwise.

Because of this, I was skeptical at first, since focusing on policies seemed so foreign. After using and testing it in customer environments, I can honestly say that my mind was changed by the absolute power it gives an administrator. I say this as if it happened in an instant; however, it all happened over the course of a couple of months. At the time, I was involved with several customer projects using it, and I saw that in every case a distinct set of things always happened. From these experiences I was able to build the following workflow, which can be used when working through a Virtual SAN design:


Looking at it further, I generally break this flow chart down into four areas:
  1. Hardware Selection – In absolutely every environment I have worked in, there has been a hardware problem. I would guess that 75% of the problems I have seen in implementing Virtual SAN have been the result of hardware selection or configuration. This includes things such as unsupported devices or incorrect firmware/drivers.

    Note: VMware does not provide support for devices that are not on the Virtual SAN Compatibility List. When selecting hardware, be sure that it is on the list!
  2. Software Configuration – The configuration is simple; I have rarely seen questions about actually turning it on. You merely click a check box, and it configures itself (assuming, of course, that the underlying configuration is correct). If it is not, the results can be mixed, for example when the networking is not configured correctly or the disks have not been presented properly.
  3. Storage Policy – The storage policy is, at first, a huge decision point. This is what gives Virtual SAN its power: the ability to define the performance and availability characteristics of each Virtual Machine.
  4. Monitoring / Performance Testing / Failure Testing – The final area is how you monitor and test the configuration.
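
To make the storage policy decision a little more concrete, here is a minimal Python sketch of the arithmetic behind the "number of failures to tolerate" policy setting. It assumes simple mirroring, where tolerating n failures requires n + 1 replicas spread across at least 2n + 1 hosts, and it ignores witness and metadata overhead; the class and function names are hypothetical and not part of any VMware API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyFootprint:
    replicas: int           # full mirror copies of each object
    min_hosts: int          # hosts needed for the replicas plus witness components
    raw_capacity_gb: float  # raw datastore capacity consumed by the replicas


def policy_footprint(vmdk_gb: float, failures_to_tolerate: int) -> PolicyFootprint:
    """Estimate the footprint of one mirrored virtual disk.

    Assumes the classic mirroring rule: tolerating n failures needs
    n + 1 replicas spread across at least 2n + 1 hosts. Witness and
    metadata overhead are ignored, so real consumption is slightly higher.
    """
    if failures_to_tolerate < 0:
        raise ValueError("failures_to_tolerate must be >= 0")
    n = failures_to_tolerate
    return PolicyFootprint(
        replicas=n + 1,
        min_hosts=2 * n + 1,
        raw_capacity_gb=vmdk_gb * (n + 1),
    )


if __name__ == "__main__":
    # Hypothetical 100 GB virtual disk evaluated at several policy settings.
    for ftt in (1, 2, 3):
        fp = policy_footprint(100, ftt)
        print(f"FTT={ftt}: {fp.replicas} replicas, "
              f"at least {fp.min_hosts} hosts, {fp.raw_capacity_gb:.0f} GB raw")
```

Under these assumptions, each additional failure tolerated adds a full copy of the data and two more hosts, which is why the policy deserves careful thought up front.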


All of these things should be taken into account in any Virtual SAN design; otherwise the design is not really complete. Now, I could talk through a lot of this for hours. Rather than doing that, I thought it would be better to post my top three gotchas and lessons learned from the projects I have been involved with.

Top 3 Gotchas from PSE

Here are my top three gotchas that I have run into with Virtual SAN: 

  1. Network Configuration – No matter what the networking team says, always validate the configuration. The “Misconfiguration detected” error is by far the most common thing I have seen:



    Normally this means that either the port group has not been configured successfully for Virtual SAN or multicast has not been set up properly. If I were to guess, most of the issues I have seen are the result of multicast setup. On Cisco switches, unless an IGMP snooping querier has been configured OR IGMP snooping has been explicitly disabled on the ports used for Virtual SAN, it will generally fail. Leaving the switch in its default configuration means neither has been done, so even if the network admin says it is configured properly, double-check it to avoid any pain.
  2. Network Speed – Although 1Gb networking is supported, and I have seen it operate effectively in small environments, 10Gb networking is highly recommended for most configurations. I don’t say this just because the documentation says so. From experience, the problem is not everyday usage of Virtual SAN. Where people run into trouble is when something unusual happens, such as a failure or a period of heavy virtual machine creation. Replication traffic during these periods can be substantial and can cause severe performance degradation while it lasts. The only way to know is to test what happens during a failure or a peak provisioning cycle. This is critical, because it tells you what performance to expect. When in doubt, always use 10Gb networking.
  3. Storage Adapter Choice – Although it seems simple, the queue depth of the controller should be at least 256 to ensure the best performance. This is less of an issue now than it was several months ago, because the VMware Virtual SAN compatibility list should no longer include any cards with a queue depth under 256. Be sure to verify, though. For example, one card, when first released, had its queue depth artificially limited by the driver software. Performance was dramatically impacted until an updated driver was released.
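
A back-of-the-envelope calculation shows why link speed matters most during failures. The minimal Python sketch below estimates the best-case time to copy a given amount of data across the Virtual SAN network. The 500 GB figure and the `efficiency` parameter are hypothetical, and real rebuilds are also limited by disk throughput and competing VM traffic, so treat the results as a lower bound.

```python
def resync_minutes(data_gb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Best-case minutes to copy data_gb gigabytes over a link of link_gbps gigabits/s.

    efficiency in (0, 1] scales the usable share of the link; 1.0 means pure
    line rate, which ignores protocol overhead, competing VM traffic and
    disk limits, so real rebuilds will take longer.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    gigabits = data_gb * 8  # 1 byte = 8 bits
    seconds = gigabits / (link_gbps * efficiency)
    return seconds / 60


if __name__ == "__main__":
    data_gb = 500  # hypothetical amount of data to re-protect after a host failure
    for speed in (1, 10):
        print(f"{speed:>2} Gb/s: {resync_minutes(data_gb, speed):.1f} minutes at line rate")
```

With these assumptions, 500 GB takes roughly 67 minutes at 1Gb line rate versus under 7 minutes at 10Gb, and at 1Gb the link has no headroom left for virtual machine traffic during that whole window.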

Top 3 Lessons Learned

Each of these lessons came at the price of a half or full day of troubleshooting. Here are my lessons learned:
  1. Always Verify Firmware/Driver Versions – This step always seems to be overlooked, and I am stating it because of my experiences on site with customers. One example comes to mind where we had three identical servers, bought and shipped in the same order, that we were using to configure Virtual SAN. Two of them worked fine; the third just wouldn’t cooperate, no matter what we did. After investigating for several hours, we found that not only would Virtual SAN not configure, but all drives attached to that host were read-only. The utility provided with the card itself showed that the card was one firmware revision behind. As soon as we upgraded the firmware (long story short, it turns out reading documentation is not one of my strong suits... we struggled with that firmware update until we realized a COLD power off was required), it came online and everything worked brilliantly.
  2. Pass-through/RAID0 Controller Configuration – It is almost always recommended to use a pass-through controller, so that Virtual SAN owns the drives and has full control of them. In many cases, however, the controller offers only RAID0 mode. Proper configuration of RAID0 mode is required to avoid problems and to maximize performance for Virtual SAN. First, ensure any controller caching is set to 100% read cache. Second, configure each drive as its own “array” rather than one giant array of disks. As an example of an incorrect configuration that causes unnecessary overhead, several times I have seen all disks combined into a single RAID set at the controller. This shows up as a single disk to the operating system (ESXi in this case), which is not what you want. To fix this, you have to reconfigure the controller correctly, and you also have to ensure that any existing partition table is removed, which in many cases means zeroing out the drive if there is no option to remove the header.
  3. Performance Testing – The lesson learned here is that you can do an infinite amount of testing, so where do you stop, or even where do you start? Wade Holmes from the Virtual SAN technical marketing team at VMware has an amazing blog series on this that I highly recommend reviewing for guidance. His methodology allows for both basic and more in-depth testing of your Virtual SAN configuration.
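
The three-identical-servers story is exactly the kind of drift a quick inventory check can catch before troubleshooting starts. Below is a minimal Python sketch, assuming you have already collected each host's controller model, firmware and driver versions (with whatever tooling your hardware vendor provides) and copied the supported levels from the compatibility list by hand. The data structure, function names, model name and version numbers are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ControllerInfo:
    host: str
    model: str
    firmware: str
    driver: str


def version_key(text: str) -> tuple[int, ...]:
    """Split a version string such as '23.29.0-0019' into (23, 29, 0, 19)."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def older_than(installed: str, required: str) -> bool:
    """True if installed is an older version than required (missing parts count as 0)."""
    a, b = version_key(installed), version_key(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def find_problems(controllers: list[ControllerInfo],
                  supported: dict[str, tuple[str, str]]) -> list[tuple[str, str]]:
    """Flag controllers that are unlisted or below the listed firmware/driver levels.

    supported maps a controller model to the (firmware, driver) pair copied
    from the compatibility list for your ESXi release.
    """
    problems = []
    for c in controllers:
        if c.model not in supported:
            problems.append((c.host, "model not on compatibility list"))
            continue
        fw_required, drv_required = supported[c.model]
        if older_than(c.firmware, fw_required):
            problems.append((c.host, f"firmware {c.firmware} < {fw_required}"))
        if older_than(c.driver, drv_required):
            problems.append((c.host, f"driver {c.driver} < {drv_required}"))
    return problems


if __name__ == "__main__":
    # Hypothetical model name and version numbers, for illustration only.
    supported = {"ExampleRAID-9000": ("1.3.0", "6.2.1")}
    hosts = [
        ControllerInfo("esx01", "ExampleRAID-9000", "1.3.0", "6.2.1"),
        ControllerInfo("esx02", "ExampleRAID-9000", "1.3.0", "6.2.1"),
        ControllerInfo("esx03", "ExampleRAID-9000", "1.2.0", "6.2.1"),
    ]
    for host, issue in find_problems(hosts, supported):
        print(host, issue)
```

Here only esx03 is reported, with its firmware one revision behind. Checking against the compatibility list, rather than only comparing hosts with each other, also catches the case where every host is equally out of date.
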
I hope these pointers help in your evaluation and implementation of Virtual SAN. Before diving headfirst into anything, I always like to make sure that I am informed about the subject matter, and Virtual SAN is no different. To be successful, you need genuine subject matter expertise for the design, whether in-house or from a professional services organization. Remember, VMware is happy to be your trusted advisor if you need assistance with Virtual SAN or any of our other products!

Thursday, November 13, 2014

SSL Certificate Automation Tool - Past Blogs

I wanted to post these here, as they were my first blog posts, written before this blog existed. I was heavily involved in the development and testing of the tools until I transitioned into Professional Services Engineering. The following should still be of great use, though, as the process is still incredibly complex. I hope to see everything get much simpler in the future.

===================
Originally Posted: April 4, 2013

Introducing the vCenter Certificate Automation Tool 1.0

Fresh out of development today, VMware has a new tool to help everyone implement custom certificates. The vCenter Certificate Automation Tool 1.0 helps customers update the certificates needed for running vCenter Server and its supporting components. This is primarily of interest to customers who use custom certificates, either generated internally from corporate CAs or obtained from public CAs like VeriSign.

To add a little background information: various components within vSphere and the vCenter platform use certificates to identify themselves and to communicate securely with external software entities (browsers, API clients). These can broadly be classified into the following categories:


  1. Secure Token Service Certificate – Certificate used by vCenter Single Sign On (SSO) for token encryption
  2. Solution User Certificates – Certificates used by each solution to identify itself as a user to SSO
  3. SSL Certificates – Certificates needed for SSL communication for the UI and API layer
  4. Host Certificates – Certificates deployed on each ESXi host and used for secure vCenter-to-ESXi communication.


Note: The new certificate tool automates updating certificates in the management layer only (items 1, 2, and 3 above). This tool does NOT handle replacement of certificates on ESXi hosts.

The vCenter Certificate Automation Tool aims to automate the process of uploading certificates and restarting the following components within the vCenter Platform:


  • vCenter Server
  • vCenter Single Sign On
  • vCenter Inventory Service
  • vSphere Web Client
  • vCenter Log Browser
  • VMware Update Manager (VUM)
  • vCenter Orchestrator (VCO)

For more information on how to download, install, and use the tool, refer to KB article: Deploying and Using the SSL Certificate Automation Tool (2041600).

======================
Originally Posted: May 21, 2013

SSL Certificate Automation Tool version 1.0.1


Last month we announced a new SSL Certificate Automation Tool to help everyone implement custom certificates. Yesterday, we released the second version of it (version 1.0.1). This is a minor update that aims to simplify certificate replacement further by adding Certificate Signing Request (CSR) functionality to the tool. This functionality allows a user to quickly generate certificate requests (and the corresponding private keys) for submission to the Certificate Authority. Generating CSRs accounted for the largest share of the manual steps, so this update reduces the number of steps by more than 15.
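
For readers who have never looked inside this step, here is a minimal Python sketch of what generating a CSR involves: a new private key is created locally, and a request containing the public key and host names, signed with that key, is produced for the CA. This is not how the Certificate Automation Tool itself is implemented; it assumes a recent release of the third-party `cryptography` package, and the host names are hypothetical.

```python
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID


def make_key_and_csr(common_name: str, dns_names: list[str]) -> tuple[bytes, bytes]:
    """Create a 2048-bit RSA private key and a matching PEM-encoded CSR.

    The CSR carries the common name plus DNS subject alternative names and
    is signed with the new key (SHA-256), proving possession of the key.
    """
    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    csr = (
        x509.CertificateSigningRequestBuilder()
        .subject_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, common_name)]))
        .add_extension(
            x509.SubjectAlternativeName([x509.DNSName(name) for name in dns_names]),
            critical=False,
        )
        .sign(key, hashes.SHA256())
    )
    key_pem = key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.TraditionalOpenSSL,
        encryption_algorithm=serialization.NoEncryption(),  # unencrypted PEM; store it securely
    )
    return key_pem, csr.public_bytes(serialization.Encoding.PEM)


if __name__ == "__main__":
    # Hypothetical host names; substitute your vCenter Server's real FQDN and short name.
    key_pem, csr_pem = make_key_and_csr("vcenter.example.com",
                                        ["vcenter.example.com", "vcenter"])
    print(csr_pem.decode())  # this text is what gets submitted to the Certificate Authority
```

The private key never leaves the machine; only the CSR text goes to the CA, and the certificate it returns is later paired with this key. Repeating that for every vCenter component by hand is what made this part of the manual process so long.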

In addition, this release includes several minor fixes for bugs that impacted tool functionality.

For further details and to download the latest version of the SSL tool see: Deploying and Using the SSL Certificate Automation Tool (2041600).

We hope these additions prove useful for everyone!

======================

Look forward to more to come!

First Post - About Me - Why not start now, right?

First post. I figured I would start off this blog by posting some details about myself. As many of you may know, I have been working at VMware for 9 years as of the end of November 2014. It definitely has been a journey, to say the least.

I started my journey at VMware in 2005 (yes... it seems so weird to say that... crazy how time flies) after working for Microsoft for a couple of years. At the time, ESX (I think it was version 2.5... can anyone say MUI?) was not widely used or known in the industry. In fact, I started out working with VMware Workstation 4.x as well as GSX 3.x (good old GSX), back when the virtualization technology spectrum was much smaller.

Within about a year of starting at VMware, I had moved over to the enterprise side of the house, supporting ESX and VirtualCenter. Back in those days there were no specialties at all, merely fellow engineers who knew storage or were proficient at networking. Talk about being thrown into the deep end of the pool! I distinctly remember being blown away when I saw my first VMotion, so much so that I decided to focus more on the management side of technology than on the infrastructure side.

Eventually the team was broken up into specialties, and I ended up on the System Operations side of the house, supporting vCenter, ESXi, certificates, VMotion, DRS, and HA, to name just a few of the technologies.

This led me to a program that allowed me to feed customer data back to the development teams and greatly improve the areas where I saw issues. Although this was great, and I truly enjoy troubleshooting these things to this day, I was eventually offered my current position in the VMware Global Center of Excellence, and subsequently in Professional Services Engineering.

Although we are still under the same senior leadership, we have a completely different focus: developing designs, collecting and curating knowledge, and enablement for Professional Services here at VMware. In particular, I focus on core virtualization elements, including vSphere, Virtual SAN, and Health Check Services.

Though I have only been in this role for a year and a half, I have truly grown to appreciate the complexities of all the different products. From Cloud to Operations Management, the breadth of VMware’s offerings is crazy, and even now I sometimes feel lost in how big it all actually is.

In this blog I intend to discuss anything interesting that I come across. I hope you enjoy it.