Sunday, May 29, 2016

Serverless, NoOps, and Silver Bullets

In the aftermath of serverlessconf, Twitter was abuzz with the #serverless tag, and it didn't take long for the usual NoOps nonsense to follow (Charity Majors's aptly named "Serverlessness, NoOps and the Tooth Fairy" session notwithstanding).

When you look at operations as the traditional combination of all activities necessary for the delivery of a product or service to a customer, "serverless" addresses the provisioning of hardware, operating system and, to an extent, middleware.

Even when we ignore the reality that many of the services used in the enterprise still run on systems that are nowhere close to cloud-readiness and containerization, approaches like Docker will only take you so far.

Once you virtualize and containerize what does make sense, there are still going to be applications running on top of the whole stack. They will still need to be deployed, configured, and managed by dedicated operations teams. I wrote my expanded thoughts on the topic a couple of months ago.

One may argue that a well-written cloud-ready application should be able to take remedial action proactively, but those are certainly not the kind of applications showing up on conference stages. Switching from RESTful methods deployed on PaaS to event listeners in AWS Lambda will not make the resulting application self-healing.

Whereas I do appreciate the "cattle-not-pets" philosophy and the disposability in a 12-factor app, I have actually worked as a site reliability engineer for a couple of years, and we still needed to monitor and correct situations where we had head of cattle dying too frequently, which often caused SLA-busting disruptions to end users expecting five-nines reliability.

#NoTools, #NoMethod

Leaving the NoOps vs. DevOps bone aside, when I look at event-based programming models such as AWS Lambda and IBM OpenWhisk, and put them in contrast with software development cycles, I start to wonder whether development shops have fully understood the model's overall readiness beyond prototyping.

What is the reality of design, development tooling, unit-testing practices, verification cycles, deployment, troubleshooting, and operations? As an example, when I look at OpenWhisk, I see Node.js, Swift and ... wait for it... Docker. There is your server in serverless, unless you are keen on retooling your entire shop around one of those two programming languages.

At the peril of offering anecdotes in lieu of an actual study, some of the discussions on unit testing for event handlers can go from clunky to casually redirecting developers towards functional testing. And that should be the most basic material after debugging, which is also something conspicuously absent.

Progress is progress and the lack of a complete solution should never be a reason to shy away from innovation, but at the same time we have to be transparent about the challenges and benefits.

If the vision takes a sizable number of tinkerers building skunkworks on the new platforms, that is all good, but we have to realize there is also an equally sizable number of shops out there looking for the next silver bullet. These shops will be quick to blame their failures on the hype rather than on their own lack of understanding of the total cost of development and operations of a cloud-based offering.

Click-baiting of dead development methods is alive and well for a reason, until you realize the big development costs depend more on the Big and Complex stuff than on how much time developers spend tending to pet servers under their desks.

As the serverless drumbeat continues, it remains to be seen whether we will witness an accompanying wave of serious discipline prescribing the entire method before another one is put out as the next big thing.

The obvious next step would be codeless code, which is incidentally the name of one of my favorite blogs. It contains hundreds of impossibly well-written and well-thought-out pieces about software development, including this very appropriate cautionary tale on the perils of moving concerns up the stack without understanding how the lower layers work.


Friday, January 1, 2016

AspectJ for Java logging: Benchmarking aspect instrumentation - part 8

While preparing to introduce the Java logging Aspect covered in this series to a new project, I wanted to quantify the potential overhead in a formal way since my earlier exercises showed negligible impact but were not documented.

While searching for some sort of already existing benchmarking test, I came across this blog posting, which was not quite a formal benchmark, but had some well-written code that would fit my needs. A big thanks to the folks at Takipi for the effort in there.

The required modifications were simple, including the creation of a new test class matching the existing test hierarchy and then creation of a simple logging aspect to recreate the explicit logging call found in the other tests.

My merge request has not been accepted into the master branch as of this writing, but you can see the updated tree in my own branch of the fork at GitHub: https://github.com/nastacio/the-logging-olympics/tree/nastacio-aspectj.

The results showed no consistent impact, which is to be expected since the AspectJ compiler inserts pretty much the same bytecode you would get from typing the advice code yourself at each join point matched by the pointcut.

Saturday, December 5, 2015

Receding from the clouds, time to land


Starting on the first day of 2016, I will be back from the clouds and into the world of software development.

By then, I will have completed two years on the operations floor for the largest Cloud Foundry deployment(s) in the world, a detour from my original mission as Agile scrum master for the DevOps team.

Looking back...

During these two years I immersed myself in the daily operations for Bluemix, ranging from manning pager schedules (by the way, PagerDuty rocks), to jockeying critsits in war rooms, to becoming its operations engineering lead, tasked with the role of bridging the gap between the system operators and the teams writing the tools used by the operations team.

In that period I acquired a new-found respect for Bash scripting and its ability to shame most scripting languages on the operations floor (I am giving a pass to Python, which may not be as efficient in some cases, but can never be shamed).

It was also a great opportunity to get exposure to all the cool developments in the PaaS market, where IBM and Pivotal keep on partnering in the Cloud Foundry Foundation. I will sorely miss many people since I am going all the way back to development and somewhat far from some of the greatest cloud thinkers in the company. There is no avoiding the clouds, but it is one thing to take the occasional flight as a passenger and another thing entirely to be piloting the planes.

It was also my first opportunity to attempt using a MacPro as a regular workstation (that is what Pivotal recommends to people working with Cloud Foundry source code). Very sleek, awesome screen, if not for the wrist-hurting experience paired with the bizarre absence of essential keyboard keys, which made the usage of a bash shell a tolerable, rather than great, experience.

This environment taught me everything I wanted to know about operations, including the answers I didn't know I wanted to know. Whereas in development there are ebbs and flows that allow you to control your schedule even if you decide to work very hard, operations is a true crucible of skills, availability and stamina. It is also a vicious educator when you don't treat it with the respect it demands. There is no hiding, there is no winging it, every mistake is made in plain sight, recorded a dozen times over, charted, alerted, and felt everywhere by hundreds of thousands of users.

Leaving Bluemix was a very personal decision. As much as there have been challenging times (I particularly remember staying awake for 42 hours at one point, spanning nearly two consecutive nights), the defining moments happened away from those times, on the rare occasions when I had the opportunity to create one of our more complex engineering tools (my favorite was a Java RESTful webservice integration with the PagerDuty Webhook API) and remind myself of how much I enjoyed software development.

These moments were too rare and, coincidentally, two months after I made my decision, came the invitation to return to development, which I will cover in a moment, but not before I take yet another moment to thank the entire Bluemix team for the welcome, for the opportunity, for the trust, for the stories, and for the lessons learned.



...to look forward

Watson Health will be my new home, more specifically working in the Watson Genomic Analytics project.

In the first interview, it was clear this was somewhat more challenging than previous experiences. I am used to hearing about (and advocating for) use cases and stories and sprints, often telling people to focus on how the final solution can be used to make users' lives simpler or better.

Then I read announcements like this, which is a confirmation of earlier impressions on what the context of the work will be, on where the expectations will be drawn and what success will mean for everyone involved. I am still a geek at heart, but I feel like those 0's and 1's are about to take on a whole new meaning.

I am looking forward to the opportunity to meet my new colleagues, to work more closely with our research team, to be exposed to the various partnerships being established with clinics and hospitals, and to contribute whatever is possible to meet what many consider to be the ultimate use cases.

I will write more as I learn more, and there is a lot to learn.

Friday, December 19, 2014

Locating the organization, space, and identifier for an application inside Cloud Foundry

As a DevOps engineer for our Cloud Foundry deployment, I often need to translate application hostnames to the internal Cloud Foundry application identifier, or map the application name to its parent space and organization so that I can invoke commands like “cf events appname” or “cf app appname”.

Though we use the Admin UI heavily, it is often the case we also need the translation as part of other scripted actions, so I wanted to share these two common techniques:

From hostname to application name or application id

This entire sequence can be performed using the Cloud Foundry CLI and assumes you have enough privileges to access all organizations and spaces involved in the request. For instance, let's assume an application with this hostname: dnastacio2-java.mycfdomain.com.

Cloud Foundry designates the first part of the hostname to be the application route and the remainder of the hostname to be the application domain.
  1. Locate the domain associated with the hostname.

    cf curl /v2/domains?q="name:appdomain"

    e.g.

    cf curl /v2/domains?q="name:mycfdomain.com"

       ...
       "resources": [
          {
             "metadata": {
                "guid": "f4b90d7e-2cd3-4d30-b200-f28bbaf6be20",
                …
             },

  2. Locate the applications associated with the route. Notice that there can be more than one application sharing the same route:

    cf curl "/v2/routes?q=host:hostname&q=domain_guid:$domain_guid"

    e.g.

    cf curl "/v2/routes?q=host:dnastacio2-java&q=domain_guid:f4b90d7e-2cd3-4d30-b200-f28bbaf6be20"

       ...
       "resources": [
          {
             "metadata": {
                …
             },
             "entity": {
                "host": "dnastacio2-java",
                ...
                "apps_url": "/v2/routes/2b55b629-dd7c-4376-bf2e-831d9a2f03d2/apps" ...

  3. Now you have the URL for the Cloud Foundry applications associated with the hostname, so you can get the details for each application.

    If you only need the application identifier, it is the “guid” field next to the “url” field for each match.

    If you also need the organization and space for the application, hold on to the entire “url” field for each match and move to the next section.


    cf curl /v2/routes/2b55b629-dd7c-4376-bf2e-831d9a2f03d2/apps
    {
       ...
          {
             "metadata": {
                "guid": "ff95e504-a057-4861-8023-327cf83e8908",
                "url": "/v2/apps/ff95e504-a057-4861-8023-327cf83e8908",  ...
             },
             "entity": {
                "name": "dnastacio2-java",
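
The three steps above can be chained in a small script. This is a minimal sketch rather than a definitive implementation: the function names (route_host, route_domain, first_guid, apps_url_for_hostname) are made up for illustration, it assumes a logged-in cf CLI, and it assumes the pretty-printed output of "cf curl" keeps one field per line so that sed can pick values out. A proper JSON parser such as jq would be more robust where available.

```shell
# First label of the hostname, e.g. "dnastacio2-java".
route_host() {
  printf '%s\n' "${1%%.*}"
}

# Everything after the first dot, e.g. "mycfdomain.com".
route_domain() {
  printf '%s\n' "${1#*.}"
}

# First "guid" value found in pretty-printed cf curl output read from stdin.
first_guid() {
  sed -n 's/.*"guid": *"\([^"]*\)".*/\1/p' | head -n 1
}

# Print the apps_url of every route matching the hostname.
# Assumption: the domain is queryable by name, as in step 1.
apps_url_for_hostname() {
  local fqdn="$1" host domain domain_guid
  host="$(route_host "$fqdn")"
  domain="$(route_domain "$fqdn")"
  domain_guid="$(cf curl "/v2/domains?q=name:${domain}" | first_guid)"
  # The whole path is quoted so the shell does not treat "&" as a job separator.
  cf curl "/v2/routes?q=host:${host}&q=domain_guid:${domain_guid}" \
    | sed -n 's/.*"apps_url": *"\([^"]*\)".*/\1/p'
}
```

With those functions sourced, "cf curl "$(apps_url_for_hostname dnastacio2-java.mycfdomain.com)"" would list the applications bound to the route, assuming a single matching route.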

From application name or application id to organization and space

Once again, this entire sequence can be performed using the Cloud Foundry CLI and assumes you have enough privileges to access the organization and space:
  1. Identify the application id (if you don’t already have it) and the owning space.

    If you already have the application identifier, you can use this command:

    cf curl /v2/apps/applicationid

    If you only have the application name, you need an extra step, which can surface multiple applications:

    cf curl /v2/apps?q="name:appname"

    e.g.

    cf curl /v2/apps?q="name:dnastacio2"
    ...
       "resources": [
          {
             "metadata": {
                "guid": "f9e98c9a-2100-4c84-b0aa-f8e4988dad2b", 
    ...
                "space_url": "/v2/spaces/62b657af-df43-417e-a26e-5bdbbf3ba65a",

    If you received multiple hits when querying applications by name, you may need to repeat the steps below for each application in order to find the one you really want.

    If you know the application domain, you can use the domain_guid parameter as a filter as described in the previous section.

  2. Identify the space name and organization

    (find the space URL in the response, then...)

    cf curl /v2/spaces/spaceid

    e.g.

    cf curl /v2/spaces/62b657af-df43-417e-a26e-5bdbbf3ba65a
    {
       "metadata": {
    ...
      "entity": {
          "name": "dev",
    ...
         "organization_url": "/v2/organizations/63b4a5b9-d9c8-4c69-b4f2-e1a05decf914",

    (find the organization URL in the response, then...)


  3. Identify the organization name

    cf curl /v2/organizations/organizationid

    e.g.

    cf curl /v2/organizations/63b4a5b9-d9c8-4c69-b4f2-e1a05decf914
    {
    ...
      "entity": {
          "name": "dnastacio",
    ...

    (find the organization and space names in the two prior requests, then switch to that organization and space)

Now you are finally ready to switch to the organization and space for the application and use other CLI commands, such as “app”, “events” or “logs”

cf target -o dnastacio -s dev
cf app dnastacio2
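
The lookup from application id to organization and space can also be scripted. The sketch below is only an illustration under a few assumptions: json_field and org_and_space_for_app are hypothetical helper names, the cf CLI is already logged in with enough privileges, and "cf curl" prints one field per line so that the first "name" in each response is the entity name.

```shell
# Value of a JSON string field from pretty-printed cf curl output on stdin.
# Only the first match is returned.
json_field() {
  sed -n "s/.*\"$1\": *\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Print "<organization> <space>" for an application guid.
org_and_space_for_app() {
  local app_guid="$1" space_url space_json space_name org_url org_name
  # Step 1: the application entity points at its owning space.
  space_url="$(cf curl "/v2/apps/${app_guid}" | json_field space_url)"
  # Step 2: the space entity carries its name and the organization URL.
  space_json="$(cf curl "$space_url")"
  space_name="$(printf '%s\n' "$space_json" | json_field name)"
  org_url="$(printf '%s\n' "$space_json" | json_field organization_url)"
  # Step 3: the organization entity carries the organization name.
  org_name="$(cf curl "$org_url" | json_field name)"
  printf '%s %s\n' "$org_name" "$space_name"
}
```

One possible use is "set -- $(org_and_space_for_app ff95e504-a057-4861-8023-327cf83e8908); cf target -o "$1" -s "$2"", which only works when neither name contains spaces.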

Monday, July 28, 2014

Using iptables to route DNS searches away from Google DNS rate-limiting

I have a local Cloud Foundry deployment where the Google DNS server is placed near the top of the /etc/resolv.conf file for my Ubuntu environment. Long story short, the applications in there started to get a little too busy in their RESTful requests to external servers and ran afoul of the Google rate-limiting policy for DNS lookups (https://developers.google.com/speed/public-dns/docs/security#rate_limit).

We had a couple of local DNS servers at hand…blush…but I wanted to experiment with results before committing to a full reconfiguration of the environment to push out the new resolv.conf files to all DEAs, which would entail evacuating all applications from their wardens and waiting for Health Manager to restart the applications in other DEAs. I also had the option to launch a distributed script across all VMs to modify the resolv.conf file for the DEA and for all the wardens, but I thought it would be risky since some buildpacks could have DNS caches in place, which would be a major blind-spot in my testing.

Enter iptables, which was something I had somewhat successfully avoided for all these years other than the occasional OUTPUT rule. “Output” rules would have worked very well were it not for one caveat: my local DNS servers were on the private network for my VMs whereas the Google DNS lookup was obviously being routed through the public interface. The obvious fix was to replace the source address in the packets with the private IP address of the VM, which would move the DNS traffic to the internal interface. In itself this last step precludes a solution based on changes to resolv.conf files.

After some searching I came across this excellent tutorial* to help me morph the conceptual solution into the final iptables instructions: http://www.linuxhomenetworking.com/wiki/index.php/Quick_HOWTO_:_Ch14_:_Linux_Firewalls_Using_iptables#.U4kU3RC1dPF

src_dns_server=8.8.8.8
target_dns_server=x.x.x.x
dea_private_ip=y.y.y.y

# a. Replaces the destination of all warden DNS requests directed at the original DNS
# with the address for the target DNS.
iptables -t nat -A OUTPUT -p udp -j DNAT --dport 53 -d $src_dns_server --to-destination x.x.x.x


# b. In the post routing phase, with the DNS requests already pointing at the target DNS,
# replace the source of the packet with the private IP address of the VM, so that the
# request to x.x.x.x will go out through the private interface.
iptables -t nat -A POSTROUTING -p udp -j SNAT -d x.x.x.x --to-source $dea_private_ip

# c. Make the same change as #a for the wardens. I did not have an explicit warden-output chain,
# but the warden output chain was tied to warden-prerouting chain.
iptables -t nat -I warden-prerouting 1 -p udp -j DNAT --dport 53 -d $src_dns_server --to-destination x.x.x.x

# d. Make the same change as #b for the wardens
iptables -t nat -I warden-postrouting 1 -p udp -j SNAT -d x.x.x.x --to-source $dea_private_ip

As a concrete and slightly trite example, if an application requested a DNS lookup for “restful.service.com”, the IP addresses in the DNS lookup packet would be originally generated as

source=$dea_public_ip
target=8.8.8.8
target_port=53

rule #c will transform the target DNS address to

source=$dea_public_ip
target=x.x.x.x
target_port=53

and rule #d will transform the source request address to

source=$dea_private_ip
target=x.x.x.x
target_port=53
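
Since a wrong NAT rule can cut off name resolution for every application on the VM, it may help to print the rules for review before applying them. The following is a minimal dry-run sketch: print_dns_redirect_rules is a hypothetical helper, the argument order is my own choice, and the warden-prerouting and warden-postrouting chain names are taken from the rules above and may not exist in other environments.

```shell
# Print (without executing) the four NAT rules described above.
# Arguments: original DNS server, target DNS server, private IP of the VM.
print_dns_redirect_rules() {
  local src_dns="$1" target_dns="$2" private_ip="$3"

  # a. Redirect lookups aimed at the original DNS server to the target server.
  echo "iptables -t nat -A OUTPUT -p udp -d ${src_dns} --dport 53 -j DNAT --to-destination ${target_dns}"
  # b. Rewrite the source so the request leaves through the private interface.
  echo "iptables -t nat -A POSTROUTING -p udp -d ${target_dns} -j SNAT --to-source ${private_ip}"
  # c. Same as (a), for lookups coming from the warden containers.
  echo "iptables -t nat -I warden-prerouting 1 -p udp -d ${src_dns} --dport 53 -j DNAT --to-destination ${target_dns}"
  # d. Same as (b), for the warden containers.
  echo "iptables -t nat -I warden-postrouting 1 -p udp -d ${target_dns} -j SNAT --to-source ${private_ip}"
}
```

For example, "print_dns_redirect_rules 8.8.8.8 10.0.0.2 10.0.0.15" prints the four commands with made-up private addresses; once they look right, the output can be piped to a root shell.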

* I also bookmarked this practical cookbook for a different challenge on a different day: http://www.thegeekstuff.com/2011/06/iptables-rules-examples/

Monday, January 20, 2014

Heading for the clouds

Last week marked the 4th year of my tenure in Jazz for Service Management (it was known by different names at first), as the lead developer for Registry Services, a data reconciliation service for the multitude of IBM products developed in-house and acquired over the years.

Looking back…

For the first three years I did not write much about the Registry Services experience, other than offering general impressions on leadership and focus on quality in the context of the challenges at the time, both in "Guns, Police, Alcohol, and Leadership" and in “On Mountains, Beliefs and Leadership”. In 2013, I did write a more detailed technical account in the Service Management Connect blog.

Without getting into the gritty details of the pace-setting procedural excellence we achieved in these four years (modesty did improve a bit, but the progress we all made earned us a few seconds of self-congratulatory pats on the back :-), it is worth noting that over 4000 functional points are now validated in an automated manner in less than 12 hours across key platforms and 24 hours across the complete set of platforms, every single day of the week.

This marked evolution was only possible with disciplined execution within the Agile development method, a few steps at a time, guided by stakeholders one sprint demo at a time, honed by the team’s feedback one retrospective session at a time.

I think it will be hard to top the combination of motivation we had from the people executing the tasks with the vision and support from the management team, but four years was also enough time for several people on the team to grow to a level of leadership, skill, and confidence where the smartest thing to do was to make room for the next level of collective growth.

...to look forward

As I started to look for a position to make room for the overall transition, I was surprised and delighted to hear about the IBM Cloud Operating Environment team just starting to look for a Scrum Master and Dev/Ops lead for its core deployment (based on Pivotal’s Cloud Foundry). To distill the alphabet soup, this is essentially IBM's Platform as a Service (PaaS) offering, now officially known as Bluemix. Within two weeks both teams had lined up the whole transition and, starting this week, this is my new professional home.

Cloud and Dev/Ops is a space I have been meaning to enter for a long while. It is an opportunity to work with the most recent technologies in a very competitive space, against established giants, but backed by another giant flexing a multi-billion effort across all of its divisions.

I am looking forward to a brilliant 2014 in the company of the Bluemix team, learning how to bring the same level of excellence into a different development model, adapting to the new cultures in our geographically dispersed teams, closing the gaps between development and production deployments (a tooling geek’s paradise), and moving closer to our customers' experience, both in the way we support IBM offerings in Bluemix and in watching what development shops around the world will do with it.

Monday, October 28, 2013

Registry Services by the DASHboard light

Quite excited about the prospects of our most recent work on getting Registry Services data (IT resource management) into DASH, the dashboarding component of Jazz for Service Management.

I wrote an entry on the Jazz for Service Management blog:

https://www.ibm.com/developerworks/community/blogs/69ec672c-dd6b-443d-add8-bb9a9a490eba/entry/you_can_see_registry_by_the_dashboard_light?lang=en

The possibilities of integration with datasets from other systems management products offering OSLC interfaces make it really cool.

And there is even a video-recording.

Registry Services data on JazzSM DASH UI