Oso Mudslides and BigData

There was much ado in the news recently about Google’s bad bigdata flu forecasts. Google had tried to forecast flu rates in the US based on search data. That is a hard thing to forecast well, but doing better would have public benefits by giving public officials and others the information to take proactive action.

Let’s also think about other places where bigdata, in a non-corporate, non-figure-out-what-customers-will-buy-next way, could help.

Let’s think about Oso, Washington (see the Oso landslide area on Google Maps).

Given my background in geophysics (and a bit of geology), you can look at Oso, Washington and think: yeah, that was a candidate for a mudslide. Using Google Earth, it’s easy to look at the pictures and see the line in the forest where the earth has given way over the years. The geology of the area appears to be mostly sand, and it has been reported as glacier related. All of this makes sense.

We also know that homeowner’s insurance tries to estimate the risk of a policy before it’s issued, and it’s safe to assume that the policies either did not cover mudslides or catastrophes of this nature for exactly this reason.

All of this is good hindsight. How do we do better?

It’s pretty clear from the aerial photography that the land across the river was ripe for a slide. The thick sandy line, the sparse vegetation and other visual aspects visible in Google Earth/Maps show that detail. It’s a classic geological situation. I’ll also bet the lithology of the area is sand, a lot of sand, and more sand, possibly on top of hard rock at the base.

So let’s propose that bigdata should help give homeowners a risk assessment of their house, which they can monitor over time and use to evaluate the potential devastation that could come from a future house purchase. Insurance costs alone should not prevent homeowners from assessing their risks. Even “alerts” from local government officials sometimes fall on deaf ears.

Here’s the setup:

  • Use Google Earth/Maps imagery to interpret the areas along rivers, lakes and ocean fronts
  • Use geological studies. It’s little known that universities and the government have conducted extensive studies in most areas of the US; we could, in theory, make that information more accessible and usable
  • Use aerial photography analysis to evaluate vegetation density and surface features
  • Use land data to understand the terrain, e.g. gradients and funnels
  • Align the data with fault lines, historical analysis of events and other factors
  • Calculate risk scores for each home, or identify homes in an area of heightened risk

Do this, repeat monthly for every at-risk home in the US, and create a report for homeowners to read.
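The steps above can be sketched as a toy scoring pass. Everything here is an assumption for illustration: the feature names, the weights, and the 0-1 normalization are hypothetical, not a validated geohazard model.

```python
# Hypothetical sketch: combine per-home hazard features into one weighted
# risk score. Feature names and weights are illustrative only.

def slide_risk_score(features, weights=None):
    """Return a 0-100 risk score from normalized hazard features (each 0-1)."""
    if weights is None:
        weights = {
            "slope_gradient": 0.30,      # from terrain/elevation data
            "soil_instability": 0.25,    # from geological surveys (sand vs rock)
            "vegetation_loss": 0.20,     # from aerial-photo analysis
            "historical_slides": 0.15,   # from event records near the parcel
            "water_proximity": 0.10,     # distance to river/lake/ocean front
        }
    score = sum(weights[k] * features.get(k, 0.0) for k in weights)
    return round(100 * score, 1)

# A parcel across the river from a steep, sparsely vegetated sand bank:
oso_like = {"slope_gradient": 0.9, "soil_instability": 0.95,
            "vegetation_loss": 0.7, "historical_slides": 0.8,
            "water_proximity": 1.0}
print(slide_risk_score(oso_like))  # a high score flags the home for a report
```

Re-run monthly as the imagery and survey inputs refresh, and the score becomes something a homeowner can track over time.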

Now that would be bigdata in action!

This is a really hard problem to solve, but if the bigdata “industry” wants to prove that it’s good at data fusion on a really hard problem, one that mixes an extremely complex and large amount of disparate data and has public benefit, this would be it.

Yanukovych, Money Laundering and a Probe: The Rise of Network Analytics

I have been working in the Anti-Money Laundering (AML) space for a while. Compared to healthcare or the more general Customer Relationship Management (CRM) space, the AML and Bank Secrecy Act (BSA) space really covers the “shady” side of the customer, or at least it assumes that some customers are shady and tries to find them or prevent their actions. Some estimates suggest that BSA/AML (and fraud) regulations catch only 10-20% of the illicit dollar flow in the world, so while regulators and prosecutors do catch some of the bad guys, a lot of dollars remain on the table.

Take the recent case of Ukraine. It’s been reported that the Swiss are launching a money-laundering probe into ousted president Viktor Yanukovych and his son Oleksander. They think the money laundering could amount to tens of billions. All told, over 20 Ukrainians are listed as targets of the Swiss probe.

In BSA/AML terms, the Yanukovyches (father and son) are clearly Politically Exposed Persons (PEPs). And apparently the son had established a company that was doing quite well. That information usually leads to flags that raise the risk score of a customer at a bank. So an investigation and PEP indicators are all good things.

Officials estimate that $70 billion disappeared from the government almost overnight. Of course, Yanukovych WAS the president of Ukraine, and he was on the run up until last week. But an investigation into money laundering of tens of billions that suddenly just happened?

Recently, I attended an ACAMS event in NYC. Both Benjamin Lawsky (regulator side) and Preet Bharara (prosecution side) spoke. One of their comments was that to have a real impact on money laundering, you have to create disincentives so that people do not break the law in the future. You can sue companies and people and levy fines. These create disincentives, and disincentives are the only scalable way to reduce money laundering: stop it before it starts. The ACAMS event was US based, but the ideas are valid everywhere. The Swiss have always had issues with shielding bad people’s money, but they are doing better than before.

But the real issue is that the conduits, the pathways, were already set up to make this happen. And most likely, many dollars were siphoned off earlier, with the latest $70 billion being just the end of the train. So the focus needs to be on active monitoring of the conduits and pathways, with the BSA/AML components being one part of monitoring those paths. After all, the BSA/AML regulations motivate a relatively narrow view of the “network” within an organization’s boundaries.

If we really want to crack down on the large-scale movement of funds, it will not be enough to have the financial institutions (which have limited views into corporations) use traditional BSA/AML and fraud techniques. A layer of network analysis is needed at the cross-bank level that goes beyond filing a suspicious activity report (SAR) or a currency transaction report (CTR). And this network analytical layer needs to be intensely and actively monitored at all times, not just during periods of prosecution. While the Fed uses the data sent back in a company’s SARs and CTRs (and other reports) and in theory acts at the larger network level, it is not clear that such limited sampling can produce a cohesive view. Today, social media companies (like Facebook) and shopping sites (like Amazon) collect an amazing amount of information at a detailed level; the NSA tried to collect just phone metadata and was pounced on. The information available in the commercial world is vast, while that which the government receives is tiny.
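To make the cross-bank point concrete, here is a minimal sketch (with made-up account names) of why pooling matters: each bank’s own transfer list reveals nothing, but the pooled graph exposes the full layering path.

```python
# Illustrative sketch: each bank sees only its own transfers, but pooling
# edges into one graph exposes a layering chain no single bank can see.
from collections import defaultdict

def build_graph(transfers):
    """transfers: list of (src_account, dst_account) edges."""
    g = defaultdict(set)
    for src, dst in transfers:
        g[src].add(dst)
    return g

def find_path(g, start, end, seen=None):
    """Depth-first search for a flow path from start to end."""
    seen = seen or {start}
    if start == end:
        return [start]
    for nxt in g.get(start, ()):
        if nxt not in seen:
            rest = find_path(g, nxt, end, seen | {nxt})
            if rest:
                return [start] + rest
    return None

bank_a = [("shell_co", "intermediary_1")]          # all Bank A sees
bank_b = [("intermediary_1", "intermediary_2")]    # all Bank B sees
bank_c = [("intermediary_2", "offshore_acct")]     # all Bank C sees

pooled = build_graph(bank_a + bank_b + bank_c)
print(find_path(pooled, "shell_co", "offshore_acct"))
# the full chain appears only in the pooled, cross-bank view
```

No single bank in this toy example has a reportable pattern; only the cross-bank layer does, which is the argument for monitoring at that level.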

In other words, the beginnings of an analytical network are clearly present in the current regulations, but the intensity and breadth of the activity need to match the scale of the problem so that the disincentives dramatically increase. And while it is very difficult to make this happen across borders, or even politically within the US, it’s pretty clear that until the “network analysis” either increases its “resolution” or another solution is found, large-scale money laundering will continue to thrive and most enforcement efforts will continually lag.

It’s a balancing act. Too much ongoing monitoring is political anathema to some in the US and can be very costly. Too little, and the disincentives may not deter future crimes.

Pop!…another $70 billion just disappeared.

Should companies organize themselves like consultancies? If they do, they need to hire like them as well.

A recent HBR article (October 2013) mentioned that P&G and other companies are rethinking how they organize themselves. The basic idea is that instead of having fixed organizations, companies should organize themselves like consultancies: everything is a project, and you assemble/disassemble teams as needed to solve problems. There will still be some ongoing operations that require “flat” jobs, jobs that are more repetitive but still require knowledge workers.

The article raises the question of whether organizing into projects and flexible staff (like consultancies) is a good thing for companies that are heavily knowledge-worker based. Part of the evidence that knowledge work is becoming more dominant is the decreasing COGS and increasing SG&A lines on financial statements. Decreasing COGS indicates decreasing amounts of “blue-collar” work over time, while SG&A is a good proxy for white-collar, knowledge-worker jobs.

So is it?

My view is that it is not so cut and dried. Consultancies create large labor pools at the practice-area level, generally with a specific industry expertise. Generally there are also horizontal practices for people who specialize in skills that cut across industries. Typically these practice areas are large enough that the random noise (!) of projects starting and stopping creates a consistent utilization curve over time. And a management structure, for performing reviews and connecting with people, is still needed to ensure consultants feel like they have a home.

Another important aspect quoted in the article is the creation of repeatable methodologies that consultants are trained on, so that knowledge can be codified instead of hoarded.

Consultancies are good, but not super great, at knowledge management and sharing deliverables so that practices that have proven themselves to work can be re-used in other projects or contexts.

Let’s look at companies:

  • Companies have people, often in fairly substantial groups, focused on a horizontal area, e.g. finance, marketing, IT, customer service. Companies are often organized by product, which also forces them to be organized by industry, but there are many variations on this model.
  • Companies try to organize activities into projects. Not everything can be a project, e.g. ongoing operational support of various kinds. But companies do try to kick off efforts, set deadlines, integrate teams from different groups, etc.
  • Companies share deliverables from one project to another. Unlike at consultancies, the pool of deliverables is often narrower because of corporate boundaries, and sharing within an industry is often not as robust as at a consultancy. Companies that hire talent from the outside can bring these elements in, however.
  • Groups share resources across projects and groups, although not as robustly as consultancies. Companies are less robust at true sharing because inside companies, headcount is often a measure of power. At consultancies, revenue and margin are usually the primary metrics, though of course these are only achieved through resources.

Companies today already employ many elements of this model. Most companies are not as robust as consultancies in some aspects. But are these differences the primary reason consultancies have shown such resilience in execution across different circumstances?

There is probably another aspect. Consultancies typically seek out and retain a large amount of quality talent. Companies, to varying degrees, do not always hire highly talented individuals. Their pay, performance-management approach and culture do not attract the best talent in the marketplace.

While companies could improve certain areas of their capabilities, an entire part of the story was missing from the HBR article: a focus on top talent across the entire company, not just for a few key roles.

ACO and HMO: HMO redo or something new?

I have covered this topic before, but I came across an article that stimulated my thinking again.

It has been said that ACOs are really the new HMOs. HMOs in the 1990s were an experiment to put “risk and reward in the same bucket.” Much like integrated capitation, the idea is to let those who save money, while still delivering great quality care, benefit from their innovations.

This was the thinking behind the Affordable Care Act, which seeks to re-align risk and reward. It also, perhaps unfortunately, makes Provider power even more concentrated. Maybe that’s good, maybe that’s bad.

A recent analysis of healthcare costs as a percentage of GDP came out in the New England Journal of Medicine. One question we want to answer is where healthcare costs will be a decade from now based on changes today. Typical projections run that in a decade or so, 20% of US GDP will be spent on healthcare (all private and public expenditures). This is based on projections from the last two years of data, which have shown lower healthcare growth rates than the past twenty. Those two years of growth rates have been thoughtfully reviewed, and the conclusion is that they are not representative of the growth rates we are likely to see over the next decade or two.

This NEJM article, published May 22, 2013, “The Gross Domestic Product and Health Care Spending” (Victor Fuchs, PhD), suggests that the growth rate that should be used is probably the long-term growth rate, that recent changes in the growth rate are one-time events, and that using two-year growth rates is typically a bad idea anyway. The article also describes how growth rates were cut roughly in half when HMOs came out; HMOs rationed care. It is generally thought that most people want “all you can drink” healthcare at “buffet” prices, and this is why HMOs were given the boot by consumers. Fuchs thinks that if you use historical growth rates, the share of GDP for healthcare grows to 30%. That’s huge.
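The arithmetic behind a figure like that 30% is just compound growth of healthcare’s GDP share. A back-of-envelope sketch, with illustrative numbers (the starting share and excess growth rate here are assumptions, not Fuchs’s exact inputs):

```python
# Back-of-envelope: if healthcare spending grows faster than GDP by some
# excess rate, its share of GDP compounds. Numbers are illustrative only.

def future_share(current_share, excess_growth, years):
    """Healthcare share of GDP after `years` of compounding excess growth."""
    return current_share * (1 + excess_growth) ** years

# ~18% share today, historical excess growth ~2.5%/yr, over 20 years:
print(round(future_share(0.18, 0.025, 20), 3))  # approaches ~30% of GDP
```

The point of the exercise: even a small persistent gap between healthcare growth and GDP growth compounds into a very large share shift over a couple of decades.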

So if ACOs are really HMOs reborn, wouldn’t that be a good thing? It’s probably worth thinking it through a bit to see if such a top-level thought holds water. First, we’ll recognize that ACOs and the concentration of power into Providers (the Act places enormous emphasis on hospitals), which possibly leads to verticalization, are not necessarily bad, at least in business circles. And combining risk and reward to get incentives right is also probably not a bad thing.

But there are other factors. We will also assume that Americans will not engage in healthier lifestyles, since changing people’s behaviors toward health has not really worked, nor probably ever will without significant economic rewards. And we will assume that Americans want choice and do not like being told where to get their healthcare services (especially since care is so uneven).

If normal competition were at work, then we would expect verticalization and centralizing risk and reward to be good things. We should expect to see declining prices and improving outcomes.

But when we look across the healthcare spectrum, we see some of the largest improvements in outcomes coming from drugs and medical devices. We do not see large improvements in care based on “processes” inside hospitals. While some hospitals do work on Six Sigma-style continuous process improvement and show great results, these efforts are inconsistently applied and are not a source of large-scale productivity increases.

So we have not seen that hospitals are capable of being managed to reduce their costs and improve outcomes to any significant degree. In fact, most innovation in the hospital community is around centralization, getting big to have scale. But we need to ask whether hospitals becoming large lowers cost and improves outcomes, or whether it just allows more fixed cost (beds) to be spread over a larger surface area, thereby reducing per-unit costs but not total costs. The ACO model’s minimum of a few thousand patients to be efficient is probably way off; some estimates suggest you need a million patients. Hospital systems are responding to this scale requirement and scaling up. As we have seen, though, a larger company is not the most efficient at innovating or lowering cost without significant competition.

And that’s where the ACO model is probably not like the HMO model. The ACO model encourages super-regional entities to be formed that will reduce competition in a given service area rather than increase it. Unlike national Payers, which look a bit more like Walmart, super-regional ACOs will be large but not super large. They will not have competition in their area, and improving their productivity is a bit suspect (I hope they do improve it, by the way). And hospital systems are fighting changes that would allow clinics to become more prevalent, as well as allowing non-physicians to write prescriptions, because that draws power away from them.

It has been widely studied and reported that HMOs reduced choice as a trade-off for reduced costs. This reduced choice is not directly present in the ACO model, although both Payers and Providers want patients to stay in network, of course, and have been forming “lean” or “focused” networks for just this reason. So there are no large forces in the Act to strongly ensure that ACOs will help manage and control consumer choice.

So on the surface, ACOs look like a good model, but they become questionable fairly quickly. You can place your bets on what will happen or wait it out. It is clear that it will take years to demonstrate whether ACOs are working, just as it did for HMOs, well after they were killed off.

There are actually ways to fix many of these issues by addressing the underlying problems directly. For example, creating more uniform outcomes by standardizing processes and the quality of practicing physicians may reduce the need for the ultimate “go anywhere” flexibility driven by a patient’s need to find quality care. We need to promote competition on a very broad scale across multiple geographies by changing laws. Reduce Provider power around Rx writing and get people into clinics and alternative care delivery centers. We can also modify reimbursement policies and centralize risk and reward so that investors (a term I use broadly here) receive a reward for taking risk and succeeding, unlike today, where they are penalized with lower payments, essentially creating a disincentive to invest.

All of these ideas would create a dramatic change in the cost curve over time without fundamentally altering the landscape. It would be a good start.

Ranking information, “winner take all” and the Heisenberg Uncertainty Principle

Does ranking information (who likes what, top-10 rankings) produce “winner take all” situations?

There is an old rule in strategy: there are no rules. While it is always nice to try to create underlying theories or rules of how the world works, science and mathematics are still fairly young when it comes to describing something this complex. Trying to apply rules like the one in the title is probably some form of confirmation bias.

Having said that, there is evidence that this effect can happen, not as a rule to be followed, but as something that does occur. How could this happen?

Ranking information does allow us, as people, to see what other people are doing. That’s always interesting: seeing what others are doing, looking at or thinking about. And by looking at what other people are looking at, there is a natural increase in “viewership” of that item. So the top-10 ranking, always entertaining of course, does create healthy follow-on “views.”

But “views” do not mean involvement or agreement. In other words, while ranking information and today’s internet make it easy to see what others are seeing, our act of observation actually contributes to the appearance of popularity. That popularity appears to drive things toward “winner take all.”

“Winner take all” can take many forms. It can mean that once the pile-on starts, a web site becomes very popular; this is often confused with the network effect. It can also mean that a song becomes popular because it’s played a lot, so more people like it, so it’s played even more, and so on. Of course, this does not describe how the song became popular to begin with. Perhaps people actually liked the song and it had favorable corporate support; there is nothing wrong with that.

And this leads us to the uncertainty principle: the act of observation disturbs the thing we are trying to measure. The more scientific formulation has to do with the limits of simultaneously observing position and momentum at the atomic level, but we’ll gloss over that more formal definition.

The act of observing a top-10 list on the internet causes the list to become more popular. The act of listening to a song, amplified through internet communication channels, changes the popularity of the song. So it’s clear that, given internet technology, there is a potential feedback loop that resembles the uncertainty principle.
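This feedback loop is easy to simulate. A toy model, with arbitrary parameters: start with ten identical items and have each new viewer pick an item with probability proportional to its current view count, so the observation itself boosts the ranking.

```python
# Toy simulation of the observation feedback loop: each new viewer picks an
# item with probability proportional to its current view count, so small
# early leads compound into "winner take all." Parameters are arbitrary.
import random

def simulate(n_items=10, n_viewers=10_000, seed=42):
    random.seed(seed)
    views = [1] * n_items          # every item starts equally popular
    for _ in range(n_viewers):
        pick = random.choices(range(n_items), weights=views)[0]
        views[pick] += 1           # the act of viewing raises the ranking
    return sorted(views, reverse=True)

print(simulate())  # a few items capture most views despite identical quality
```

The items are identical by construction; only the early random draws and the feedback differ, which is the whole point.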

Alright, that makes sense. But the world is probably a little more complex than this simple thought.

While the act of observing could make something more popular, that does not mean it can turn something unpopular into something popular. In other words, people are not fools. They like what they like. If something is on a top-10 list or comes from a band with good corporate airtime support, that does not mean it is a bad song or a bad list. And it does not mean people would not like it if it did not play in that venue.

The internet is a powerful tool to help people quickly find what they want. The cost of finding a new web site or a new source of top-10 lists (or whatever) is fairly low, so there is no real inherent lock-in. Given the internet’s reach, the ability to rapidly escalate from “winner take all” to “has been” and back is fairly robust. It’s quite possible, in the spirit of making business rules for fun, that the internet produces a steady stream of “winner take all” events, and if there is a steady stream of them, then they are really just average events after all (regression to the mean). So with my fancy new rule, there are no “winner take all” events any more, just a large number of rapidly escalating and de-escalating average events; the frequency has just been bumped up.

That’s okay as well.

Yeah! Big data to the rescue…wait a second…

I use Google Alerts to give me a sense of the flow of articles. One of my filters is bigdata and healthcare. The flow around this topic has increased. I see many stories: a $100M investment here, a $20M investment there, and so on. That’s a lot of dollars! A lot of it is government backed, of course, and put on hyperdrive by a recent McKinsey report on bigdata and potential healthcare savings.

Breathlessly, drawing on my own bigdata and scientific background, I then ask, “well, when should we see the effects of these bigdata exercises?” The answer is not so clear. As I explained in my last post, there has not been a dramatic lack of analytics applied to healthcare. However, I do think we should do more analytics, because clearly there are many ideas and insights to draw on.

But it will still take a few years before insight turns into action. And it will take longer for any significant savings and benefits to be realized. Why?

Unless we solve the payment and incentives issues (also blogged about before), what’s the incentive for large-scale change?

Bryan Lawrence, in today’s Washington Post opinion article “The fine-print warning on Medicare,” put it well:

  • No real productivity growth in many areas of healthcare
  • Benefits of reimbursement changes will never be given up to the government (every year we see this with Medicare payments; every year Congress avoids implementing payment changes, literally). This is the age-old investment question: “who gets to cash the check?”
  • Lower prices usually lead to providers increasing volumes
  • Congress will keep changing the laws anyway; you’ll never catch up

You could peg this as cynical thinking, but it’s this type of back-of-envelope thinking that calls out issues that are hard to ignore and overcome quickly.

So while bigdata could give us great insights, implementing and acting on those insights (the change-management problem) is larger and more complex. It’s like the Artificial Intelligence (AI) claims back in the ’80s and ’90s that said we could really use analytics to get better answers. After two decades, the crickets were chirping. The value realization is better now, and it is clear, at least to me, that more people performing analytics can make a difference. But the real policy world is much more difficult to navigate.

There are no good answers for accelerating value capture from bigdata, although everything from nationalization to forced integrated capitated models (like Kaiser) may be the only way to structurally change the system. It’s clear that competitive forces have been inhibited in the US and that there are externalities driving the dynamics of the healthcare system.

And the real change that’s needed is in you and me. We need to get healthier. And for that, we need to increase our personal commitment to better health as well as instill changes in our government policies to fix arcane food and commerce laws that lead to bad diets.

Tempering our expectations for bigdata in healthcare

Expectations around bigdata’s impact on healthcare are leaping ahead of reality, even as some good thoughts are being expressed. However, healthcare has already had significant amounts of analytics applied to it. The issue is not that larger sets of data are critical, but that the sharing and integration of data are the critical parts of better analysis. Bigdata does not necessarily solve these problems, although bigdata fever may help smash through these barriers. Over 15 Blues and most of the major nationals have already purchased data warehouse appliances and advanced systems to speed up analysis, so it’s not necessarily performance or scalability that is constraining data-driven advances. And just using unstructured text in analytics will not create a leapfrog in outcomes.

We really need to think about integration and access. More people performing analysis in clever ways will make a difference. And this means more people than just the few who can access detailed healthcare data, most of which is proprietary and will stay proprietary to the companies that collect it. Privacy and other issues prevent widespread sharing of the granular data needed to truly perform analysis and get great results. It’s a journey.

This makes the PCORI announcements about yet another national data infrastructure (based on a distributed data model concept), and Obama’s directive to get more Medicare data into the world for innovation (see the 2013 Health Datapalooza that just concluded in Washington, DC), that much more interesting. PCORI is really building a closed network of detailed data using a common data model and distributed analysis, while CMS is being pushed to make datasets more available to entrepreneurs and innovators: a bit of the opposite in terms of “access.”

There are innovative ideas out there; in fact, there is no end to them. Bigdata is actually a set of fairly old ideas that are suddenly becoming economical to implement. And there is a serious lack of useful, widely available datasets. The CMS datasets are often heavily massaged prior to release in order to conform to HIPAA rules; essentially, you cannot provide detailed data at an individual level despite what you think you are getting, since just stripping a name and address off a claim form is not sufficient to satisfy HIPAA rules.

So it’s clear that to get great results, you probably have to follow the PCORI model, but then analysis is really restricted to the few people who can access those datasets.

That’s not to say that bigdata does not have a lot to offer if patients are willing to opt in to programs that get their healthcare data out there. Companies using bigdata technology on their proprietary datasets can make a difference, and there are many useful ideas to economically go after using bigdata, many of which are fairly obvious and easy to prioritize. But there is not suddenly going to be a large community of people with new access to the granular data that could be, and often is, the source of innovation. Let’s face it: many healthcare companies have had advanced analytics and effectively no real budget constraints for many years, and will continue to. So the reason analytics have not been created and deployed more than today is unrelated to technology.

If bigdata hype can help executives get moving and actually innovate (it’s difficult for executives in healthcare to innovate rather than just react), then that’s a good thing, and building momentum will most likely be the largest stimulus to innovation overall. That’s why change management is key when using analytics in healthcare.

Anti-Money Laundering (AML) and Combating Terrorist Funding (CTF) analytics review

In my last blog I reviewed some recent patents in the AML/CTF space. They describe what I consider very rudimentary analytics workflows: fairly simple scoring and weighting using various a-priori measures. Why are such simple approaches patentable? To give you a sense of why I ask, there was great trumpeting of news around the closing of a $6B money-laundering operation at Liberty Reserve. But money laundering (including terrorism funding) is estimated at $500 billion to $1 trillion per year. That’s a lot of badness that needs to be stopped. Hopefully smarter is better.

There are predictive analytical solutions to various parts of the AML problem, and there is a movement away from rules-only systems (rules are here to stay, however, since policies must still be applied to predictive results). However, the use of predictive analytics is slowed because AML analytics largely boils down to an unsupervised learning problem. Real-world test cases are hard to find (or create!), and the data is exceptionally noisy and incomplete. The short message is that it’s a really hard problem to solve, and sometimes simpler approaches just work more easily than others. In this note, I’ll describe the issues a bit more and talk about where more advanced analytics have come into play. Oh, and do not forget: on the other side of the law, criminals are actively and cleverly trying to hide their activity, and they know how banks operate.

The use of algorithms for AML analytics is advancing. Since AML analytics can occur at two different levels, the network and the individual, it’s pretty clear that graph theory and other techniques that operate on the data in various ways are applicable. AML analytics is not simply about predicting that a particular transaction, legal entity (LE) or group of LEs is conducting money-laundering operations. It’s best to view AML analytics as a collection of techniques, from probabilistic matching to graph theory to predictive analytics, combined to identify suspicious transactions or LEs.

If the state of AML analytics is only now maturing, what is the current state? Rather simple, actually. Previous systems, including home-grown systems, focused on case management and reporting (that’s reporting as in reporting on the data to help an analyst analyze flows, as well as regulatory reporting). AML analytics was also typically based on sampling!

Today, bigdata can help avoid sampling issues. But current investments are focused on data management, because poor data management capabilities have greatly exacerbated the cost of implementing AML solutions. FS institutions desperately need to reduce these costs and comply with what will be an ever-changing area of regulation. “First things first” seems to be the general thrust of AML investments.

Since AML analysis is based on Legal Entities (people and companies) as well as products, it’s pretty clear that unique identification of LEs and the hierarchies/taxonomies/classifications of financial instruments are important data management capabilities. Results from AML analytics can be greatly degraded if the core data is noisy. When you combine the noisy-data problem with today’s reality of highly siloed data systems inside banks and FS institutions, the scope of implementing AML analytics is quite daunting. Of course: start simple and grow it.

I mentioned above that there are not a lot of identifiable cases for training algorithms. While it is possible to flag some transactions and confirm them, companies must file Suspicious Activity Reports (SARs) with the government. Unfortunately, the government does not provide a list of “identified” data back. So it is difficult to formulate a solution using supervised learning approaches. That’s why it is also important to attack the problem from multiple analytical approaches: no one method dominates, and you need multiple angles of attack to help tune your false positive rates and manage your workload.

When we look at the underlying data, it’s important to look at not only the data but also the business rules currently (or proposed to be) in use. The business rules will help identify how the data is to be used per the policies set by the Compliance Officer. The rules also help orient you on the objectives of the AML program at a specific institution. Since not all institutions transact all types of financial products, the “objectives” of an AML system can be very different. Since the objectives are different, the set of analytics used is also different. For example, smaller companies may wish to use highly iterative what-if scenario analysis to refine the policies/false positive rates by adjusting parameters and thresholds (which feels very univariate). Larger banks need more sophisticated analysis based on more advanced techniques (very multi-variate).
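The univariate what-if loop can be sketched in a few lines: sweep a single alert threshold over a set of risk scores and watch the analyst workload change. The scores and thresholds below are made up; the point is only how a Compliance Officer might trade alert volume against coverage one parameter at a time.

```python
# Synthetic alert risk scores for a batch of transactions (0 = benign, 1 = worst).
scores = [0.2, 0.4, 0.55, 0.6, 0.7, 0.8, 0.92, 0.95]

def alert_count(threshold):
    """How many transactions would be escalated at this threshold?"""
    return sum(s >= threshold for s in scores)

# Sweep the threshold and observe the resulting analyst workload.
workload = {t: alert_count(t) for t in (0.5, 0.7, 0.9)}
```

Raising the threshold from 0.5 to 0.9 cuts the queue from 6 alerts to 2, which is exactly the false-positive/workload trade-off described above, just in its simplest univariate form.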

We’ve mentioned rules (a priori knowledge, etc.), predictive/data mining models (of all kinds, since you can test deviations from peer groups using data mining methods, predicted versus actual patterns, etc.) and graph theory (link analysis). We’ve also mentioned master data management for LEs (don’t forget identity theft) and products, as well as taxonomies, classifications and ontologies. But we also cannot forget time series analysis for analyzing sequential events. That’s a good bag of data mining tricks to draw from, and the list is much longer. I am often reminded of a really great statistics paper called Bump Hunting in High Dimensional Data by Jerome Friedman and Nick Fisher, because that’s conceptually what we are really doing. Naturally, criminals wish to hide their bumps and make their transactions look like normal data.

On the data side, we have mentioned a variety of data types. The list below is a good first cut, but you also need to recognize that synthesized data, such as aggregations (both time-based aggregations and LE-based aggregations such as transaction->account->person LE->group LE), is also important for the types of analytics mentioned above:

  • LE data (Know Your Customer – KYC)
  • General Ledger
  • Detailed Transaction data
  • Product Data
  • External sources: watch lists, passport lists, identity lists
  • Supplemental: Reference data, classifications, hierarchies, etc.
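The transaction->account->LE rollup mentioned above can be sketched directly. The transactions below are synthetic, and the just-under-reporting-threshold amounts are a deliberately simple illustration of why aggregation matters: no single transaction stands out, but the LE-level total does.

```python
from collections import defaultdict

# Hypothetical transactions: (legal_entity, account, amount).
transactions = [
    ("LE1", "acct-1", 9500),
    ("LE1", "acct-1", 9400),   # two just-under-10k deposits
    ("LE1", "acct-2", 9900),
    ("LE2", "acct-3", 120),
]

# Roll up transaction -> account.
by_account = defaultdict(float)
for le, acct, amt in transactions:
    by_account[(le, acct)] += amt

# Then account -> legal entity.
by_le = defaultdict(float)
for (le, _), total in by_account.items():
    by_le[le] += total
```

At the account level LE1 looks unremarkable; at the LE level its total of 28,800 across accounts is the kind of synthesized feature the analytics above would consume.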

Clearly, since there are regulatory requirements around SARs (suspicious activity), CTRs (currency transactions) and KYC, it is important that the data quality enhancements first focus on those areas.

Anti-Money Laundering patent review

I was recently reviewing some anti-money laundering (AML) patents to see if any had been published recently (published does not mean granted).

Here are a few links to some patents, some granted, some applied for:

All of the patents describe a general purpose system of calculating a risk score. The risk score is based on several factors.

In AML, the key data include:

  • A legal entity (name, location, type)
  • A “location” (typically country) that determines the set of rules and “data lists” to be applied. This could be the LE’s country or it could be the financial instrument’s country, but generally this embodies a jurisdiction area that applies to the AML effort. A “data list” from a country or location is the list of legal entities that are being watched or have been determined to engage in money laundering operations. So we have a mix of suspected and validated data.
  • A financial instrument / product and its set of attributes such as transactions, amounts, etc.
  • A jurisdiction: the risk assessor’s set of rules. Typically these are rules created by a company or a line of business. These rules help identify an event and should be relatively consistent across an entire enterprise but also vary based on the set of locations where a company may operate. A bank’s Compliance Officer is especially concerned about this area as it also contains policies. The policies represent who needs to do what in which situation.

I have not tried to capture the nature of time in the above list, since all of these components can change over time. Likewise, I did not try to capture all of the functions an AML system must perform, such as regulatory reporting. We have also ignored whether all of these components are used in batch or real-time to perform a function, or whether rules engines and workflow are powering some incredibly wonderful AML “cockpit” for an AML analyst at a company.

We assume that the ultimate goal of an AML system is to identify LEs potentially engaging in money laundering activities. I write “potentially” because you need to report “suspicious” activities to the Financial Crimes Enforcement Network (FinCEN). We can never know for certain whether all of the data is accurate or that an individual transaction is actually fraudulent. We can, however, use rules, either a priori or predictive, to identify potential AML events.

The patents describe a method of combining information, using a “computer system,” to calculate an AML risk score. The higher the score, the more probable that an LE-FinancialProduct is being used for money laundering. Inherently, this is probabilistic. It’s also no different than any other risk scoring system. You have a bunch of inputs, there is a formula or a predictive model, and there is an output score. If something scores above a threshold, you take action, such as reporting it to the government. Just as a note, there are also strict guidelines about what needs to be reported to the government, as well as areas where there is latitude.
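The inputs-formula-score-threshold pattern is simple enough to sketch. The factors, weights and reporting threshold below are purely hypothetical (not taken from any patent); they just show the skeleton that the patents wrap in a “computer system.”

```python
# Illustrative risk factors and weights -- hypothetical, not from any patent.
WEIGHTS = {"on_watch_list": 0.5, "high_risk_country": 0.3, "cash_intensive": 0.2}

def risk_score(factors):
    """Combine binary risk factors into a score between 0 and 1."""
    return sum(WEIGHTS[f] for f, present in factors.items() if present)

REPORT_THRESHOLD = 0.6  # above this, escalate for review/reporting

case = {"on_watch_list": True, "high_risk_country": True, "cash_intensive": False}
score = risk_score(case)
should_report = score >= REPORT_THRESHOLD
```

Swap the weighted sum for a logistic regression or any other predictive model and the workflow is unchanged, which is precisely why the scoring pattern itself feels so generic.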

The trick in such a system is to minimize false positives: LE-FinancialProduct combinations identified as money laundering that in reality are not. False positives waste time, so the system tries to create the best possible discrimination.

So now look at the patents using the background I just laid out. They are fairly broad; they describe this basic analysis workflow. It’s the same workflow, using the same concepts, as credit scoring for FICO scores, or credit scoring for many types of loans, or marketing scoring for lifetime value or next-logical-product purchasing. In other words, the approach is the same. Okay, these are like many existing patents out there. My reaction is the same: I am incredulous that general patents are issued like they are.

If you look past whether patents are being granted for general concepts, I think it is useful to note that many of these came out around 2005-2006, a few years after many regulations changed with the Patriot Act and other changes in financial regulation.

So the key thought is: yes, patents are being submitted in this area, but I think the relatively low number of patent applications reflects that the general workflow is, well, pretty general. Alright, the 2011 patent has some cool “graph/link analysis,” but that type of analysis is also a bit 1980s.

Note: I selected a few data concepts from the real-time AML risk scoring patent to give you a feel for the type of data used in AML around the transaction:

  • transaction amount,
  • source of funds such as bank or credit cards,
  • channel used for loading funds such as POS or ATM,
  • velocity such as count and amount sent in the past x days,
  • location information such as number of pre-paid cards purchased from the same zip code, same country, same IP address within x hours,
  • external data sources (e.g., Interpol list) or internal data sources
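The velocity item in the list above is just a trailing-window aggregate. Here is a minimal sketch with made-up card-load events, computing the count and amount sent in the past x days as of a given date.

```python
from datetime import date, timedelta

# Hypothetical pre-paid card loads: (date, amount).
loads = [
    (date(2014, 4, 1), 500),
    (date(2014, 4, 2), 700),
    (date(2014, 4, 9), 300),
]

def velocity(events, as_of, days):
    """Count and sum the events falling inside the trailing window."""
    cutoff = as_of - timedelta(days=days)
    window = [amt for d, amt in events if cutoff <= d <= as_of]
    return len(window), sum(window)

count, total = velocity(loads, date(2014, 4, 9), 7)
```

In a real-time scoring system this kind of feature would be maintained incrementally rather than recomputed per transaction, but the definition is the same.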

Do customers want Social Customer Service? Yes, and they want more…

I receive Google Alerts on BigData in Healthcare as well as social customer service (these are aligned with my professional activities).

I received two alerts recently:

There are many more like this, although I admit the number of alerts I receive touting social customer service is much larger.

Here are my thoughts. Customers want to get a job done. They often do not care how it gets done, especially if it’s a negative or operational issue, for example, an item they purchased is not working or they need a fix for a bad banking transaction quickly.

The customer’s job is: “fix or solve the issue.” This is why FTR (First Time Resolution) is the top customer service metric that customers want to experience. So the question for each customer service episode, again excluding “rage” issues, is “how do I get this done as fast and efficiently as possible?”

I am not sure that customers care whether it’s tweets, facebook pages or the call center.

But today, social media channels, with their expectations of a fast response, often have a faster SLA for responding. Social channels cannot easily improve FTR itself, but they can get the customer engaged faster than, say, a phone channel with an IVR navigation time of at least 5 minutes.

So customers gravitate to the channels that get them going as fast as possible, and today, social channels have improved some aspects of customer service. So yes, social customer service is relevant so long as it remains responsive. I think the second article is trying to make a point, but the nuances matter here. It’s not that there is no proof customers want to use social customer service channels; it’s that they will use social customer service channels as long as companies are responsive in that channel.

And there are some interesting balancing forces that may sustain and grow channels such as twitter and facebook over the long run. Strategically, since there is an opportunity to engage customers directly in a public forum, and twitter has shown itself to be very effective at this, companies need to capitalize on these channels. Hence being part of the mix and engaging are additional benefits of maintaining a social customer service channel presence.

I am not swayed by doe-eyed or save-the-world arguments when it comes to customer management. My perspective is that you need to invest in this channel and manage the balance with other investments.

If you need help finding the balance, give me a call.