Storage provider Cloudian raises $94M

Cloudian, a company that specializes in helping businesses store petabytes of data, today announced that it has raised a $94 million Series E funding round. Investors in this round, which is one of the largest we have seen for a storage vendor, include Digital Alpha, Fidelity Eight Roads, Goldman Sachs, INCJ, JPIC (Japan Post Investment Corporation), NTT DOCOMO Ventures and WS Investments. This round includes a $25 million investment from Digital Alpha, which was first announced earlier this year.

With this, the seven-year-old company has now raised a total of $174 million.

As the company told me, it now has about 160 employees and 240 enterprise customers. Cloudian has found its sweet spot in managing the large video archives of entertainment companies, but its customers also include healthcare companies, automobile manufacturers and Formula One teams.

What’s important to stress here is that Cloudian’s focus is on on-premise storage, not cloud storage, though it does offer support for multi-cloud data management, as well. “Data tends to be most effectively used close to where it is created and close to where it’s being used,” Cloudian VP of worldwide sales Jon Ash told me. “That’s because of latency, because of network traffic. You can almost always get better performance, better control over your data if it is being stored close to where it’s being used.” He also noted that it’s often costly and complex to move that data elsewhere, especially when you’re talking about the large amounts of information that Cloudian’s customers need to manage.

Unsurprisingly, companies that have this much data now want to use it for machine learning, too, so Cloudian is starting to get into this space, as well. As Cloudian CEO and co-founder Michael Tso also told me, companies are now aware that the data they pull in, no matter whether that’s from IoT sensors, cameras or medical imaging devices, will only become more valuable over time as they try to train their models. If they decide to throw the data away, they run the risk of having nothing with which to train their models.

Cloudian plans to use the new funding to expand its global sales and marketing efforts and increase its engineering team. “We have to invest in engineering and our core technology, as well,” Tso noted. “We have to innovate in new areas like AI.”

As Ash also stressed, Cloudian’s business is really data management — not just storage. “Data is coming from everywhere and it’s going everywhere,” he said. “The old-school storage platforms that were siloed just don’t work anywhere.”

Abbyy leaked 203,000 sensitive customer documents in server lapse

Abbyy, a maker of optical character recognition software, has exposed a trove of sensitive customer documents after a database server was left online without a password.

The exposed server was found by former Kromtech security researcher Bob Diachenko, who now works independently. In a blog post shared prior to publication, he said one of the company’s MongoDB servers was mistakenly configured for public access. He told TechCrunch that the server contained 203,896 scanned files, including contracts, non-disclosure agreements, memos and other highly sensitive documents dating back to 2012.

The data also included corporate usernames and scrambled passwords.

The Moscow-based company specializes in document capture products and services, including converting physical documents to searchable and indexable digital content across a range of languages.

The company claims to serve thousands of organizations and over 50 million users.

After a private disclosure earlier this month, the server was pulled offline. Abbyy confirmed the exposure in an email Monday but did not say why the storage server was left open for anyone to access.

“The incident in question concerns one rather than several customers and files bearing commercial information,” said spokesperson Anna Ivanova-Galitsina. “The customer has been duly notified and we are cooperating on corrective measures.”

“As soon as [Diachenko] notified us we locked external access to the documents. We have made all the notifications that are legally required, have conducted a full corrective security review of our infrastructure, processes and procedures,” the spokesperson said. The company said that the exposure was “a one-off incident and doesn’t compromise any other services, products or clients of the company,” but noted that a “further analysis is ongoing.”

When pressed, the company would not confirm the name of the customer affected. Abbyy has dozens of major global customers, including Volkswagen, PepsiCo, McDonald’s, and the Australian Taxation Office.

Abbyy did not say if anyone else accessed the database.

It’s the latest in a string of exposed MongoDB databases found by Diachenko in recent months, including a popular virtual keyboard app with 31 million users and more recently an app for connecting babysitters.

MongoDB is widely used across the enterprise for its scalability and versatility, but many older versions of the database software still in use today operate without a password by default. Last year, hackers took advantage of thousands of exposed servers by downloading and deleting their contents — effectively holding them for ransom.
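For illustration only, here is a minimal Python sketch, using the pymongo driver, of the kind of check a researcher might run to see whether a MongoDB instance answers queries without credentials. The host name is a placeholder and nothing here reflects any detail of the Abbyy server.

```python
# Minimal sketch: probe a MongoDB host to see whether it answers without credentials.
# The host below is a placeholder, not a detail from the incident described above.
from pymongo import MongoClient
from pymongo.errors import OperationFailure, ServerSelectionTimeoutError


def is_openly_readable(host: str, port: int = 27017, timeout_ms: int = 3000) -> bool:
    """Return True if the server lists its databases without authentication."""
    client = MongoClient(host, port, serverSelectionTimeoutMS=timeout_ms)
    try:
        # listDatabases requires privileges when access control is enabled,
        # so an unauthenticated success here indicates an exposed instance.
        names = client.list_database_names()
        print(f"{host}:{port} is exposed; databases: {names}")
        return True
    except OperationFailure:
        print(f"{host}:{port} requires authentication")
        return False
    except ServerSelectionTimeoutError:
        print(f"{host}:{port} is unreachable")
        return False
    finally:
        client.close()


if __name__ == "__main__":
    is_openly_readable("example-db-host.internal")
```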

Audit of NHS Trust’s app project with DeepMind raises more questions than it answers

A third party audit of a controversial patient data-sharing arrangement between a London NHS Trust and Google DeepMind appears to have skirted over the core issues that generated the controversy in the first place.

The audit (full report here) — conducted by law firm Linklaters — of the Royal Free NHS Foundation Trust’s acute kidney injury detection app system, Streams, which was co-developed with Google’s DeepMind (using an existing NHS algorithm for early detection of the condition), does not examine the problematic 2015 information-sharing agreement inked between the pair which allowed data to start flowing.

“This Report contains an assessment of the data protection and confidentiality issues associated with the data protection arrangements between the Royal Free and DeepMind. It is limited to the current use of Streams, and any further development, functional testing or clinical testing, that is either planned or in progress. It is not a historical review,” writes Linklaters, adding that: “It includes consideration as to whether the transparency, fair processing, proportionality and information sharing concerns outlined in the Undertakings are being met.”

Yet it was the original 2015 contract that triggered the controversy, after it was obtained and published by New Scientist. The wide-ranging document raised questions over the broad scope of the data transfer, the legal bases for patients’ information to be shared, and whether regulatory processes intended to safeguard patients and patient data had been sidelined by the two main parties involved in the project.

In November 2016 the pair scrapped and replaced the initial five-year contract with a different one — which put in place additional information governance steps.

They also went on to roll out the Streams app for use on patients in multiple NHS hospitals — despite the UK’s data protection regulator, the ICO, having instigated an investigation into the original data-sharing arrangement.

And just over a year ago the ICO concluded that the Royal Free NHS Foundation Trust had failed to comply with data protection law in its dealings with Google’s DeepMind.

The audit of the Streams project was a requirement of the ICO.

Though, notably, the regulator has not endorsed Linklaters’ report. On the contrary, it warns that it is seeking legal advice and could take further action.

In a statement on its website, the ICO’s deputy commissioner for policy, Steve Wood, writes: “We cannot endorse a report from a third party audit but we have provided feedback to the Royal Free. We also reserve our position in relation to their position on medical confidentiality and the equitable duty of confidence. We are seeking legal advice on this issue and may require further action.”

In a section of the report listing exclusions, Linklaters confirms the audit does not consider: “The data protection and confidentiality issues associated with the processing of personal data about the clinicians at the Royal Free using the Streams App.”

So essentially the core controversy, the legal basis for the Royal Free to pass personally identifiable information on 1.6 million patients to DeepMind while the app was being developed, without people’s knowledge or consent, goes unaddressed here.

And Wood’s statement pointedly reiterates that the ICO’s investigation “found a number of shortcomings in the way patient records were shared for this trial”.

“[P]art of the undertaking committed Royal Free to commission a third party audit. They have now done this and shared the results with the ICO. What’s important now is that they use the findings to address the compliance issues addressed in the audit swiftly and robustly. We’ll be continuing to liaise with them in the coming months to ensure this is happening,” he adds.

“It’s important that other NHS Trusts considering using similar new technologies pay regard to the recommendations we gave to Royal Free, and ensure data protection risks are fully addressed using a Data Protection Impact Assessment before deployment.”

While the report is something of a frustration, given the glaring historical omissions, it does raise some points of interest — including suggesting that the Royal Free should probably scrap a Memorandum of Understanding it also inked with DeepMind, in which the pair set out their ambition to apply AI to NHS data.

This is recommended because the pair have apparently abandoned their AI research plans.

On this Linklaters writes: “DeepMind has informed us that they have abandoned their potential research project into the use of AI to develop better algorithms, and their processing is limited to execution of the NHS AKI algorithm… In addition, the majority of the provisions in the Memorandum of Understanding are non-binding. The limited provisions that are binding are superseded by the Services Agreement and the Information Processing Agreement discussed above, hence we think the Memorandum of Understanding has very limited relevance to Streams. We recommend that the Royal Free considers if the Memorandum of Understanding continues to be relevant to its relationship with DeepMind and, if it is not relevant, terminates that agreement.”

In another section, discussing the NHS algorithm that underpins the Streams app, the law firm also points out that DeepMind’s role in the project amounts to little more than providing a glorified app wrapper (on the app design front the project also utilized UK app studio ustwo, so DeepMind can’t claim app design credit either).

“Without intending any disrespect to DeepMind, we do not think the concepts underpinning Streams are particularly ground-breaking. It does not, by any measure, involve artificial intelligence or machine learning or other advanced technology. The benefits of the Streams App instead come from a very well-designed and user-friendly interface, backed up by solid infrastructure and data management that provides AKI alerts and contextual clinical information in a reliable, timely and secure manner,” Linklaters writes.

What DeepMind did bring to the project, and to its other NHS collaborations, is money and resources — providing its development resources free for the NHS at the point of use, and stating (when asked about its business model) that it would determine how much to charge the NHS for these app ‘innovations’ later.

Yet the commercial services the tech giant is providing to what are public sector organizations do not appear to have been put out to open tender.

Also notably excluded from the Linklaters audit: any scrutiny of the project vis-a-vis competition law, compliance with public procurement rules, and any concerns relating to possible anticompetitive behavior.

The report does highlight one potentially problematic data retention issue for the current deployment of Streams, saying there is “currently no retention period for patient information on Streams” — meaning there is no process for deleting a patient’s medical history once it reaches a certain age.

“This means the information on Streams currently dates back eight years,” it notes, suggesting the Royal Free should probably set an upper limit on the age of information contained in the system.

While Linklaters largely glosses over the chequered origins of the Streams project, the law firm does make a point of agreeing with the ICO that the original privacy impact assessment for the project “should have been completed in a more timely manner”.

It also describes it as “relatively thin given the scale of the project”.

Giving its response to the audit, health data privacy advocacy group MedConfidential — an early critic of the DeepMind data-sharing arrangement — is roundly unimpressed, writing: “The biggest question raised by the Information Commissioner and the National Data Guardian appears to be missing — instead, the report excludes a ‘historical review of issues arising prior to the date of our appointment’.”

“The report claims the ‘vital interests’ (i.e. remaining alive) of patients is justification to protect against an “event [that] might only occur in the future or not occur at all”… The only ‘vital interest’ protected here is Google’s, and its desire to hoard medical records it was told were unlawfully collected. The vital interests of a hypothetical patient are not vital interests of an actual data subject (and the GDPR tests are demonstrably unmet).

“The ICO and NDG asked the Royal Free to justify the collection of 1.6 million patient records, and this legal opinion explicitly provides no answer to that question.”

YugaByte’s new database software rakes in $16 million so developers can move to any cloud

Looking to expand the footprint of its toolkit, which gives developers a unified database that can serve both relational and post-relational workloads, YugaByte has raised $16 million in a new round of funding.

For company co-founder Kannan Muthukkaruppan, the new database software frees developers from the risk of lock-in with any single cloud compute provider, even as Amazon, Microsoft and Google jockey for pole position among software developers, and it reduces programming complexity.

“YugaByte DB makes it possible for organizations to standardize on a single, distributed database to support a multitude of workloads requiring both SQL and NoSQL capabilities. This speeds up the development of applications while at the same time reduces operational complexity and licensing costs,” said Kannan Muthukkaruppan, co-founder and chief executive of YugaByte, in a statement. 

Muthukkaruppan and his fellow co-founders know their way around database software. Alongside Karthik Ranganathan and Mikhail Bautin, Muthukkaruppan built the NoSQL platform that powered Facebook Messenger and Facebook’s internal time series monitoring system. Before that, Ranganathan and Muthukkaruppan had spent time working at Oracle. And after Facebook, the two men were integral to the development of Nutanix’s hybrid infrastructure.

“These are tens of petabytes of data handling tens of millions of messages a day,” says Muthukkaruppan.

Ranganathan and Muthukkaruppan left Nutanix in 2016 to begin working on YugaByte’s database software. What’s important, founders and investors stress, is that YugaByte breaks any chains that would bind software developers to a single platform or provider.

While developers can move applications from one cloud provider to another, they have to maintain multiple databases across these systems so that they interoperate.

“YugaByte’s value proposition is strong for both CIOs, who can avoid cloud vendor lock-in at the database layer, and for developers, who don’t have to re-architect existing applications because of YugaByte’s built-in native compatibility to popular NoSQL and SQL interfaces,” said Deepak Jeevankumar, a managing director at Dell Technologies Capital.

Jeevankumar’s firm co-led the latest $16 million financing for YugaByte alongside previous investor Lightspeed Venture Partners.

What attracted Lightspeed and Dell’s new investment arm was the support the company has from engineers in the trenches, like Ian Andrews, the vice president of products at Pivotal. “YugaByte is going to be interesting to any enterprise requiring an elastic data tier for their cloud-native applications,” Andrews said in a statement. “Even more so if they have a requirement to operate across multiple clouds or in a Kubernetes environment.” 

With new software infrastructure, portability is critical, since data needs to move between and among different software architectures.

The problem is that traditional databases have a hard time scaling, while newer database technologies aren’t as reliable when it comes to data consistency and durability. So developers have been using legacy database software like Oracle and PostgreSQL for their systems of record, and then newer distributed databases like Microsoft Azure’s Cosmos DB, Amazon’s DynamoDB, Apache Cassandra (which the founders used at Facebook) or MongoDB for applications that need things like linear read/write scalability, plus auto-rebalancing, sharding and failover.

With YugaByte, software developers get support for Apache Cassandra and Redis APIs, along with support for PostgreSQL, which the company touts as the best of both the relational and post-relational database worlds.
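For a rough sense of what that dual-API surface looks like in practice, here is a minimal sketch that reaches a single cluster over a Cassandra-compatible (CQL) interface and a PostgreSQL-compatible interface using ordinary open-source Python drivers. The host, ports, user, keyspace and table names are assumptions for the example, not documented YugaByte defaults.

```python
# Illustrative sketch: one cluster, two interfaces — a Cassandra-compatible (CQL)
# API for NoSQL-style access and a PostgreSQL-compatible API for relational access.
# Host, ports, user, keyspace and table names are assumptions for the example.
from cassandra.cluster import Cluster  # pip install cassandra-driver
import psycopg2                        # pip install psycopg2-binary

HOST = "db.example.internal"

# NoSQL-style access through the Cassandra-compatible interface.
cluster = Cluster([HOST], port=9042)
session = cluster.connect()
session.execute(
    "CREATE KEYSPACE IF NOT EXISTS demo "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}"
)
session.execute(
    "CREATE TABLE IF NOT EXISTS demo.events (id int PRIMARY KEY, payload text)"
)
session.execute("INSERT INTO demo.events (id, payload) VALUES (1, 'hello')")
cluster.shutdown()

# Relational-style access through the PostgreSQL-compatible interface.
conn = psycopg2.connect(host=HOST, port=5433, user="app", dbname="appdb")
with conn, conn.cursor() as cur:
    cur.execute("CREATE TABLE IF NOT EXISTS orders (id serial PRIMARY KEY, total numeric)")
    cur.execute("INSERT INTO orders (total) VALUES (%s)", (42.50,))
    cur.execute("SELECT count(*) FROM orders")
    print("orders:", cur.fetchone()[0])
conn.close()
```

If both calls succeed against the same endpoint, a team can keep its relational system of record and its high-throughput key-value workload on one cluster instead of maintaining two separate databases.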

Now that the company has $16 million more in the bank, it can begin spreading the word about the benefits of its new database software, says Muthukkaruppan.

“With the additional funding we will accelerate investments in engineering, sales and customer success to scale our support for enterprises looking to bring their business-critical data to the cloud,” he said in a statement. 

To truly protect citizens, lawmakers need to restructure their regulatory oversight of big tech

If members of the European Parliament thought they could bring Mark Zuckerberg to heel with his recent appearance, they underestimated the enormous gulf between 21st century companies and their last-century regulators.

Zuckerberg himself reiterated that regulation is necessary, provided it is the “right regulation.”

But anyone who thinks that our existing regulatory tools can rein in our digital behemoths is engaging in magical thinking. Getting to “right regulation” will require us to think very differently.

The challenge goes far beyond Facebook and other social media: the use and abuse of data is going to be the defining feature of just about every company on the planet as we enter the age of machine learning and autonomous systems.

So far, Europe has taken a much more aggressive regulatory approach than anything the US was contemplating before Zuckerberg’s testimony, or has contemplated since.

The European Union’s General Data Protection Regulation (GDPR) is now in force, extending data privacy rights to all European citizens regardless of whether their data is processed by companies within the EU or beyond.

But I’m not holding my breath that the GDPR will get us very far on the massive regulatory challenge we face. It is just more of the same when it comes to regulation in the modern economy: a lot of ambiguous costly-to-interpret words and procedures on paper that are outmatched by rapidly evolving digital global technologies.

Crucially, the GDPR still relies heavily on the outmoded technology of user choice and consent, the main result of which has seen almost everyone in Europe (and beyond) inundated with emails asking them to reconfirm permission to keep their data. But this is an illusion of choice, just as it is when we are ostensibly given the option to decide whether to agree to terms set by large corporations in standardized take-it-or-leave-it click-to-agree documents.  

There’s also the problem of actually tracking whether companies are complying. It is likely that the regulation of online activity requires yet more technology, such as blockchain and AI-powered monitoring systems, to track data usage and implement smart contract terms.

As the EU has already discovered with the right to be forgotten, however, governments lack the technological resources needed to enforce these rights. Search engines are required to serve as their own judge and jury in the first instance; Google at last count was handling some 500 such requests a day.

The fundamental challenge we face, here and throughout the modern economy, is not: “what should the rules for Facebook be?” but rather, “how can we innovate new ways to regulate effectively in the global digital age?”

The answer is that we need to find ways to harness the same ingenuity and drive that built Facebook to build the regulatory systems of the digital age. One way to do this is with what I call “super-regulation,” which involves developing a market for licensed private regulators that serve two masters: achieving regulatory targets set by governments but also facing the market incentive to compete for business by innovating more cost-effective ways to do that.

Imagine, for example, if instead of drafting a detailed 261-page law like the EU did, a government instead settled on the principles of data protection, based on core values, such as privacy and user control.

Private entities, for-profit and non-profit, could apply to a government oversight agency for a license to provide data regulatory services to companies like Facebook, showing that their regulatory approach is effective in achieving these legislative principles.

These private regulators might use technology, big-data analysis, and machine learning to do that. They might also figure out how to communicate simple options to people, in the same way that the developers of our smartphones figured that out. They might develop effective schemes to audit and test whether their systems are working—on pain of losing their license to regulate.

There could be many such regulators among which both consumers and Facebook could choose: some could even specialize in offering packages of data management attributes that would appeal to certain demographics – from the people who want to be invisible online, to those who want their every move documented on social media.

The key here is competition: for-profit and non-profit private regulators compete to attract money and brains to the problem of how to regulate complex systems like data creation and processing.

Zuckerberg thinks there’s some kind of “right” regulation possible for the digital world. I believe him; I just don’t think governments alone can invent it. Ideally, some next generation college kid would be staying up late trying to invent it in his or her dorm room.

The challenge we face is not how to get governments to write better laws; it’s how to get them to create the right conditions for the continued innovation necessary for new and effective regulatory systems.

BigID lands in the right place at the right time with GDPR

Every startup needs a little skill and a little luck. BigID, a NYC-based data governance solution, has been blessed with both. The company, which helps customers identify sensitive data in big data stores, launched at just about the same time that the EU announced the GDPR data privacy regulations. Today, the company is having trouble keeping up with the business.

While you can’t discount that timing element, you have to have a product that actually solves a problem, and BigID appears to meet that criterion. “This [is] how the market is changing, by having and demanding more technology-based controls over how data is being used,” company CEO and co-founder Dimitri Sirota told TechCrunch.

Sirota’s company enables customers to identify the most sensitive data from among vast stores of data. In fact, he says some customers have hundreds of millions of users, but BigID’s advantage is having built its solution more recently. That gives it a modern architecture that can scale to meet big data requirements, while identifying the data that requires attention in a way that legacy systems just aren’t prepared to do.

“When we first started talking about this [in 2016] people didn’t grok it. They didn’t understand why you would need a privacy-centric approach. Even after 2016 when GDPR passed, most people didn’t see this. [Today] we are seeing a secular change. The assets they collect are valuable, but also incredibly toxic,” he said. Under the GDPR rules it is the responsibility of the data owner to identify and protect the personal data under their purview, and that makes the data a double-edged sword, because you don’t want to be fined for failing to comply.

GDPR is a set of data privacy regulations that are set to take effect in the European Union at the end of May. Companies have to comply with these rules or could face stiff fines. The thing is, GDPR could be just the beginning. The company is seeing similar data privacy regulations in Canada, Australia, China and Japan. Something akin to this could also be coming to the United States after Facebook CEO Mark Zuckerberg appeared before Congress earlier this month. At the very least we could see state-level privacy laws in the US, Sirota said.

Sirota says there are challenges getting funded as a NYC startup because there hadn’t been a strong enterprise ecosystem in place until recently, but that’s changing. “Starting an enterprise company in New York is challenging. Ed Sim from Boldstart [a New York City early-stage VC firm that invests in enterprise startups] has helped educate through investment and partnerships. More challenging, but it’s reaching a new level now,” he said.

The company launched in 2016 and has raised $16.1 million to date. It scored the bulk of that in a $14 million round at the end of January. Just this week, at the RSAC Sandbox competition at the RSA Conference in San Francisco, BigID was named the Most Innovative Startup, a big recognition of the work it is doing around GDPR.

Facebook suspends Cambridge Analytica, the data analysis firm that worked for the Trump campaign

Facebook announced late Friday that it had suspended the account of Strategic Communication Laboratories and its political data analytics firm Cambridge Analytica — which used Facebook data to target voters for President Donald Trump’s campaign in the 2016 election.

In a statement released by Paul Grewal, the company’s vice president and deputy general counsel, Facebook explained that the suspension was the result of a violation of its platform policies.

Cambridge Analytica apparently obtained Facebook user information without approval from the social network through work the company did with a University of Cambridge psychology professor named Dr. Aleksandr Kogan. Kogan developed an app called “thisisyourdigitallife” that purported to offer a personality prediction and billed itself as “a research app used by psychologists”.

Apparently around 270,000 people downloaded the app, giving Kogan access to their geographic information, content they had liked, and limited information about their friends.

That information was then passed on to Cambridge Analytica and Christopher Wylie of Eunoia Technologies.

Facebook said it first identified the violation in 2015 and took action — apparently without informing users of the violation. The company demanded that Kogan, Cambridge Analytica and Wylie certify that they had destroyed the information.

Over the past few days, Facebook said it received reports (from sources it would not identify) that not all of the data Cambridge Analytica, Kogan, and Wylie collected had been deleted. While Facebook investigates the matter further, the company said it had taken the step to suspend the Cambridge Analytica account.

The UK-based Cambridge Analytica played a pivotal role in the U.S. presidential election, according to its own chief executive’s admission in an interview with TechCrunch late last year.

In the interview, Cambridge Analytica’s chief executive Alexander Nix said that his company had built detailed psychographic profiles of hundreds of thousands of Americans throughout 2014 and 2015 (the period when the company was working with Sen. Ted Cruz on his campaign).

…We used psychographics all through the 2014 midterms. We used psychographics all through the Cruz and Carson primaries. But when we got to Trump’s campaign in June 2016, whenever it was, there was five and a half months till the elections. We just didn’t have the time to roll out that survey. I mean, Christ, we had to build all the IT, all the infrastructure. There was nothing. There was 30 people on his campaign. Thirty. Even Walker had 160 (it’s probably why he went bust). And he was the first to crash out. So as I’ve said to other of your [journalist] colleagues, clearly there’s psychographic data that’s baked in to legacy models that we built before, because we’re not reinventing the wheel. [We’ve been] using models that are based on models, that are based on models, and we’ve been building these models for nearly four years. And all of those models had psychographics in them. But did we go out and roll out a long-form quantitative psychographics survey specifically for Trump supporters? No. We just didn’t have time. We just couldn’t do that.

It’s likely that some of that psychographic data came from information culled by Kogan. The tools that Cambridge Analytica deployed have been at the heart of recent criticism of Facebook’s approach to handling advertising and promoted posts on the social media platform.

Nix, from Cambridge Analytica, acknowledged that advertising was ahead of most political messaging and that the tools used for creating campaigns could be effective in the political arena as well.

There’s no question that the marketing and advertising world is ahead of the political marketing, the political communications world. And there are some things that I would definitely [say] I’m very proud of that we’re doing which are innovative. And there are some things which are best practice digital advertising, best practice communications, which we’re taking from the commercial world and are bringing into politics.

Advertising agencies are using some of these techniques on a national scale. For us it’s been very refreshing, really breaking into the commercial and brand space… walking into a campaign where you’re basically trying to educate the market on stuff they simply don’t understand. You walk into a sophisticated brand or into an advertising agency, and the conversation [is sophisticated]. You go straight down to: “Ah, so you’re doing a programmatic campaign, you can augment that with some linear optimized data… they understand it.” They know it’s their world, and now it comes down to the nuances. “So what exactly are you doing that’s going to be a bit more effective and give us an extra 3 percent or 4 percent there?” It’s a delight. You know these are professionals who really get this world and that’s where we want to be operating.

Data management startup Rubrik confirms $180M round at a $1.3B valuation

Rubrik, a startup that provides data backup and recovery services for enterprises across both cloud and on-premises environments, has closed a $180 million round of funding that values the company at $1.3 billion. The news confirms a report we ran earlier this week noting that the company was raising between $150 million and $200 million.