Key takeaways from CyberSec&AI Connected

Ever since I started out in cybersecurity, I’ve been told that academia and business should work together more in our field. One example of this collaboration was the CyberSec&AI Connected conference that I attended, organized by Avast and Czech Technical University in Prague. The conference hosted many great speakers from different parts of the world and from different professional spheres. Many of their ideas and quotes have stuck in my head, and I want to share them with you.

Sharing my data if I have control over it

Large companies that process and store your personal data are giving you more and more control over what data they gather and store regarding your account.

Dr. Alessandro Acquisti, Privacy Economist & Professor of IT and Public Policy at the Heinz College, Carnegie Mellon University, noted that giving people control over their data is a good thing. But he also mentioned something that many users still ignore or don’t know: they are essentially being forced to take control of their data - and to take responsibility for it. As Alessandro said, this is fundamentally wrong, and I agree. If information you disclosed became public because you agreed to it (whether you forgot that you agreed or, in most cases, ignored or didn’t understand what you were agreeing to), then whatever happens next is considered your fault, not the platform’s.

As I see it, and the research Alessandro presented supports this, giving people the feeling that they control their data, including what is shown to others or used by the platform, leads more of them to give sensitive answers to the platform, even when they know the answers are highly sensitive - because, if others are not seeing it, it doesn’t hurt me, right?

Alessandro presented two illuminating research studies. In one study, 169 people were asked to fill out a questionnaire containing a series of less personal and highly personal questions - all of them voluntary. They were informed at the outset that all answers would be published by the researchers. Even so, around 40% of the respondents answered the highly intrusive questions. The other study used the same questions, but for every question respondents were asked whether they agreed to their response being published. The outcome was staggering - nearly 80% of the respondents answered the highly sensitive questions. Just by being given control over what others may know, they became more willing to share sensitive information with the researchers.

A question remains: if people share this information with the platform but don’t want others to see it, why provide it to the platform in the first place? We tend to assume that what no one sees doesn’t hurt us, right?

(Image from Journal of Privacy and Confidentiality (2012), Silent Listeners: The Evolution of Privacy and Disclosure on Facebook by Fred Stutzman, Ralph Gross, Alessandro Acquisti.)

Based on a graph that Alessandro shared, we can see that over time, social network users have become more and more cautious about publicly sharing their birthdays, and probably other sensitive information too. The same trend holds for publicly sharing the name of your high school - people became similarly reluctant to make this information public. And while the name of your high school might not seem sensitive, plenty of sensitive services, like your banking app, might allow you to use it as the answer to a security question, so even innocuous information can be highly sensitive in the right context. But what is interesting is the drop and then the inversion of the trend. Apparently, in one year, the number of people sharing their high school publicly doubled. But wait, did it? In December 2009, Facebook changed its default settings. The Federal Trade Commission did fine Facebook for the change, but by then the harm had already been done to users - since on the internet, whatever you share might stay there forever.

Still, why are platforms keen to get your data even when it’s not shared publicly, or even with your group of friends? For many reasons, but especially for ad targeting. Even when platforms do not share this data with anyone else, they use it to generate profits for themselves and others in the ad business. The platform that targets its users best is the one that profits most from their data, and that pushes the market toward an oligopoly - perhaps an even bigger one than we already have.

Do I have enough information to be in control of my data?

Multiple times during the conference, I heard that as a user, having control over what data the platforms collect about me is a good thing. This makes perfect sense to me. However, do I really know what will actually happen to that data? One question could be, “Do they need this information?” Another could be, “If they need it, then why?” Often, these platforms state that the data will be used for statistical purposes, for enhancing the platform, etc. But is that described well enough for me to be comfortable sharing my data with the platform - and for the general public to understand it?

Miroslav Umlauf, Chief Data Officer at Avast, asked Alessandro about the younger generation, and whether their awareness of data privacy is better than that of adults. You might be surprised to hear that it is. As Alessandro said, and I agree with him, young people are well aware of the technologies and the possible risks they bring.

Alessandro posited that this might be one reason young people migrate from platform to platform: they are less confident about using them. Based on my work teaching younger kids about security and privacy online, I believe that’s too simple an explanation. There are many reasons why users migrate to different platforms: expense, the “cool” factor, a lack of features or the introduction of new ones, and a critical mass of friends or peers. While young people have the knowledge and means to be in control of their data, I do wonder whether they are deciding maturely if the data is okay to share, or whether something else is driving their decisions - like social pressure from peers and friends, or from platforms, many of which have designed their UX to encourage data sharing. A nice example is the biography or “about me” section of some social media accounts, which asks users to fill out a long list of questions - some less intrusive, some more.

You can of course choose what you’ll share publicly and what you’ll disclose only to the platform itself. But when you look at other profiles and see that lots of people have this information public, you might wonder, why shouldn’t I have it public too? Kids and teenagers are often worried about standing out from the small crowd they are in. They are often afraid that by standing out, they might become a target. I wonder if, on some level, sharing information publicly creates the impression that someone can be trusted; those who don’t share may be seen as less trustworthy within their group because they haven’t disclosed the same level of information as others. If this is the case, it takes just one vocal member of the crowd being more open than is smart to encourage the others to disclose more than they probably should. Peer pressure is a powerful force, as is the wish to avoid social duress. Unfortunately, the impacts of sharing too much data may be much longer-lasting.

Synthetic Data Paradox

Developing Artificial Intelligence (AI) and Machine Learning (ML) models is hard - and expensive. For a computer to solve the tasks we give it, it needs data to learn from. GPT-3 (an ML model that produces readable, human-like text), for example, was trained on publicly accessible books and internet texts. This may be one of the easiest ways to train an ML model, but it still requires extensive data scraping and cleaning.

To address this, researchers are now talking about artificially creating the data that ML models learn from - so-called Synthetic Data. This method is far cheaper than collecting data that already exists, and as Miroslav Umlauf noted during the panel discussion, the use of synthetic data is not regulated, since the data are artificial. As promising as this may sound in theory, the question of whether synthetic data have the same utility as real data is a hot one. David Freeman, Anti-Abuse Research Scientist and Engineer at Facebook, pointed out that according to “Information Theory, if there is really no risk and it is truly anonymous, then there is no information in there for utility.” He also said that he hasn’t yet seen a convincing example of the usefulness of synthetic data. Micah Sheller, Machine Learning Research Scientist at Intel, is skeptical too. The basic problem he sees is that if you are generating useful information, you are likely deriving it from private data, so in the end it is derived data, not data from thin air. Reza Shokri, Assistant Professor at the National University of Singapore, added that the claim that Synthetic Data is privacy-preserving is wrong; it can be made privacy-preserving, but at a high cost to its utility.
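To make that trade-off concrete, here is a toy sketch of my own, not something shown at the conference. It is not a real synthetic data generator: it simply adds Laplace noise to each record, a technique borrowed from differential privacy that the panel did not mention. The dataset, the function names, and the epsilon values are all assumptions chosen for illustration. The smaller the epsilon (the stronger the privacy), the further the released values drift from the real ones.

```python
import random
import statistics


def laplace_noise(rng, scale):
    # The difference of two i.i.d. exponential samples follows a
    # Laplace distribution centred on zero with the given scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def make_noisy_copy(real_values, epsilon, value_range, rng):
    # Hypothetical stand-in for a privacy-preserving release:
    # smaller epsilon means more noise, so more privacy and less utility.
    scale = value_range / epsilon
    return [v + laplace_noise(rng, scale) for v in real_values]


def mean_abs_error(real_values, released_values):
    # Utility proxy: how far, on average, a released value is from the truth.
    errors = [abs(r - s) for r, s in zip(real_values, released_values)]
    return statistics.fmean(errors)


def run_demo(seed=0):
    rng = random.Random(seed)
    # Made-up "real" dataset: 2000 ages between 18 and 80.
    real_ages = [rng.randint(18, 80) for _ in range(2000)]
    results = {}
    for epsilon in (10.0, 1.0, 0.1):
        released = make_noisy_copy(real_ages, epsilon, value_range=62, rng=rng)
        results[epsilon] = mean_abs_error(real_ages, released)
    return results


if __name__ == "__main__":
    for epsilon, error in run_demo().items():
        print(f"epsilon={epsilon}: mean absolute error = {error:.1f} years")
```

With a weak privacy setting the released ages stay close to the real ones, which means they still say a lot about real people. With a strong setting they are off by hundreds of years on average and tell you almost nothing, which is the cost Reza Shokri was pointing at.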

From my point of view, if these problems can be resolved, synthetic data may become more widely used than real data, since it provides better scalability combined with likely lower costs.

What about the metaverse?

I am a co-founder of a startup, Confer-O-Matic, where we develop a 3D virtual world intended for networking and for changing the future of workplaces, so this question is really close to my heart. How will we secure the metaverse environment in the future? Are there threats that we already know of? And maybe a more basic question: what is the metaverse, actually?

I like to think of the metaverse not just as a virtual world like the one portrayed in movies such as Ready Player One, but as an important milestone for humanity: the point at which we start thinking about the virtual world more than the real one, and at which the decisions we make in the digital world, and their consequences, have a real impact on our physical lives.

Darren Shou, Head of Technology at NortonLifeLock, stated the metaverse is interesting because instead of consuming the content through the screen, we’ll be living the experiences. That fact could amplify certain attacks; for instance, strobe light effects could be used in an adversarial and intentional way against someone.

Michal Pěchouček, CTO at Avast and Professor at the CTU in Prague (and also a member of the Avast Foundation’s Advisory Board), put it nicely: today, we use our senses to distinguish between the real and the fake. But as soon as we are fully immersed in the virtual world, technology will be our new eyes. We will rely on technology to help us distinguish what is true from what is a crafted lie.

Some of the potential problems of the metaverse can already be demonstrated today. There are virtual platforms where users can create realistic avatars. If I use a photo of another person to create an avatar, upload a recording of their voice, and process it into a voice model, I can pretend to be that person on the platform and pass off my views as those of someone who may have more reach.

In the metaverse, however, the advancement of technology and our inability to use our senses to evaluate data from our perceptions as real or fake could take these problems to a whole new level. Regarding this, I fear that by the time the virtual world starts to be misused in influencing people and their opinions with fake data, it may be too late to overcome. By then, we may already be unable to discern what is real by relying on our senses, or even what is presented to us virtually.

Of course, there are always solutions to these kinds of problems, which must evolve as the technology does. Cryptography - encryption - as Michal Pěchouček nicely pointed out, will be a core principle of securing the metaverse - and our experiences of it.
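As a small illustration of how cryptography can help with the impersonation problem described earlier, here is a minimal sketch of my own. It is not a design anyone proposed at the conference. It uses an HMAC from Python's standard library to check that a message really comes from the holder of a secret key; the key, the message, and the function names are hypothetical. A real platform would more likely rely on public-key signatures, so that verifying a message does not require sharing a secret.

```python
import hashlib
import hmac


def sign_message(secret_key: bytes, message: str) -> str:
    # Compute an authentication tag that only a holder of secret_key can produce.
    return hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def is_authentic(secret_key: bytes, message: str, tag: str) -> bool:
    # Recompute the tag and compare in constant time to avoid timing leaks.
    expected = sign_message(secret_key, message)
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    key = b"hypothetical-avatar-owner-key"
    statement = "This avatar speaks for me."
    tag = sign_message(key, statement)

    print(is_authentic(key, statement, tag))
    print(is_authentic(key, "This avatar speaks for someone else.", tag))
    print(is_authentic(b"impostor-key", statement, tag))
```

Only the first check succeeds. A copied face or a cloned voice is not enough to pass it: an impostor without the key cannot produce a valid tag, and any change to the statement invalidates the tag that was issued for it.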