Arion Research LLC


Disambiguation Podcast Ep 3 - Balancing AI Adoption and Privacy

Michael: Welcome to the Disambiguation Podcast, where each week we try to remove some of the confusion around AI and business automation by talking to experts across a broad spectrum of business use cases and the supporting technology. I'm your host, Michael Fauscette. If you're new to the show, we release every Friday as a podcast on all the major podcast channels, as a video on YouTube, and we also post a transcript of the episode on the Arion Research blog. Now, I'm excited today because we're gonna talk about a topic that I think is extremely important when we look at AI and automation and generative AI particularly. And that's balancing the use of AI enabled solutions with privacy and also regulatory compliance challenges.

So I'm gonna bring in my guest and I'm excited to have Soribel Feliz join me today. She is a thought leader in the field of responsible AI and emerging tech, and she's the founder and CEO of the company Responsible AI = Inclusive AI. Previously she worked for the Democracy Forward Initiative at Microsoft and she also worked in the Trust and Safety Organization at Meta. Before joining Meta, she was a foreign service officer for the US Department of State with overseas and domestic postings. So welcome Soribel.

Soribel: Thank you. Thank you for having me. I'm so excited to be here.

Michael: Great. This is a fun topic, it's an important topic, so why don't we just jump right in.

Let's do it. What are the main privacy concerns when using AI, gen AI, large language models, machine learning algorithms in B2B use cases?

Soribel: Yeah, there are so many. How much time do we have?

Michael: As much time as you need.

Soribel: So I, so let me start with one, which is the over collection of data beyond what's needed for training the models. I get companies want as much data as they want. They want big data, they want lots of data. But if they collect more customer data than they need to train their models, then they expose themselves to increased privacy risks if they have a breach, right? So that's one. Another one is, going on this one, using that data for purposes other than originally intended without consent. I give you my data and you tell me you're going to use it for this, but you have it and you wanna use it for other things. Now, you didn't ask me for consent for that other purpose. So that's a risk. That's a risk. And those are basic privacy risks.

Soribel: So let me start with one, which is the over-collection of data beyond what's needed for training the models. I get it, companies want as much data as they can get. They want big data, they want lots of data. But if they collect more customer data than they need to train their models, they expose themselves to increased privacy risks if they have a breach, right? So that's one. Another one, building on that, is using that data for purposes other than originally intended, without consent. I give you my data and you tell me you're going to use it for this, but you have it and you want to use it for other things. Now, you didn't ask me for consent for that other purpose. So that's a risk. And those are basic privacy risks.

But then you add AI and machine learning on top, and you get into having biased algorithms that may discriminate against certain people. You have the lack of explainability into how these models make decisions, things like the black box that people talk about. I think those are some of the biggest risks.

And you also have your basic day-to-day risks, cybersecurity risks. So it's important to have strong data governance in place.

Michael: Yeah, that makes sense. So I just recently did a survey and published a report on the Arion Research site on AI adoption. It was about 400 respondents, North America only. And one of the questions I asked had to do with data prep: what are your primary challenges? Let me show you this slide; I'm curious to see what you think of this. I thought it was an interesting bit of data to show, and it's probably no surprise that data quality comes out high. The respondents are all AI-associated, so either they're using AI in their business or they're running a project. They're decision makers, influencers, so they're involved in AI already, and that probably changes the responses a little bit compared to the general population. But I'm just curious what you think about the data on this one.

Soribel: Yeah, I'm not surprised. Data quality. There's a reason why data analysts and data engineers are in such high demand, because sometimes companies can have this huge amount of data, but it's all over the place. It's not well put together, not well organized.

And there's, again, the governance part of data privacy. And you've seen how big companies and small companies have gotten in a lot of trouble, have gotten fined, because they're not respecting the privacy regulations, especially from the EU. I think the EU is a tough place in terms of privacy. So yeah, I can absolutely agree with those results.

Michael: I mean, it seems like the EU especially has gotten out ahead, and there's always been a bit more privacy sensitivity there perhaps than in the US and other countries. So I guess that's not a surprise. I was also happy to see bias come up as well, although the numbers are a little low; I was a little surprised that only 10% of the respondents listed it as a primary challenge.

Soribel: I'll come back to that. Yeah, for sure.

Michael: How can businesses ensure privacy when they're sharing sensitive information with a large language model? That seems like one of the biggest risks, because obviously they want to train that model on a lot of data to make it extremely functional.

But then how do you control that internally? How do you make sure that you're paying attention to the permissions you have and don't have, and that you're not exposing people's personal data that they don't want exposed?

Soribel: Yeah, I think companies really need to take the security of customer data seriously, and not just tell the customer that they take their privacy and their data seriously, but actually implement a culture of data security in the company. I think we can go back to the basics, right? Encrypt sensitive personal data, whether it's moving around, when you're sharing it with third parties, or when it's sitting there in a database, in a server. Always use strong encryption methods, and there are multiple methods of encryption that you can employ so you don't have any gaps, so make use of those. Another safeguard is cleaning the data, right? Removing any personal identifiers in the data so that you can't trace it back to specific people if there is a breach. And these days you should maybe just accept that there may be a breach, live with it, and try to secure that data as much as possible. And I would say have very clear policies on when to delete or destroy data when it's no longer needed. If you don't need that piece of data, delete it; otherwise you're just exposing yourself to more breaches, to more hacking. If you don't need the data, get rid of it. Have good policies around it.
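To make those basics concrete, here is a minimal sketch in Python of two of the safeguards Soribel mentions: encrypting a sensitive field before it sits in a database, and replacing a direct identifier with a keyed hash so a record can't easily be traced back to a specific person. The field names, the salt handling, and the choice of the `cryptography` library are illustrative assumptions, not a prescription.

```python
# Minimal sketch: encryption at rest plus pseudonymization (illustrative only).
# Assumes the `cryptography` package is installed; field names are made up.
import hashlib
import hmac
from cryptography.fernet import Fernet

# In production the key and salt would live in a secrets manager, not in code.
ENCRYPTION_KEY = Fernet.generate_key()
HASH_SALT = b"replace-with-a-secret-salt"

fernet = Fernet(ENCRYPTION_KEY)

def protect_record(record: dict) -> dict:
    """Encrypt sensitive fields and pseudonymize direct identifiers."""
    return {
        # Pseudonymize: a keyed hash can't be reversed without the salt.
        "customer_id": hmac.new(HASH_SALT, record["email"].encode(), hashlib.sha256).hexdigest(),
        # Encrypt at rest: ciphertext is useless to an attacker without the key.
        "notes": fernet.encrypt(record["notes"].encode()).decode(),
    }

protected = protect_record({"email": "jane@example.com", "notes": "account history..."})
print(protected)
```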

Michael: It sounds like it's a lot about being intentional, really elevating the fact that this could be a risk, that this is important to the business, and then having policies around that and having people be aware. And then maybe the other piece is practicing good data hygiene.

Soribel: That's very important.

Michael: Yeah, that makes sense. That's very important. And a lot of the conversations that I have, and actually a lot of the data in the survey, really did point to a couple of issues. One, finding the right amount of data that you need to train the model, because it certainly takes a lot, and if you have some data cleanliness, data quality issues already, then you're going to have to do a lot of work there. And also just getting sources of data that can augment the data that you already have, right?

So that's definitely top of mind.

Soribel: Yeah. And if I can just add one more, I would say always conduct third-party audits and risk assessments when you share data with partners. These days, companies always have vendors, they always have third-party sellers that they work with. So you can do everything right and then you share data with a vendor that doesn't practice data hygiene, and then you just expose yourself in case of a data breach or hacking. So always know who you're working with and make sure that they have the same standards of data cleanliness as you have.

Michael: Yeah, it sounds like that works in both directions, right? Make sure that your partner is aware of and protects what you share, but also, whatever they have, do they have permission for it? Is it clean? Is it good quality? That kind of thing. Yeah, that makes sense. And I know, particularly around GDPR, companies did invest a fair bit in trying to make sure they were compliant, and I know that started several years ago. In fact, when I was at G2 we had a privacy and GDPR governance team, and we worked through the policies because that was really important. But then of course there hasn't been a ton of new regulation lately around this. Is it top of mind for companies, do you think? I guess the survey shows that, at least from a privacy perspective, they are thinking about it.

Soribel: I think so. I really do think so. Privacy professionals are in very high demand. People have seen what's happened to some big tech companies in terms of fines in just the last year or so, billions of dollars in fines. So I think companies are taking this seriously. The thing is, now, with the introduction of AI and generative AI, you've added a bigger attack surface. So now more than ever it's so important to be compliant with these regulations.

Michael: And a lot of those systems are forward or outward facing too, right?

The number one use case in that survey was chatbots. Obviously there's an even greater exposure when you're using the data in an interactive way in something that's gonna be public facing, right? So that has to be an interesting balance there between the two.

Soribel: Yeah. And I would say even in the US the agencies, the government agencies that are responsible for enforcing this are getting tougher as well.

They're getting more aggressive. So for example, the Consumer Financial Protection Bureau recently went after some financial institutions because these institutions were putting out chatbots that were giving out wrong information, or were taking people's financial data and not protecting it properly. And when you call your bank because you lost your credit card or whatever, you as a customer are in a distressed state, right? And then you're hit with this chatbot that doesn't give you the answers that you need, that doesn't let you move forward. A lot of bad things can happen in a situation like that. So those agencies are getting tougher, and I think companies can't ignore this anymore.

Michael: Yeah, that brings up another question then. So what advice do you have for companies? How can they comply with all the different data protection regulations that are either out there or in progress, particularly around generative AI? Because certainly that seems like, as you say, an attack surface that carries a great deal of risk.

Soribel: Yeah. I would say do your homework. I would say to those companies, do your homework. You have to really take inventory and map out exactly what customer data you have, where it is, where it lives, who has access to it, where it flows. Really know what you have. Think of it as a budget, right? You know what your net worth is, how much you make, how much you spend. I think it's very important to do that. I would also say ensure that you have a lawful basis for processing data, like consent, contractual agreements, or legal requirements, especially if you do business with the EU. Really make sure consent is top of mind for you. I would also add data subject rights: access, deletion, opt-out. And it's not just the EU; California has mandates very similar to GDPR, and I know Virginia is doing something similar as well.

Yeah, so it's not just the EU anymore. Always do data protection impact assessments when you're processing sensitive data, personally identifiable information. Have someone on your team acting as the data protection officer, not necessarily a dedicated full-time person, that's not what I mean, but someone who is responsible for overseeing compliance. I think it's so important to have one team, one central org within your company, to oversee compliance.
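As a rough illustration of that "do your homework" step, here is a small sketch of what one entry in a data inventory could look like if it were captured in code rather than a spreadsheet: what the data is, where it lives, who has access, where it flows, the lawful basis, and the retention period. All field names and values below are assumptions made up for illustration.

```python
# Illustrative data-inventory entry; fields and values are assumptions.
from dataclasses import dataclass

@dataclass
class DataInventoryEntry:
    dataset: str              # what customer data you have
    location: str             # where it lives
    owners: list[str]         # who has access to it
    flows_to: list[str]       # where it flows (vendors, internal systems)
    lawful_basis: str         # consent, contract, or legal requirement
    contains_pii: bool
    retention_days: int       # when to delete it if no longer needed

support_chats = DataInventoryEntry(
    dataset="support chat transcripts",
    location="s3://example-bucket/chats/",
    owners=["support-team", "ml-platform"],
    flows_to=["llm-fine-tuning", "analytics-vendor"],
    lawful_basis="consent",
    contains_pii=True,
    retention_days=365,
)

# A simple check that flags entries needing a data protection impact assessment.
needs_dpia = support_chats.contains_pii and "llm-fine-tuning" in support_chats.flows_to
print("DPIA required:", needs_dpia)
```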

Michael: Yeah, that plays into the next question I was going to ask you, about what you can do to minimize the risk of data leakage and unauthorized access to AI data. What other things do you think companies can do?

And obviously being mindful is an important one, having a policy is important, but are there any other things that you'd share with them that they should be thinking about?

Soribel: Yeah. I would say, and I know that's something they did at Microsoft, a lot of threat modeling and vulnerability testing, risk management, risk reviews. That's very important. It's also called red teaming, and it's very important to do that. And I think just conducting comprehensive cybersecurity practices is so important, because before gen AI and ChatGPT you had the basics, right? Install security updates, have need-to-know policies, access controls. That hasn't gone away with AI; the opposite is true. You have to do it even more, you have to have even stronger practices. Access controls, compartmentalize data, have strong need-to-know policies, anonymize data, that's very important. You have sensitive data about people who trust you with that data. Don't lose their trust. Anonymize that data. I would say that, and just keep having strong cybersecurity practices.
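As a small illustration of the access-control and need-to-know side of this, here is a sketch of a deny-by-default permission check that compartmentalizes datasets by role. The roles and dataset names are hypothetical.

```python
# Minimal need-to-know access check; roles and datasets are hypothetical.
ROLE_GRANTS = {
    "support-agent": {"support_chats"},
    "ml-engineer": {"support_chats_anonymized", "product_telemetry"},
    "finance-analyst": {"billing_records"},
}

def can_access(role: str, dataset: str) -> bool:
    """Deny by default; allow only datasets explicitly granted to the role."""
    return dataset in ROLE_GRANTS.get(role, set())

assert can_access("ml-engineer", "support_chats_anonymized")
assert not can_access("ml-engineer", "billing_records")  # compartmentalized data
```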

Michael: Yeah, that makes sense. And obviously there's some risk around malicious use, but I guess that's true of any data. But what about disinformation or misinformation? I know that's something I've heard a fair bit of discussion around, particularly with generative AI technology that can produce content at a very rapid pace compared to humans. How do companies deal with that? Or how can they help mitigate some of the risk around that misuse of their data?

Soribel: Yeah, and there's no easy way to do it, I think. Making sure that you know what you're working with, right? Always audit, always check to see that the information a large language model has given you is correct. Double check. Always double check. Trust but verify. I can't think of a better time to say that: trust but verify. And also protect yourself. How do I say this? Not protect yourself against people, but there can be data poisoning, right? How do you limit that? How can you prevent it? I would say preventative measures, where you do background checks on your own staff. Companies already do it, but keep doing it, because this is a big deal.

Always do ethics training for the people who work for you and are dealing with these large language models. And never take the human out of the loop, never take the human out of the decision making, never publish anything that a large language model has given you without having a human check it. If you can have two humans check it, even better. Because as beautiful and nice as the information LLMs give you seems, let's say 95% of it is correct; you get comfortable, and then you get that 5% that is wrong, that is misinformation, and then you have a scandal right there, you're liable, and you have negative reputational risk. So it's so much easier, cheaper, and better to just have humans there reviewing the outputs.
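One way to keep the human in the loop as Soribel describes is a simple review gate: nothing a model generates gets published until named reviewers sign off. The sketch below assumes a two-approval policy; the workflow, class names, and threshold are illustrative, not any particular product's API.

```python
# Sketch of a human-in-the-loop publishing gate; workflow is illustrative.
from dataclasses import dataclass, field

@dataclass
class Draft:
    text: str                       # output from a large language model
    approvals: set[str] = field(default_factory=set)

REQUIRED_APPROVALS = 2              # "if you can have two humans check it, even better"

def approve(draft: Draft, reviewer: str) -> None:
    draft.approvals.add(reviewer)

def publish(draft: Draft) -> str:
    if len(draft.approvals) < REQUIRED_APPROVALS:
        raise PermissionError("LLM output cannot be published without human review")
    return draft.text

draft = Draft(text="Model-generated summary of the quarterly report...")
approve(draft, "editor-1")
approve(draft, "editor-2")
print(publish(draft))
```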

Michael: But it reminds me of that story I read not long ago about an attorney who got in trouble because he was using ChatGPT for his briefs. He had it write a whole brief for him, he didn't check it, and every one of the citations of different cases in there was made up. They looked good, they sounded really good, but they didn't exist, right? That's interesting. It's insane. I guess that's reputational risk at the max.

Soribel: Exactly, and I think he was disbarred or something.

Michael: You're right, I think he was, whatever the consequences were for him. He certainly was embarrassed, if nothing else. So that brings up a couple of things then around legal implications, but also some ethical implications around the use of different AI tools, large language models, machine learning. And you're not the first person to advise the audience that human in the loop is extremely important. How do you navigate some of these other legal, and maybe even more so, ethical implications? Because obviously regulatory things are at least reasonably well laid out once they get out there, but the ethical implications must be an even bigger concern for a lot of companies.

Soribel: Yeah, I would say having clear documentation and communication is important. That helps maintain transparency. And when I say documentation, I mean from the earliest moment when you're thinking of using an algorithm, a large language model, or a machine learning model. Documentation is always important.

Conduct ethical reviews, right? If you are considering an algorithm for a specific use case, conduct an ethical review, and not just one person that you hire to conduct it, but people inside your own company. I would say a good cross-cultural and cross-functional group of people who get together and look at any risks stemming from using that algorithm for that use case, any impacts, anything that can go wrong with it. One example would be, and this is not me bashing anyone, okay, but the IRS. The IRS has used an algorithm that disproportionately targets people of color and low-income people, specifically Black people, to be audited. And the reason this algorithm disproportionately selects Black people to be audited is that one of the variables that goes into the algorithm is the earned income tax credit, which is a credit that goes to low-income people. It's a tax credit, right? And that tax credit is subject to being misused, not necessarily maliciously, but people think they're entitled to it because, oh, I'm low income, or it's easy to get it wrong. And so if the algorithm over-relies on that one variable, it will disproportionately target low-income people. I think if the IRS had done an ethical review, and also a risk assessment and an algorithmic impact assessment, it would have known that. Not just because it's unethical to disproportionately target low-income people, but it's also not financially wise, right? Because what can you get out of a low-income person? You really should be targeting your high-income, high-net-worth individuals; they can make all the difference. Getting a thousand dollars from a minimum wage worker, that is not wise.
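The IRS example is exactly the kind of thing a basic fairness check inside an ethical review can surface before deployment. The sketch below computes selection rates per group and the ratio between them, a simple disparate-impact measure; the counts are invented purely to illustrate the calculation, and the 0.8 threshold is a common rule of thumb, not a legal standard.

```python
# Sketch of a disparate-impact check; the counts below are invented.
def selection_rate(selected: int, total: int) -> float:
    return selected / total

# Hypothetical audit selections per group from a model's output.
groups = {
    "eitc_claimants":     {"selected": 900, "total": 10_000},
    "non_eitc_claimants": {"selected": 300, "total": 10_000},
}

rates = {name: selection_rate(g["selected"], g["total"]) for name, g in groups.items()}
impact_ratio = min(rates.values()) / max(rates.values())

print(rates)         # {'eitc_claimants': 0.09, 'non_eitc_claimants': 0.03}
print(impact_ratio)  # 0.33: well below the common 0.8 rule-of-thumb threshold
```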

Michael: Yeah. I guess it's not a volume play. So that's why. Yeah, that's right.

Soribel: Exactly. It's not financially wise, not to mention that it's discriminatory.

Michael: And it's interesting, because the thing you bring up there is a really interesting way of catching this. We talk about transparency, and that's obviously really important, but tie it to bias: a simple thing like that really throws the algorithm out of kilter and makes it biased. And if I just looked at that list of things you were going to input to the algorithm, I'm not sure I would've thought, oh, that means we're going to get a disproportionately large number of lower-income people in the output to audit because of this. The connection's tough. That's a really great example, and also scary.

Soribel: It is. And I think the IRS has been using it for a while. They did say that they're going to stop using it, or probably not use that variable, not give it so much weight.

Yeah. But I don't think the IRS did this on purpose. I just think they didn't think it through; it wasn't as top of mind as it is now.

Michael: Yeah, that makes sense. We weren't really talking much about bias in algorithms a few years ago, but it's certainly come up a lot over the last year or so, maybe because of the awareness around ChatGPT. I think it's raised a lot of people's awareness of what's out there and what's happening. So maybe we're finally being a little more sensitive. I like the advice of doing this kind of ethical review.

So that really adds an extra layer of transparency to your algorithm, if you have a team that can really go through it and look at it from a bias-specific lens and perspective versus data quality in general. That makes a lot of sense. And that brings me to my next question, because transparency, I hear a lot of people talk about it, but I'm not sure how easy it is. How can businesses maintain transparency and trust with their clients and their partners when they're using AI, and also make sure that they're safeguarding privacy? How can they address both halves of that?

Soribel: Yeah, that's a good question. Companies have to build trust with their customers, with people. And they also have to realize that people and customers are more aware and more alert about privacy than they ever were before. They know more, they're more fluent in the lingo. They know about the black box, they know that sometimes companies have privacy terms or terms of use in very small letters that are very legalistic and hard to understand, and people push back on that. So people are getting smarter, in a way, and companies really have to do the work and build trust. And remember that trust is really easy to lose, so always keep that in mind. I would say communicate with the public in plain language: tell them what data is being used, how it's protected, all of that.

Michael: Yeah, that makes a lot of sense. One of the recent higher-profile issues came out around Zoom's terms of service and a change they made in March. The CEO, Eric Yuan, did come out later and apologize, and I honestly do believe him, by the way. He did say they're taking that out, and they did act once it was out there. But a group of people made a change to the terms of service without necessarily thinking through the implications of the change, and it really did blow up, and there was a big PR issue around it. So there's a lot of risk for companies there, it sounds like.

Soribel: There is.

There is. Like I said, people are up in arms. I think people have gotten a little tired of having their data abused, in a way. And I'm not blaming companies, because companies need data to do better, to provide a better service, to serve us. We also have to understand that if we want more personalized and better interfaces, we have to give up some data. But I think that's where the communication is important. You tell me what I give you, and you tell me what you give me, and if I don't give this to you, this is what I get. So I think communication, and giving me choices, give me choices, let me decide, I think that is the key here.

Michael: I mean, that is really that explicit explanation, that openness and level of transparency. I know that's hard for a lot of companies, though. I guess the point is the risk is high enough that you really do have to be aware of this, you really do have to pay attention, you really do have to make sure that you can maintain that level of transparency. And yet, I like personalized offers; I suppose we all do, right? If it's in the right context and the right thing relative to whatever it is that I'm doing. But at the same time, I'm making a trade, and maybe sometimes that gets out of balance and the value is higher for the company than it is for the consumer, or vice versa. So it's back to that balanced value proposition.

Soribel: It is, it is. And yeah, I think, again, communication, listening. Listening to your customers, and also, even before you do anything, have a brain trust in your company and think through this before you just sneakily do something and it blows up in your face.

Michael: Yeah. And unfortunately these days it blows up in your face and then it's everywhere within a few minutes. We have massive distribution channels that are available all the time. This is great. And you had one more thing you wanted to add? Oh, sure.

Soribel: Oh, I was just going to add that customers are getting madder and madder, right? And if they have lots of choices, then there's an imbalance right there. For example, some people came out and said, oh, I'm never going to use Zoom again, I'm just going to use all the other choices out there. And there are a lot of choices: there's Google Meet, there's Teams, there are many others. I didn't say that, because Zoom is very ubiquitous, right? It's everywhere.

Michael: It's definitely my tool of choice there. Yeah.

Soribel: Yeah. But every time I open it, I'm gonna be like, oh yeah, they're probably doing this and that, and I'm just gonna go in with that alertness. If you have no other choice, then probably the consumer has to live with it. But if we have choices, then it's better for you to be transparent.

Michael: Trust is fragile online, particularly because it takes a while to build and then it can end immediately, as soon as you breach that trust. I think that's part of the learning, given the amount of information that flows and what we do online now, all of our activities. So yeah, that makes a lot of sense.

That's all the time we have today, but I really appreciate this. It's been a very interesting conversation and frankly, I could keep going, but I suspect the listeners may not be that happy if we took an extra hour or so. So thank you very much for being here.

I really appreciate it. And before I let you go, one thing I really like to do at the end is get a recommendation from you: somebody who has really had an influence on you, a thought leader, an author, a mentor that other people could check out and follow and learn from.

Soribel: Yeah, I would say Laura Miller. She's an AI ethicist based in the Midwest. I've spoken to her and she was very generous with her time. And I just realized that ethics is something that is so interesting, right? Because the way she framed it was, when you have an AI ethicist, you'd think that this person is the perfect person, she's an example, she's infallible, and we have to model ourselves on her. But that's not the case. We have to build our own ethics program and define ethics for ourselves, because humans are flawed. So you can't just have one person, and that goes back to what I said earlier.

Yeah. You can't just have one person who is the person; this is a whole-company effort, and you really have to think through what ethics means for you. So I really like her approach to ethics, because it's really practical. It's not theoretical, it's not squishy, it's not too philosophical. It's practical.

Michael: Yeah, that's good. That's an important piece, I think, from a company standpoint, right? I need it to be actionable. You can have all sorts of interesting conversations about ethics, but if I can't do anything about it in my company, it's not particularly valuable to me.

So thank you all for joining us this week. Remember, hit that subscribe button and for more on AI, you can check out the Arion Research report on AI adoption. It's free research, so you can't beat that. Published on the Arion site. And hopefully that will give you a little bit more information about what companies are doing with AI today.

So join us next week. We have an exciting guest to talk about data prep. I'm Michael Fauscette, and this is the Disambiguation podcast.