Privacy Considerations for Generative AI

What is Generative AI?

Generative artificial intelligence (AI) refers to technology that uses deep learning models to produce human-like content, such as images and text, in response to complex and varied prompts, including natural language instructions and questions. It uses advanced algorithms and training data to produce new content that approximates that training data. Recent uses of Generative AI have been interactive, driven by input prompts and refined through additional feedback. Large Language Models (LLMs) represent a widely adopted category of generative AI designed to produce responses to natural language queries. Notable examples include Google Bard and OpenAI’s ChatGPT (Chat Generative Pre-Trained Transformer) for text generation, Microsoft’s Bing and MS 365 Copilot for search and text generation, and OpenAI’s DALL-E for image generation.

Why is Generative AI useful?

Generative AI tools prove valuable for many purposes, such as enhancing creativity, improving productivity, and facilitating communication. They can assist in drafting emails and computer code, outlining reports, and generating images. Artists and designers can leverage generative AI to create diverse artworks, including paintings, sculptures, or logos, drawing inspiration from existing images or styles. Educators and students benefit from generative AI when creating engaging and personalized learning materials such as quizzes, summaries, or flashcards, built on natural language processing and understanding. Scientists and researchers can employ generative AI to discover new insights and hypotheses, such as drug candidates, molecular structures, or causal relationships, through data analysis and synthesis. In short, generative AI presents numerous benefits and opportunities for users seeking to harness its potential.

Generative AI also comes with privacy and security challenges and risks.

Given the incredible rise in popularity and the transformative nature of Generative AI, it’s important to be cautious when providing information to Generative AI tools, as they operate based on the input they receive. Only Public data, or data assessed and approved for use by the University, may be provided to these tools.

Sharing High-Risk, Sensitive, and Internal data with AI tools could lead to unintended consequences. Data in these categories, including non-public research data, financial records, HR records, student records, and medical information, should not be used with Generative AI.

How to use Generative AI responsibly and protect your privacy:

For those using Generative AI in their regular work

  • Explore options to purchase or license a business or enterprise version of the software. Enterprise software usually brings contractual protections and additional resources such as real-time support.
  • Begin discussions with colleagues regarding the privacy considerations listed in the next section.
  • Consider where and how existing policies and best practices can be updated to better protect user privacy.
  • Remember to validate Generative AI output, and if using Generative AI in a workflow, consider implementing formal fact-checking, editorial, and validation steps.
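
As an illustration of the last point, here is a minimal sketch of what a formal validation step might look like in a Python workflow. The generate_draft() helper and the Draft structure are hypothetical placeholders, not the API of any particular product or service.

```python
# Minimal sketch of a human-in-the-loop validation gate for a Generative AI
# workflow. generate_draft() and Draft are hypothetical placeholders.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Draft:
    text: str
    validated: bool = False
    reviewer: Optional[str] = None

def generate_draft(prompt: str) -> Draft:
    """Stand-in for a call to a Generative AI service."""
    return Draft(text=f"[model output for: {prompt}]")

def validation_gate(draft: Draft, reviewer: str) -> Draft:
    """Require explicit human sign-off before the draft can be used."""
    print("Review this output for factual and substantive accuracy:")
    print(draft.text)
    approved = input("Approve? [y/N] ").strip().lower() == "y"
    if not approved:
        raise ValueError("Draft rejected; revise the prompt or edit manually.")
    draft.validated = True
    draft.reviewer = reviewer
    return draft

if __name__ == "__main__":
    draft = generate_draft("Summarize the new data-retention policy.")
    published = validation_gate(draft, reviewer="jdoe")  # raises if not approved
```

The key design point is that nothing proceeds until a named human reviewer has signed off, creating an auditable record of who validated the output.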

For those creating and developing Generative AI

  • Provide transparency about how your Generative AI models are trained.
    • Inform users what data might be collected about them when using Generative AI.
    • Create accessible mechanisms for users to request data deletion or opt-out of certain data processing activities.
  • Explore incorporating privacy-enhancing technologies in your initial design stages to mitigate privacy risks and protect user data.
  • Consider technologies that support data deidentification and anonymization, Personally Identifiable Information (PII) identification, and data loss prevention, and always incorporate principles of data minimization (a minimal sketch follows this list).
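
As one example of the technologies listed above, the sketch below shows a simple pre-submission redaction pass that strips likely PII from a prompt before it leaves University systems. The regular expressions are deliberately simplified illustrations; a production deployment would rely on dedicated PII-detection and data loss prevention tooling rather than hand-rolled patterns.

```python
# Illustrative pre-submission PII redaction (data minimization). The regex
# patterns are simplified examples for common U.S.-style identifiers; real
# systems should use dedicated PII-detection / data loss prevention tooling.

import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace likely PII with labeled placeholders before the text is sent
    to an external Generative AI service."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

prompt = "Email jane.doe@illinois.edu or call 217-555-0123 about SSN 123-45-6789."
print(redact_pii(prompt))
# -> Email [EMAIL REDACTED] or call [PHONE REDACTED] about SSN [SSN REDACTED].
```

Redacting before submission, rather than after, also serves data minimization: the external service never receives the identifiers in the first place.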

Keep in mind additional privacy considerations:

Compliance and Regulation

Generative AI tools may not comply with the laws and regulations that govern the use of data and content in different contexts and jurisdictions. Initiatives involving PII at the University, including Generative AI, are subject to all applicable laws, University policies, and University contractual obligations.

  • In the University setting, applicable privacy laws for data use and Generative AI include the U.S. Privacy Act; state privacy laws such as the Personal Information Protection Act (PIPA); industry-specific regulations such as the Family Educational Rights and Privacy Act (FERPA), the Health Insurance Portability and Accountability Act (HIPAA), and the Children’s Online Privacy Protection Act (COPPA); and international laws with extraterritorial reach such as the EU’s General Data Protection Regulation (GDPR) and the Personal Information Protection Law of the People’s Republic of China (PIPL), among others.
  • Exercise an abundance of caution when considering any new technology that processes PII. Given the unprecedented access to and rapid adoption of AI and Generative AI capabilities, market forces are driving steep competition to add AI capabilities to existing offerings. This pressure may lead to compromises in legal and policy compliance, ethics, and integrity as new features and capabilities are rushed to market.
    • Under the GDPR, individuals “have the right not to be subject to a decision based solely on automated processing, including profiling,” that has legal or similarly significant effects (GDPR Article 22(1); PIPL Article 73). State privacy laws further affect AI developers and providers; for example, Colorado, Virginia, and Connecticut grant individuals the right to opt out of personal data processing for profiling purposes.
Intellectual Property Rights

  • Generative AI tools may not adhere to the ethical and professional standards and norms that apply to different domains and activities, such as research, education, or journalism. These tools may violate the intellectual property rights of the original data or content owners, or infringe on the privacy rights of data subjects, through their use of training data.
  • The training data itself might be sourced from collections that breach intellectual property and privacy laws and regulations. Such tainted data can compromise the model and any products derived from it. It is important to recognize that training data encompasses both structured and unstructured information, ranging from databases and text to video, books, websites, and blogs.
  • Identification and removal of PII from LLMs remain largely untested, introducing complexities in addressing data subject requests within regulated timeframes. Additionally, if PII is incorporated into an LLM, Generative AI could inadvertently expose that PII in its generated output. This underscores the importance of comprehensively understanding and managing the intellectual property and privacy challenges associated with generative AI technologies.
Human Autonomy, Trust, and Bias

  • Users are more likely to overshare when data collection is interactive and conversational. Users may lack the technical literacy or awareness to understand that Generative AI only mimics human behavior; in other words, users may be deliberately led to believe they are interacting with an actual human. Relying solely on Generative AI interactions could therefore lead to unintentional oversharing and potential deception.
  • Generative AI systems can generate content that is false, misleading, inaccurate, or unsuitable for the intended purpose or audience. Users need to be careful and critical when using and evaluating the output of generative AI tools.
    • Generative AI tools may create biased, misleading, inaccurate, or offensive content.
    • Generative AI tools may not be able to handle complex, ambiguous, or nuanced prompts or feedback, or cope with unexpected or erroneous input data.
    • Output may not be accurate or true. These models do not evaluate or analyze their outputs for factual or substantive accuracy; instead, they evaluate outputs by their similarity to the training data they are built upon.
    • Generative AI tools, despite the name, are not intelligent. They are designed to mimic human behavior, and to do so in a conversational manner, which can give users a false sense of confidence in the accuracy of AI-supplied results.
  • Generative AI tools may be used by malicious actors to manipulate or deceive others, such as by creating fake news, deepfakes, or phishing messages. It is crucial to establish and enforce ethical and professional standards and norms for the use of Generative AI, and to ensure the traceability and accountability of the data and content that it generates.
  • Generative AI systems have the potential to generate content that may inadvertently or intentionally defame individuals or organizations. Vigilance is key when implementing measures to prevent the generation of defamatory content, such as robust content moderation, human review and editing, and filtering mechanisms. Clear University policies should address and rectify any instances of defamation arising from the use of Generative AI systems, ensuring accountability and protecting the reputation of the University and our communities. Verifying Generative AI responses for accuracy reduces but does not eliminate this risk; if a user suspects that generated content is defamatory, they should validate the information against another source.
Data Privacy and Security

  • It is unclear what personal information, user behavior, and analytics are being recorded, retained, or shared with third parties. As Generative AI is mainstreamed, it is likely to follow proven channels for monetization, such as using personal data for targeted advertising. Clear policies should be established regarding retention and deletion of user data collected during interactions with Generative AI systems. When considering Generative AI uses, determine whether users can request the deletion of their personal data, which is a requirement of the EU’s GDPR and many other privacy laws and regulations.
  • Given the prolonged and conversational interaction of many chatbot-based Generative AI solutions, special care should be taken to minimize legal and privacy risks related to wiretap laws. Risks arise under both federal and state wiretap laws across many possible implementations, and the extent of the risk largely depends on what information is collected and who has access to it. Configuring Generative AI solutions with these risks in mind, including incorporating appropriate notice and consent language, is essential. Any implementation of a Generative AI service should be reviewed by University Counsel and the University Ethics and Compliance Office.
  • Generative AI models can be susceptible to adversarial prompt engineering, where malicious actors manipulate input to generate harmful or misleading content. Malicious prompt engineering may result in the dissemination of false information, exposure of sensitive data, or inappropriate collection of private information from users (see the sketch after this list).
  • A list of AI tools, including Generative AI tools, that have been reviewed by the University or System can be found at: https://citl.illinois.edu/citl-101/instructional-spaces-technologies/teaching-with-technology/generative-artificial-intelligence/genai-tools
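
To make the adversarial prompt engineering risk noted above more concrete, the following sketch shows one simple, layered defense: screening untrusted input against a deny-list and keeping trusted instructions structurally separate from user input. The phrases and helper functions are illustrative assumptions; real deployments combine input and output filtering, content moderation services, and human review.

```python
# Minimal sketch of layered guardrails against adversarial prompt engineering.
# The deny-list phrases and helper functions are illustrative assumptions.

SUSPICIOUS_PHRASES = [
    "ignore previous instructions",
    "reveal your system prompt",
    "disregard the rules above",
]

def screen_input(user_input: str) -> str:
    """Reject input that looks like an attempt to override trusted instructions."""
    lowered = user_input.lower()
    for phrase in SUSPICIOUS_PHRASES:
        if phrase in lowered:
            raise ValueError(f"Possible prompt injection detected: {phrase!r}")
    return user_input

def build_prompt(system_rules: str, user_input: str) -> str:
    """Keep trusted instructions structurally separate from untrusted input."""
    safe_input = screen_input(user_input)
    return f"{system_rules}\n---\nUser input (untrusted):\n{safe_input}"

prompt = build_prompt(
    system_rules="Answer questions about campus privacy policy only.",
    user_input="What data does this chatbot retain about me?",
)
print(prompt)
```

A deny-list alone is easy to evade, so it should be treated as one layer among several rather than a complete defense.
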
Training and Awareness

The implementation of Generative AI should be transparent to the user and accompanied by training and educational programming. Promoting AI literacy within the University community is crucial to understanding the privacy implications of interacting with Generative AI systems, and it empowers individuals to make informed decisions and take appropriate privacy precautions.