Privacy considerations for Generative AI

WE HAVE UPDATED THIS INFORMATION. SEE IT HERE

Generative AI refers to artificial intelligence models that create content in various forms, including text, images, and audio across many formats and mediums.

Generative AI uses deep-learning algorithms and training data to produce new content that approximates the training data.

Given the incredible rise in popularity and the transformative nature of Generative AI, following is some general guidance to consider related to data privacy. Note: not legal advice, and not intended to be comprehensive.

If you use generative AI in regular work

  • Explore options to purchase or license a business or enterprise version of the software. Enterprise software usually brings contractual protection and additional resources such as real-time support.
  • Begin discussions with your colleagues about the privacy considerations listed in the next section.
  • Consider where and how existing policies and best practices can be updated to better protect user privacy.
  • Remember to validate the output of Generative AI, and if using Generative AI in a workflow, consider implementing formal fact-checking, editorial, and validation steps to your workflow.

If you create or develop generative AI

  • Provide transparency about how your Generative AI models are trained. Inform users what data might be collected about them when using generative AI and create accessible mechanisms for users to request data deletion or opt-out of certain data processing activities.
  • Explore incorporating privacy enhancing technologies in your initial design stages to mitigate privacy risks and protect user data. Consider technologies that support data deidentification and anonymization, PII identification and data loss prevention, and always incorporate principles of data minimization.

If you would like assistance as you consider data minimization, data anonymization, or data deidentification in your AI, the Privacy Team can help. Contact privacy@illinois.edu.

Additional guidance

Generative AI is not new, and concerns regarding its use and potential harms have been raised and discussed for years.

In light of the recent popularity and public access to generative ai capabilities, it’s important to remember there are existing policies and practices, as well as scholarly, historical, and theoretical applications that should be considered alongside the more recent conversations. Initiatives involving personally identifiable information (PII) at the university, including generative AI, are subject to all applicable laws, university policies, and university contractual obligations.

  • In the university setting, specific privacy laws that come into consideration include the federal U.S. Privacy Act as well as state privacy laws such as PIPA, industry specific regulations such as FERPA, HIPAA, COPPA, and geographic and extraterritorial international laws such as GDPR and PIPL, among others. For more information about these laws and others, see the Electronic Privacy Information Center’s guide.  Given the unprecedented access to and increasing adoption of AI and generative AI capabilities, market forces are driving steep competition to add AI capabilities to existing offerings. This pressure may result in compromised ethics and integrity when rushing new features and new capabilities to market.

Training data may include data that was collected in violation of copyright and privacy laws, among other laws or ethical considerations, which may contaminate the model and any products that use it.

Training data refers to the initial structured and unstructured data (databases, text, video, books, websites, blogs, etc.) used to train machine learning algorithms. We will not know the societal and business impacts of these violations for many years.

  • Identifying and removing personally identifiable information (PII) from large language models is largely untested and therefore may complicate responding to data subject requests within regulated timeframes. Additionally, if PII is a part of the large language model it may be possible for generative AI to expose PII in the output.

It is likely that input data may be used as training data, and users are more likely to overshare when data collection is interactive and conversational.

  • Users may lack technical literacy to understand that Generative AI is mimicking human behavior.
  • Users can be intentionally misled to believe they are interacting with a human.
  • Given the prolonged and conversational method of interaction, users may lower their guard and share personal information.

It is unclear what personal information, user behavior, and analytics are being recorded and retained, or shared with third parties.

As generative AI is mainstreamed, it is likely to follow proven channels for monetization, such as using personal data for targeted advertising. Clear policies should be established regarding the retention and deletion of user data collected during interactions with generative AI systems. When considering uses, determine whether individuals may request deletion of their personal data, which is a requirement of GDPR and most other privacy laws.

Depending on how they’re used, generative AI models may qualify as automated decision-making, which creates heightened privacy and consent obligations.

  • Under the GDPR, individuals “have the right not to be subject to a decision based solely on automated processing, including profiling,” that has legal or similarly significant effects (GDPR Article 22(1), PIPL Article 73).
  • Privacy laws in Colorado, Virginia, and Connecticut give individuals the right to opt out of personal data processing for purposes of profiling.

Given the prolonged and conversational interaction of many chatbot-based Generative AI solutions, special care should be taken to minimize legal and privacy risks related to wiretapping.

Risks arise in many possible implementations, including under federal and state wiretap laws. The extent of the risk largely depends upon what information is collected and who has access to the information, so properly configuring the Generative AI solutions with these risks in mind, including incorporating appropriate notice and consent language, is essential. To mitigate these risks, any implementation of a Generative AI service should be reviewed by University Counsel and the University Ethics and Compliance Office.

Generative AI models can be susceptible to adversarial prompt engineering, where malicious actors manipulate input to generate harmful or misleading content.

Malicious prompt engineering may lead to the dissemination of false information, the exposure of sensitive data, or inappropriate collection of private information.

Implementation of Generative AI should be transparent for users and be accompanied by training and educational programming.

Educating users about how AI models work, the data they collect, and the potential risks involved can empower individuals to make informed decisions and take necessary privacy precautions when engaging with such technologies. Promoting AI literacy within the University community is crucial to assist in understanding the privacy implications of interacting with Generative AI systems.

Generative AI systems have the potential to generate content that may inadvertently or intentionally defame individuals or organizations.

  • Vigilantly implement measures to prevent the generation of defamatory content, such as robust content moderation, human review and editing, and filtering mechanisms.
  • Clear policies should be in place to address and rectify any instances of defamation that may arise from the use of Generative AI systems, ensuring accountability and protecting the reputation of the University and our communities.

Generative AI systems have the potential to generate false, misleading, or inaccurate content.

Users should be aware the output created by generative AI may not be accurate or true. These models do not evaluate or analyze outputs for accuracy in fact or substance. Instead, they s evaluate outputs on the similarities to the training data they are built upon.