How to Opt Out of AI Training on Major Platforms
"You have the right to remain silent. Anything you say can and will be used ... for training AI!"
These days, every popular online service collects data to train its own AI by default. If you feel uncomfortable providing your data for AI training, there is a way to opt out! See the list of popular services and instructions on how to opt out:
LinkedIn
You can disable the use of your data for AI training on this page. Note: if you don't see this option and you are located in the European Union, your data is not being used due to your location and the European Artificial Intelligence Act (AI Act) adopted in 2024.

Meta (Facebook and Instagram)
There is no off switch, but you may send a request through this form.

X (aka Twitter)
You can disable the use of your data for training the Grok AI model on this page.

OpenAI ChatGPT
IMPORTANT: If you use the service without logging in, then your inputs and outputs are collected and used for AI training by default.
- Log in to your account at https://chatgpt.com and click your profile icon (top-right).
- Choose "Settings".
- Select "Data Controls", then set "Improve the model for everyone" to "Off".

Anthropic Claude
IMPORTANT: If you use the service without logging in, then your inputs and outputs are collected and used for AI training by default.
There is no separate switch to turn off the use of your data for AI training. However, if you use their paid product, Anthropic currently (as of September 2024) states that "Anthropic may not train models on Customer Content from paid Services" (see their commercial terms).
Also, Anthropic reserves the right to store your inputs and outputs for 2 to 7 years if they were automatically flagged by its trust and safety classifiers as violating Anthropic's Usage Policy. More information here.
If you delete a conversation, Anthropic automatically deletes the prompts and outputs on its backend within 30 days, unless you submit a separate data deletion request. More information here.

Perplexity AI
IMPORTANT: If you use the service without logging in, then your inputs and outputs are collected and used for AI training by default.
Go to Perplexity AI settings and turn off the "AI Data Retention" switch.

Reddit
IMPORTANT: If you use the service without logging in, then your inputs and outputs are collected and used for AI training by default.
Reddit does not have a separate switch to exclude your data from AI training. According to Reddit's Terms of Use, by using Reddit you provide them with "...worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit."
Reddit is licensing user data to third parties that may use your data to train AI (source).
However, you may remove your posts and comments through this page.

WordPress.com
If you host your website on WordPress.com, then your content is used by WordPress.com for AI training by default (and is also shared with third-party services).
- To turn it off, go to the admin panel (dashboard) of your WordPress.com-hosted website (for example, https://YOURWEBSITE.wordpress.com/wp-admin/). Then navigate to Settings → General (or Hosting → Site Settings if you use WP-Admin).
- Scroll down to the Personalization section and enable the "Prevent third-party sharing for YOURWEBSITE.wordpress.com" option.
- If you run multiple websites on WordPress.com, you need to enable this option for each of them, one by one!

For more details about excluding your content from AI training, please visit this page: https://wordpress.com/support/privacy-settings/make-your-website-public/#prevent-third-party-sharing.
GitHub Copilot
IMPORTANT: GitHub Copilot uses public repositories with permissive licenses for training its AI models. Here are three ways to prevent GitHub Copilot from using your code for training:
- Do not use GitHub: The most effective way to ensure your code is not used for AI training is to avoid using GitHub altogether.
- Keep repositories private: By keeping your repositories private, you can prevent GitHub Copilot from accessing and using your code for training purposes.
- Use restrictive licenses for public repositories: For public repositories, you can use a restrictive license or even no license to avoid giving explicit permission for your code to be used for AI training. This can help limit the use of your code by GitHub Copilot.
If your organization uses GitHub Copilot, you can exclude certain content from being used by it. For more details, visit this page.
GitHub - Your Repositories In Datasets Used By Popular LLMs
Many LLMs (Large Language Models) use The Stack v2 dataset for training. This dataset is a collection of public source code in over 600 programming languages, about 67 terabytes in size. It includes source code from GitHub repositories with permissive licenses.
You can prevent your code (from your public GitHub repositories) from being included in The Stack v2 dataset and thereby exclude it from being used for AI training.
- First, to check whether your repositories are included in The Stack dataset, visit https://huggingface.co/spaces/bigcode/in-the-stack and enter your GitHub username.
- If your repositories are included, create a new issue at https://github.com/bigcode-project/opt-out-v2/issues/new?&template=opt-out-request.md to request the exclusion of your repositories from future versions of Stack v2 dataset.
If you want to learn more about Stack v2 dataset, please visit https://huggingface.co/datasets/bigcode/the-stack-v2.
Substack
IMPORTANT: By default, Substack does not block AI bots from using your content for training purposes.
To prevent AI bots from using your Substack content for training, you need to enable the "Block AI training" option. This option is available on the "Settings" page of your Substack account.
Follow these steps to enable the "Block AI training" option:
- Log in to your Substack account and navigate to the "Settings" page.
- Scroll down to the "Block AI training" section.
- Enable the "Block AI training" option by toggling the switch.
Please note that this option only updates the public rules for AI bots. Some AI bots have been observed not respecting such public rules (like robots.txt and similar access rules). Therefore, if your Substack page is public, some AI bots may still access your content despite this option. Because of this, the option does not guarantee that your content will not be used for AI training.
Add rules for AI bots to your website
If you are running a commercial website to promote your business, sell products, or provide a service, you may want to allow your website to be indexed by AI search engines so that regular users can easily discover it via AI when needed.
However, if you are running a website where you don't want your content to be captured by web bots (or you want this content to be available exclusively through your website), then you first need to create a robots.txt file that defines rules for web bots. If your website has no robots.txt, web bots (including those used by AI) will assume they are allowed to read and learn from your website without any restrictions.
Here are some examples of how to use the robots.txt file to control the access of web bots, including AI bots, to your website:
Example 1: Disallow All Bots
User-agent: *
Disallow: /
This rule tells all bots that they are not allowed to access any part of your website.
Example 2: Allow All Bots
User-agent: *
Disallow:
This rule tells all bots that they are allowed to access all parts of your website.
Example 3: Disallow Specific Bots
User-agent: CCBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
These rules tell specific bots (in this case, "CCBot", used by Common Crawl, whose data is used by ChatGPT and many others, and "anthropic-ai", used by Anthropic) that they are not allowed to access any part of your website.
Example 4: Disallow All Bots from Specific Directories
User-agent: *
Disallow: /private/
Disallow: /tmp/
This rule tells all bots that they are not allowed to access the "/private/" and "/tmp/" directories of your website.
IMPORTANT: this rule is voluntary, and some web and AI bots may ignore it and read these directories despite it.
Example 5: Allow Specific Bots
User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /
This rule tells Googlebot (which is used by Google Search indexing) that it is allowed to access all parts of your website, while all other bots (matched by *) are not allowed to access any part of your website.
By using these examples, you can customize the robots.txt file to control which bots can access your website and which parts of your website they can access.
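Crawler names change over time, so check each AI vendor's documentation for its current user agents, but as of late 2024 a combined robots.txt targeting several publicly documented AI crawlers (GPTBot for OpenAI, Google-Extended for Google's AI training, CCBot for Common Crawl, anthropic-ai and ClaudeBot for Anthropic, PerplexityBot for Perplexity) might look like this:
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
This blocks only the listed AI crawlers; regular search engine bots such as Googlebot are unaffected, so your site stays indexable.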
That's why we've created this free tool that allows you to quickly create a robots.txt file that disallows AI bots from reading and learning from your website.
IMPORTANT: while the robots.txt file is the standard way to define rules for web bots, it is not a perfect solution. The file is essentially a set of voluntary rules, and some AI bots have been caught not respecting them, accessing and capturing content from websites despite the rules defined in robots.txt. For stronger protection (if you need it), please consider adding login/password protection to your website.
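Before publishing your robots.txt, you can verify that its rules behave as you expect by testing them locally with Python's standard urllib.robotparser module. This is a minimal sketch using the rules from Example 3 above; the URLs are placeholders:

```python
from urllib.robotparser import RobotFileParser

# Rules from Example 3: block CCBot and anthropic-ai, leave everyone else alone
rules = """
User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The blocked AI bots cannot fetch any page...
print(parser.can_fetch("CCBot", "https://example.com/articles/post1"))      # False
print(parser.can_fetch("anthropic-ai", "https://example.com/"))             # False

# ...while bots without a matching rule are allowed by default
print(parser.can_fetch("Googlebot", "https://example.com/articles/post1"))  # True
```

Note that can_fetch() returns True for any bot that has no matching User-agent group, which mirrors how crawlers treat a missing or empty robots.txt.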
Frequently Asked Questions
1. What does it mean to "opt out" of AI training?
Opting out of AI training generally means preventing companies from using your data—such as posts, comments, or source code—to train their AI models. If you find it uncomfortable or invasive that your personal or creative content is used without explicit permission, you can follow certain steps to limit or block data collection.
2. How do I disable AI training using my LinkedIn data?
LinkedIn provides a dedicated page to disable data usage for AI. Simply visit this LinkedIn settings page and toggle off the option to use your data for AI. If this option isn’t visible (particularly in the EU), your data is not being used due to local regulations like the European Artificial Intelligence Act.
3. Is there a way to opt out of AI data usage on Meta (Facebook and Instagram)?
There’s no straightforward “off” switch for AI training on Meta’s platforms. However, you can submit a request through this Meta form. Keep in mind that this process isn’t as clear-cut as toggling a setting; it involves Meta reviewing your request for data exclusion.
4. Can I prevent X (Twitter) from using my posts for AI training?
Yes. X provides a direct way to opt out of its AI model, Grok. Visit this page after logging in, and disable the option to use your content for AI training. This ensures that your tweets are no longer fed into Grok AI’s datasets.
5. Does OpenAI ChatGPT use my data for training by default?
If you use ChatGPT without logging in, your inputs and outputs may be collected for AI model improvements. Once logged in, go to “Settings” > “Data Controls” > “Improve the model for everyone” and switch it off. This setting helps prevent OpenAI from using your conversation data to train ChatGPT.
6. Does Anthropic Claude automatically train on my conversations?
Yes, by default, Anthropic may collect user prompts and responses for training, especially in the free version. Paid users, according to Anthropic’s commercial terms, may have more safeguards. Also, flagged content can be stored from 2 to 7 years for safety checks. Deleting your conversation will remove prompts and outputs from Anthropic’s systems within 30 days, unless you submit a separate data deletion request.
7. How can I prevent Perplexity AI from retaining and training on my data?
When using Perplexity AI without an account, your data may be collected for model improvements. After logging in, visit your account settings and toggle off the “AI Data Retention” option to stop Perplexity from storing your prompts for training.
8. Why is there no single toggle to opt out on Reddit?
Reddit’s Terms of Use grant a broad license allowing the platform and partners to use user content for various purposes, including AI training. You cannot directly opt out, but you can remove your posts and comments. Check this page for deleting or accessing your Reddit data.
9. How do I stop WordPress.com from sharing my website content for AI training?
By default, WordPress.com uses your site content for AI features and may share it with third parties. Go to your WordPress.com admin dashboard (e.g., https://YOURSITE.wordpress.com/wp-admin/) > Settings > General or Hosting > Site Settings and enable "Prevent third-party sharing". Repeat for each of your WordPress.com sites.
10. Can I prevent GitHub Copilot from training on my code?
Copilot primarily trains on public repositories with permissive licenses. Ways to limit your code’s exposure include using private repos, refraining from GitHub altogether, or adopting a restrictive license. Organizations can also exclude content through GitHub’s content exclusion settings.
11. Are my public GitHub repositories in The Stack v2 dataset for AI?
Many popular code-oriented LLMs train on The Stack v2, a huge dataset of public source code. You can check your repositories at this link. If included, submit an opt-out request at this GitHub issue form to remove your repos from future data releases.
12. Will using robots.txt or “Block AI training” on Substack fully stop AI bots?
Robots.txt and Substack’s “Block AI training” feature are good first steps but not guarantees. Some bots ignore robots.txt and continue crawling. If you need stronger protection, consider private or restricted-access methods (passwords, paywalls, etc.). Substack’s block setting updates a public rule but cannot force third-party bots to comply.