{"id":92708,"date":"2023-08-14T12:07:08","date_gmt":"2023-08-14T12:07:08","guid":{"rendered":"https:\/\/www.techopedia.com"},"modified":"2023-10-31T09:42:51","modified_gmt":"2023-10-31T09:42:51","slug":"the-pitfalls-of-training-ai-with-made-up-data","status":"publish","type":"post","link":"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data","title":{"rendered":"The Pitfalls of Training AI With Made-Up Data"},"content":{"rendered":"<p>AI is growing up, entering our lives and the workplace as the idea of an Einstein in your pocket catches on.<\/p>\n<p>Whether it is writing an essay, creating complex artwork, reviewing policies, creating custom code, or writing an after-dinner speech for you, AI is already beginning to transform how we work and live.<\/p>\n<p>However, <a href=\"https:\/\/www.techopedia.com\/definition\/190\/artificial-intelligence-ai\">artificial intelligence<\/a> (AI) depends on data to do what it does.<\/p>\n<p>Take the prompt &#8220;Create me a picture of a rose&#8221;. Before it can get to work, the AI first needs to learn from the data available to it.<\/p>\n<p>It needs to learn about the typical rose shape, colors, design, petal arrangement \u2014 all the characteristics that make a rose a rose.<\/p>\n<p>What is the source of the data from which it learns? 
Often, it is <a href=\"https:\/\/www.techopedia.com\/definition\/34633\/generative-ai\">AI-generated data<\/a>, or <a href=\"https:\/\/www.techopedia.com\/definition\/33305\/synthetic-data\">synthetic data<\/a>.<\/p>\n<h2><span id=\"training_an_artificial_intelligence\">Training an Artificial Intelligence<\/span><\/h2>\n<p>While our focus today is training an AI system with AI-generated data, an AI system is generally trained with a mix of AI-generated and real-world data.<\/p>\n<p>The process is shaped by the legal, ethical, and confidentiality constraints on acquiring real-world data.<\/p>\n<p>But data is critical if you are to generate realistic AI systems \u2014 <a href=\"https:\/\/www.techopedia.com\/from-humans-to-ai-meet-the-futuristic-news-anchors-of-tomorrow\">synthetic news readers, for example<\/a> \u2014 and given the lack of real-world data, generating synthetic data, which imitates real-world data, becomes vital.<\/p>\n<p>For example, an AI system might be able to generate a detailed image of a cockpit in an airplane, but it will not exactly match the image of a real-world cockpit.<\/p>\n<h3>Step 1: Generating Synthetic Data<\/h3>\n<p>The source AI system generates synthetic data that is used to train the target AI model, which could be a <a href=\"https:\/\/www.techopedia.com\/definition\/5967\/artificial-neural-network-ann\">neural network<\/a> or another <a href=\"https:\/\/www.techopedia.com\/machine-learning-algorithm-or-machine-learning-model\/7\/34855\">machine learning algorithm<\/a>.<\/p>\n<p>The synthetic data should be as close as possible to real-world data so that the target AI system can learn about the object the data describes. The model learns about things like shapes, colors, and configuration details.<\/p>\n<h3>Step 2: Training Data Preparation<\/h3>\n<p>The synthetic data is mixed with appropriate real-world data. 
For example, an AI-generated image of an airplane cockpit dashboard is combined with an actual image of a cockpit dashboard.<\/p>\n<p>This is an opportunity for the <a href=\"https:\/\/www.techopedia.com\/definition\/8181\/machine-learning-ml\">AI learning model<\/a> to learn from the data. It can learn not only to identify the component parts of the data, for example, the fuel meter and the altimeter, but also to distinguish between synthetic and real-world data.<\/p>\n<h3>Step 3: Training the AI Model<\/h3>\n<p>The target <a href=\"https:\/\/www.techopedia.com\/prompt-learning-a-new-way-to-train-foundation-models-in-ai\/2\/34793\">AI model learns from the mixed data set<\/a>.<\/p>\n<p>Suppose the objective is to enable the AI model to learn about images of different types of dogs. An acceptable result is that it can identify the dogs\u2019 breeds and categorize them as sheepdogs, hounds, etc.<\/p>\n<p>The AI model is provided with a limited collection of real dog images and a wider collection of synthetic data.<\/p>\n<p>The learning model studies the various characteristics and parameters and learns to recognize patterns and draw inferences.<\/p>\n<p>For example, dogs with short tails might be identified as Dobermans, or those with prominent and acutely triangular ears might be identified as German Shepherds.<\/p>\n<p>The learning model also learns not to overgeneralize from a single parameter. 
For example, Dobermans may have short tails, but not all dogs with short tails are Dobermans.<\/p>\n<h2><span id=\"using_data_in_the_real_world\">Using Data in the Real World<\/span><\/h2>\n<p>One of the most notable real-world examples of AI trained on AI-generated data is PilotNet, the self-driving car project by <a href=\"https:\/\/developer.nvidia.com\/blog\/explaining-deep-learning-self-driving-car\/\" target=\"_blank\">NVIDIA<\/a>.<\/p>\n<p>PilotNet is a deep learning system that learns about real-time driving both from synthetic data and from observing human drivers, who drive a special car designed to collect data on driving, road conditions, traffic signs, lane markings, vehicles, and pedestrians.<\/p>\n<p>Driving is a complex task because it involves both skill and decision-making within an extremely short period of time. As the human driver drives the car, PilotNet gathers data, and the relevant data is marked as highlighted pixels.<\/p>\n<p>The deep learning system behind the self-driving car must control the driving based on the highlighted pixels that identify various objects on the road, such as pedestrians, traffic signals, and vehicles.<\/p>\n<h2><span id=\"benefits_of_synthetic_data\">Benefits of Synthetic Data<\/span><\/h2>\n<p>The main <a href=\"https:\/\/neptune.ai\/blog\/the-advantages-of-synthetic-data-over-real-data\" target=\"_blank\">benefits<\/a> of training AI with synthetic data are:<\/p>\n<ul>\n<li>As stated, real-life data is hard to acquire because of various constraints, making synthetic data your best bet. High-quality synthetic data that comes as close as possible to real data is the best source of learning for AI learning models.<\/li>\n<li>With synthetic data, you don\u2019t have the risks of confidentiality or secrecy breaches that come with real-life data. Real-life data, even when legally obtained with consent, comes with strings attached.<\/li>\n<li>Synthetic data makes it possible to explore many different scenarios. 
For example, for a self-driving car, synthetic data can help explore driving on a congested street or a highway &#8211; without needing to get on the road.<\/li>\n<\/ul>\n<h2><span id=\"limitations_and_issues\">Limitations and Issues<\/span><\/h2>\n<p>Synthetic data is a limitation as well as an advantage because it is <em>not<\/em> real-world data, regardless of quality.<\/p>\n<p>An <a href=\"https:\/\/www.techopedia.com\/foundation-models-ais-next-frontier\/2\/34781\">AI model<\/a> can take longer to learn about real-world objects with synthetic data.<\/p>\n<p>Synthetic data can also contain errors and bias that lead to unintended training outcomes because the data doesn\u2019t match real-world use cases.<\/p>\n<p>For example, synthetic data on credit scores and loan applications may contain incorrect or <a href=\"https:\/\/www.techopedia.com\/can-ai-have-biases\/2\/34037\">biased data<\/a> that disadvantages specific communities, or be inaccurate because it\u2019s out of sync with the latest changes in data laws.<\/p>\n<p>The outcome could be not only unintended but also dangerous.<\/p>\n<p>However, synthetic data, despite its limits, is still the best available data source from which AI models can learn.<\/p>\n<p>Even so, business organizations might be extremely wary about using AI in sensitive use cases such as medical treatment, social issues, and loan applications.<\/p>\n<h2><span id=\"the_bottom_line\">The Bottom Line<\/span><\/h2>\n<p>Acquiring real-world data is a major hindrance to training AI models, and data acquisition faces obstacles in many forms.<\/p>\n<p>Considering AI can do remarkable things, major organizations such as governments, corporations, and research institutions need to work out how to enable AI systems to parse real-time data and strip out parts that, if processed, might cause real-world problems.<\/p>\n<p>In the meantime, synthetic data \u2014 used carefully \u2014 is better than 
nothing.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI is growing up, entering our lives and the workplace as the idea of an Einstein in your pocket catches on. Whether it is writing an essay, creating complex artwork, reviewing policies, creating custom code, or writing an after-dinner speech for you, it&#8217;s already beginning to transform how we work and live. However, artificial intelligence [&hellip;]<\/p>\n","protected":false},"author":7870,"featured_media":92947,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_lmt_disableupdate":"","_lmt_disable":"","om_disable_all_campaigns":false,"footnotes":""},"categories":[573,599],"tags":[],"category_partsoff":[],"class_list":["post-92708","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence","category-machine-learning"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v24.2 (Yoast SEO v24.5) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>The Pitfalls of Training AI With Made-up Data<\/title>\n<meta name=\"description\" content=\"Acquiring real-world data seems to be a major hindrance in the learning of AI models. So what do we do? We use synthetic data.\" \/>\n<meta name=\"robots\" content=\"noindex, follow\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"The Pitfalls of Training AI With Made-Up Data\" \/>\n<meta property=\"og:description\" content=\"Acquiring real-world data seems to be a major hindrance in the learning of AI models. So what do we do? 
We use synthetic data.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data\" \/>\n<meta property=\"og:site_name\" content=\"Techopedia\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/techopedia\/\" \/>\n<meta property=\"article:author\" content=\"https:\/\/www.facebook.com\/techalpine\" \/>\n<meta property=\"article:published_time\" content=\"2023-08-14T12:07:08+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-10-31T09:42:51+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.techopedia.com\/wp-content\/uploads\/2023\/08\/artificial_intelligence_08.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"600\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Kaushik Pal\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@https:\/\/twitter.com\/techalpine\" \/>\n<meta name=\"twitter:site\" content=\"@techopedia\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Kaushik Pal\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data\"},\"author\":{\"name\":\"Kaushik Pal\",\"@id\":\"https:\/\/www.techopedia.com\/#\/schema\/person\/d7df6d38ac044fdfba1f7ca2eb5f4034\"},\"headline\":\"The Pitfalls of Training AI With Made-Up Data\",\"datePublished\":\"2023-08-14T12:07:08+00:00\",\"dateModified\":\"2023-10-31T09:42:51+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data\"},\"wordCount\":1018,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.techopedia.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.techopedia.com\/wp-content\/uploads\/2023\/08\/artificial_intelligence_08.png\",\"articleSection\":\"\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data#respond\"]}],\"copyrightYear\":\"2023\",\"copyrightHolder\":{\"@id\":\"https:\/\/www.techopedia.com\/#organization\"}},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data\",\"url\":\"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data\",\"name\":\"The Pitfalls of Training AI With Made-up 
Data\",\"isPartOf\":{\"@id\":\"https:\/\/www.techopedia.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.techopedia.com\/wp-content\/uploads\/2023\/08\/artificial_intelligence_08.png\",\"datePublished\":\"2023-08-14T12:07:08+00:00\",\"dateModified\":\"2023-10-31T09:42:51+00:00\",\"description\":\"Acquiring real-world data seems to be a major hindrance in the learning of AI models. So what do we do? We use synthetic data.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data#primaryimage\",\"url\":\"https:\/\/www.techopedia.com\/wp-content\/uploads\/2023\/08\/artificial_intelligence_08.png\",\"contentUrl\":\"https:\/\/www.techopedia.com\/wp-content\/uploads\/2023\/08\/artificial_intelligence_08.png\",\"width\":1200,\"height\":600,\"caption\":\"An AI-powered android navigating the real world.\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.techopedia.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Artificial Intelligence\",\"item\":\"https:\/\/www.techopedia.com\/ai\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Machine Learning\",\"item\":\"https:\/\/www.techopedia.com\/topic\/318\/machine-learning\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"The Pitfalls of 
Training AI With Made-Up Data\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.techopedia.com\/#website\",\"url\":\"https:\/\/www.techopedia.com\/\",\"name\":\"Techopedia\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/www.techopedia.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.techopedia.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.techopedia.com\/#organization\",\"name\":\"Techopedia\",\"url\":\"https:\/\/www.techopedia.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.techopedia.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.techopedia.com\/wp-content\/uploads\/2025\/02\/techopedia-light-logo.svg\",\"contentUrl\":\"https:\/\/www.techopedia.com\/wp-content\/uploads\/2025\/02\/techopedia-light-logo.svg\",\"caption\":\"Techopedia\"},\"image\":{\"@id\":\"https:\/\/www.techopedia.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/techopedia\/\",\"https:\/\/x.com\/techopedia\",\"https:\/\/www.linkedin.com\/company\/techopedia\/\",\"https:\/\/www.youtube.com\/c\/Techopedia\"],\"publishingPrinciples\":\"https:\/\/www.techopedia.com\/about\/editorial-policy\",\"ownershipFundingInfo\":\"https:\/\/www.techopedia.com\/about\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.techopedia.com\/#\/schema\/person\/d7df6d38ac044fdfba1f7ca2eb5f4034\",\"name\":\"Kaushik 
Pal\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.techopedia.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/www.techopedia.com\/wp-content\/uploads\/2025\/02\/kaushik-pal-e-150x150.jpg\",\"contentUrl\":\"https:\/\/www.techopedia.com\/wp-content\/uploads\/2025\/02\/kaushik-pal-e-150x150.jpg\",\"caption\":\"Kaushik Pal\"},\"description\":\"Kaushik is a Technical Architect and Software Consultant with over 23 years of experience in software analysis, development, architecture, design, testing and training. He has an interest in new technologies and areas of innovation. He focuses on web architecture, web technologies, Java\/J2EE, open source software, WebRTC, big data and semantic technologies. He has demonstrated expertise in requirements analysis, architecture design and implementation, technical use cases and software development. His experience has spanned across industries like insurance, banking, airlines, shipping, document management and product development etc. He has worked on a wide range of technologies ranging from large scale (IBM S\/390), mid scale (AS\/400), web technologies, open source and big data. Kaushik is primarily involved in Java\/J2EE\/Open Source\/Web\/WebRTC\/Hadoop and Big Data technologies. Kaushik also founded TechAlpine, a technology blogging\/consulting firm in Kolkata. The TechAlpine team works with multiple clients in India and abroad and has expertise in Java\/J2EE\/open source\/web\/webRTC\/Hadoop\/big data technologies and technical writing.\",\"sameAs\":[\"https:\/\/techalpine.com\/\",\"https:\/\/www.facebook.com\/techalpine\",\"https:\/\/in.linkedin.com\/in\/kaushik-pal-36b36915\",\"https:\/\/x.com\/https:\/\/twitter.com\/techalpine\",\"https:\/\/www.youtube.com\/channel\/UCKSE-GsRz1GrGhvzy6hYHNQ?view_as=subscriber\"],\"knowsAbout\":[\"Technology Specialist\"],\"url\":\"https:\/\/www.techopedia.com\/contributors\/kaushik-pal\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"The Pitfalls of Training AI With Made-up Data","description":"Acquiring real-world data seems to be a major hindrance in the learning of AI models. So what do we do? We use synthetic data.","robots":{"index":"noindex","follow":"follow"},"og_locale":"en_US","og_type":"article","og_title":"The Pitfalls of Training AI With Made-Up Data","og_description":"Acquiring real-world data seems to be a major hindrance in the learning of AI models. So what do we do? We use synthetic data.","og_url":"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data","og_site_name":"Techopedia","article_publisher":"https:\/\/www.facebook.com\/techopedia\/","article_author":"https:\/\/www.facebook.com\/techalpine","article_published_time":"2023-08-14T12:07:08+00:00","article_modified_time":"2023-10-31T09:42:51+00:00","og_image":[{"width":1200,"height":600,"url":"https:\/\/www.techopedia.com\/wp-content\/uploads\/2023\/08\/artificial_intelligence_08.png","type":"image\/png"}],"author":"Kaushik Pal","twitter_card":"summary_large_image","twitter_creator":"@https:\/\/twitter.com\/techalpine","twitter_site":"@techopedia","twitter_misc":{"Written by":"Kaushik Pal","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data#article","isPartOf":{"@id":"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data"},"author":{"name":"Kaushik Pal","@id":"https:\/\/www.techopedia.com\/#\/schema\/person\/d7df6d38ac044fdfba1f7ca2eb5f4034"},"headline":"The Pitfalls of Training AI With Made-Up Data","datePublished":"2023-08-14T12:07:08+00:00","dateModified":"2023-10-31T09:42:51+00:00","mainEntityOfPage":{"@id":"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data"},"wordCount":1018,"commentCount":0,"publisher":{"@id":"https:\/\/www.techopedia.com\/#organization"},"image":{"@id":"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data#primaryimage"},"thumbnailUrl":"https:\/\/www.techopedia.com\/wp-content\/uploads\/2023\/08\/artificial_intelligence_08.png","articleSection":"","inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data#respond"]}],"copyrightYear":"2023","copyrightHolder":{"@id":"https:\/\/www.techopedia.com\/#organization"}},{"@type":"WebPage","@id":"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data","url":"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data","name":"The Pitfalls of Training AI With Made-up 
Data","isPartOf":{"@id":"https:\/\/www.techopedia.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data#primaryimage"},"image":{"@id":"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data#primaryimage"},"thumbnailUrl":"https:\/\/www.techopedia.com\/wp-content\/uploads\/2023\/08\/artificial_intelligence_08.png","datePublished":"2023-08-14T12:07:08+00:00","dateModified":"2023-10-31T09:42:51+00:00","description":"Acquiring real-world data seems to be a major hindrance in the learning of AI models. So what do we do? We use synthetic data.","breadcrumb":{"@id":"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data#primaryimage","url":"https:\/\/www.techopedia.com\/wp-content\/uploads\/2023\/08\/artificial_intelligence_08.png","contentUrl":"https:\/\/www.techopedia.com\/wp-content\/uploads\/2023\/08\/artificial_intelligence_08.png","width":1200,"height":600,"caption":"An AI-powered android navigating the real world."},{"@type":"BreadcrumbList","@id":"https:\/\/www.techopedia.com\/the-pitfalls-of-training-ai-with-made-up-data#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.techopedia.com\/"},{"@type":"ListItem","position":2,"name":"Artificial Intelligence","item":"https:\/\/www.techopedia.com\/ai"},{"@type":"ListItem","position":3,"name":"Machine Learning","item":"https:\/\/www.techopedia.com\/topic\/318\/machine-learning"},{"@type":"ListItem","position":4,"name":"The Pitfalls of Training AI With Made-Up 
Data"}]},{"@type":"WebSite","@id":"https:\/\/www.techopedia.com\/#website","url":"https:\/\/www.techopedia.com\/","name":"Techopedia","description":"","publisher":{"@id":"https:\/\/www.techopedia.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.techopedia.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.techopedia.com\/#organization","name":"Techopedia","url":"https:\/\/www.techopedia.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.techopedia.com\/#\/schema\/logo\/image\/","url":"https:\/\/www.techopedia.com\/wp-content\/uploads\/2025\/02\/techopedia-light-logo.svg","contentUrl":"https:\/\/www.techopedia.com\/wp-content\/uploads\/2025\/02\/techopedia-light-logo.svg","caption":"Techopedia"},"image":{"@id":"https:\/\/www.techopedia.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/techopedia\/","https:\/\/x.com\/techopedia","https:\/\/www.linkedin.com\/company\/techopedia\/","https:\/\/www.youtube.com\/c\/Techopedia"],"publishingPrinciples":"https:\/\/www.techopedia.com\/about\/editorial-policy","ownershipFundingInfo":"https:\/\/www.techopedia.com\/about"},{"@type":"Person","@id":"https:\/\/www.techopedia.com\/#\/schema\/person\/d7df6d38ac044fdfba1f7ca2eb5f4034","name":"Kaushik Pal","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.techopedia.com\/#\/schema\/person\/image\/","url":"https:\/\/www.techopedia.com\/wp-content\/uploads\/2025\/02\/kaushik-pal-e-150x150.jpg","contentUrl":"https:\/\/www.techopedia.com\/wp-content\/uploads\/2025\/02\/kaushik-pal-e-150x150.jpg","caption":"Kaushik Pal"},"description":"Kaushik is a Technical Architect and Software Consultant with over 23 years of experience in software analysis, development, architecture, design, testing and 
training. He has an interest in new technologies and areas of innovation. He focuses on web architecture, web technologies, Java\/J2EE, open source software, WebRTC, big data and semantic technologies. He has demonstrated expertise in requirements analysis, architecture design and implementation, technical use cases and software development. His experience has spanned across industries like insurance, banking, airlines, shipping, document management and product development etc. He has worked on a wide range of technologies ranging from large scale (IBM S\/390), mid scale (AS\/400), web technologies, open source and big data. Kaushik is primarily involved in Java\/J2EE\/Open Source\/Web\/WebRTC\/Hadoop and Big Data technologies. Kaushik also founded TechAlpine, a technology blogging\/consulting firm in Kolkata. The TechAlpine team works with multiple clients in India and abroad and has expertise in Java\/J2EE\/open source\/web\/webRTC\/Hadoop\/big data technologies and technical writing.","sameAs":["https:\/\/techalpine.com\/","https:\/\/www.facebook.com\/techalpine","https:\/\/in.linkedin.com\/in\/kaushik-pal-36b36915","https:\/\/x.com\/https:\/\/twitter.com\/techalpine","https:\/\/www.youtube.com\/channel\/UCKSE-GsRz1GrGhvzy6hYHNQ?view_as=subscriber"],"knowsAbout":["Technology 
Specialist"],"url":"https:\/\/www.techopedia.com\/contributors\/kaushik-pal"}]}},"modified_by":"vukstojkovic","_links":{"self":[{"href":"https:\/\/www.techopedia.com\/wp-json\/wp\/v2\/posts\/92708","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.techopedia.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.techopedia.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.techopedia.com\/wp-json\/wp\/v2\/users\/7870"}],"replies":[{"embeddable":true,"href":"https:\/\/www.techopedia.com\/wp-json\/wp\/v2\/comments?post=92708"}],"version-history":[{"count":0,"href":"https:\/\/www.techopedia.com\/wp-json\/wp\/v2\/posts\/92708\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.techopedia.com\/wp-json\/wp\/v2\/media\/92947"}],"wp:attachment":[{"href":"https:\/\/www.techopedia.com\/wp-json\/wp\/v2\/media?parent=92708"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.techopedia.com\/wp-json\/wp\/v2\/categories?post=92708"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.techopedia.com\/wp-json\/wp\/v2\/tags?post=92708"},{"taxonomy":"category_partsoff","embeddable":true,"href":"https:\/\/www.techopedia.com\/wp-json\/wp\/v2\/category_partsoff?post=92708"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}