Tuesday, July 4, 2023

Data Scraping

 

Google’s newly updated Privacy Policy reads:

publicly accessible sources 

“For example, we may collect information that’s publicly available online or from other public sources to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities. Or, if your business’s information appears on a website, we may index and display it on Google services;”

Not all publicly available data are mere facts or government-published information that can be legally mined. Further, all content publicly available on the Internet is copyrightable in one way or another, unless it is explicitly stated not to be. Still, the problem isn’t the use of publicly available data to train the likes of Bard, or AI generally, which is indispensable; it is presumed that even OpenAI (which wasn’t even a search platform hosting data) and others might have done the same. Nor is the question any longer restricted to the copyright-related issues already discussed, with examples, in one of my previous blog posts. The question is whether such services/products ever explicitly affirm that no derivative, translated, reproduced or replicated work (reciprocal rights, after all) will be published or generated, including for subscribed members; and, if such output is generated, that it will mention the name of the original author from whose work it was derived and be for private use only (until AI becomes conscious enough to be considered a person and a person interested), except in cases of scientific research and data analysis, report generation, content aggregation, fair use or transformative content.

How many of the above issues have even been considered ethically by anyone while adopting the likes of data scraping, besides going through the legal aspects of the likes of the GDPR, the proposed DPDP, the CFAA, the IT Acts, browsewrap and clickwrap agreements, etc.?

I’ve always contended that Intellectual Property isn’t about exploitation, BUT essentially about knowing the whereabouts of the original author, artist or inventor. And there are original authors, artists and inventors who make their content/products/processes (including APIs) available absolutely FREE, without login authorization, for readers/users to learn, read, write and grow. Thus, if any AI service/product (which hasn’t become conscious or self-aware and cannot yet be considered a person) has been built without taking the above issues into consideration, is such data usage even a healthy business model?

A true and first originator isn’t the one who first imports a work, or to whom it is communicated, or who scrapes it for use (unless consent has been taken or the originator’s name is mentioned). Thus, data scraping in itself isn’t the issue, as it is used by almost all institutions for refined Business Intelligence. But as AI is indispensable, its refined evolution too is unachievable unless the above issues are addressed.

© Pranav Chaturvedi
