
About original raw data and intermediate data that has been transformed


@bgarcial wrote:

I want to use the Cookiecutter Data Science project structure for my project. It looks great: http://drivendata.github.io/cookiecutter-data-science/

I am analyzing the different directories in that structure and I have some questions related to the different data stages. The README.md file [sets out the difference between external, interim, processed and raw data.][1]

    ├── data
    │   ├── external       <- Data from third party sources.
    │   ├── interim        <- Intermediate data that has been transformed.
    │   ├── processed      <- The final, canonical data sets for modeling.
    │   └── raw            <- The original, immutable data dump.

I am working on a project in which the data originate from sensors and are managed via a web application dashboard. Additionally, I have performed some JOINs on an SQL database dump in order to extract other features or data that I need before I can start working.

What is the difference between raw data and external data?
Does the extraction process I describe above, i.e. the way in which I obtain the data, mean that they should be cataloged as raw data?

Why aren’t these considered external data?

Would they only be considered external data if I obtained them from sources outside my organization, which owns the sensors and administers the web application dashboard?

About raw data
The guide especially emphasizes:

Don’t ever edit your raw data, especially not manually, and especially not in Excel. Don’t overwrite your raw data. Don’t save multiple versions of the raw data. Treat the data (and its format) as immutable. The code you write should move the raw data through a pipeline to your final analysis

I understand this best practice :slight_smile:
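
For example, this is how I understand that pipeline in terms of the layout above. It is only a sketch: the cleaning and aggregation steps are placeholders I made up, and the file names are the ones from my example below.

# Sketch of a pipeline that moves the raw dump through interim/ to processed/
# without ever touching the file in raw/.
from pathlib import Path

import pandas as pd

DATA = Path("data")

# 1. Read the immutable raw dump (read-only, never overwritten).
raw = pd.read_csv(DATA / "raw" / "fruit-RawData.csv")

# 2. Intermediate stage: e.g. drop duplicates and keep the useful columns.
interim = raw.drop_duplicates()[["weight", "date", "number"]]
interim.to_csv(DATA / "interim" / "fruits.csv", index=False)

# 3. Final stage: e.g. aggregate weight per day for modeling (placeholder step).
processed = (
    interim.assign(date=pd.to_datetime(interim["date"]).dt.date)
    .groupby("date", as_index=False)["weight"]
    .sum()
)
processed.to_csv(DATA / "processed" / "daily_weight.csv", index=False)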

To illustrate my question, I want to select some columns from a sample of the dataset I am working with:

I read a raw dataset which I extracted using SQL joins, so to that extent the data have already been changed.

These are my raw data:

import pandas as pd

# Read the raw dataset
data = pd.read_csv('fruit-RawData.csv')
data.head()


   weight                 date  number        lat        lng farmName
0    3.09  2012-07-27 07:08:58      15  57.766231 -16.762676    Totti
1    1.50  2012-07-27 07:09:01      15  57.766231 -16.762676    Totti
2   10.50  2012-07-27 07:09:02      15  57.766231 -16.762676    Totti
3    2.50  2012-07-27 07:09:04      15  57.766231 -16.762676    Totti
4    6.50  2012-07-27 07:09:06      15  57.766231 -16.762676    Totti

If I select only the weight, date and number columns …

# Keep only the columns of interest and write them to a new file
data = data[['weight','date','number']]
data.to_csv('fruits.csv', sep=',', header=True, index=False)

And I get:

   weight                 date  number
0   23.09  2012-07-27 07:08:58       5
1   30.50  2012-07-27 07:08:58       5
2   19.50  2012-07-27 07:08:58       5
3   25.50  2012-07-27 07:08:58       5
4   26.50  2012-07-27 07:08:58       5

Could this data subset be considered intermediate data that has been transformed, or is it still raw data?

I don’t know whether these questions are valid.
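
In any case, this is roughly how I am placing these files under the layout above for now. It is only a sketch, and whether interim/ is the right home for the subset is exactly what I am asking:

from pathlib import Path

import pandas as pd

DATA = Path("data")

# The immutable raw dump extracted with the SQL joins (never overwritten).
data = pd.read_csv(DATA / "raw" / "fruit-RawData.csv")

# The column subset, written somewhere other than raw/ (here: interim/).
subset = data[["weight", "date", "number"]]
subset.to_csv(DATA / "interim" / "fruits.csv", sep=",", header=True, index=False)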
