[CARBONDATA-4280][Doc] Add OVERWRITE keyword explanation in load command #4213

Open: 1 commit to merge into base: master
39 changes: 27 additions & 12 deletions docs/dml-of-carbondata.md
@@ -37,13 +37,28 @@ CarbonData DML statements are documented here,which includes:
This command is used to load csv files to carbondata, OPTIONS are not mandatory for data loading process.

```
-LOAD DATA INPATH 'folder_path'
+LOAD DATA INPATH 'folder_path' [ OVERWRITE ]
INTO TABLE [db_name.]table_name
OPTIONS(property_name=property_value, ...)
```
**NOTE**:
-* Use 'file://' prefix to indicate local input files path, but it just supports local mode.
-* If run on cluster mode, please upload all input files to distributed file system, for example 'hdfs://' for hdfs.
+* Use 'file://' prefix to indicate local input files path, but it just supports local mode.
+
+* If run on cluster mode, please upload all input files to distributed file system, for example 'hdfs://' for hdfs.

* [ OVERWRITE ] :
Contributor

Suggested change
-* [ OVERWRITE ] :
+* If the OVERWRITE keyword is used, then it will overwrite the existing data in the table with new data.


 By default, new data is appended to the table. If `OVERWRITE` is used, the table is instead overwritten with new data.

 Example:

```sql
CREATE TABLE carbon_load_overwrite(id int, name string, city string, age int)
-- Review comment (Contributor Indhumathi27, Sep 28, 2021): No need to add example here. Since it is mentioned in syntax, that could be enough

STORED AS carbondata
LOAD DATA LOCAL INPATH 'filepath.csv' overwrite into table carbon_load_overwrite
```



**Supported Properties:**

@@ -266,7 +281,7 @@ CarbonData DML statements are documented here,which includes:
numPartitions = total size of input data / splitSize
```
The default value is 3, and the range is [1, 300].

Contributor

please revert all these changes below in this PR. Space related changes

Contributor Author

OK, I will do it.

```
OPTIONS('SCALE_FACTOR'='10')
```
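
As a worked example of the partition formula in this hunk, here is a minimal Python sketch. It takes `splitSize` as a given input, since its derivation is not shown in this excerpt. The function name `num_partitions`, rounding up, and the floor of one partition are illustrative assumptions, not confirmed CarbonData behaviour.

```python
import math


def num_partitions(total_input_bytes: int, split_size_bytes: int) -> int:
    """numPartitions = total size of input data / splitSize.

    Rounding up and returning at least 1 are assumptions made for this sketch.
    """
    if split_size_bytes <= 0:
        raise ValueError("split_size_bytes must be positive")
    return max(1, math.ceil(total_input_bytes / split_size_bytes))


GIB = 1024 ** 3
# 10 GiB of input with a 1 GiB split size gives 10 range partitions in this model.
print(num_partitions(10 * GIB, 1 * GIB))
```

A larger split size means fewer, larger partitions for the same input.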
@@ -322,9 +337,9 @@ CarbonData DML statements are documented here,which includes:

Stage input files are data files written by external application (such as Flink). These files
are committed but not loaded into the table.

User can use this command to insert them into the table, thus making them visible for a query.

```
INSERT INTO <CARBONDATA TABLE> STAGE OPTIONS(property_name=property_value, ...)
```
@@ -357,10 +372,10 @@ CarbonData DML statements are documented here,which includes:
Examples:
```
INSERT INTO table1 STAGE

INSERT INTO table1 STAGE OPTIONS('batch_file_count' = '5')
Note: This command uses the default file order, will insert the earliest stage files into the table.

INSERT INTO table1 STAGE OPTIONS('batch_file_count' = '5', 'batch_file_order'='DESC')
Note: This command will insert the latest stage files into the table.
```
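
The two notes in the examples above can be pictured with a small Python sketch. It is only a model: the `StageFile` type, its `commit_time` field, and the `pick_stage_files` helper are hypothetical. It assumes that `batch_file_count` caps the number of files taken per command and that the default order is ascending (`ASC`), which matches the notes ("earliest" by default, "latest" with `DESC`).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StageFile:
    """Hypothetical stand-in for one committed stage file."""
    name: str
    commit_time: int  # assumed ordering key; the real key is not given in this excerpt


def pick_stage_files(files: List[StageFile],
                     batch_file_count: int,
                     batch_file_order: str = "ASC") -> List[StageFile]:
    """Return the files one INSERT INTO ... STAGE call would load in this model."""
    order = batch_file_order.upper()
    if order not in ("ASC", "DESC"):
        raise ValueError("batch_file_order must be 'ASC' or 'DESC'")
    # ASC (assumed default) puts the earliest files first; DESC puts the latest first.
    ordered = sorted(files, key=lambda f: f.commit_time, reverse=(order == "DESC"))
    return ordered[:batch_file_count]


staged = [StageFile(f"stage_{t}", t) for t in range(1, 8)]
print([f.name for f in pick_stage_files(staged, 5)])          # earliest five
print([f.name for f in pick_stage_files(staged, 5, "DESC")])  # latest five
```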
@@ -404,10 +419,10 @@ CarbonData DML statements are documented here,which includes:
## UPDATE AND DELETE

Since the data stored in a file system like HDFS is immutable, the update and delete in carbondata are done via maintaining two files namely:

* Insert Delta: Stores newly added rows (CarbonData file format)
* Delete Delta: Store RowId of rows that are deleted (Bitmap file format)
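
To make the two-file idea concrete, here is a conceptual Python sketch. It is not CarbonData's implementation: the `DeltaTable` class, its fields, and the use of a plain `set` in place of a bitmap are assumptions chosen for illustration. Base rows are never modified; a delete records row ids in the delete delta, and an update is modelled as a delete plus a new row in the insert delta.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, object]


class DeltaTable:
    """Toy model of update/delete over immutable storage (illustrative only)."""

    def __init__(self, base_rows: List[Row]):
        self._base = tuple(dict(r) for r in base_rows)  # written once, never changed
        self.insert_delta: List[Row] = []               # append-only new rows
        self.delete_delta: Set[int] = set()             # row ids marked as deleted

    def _all_rows(self):
        # Row ids run over the base rows first, then the insert delta.
        return enumerate(self._base + tuple(self.insert_delta))

    def _live(self, predicate: Callable[[Row], bool]):
        return [(rid, row) for rid, row in self._all_rows()
                if rid not in self.delete_delta and predicate(row)]

    def scan(self) -> List[Row]:
        """Read path: every stored row minus those listed in the delete delta."""
        return [dict(row) for _, row in self._live(lambda r: True)]

    def delete(self, predicate: Callable[[Row], bool]) -> int:
        hits = self._live(predicate)
        self.delete_delta.update(rid for rid, _ in hits)
        return len(hits)

    def update(self, predicate: Callable[[Row], bool], changes: Row) -> int:
        hits = self._live(predicate)
        for rid, row in hits:
            self.delete_delta.add(rid)                   # hide the old version
            self.insert_delta.append({**row, **changes})  # store the new version
        return len(hits)


table = DeltaTable([{"id": 1, "city": "USA"}, {"id": 2, "city": "UK"}])
table.update(lambda r: r["id"] == 1, {"city": "China"})
print(table.scan())
print(sorted(table.delete_delta))
```

In this model the read path carries the cost: every scan has to filter against the delete delta and merge in the insert delta.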

### UPDATE

This command will allow to update the CarbonData table based on the column expression and optional filter conditions.
@@ -480,13 +495,13 @@ CarbonData DML statements are documented here,which includes:
```
DELETE FROM carbontable WHERE column1 IN (SELECT column11 FROM sourceTable2 WHERE column1 = 'USA')
```

### DELETE STAGE

This command allows us to delete the data files (stage data) which is already loaded into the table.
```
DELETE FROM TABLE [db_name.]table_name STAGE OPTIONS(property_name=property_value, ...)
```
**Supported Properties:**

| Property | Description |