task150_afs_argument_quality_gun_control.json #615

Open
danyaljj opened this issue Nov 14, 2021 · 8 comments

@danyaljj
Contributor

The Invalid|Valid classification task is heavily skewed.

@liusiyi641, wondering if you can make the instances of this task more balanced?

@RushangKaria
Contributor

RushangKaria commented Nov 20, 2021

@liusiyi641 @danyaljj I have the source code for this task (I created it). LMK if you want the source code or want me to take it on instead.

(See edit below).

@RushangKaria
Contributor

EDIT: I think I confused this task's name with another task's. Sorry for the confusion!

@Palipoor
Contributor

Palipoor commented Feb 8, 2022

I took a look at the data. Each argument has five yes/no votes. It seems the "invalid" arguments in the task are the ones that scored 0, 1, or 2, and the ones that scored 3, 4, or 5 are the "valid" ones.
If I downsample the valid ones, the whole dataset will be 160 instances (or a bit more if we give up some entropy, i.e., accept a slightly imbalanced label distribution). I can also change the criteria and label the ones that scored 3 as "invalid" (I read some of them, and they weren't good arguments); that way we can have a ~500-instance task.
@danyaljj What do you think?
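
For concreteness, here is a minimal Python sketch of the binarization and downsampling described above. The record fields, example arguments, and threshold are assumptions for illustration; the actual preprocessing scripts aren't shown in this thread.

```python
import random

random.seed(0)  # reproducible downsampling

# Hypothetical records; in practice these would come from the AFS source
# data, where each argument has five yes/no quality votes (score 0-5).
records = [
    {"argument": "Background checks reduce accidental deaths because ...", "score": 4},
    {"argument": "You're just wrong.", "score": 1},
    {"argument": "This is akin to learning hard work at a fast food job ...", "score": 3},
]

# Current criterion: score >= 3 is "Valid". Raising the threshold to 4
# (so that score-3 arguments become "Invalid") is the proposed change
# that would yield the ~500-instance variant.
THRESHOLD = 3

labeled = [
    {"input": r["argument"],
     "output": "Valid" if r["score"] >= THRESHOLD else "Invalid"}
    for r in records
]

valid = [x for x in labeled if x["output"] == "Valid"]
invalid = [x for x in labeled if x["output"] == "Invalid"]

# Downsample the majority class so both labels appear equally often.
n = min(len(valid), len(invalid))
balanced = random.sample(valid, n) + random.sample(invalid, n)
random.shuffle(balanced)
print(balanced)
```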

@danyaljj
Contributor Author

danyaljj commented Feb 9, 2022

Hmm ... it's a bit difficult for me to judge this without seeing the data/examples. Do you think the task is well-defined? Can average humans solve it (i.e., score relatively high, though not necessarily perfectly)?

@liusiyi641
Contributor

Sorry, I just saw this. I remember I binarized the task so that it was better defined and more comprehensible for humans. I agree that changing the criteria could be a good idea, @Palipoor. Indeed, some of the examples scored 3 aren't good enough.

Some examples of arguments scored 3: "This is akin to someone learning the value of hard work while working at a fast food place and then applying that value when they work professionally as a furniture maker."; "Are you going to suggest that the lawmakers shouldn't be trusted to be armed for their own defense?"

@Palipoor
Contributor

Palipoor commented Feb 9, 2022

@danyaljj I think the task is well-defined. We can't expect humans to give a 1-5 score, but good and bad arguments are definitely distinguishable.
@liusiyi641 Can you share the scripts you used with me so I can change the criteria for all the tasks from this dataset? Or can you do it yourself?

@danyaljj
Contributor Author

> @danyaljj I think the task is well-defined. We can't expect humans to give a 1-5 score, but good and bad arguments are definitely distinguishable.

Just seeing this, sorry! You're basically suggesting we collapse the 5 labels into two labels (good and bad), right?

@Palipoor
Contributor

Palipoor commented Mar 1, 2022

Right!
