task150_afs_argument_quality_gun_control.json #615

Open
danyaljj opened this issue Nov 14, 2021 · 8 comments

@danyaljj
Contributor

The Invalid|Valid classification task is heavily skewed.

@liusiyi641, wondering if you can make the instances of this task more balanced?

@RushangKaria
Contributor

RushangKaria commented Nov 20, 2021

@liusiyi641 @danyaljj I have the source code for this task (I created it). LMK if you want the source code or want me to take it on instead.

(See edit below).

@RushangKaria
Contributor

EDIT: I think I confused this task's name with another task's. Sorry for the confusion!

@Palipoor
Contributor

Palipoor commented Feb 8, 2022

I took a look at the data. Each argument has five yes/no votes. It seems the "invalid" arguments in the task are the ones that scored 0, 1, or 2, and the ones that scored 3, 4, or 5 are the "valid" ones.
If I downsample the valid ones, the whole dataset will be 160 instances (or a bit more if we give up some entropy, i.e., accept a slightly imbalanced label distribution). I can also change the criteria and label the ones that scored 3 as "invalid" (I read some of them, and they weren't good arguments); that way we can have a ~500-instance task.
@danyaljj What do you think?
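
For concreteness, here is a minimal Python sketch of the binarization and downsampling described above. The record fields, example arguments, and threshold are assumptions for illustration; the actual preprocessing scripts aren't shown in this thread.

```python
import random

random.seed(0)  # reproducible downsampling

# Hypothetical records; in practice these would come from the AFS source
# data, where each argument has five yes/no quality votes (score 0-5).
records = [
    {"argument": "Background checks reduce accidental deaths because ...", "score": 4},
    {"argument": "You're just wrong.", "score": 1},
    {"argument": "This is akin to learning hard work at a fast food job ...", "score": 3},
]

# Current criterion: score >= 3 is "Valid". Raising the threshold to 4
# (so that score-3 arguments become "Invalid") is the proposed change
# that would yield the ~500-instance variant.
THRESHOLD = 3

labeled = [
    {"input": r["argument"],
     "output": "Valid" if r["score"] >= THRESHOLD else "Invalid"}
    for r in records
]

valid = [x for x in labeled if x["output"] == "Valid"]
invalid = [x for x in labeled if x["output"] == "Invalid"]

# Downsample the majority class so both labels appear equally often.
n = min(len(valid), len(invalid))
balanced = random.sample(valid, n) + random.sample(invalid, n)
random.shuffle(balanced)
print(balanced)
```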

@danyaljj
Contributor Author

danyaljj commented Feb 9, 2022

Hmm ... it's a bit difficult for me to judge this without seeing the data/examples. Do you think the task is well-defined? Can average humans solve it (i.e., score relatively high, though not necessarily perfectly)?

@liusiyi641
Contributor

Sorry, I just saw this. I remember I binarized the task so that it was better defined and more comprehensible for humans. I agree that changing the criteria could be a good idea, @Palipoor. Indeed, some of the examples scored 3 aren't good enough.

Some examples of arguments scored 3: "This is akin to someone learning the value of hard work while working at a fast food place and then applying that value when they work professionally as a furniture maker."; "Are you going to suggest that the lawmakers shouldn't be trusted to be armed for their own defense?"

@Palipoor
Contributor

Palipoor commented Feb 9, 2022

@danyaljj I think the task is well-defined. We can't expect humans to give a 1-5 score, but good and bad arguments are definitely distinguishable.
@liusiyi641 Can you share the scripts you used with me so I can change the criteria for all the tasks from this dataset? Or can you do it yourself?

@danyaljj
Contributor Author

> @danyaljj I think the task is well-defined. We can't expect humans to give a 1-5 score, but good and bad arguments are definitely distinguishable.

Just seeing this, sorry! You're basically suggesting we collapse the 5 labels into two labels (good and bad), right?

@Palipoor
Contributor

Palipoor commented Mar 1, 2022

Right!
