feature/pronounce_digits #150
base: master
Changes from 2 commits
1ff2255
a58a716
bc14dae
42c0aa5
06386dc
fee093c
2dff502
@@ -104,6 +104,6 @@ venv.bak/
.mypy_cache/

# VSCod(e/ium)
.vscode/
.vscode*
vscode/
*.code-workspace

@@ -15,6 +15,8 @@
# limitations under the License.
#

from math import modf

from lingua_franca.lang.format_common import convert_to_mixed_fraction
from lingua_franca.lang.common_data_en import _NUM_STRING_EN, \
    _FRACTION_STRING_EN, _LONG_SCALE_EN, _SHORT_SCALE_EN, _SHORT_ORDINAL_EN, _LONG_ORDINAL_EN

@@ -302,6 +304,43 @@ def _long_scale(n):
    return result


def pronounce_digits_en(number, places=2, all_digits=False):
    decimal_part = ""
    op_val = ""
ChanceNCounter marked this conversation as resolved.

    result = []
    is_float = isinstance(number, float)
    if is_float:
        op_val, decimal_part = [part for part in str(number).split(".")]
ChanceNCounter marked this conversation as resolved.

        decimal_part = pronounce_number_en(
            float("." + decimal_part), places=places).replace("zero ", "")
ChanceNCounter marked this conversation as resolved.

    else:
        op_val = str(number)

    if all_digits:
        result = [pronounce_number_en(int(i)) for i in op_val]
        if is_float:
            result.append(decimal_part)
        result = " ".join(result)
    else:
        while len(op_val) > 1:
            idx = -2 if len(op_val) in [2, 4] else -3
Comment: Without first reading this code I wrote the following tests:

I like that you go from the end rather than beginning so the final numbers can be read closer to what they actually are - "ninety six". However being a longer number, it ends up getting broken down into multiple groups of three so we get:

What's the intended outcome here?

Comment: If we're aiming for speaking in two digit numbers, should we check for an odd number length, speak the first digit and then speak all remaining pairs? Something like:

    if len(op_val) % 2 == 1:
        result.append(pronounce_number(op_val[0]))
        op_val = op_val[1:]
    remaining_pairs = # some code
    for pair in remaining_pairs:
        result.append(pronounce_number(pair))

Comment: It seems to be speaking in pairs slightly more often than intended. It doesn't really work on large numbers, but my intention was to "end with" three digit groupings in most cases, which just sounded most natural to me. I'm gonna go over the code again top to bottom tomorrow, but the gist is:

It's definitely bugged on large numbers atm. The above should be followed by "one two thirty four five sixty seven", but I'm getting "twelve thirty four five sixty seven". Once you're looking at 9+ digits, I don't think the function is much use without

    >>> assert(format.pronounce_digits(238513096, all_digits=True) == "two three eight five one three zero nine six")

(edit: "tomorrow" to commence mid-afternoon UTC)
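To make the two grouping strategies in this thread easier to compare, here is a minimal sketch that only splits the digit string and leaves pronunciation out. The helper names `group_from_end` and `leading_digit_then_pairs` are hypothetical, and the sample inputs other than 238513096 are illustrative rather than taken from the thread. The first helper mirrors the slicing rule in the diff; the second is one possible reading of the "lone leading digit, then pairs" suggestion.

```python
def group_from_end(digits):
    """Slice a digit string from the right using the rule in the diff:
    take 2 digits when 2 or 4 remain, otherwise take 3."""
    groups = []
    while len(digits) > 1:
        idx = -2 if len(digits) in (2, 4) else -3
        groups.insert(0, digits[idx:])
        digits = digits[:idx]
    if digits:  # a single leftover leading digit
        groups.insert(0, digits)
    return groups


def leading_digit_then_pairs(digits):
    """Hypothetical alternative: for odd lengths emit the first digit
    alone, then split everything that remains into pairs."""
    groups = []
    if len(digits) % 2 == 1:
        groups.append(digits[0])
        digits = digits[1:]
    groups.extend(digits[i:i + 2] for i in range(0, len(digits), 2))
    return groups


print(group_from_end("1234567"))              # ['12', '34', '567']
print(group_from_end("238513096"))            # ['238', '513', '096']
print(leading_digit_then_pairs("1234567"))    # ['1', '23', '45', '67']
print(leading_digit_then_pairs("238513096"))  # ['2', '38', '51', '30', '96']
```

With the diff's rule, a seven-digit input ends up with a leading pair rather than a lone leading digit. Note also that a group such as '096' loses its leading zero once it is passed through int().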

            back_digits = op_val[idx:]
            op_val = op_val[:idx]
            result = pronounce_number_en(
                int(back_digits)).split(" ") + result
        if op_val:
            result.insert(0, pronounce_number_en(int(op_val)))
        if is_float:
            result.append(decimal_part)
        no_no_words = list(_SHORT_SCALE_EN.values())[:5]
Comment: Why do we specifically care about the first 5 values? Is this just an optimisation because the chances of the rest being there are so slim?

Comment: Because it slices 2 or 3 digits at a time, the rest can't be there. Right now, I'm trying to remember why I included anything but 'hundred'.
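The claim that larger scale words cannot appear can be checked without the library. The sketch below assumes the slicing rule from the diff and uses a hypothetical helper, `chunk_sizes`, that returns how many digits each slice takes, from right to left. If no chunk is ever longer than three digits, every chunk is below 1000, so 'hundred' is the only scale word a chunk could produce.

```python
def chunk_sizes(n_digits):
    """Sizes of the slices taken from the right for a number with
    n_digits digits, following the 2-or-3 rule from the diff."""
    sizes = []
    remaining = n_digits
    while remaining > 1:
        take = 2 if remaining in (2, 4) else 3
        sizes.append(take)
        remaining -= take
    if remaining:  # a single leftover leading digit
        sizes.append(remaining)
    return sizes


# No chunk is longer than three digits for any length checked here,
# and the chunk sizes always add up to the full length.
for n in range(1, 31):
    sizes = chunk_sizes(n)
    assert max(sizes) <= 3
    assert sum(sizes) == n

print(chunk_sizes(7))  # [3, 2, 2]
```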

        no_no_words.append('and')
        print(no_no_words)
        print(result)
Comment: debug prints

        result = [word for word in result if word.strip() not in no_no_words]
Comment: Is there any case where you think this might happen that we can test for? Or is it just a safety measure?

Comment: This happens anytime the input is longer than two digits. The algorithm acts by running

The latter stray debug print (=P) is the result prior to this operation:

    >>> pronounce_digits(234534)
    ['two', 'hundred', 'and', 'thirty', 'four', 'five', 'hundred', 'and', 'thirty', 'four']
    'two thirty four five thirty four'

Comment: The strip, on the other hand, is probably unneeded.
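The filtering step discussed here can be reproduced on its own using the word list from the debug output above. This is a sketch rather than the code in the diff: `FILLER_WORDS` and `drop_filler_words` are hypothetical names, the filler set is narrowed to 'hundred' and 'and' as the earlier reply suggests would suffice, and the strip() call is left out.

```python
# Words that pronouncing a 3-digit chunk introduces but that should
# not be spoken when reading digits in groups (an assumed minimal set).
FILLER_WORDS = {"hundred", "and"}


def drop_filler_words(words):
    """Join the words into one string, skipping filler words."""
    return " ".join(word for word in words if word not in FILLER_WORDS)


# Word list copied from the debug print quoted in the thread.
words = ['two', 'hundred', 'and', 'thirty', 'four',
         'five', 'hundred', 'and', 'thirty', 'four']
print(drop_filler_words(words))  # two thirty four five thirty four
```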

        result = " ".join(result)
    return result


def nice_time_en(dt, speech=True, use_24hour=False, use_ampm=False):
    """
    Format a time to a comfortable human format

Comment: pronounce_digits not pronounce_number