Speller suggestion order different from command line, does not follow weight #8

Open
snomos opened this issue Mar 18, 2019 · 15 comments
snomos commented Mar 18, 2019

This is the order using a modes shell script (including suggestion weight):

$ echo "Eat ge mii liiko go skuvllat bidjojuvvojit vuostálagaid." | tools/grammarcheckers/modes/smegram.mode
"<Eat>"
	"ii" <aux> V IV Neg Ind Pl1 <W:0.0> @+FAUXV
: 
"<ge>"
	"ge" Pcle <W:0.0> @PCLE
: 
"<mii>"
	"mii" Pron Indef Sg Nom <W:0.0> @SUBJ>
	"mii" Pron Rel Sg Nom <W:0.0> @SUBJ>
: 
"<liiko>"
	"liikot" <mv> V <EX-Nom-Ani> <TH-Inf> <TH-jus> <TH-go> <TH-ahte> <TH-Ill-Any> <XT-Adv-Xt> IV Ind Prs Sg3 <W:0.0> @FS-<ADVL
: 
"<go>"
	"go" CS <W:0.0> @CVP
: 
"<skuvllat>"
	"skuvla" N Sem/Edu_Org Pl Nom <W:0.0> @SUBJ>
: 
"<bidjojuvvojit>"
	"bidjat" Ex/V Ex/TV Der/PassL V IV Ind Prs Pl3 <W:9.65234> <WA:11.6523> <spelled> "<biddjojuvvojit>" &SUGGESTWF &typo
typo
	"bidjat" Ex/V Ex/TV Der/PassL V IV Ind Prt Sg2 <W:9.65234> <WA:11.6523> <spelled> "<biddjojuvvojit>" &SUGGESTWF &typo
typo
	"bieđđat" Ex/V Ex/IV Der/PassL V IV Ind Prs Pl3 <W:19.3018> <WA:17.3018> <spelled> "<biđđojuvvojit>" &SUGGESTWF &typo
typo
	"biđđit" Ex/V Ex/TV Der/PassL V IV Ind Prs Pl3 <W:19.3018> <WA:17.3018> <spelled> "<biđđojuvvojit>" &SUGGESTWF &typo
typo
	"biddut" Ex/V Ex/IV Der/PassL V IV Ind Prs Pl3 <W:23.6084> <WA:15.6084> <spelled> "<biddojuvvojit>" &SUGGESTWF &typo
typo
: 
"<vuostálagaid>"
	"vuostálagaid" Adv <W:0.0>
	"vuostálat" N Sem/Plc Err/Orth Pl Acc <W:0.0> @<OBJ
	"vuostálat" N Sem/Plc Pl Acc <W:0.0> @<OBJ
"<.>"
	"." CLB <W:0.0>
:\n

And this is the output of the divvun-checker tool:

$ echo "Eat ge mii liiko go skuvllat bidjojuvvojit vuostálagaid." | divvun-checker -a tools/grammarcheckers/se.zcheck | jq .
{
  "errs": [
    [
      "bidjojuvvojit",
      29,
      42,
      "typo",
      "Ii leat sátnelisttus",
      [
        "biddjojuvvojit",
        "biđđojuvvojit",
        "biddojuvvojit"
      ],
      "Čállinmeattáhusat"
    ]
  ],
  "text": "Eat ge mii liiko go skuvllat bidjojuvvojit vuostálagaid."
}

The order of the speller suggestions is the same in both outputs and follows the suggestion weights. In LibreOffice (LO), however, it looks like this:

[Screenshot: Skjermbilde 2019-03-18 kl 17 08 11]

https://filebin.net/mvvnmqgal0 will soon contain the newest se.zcheck used for this (as soon as filebin decides to cooperate).

Otherwise I am using the latest version of everything, AFAIK.
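For comparing runs, the JSON printed by divvun-checker can be unpacked programmatically. The sketch below is only an illustration: it parses the output quoted in this thread, checks that the reported offsets slice the misspelled form out of the text, and collects the suggestion list. The variable names given to the positions in each `errs` entry are assumptions made for readability, not names taken from the tool.

```python
import json

# Output of divvun-checker as quoted in this thread (compact form).
raw = (
    '{"errs":[["bidjojuvvojit",29,42,"typo","Ii leat sátnelisttus",'
    '["biddjojuvvojit","biđđojuvvojit","biddojuvvojit"],"Čállinmeattáhusat"]],'
    '"text":"Eat ge mii liiko go skuvllat bidjojuvvojit vuostálagaid."}'
)

result = json.loads(raw)
text = result["text"]

# Each entry is a 7-element list; the names below are hypothetical labels.
checked = []
for form, beg, end, err_id, description, suggestions, title in result["errs"]:
    # Everything before the error is ASCII here, so code point offsets
    # and Python string indices coincide.
    assert text[beg:end] == form
    checked.append((form, (beg, end), suggestions))

for form, span, suggestions in checked:
    print(form, span, suggestions)
```

Printing the suggestion list this way makes it easy to diff the command-line order against the order logged by the LibreOffice plugin.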

snomos commented Mar 18, 2019

New se.zcheck is available.

@unhammer

I get

[Screenshot: bidjo]

and

18-03-2019:21:36:54,423 INFO     [GrammarChecker.py:80] Checking 'Eat ge mii liiko go skuvllat bidjojuvvojit vuostálagaid.', nStartOfSentencePos=0, nSuggestedBehindEndOfSentencePosition=56
18-03-2019:21:36:54,469 INFO     [GrammarChecker.py:85] dError on form=bidjojuvvojit at (29,42) replacements: ('biddjojuvvojit', 'biđđojuvvojit', 'biddojuvvojit')
18-03-2019:21:36:54,469 INFO     [GrammarChecker.py:101] dError on form=bidjojuvvojit at (29,42) replacements: ('biddjojuvvojit', 'biđđojuvvojit', 'biddojuvvojit')

when I run with

PYUNO_LOGLEVEL=ARGS DIVVUN_DEBUG=1

Do you get the wrong order in that debug info as well?

snomos commented Mar 18, 2019

Yes, for me the result is the same with and without debug:

INFO:root:Checking 'Eat ge mii liiko go skuvllat bidjojuvvojit vuost\xe1lagaid. ', nStartOfSentencePos=0, nSuggestedBehindEndOfSentencePosition=56
INFO:root:dError on form=bidjojuvvojit at (29,42) replacements: ('biddjojuvvojit', 'biddojuvvojit', 'bi\u0111\u0111ojuvvojit')
INFO:root:dError on form=bidjojuvvojit at (29,42) replacements: ('biddjojuvvojit', 'biddojuvvojit', 'bi\u0111\u0111ojuvvojit')
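The two orders reported so far can be compared against the weights in the mode output at the top of this issue. The sketch below is an illustration, not a diagnosis: it takes the `<W:…>` and `<WA:…>` values from the five `<spelled>` readings, keeps the lowest value per distinct form, and sorts by each tag in turn. The helper name `rank_by` is hypothetical, and whether any code path actually sorts by `<WA:…>` is not established in this thread.

```python
# (suggested form, W, WA) copied from the five <spelled> readings above.
readings = [
    ("biddjojuvvojit", 9.65234, 11.6523),
    ("biddjojuvvojit", 9.65234, 11.6523),
    ("biđđojuvvojit", 19.3018, 17.3018),
    ("biđđojuvvojit", 19.3018, 17.3018),
    ("biddojuvvojit", 23.6084, 15.6084),
]


def rank_by(readings, column):
    """Return distinct forms ordered by the lowest value in `column`.

    column 1 selects W, column 2 selects WA. Hypothetical helper.
    """
    best = {}
    for reading in readings:
        form, value = reading[0], reading[column]
        if form not in best or value < best[form]:
            best[form] = value
    return [form for form, _ in sorted(best.items(), key=lambda item: item[1])]


by_w = rank_by(readings, 1)
by_wa = rank_by(readings, 2)
print("by W: ", by_w)
print("by WA:", by_wa)
```

With these numbers, sorting by W gives the divvun-checker order (-dd- last), while sorting by WA gives the order seen in the log above (-đđ- last).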

unhammer commented Mar 18, 2019 via email

snomos commented Mar 19, 2019

That looks exactly like what I have (except that it is /usr/local/share/ on the Mac):

DEBUG:root:DivvunHandlePool.__openHandleWithVariant
INFO:root:Listing langs including getDictionaryPath=/Users/smo036/Library/Application Support/LibreOffice/4/user/uno_packages/cache/uno_packages/lu898709sq3yg.tmp_/divvun.oxt/divvun
INFO:root:Found 3 languages: ['fo', 'se', 'sma']
INFO:root:len: 1
INFO:root:specpath=/usr/local/share/voikko/4/se.zcheck
INFO:root:Loading language se with spec from /usr/local/share/voikko/4/se.zcheck
INFO:root:Checking 'Eat ge mii liiko go skuvllat bidjojuvvojit vuost\xe1lagaid.', nStartOfSentencePos=0, nSuggestedBehindEndOfSentencePosition=56
INFO:root:dError on form=bidjojuvvojit at (29,42) replacements: ('biddjojuvvojit', 'biddojuvvojit', 'bi\u0111\u0111ojuvvojit')
INFO:root:dError on form=bidjojuvvojit at (29,42) replacements: ('biddjojuvvojit', 'biddojuvvojit', 'bi\u0111\u0111ojuvvojit')
INFO:root:return result, errors: 1

@unhammer

But in your divvun-checker example you have ./tools/grammarcheckers/se.zcheck. Is that identical to /usr/local/share/voikko/4/se.zcheck?

snomos commented Mar 19, 2019

Yes, it is the same file.
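One way to confirm that two copies of se.zcheck are byte-for-byte identical is to compare checksums. A minimal sketch; the function names `sha256_of` and `same_contents` are hypothetical, and the paths in the trailing comment are simply the two mentioned in this thread.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """True when both files have identical contents."""
    return sha256_of(path_a) == sha256_of(path_b)


# Example call (paths as mentioned in this thread):
# same_contents("tools/grammarcheckers/se.zcheck",
#               "/usr/local/share/voikko/4/se.zcheck")
```

Comparing digests rules out the case where an older archive is still installed under the system path while a newer one is used on the command line.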

@unhammer

I can't see where in the code this would happen (everything is vectors and the like, no unordered sets). It is a bit hard to find the source of the error when it doesn't happen here, but I'll see whether I can reproduce it on the "bucket" Mac.

snomos commented Mar 19, 2019

I can't see where in the code this would happen (everything is vectors and the like, no unordered sets). It is a bit hard to find the source of the error when it doesn't happen here, but I'll see whether I can reproduce it on the "bucket" Mac.

ok 👍

unhammer commented Apr 29, 2019

EDIT: I got -dd- last on the "bucket" Mac, as you get in the terminal. Directly in Python on the "bucket" Mac I also get that order:

PYTHONPATH="/Users/unhammer/Library/Application Support/LibreOffice/4/user/uno_packages/cache/uno_packages/lu828941joy4o.tmp_/divvun(1).oxt/pythonpath" /Applications/LibreOffice.app/Contents/Frameworks/LibreOfficePython.framework/Versions/3.5/bin/python3.5 -c 'import libdivvun; s=libdivvun.ArCheckerSpec("/Users/unhammer/.config/voikko/4/se.zcheck");smegram=s.getChecker("smegram",False);print([(e.rep,e.msg,e.beg,e.end,e.form,e.dsc,e.err) for e in libdivvun.proc_errs_bytes(smegram,"Eat ge mii liiko go skuvllat bidjojuvvojit vuostálagaid.")]);import _libdivvun;print("    "+ _libdivvun.__file__)'
[(('biddjojuvvojit', 'biđđojuvvojit', 'biddojuvvojit'), 'Čállinmeattáhusat', 29, 42, 'bidjojuvvojit', 'Ii leat sátnelisttus', 'typo')] 
    /Users/unhammer/Library/Application Support/LibreOffice/4/user/uno_packages/cache/uno_packages/lu828941joy4o.tmp_/divvun(1).oxt/pythonpath/_libdivvun.cpython-3.5m.so

whereas when I compiled it myself on the "bucket" Mac, the Python library gave -đđ- last:

$ python3 -c 'import libdivvun; s=libdivvun.ArCheckerSpec("/Users/unhammer/.config/voikko/4/se.zcheck");smegram=s.getChecker("smegram",False);print([(e.rep,e.msg,e.beg,e.end,e.form,e.dsc,e.err) for e in libdivvun.proc_errs_bytes(smegram,"Eat ge mii liiko go skuvllat bidjojuvvojit vuost\303lagaid.")]);import _libdivvun;print("    "+ _libdivvun.__file__)'
[(('biddjojuvvojit', 'biddojuvvojit', 'biđđojuvvojit'), 'Čállinmeattáhusat', 29, 42, 'bidjojuvvojit', 'Ii leat sátnelisttus', 'typo')] 
    /Users/unhammer/src/libdivvun/python/build/lib.macosx-10.14-x86_64-3.6/_libdivvun.cpython-36m-darwin.so 

Then I tried getting Travis itself to run the same se.zcheck during the build, and Travis gives the same answer (-dd- last, as in your terminal) from both divvun-checker and the Python/plugin code:

+/usr/local/bin/divvun-checker -a se.zcheck -n smegram
{"errs":[["bidjojuvvojit",29,42,"typo","Ii leat sátnelisttus",["biddjojuvvojit","biđđojuvvojit","biddojuvvojit"],"Čállinmeattáhusat"]],"text":"Eat ge mii liiko go skuvllat bidjojuvvojit vuostálagaid."}
+python3 -c 'import libdivvun; s=libdivvun.ArCheckerSpec("/Users/travis/build/divvun/libdivvun/pythonpath/se.zcheck");smegram=s.getChecker("smegram",False);print([(e.rep,e.msg,e.beg,e.end,e.form,e.dsc,e.err) for e in libdivvun.proc_errs_bytes(smegram,"Eat ge mii liiko go skuvllat bidjojuvvojit vuostálagaid.")]);import _libdivvun;print("    "+ _libdivvun.__file__)'
[(('biddjojuvvojit', 'biđđojuvvojit', 'biddojuvvojit'), 'Čállinmeattáhusat', 29, 42, 'bidjojuvvojit', 'Ii leat sátnelisttus', 'typo')]
    /Users/travis/build/divvun/libdivvun/pythonpath/_libdivvun.cpython-35m-darwin.so

(from https://travis-ci.org/divvun/libdivvun/jobs/525863215#L3452 )

unhammer commented Apr 29, 2019

It looks like cg-spell gives -dd- last at
https://travis-ci.org/divvun/libdivvun/jobs/525919141#L3623

"<bidjojuvvojit>"
	"bidjojuvvojit" ?
	"bidjat" Ex/V Ex/TV Der/PassL V IV Ind Prs Pl3 <W:9.65234> <WA:11.6523> <spelled> "<biddjojuvvojit>"
	"bidjat" Ex/V Ex/TV Der/PassL V IV Ind Prt Sg2 <W:9.65234> <WA:11.6523> <spelled> "<biddjojuvvojit>"
	"bieđđat" Ex/V Ex/IV Der/PassL V IV Ind Prs Pl3 <W:19.3018> <WA:17.3018> <spelled> "<biđđojuvvojit>"
	"biđđit" Ex/V Ex/TV Der/PassL V IV Ind Prs Pl3 <W:19.3018> <WA:17.3018> <spelled> "<biđđojuvvojit>"
	"biddut" Ex/V Ex/IV Der/PassL V IV Ind Prs Pl3 <W:23.6084> <WA:15.6084> <spelled> "<biddojuvvojit>"

cg-spell itself assumes that ZHfstOspeller::suggest returns suggestions in the right order, and does not change the order itself.

It looks like the function should give the right order: we get back a heap that should ensure that the element we pop always has the lowest weight among the remaining elements:

//! @brief construct an ordered set of corrections for misspelled
//!        word form.
OSPELL_API CorrectionQueue suggest(const std::string& wordform);

typedef std::priority_queue<StringWeightPair,
                            std::vector<StringWeightPair>,
                            StringWeightComparison> CorrectionQueue;

where

class StringWeightComparison
/* results are reversed by default because greater weights represent
   worse results - to reverse the reversal, give a true argument*/

{
    bool reverse;
public:
    StringWeightComparison(bool reverse_result=false):
        reverse(reverse_result)
        {}
    
    bool operator() (StringWeightPair lhs, StringWeightPair rhs)
        { // return true when we want rhs to appear before lhs
            if (reverse) {
                return (lhs.second < rhs.second);
            } else {
                return (lhs.second > rhs.second);
            }
        }
};
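The pop order such a queue should produce can be sketched with Python's `heapq`, which is a min-heap like a `std::priority_queue` whose comparator returns `lhs.second > rhs.second`. This is only an analogue in another language, not the hfst-ospell code; the weights are the `<W:…>` values from the cg-spell output above, deliberately pushed out of order.

```python
import heapq

# (form, weight) pairs, inserted in an arbitrary order on purpose.
suggestions = [
    ("biddojuvvojit", 23.6084),
    ("biddjojuvvojit", 9.65234),
    ("biđđojuvvojit", 19.3018),
]

# heapq orders tuples by their first element, so put the weight first.
heap = [(weight, form) for form, weight in suggestions]
heapq.heapify(heap)

# Popping repeatedly always yields the lowest remaining weight.
popped = []
while heap:
    weight, form = heapq.heappop(heap)
    popped.append(form)

print(popped)
```

If the queue behaves this way, the insertion order does not matter, and any reordering would have to come from the weights themselves or from a later step.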

@unhammer

Is spelling turned off in grammarchecker-release? I no longer get a blue underline at all.

snomos commented Feb 24, 2020

No, it shouldn't be. smegramrelease is the default, and it includes the spell checker.

@unhammer

Hm, I think the fact that voikko is installed globally for LO on the "bucket" Mac may be overriding divvun, which makes this hard to test. I'll ask Trond whether he can remove it.

@unhammer

I installed LO for my own user, but I see that we now only get biddjojuvvojit as a suggestion regardless (even from the command line), so we may need a new test case …
