Make the dataset search search on more fields #93

Open
andylolz opened this issue Nov 15, 2018 · 20 comments
Labels
enhancement New feature or request

Comments

@andylolz
Collaborator

Reported by Matt Geddes on discuss:

[…] because it is so easy to search for a publisher via the browser plugin, I tend to use that rather than going to the registry, but often it means I need to know the ‘correct name’ that the publisher is using. Perhaps this could search a few more fields e.g. the publisher country? For example - to find the German Foreign Ministry file, I tried BMZ, Bundes…, Deutschland, German…and finally ‘Germany’ before I got it - whereas if it could return all publishers based in Germany, it would be easier in many cases.

@andylolz andylolz added the enhancement New feature or request label Nov 15, 2018
@matmaxgeds

Thanks for this, @andylolz. Let me know if there is anything I can help with: explanations, testing, etc.

@andylolz
Collaborator Author

andylolz commented Nov 15, 2018

Great! So the issue is with how I’m using the registry API. The query that’s run at the moment is:

var tmpl = 'https://iatiregistry.org/api/3/action/package_search?fq=extras_filetype:organisation&qf=title&q='

…so e.g.:
https://iatiregistry.org/api/3/action/package_search?fq=extras_filetype:organisation&qf=title&q=germany

So that’s just looking at the title. If you can figure out a better query, let me know! Or if you have a list of the fields that should be searched.
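A minimal sketch of how the query could be assembled from the template above, in JavaScript like the `tmpl` line. The helper names `buildSearchUrl` and `searchPublishers` are illustrative, not the plugin's actual code. The main point is to percent-encode the user's input before appending it after `q=`; the response handling assumes CKAN's usual `{ success, result: { count, results } }` envelope.

```javascript
// Template copied from the comment above: organisation files only, title field only.
const tmpl = 'https://iatiregistry.org/api/3/action/package_search?fq=extras_filetype:organisation&qf=title&q=';

// Hypothetical helper: append the user's term, percent-encoded so that spaces,
// quotes and '&' cannot break the query string.
function buildSearchUrl(term) {
  return tmpl + encodeURIComponent(term.trim());
}

// Hypothetical helper: run the search and return the matching dataset titles.
// Assumes the package_search response has the shape { result: { results: [...] } }.
async function searchPublishers(term) {
  const res = await fetch(buildSearchUrl(term));
  if (!res.ok) throw new Error(`Registry returned HTTP ${res.status}`);
  const body = await res.json();
  return body.result.results.map(pkg => pkg.title);
}

// Reproduces the example URL above.
console.log(buildSearchUrl('germany'));
```

In a browser extension, `searchPublishers('germany').then(console.log)` would log the titles of whatever the registry returns for that term.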

The CKAN docs are here: https://docs.ckan.org/en/2.8/api/index.html
The solr docs are here: https://lucene.apache.org/solr/guide/7_3/common-query-parameters.html

@andylolz andylolz added this to the Priority milestone Nov 15, 2018
@matmaxgeds

Just a note to say that we are making some progress on this, e.g. https://iatiregistry.org/api/3/action/package_search?fq=extras_filetype:organisation&q=BMZ returns files with the search string located outside of the title field. But we now need to narrow down what it returns: e.g. https://iatiregistry.org/api/3/action/package_search?fq=extras_filetype:organisation&q="DE" also returns files with the word 'description' in the returned data, which we don't want. More soon...
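One way to narrow the results, sketched here under assumptions: instead of a bare `q=DE` (which searches CKAN's default fields), build an explicit fielded query that only looks in a chosen list of fields. `title` and `name` are standard CKAN dataset fields; any extra field (for example a publisher-country field) would need checking against the registry's index first. `fieldedQuery` and `buildFieldedUrl` are hypothetical helpers.

```javascript
// Base search URL without qf, so the q parameter itself decides which fields are searched.
const BASE = 'https://iatiregistry.org/api/3/action/package_search?fq=extras_filetype:organisation&q=';

// Fields to search. 'title' and 'name' are standard CKAN dataset fields; add others
// only after confirming they exist in the registry's Solr index (an assumption here).
const FIELDS = ['title', 'name'];

// Build e.g. title:"DE" OR name:"DE" (one phrase clause per field).
// Note: quotes inside the term are not escaped here, so keep terms simple.
function fieldedQuery(term, fields = FIELDS) {
  return fields.map(field => `${field}:"${term}"`).join(' OR ');
}

// Full request URL, with the whole query percent-encoded.
function buildFieldedUrl(term, fields = FIELDS) {
  return BASE + encodeURIComponent(fieldedQuery(term, fields));
}

console.log(fieldedQuery('BMZ'));
// title:"BMZ" OR name:"BMZ"
```

Restricting the fields trades recall for precision: "DE" should no longer match arbitrary text elsewhere in a record, but nothing outside the listed fields will match either.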

@matmaxgeds

Hi Andy - we've got a pull request for this coming from @kndm, a programmer I work with. We have done a few quick tests of a modified plugin and think it works well: e.g. "bmz" now picks up the Germany file, and "GB" returns all the orgs based in the UK, without information overload or too many false positives. But you might decide that it is better with a narrower set of search fields, or some other approach. Let us know!

@andylolz
Collaborator Author

andylolz commented Nov 22, 2018

Oh, nice! Looks cool – I’ll test it out very shortly.

@andylolz
Collaborator Author

Fixed in #96.

@matmaxgeds

@andylolz @kndm - in 1.3.1, a search for 'asdb' isn't bringing up the Asian Development Bank file (https://iatiregistry.org/api/3/action/package_show?id=asdb-org), even though it includes the string 'asdb' several times. Have we missed a search field?

@andylolz
Collaborator Author

andylolz commented Nov 27, 2018

Just checked, and this was the case at 1.3.0 too – so (thankfully!) unrelated to that change.

organization_name should work here, since that exactly equals “asdb” in this case.

' OR organization_name:"{}"' +

I’m not sure why that isn’t working! I was a bit suspicious of the underscore separator before merging this PR, but I tested it and it did seem to be doing the right thing. So I’m at a loss, I’m afraid!

It might be worth us checking with CKAN developers (or even on a solr mailing list) to find out the best search string here, since this is not an IATI-specific problem.
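Separately from the field-name question, the clause above drops the raw search term straight into a quoted phrase, so a term containing a double quote or a backslash would produce a malformed query. A small sketch of escaping the term first, assuming the standard Lucene/Solr query parser, where inside a quoted phrase only `\` and `"` need escaping (worth double-checking against the Solr docs). `escapePhrase` and `orClause` are hypothetical names. This would not explain the 'asdb' case, which contains no special characters.

```javascript
// Escape a term for use inside a double-quoted Solr phrase: backslashes first,
// then double quotes, so the backslashes added for quotes are not doubled again.
function escapePhrase(term) {
  return term.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
}

// Equivalent of the ' OR organization_name:"{}"' fragment, with escaping applied.
function orClause(field, term) {
  return ` OR ${field}:"${escapePhrase(term)}"`;
}

// A term with embedded quotes stays inside a single, well-formed phrase.
console.log(orClause('organization_name', 'say "hi"'));
```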

@kndm
Contributor

kndm commented Nov 27, 2018 via email

@kndm
Contributor

kndm commented Nov 27, 2018

Upon further inspection, it seems some of the fields are not mapped as expected, i.e. organization_name may not be the name of the search field for the organization -> name key.

@andylolz do you happen to have any leads on where I could find better documentation for these returned fields? :)
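While looking for documentation, one low-tech option is to list every key path in a package_show response and compare it with the field names used in the query. A sketch follows; `keyPaths` is a hypothetical helper, the sample object is a trimmed, hand-written stand-in for part of the asdb-org record, and JSON key paths are not guaranteed to match the names CKAN uses in its Solr index (that mapping is exactly the open question).

```javascript
// Collect dotted key paths (e.g. 'organization.name') from a nested object.
// Arrays and null values are reported as a single path rather than expanded.
function keyPaths(obj, prefix = '') {
  const paths = [];
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      paths.push(...keyPaths(value, path));
    } else {
      paths.push(path);
    }
  }
  return paths;
}

// Trimmed, hand-written stand-in for part of a package_show "result" object.
const sample = {
  name: 'asdb-org',
  organization: { name: 'asdb', title: 'Asian Development Bank' },
};
console.log(keyPaths(sample));

// Against the live API (browser or Node 18+), something like:
// fetch('https://iatiregistry.org/api/3/action/package_show?id=asdb-org')
//   .then(res => res.json())
//   .then(body => console.log(keyPaths(body.result)));
```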

@andylolz
Collaborator Author

@kndm I added a couple of links above:

The CKAN docs are here: https://docs.ckan.org/en/2.8/api/index.html
The solr docs are here: https://lucene.apache.org/solr/guide/7_3/common-query-parameters.html

But I think you’d be better off asking a solr mailing list, or possibly a CKAN mailing list.

If you’re happy to keep looking into this, that would be great – I’ll be happy to review and merge a PR.

@andylolz
Collaborator Author

andylolz commented Dec 10, 2018

Reopening this, since it still needs work (though thanks for the improvements so far, @kndm and @matmaxgeds!)

@andylolz andylolz reopened this Dec 10, 2018
@andylolz
Collaborator Author

This was raised again recently by two users separately, both times with the specific example of AfDB / African Development Bank. The former search works, the latter doesn’t (despite it being listed as the organisation name).

@andylolz
Collaborator Author

andylolz commented Dec 17, 2018

The CKAN-dev mailing list page suggests searching the archive via:
https://www.google.com/search?q=%22%5Bckan-dev%5D%22+site%3Alists.okfn.org

So e.g. this or perhaps this.

There’s plenty of reading material there – I’d bet the answer lies within!

@matmaxgeds

@kndm happy for you to skip a bit of Somalia work to have another look at this - maybe you also got a reply to your post on the CKAN forums?

@andylolz
Collaborator Author

I’ve posted the following: https://lists.okfn.org/pipermail/ckan-dev/2018-December/023005.html

Does that look okay?

Fingers crossed for a response!

@andylolz andylolz pinned this issue Dec 17, 2018
@kndm
Contributor

kndm commented Dec 18, 2018 via email

@andylolz
Collaborator Author

Okay, so it seems like the answer is: this isn’t possible without changes to the registry API :(

@matmaxgeds

Ooof, and thanks for the detective work. Is that something we can request a change to? From my unenlightened position it is hard to understand why the org title field can't be queried. I guess the alternative is to download all the org files ourselves, which isn't particularly appealing.

@andylolz
Collaborator Author

andylolz commented Dec 20, 2018

from my unenlightened position it is hard to understand why the org title field can't be queried

Yep, same.

download all the org files ourselves which isn't particularly appealing

That would work, but I’m really not keen to do it because I think the registry API should be able to handle it. I’ve raised a ticket on the registry github (IATI/ckanext-iati#226), asking about the possibility of a plugin.
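For comparison, the "download all the org files ourselves" fallback would come down to matching the term against several fields locally, roughly as below. This is only a sketch: `filterPackages` is hypothetical, it assumes the records have already been fetched and have the package_show shape (`name`, `title`, `organization.title`), and the sample records are made up for illustration.

```javascript
// Case-insensitive substring match against a few fields of one dataset record.
function matchesTerm(pkg, term) {
  const needle = term.trim().toLowerCase();
  const fields = [pkg.name, pkg.title, pkg.organization?.title];
  return fields.some(v => typeof v === 'string' && v.toLowerCase().includes(needle));
}

// Keep only the records that match the term in any of those fields.
function filterPackages(pkgs, term) {
  return pkgs.filter(pkg => matchesTerm(pkg, term));
}

// Made-up records, for illustration only.
const pkgs = [
  { name: 'example-a-org', title: 'Example A', organization: { title: 'African Development Bank' } },
  { name: 'example-b-org', title: 'Example B', organization: { title: 'Asian Development Bank' } },
];
console.log(filterPackages(pkgs, 'African Development Bank').map(p => p.name));
```

The catch, as noted above, is keeping a full local copy of the organisation records in sync with the registry.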

@andylolz andylolz unpinned this issue Dec 20, 2018
3 participants