Performance issues with large number of datasets #106
Comments
It would certainly be possible to persist the catalogue data between restarts, but I don't think it's a trivial thing to do. I'd certainly consider it if we can't find an alternative solution, although I don't have a great deal of time for ncWMS development at the moment. The slowness of getting an image is interesting and I don't know what could be causing it. How large are your datasets? I'd love to get a local setup matching yours so I could investigate further, but I'm guessing that would not be practical unless each dataset is very small. However, I do think that dynamic datasets could solve your problem: they are designed for the use case where you have a directory structure containing a vast number of datasets and want them all to be accessible without individual configuration. You configure each root directory of datasets with 3 parameters:
To access the dynamic datasets, you need to set the
So in More often I will use the Godiva3 interface to view the datasets, by simply passing the Dynamic datasets are slower than configured datasets, since the metadata needs to be read when each dataset is accessed. It is also not possible to list all datasets up front, so some knowledge of the data structure is necessary. For a full dynamic data catalogue, I would recommend using THREDDS. It uses the EDAL libraries to provide WMS, although the stable version (4.6) uses an old version which may lack some features you want. |
Thanks for your prompt reply. I'll give it another try with the dynamic datasets and see if that solves the problem and whether we can find a way of keeping the knowledge of the data structure. Besides the DATASET parameter, the LAYERS parameter is still required. What should the value for LAYERS be? |
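Apologies, DATASET is only required for specifying it through Godiva, but not for general WMS requests. Instead you just specify the layer as dataset/variable, e.g. LAYERS=local/01-xyzt/synthetic_rectilinear_data.nc/temperature. However, DATASET is a convenient way to specify the dynamic dataset, in which case LAYERS just needs to contain the variable ID within that dataset. So for example, a GetMap request could have DATASET=local/01-xyzt/synthetic_rectilinear_data.nc&LAYERS=temperature, which is exactly equivalent to the above. |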
Hi
I tried the dynamic datasets again and cannot make them work.
I have a file called CRU_1901.nc located at C:\Work\Data\FD
I set the Alias for dynamic service as local, Location as
C:\Work\Data\FD and the regex as .*
Calling GetMap with LAYERS=local%2FCRU_1901.nc%2Fprecipitation does not
work (the file contains a variable called precipitation)
Calling http://localhost:8080/ncWMS2/Godiva3.html?DATASET=local%2F* also
does not work.
…On 16/02/2018 13:12, Guy Griffiths wrote:
Apologies, |DATASET| is only required for specifying it through
Godiva, but not for general WMS requests. Instead you just specify the
layer as |dataset/variable|, e.g.:
|LAYERS=local/01-xyzt/synthetic_rectilinear_data.nc/temperature|.
However, |DATASET| is a convenient way to specify the dynamic dataset,
in which case |LAYERS| just needs to contain the variable ID within
that dataset. So for example, a |GetMap| request could have:
|DATASET=local/01-xyzt/synthetic_rectilinear_data.nc&LAYERS=temperature|
- this is exactly equivalent to the above.
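The two equivalent ways of naming a dynamic-dataset layer described above can be sketched in Python. This is only an illustration: the `/wms` endpoint path, the helper names and all GetMap values other than DATASET and LAYERS are assumptions, not taken from this thread.

```python
from urllib.parse import urlencode

# Standard WMS 1.3.0 GetMap parameter names. The values are placeholders
# (assumptions) chosen only to make the request complete.
COMMON = {
    "SERVICE": "WMS",
    "VERSION": "1.3.0",
    "REQUEST": "GetMap",
    "CRS": "CRS:84",
    "BBOX": "-180,-90,180,90",
    "WIDTH": "512",
    "HEIGHT": "256",
    "FORMAT": "image/png",
    "STYLES": "",
}


def layer_params(dataset, variable, use_dataset_param=False):
    """Return the parameters that identify one variable of a dynamic dataset."""
    if use_dataset_param:
        # Form 2: DATASET names the dataset, LAYERS holds only the variable ID.
        return {"DATASET": dataset, "LAYERS": variable}
    # Form 1: LAYERS holds dataset/variable in a single value.
    return {"LAYERS": f"{dataset}/{variable}"}


def getmap_url(base_url, dataset, variable, use_dataset_param=False):
    """Build a GetMap URL; urlencode percent-encodes '/' as %2F."""
    params = {**COMMON, **layer_params(dataset, variable, use_dataset_param)}
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    base = "http://localhost:8080/ncWMS2/wms"  # assumed endpoint path
    print(getmap_url(base, "local/CRU_1901.nc", "precipitation"))
    print(getmap_url(base, "local/CRU_1901.nc", "precipitation", True))
```

Here `local` is the alias and `CRU_1901.nc` the file path relative to the configured location, as in the example above.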
I haven't tried it on Windows, but the first thing I'd try is changing the backslashes to slashes. Let me know if that doesn't work and I'll try and get a Windows setup to see what I can do about debugging it. |
I made it work. My mistake, sorry. There was a missing colon after C when specifying the location. I'll test this and let you know if we can use it. Thanks for all the help so far. |
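Both pitfalls from this exchange (backslashes, and a drive letter without its colon) can be caught before typing a location into the admin page. The sketch below uses Python's standard `pathlib`. The helper name `normalise_location` is made up for illustration, and whether ncWMS prefers forward slashes on Windows has not been confirmed in this thread.

```python
from pathlib import PureWindowsPath


def normalise_location(raw):
    """Check a Windows location and return it with forward slashes."""
    path = PureWindowsPath(raw)
    # "C\Work\Data\FD" (no colon) has no drive, so it is not absolute.
    if not path.is_absolute():
        raise ValueError(
            f"{raw!r} is not an absolute Windows path "
            "(is the colon after the drive letter missing?)"
        )
    # as_posix() rewrites backslashes as forward slashes.
    return path.as_posix()


if __name__ == "__main__":
    print(normalise_location(r"C:\Work\Data\FD"))
```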
It seems we can make it work using dynamic datasets, but we will be developing a proxy that converts layer names (the traditional way) into the folder/file structure supported by dynamic datasets. |
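The core of such a proxy could be a lookup from traditional layer names to dynamic-dataset paths, roughly as sketched below. The table contents, the layer name `cru_1901_precipitation` and the function name are hypothetical. A real proxy would also need to forward the rewritten request to ncWMS and relay the response.

```python
# Hypothetical lookup table:
# traditional layer name -> (dynamic dataset path, variable ID)
LAYER_MAP = {
    "cru_1901_precipitation": ("local/CRU_1901.nc", "precipitation"),
}


def rewrite_query(params, layer_map=LAYER_MAP):
    """Return a copy of the WMS query parameters with LAYERS rewritten.

    Unknown layer names are passed through unchanged, so the upstream
    server can report its own error for them.
    """
    rewritten = dict(params)
    layer = params.get("LAYERS")
    if layer in layer_map:
        dataset, variable = layer_map[layer]
        rewritten["LAYERS"] = f"{dataset}/{variable}"
    return rewritten


if __name__ == "__main__":
    print(rewrite_query({"REQUEST": "GetMap", "LAYERS": "cru_1901_precipitation"}))
```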
My datasets are updated frequently, but that is not reflected in the dynamic datasets, resulting in an error when trying to get data added after the first access to the file. Is there a way to update the metadata information for dynamic datasets without restarting the webapp? |
@mtsales - How are the datasets added to? The problem is that dynamic datasets are cached, and I have implemented some code which allows you to configure this cache and request that it is emptied, but there will still be issues if you are defining the dataset by a glob expression and adding files to it. |
The dynamic datasets are defined with the "local" alias pointing to the topmost folder containing data and using the regex .* |
I think that should be fine - there would be a potential issue if you were using a glob expression to aggregate multiple files, but using it for a single file won't interfere with emptying the dynamic dataset cache. |
Great. How can the cache for a dataset be emptied? |
In the admin interface, just under the dynamic datasets configuration, there is a box to configure cache settings for dynamic datasets. One of the options is a checkbox labelled "Empty cache". You'd need to check it and click the save button. Note that this code is implemented in the |
Thanks. I'll wait for the next release. Is the metadata for a dataset kept in the cache until the timestamp of a file changes, and re-read when a file is newer than the one cached? Or do we always have to empty the cache manually? |
It will need to be manually emptied. If you are updating datasets very regularly, you might be better just switching it off. |
Any tentative date for the next release? |
Not currently. I'm working on some changes to EDAL for a separate project, and I'll do a release for that at some point in the next month or so. |
Thanks for the info. I'll wait for that. I have noticed that sometimes, when using dynamic datasets, ncWMS complains about a missing file (see errors below):
2018-05-04 09:02:53 WARN WmsServlet:2742 - Wms Exception caught: "Requested menu for dataset: local/Volta/1b_TRMM/[tT][rR][mM][mM]_2018.[nN][cC] which does not exist on this server" from:uk.ac.rdg.resc.edal.wms.WmsServlet:1044
2018-05-04 09:02:53 WARN CdmUtils:439 - Using relative path for a dataset. This may cause unpredictable or platform-dependent behaviour. The use of absolute paths is recommended
2018-05-04 09:02:53 WARN WmsServlet:2742 - Wms Exception caught: "The layer local/Volta/1b_TRMM/[tT][rR][mM][mM]_2018.[nN][cC]/ was not found on this server" from:uk.ac.rdg.resc.edal.wms.util.WmsUtils:262 |
I'll have a look into it and see if I can replicate it. What is |
local corresponds to E:\FloodDraughtPortal on a Windows machine that contains a hierarchy of subfolders with a large number of datasets. I could not reproduce this using normal datasets, but I get errors very frequently with dynamic datasets. I removed the glob expressions and still ran into problems randomly. I got a bit further with the investigation and now I also have a stack trace (see attached). I'm also attaching the nc file that caused this exception. |
The fact that it's warning you about using a relative path suggests that it's failing to detect you're referring to a dynamic dataset, but I haven't been able to reproduce this problem. I'll be doing a new release of EDAL/ncWMS later today. Assuming you see the same issue with the new version, could you please post the whole section of the logs when the error occurs (not just the stack trace part, but anything within say a minute of it happening)? |
I tested the latest release and the problem persists. Please note that, since I stopped using glob expressions, I can no longer reproduce this error:
2018-05-04 09:02:53 WARN CdmUtils:439 - Using relative path for a dataset. This may cause unpredictable or platform-dependent behaviour. The use of absolute paths is recommended
2018-05-04 09:02:53 WARN WmsServlet:2742 - Wms Exception caught: "The layer local/Volta/1b_TRMM/[tT][rR][mM][mM]_2018.[nN][cC]/ was not found on this server" from:uk.ac.rdg.resc.edal.wms.util.WmsUtils:262
Now what I get is random reading errors. For the same files, sometimes it reads and renders correctly, and sometimes it throws exceptions as per the attached log files. Usually there are 21 GetMap requests for each dataset, but please note a peculiar thing in lines 258 to 264: there are only 7 errors for this dataset, and the image was only partially rendered due to these 7 errors. These logs were produced using version 2.3.1 and not the latest. With the latest release the exception is different. Please see ncwms_latest.log |
Any news on this? Is more information needed? |
Sorry, I haven't had time to work on this, I'm currently very busy with a number of other projects. It's still on my todo list, but ncWMS is fairly low priority at the moment. |
I have been using ncWMS on several occasions and I'm very pleased with it. Thanks for the great work!
Now I have a project where we have 12 000 datasets and I'm facing some issues:
1 - Loading after a server restart takes more than 1 hour
2 - Getting the image of a dataset is too slow (even for a small dataset)
I have looked at the dynamic datasets but could not figure out how they work, so I'm not sure whether they solve the issue.
In any case I was wondering if the issues above have a solution.
For 1, I was wondering whether it is possible to persist the catalogue after the first load of config.xml, including the update time of the datasets, and then, when restarting the server, load the persisted catalogue and only reload the datasets whose files have a modified time later than the last update in the catalogue.
Regarding 2, I noticed in the code that a hash table is used to get dataset ids based on the name, but given the slow rendering times I experience, I was wondering whether somewhere in the code a loop over the datasets is used instead of the hash table.
If solving 1 and 2 is not possible, do you think dynamic datasets will solve my problem, and if so, could you provide guidance on how to set them up with NetCDF files located in various folders on Windows?
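The persistence idea in point 1 could look roughly like the sketch below. It is only an illustration of the proposal, not ncWMS code: the JSON file format and the function names are assumptions. It compares each data file's current modification time with the one recorded at the last save.

```python
import json
import os


def load_catalogue(path):
    """Load the persisted {file path: modification time} map, or {} if absent."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def stale_files(catalogue, data_files):
    """Return the files that are new or modified since the catalogue was saved."""
    stale = []
    for name in data_files:
        recorded = catalogue.get(name)
        if recorded is None or os.path.getmtime(name) > recorded:
            stale.append(name)
    return stale


def save_catalogue(path, data_files):
    """Persist the current modification time of every data file."""
    snapshot = {name: os.path.getmtime(name) for name in data_files}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(snapshot, fh)
```

On restart, only the files returned by `stale_files` would need their metadata re-read. Everything else could be restored from the persisted catalogue, which in practice would also have to store the dataset metadata itself.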