Can't load xarray from certain URL #8620

@chudlerk

What happened?

Normally, if a NetCDF file is hosted online somewhere, I can just pass its URL to the open_dataset function and it works great.
For example:
da = xr.open_dataset('https://www.someurl.com/data/file.nc')

However, if I try to open a file from this website by its URL, I get an error.

For example, I scrolled down to "1991–2020 Monthly Normals", right-clicked on "Precipitation", and copied the link address:

da = xr.open_dataset('https://www.nodc.noaa.gov/archive/arc0196/0245564/1.1/data/0-data/prcp-1991_2020-monthly-normals-v1.0.nc')

Passing that URL to open_dataset leads to this long error (see the log output below).

If I instead download the file to disk by clicking the link on the page and then call xr.open_dataset on the path of the downloaded file, it works fine.
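Since opening the local copy works, one workaround is to do the download step in code before calling xarray. The sketch below uses only the standard library for the download; download_to_tempfile and open_via_download are hypothetical helper names, and this does not change how xarray itself handles the URL.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download_to_tempfile(url: str, suffix: str = ".nc") -> Path:
    """Download `url` into a new temporary file and return its path.

    The file is not deleted automatically; the caller owns it.
    """
    with urllib.request.urlopen(url) as response:
        # delete=False keeps the file after the handle closes so it can be reopened by path
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
            shutil.copyfileobj(response, tmp)
    return Path(tmp.name)


def open_via_download(url: str):
    """Hypothetical helper: download `url`, then open the local copy with xarray.

    Returns (dataset, path); the temporary file at `path` is left for the caller to delete.
    """
    import xarray as xr  # imported here so download_to_tempfile has no xarray dependency

    path = download_to_tempfile(url)
    # load() pulls all variables into memory before the context manager closes the file
    with xr.open_dataset(path) as ds:
        return ds.load(), path
```

Usage would be ds, path = open_via_download(url). One side benefit: urllib.request.urlopen raises urllib.error.HTTPError on a 404, so a bad URL fails with an HTTP status instead of a NetCDF parser error.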

What did you expect to happen?

I expected the NetCDF file to be read into a Dataset when passing the URL to xr.open_dataset, as happens with other URLs.

Minimal Complete Verifiable Example

import xarray as xr

da = xr.open_dataset('https://www.nodc.noaa.gov/archive/arc0196/0245564/1.1/data/0-data/prcp-1991_2020-monthly-normals-v1.0.nc')
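The parser message at the end of the log below ("expecting SCAN_ATTR or SCAN_DATASET or SCAN_ERROR") looks like it comes from netCDF-C's OPeNDAP (DAP2) client, which suggests the plain HTTPS URL was handled as an OPeNDAP endpoint rather than read as an ordinary file. Assuming that is the cause, and assuming the installed libnetcdf was built with HTTP byte-range support, netCDF-C's "#mode=bytes" URL fragment asks it to read the file over plain HTTP instead. A minimal sketch of a hypothetical helper that appends the fragment:

```python
from urllib.parse import urlsplit


def with_bytes_mode(url: str) -> str:
    """Append netCDF-C's "#mode=bytes" fragment to an http(s) URL.

    Assumption: the installed libnetcdf supports byte-range reads; if it does
    not, netCDF-C will not honor the fragment.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"expected an http(s) URL, got {url!r}")
    if parts.fragment:
        # Merging with an existing fragment is left out of this sketch
        raise ValueError("URL already has a fragment; add mode=bytes manually")
    return url + "#mode=bytes"
```

Usage (untested against this server): ds = xr.open_dataset(with_bytes_mode(url), engine="netcdf4"). Whether this works depends on how the libnetcdf 4.8.1 listed in the environment below was built.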

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Traceback (most recent call last):

  File ~\AppData\Local\miniconda3\lib\site-packages\xarray\backends\file_manager.py:211 in _acquire_with_cache_info
    file = self._cache[self._key]

  File ~\AppData\Local\miniconda3\lib\site-packages\xarray\backends\lru_cache.py:56 in __getitem__
    value = self._cache[key]

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('https://www.nodc.noaa.gov/archive/arc0196/0245564/1.1/data/0-data/prcp-1991_2020-monthly-normals-v1.0.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False)), '172372dd-6014-42db-bbd6-c1f17be389be']


During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  Cell In[14], line 1
    da = xr.open_dataset('https://www.nodc.noaa.gov/archive/arc0196/0245564/1.1/data/0-data/prcp-1991_2020-monthly-normals-v1.0.nc')

  File ~\AppData\Local\miniconda3\lib\site-packages\xarray\backends\api.py:570 in open_dataset
    backend_ds = backend.open_dataset(

  File ~\AppData\Local\miniconda3\lib\site-packages\xarray\backends\netCDF4_.py:602 in open_dataset
    store = NetCDF4DataStore.open(

  File ~\AppData\Local\miniconda3\lib\site-packages\xarray\backends\netCDF4_.py:400 in open
    return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)

  File ~\AppData\Local\miniconda3\lib\site-packages\xarray\backends\netCDF4_.py:347 in __init__
    self.format = self.ds.data_model

  File ~\AppData\Local\miniconda3\lib\site-packages\xarray\backends\netCDF4_.py:409 in ds
    return self._acquire()

  File ~\AppData\Local\miniconda3\lib\site-packages\xarray\backends\netCDF4_.py:403 in _acquire
    with self._manager.acquire_context(needs_lock) as root:

  File ~\AppData\Local\miniconda3\lib\contextlib.py:119 in __enter__
    return next(self.gen)

  File ~\AppData\Local\miniconda3\lib\site-packages\xarray\backends\file_manager.py:199 in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)

  File ~\AppData\Local\miniconda3\lib\site-packages\xarray\backends\file_manager.py:217 in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)

  File src\netCDF4\_netCDF4.pyx:2353 in netCDF4._netCDF4.Dataset.__init__

  File src\netCDF4\_netCDF4.pyx:1963 in netCDF4._netCDF4._ensure_nc_success

OSError: [Errno -90] NetCDF: file not found: b'https://www.nodc.noaa.gov/archive/arc0196/0245564/1.1/data/0-data/prcp-1991_2020-monthly-normals-v1.0.nc'


syntax error, unexpected WORD_WORD, expecting SCAN_ATTR or SCAN_DATASET or SCAN_ERROR
context: <!DOCTYPE^ html><html lang="en"><head><!-- Document specific SSI statements --><meta http-equiv="content-type" content="text/html; charset=UTF-8" /><link rel="shortcut icon" href="/Images/favicon.ico" /><title>Error 404: Not Found</title> … <strong>NCEI is transitioning to a new website and paths to data resources will be changing. … See the new website at <a href="https://www.ncei.noaa.gov/">www.ncei.noaa.gov</a>.</strong> … <h2>Error 404: Not Found</h2> <p>We apologize, but the page or file does not exist.</p> … </body></html>
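The last part of the log shows that what netCDF-C actually received and tried to parse was NCEI's HTML "Error 404: Not Found" page, not NetCDF data. A quick way to check what a URL returns is to look at its first bytes. The sketch below is a hypothetical diagnostic, not part of xarray: it compares the bytes against the NetCDF classic and HDF5 file signatures and otherwise checks for an HTML page.

```python
import urllib.error
import urllib.request

# File signatures: NetCDF classic / 64-bit offset / CDF-5, and HDF5 (used by NetCDF-4)
NETCDF3_MAGICS = (b"CDF\x01", b"CDF\x02", b"CDF\x05")
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # usually at offset 0 in NetCDF-4 files


def sniff_format(head: bytes) -> str:
    """Guess what kind of payload the first bytes of a response represent."""
    if head.startswith(NETCDF3_MAGICS):
        return "netcdf3"
    if head.startswith(HDF5_MAGIC):
        return "netcdf4/hdf5"
    stripped = head.lstrip().lower()
    if stripped.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"


def fetch_head(url: str, nbytes: int = 512) -> bytes:
    """Return the first `nbytes` of the body served at `url`, even for HTTP errors."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read(nbytes)
    except urllib.error.HTTPError as err:
        # A 404 still carries a body (here, the HTML error page); return it for inspection
        return err.read(nbytes)
```

For example, comparing sniff_format(fetch_head(url)) with sniff_format(fetch_head(url + ".dds")) could show whether the file itself is reachable while the ".dds" request is what returns the 404. That an OPeNDAP client requests url + ".dds" is an assumption here, not something the log shows directly.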

Anything else we need to know?

No response

Environment

C:\Users\kchudler\AppData\Local\miniconda3\lib\site-packages\_distutils_hack\__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit: None
python: 3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 08:39:05) [MSC v.1929 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 141 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: ('English_United States', '1252')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2023.7.0
pandas: 2.1.4
numpy: 1.26.3
scipy: 1.11.4
netCDF4: 1.6.0
pydap: None
h5netcdf: 1.3.0
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2022.02.1
distributed: 2022.2.1
matplotlib: 3.4.3
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2023.12.2
cupy: None
pint: 0.23
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.1
conda: 23.1.0
pytest: 7.4.4
mypy: None
IPython: 8.17.2
sphinx: 7.2.6
