Record of correspondence with hdfhelp@ncsa.uiuc.edu

(problems with MFSD interface in NCSA HDF)


From marsa Thu Aug 17 13:53:48 1995
To: hdfhelp@ncsa.uiuc.edu
Subject: [netcdf/libsrc]: [The MFSD interface will only allow us to

write a few SDS's in a single file and will only allow us to open a few
files at once.]

       VERSION:
           HDF3.3 Release 4

       USER:
           Robert Marsa
           (512) 471-4700
           marsa@hoffmann.ph.utexas.edu

       AREA:
           netcdf/libsrc

       SYNOPSIS:
           The MFSD interface will only allow us to write a few SDS's
           in a single file and will only allow us to open a few files
           at once.

       MACHINE / OPERATING SYSTEM:
           Presumably all.  SGI/IRIX 5.2, Macintosh/A/UX 3.0.2,
           CRAY/UNICOS 8.0.4

       COMPILER:
           native ANSI cc, gcc

       DESCRIPTION:
          We have recently run into problems using the HDF MFSD interface.
          Through testing and examination of the source code, we have      
          identified at least two problems.
          
          1) Only 5000 (MAX_NC_DIMS) dimensions are allowed in a single
          file. This limitation seems to be enforced by SDcreate in mfsd.c.
          We don't properly understand this limitation.  If we create a
          series of SDS's of size 1 X 1 X 1, we can only write 1666 of
          them.  Likewise, we can only write 2500 SDS's of size 1 X 1 and
          5000 SDS's of size 1. However, if we write SDS's of non-unit
          dimension, we are limited even more severely in the number we can
          output.  Therefore, there may be another limitation at work
          which we haven't discovered.
          
          2) Only 32 (MAX_NC_OPEN) files may be open at one time.  This
          limitation seems to be enforced by NC_open in file.c.  We have
          programs which will involve more than 32 grid functions, all of  
          which may need to be output at the same time.  The work-around of
          closing the files between writes is unacceptably slow.

       REPEAT BUG BY:
          Here is a test program which will illustrate 1) above.
          When you run this with: sdtest 1 1000000
          you should see:
					test0:
					test1:
					.
					.
					.
					test2499:
					test2500:
					Can't create data set test2500

/************************************************************************/
/* sdtest.c

cc -I/usr/local/include -L/usr/local/lib sdtest.c -lnetcdf -ldf -o sdtest

*/

#include <stdio.h>
#include <stdlib.h>
#include <hdf.h>
#include <mfhdf.h>

/* Illustrates use of SD HDF/NETCDF interface.

   Opens 'test.hdf' and appends 'nsteps' (arg 2) 'n' x 'n' (arg 1)
   scientific data sets, if possible.  Try

   sdtest 1 1000000

   to observe limits on how many data sets may be appended.

   Authors:  Robert Marsa, Matt Choptuik, August 1995

*/

main(int argc,char **argv)
{
   int32 rank;
   int32 shape[3],start[3];
   int32 ret,i,j,sf_id,sds_id;
   double time;
   double *data,*d;
   char nm[8];

   int n;
   int nsteps;

   if(argc<3 || !sscanf(argv[1],"%d",&n)|| !sscanf(argv[2],"%d",&nsteps)){
     fprintf(stderr,"Usage: sdtest <n> <nsteps>\n");
     exit(1);
   }

   shape[0]=shape[1]=shape[2]=n;
   start[0]=start[1]=start[2]=0;
   rank=2;
   time=1.0;
   data=(double *) malloc(shape[0]*shape[1]*sizeof(double));
   for(d=data,i=0;i<shape[0];i++)
     for(j=0;j<shape[1];j++)
       *(d++)=i*j;
   sf_id=SDstart("test.hdf",DFACC_CREATE);
   for(i=0;i<nsteps;i++){
     sprintf(nm,"test%3d",i);
     printf("%s: \n",nm);
     sds_id=SDcreate(sf_id,nm,DFNT_FLOAT64,rank,shape);
     if(sds_id==-1){
       fprintf(stderr,"Can't create data set %s\n",nm);
       exit(1);
     }
     if(SDwritedata(sds_id,start,NULL,shape,(VOIDP)data)==-1){
       fprintf(stderr,"Can't write data: %d\n",i);
       exit(1);
     }
     SDendaccess(sds_id);
   }
   SDend(sf_id);
}
/************************************************************************/

From sxu@ncsa.uiuc.edu  Mon Aug 21 20:36:31 1995
Posted-Date: Mon, 21 Aug 1995 16:41:48 -0500
Received-Date: Mon, 21 Aug 95 20:36:31 -0500
Received: from newton.ncsa.uiuc.edu by hoffmann.ph.utexas.edu (931110.SGI/5.51)
	id AA00622; Mon, 21 Aug 95 20:36:31 -0500
Received: from space.ncsa.uiuc.edu by newton.ncsa.uiuc.edu with SMTP id AA14122
  (5.65a/IDA-1.4.2 for marsa@hoffmann.ph.utexas.edu); Mon, 21 Aug 95 16:37:06 -0500
Received: (from sxu@localhost) by space.ncsa.uiuc.edu (8.6.11/8.6.11) id QAA14325; Mon, 21 Aug 1995 16:41:48 -0500
Date: Mon, 21 Aug 1995 16:41:48 -0500
Message-Id: <199508212141.QAA14325@space.ncsa.uiuc.edu>
To: marsa@hoffmann.ph.utexas.edu
Subject: Re: [netcdf/libsrc]: [The MFSD interface will only allow us to
From: hdfhelp@ncsa.uiuc.edu
Status: R

>From marsa@hoffmann.ph.utexas.edu  Thu Aug 17 14:01:18 1995
>To: hdfhelp@ncsa.uiuc.edu
>Subject: [netcdf/libsrc]: [The MFSD interface will only allow us to
>write a few SDS's in a single file and will only allow us to open a few
>files at once.]
>       VERSION:
>           HDF3.3 Release 4
>       USER:
>           Robert Marsa
>           (512) 471-4700
>           marsa@hoffmann.ph.utexas.edu
>       MACHINE / OPERATING SYSTEM:
>           Presumably all.  SGI/IRIX 5.2, Macintosh/A/UX 3.0.2,
>           CRAY/UNICOS 8.0.4
>       COMPILER:
>           native ANSI cc, gcc
>
>       DESCRIPTION:
>          We have recently run into problems using the HDF MFSD interface.
>          Through testing and examination of the source code, we have      
>          identified at least two problems.
>          
>          1) Only 5000 (MAX_NC_DIMS) dimensions are allowed in a single
>          file. This limitation seems to be enforced by SDcreate in mfsd.c.
>          We don't properly understand this limitation.  If we create a
>          series of SDS's of size 1 X 1 X 1, we can only write 1666 of
>          them.  Likewise, we can only write 2500 SDS's of size 1 X 1 and
>          5000 SDS's of size 1. 

You can try to change the limitation by redefining MAX_NC_DIMS and MAX_NC_VARS
in netcdf.h. Or, you can use SDsetdimname() to let variables share the
dimension records. If all variables have the same 2 dimensions, only two
dimension records will be created in the file. It will not only solve the
limitation problem but also speed up the write process when closing the file. 
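
A minimal sketch of the dimension-sharing approach described above, assuming
the standard SD calls (SDstart, SDcreate, SDgetdimid, SDsetdimname); the file
name, data-set names, and dimension names below are invented for illustration:

-------------------------------------------------------------------
#include <stdio.h>
#include <mfhdf.h>

int main(void)
{
    int32 sd_id, sds_id, shape[2];
    char  nm[32];
    int   i;

    shape[0] = shape[1] = 65;
    sd_id = SDstart("shared.hdf", DFACC_CREATE);
    for (i = 0; i < 10; i++) {
        sprintf(nm, "test%d", i);
        sds_id = SDcreate(sd_id, nm, DFNT_FLOAT64, 2, shape);
        /* Give the dimensions the same names for every data set so they
           share one pair of dimension records in the file. */
        SDsetdimname(SDgetdimid(sds_id, 0), "ydim");
        SDsetdimname(SDgetdimid(sds_id, 1), "xdim");
        /* ... SDwritedata() would go here ... */
        SDendaccess(sds_id);
    }
    SDend(sd_id);
    return 0;
}
-------------------------------------------------------------------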

>          However, if we write SDS's of non-unit
>          dimension, we are limited even more severely in the number we can
>          output.  Therefore, there may be another limitation at work
>          which we haven't discovered.
>          

Thank you for bringing this problem to our attention. I need to trace down the
code and find out why the dimension sizes limit the total number of dimensions. 

>          2) Only 32 (MAX_NC_OPEN) files may be open at one time.  This
>          limitation seems to be enforced by NC_open in file.c.  We have
>          programs which will involve more than 32 grid functions, all of  
>          which may need to be output at the same time.  The work-around of
>          closing the files between writes is unacceptably slow.

MAX_NC_OPEN is also defined in netcdf.h. You may redefine it and recompile the
mfhdf side. 
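
For concreteness, the edit being suggested would look roughly like the
fragment below; the value 64 is an arbitrary illustration, not a recommended
setting, and the mfhdf side must be rebuilt afterwards.  (A later message in
this record notes that two further limits, MAX_VFILE in hdf/src/hdf.h and
MAX_FILE in hdf/src/hfile.h, also have to be raised.)

-------------------------------------------------------------------
/* mfhdf/libsrc/netcdf.h -- illustrative change only */
#define MAX_NC_OPEN 64   /* was 32: max number of files open at once */
-------------------------------------------------------------------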

Please let me know if redefining doesn't help.

Thanks.

Shiming Xu
SDG, NCSA

>
>       REPEAT BUG BY:
>          Here is a test program which will illustrate 1) above.
>          When you run this with: sdtest 1 1000000
>          you should see:
>					test0:
>					test1:
>					.
>					.
>					.
>					test2499:
>					test2500:
>					Can't create data set test2500
>
>/************************************************************************/
>/* sdtest.c
>
>cc -I/usr/local/include -L/usr/local/lib sdtest.c -lnetcdf -ldf -o sdtest
>
>*/
>
>#include <stdio.h>
>#include <stdlib.h>
>#include <hdf.h>
>#include <mfhdf.h>
>
>/* Illustrates use of SD HDF/NETCDF interface.
>
>   Opens 'test.hdf' and appends 'nsteps' (arg 2) 'n' x 'n' (arg 1)
>   scientific data sets, if possible.  Try
>
>   sdtest 1 1000000
>
>   to observe limits on how many data sets may be appended.
>
>   Authors:  Robert Marsa, Matt Choptuik, August 1995
>
>*/
>
>main(int argc,char **argv)
>{
>   int32 rank;
>   int32 shape[3],start[3];
>   int32 ret,i,j,sf_id,sds_id;
>   double time;
>   double *data,*d;
>   char nm[8];
>
>   int n;
>   int nsteps;
>
>   if(argc<3 || !sscanf(argv[1],"%d",&n)|| !sscanf(argv[2],"%d",&nsteps)){
>     fprintf(stderr,"Usage: sdtest <n> <nsteps>\n");
>     exit(1);
>   }
>
>   shape[0]=shape[1]=shape[2]=n;
>   start[0]=start[1]=start[2]=0;
>   rank=2;
>   time=1.0;
>   data=(double *) malloc(shape[0]*shape[1]*sizeof(double));
>   for(d=data,i=0;i<shape[0];i++)
>     for(j=0;j<shape[1];j++)
>       *(d++)=i*j;
>   sf_id=SDstart("test.hdf",DFACC_CREATE);
>   for(i=0;i<nsteps;i++){
>     sprintf(nm,"test%3d",i);
>     printf("%s: \n",nm);
>     sds_id=SDcreate(sf_id,nm,DFNT_FLOAT64,rank,shape);
>     if(sds_id==-1){
>       fprintf(stderr,"Can't create data set %s\n",nm);
>        exit(1);
>     }
>     if(SDwritedata(sds_id,start,NULL,shape,(VOIDP)data)==-1){
>       fprintf(stderr,"Can't write data: %d\n",i);
>         exit(1);
>     }
>     SDendaccess(sds_id);
>   }
>   SDend(sf_id);
>}
>/**********************************************************************/
>
>
>
>
>
>
>

From marsa Wed Aug 23 18:32:46 1995
To: hdfhelp@ncsa.uiuc.edu
Subject: [netcdf/libsrc] dimension counting problem

>>From marsa@hoffmann.ph.utexas.edu  Thu Aug 17 14:01:18 1995
>>To: hdfhelp@ncsa.uiuc.edu
>>Subject: [netcdf/libsrc]: [The MFSD interface will only allow us to
>>write a few SDS's in a single file and will only allow us to open a few
>>files at once.]
>>       VERSION:
>>           HDF3.3 Release 4
>>       USER:
>>           Robert Marsa
>>           (512) 471-4700
>>           marsa@hoffmann.ph.utexas.edu
>>       MACHINE / OPERATING SYSTEM:
>>           Presumably all.  SGI/IRIX 5.2, Macintosh/A/UX 3.0.2,
>>           CRAY/UNICOS 8.0.4
>>       COMPILER:
>>           native ANSI cc, gcc
>>
>>       DESCRIPTION:
>>          We have recently run into problems using the HDF MFSD interface.
>>          Through testing and examination of the source code, we have      
>>          identified at least two problems.
>>          
>>          1) Only 5000 (MAX_NC_DIMS) dimensions are allowed in a single
>>          file. This limitation seems to be enforced by SDcreate in mfsd.c.
>>          We don't properly understand this limitation.  If we create a
>>          series of SDS's of size 1 X 1 X 1, we can only write 1666 of
>>          them.  Likewise, we can only write 2500 SDS's of size 1 X 1 and
>>          5000 SDS's of size 1. 
>
>You can try to change the limitation by redefining MAX_NC_DIMS and MAX_NC_VARS
>in netcdf.h. Or, you can use SDsetdimname() to let variables share the
>dimension records. If all variables have the same 2 dimensions, only two
>dimension records will be created in the file. It will not only solve the
>limitation problem but also speed up the write process when closing the file. 

Although the test program I sent you doesn't use SDsetdimname(), our "real"
programs do.  When we examine the file, we do see that there is only one
set of dimensions.  However, we still have the same limitations as if we
were writing many sets of dimensions.  I think you may be counting dimensions
for every SDS even though you are only creating one set.  This looks like a
legitimate bug.

We would prefer not to redefine MAX_NC_DIMS for two reasons:

1) this is not really a fix.  We have no idea what the maximum number of
SDS's we'll want is.

2) this would require everyone who uses our software to edit their HDF
headers and rebuild their HDF libraries.

>>          However, if we write SDS's of non-unit
>>          dimension, we are limited even more severely in the number we can
>>          output.  Therefore, there may be another limitation at work
>>          which we haven't discovered.
>>          
>
>Thank you for bringing this problem to our attention. I need to trace down the
>code and find out why the dimension sizes limit the total number of dimensions. 
>
>>          2) Only 32 (MAX_NC_OPEN) files may be open at one time.  This
>>          limitation seems to be enforced by NC_open in file.c.  We have
>>          programs which will involve more than 32 grid functions, all of  
>>          which may need to be output at the same time.  The work-around of
>>          closing the files between writes is unacceptably slow.
>
>MAX_NC_OPEN is also defined in netcdf.h. You may redefine it and recompile the
>mfhdf side. 

We don't want to redefine this for the same two reasons given above.  It should
be easy for the HDF routines to use some sort of dynamic data structure to
allow an unlimited number of files to be opened.
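
A rough sketch of the sort of dynamically grown file table being suggested
here; this is generic C for illustration only, not HDF's actual internals,
and all of the names are invented:

-------------------------------------------------------------------
#include <stdlib.h>

/* A file table that grows on demand instead of using a fixed
   MAX_NC_OPEN-sized array. */
typedef struct {
    void **entries;   /* one slot per open file    */
    int    used;      /* slots currently in use    */
    int    cap;       /* slots currently allocated */
} FileTable;

/* Returns an index that could play the role of the file id,
   or -1 if memory runs out. */
static int filetable_add(FileTable *t, void *handle)
{
    if (t->used == t->cap) {
        int    newcap = t->cap ? 2 * t->cap : 32;
        void **p = (void **) realloc(t->entries, newcap * sizeof *p);
        if (p == NULL)
            return -1;
        t->entries = p;
        t->cap     = newcap;
    }
    t->entries[t->used] = handle;
    return t->used++;
}
-------------------------------------------------------------------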

>Please let me know if redefining doesn't help.
>
>Thanks.
>
>Shiming Xu
>SDG, NCSA
>

Robert Marsa

-------------------------------------------------------------------------
Wed Sep 6   From Ed Seidel at NCSA
-------------------------------------------------------------------------

Mike,

Thanks a lot for the thoughtful and detailed response.  I'll pass this on
to the folks at Austin.

Ed

>>Dear Mike, Shiming, et al,
>>
>> I talked recently with Matt Choptuik, Robert Marsa, and others at
>>UT-Austin, who work with us in the black hole grand challenge.  They say
>>they have discovered some bugs in hdf routines, and have contacted hdfhelp
>>without much response.  I am sure hdfhelp is overwhelmed with
>>correspondence, so I just wanted to check this out with you directly.  Matt
>>and Robert seemed to think there were some serious bugs, although I did not
>>get the details yet (they seem to be documented in
>>http://godel.ph.utexas.edu/Members/marsa/hdf.html).  Have you all had a
>>chance to check out their reports yet?
>>
>> Thanks a lot,
>> Ed
>
>Ed,
>
>I just talked to Shiming and George Velamparampil about these problems.
>Here's the status.  (Feel free to pass this info on to the folks at UT.)
>
>The problems
>------------
>The problems are a result of the interplay between a number of things in
>the library, including netCDF, variations among operating systems that HDF
>runs under, and the way we allocate storage for Vdatas (which are used to
>store dimension information).
>
>Whenever you create a dimension, memory is allocated to manage information
>about that dimension.  Unless the dimension is "1", the amount of memory is
>about 35K.  (It was set this high to make some of the code a little simpler,
>and at the time we didn't have any users who needed more than a few data
>sets in a file.  We're not yet sure what happens when a dimension is 1.)
>So when people open 1,000 dimensions, 35 MEGABYTES of memory are allocated,
>which plays havoc on some workstations.  By putting a limit on the number
>of dimensions that the library can anticipate, this problem is avoided.
>
>The limitation on the number of files has to be there, but the way our
>different modules handle it is inconsistent and needs to be improved.
>Ideally, the limit should be whatever the host machine's OS limit is, and
>this should be set dynamically, if possible.  I'm not sure we'll be able to
>do this for all OSs.  George has been working on this problem.
>
>Fixing the problems
>-------------------
>We now understand that there are users, like the UT folks, for whom these
>artificial limits are intolerable.  We discussed both problems at an HDF
>meeting in late August, after Shiming had communicated with the UT folks
>about it, and decided then to try to fix them as best we can.
>
>We set a deadline of Sept 30 to have them fixed, hoping to release the
>revised code with the next beta release of HDF 4.0, planned for Nov 1.  If
>that is too long for the UT people we can try to accelerate it, maybe by
>adding a patch to HDF 3.3r4, but our plate is pretty full now and we'd
>rather stick to the current schedule if possible.
>
>Mike

-------------------------------------------------------------------------
Thurs Sep 7   From Mike Folk at NCSA, including Matt Choptuik's reply to
              the previous message.
-------------------------------------------------------------------------

At 5:33 PM 9/6/95, Matt Choptuik wrote:

>Ed, Mike ... thanks for the info.  However, it's not clear that all
>the things we complained about are being covered.  We realize
>that there will be a limit to the number of files which can
>be open at any one time and we will have to work around this.

Good.  George has made some improvements to this situation.  Shiming will
soon be sending out a note to Robert Marsa explaining this.

>The dimensioning business is different though; as far as we can
>discern (and as is documented at
>http://godel.ph.utexas.edu/Members/marsa/hdf.html
>), if we append CONSTANT RANK, CONSTANT SHAPE scientific
>data sets to an .hdf file, then we *always* face an (unpredictable) limit to
>how many data sets can be written, and according to Shiming,
>this is not supposed to be the case (i.e. we DO use SDsetdimname()
>but still encounter the problem).  Either we aren't understanding
>something or this is a genuine BUG, not a design feature.

It's a bug, not a design feature.  Shiming will say a bit more about how
she plans to work on it in the note to Robert.

>Also, on Cray vector machines, our codes which use the mfhdf interface
>don't just encounter .hdf error returns; at some point, they actually
>dump core, so we suspect there's something else awry with the
>Cray support.

We didn't know about this Cray core dumping problem.  (Or maybe we
misunderstood an earlier message from you.)  After we solve the previous
problem on our local workstations, we will check it on a Cray.

>We don't mind waiting until Sep 30 for a fix but
>we urge the support group to get back in touch with US if possible
>so that we can be reasonably certain that the problem does get fixed.

Will do.

Mike

-------------------------------------------------------------------------
Fri, Sep 9 (approx)   From Shiming Xu at NCSA (I believe) via Robert Marsa
-------------------------------------------------------------------------

>From marsa@hoffmann.ph.utexas.edu  Wed Aug 23 18:32:29 1995
>To: hdfhelp@ncsa.uiuc.edu
>Subject: [netcdf/libsrc] dimension counting problem
>
>>>From marsa@hoffmann.ph.utexas.edu  Thu Aug 17 14:01:18 1995
>>>To: hdfhelp@ncsa.uiuc.edu
>>>Subject: [netcdf/libsrc]: [The MFSD interface will only allow us to
>>>write a few SDS's in a single file and will only allow us to open a few
>>>files at once.]
>>>       VERSION:
>>>           HDF3.3 Release 4
>>>       USER:
>>>           Robert Marsa
>>>           (512) 471-4700
>>>           marsa@hoffmann.ph.utexas.edu
>>>       MACHINE / OPERATING SYSTEM:
>>>           Presumably all.  SGI/IRIX 5.2, Macintosh/A/UX 3.0.2,
>>>           CRAY/UNICOS 8.0.4
>>>       COMPILER:
>>>           native ANSI cc, gcc
>>>
>>>       DESCRIPTION:
>>>          We have recently run into problems using the HDF MFSD interface.
>>>          Through testing and examination of the source code, we have
>>>          identified at least two problems.
>>>
>>>          1) Only 5000 (MAX_NC_DIMS) dimensions are allowed in a single
>>>          file.  This limitation seems to be enforced by SDcreate in mfsd.c.
>>>          We don't properly understand this limitation.  If we create a
>>>          series of SDS's of size 1 X 1 X 1, we can only write 1666 of
>>>          them.  Likewise, we can only write 2500 SDS's of size 1 X 1 and
>>>          5000 SDS's of size 1.
>>
>>You can try to change the limitation by redefining MAX_NC_DIMS and MAX_NC_VARS
>>in netcdf.h.  Or, you can use SDsetdimname() to let variables share the
>>dimension records.  If all variables have the same 2 dimensions, only two
>>dimension records will be created in the file.  It will not only solve the
>>limitation problem but also speed up the write process when closing the file.
>
>Although the test program I sent you doesn't use SDsetdimname(), our "real"
>programs do.  When we examine the file, we do see that there is only one
>set of dimensions.  However, we still have the same limitations as if we
>were writing many sets of dimensions.  I think you may be counting dimensions
>for every SDS even though you are only creating one set.  This looks like a
>legitimate bug.

I did a quick check on some segments of the HDF code which I thought might
cause the problem.  Unfortunately none of them seemed to be wrong.
I think I need more time to trace down the code to find out:

1. Why can't we create MAX_NC_DIMS of dims?
2. Why is there a difference between unit and non-unit dim sizes?
3. Why doesn't SDsetdimname solve the limitation problem?
4. Do any factors other than the max dim number restrict the number of SDSs?

I reported this problem at our Aug. 23 HDF meeting.  At the Aug. 30 HDF
meeting we set a target date of Sept 30 for a fix of this problem (along
with other tasks).  I will keep you posted if we find out anything sooner.

>We would prefer not to redefine MAX_NC_DIMS for two reasons:
>
>1) this is not really a fix.  We have no idea what the maximum number of
>SDS's we'll want is.
>
>2) this would require everyone who uses our software to edit their HDF
>headers and rebuild their HDF libraries.
>
>>>          2) Only 32 (MAX_NC_OPEN) files may be open at one time.  This
>>>          limitation seems to be enforced by NC_open in file.c.  We have
>>>          programs which will involve more than 32 grid functions, all of
>>>          which may need to be output at the same time.  The work-around of
>>>          closing the files between writes is unacceptably slow.
>>
>>MAX_NC_OPEN is also defined in netcdf.h.
>>You may redefine it and recompile the mfhdf side.
>
>We don't want to redefine this for the same two reasons given above.  It should
>be easy for the HDF routines to use some sort of dynamic data structure to
>allow an unlimited number of files to be opened.

George has been working on the max file number with another user, and he got
a solution yesterday.  Below is a copy of his e-mail to that user.  And we
just heard back from that user saying this fix worked.

-------------------------------------------------------------
I've found the problem.  It looks like there are 3 places in the source code
where the max # of open files is specified.  This is inconsistent and will
be fixed in the next beta release.  The quick fix is to change the following
variables when you want to increase the number of open files.

    file                      variable       current value
    -----                     --------       -------------
    hdf/src/hdf.h             MAX_VFILE      16
    hdf/src/hfile.h           MAX_FILE       16
    mfhdf/libsrc/netcdf.h     MAX_NC_OPEN    32

I changed the above to 200 (specific to SGI) on an SGI Indy (IRIX 5.3) and
opened/created 197 (3 for stdin, stdout, stderr) hdf files using the SDxx
interface.  I believe changing the above should fix it.  Remember to do a
full *clean* rebuild.

-------------------------------------------------------------------------
Tue Sep 19 10:21:56 CDT 1995   From Shiming Xu at NCSA, via Robert Marsa
-------------------------------------------------------------------------

>From sxu@ncsa.uiuc.edu  Mon Sep 18 17:41:11 1995
Posted-Date: Mon, 18 Sep 1995 17:41:06 -0500
Received-Date: Mon, 18 Sep 95 17:41:11 -0500
To: marsa@hoffmann.ph.utexas.edu
Subject: Re: limits on dims and vars
From: hdfhelp@ncsa.uiuc.edu
Status: R

>>>>       DESCRIPTION:
>>>>          We have recently run into problems using the HDF MFSD interface.
>>>>          Through testing and examination of the source code, we have
>>>>          identified at least two problems.
>>>>
>>>>          1) Only 5000 (MAX_NC_DIMS) dimensions are allowed in a single
>>>>          file.  This limitation seems to be enforced by SDcreate in mfsd.c.
>>>>          We don't properly understand this limitation.  If we create a
>>>>          series of SDS's of size 1 X 1 X 1, we can only write 1666 of
>>>>          them.  Likewise, we can only write 2500 SDS's of size 1 X 1 and
>>>>          5000 SDS's of size 1.
>>>
>>>You can try to change the limitation by redefining MAX_NC_DIMS and MAX_NC_VARS
>>>in netcdf.h.  Or, you can use SDsetdimname() to let variables share the
>>>dimension records.  If all variables have the same 2 dimensions, only two
>>>dimension records will be created in the file.  It will not only solve the
>>>limitation problem but also speed up the write process when closing the file.
>>
>>Although the test program I sent you doesn't use SDsetdimname(), our "real"
>>programs do.  When we examine the file, we do see that there is only one
>>set of dimensions.  However, we still have the same limitations as if we
>>were writing many sets of dimensions.  I think you may be counting dimensions
>>for every SDS even though you are only creating one set.  This looks like a
>>legitimate bug.
>
>I did a quick check on some segments of the HDF code which I thought might
>cause the problem.  Unfortunately none of them seemed to be wrong.
>I think I need more time to trace down the code to find out:
>

Here are some results:

>1. Why can't we create MAX_NC_DIMS of dims?
>2. Why is there a difference between unit and non-unit dim sizes?

I wrote a short program to create 5000 3D INT8 SDS' on SGI.  For dimension
sizes 1x1x1, 2x2x2, 20x20x20 and 30x30x30 the program created 1666 SDS and
died at the 1667th SDS, when it reached the limit of 5000 dimensions.  In
other words, the dimension sizes didn't seem to make a difference in terms
of the maximum number of dimensions.

I ran your program on SGI with rank=2 and rank=3.  With different dim sizes
the program always created 5000 dimensions.  For rank=2 the program died at
the 2500th variable and for rank=3 it died at the 1666th.  The only time my
program died before 5000 dims was when the dims were 50x50x50.  The file
would have been bigger than 200MB while there were only 140MB available on
my disk.

>3. Why doesn't SDsetdimname solve the limitation problem?

You were right about this; SDsetdimname did not help with the dim limits,
even though it cut the final number of dimensions in HDF files.  The reason
is described below.

HDF creates dimension structures and an array of pointers to the dimension
structures dynamically.  Every time a new SDS is created, a number of rank
(3 in my sample program) dimension structures and pointers are also created.
When SDsetdimname is called to share dimensions, the duplicated dimension
structures are freed.  However, the pointers are not.  They point to the
corresponding dimension structures which represent each unique dimension.

In short, the number of dimension pointers is not decreased, and therefore
the limit is still MAX_NC_DIMS no matter whether the dimensions are shared
or not.  I wasn't aware of the details and gave you a wrong suggestion.
Sorry about it.

A quick fix to this problem is to not check MAX_NC_DIMS.  Each pointer takes
4 bytes on 32-bit machines, so 10,000 dimension pointers need 40k bytes.
This will not be a problem for most machines.

A better solution is to not create the duplicated dimension structures and
the pointers in the first place.  This can be done by defining each dimension
and using those dimensions to create SDSs, similar to what the netCDF
interface does.  However, this requires some current functions to be changed.
We need more discussion about this change.

>4. Do any factors other than the max dim number restrict the number of SDSs?

There is no other hard-coded limit on the number of dims.  However, there
are several factors which should be considered.

In hdf files each dimension is stored as a vdata in a vgroup.  All attributes
(of the HDF file, of the SDS' in the file, and of the dimensions) are also
stored as vdatas.  The maximum number of vdatas that can exist in an HDF file
is 64k, which is the maximum number of reference numbers that any one type of
object can have.  (A UINT16 is used for reference numbers, hence the 64k
limit.)  Each SDS is stored as a vgroup.  The maximum number of vgroups that
can exist in a file is also 64k, for the same reason as for vdatas.  If you
don't call VSxxxx to create any vdatas of your own, and if you don't write
any attributes, the max number of dimensions will be 64k minus the total
number of SDSs.

If a dimension is assigned attributes, such as label/unit/format/scale/...,
the dimension will be promoted to an SDS, or coordinate SDS.  The total
number of SDSs, including the dataset SDS' and coordinate SDS', is limited
by MAX_NC_VARS.

Another factor is performance.  The more vdatas to be written, the longer
SDend and SDstart take to complete writing/reading the file.  Once I created
1300 3D SDS', and SDend was 4 times faster when the 3 dims were shared by all
SDS' than it was when the dimensions were not shared.  We are working on this
problem and hopefully it will be improved in the next release.  I don't have
any quantitative analysis of how the number of dims affects performance
through the memory they take.  If performance is an issue, you might want to
experiment with it.

In addition to the above factors, one more reason for setting limits is to
facilitate writing applications and utilities; for example, Fortran-77
utilities must allocate arrays statically by using a maximum array size.

We may change the limit to 20,000.  However, the existing utilities and
tools will have problems reading new HDF files which contain more than 5000
SDSs.  We feel at this time, for the reasons given, that the 5K limit for a
single file is reasonable for most applications.  If this creates an
impossible situation for your group we still can increase it to 20,000.
Now that we've given the reasons we have the 5000 limit, we'd like to hear
your opinions.

Thanks.

Shiming Xu

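A small sketch of the "promotion" described in the note above: attaching a
scale (or label/unit/format strings) to a named dimension turns it into a
coordinate SDS, which then counts against MAX_NC_VARS along with the
ordinary data sets.  The names and sizes below are invented for
illustration, assuming the standard SDsetdimscale/SDsetdimstrs calls.

-------------------------------------------------------------------
#include <mfhdf.h>

void promote_example(int32 sd_id)
{
    int32   shape[1], sds_id, dim_id;
    float64 scale[100];
    int     i;

    shape[0] = 100;
    for (i = 0; i < 100; i++)
        scale[i] = 0.1 * i;

    sds_id = SDcreate(sd_id, "phi", DFNT_FLOAT64, 1, shape);
    dim_id = SDgetdimid(sds_id, 0);
    SDsetdimname(dim_id, "x");
    /* Either of the next two calls promotes dimension "x"
       to a coordinate SDS: */
    SDsetdimscale(dim_id, 100, DFNT_FLOAT64, (VOIDP) scale);
    SDsetdimstrs(dim_id, "x coordinate", "cm", "F10.4");
    SDendaccess(sds_id);
}
-------------------------------------------------------------------
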
-------------------------------------------------------------------------
Tue Sep 19 13:40:04 CDT 1995   From Matt Choptuik and Robert Marsa
-------------------------------------------------------------------------

From matt@godel.ph.utexas.edu  Tue Sep 19 13:38:58 1995
To: sxu@ncsa.uiuc.edu, mfolk@ncsa.uiuc.edu, eseidel@ncsa.uiuc.edu,
    matt@infeld.ph.utexas.edu, marsa@hoffmann.ph.utexas.edu,
    richard@einstein.ph.utexas.edu, richard@helmholtz.ph.utexas.edu
Subject: Re problems with MFSD interface in HDF

Dear Shiming and Mike:

We are disappointed, to say the least, with your prognosis.  We feel it is
intuitively obvious that having *any* limitation on the number of data
objects one can write to a file is a pretty bad design decision.  At a
minimum it certainly isn't very forward looking.  For example, we are in an
era where simulations can last many tens of thousands or even millions of
time-steps; a canonical way of viewing results is via video footage.  At 30
frames a second you're saying that no one will ever want to make a video
that lasts more than 3 minutes.  The argument about utilities needing to
have limits is very weak, since any properly designed utility (even in
FORTRAN!!) can and should detect when its internal data structures are full.

Also, there are at least two separate issues here (I apologize for turning
one gripe into two, but I have raised the point in previous correspondence
with Mike).  The first is the limitations we've been discussing.  The
second, which is really more urgent, is that our programs on CRAYs are
*core dumping* in the HDF routines, and we can't correlate the crashes
with, say, the number of data sets written, the way we can with the 5000
dimension limit.  This is a serious problem since we've had many instances
now where large two- and three-dimensional calculations have crapped out
after a significant run time (but after having only appended a few 10's to
a few 100's of data sets), and our only work-around has been to write a
single file per time step.  We understand the need to design and debug on
workstations, but one of the principal "features" of HDF is supposed to be
its machine-independence, and CRAYs still provide most of our
high-performance cycles.  A couple of weeks ago Mike said

   We didn't know about this Cray core dumping problem.  (Or maybe we
   misunderstood an earlier message from you.)  After we solve the previous
   problem on our local workstations, we will check it on a Cray.

so could we have someone there check this out on a Cray to see whether they
can reproduce the problem and isolate what we've been running into?
Perhaps it is just a case of the files being too large, but some
experiments we've done suggest that that's *not* the case.

Thanks for keeping in touch; we'll do the same ...

Regards,

Matt Choptuik
Robert Marsa

-------------------------------------------------------------------------
Tue Sep 19 17:33:26 1995   From Mike Folk
-------------------------------------------------------------------------

Right.  Now that we understand the other limitations, we will turn our
attention to the Cray problem.  We'll keep you posted on our progress.
Hopefully we'll have something positive to report by the end of next week.

Mike

-------------------------------------------------------------------------
Wed Sep 20 13:51:16 1995   From Shiming Xu at NCSA
-------------------------------------------------------------------------

From sxu@ncsa.uiuc.edu  Wed Sep 20 13:51:16 1995
Subject: Re: limits on dims and vars

Hi,

I ran your test program with the HDF3.3r4patch04 libraries on SDSC's (San
Diego Supercomputing Center) C90, running UNICOS 8.0.3.2, 8 CPU 256 MW.  It
created 1666 SDS' with dimension sizes of 1, 2, 3, 4, and 20.  No core dump
or segmentation fault occurred.  The display of my screen is attached below.
(maxd is my test program.  The Makefile includes both maxd and sdtest as
targets; therefore, 'make' makes both maxd and sdtest.)

Which Cray are you using?  We don't have Crays any more at NCSA.  I am
trying to get an account on a Cray-YMP on an HDF user's machine.  If you
are using a YMP, I may try sdtest on that machine after the account is
available.

Another question is which version of HDF you are using.  Precompiled
HDF3.3r4p4 for the C90 is available on the NCSA ftp server, in directory:

    /HDF/HDF3.3r4p4/HDF3.3r4p4.c90.tar.Z

Thanks.

Shiming Xu

--------------------- display on my screen ----------------------
(output of sdtest 20 10000)

test1650:
test1651:
test1652:
test1653:
test1654:
test1655:
test1656:
test1657:
test1658:
test1659:
test1660:
test1661:
test1662:
test1663:
test1664:
test1665:
test1666:
Can't create data set test1666
c90-73% ls -l
total 211552
drwx------   2 u785   use310        4096 Sep 20 18:14 .
drwx------   4 u785   use310        4096 Sep 20 16:01 ..
-rw-------   1 u785   use310         366 Sep 20 18:14 Makefile
-rw-------   1 u785   use310        2267 Sep 20 16:00 max_dim.c
-rw-------   1 u785   use310        3012 Sep 20 18:14 max_dim.hdf
-rwx------   1 u785   use310      741680 Sep 20 18:13 maxd
-rwx------   1 u785   use310      772848 Sep 20 18:14 sdtest
-rw-------   1 u785   use310        1656 Sep 20 16:00 sdtest.c
-rw-------   1 u785   use310   106645750 Sep 20 18:16 test.hdf
c90-74% history
    51  mkdir max_dim
    52  mv *.c max_dim
    53  mv M* max_dim
    54  pwd
    55  vi max_dim/Makefile
    56  cd max_dim
    57  vi max_dim.c
    58  vi sdtest.c
    59  make
    60  pwd
    61  vi Makefile
    62  make
    63  cc -g -o maxd max_dim.c -I/usr/tmp/u785/HDF3.3r4p4.c90/include -L/usr/tmp/u785/HDF3.3r4p4.c90/lib -lnetcdf -ldf -ljpeg
    64  ls /usr/tmp/u785/HDF3.3r4p4.c90/lib
    65  vi Makefile
    66  make
    67  maxd
    68  sdtest 2 10000
    69  sdtest 3 10000
    70  sdtest 4 10000
    71  sdtest 1 10000
    72  sdtest 20 10000
    73  ls -l
    74  history
c90-75%

-------------------------------------------------------------------------
Wed Sep 20 14:09:53 CDT 1995   From Matt Choptuik at UT Austin
-------------------------------------------------------------------------

We are running our programs on a J90 (CRAY J916/5-1024) here at Texas and
on the C90 at Pittsburgh.  In both cases we're using HDF3.3r4, which as far
as I can tell from the ftp site is the most current non-alpha version.
(The directory you mentioned, HDF/HDF3.3r4p4, does not seem to exist on
your ftp server.)  I'll be glad to set you up access to the J90 here if
you want to check it out; it's on this machine that we're most concerned
about getting things straightened out in the short term.  Let me know how
you wish to proceed.

Regards,

Matt Choptuik

-------------------------------------------------------------------------
Wed Sep 20 15:00:00 CDT 1995   From Shiming Xu at NCSA
-------------------------------------------------------------------------

>We are running our programs on a J90 (CRAY J916/5-1024) here at
>Texas and on the C90 at Pittsburgh.  In both cases we're using
>HDF3.3r4, which as far as I can tell from the ftp site is the most
>current non-alpha version.  (The directory you mentioned
>
>   HDF/HDF3.3r4p4
>
>does not seem to exist on your ftp server)

Sorry, that should be /HDF/HDF3.3r4/bin/HDF3.3r4p4.c90.tar.Z.  Could you
give it a try on the C90 at Pittsburgh and let me know if it works?

> I'll be glad to set you up access to
>the J90 here if you want to check it out, it's on this machine that
>we're most concerned about getting things straightened out in the
>short term.  Let me know how you wish to proceed.

I haven't worked on a J90 yet and have no idea how much difference there is
between the C90 and the J90.  If binaries for the C90 also work on the J90,
could you try the precompiled code first?  If not, either I give you the
source code for HDF3.3r4p4 and you install the library and run sdtest, or
you give me an account on your J90 and I install the library and run the
test.  In either case, if there is still a core dump or segmentation fault,
I would like to find out what causes the problem and fix the bug if there
is one in HDF.

Thanks.

Shiming Xu

-------------------------------------------------------------------------
Sun Sep 24 16:00:00 CDT 1995   From Shiming Xu at NCSA
-------------------------------------------------------------------------

> I tried to use the C90-compiled distribution on our J90 but it's
> a no go.  If you can supply me with the source, I will install
> it and test out the test program.

You may get the source code from either one of two places:

The first place is our HDF anonymous ftp server, hdf.ncsa.uiuc.edu, in
subdirectory:

    /pub/outgoing/sxu/3.3r4p4/33r4p4.src.tar.Z

This file contains compressed, archived source code for HDF3.3r4 patch04.
All you need to do is uncompress and un-tar the .tar.Z file and then
compile hdf/ and mfhdf/.  (FYI, I just checked out HDF3.3r4patch04 from our
CVS, compiled and tested it on my SGI IRIX5.3.  Everything worked fine, and
the tests passed.)

Another place is the NCSA ftp server, ftp.ncsa.uiuc.edu.  The patch files
are in /HDF/HDF3.3r4/patches/, and the patched source programs are in
/HDF/HDF3.3r4/patches/patchedsrc/.  The /HDF/HDF3.3r4/patches/README file
and the first paragraph of each patch file explain how to use the patch
files to patch HDF3.3r4; alternatively, you can replace the current
HDF3.3r4 files with the patched source files in
/HDF/HDF3.3r4/patches/patchedsrc/.

Let me know if you have problems downloading or compiling it.

> As far as I can tell, there is
> *no core-dumping problem* on the C90 at Pittsburgh (even without
> using the patched version of 3.3r4).  I'm theorizing that the
> problems we had on that machine were possibly due to file-quota
> violations and we assumed that they were related to the difficulty
> we were having with the J90.

Thank you for letting me know that HDF3.3r4 does not core dump on the C90
and that file-quota violations were possibly the cause of the problems you
were having on the J90.

Now, we need to work out the limitation problem.  I sent an e-mail to
Mr. Robert Marsa last week Friday and haven't heard back from him yet.  It
seems I should cc the e-mail to you as well.  I am attaching two relevant
paragraphs below.  Please let me know whether or not you can use the
unlimited dimension instead of an unlimited number of variables for your
application.

Thanks.

Shiming Xu

------------- e-mail sent to marsa@hoffmann.ph.utexas.edu ----------

I understand the number of variables is a problem for your project.  If you
give me more details (such as: what kind of data you are collecting; what
is defined as a variable; how you will view the data and with what tools;
how big the datasets are; etc.), maybe we can figure out some solutions to
reach your goal.

In sdtest you use nsteps to control the number of variables.  Does it mean
that in your project you need to create a new variable for each time step?
If that is the case, do you think we can use the unlimited dimension to
represent the time steps?

-----------------------------------------------------------------------

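A rough sketch of the unlimited-dimension arrangement proposed above: a
single variable whose leading dimension is SD_UNLIMITED, written one time
slice per step, instead of one new SDS per time step.  The function name,
variable names, and sizes are invented for illustration.

-------------------------------------------------------------------
#include <mfhdf.h>

void write_steps(char *fname, double *grid, int n, int nsteps)
{
    int32 shape[3], start[3], edges[3];
    int32 sd_id, sds_id;
    int   step;

    shape[0] = SD_UNLIMITED;          /* time axis grows as we append   */
    shape[1] = shape[2] = n;

    sd_id  = SDstart(fname, DFACC_CREATE);
    sds_id = SDcreate(sd_id, "phi", DFNT_FLOAT64, 3, shape);

    start[1] = start[2] = 0;
    edges[0] = 1;                     /* one time slice per write       */
    edges[1] = edges[2] = n;

    for (step = 0; step < nsteps; step++) {
        start[0] = step;              /* append along the unlimited dim */
        SDwritedata(sds_id, start, NULL, edges, (VOIDP) grid);
    }
    SDendaccess(sds_id);
    SDend(sd_id);
}
-------------------------------------------------------------------
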
-------------------------------------------------------------------------
Mon Sep 25 16:00:00 CDT 1995   From Matt Choptuik at UT
-------------------------------------------------------------------------

Dear Shiming:

I've installed 3.3p4 on our J90 and my test program still core dumps
(although it gets a little further than it did previously).  I don't think
it's a file-size problem since I can write a much larger file using
unformatted FORTRAN output, for example, without any problem.  I would
appreciate it if you would take a look at things at this point.

If you will send me a short list of machine names (full Internet addresses)
and your login name on those machines, I will update the .rhosts entry on
the 'guest' account so that you will be able to login via

   rlogin charon.cc.utexas.edu -l richard

Then

   cd test_SD

where you will find a test program 'tsd.c'.  After you

   make tsd

you should be able to see the core dump by running

   tsd foo 3 65 1024

which will attempt to output 1024 65x65x65 double sds's to foo.hdf, but
will actually only write 40.

As the makefile points out, the software was installed with prefix
/hpcf/u0/ph/az/phaz337 from /hpcf/u0/ph/az/phaz337/install/33r4p4, just in
case you want to check that I did things properly.

Thanks ...

Matt Choptuik

-------------------------------------------------------------------------
Mon Sep 25 11:00:00 CDT 1995   From Shiming Xu at NCSA
-------------------------------------------------------------------------

Matt,

> I've installed 3.3p4 on our J90 and my test program still core dumps
> (although it gets a little further than it did previously).  I don't
> think it's a file-size problem since I can write a much larger file
> using unformatted FORTRAN output, for example, without any problem.
> I would appreciate it if you would take a look at things at this point.
> If you will send me a short list of machine names (full Internet addresses)
> and your login name on those machines, I will update the .rhosts
> entry on the 'guest' account so that you will be able to login via
>
>    rlogin charon.cc.utexas.edu -l richard
>
> Then
>
>    cd test_SD
>
> where you will find a test program 'tsd.c'.  After you
>
>    make tsd
>
> you should be able to see the core dump by running
>
>    tsd foo 3 65 1024
>
> which will attempt to output 1024 65x65x65 double sds's to foo.hdf,
> but will actually only write 40.
>

Thank you for setting up the account for me.  I'll log in to the J90 from:

   sxu@xongmao.ncsa.uiuc.edu
   sxu@fuga.ncsa.uiuc.edu
or
   sxu@landrew.ncsa.uiuc.edu

> As the makefile points out, the software was installed with prefix
> /hpcf/u0/ph/az/phaz337 from /hpcf/u0/ph/az/phaz337/install/33r4p4
> just in case you want to check that I did things properly.

By the way, was the library compiled with the '-g' option?

Who should I contact if I need help getting things to work on your machine:
you, Marsa, or Richard?  Could you please give me the contact person's
phone number, just in case?

Thanks.

Shiming

-------------------------------------------------------------------------
Sun Sep 25 12:00:00 CDT 1995   From Matt Choptuik at UT
-------------------------------------------------------------------------

Shiming:

I've updated the .rhosts entry so you should be able to log into the
account now.  Please contact me, Matt Choptuik, if there are problems.  Our
phones are a little screwed up here right now since three of us have
swapped offices, so it's best to phone Virginia at the Center for
Relativity office, (512) 471-1103, and have her transfer the call.

Also, both libraries were compiled with the '-g' option, and here's part of
the traceback I get when running in the debugger:

------------------------------------------------------------------------------
error exit instruction [signal SIGERR] in array$c.NC_arrayfill at line 205 in file array.c
File "array.c" does not exist or does not contain text. (dbg173)
(cdbx) where
Currently stopped in NC_arrayfill at line 205 in file array.c
-> NC_arrayfill was called by hdf_get_data at line 688 in putget.c (0p26430d) with:
   [ 1] lo 24695 --> "\026\001\201\304*\202*\366"
   [ 2] len = 1000000
   [ 3] type = NC_DOUBLE
hdf_get_data was called by hdf_get_vp_aid at line 812 in putget.c (0p27035c) with:
   [ 1] handle 417816 --> struct
   [ 2] vp 3988966 --> struct
hdf_get_vp_aid was called by hdf_xdr_NCvdata at line 868 in putget.c (0p27151a) with:
   [ 1] handle 417816 --> struct
   [ 2] vp 3988966 --> struct
hdf_xdr_NCvdata was called by NCvario at line 1497 in putget.c (0p31503c) with:
   [ 1] handle 417816 --> struct
   [ 2] vp 3988966 --> struct
   [ 3] where = 0
   [ 4] type = NC_DOUBLE
   [ 5] count = 274625
   [ 6] values 143188 --> 00
------------------------------------------------------------------------------

Thanks,

Matt Choptuik

PS: Any info re HDF/netCDF documentation?

-------------------------------------------------------------------------
Mon Sep 25 16:00:00 CDT 1995   From Shiming Xu at NCSA
-------------------------------------------------------------------------

Matt,

====== Matt Choptuik's note of Sep 25: 'hdf documentation' ======

> Shiming, one other thing.  From time to time I peruse NCSA's ftp
> site to see if there is more updated/complete documentation
> on HDF/netCDF.  I would appreciate it if you could point me
> to the most recent/complete manuals that there are on the
> subject; it will certainly help us a lot as we continue to
> build stuff on top of your libraries ... thanks ... Matt Choptuik
>

HDF implemented the netCDF model in mfhdf/.  NetCDF documentation is
included in mfhdf/doc/ in our releases.  However, the best place to look
for the netCDF docs is:

    http://www.unidata.ucar.edu/packages/netcdf/guide.txn_toc.html

which contains a newer version.

There are some differences between the netCDF API and the mfhdf API.  For
example, netCDF requires dimensions to be defined before ncvardef uses the
dims to define variables, while in HDF SDcreate uses dim sizes to create
dimensions.  We are going to use the netCDF approach in the next generation
of HDF.  Another small example of the differences between netCDF and HDF is
that MAX_NC_VARS is defined as 256 in netCDF, while it is defined as 5000
in HDF.

You need to reference the HDF documentation to use the SD interface.  The
most recent HDF documentation is the HDF Reference Manual and the HDF
User's Guide, which is still a draft version.  The final version will be
ready very soon.  The draft of the HDF User's Guide is available on the
NCSA ftp server, ftp.ncsa.uiuc.edu, in directory:

    /HDF/Documentation/HDF3.3/Users_Guide/HDF3.3_draft/

and the Reference Manual is in:

    /HDF/Documentation/HDF3.3/Ref_Manual/

I would be very glad to discuss with you how to organize your application
data to make better use of HDF.  I did this with many users and most of the
discussions were very fruitful for both parties.

Thanks.

Shiming Xu

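For comparison, the netCDF-2 calling sequence contrasted with SDcreate above
looks roughly like the sketch below: dimensions are defined once, up front,
and are then referenced by id when each variable is defined.  The file,
dimension, and variable names are invented for illustration.

-------------------------------------------------------------------
#include <netcdf.h>

void netcdf_style(void)
{
    int    cdfid, xdim, ydim, dims[2], varid;
    static double data[65][65];

    cdfid   = nccreate("test.nc", NC_CLOBBER);
    xdim    = ncdimdef(cdfid, "x", 65L);    /* dims defined once ...    */
    ydim    = ncdimdef(cdfid, "y", 65L);
    dims[0] = ydim;
    dims[1] = xdim;
    varid   = ncvardef(cdfid, "phi", NC_DOUBLE, 2, dims); /* ... reused */
    ncendef(cdfid);                         /* leave define mode        */

    {
        long start[2] = {0, 0};
        long count[2] = {65, 65};
        ncvarput(cdfid, varid, start, count, (void *) data);
    }
    ncclose(cdfid);
}
-------------------------------------------------------------------
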
-------------------------------------------------------------------------
Tue Sep 26   From Shiming Xu at NCSA
-------------------------------------------------------------------------

Matt,

Could you please set write permission for me to change the 33r4p4 source
code and recompile the library, i.e. write permission for
/hpcf/u0/ph/az/phaz337/install/33r4p4/ and its subdirectories?  Please also
set up the right $(PATH) for me in the ~/.cshrc or ~/.login file if any
special path is required in compiling 33r4p4.

Thanks.

Shiming

-------------------------------------------------------------------------
Tue Sep 26   From Shiming Xu at NCSA
-------------------------------------------------------------------------

====== Matt Choptuik's note of Sep 26: 'Re: HDF on J90's' ======

> Shiming ... I'd rather not change the permissions on any of *my* (phaz337)
> directories.  Instead, I suggest you install in richard's directory
> (i.e. use install prefixes of $HOME while logged in as richard).

That's fine with me.  It is extremely slow to work across the network
during the day.  I am now testing memory allocation on my local machines.
I will log on to the J90 after the peak (across the network) is over.

Thanks.

Shiming

> I just created install, lib, bin and include directories as richard,
> then did a
>
>    cd install
>    zcat ~phaz337/install/33r4p4.src.tar.Z | tar xf -
>
> so if you log in as richard, then
>
>    cd install/33r4p4
>
> you can take the installation from there.  Then you'll have to
>
>    cd ~/test_SD
>
> edit the makefile and change
>
>    INCLUDEDIR = /hpcf/u0/ph/az/phaz337/include
>    LIBDIR = /hpcf/u0/ph/az/phaz337/lib
>
> to
>
>    INCLUDEDIR = /hpcf/u0/ph/az/richard/include
>    LIB = /hpcf/u0/ph/az/richard/lib
>
> and remake 'tsd' if you want to run the test program.
>
> Let me know if you have any problems; in particular, your path
> should be OK ...
>
> Thanks
>
> Matt Choptuik
======= end of Matt Choptuik's forwarded note ======

-------------------------------------------------------------------------
Wed Sep 27   From Shiming Xu at NCSA
-------------------------------------------------------------------------

Matt,

Here are some preliminary results of my experiments on the J90:

1. The actual failure happens at HDgetspace, a macro for malloc, at line
684 of mfhdf/libsrc/putget.c, where len=100000:

---------------------------------------------------------------
    len = to_do * vp->szof;  /* size of buffer for fill values */
==> values = (Void *) HDgetspace(len); /* buffer to hold unconv fill vals */
----------------------------------------------------------------

I added 4 lines after line 684 to check for a memory allocation error and
stop the program if values is 0.  Otherwise, it will cause a core dump /
segmentation fault when values is accessed.

-------------------------------------------------------------------
    if (values == NULL) {
        printf("Failed in malloc %d bytes in hdf_get_data\n", len);
        return FALSE;
    }
--------------------------------------------------------------------

I will report this problem to the HDF group so that error checks are added
after HDgetspace calls.

2. It seems to me that we run out of memory on the J90.  I don't know what
the maximum memory consumption is for each process and the processes it
spawns.  (My guess is somewhere around 35 -- 40 MB.)  'limit -v' said
unlimited:

------------------------------------------
richard@charon(test_SD){26}% limit -v
unlimited CPU seconds
unlimited words of memory
Session sockbuf limit 0 clicks
------------------------------------------

Could your system administrator help us to find out and to increase (if
possible) the maximum memory?  Experiments with different memory limits
showed that the smaller the memory limit was, the fewer variables tsd could
create, starting from 4M words.

I will continue studying this memory allocation problem to see if there is
anything we can do from the HDF side.  I will report the results at this
week's HDF meeting and will let you know what we think.

3. I created 1000 65^3 double-precision SDSs successfully on the J90 using
the DFSD interface; see tdfsd.c in ~richard/.  DFSD doesn't use
vdata/vgroup to implement SDS.  Each new SDS requires only 1 reference
number and two tags (DFTAG_SD and DFTAG_SDD) if the number type and dim
sizes are the same for all SDSs.  Therefore the total number of SDSs in a
file can be ~64k.

That's all for today.

Thanks.

Shiming

> (512) 471-1103

-------------------------------------------------------------------------
Thu Sep 28   From Shiming Xu at NCSA
-------------------------------------------------------------------------

Matt:

More on the core dump:

1. The current implementation of HDF3.3r4p4 allocates a buffer when it is
needed and frees the buffer when the job is done.  This isn't a problem on
some systems, but it is a problem on the J90.  After writing a certain
number of variables the system can't allocate big buffers and tsd dies.

2. A quick fix is to hold the big buffer until all variables have been
written out when the file is closed.  I have made the changes in
richard/install/mfhdf/libsrc and installed the modified library in
richard/lib and richard/include.

3. This fix works only if you open the file, write out all variables, and
then close the file.  Each time the file is opened the big buffer will be
allocated.  If the file is opened too many times you will still run out of
memory and the program will fail.  In HDF4.0 we will use an ANSI call to
free all buffers when exiting the process.  That takes care of the file
open problem.

4. As mentioned in Mike's previous e-mail, each Vdata needs 35k bytes.  If
the dimensions are not shared, 3000 dimensions will need 105MB!  The system
can't allocate that many buffers and we will run out of memory again.  To
solve this problem I added SDsetdimname in richard/tsd.c to let all vars
share the three dimensions.

5. With the above changes, tsd now created 1000 SDSs in foo.hdf on the J90.

6. The above changes will be included in the HDF4.0 release.

7. HDF uses a 32-bit integer to represent the length and offset (in the
file) of each object.  This implies that the largest size of a file is the
maximum value of a 32-bit integer, which is ~2GB.  The size of foo.hdf
almost hits the limit:

   -rw-r-----   1 richard  phaz   2197169915 Sep 28 18:12 foo.hdf

The same limit applies to unlimited dimension variables.

Please try the new version of the library and let me know if there is any
problem.

Thanks.

Shiming Xu