last update: 21 August 2002

CSC 320 Advanced File Structures 4 cr.

      This course will elaborate on basic data structures and access algorithms as they pertain to data stored on disk, and will present additional algorithms pertinent to the problem of rapid data storage/retrieval when information is stored on a persistent-storage device. Attention will be paid to implementation considerations and space/time tradeoffs, and to the choice of appropriate structures for the solution of typical problems. Three lecture hours and three hours of scheduled laboratory per week, plus extensive laboratory work outside of class.
      Prerequisites:   CSC 260 with grade of C+ or higher.

Goals:
      The purpose of this course is to develop students' understanding of data organization and retrieval mechanisms in the context of non-memory-based storage media.   The goals of this course are:
  • CG1: to develop an understanding of the physical factors governing non-memory-based storage and retrieval of data;
  • CG2: to present a catalog of the most common file-based data structures and their standard implementations;
  • CG3: to develop the skills and knowledge necessary to critically analyze, select and implement file structures.
Upon completion of the course, a student should be able to select an appropriate structure to match the requirements of a given problem, implement the structure if necessary, and use it in an appropriate way to solve the problem.

Objectives:
      Upon successful completion of the course, the student will have:
  • CO1: demonstrated knowledge of the fundamental concepts of file input and output;
  • CO2: demonstrated knowledge of classic data organization structures and retrieval and optimization techniques;
  • CO3: implemented structures "from scratch" and utilized implemented structures via libraries;
  • CO4: performed order-of-magnitude analysis on structures and retrieval algorithms for evaluation and selection;
  • CO5: developed and implemented a plan for unit testing and integration;
  • CO6: utilized the UML modeling language and a CASE tool to enhance development, implementation and documentation;
  • CO7: participated in and contributed to group projects involving problem analysis, design specification and selection, implementation, integration, and validation;
  • CO8: produced documentation, including formal class diagrams and rigorous test set specification and results;
  • CO9: presented the results of a project in class.
Topics:
  • review of proper program structure utilizing the design techniques of object-oriented design, modularization and stepwise refinement;
  • general review of programming language features
    • detailed review of sequential I/O
    • table processing
    • creation of simple sequential files
  • physical file structures
    • blocking
    • buffering
    • paging
    • address calculation
  • data validation considerations
    • generation of test data for specific condition testing and for volume tests
  • sequential file updating
  • review of algorithm analysis/big-O
  • non-sequential file types
  • indexing techniques
  • hashing techniques
  • tree structures
    • binary trees
    • b-trees
    • b-tree variants
  • inter- and intra-file organizations
    • linked lists
    • chaining
    • pointers
  • solution design criteria
  • file life cycles — chronolgy of
    • creating
    • using
    • archiving/deleting files
  • database systems as file managers
  • discussion of database query implementation
    • relational database systems
    • object-relational systems

      The emphasis of the course is on the various algorithms that allow rapid data access and the techniques necessary to produce efficient implementations. Considerable time will be spent discussing how to evaluate performance, both in a theoretical format and by using test executions of the program, and how to determine the appropriate structure(s) for a particular problem. Extensive laboratory work, group discussion time and group projects conducted as part of the scheduled laboratory sessions are an integral component of the course, serving to reinforce the concepts and techniques presented in lecture and to give students experience with group software development projects. There will be four to seven programming projects. Knowledge of problem analysis, structured design techniques, basic programming techniques and data structures is assumed (see prerequisite).

      All programs must conform to departmental guidelines for program design and implementation, and all lab reports must conform to guidelines announced in class. Regardless of numeric average, a student will not be eligible for a passing grade unless he or she has submitted a lab report for every programming assignment.

      The course grade will be determined using the following approximate weights: programming projects — 40%; midterm — 20%; final exam — 25%; homework — 15%.



Bibliography:
  • Cooper.   Design of Library Automation Systems: File Structures, Data Structures, and Tools.   John Wiley, 1996.
  • Folk & Zoellick.   File Structures: An Object-Oriented Approach.   Addison-Wesley, 1998. ISBN: 0-201-87401-6
  • Garcia-Molina, Hector; Ullman, Jeffrey; Widom, Jennifer.   Database System Implementation.   Prentice-Hall, 2000. ISBN: 0-13-040264-8
  • Giampaolo.   Practical File System Design with the Be File System.   Morgan Kaufman, 1999.
  • Wiederhold, Gio.   File Organization for Database Design.   McGraw-Hill, 1987.


Salem State Home Computer Science Home Faculty Computer Studies Major Flow Sheet
Computer Studies Minor Courses Course Sequence Diagram Computer Laboratories