Wednesday, January 25, 2012

Week 4 Reading Notes

Wikipedia article - Database
  • Database = "organized collection of data for one or more purposes," usually digital
  • = "organized to model relevant aspects of reality"
  • = data and data structures, NOT database management system (DBMS)
  • DBMS = complex software system, meets usage requirements
  • DBMSs: Oracle, IBM DB2, Microsoft SQL Server, Postgre SQL, MySQL, SQLite
  • DBMS standards: SQL, ODBC
  • Database contents can be: bibliographic, document-text, statistical, or multimedia objects
  • Database application areas include: accounting, music, compositions, movies, banking, manufacturing, and insurance
  • History: 1st gen. = navigational (hierarchical and Codasyl models)
  • 2nd gen. = relational (in SQL language) and entity-relationship model
  • 3rd gen. = post-relational or NoSQL (Object database and XML database)
  • People involved: DBMS developers, application developers and database administrators, and application's end-users
Database types:
  • Active = "event-driven architecture which can respond to conditions both inside and outside the database"
  • Cloud = database and most DBMS are "in the cloud"
  • Data warehouse = archive data from operational databases and outside sources (retrieving/analyzing/mining data, transforming/loading/managing data)
  • Distributed = "allows distinct DBMS instances to cooperate"
  • Document-oriented = stores, manages, edits, and retrieves documents
  • Embedded = tightly integrated with application software
  • End-user = developed by end-users (documents, spreadsheets, presentations)
  • Federated (multi-database) = integrated database comprised of several distinct databases
  • Graph = NoSQL, uses graph structures to represent and store info.
  • Hypermedia = World Wide Web acts as a database
  • In-memory = resides primarily in main memory
  • Knowledge base = specifically for knowledge management
  • Operational = stores data about operations of organization
  • Parallel = improves performance through parallelization
  • (Also: Real-time, Spatial, Temporal, and Unstructured-data database)
  • Functional requirements: defining data structure, manipulating data, protecting data, describing processes
  • Operational requirements: availability, performance, isolation between users, recovery, backup, data independence
  • DBMS components: external interfaces, language engines, query optimizers, database engine, storage engine, transaction engine, DBMS management and operation component
There was a lot of this wikipedia article that I didn't understand, but I tried to take notes on the parts that seemed important to me and made some sort of sense. Even if I don't understand the specific technical details, I did gain a better understand of exactly how we define a database and the different things they are used for. I also now understand the difference between a database and a DBMS.


Wikipedia article - Entity-relationship model
  • ER model = "abstract and conceptual representation of data"
  • Conceptual schema or semantic data model, top-down, creates ER diagrams
  • Model defines interaction between entities, relationships, and attributes
  • Relationships: expressed as a single verb implying direction or as a noun
  • Roles: define who does what in relationship
  • Cardinalities: ???
  • Semantic modeling of ER "adopts the more natural view that the real world consists of entities and relationships"
Diagramming conventions
  • Rectangles = entities
  • Diamonds = relationships
  • Line = connects entities to the relationships they participate in
  • Double line = participation constraint, totality, or surjectivity (all entities in at least one relationship in set)
  • Arrow = key constraint, injectivity (each entity in at most one relationship in set)
  • Thick line = bijectivity (each entity in exactly one relationship in set)
  • Underlined name of attribute = attribute is key (two different entities or relationships always have different values for attribute)
Alternative = Crow's Foot notation

Limitations of ER model:
  • only a relational structure, assumes info. can be represented in relations
  • cannot handle changes to information easily
  • difficulty in "integrating pre-existing information sources that already define their own data representations in detail"

I'm confused about what cardinalities are, since the article didn't include a definition. I'm guessing it has something to do with cardinal directions because previously the article was talking about the direction of relationship between entities. I think I basically understand the ER model and would be able to point to the different components of a diagram. I don't know if I can cannot this abstract representation with how the database functions, however.


3 Normal Forms Database Tutorial
  • Database normalization process = puts data in state that will make it usable to answer questions (can be used to keep track of a stack of invoices)
  • 3 normal forms:
  • NF1 = No repeating elements or groups of elements.
  • NF2 = No partial dependencies on a concatenated key.
  • NF3 = No dependencies on non-key attributes.
  • NF1: No atomicity (Row cannot contain repeating groups of similar data), Need each row to have unique identifier (Primary Key)
  • Primary Key with two or more columns = concatenated primary key
  • NF2: "for a table that has a concatenated primary key, each column in the table that is not part of the primary key must depend upon the entire concatenated key for its existence"
  • If fails NF2, take out half of concatenated primary key and make own table
  • If make more concatenated keys, test for NF2 again
  • NF3: If column relies on non-key attribute, create foreign key (column that points to the primary key in another table)
I have to say that I'm having trouble wrapping my head around this process of data normalization. I think that maybe going through it myself in a hands-on way would be a big help. Otherwise I only barely understand the basic steps of the process.

No comments:

Post a Comment