Wednesday, January 25, 2012

Week 4 Reading Notes

Wikipedia article - Database
  • Database = "organized collection of data for one or more purposes," usually digital
  • = "organized to model relevant aspects of reality"
  • = data and data structures, NOT database management system (DBMS)
  • DBMS = complex software system, meets usage requirements
  • DBMSs: Oracle, IBM DB2, Microsoft SQL Server, PostgreSQL, MySQL, SQLite
  • DBMS standards: SQL, ODBC
  • Database contents can be: bibliographic, document-text, statistical, or multimedia objects
  • Database application areas include: accounting, music, compositions, movies, banking, manufacturing, and insurance
  • History: 1st gen. = navigational (hierarchical and CODASYL models)
  • 2nd gen. = relational (in SQL language) and entity-relationship model
  • 3rd gen. = post-relational or NoSQL (Object database and XML database)
  • People involved: DBMS developers, application developers and database administrators, and application's end-users
Database types:
  • Active = "event-driven architecture which can respond to conditions both inside and outside the database"
  • Cloud = database and most DBMS are "in the cloud"
  • Data warehouse = archive data from operational databases and outside sources (retrieving/analyzing/mining data, transforming/loading/managing data)
  • Distributed = "allows distinct DBMS instances to cooperate"
  • Document-oriented = stores, manages, edits, and retrieves documents
  • Embedded = tightly integrated with application software
  • End-user = developed by end-users (documents, spreadsheets, presentations)
  • Federated (multi-database) = integrated database comprised of several distinct databases
  • Graph = NoSQL, uses graph structures to represent and store info.
  • Hypermedia = World Wide Web acts as a database
  • In-memory = resides primarily in main memory
  • Knowledge base = specifically for knowledge management
  • Operational = stores data about operations of organization
  • Parallel = improves performance through parallelization
  • (Also: Real-time, Spatial, Temporal, and Unstructured-data database)
  • Functional requirements: defining data structure, manipulating data, protecting data, describing processes
  • Operational requirements: availability, performance, isolation between users, recovery, backup, data independence
  • DBMS components: external interfaces, language engines, query optimizers, database engine, storage engine, transaction engine, DBMS management and operation component
There was a lot of this Wikipedia article that I didn't understand, but I tried to take notes on the parts that seemed important to me and made some sort of sense. Even if I don't understand the specific technical details, I did gain a better understanding of exactly how we define a database and the different things databases are used for. I also now understand the difference between a database and a DBMS.
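
To make the database vs. DBMS distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite plays the role of the DBMS, and the table it manages is the database. The table name, column names, and rows (books, title, year) are made up for illustration and do not come from the article.

```python
import sqlite3

# The DBMS here is SQLite, reached through Python's sqlite3 module.
# The database is the data and structure it manages; ":memory:" keeps it in RAM.
conn = sqlite3.connect(":memory:")

# Defining the data structure (hypothetical table, not from the article).
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
)

# Manipulating data: the rows are handed to the DBMS as SQL statements.
conn.executemany(
    "INSERT INTO books (title, year) VALUES (?, ?)",
    [("Hypothetical Title A", 2004), ("Hypothetical Title B", 2011)],
)
conn.commit()

# Querying: we say what we want, and the DBMS works out how to find it.
recent_titles = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM books WHERE year > 2010 ORDER BY title"
    )
]
print(recent_titles)

conn.close()
```

Swapping SQLite for another DBMS from the list above would change the connection step, but the SQL statements would stay largely the same, which is the point of having standards like SQL.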


Wikipedia article - Entity-relationship model
  • ER model = "abstract and conceptual representation of data"
  • Conceptual schema or semantic data model, top-down, creates ER diagrams
  • Model defines interaction between entities, relationships, and attributes
  • Relationships: expressed as a single verb implying direction or as a noun
  • Roles: define who does what in relationship
  • Cardinalities: ???
  • Semantic modeling of ER "adopts the more natural view that the real world consists of entities and relationships"
Diagramming conventions
  • Rectangles = entities
  • Diamonds = relationships
  • Line = connects entities to the relationships they participate in
  • Double line = participation constraint, totality, or surjectivity (all entities in at least one relationship in set)
  • Arrow = key constraint, injectivity (each entity in at most one relationship in set)
  • Thick line = bijectivity (each entity in exactly one relationship in set)
  • Underlined name of attribute = attribute is key (two different entities or relationships always have different values for attribute)
Alternative = Crow's Foot notation

Limitations of ER model:
  • only a relational structure, assumes info. can be represented in relations
  • cannot handle changes to information easily
  • difficulty in "integrating pre-existing information sources that already define their own data representations in detail"

I'm confused about what cardinalities are, since the article didn't include a definition. I'm guessing it has something to do with cardinal directions because previously the article was talking about the direction of relationship between entities. I think I basically understand the ER model and would be able to point to the different components of a diagram. I don't know if I can connect this abstract representation with how the database functions, however.
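
One way to connect the abstract diagram to something concrete is a small sketch in Python. The Student, Course, and Enrollment names and all of the sample data are hypothetical, chosen only to illustrate entities, attributes, keys, and a relationship. The sketch also touches on cardinality, which in standard ER usage means how many entities can take part on each side of a relationship (one-to-one, one-to-many, many-to-many) rather than anything directional.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:        # entity (a rectangle in the diagram)
    student_id: int   # key attribute (underlined in the diagram)
    name: str         # ordinary attribute


@dataclass(frozen=True)
class Course:         # a second entity
    course_id: str    # key attribute
    title: str


@dataclass(frozen=True)
class Enrollment:     # relationship "enrolls in" (a diamond in the diagram)
    student_id: int   # refers to a Student by its key
    course_id: str    # refers to a Course by its key


# Hypothetical sample data.
students = [Student(1, "Ana"), Student(2, "Ben")]
courses = [Course("C-A", "Course A"), Course("C-B", "Course B")]
enrollments = [
    Enrollment(1, "C-A"),
    Enrollment(1, "C-B"),
    Enrollment(2, "C-A"),
]

# Cardinality check: count how many partners each entity has in the relationship.
courses_per_student = {
    s.student_id: sum(e.student_id == s.student_id for e in enrollments)
    for s in students
}
students_per_course = {
    c.course_id: sum(e.course_id == c.course_id for e in enrollments)
    for c in courses
}

# A student can have several courses and a course several students,
# so this relationship is many-to-many.
print(courses_per_student, students_per_course)
```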


3 Normal Forms Database Tutorial
  • Database normalization process = puts data in state that will make it usable to answer questions (can be used to keep track of a stack of invoices)
  • 3 normal forms:
  • NF1 = No repeating elements or groups of elements.
  • NF2 = No partial dependencies on a concatenated key.
  • NF3 = No dependencies on non-key attributes.
  • NF1: Atomicity (a row cannot contain repeating groups of similar data); each row needs a unique identifier (Primary Key)
  • Primary Key with two or more columns = concatenated primary key
  • NF2: "for a table that has a concatenated primary key, each column in the table that is not part of the primary key must depend upon the entire concatenated key for its existence"
  • If a table fails NF2, move the columns that depend on only part of the concatenated primary key into their own table, keyed by that part
  • If that creates more concatenated keys, test for NF2 again
  • NF3: If a column depends on a non-key attribute, move it to its own table and link back with a foreign key (a column that points to the primary key in another table)
I have to say that I'm having trouble wrapping my head around this process of data normalization. I think that maybe going through it myself in a hands-on way would be a big help. Otherwise I only barely understand the basic steps of the process.
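
Here is a small hands-on sketch of the steps in plain Python, assuming a made-up stack of invoices. The column names and values are hypothetical and are not the tutorial's own data. The flat rows already satisfy NF1 (one item per row, keyed by order and item together); the loop then splits them so that the NF2 and NF3 rules hold.

```python
# Flat invoice rows: one row per item on an order (so no repeating groups).
# Concatenated primary key: (order_id, item_id).
flat_rows = [
    # (order_id, customer_id, customer_name, item_id, item_desc, qty, price)
    (1, 10, "Acme", 100, "Widget", 2, 5.00),
    (1, 10, "Acme", 101, "Gadget", 1, 12.50),
    (2, 11, "Birch", 100, "Widget", 4, 5.00),
]

# NF2: item_desc and price depend only on item_id (half of the key),
# and customer_id depends only on order_id, so each gets its own table.
items = {}        # item_id -> (item_desc, price)
orders = {}       # order_id -> customer_id (a foreign key into customers)
order_items = {}  # (order_id, item_id) -> qty, which needs the whole key

# NF3: customer_name depends on customer_id, a non-key attribute of orders,
# so it moves to a customers table.
customers = {}    # customer_id -> customer_name

for order_id, customer_id, customer_name, item_id, item_desc, qty, price in flat_rows:
    items[item_id] = (item_desc, price)
    orders[order_id] = customer_id
    order_items[(order_id, item_id)] = qty
    customers[customer_id] = customer_name


def order_total(order_id):
    """Answer a question by joining the normalized tables back together."""
    return sum(
        qty * items[item_id][1]
        for (oid, item_id), qty in order_items.items()
        if oid == order_id
    )


print(order_total(1), customers[orders[2]])
```

After the split, "Widget" and its price are stored once instead of on every invoice line, so changing the price means changing one row.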

Thursday, January 19, 2012

Week 3 Reading Notes

Introduction to Metadata: Setting the Stage
  • Metadata = data about data, describes information object
  • Necessary for "identification, representation, interoperability, technical management, performance, and use of data contained in an information system"
  • Metadata should tell us: content (what it contains, intrinsic), context (who what where why how, extrinsic), and structure (formal set of associations with other objects, like MARC records)
  • In libraries: indexes, abstracts, bibliographic records
  • Automated: metadata mining, Web crawling
  • Need for standardization of metadata
  • Was emphasis on structure and context --> now more emphasis on content with new tech.
  • Metadata is more than description and resource discovery - also object behavior, function and use, relationships, and management over time
  • NEW: user-created metadata, tagging, folksonomies
Types of metadata:
  • Administrative
  • Descriptive
  • Preservation
  • Technical
  • Use
Attributes and characteristics:
  • Source of metadata - internal (intrinsic, original creator) vs. external (outside source)
  • Method of metadata creation - automatic vs. manual
  • Nature of metadata - nonexpert vs. expert
  • Status - static vs. dynamic, long-term vs. short-term
  • Structure - structured vs. unstructured
  • Semantics - controlled (standardized) vs. uncontrolled
  • Level - collection-level vs. item-level
Phases: Creation/reuse --> Organizing/describing --> Validation --> Searching/Retrieval --> Disposition --> [repeat]
  • Metadata: not necessarily digital, more than description, comes from variety of sources, continues to accrue, one object's metadata can be another's data
  • Important for: increased accessibility, retention of context, expanding use, learning from metadata, system development and enhancement, multiversioning, legal issues, preservation and persistence
The main question I was left with after reading this chapter was: How much metadata is too much? This chapter talked about all the advantages of having all kinds of different types of metadata for information objects, but how much is too much? We use metadata to cope with information overload of data, so what about information overload of metadata? Aren't we just repeating the same problem?


An Overview of the Dublin Core Data Model
  • Dublin Core Metadata Initiative (DCMI) = "international effort designed to foster consensus across disciplines for the discovery-oriented description of diverse resources in an electronic environment" or standardized metadata architecture
DCMI Requirements:
  • Internationalization
  • Modularization/Extensibility - namespaces used for independent definitions of same term
  • Element Identity - specific, standardized definitions of terms
  • Semantic Refinement - richer semantic definitions
  • Identification of encoding schemes - less ambiguity
  • Specification of controlled vocabularies - "allow for additional understanding of contextual information"
  • Identification of structured compound values
Need to describe properties of resources

This project, while some of it is beyond my comprehension, makes sense to me as an effort to standardize the way we use metadata. The benefits of standardization in many aspects of information management are pretty self-evident to me, so this project sounds like a great idea.
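
As a rough illustration of what a standardized description might look like, here is a minimal sketch that writes a few Dublin Core elements as XML using Python's standard library. The namespace URI is the one for the Dublin Core element set; the record values and the wrapping "record" element are hypothetical choices for this example, not something the article prescribes.

```python
import xml.etree.ElementTree as ET

# Namespace for the Dublin Core element set. Namespaces are how the same
# term can be defined independently by different vocabularies.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical record describing one resource.
record = {
    "title": "Week 3 Reading Notes",
    "creator": "Hypothetical Author",
    "date": "2012-01-19",   # an agreed date encoding makes the value less ambiguous
    "language": "en",
    "subject": "metadata",
}

# "record" is an assumed wrapper element, not part of Dublin Core itself.
root = ET.Element("record")
for element_name, value in record.items():
    child = ET.SubElement(root, f"{{{DC_NS}}}{element_name}")
    child.text = value

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```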


EndNote X5: Introduction
  • EndNote = bibliographic software program
Can be used to:
  • Import citations from saved literature searches - hyperlinks to location on Web
  • Develop a personal library of references - catalog and track information, save notes
  • Create and format citations for papers and publications - automatic formatting
Flexible, customizable

This program sounds extremely helpful for someone who is dealing with a lot of resources for a job or a particular project. Having a way to organize and catalog these different resources would aid in accessing them in the future.

Wednesday, January 18, 2012

Week 2 Reading Notes

  • Apologies! I confused Weeks 1 and 2, so I took notes on the Week 1 readings and labeled them as Week 2. I know this is now late, but Week 2 notes are coming!


Wikipedia article - "Computer Hardware"
  • Hardware = "collection of physical elements that comprise a computer system" (performs input and output, stores data, manages all of these tasks together)
  • History: separate manual actions --> punchcards --> stored-program computers
  • Tied to history of computer data storage
  • History: analog computers (mechanical circuit models for electrical circuits) --> digital (no models or analogs)
  • Mainframe computers, minicomputers, microcomputers (personal computers)
  • Von Neumann architecture = processing unit with arithmetic logic unit and processor registers, control unit with instruction register and program counter, memory to store data and instructions, external mass storage, and input and output mechanisms
There is definitely a lot that I do not know about computer hardware, but I do understand the difference between it and computer software. It is also helpful to have learned that the specific tasks that hardware performs can be summarized as input/output, storage, and management.
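
The Von Neumann idea of keeping instructions and data in the same memory can be sketched as a toy machine in Python. The four-instruction set (LOAD, ADD, STORE, HALT) and the memory layout are invented for this example and do not correspond to any real processor.

```python
# Memory holds both the program and its data (the stored-program idea).
memory = [
    ("LOAD", 5),     # address 0: copy memory[5] into the accumulator
    ("ADD", 6),      # address 1: add memory[6] to the accumulator
    ("STORE", 7),    # address 2: write the accumulator to memory[7]
    ("HALT", None),  # address 3: stop
    None,            # address 4: unused
    2,               # address 5: data
    3,               # address 6: data
    0,               # address 7: the result goes here
]

program_counter = 0  # control unit: address of the next instruction
accumulator = 0      # stands in for the processor registers

while True:
    instruction_register = memory[program_counter]  # fetch
    program_counter += 1
    opcode, address = instruction_register          # decode
    if opcode == "LOAD":
        accumulator = memory[address]
    elif opcode == "ADD":
        accumulator = accumulator + memory[address]  # arithmetic logic unit
    elif opcode == "STORE":
        memory[address] = accumulator
    elif opcode == "HALT":
        break

print(memory[7])
```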



Wikipedia article - "Computer Software"
  • Software = "a collection of computer programs and related data that provides the instructions for telling a computer what to do and how to do it"
  • Programs, procedures, algorithms, and its documentation
  • "Cannot be touched"
  • Software used to always be bundled with hardware, now software is its own business
  • Software licensing issues, patents
  • Types of software: system, programming, application, middleware, teachware, testware, firmware, shrinkware, device drivers
  • Types of architecture: platform, application, user-written
  • Free software license = recipients can modify and redistribute software
I had previously thought of software just in terms of "application" software, so now I understand that the term encompasses much more than that. Again, there is still a ton that I do not know about computer software, but I think that I do have a good basic understanding of what I need to know.


Digitization: Is It Worth It?
  • Digitization = "the conversion of analog media to digital form"
  • Downsides: long process, complex, many things can go wrong
  • Cost-benefit analysis? Pay for conversion itself but also assembling material, copyright licenses, machine upkeep, editing, cataloging, managing
  • Upsides: increasing access, preservation, increase visibility of institution
  • Digitizing vs. not digitizing: if item is in demand enough, digitizing becomes cost-effective
  • Digitizing vs. acquiring new materials: depends on many factors - digitizing is not always the right choice
  • CONCLUSION: Each case should be treated separately.
I think this article makes an important point that it's dangerous to jump into digitization projects without first considering whether it is the best decision for that particular circumstance. Digitization is not always the right decision and many factors need to be considered first.


European Libraries Face Problems in Digitalizing
  • European Digital Library as a competitor to Google Books project
  • Huge cost of digitizing materials
  • Originally had only limited public funds, now seeking private alliances
  • Business model?
  • Who controls the future of digital records and writings?
  • Alliance with Google itself?
  • Charging for access to copyrighted materials? Low-quality for free but higher quality for a fee?
I hadn't previously considered the idea that mass digitization of information/knowledge by one institution would centralize control of digital records in that institution and its country. Google is a very successful and well-liked company, but I can completely understand why its Google Books project made other countries nervous. Hopefully the solution is that many countries and institutions can work together to preserve and provide access to information.


Wikipedia article - "Data Compression"
  • Data compression (or source coding or bit-rate reduction) = "encoding information using fewer bits than the original representation would use," makes use of redundancy
  • Reduces consumption of hard disk space or transmission bandwidth
  • Must be decompressed to be used (may need certain hardware)
  • Lossless compression = no error, but slow compression/decompression (Lempel-Ziv or LZ)
  • Lossy compression = some error (depending on how much error is acceptable), faster compression/decompression
  • Data compression is closely connected to machine learning and data differencing
  • Audio data compression - considers the design and function of the human ear
  • Video data compression - considers the design and function of the human eye
I remember reading a little about this topic in The Information: A History, a Theory, a Flood that we read in LIS 2000 last semester. At that time I was amazed at the way that data compression makes use of redundancy, mostly because we hardly notice this redundancy in our daily lives. Once you start looking for redundancy, though, you see that it is everywhere. I am someone who is very interested in psychology, so the way that data compression algorithms have to consider the way the human brain and body work is fascinating to me.
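
A minimal sketch of the lossless vs. lossy difference, using Python's built-in zlib module (its DEFLATE format builds on a Lempel-Ziv method). The sample text and the sample numbers are made up, and the rounding step is only a stand-in for lossy compression, far simpler than what real audio or video codecs do.

```python
import zlib

# Lossless: highly redundant input shrinks a lot and comes back exactly.
redundant_text = b"the quick brown fox " * 200
compressed = zlib.compress(redundant_text)
restored = zlib.decompress(compressed)
print(len(redundant_text), len(compressed), restored == redundant_text)

# Lossy (toy version): round each sample down to the nearest multiple of 10.
# Fewer distinct values are needed, but the original numbers cannot be recovered.
samples = [12, 49, 51, 88]
quantized = [(sample // 10) * 10 for sample in samples]
print(quantized)
```

How much rounding is acceptable depends on what the listener or viewer will notice, which is where the design of the ear and eye comes in.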

Thursday, January 12, 2012

Week 1 Reading Notes

OCLC 2004 Information Format Trends: Content, not Containers
Major points:
  • "Unbundling" of content - no longer contained in books, journals, etc., not format-dependent
  • Desire for a variety of formats means "processes of acquisition, organization, and delivery of content" need to adapt
  • Transition from traditional print publishing --> e-books and self-publishing
  • Intellectual property issues becoming more of a concern
  • Text is still critical, but multimedia content is booming
  • Digital content explosion in email, SMS text messaging, and webpages
  • Mobile devices --> consumers aren't tied to a computer for access
  • Social communication - "many to many"
  • Increased expectation for "just-in-time" delivery of content
  • "Microcontent" such as ringtones developing
  • Rise of "social publishing" such as wikis and blogs
  • More consumers getting DVDs by mail or pay-per-view
  • Scholarly materials also experiencing massive transition to digital materials (e-books and also print-on-demand)
  • Academic and research libraries spend significant amount on e-journal collection subscriptions
  • Growth of the Open Access movement
  • Greater emphasis on placing information in context and building knowledge from information, rather than just finding information
This article really shows some of the major trends in the information field and how the changes we see today were already evident in 2004, which is now eight years in the past. There are a lot of questions raised here that we still haven't answered. In what formats should libraries make content available? How do we translate content into new formats as tastes and technology change? How will intellectual property law play out in the digital environment? Specifically, how can we balance the rights of consumers of intellectual property with the need to encourage and incentivize its creation?


Lied Library @ four years: technology never stands still
New systems implemented over four years:
  • "Off-site access to licensed electronic resources"
  • "Digital content management system" for development and access
  • Virtual reference software
  • A to Z Title List for "better access to the library's electronic serials"
  • Link resolver software "to help provide more seamless searching of library electronic resources"
  • Laptop checkout program
  • Electronic reserves system
  • Internet 2 access grid within the library
Existing systems that have grown/changed:
  • Media distribution system, print cost recovery system --> now encompass other users
  • Various vendor software updates
  • Printing system implemented override print queue
  • Migrated to one-card system for printing
  • Other adjustments to printing system
  • Integrated library system - "additional user licenses and new central site hardware"
  • Synchronization of patron records with one-card system and student records
  • "Digital Library Assistant devices for stacks management"
  • Computer replacement project
Continuing challenges:
  • Hardware and software maintenance costs
  • Computing resources management
  • Fighting spread of malicious software
  • Physical space limitations
  • Security
  • Hardware/software issues or glitches
  • Maintaining or growing sufficient staff for information technology services
  • Building an institutional repository
  • Continuing to replace computers
  • Expanding instruction sessions
  • Enhancing network infrastructure
  • Further RFID tagging
  • Changes in library leadership
I thought this was a great example of an academic library's changes over the past few years and how they have adapted to changing technology. Obviously there are new challenges ahead for academic libraries, but the sense I got from this article was that establishing a culture where technological change is met and embraced is a huge part of a library continuing to thrive today.


Information Literacy and Information Technology Literacy: New Components in the Curriculum for a Digital Culture
Information technology literacy is an understanding of...
  • "the technology infrastructure"
  • "the tools technology provides and their interaction with this infrastructure"
  • "the legal, social, economic, and public policy issues that shape the development" of infrastructure/applications/technologies
Information literacy is about...
  • "content and communication"
  • "authoring, information finding and organization, the research process, and information analysis, assessment, and evaluation"
  • in the form of "text, images, video, computer simulations, multi-media interactive works"
  • for the purpose of "news, art, entertainment, education, research and scholarship, advertising, politics, commerce" and business or personal documents and records
In information technology literacy, students need to know...
  • how to use word processing, spreadsheets, computers in general, web browsers, email systems, possibly a programming language (short-term skills, quickly become dated)
  • "how technologies, systems, and infrastructure work" at a "superficial descriptive level" and/or a "more detailed analytic or engineering level" (more crucial than skills with specific tools)
  • troubleshooting, problem-solving, and debugging of software
  • "graphic display of quantitative information"
  • "construction, analysis, and use of simulations"
  • basics of "computing, telecommunications, broadcasting, publishing, electrical power distribution, transportation, and financial infrastructure"
  • "history, economics, social and public policy issues"
In information literacy, students need to know...
  • "authoring and critical and analytic reading (including the assessment of purpose, bias, accuracy, and quality)" in text AND visual and multimedia communication
  • that digital forms are more fluid and that computers can manipulate what were once "factual records" such as images
  • searching systems as well as cataloguing, abstracting, indexing, rating
  • the importance of information accessibility, visibility, and impact
  • "the limitations of both digital information resources and...searching techniques"
  • how information resources and technological and economic structures interrelate
  • what information resources are appropriate for specific information needs
  • legal, social, economic, and ethical issues surrounding intellectual property
  • issues of privacy, "information authenticity, provenance and integrity, documentation and archiving" and records management and construction
In other classes I've learned about some of the skills and competencies associated with information literacy, but information technology literacy is something I hadn't heard of before. There are a lot of questions in how to best teach these new literacies to both young people and adults, as well as what the average person will or will not find useful. The article touches on some of these questions. Another question is whether these are skills/competencies or general comprehension/knowledge or specific attitudes that we want to engender through education. Regardless, as someone who wants to work with young people in particular, I agree that this topic is critical.