LIS 2600 Course Blog: January 2012

Monday, January 30, 2012

Week 3 Lab - Introducing Zotero

http://www.citeulike.org/user/tscherping/library

Wednesday, January 25, 2012

Week 4 Reading Notes

Wikipedia article - Database

Database = "organized collection of data for one or more purposes," usually digital
= "organized to model relevant aspects of reality"
= data and data structures, NOT database management system (DBMS)

DBMS = complex software system, meets usage requirements
DBMSs: Oracle, IBM DB2, Microsoft SQL Server, PostgreSQL, MySQL, SQLite
DBMS standards: SQL, ODBC
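
To make the database vs. DBMS distinction concrete for myself, here is a minimal sketch using Python's built-in sqlite3 module (SQLite is one of the DBMSs listed above). The table name `books`, its columns, and the sample rows are made up for illustration. SQLite is the DBMS; the table and its rows are the database.

```python
import sqlite3

# The DBMS is the SQLite software; the database is the data it manages.
# ":memory:" keeps this example database in RAM instead of in a file.
conn = sqlite3.connect(":memory:")

# SQL defines the data structure...
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
)

# ...manipulates the data (these rows are hypothetical)...
conn.executemany(
    "INSERT INTO books (title, year) VALUES (?, ?)",
    [("Example Book A", 2011), ("Example Book B", 1999)],
)
conn.commit()

# ...and asks questions of it.
recent_titles = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM books WHERE year > ? ORDER BY title", (2005,)
    )
]
print(recent_titles)
conn.close()
```

Swapping SQLite for another DBMS would change the connection step, but the SQL statements would look much the same, which I assume is the point of having SQL as a standard.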

Database contents can be: bibliographic, document-text, statistical, or multimedia objects
Database application areas include: accounting, music, compositions, movies, banking, manufacturing, and insurance

History: 1st gen. = navigational (hierarchical and CODASYL models)
2nd gen. = relational (in SQL language) and entity-relationship model
3rd gen. = post-relational or NoSQL (Object database and XML database)

People involved: DBMS developers, application developers and database administrators, and application's end-users

Database types:

Active = "event-driven architecture which can respond to conditions both inside and outside the database"
Cloud = database and most of its DBMS reside remotely, "in the cloud"
Data warehouse = archive data from operational databases and outside sources (retrieving/analyzing/mining data, transforming/loading/managing data)
Distributed = "allows distinct DBMS instances to cooperate"
Document-oriented = stores, manages, edits, and retrieves documents
Embedded = tightly integrated with application software
End-user = developed by end-users (documents, spreadsheets, presentations)
Federated (multi-database) = integrated database comprised of several distinct databases
Graph = NoSQL, uses graph structures to represent and store info.
Hypermedia = World Wide Web acts as a database
In-memory = resides primarily in main memory
Knowledge base = specifically for knowledge management
Operational = stores data about operations of organization
Parallel = improves performance through parallelization
(Also: Real-time, Spatial, Temporal, and Unstructured-data database)

Functional requirements: defining data structure, manipulating data, protecting data, describing processes
Operational requirements: availability, performance, isolation between users, recovery, backup, data independence

DBMS components: external interfaces, language engines, query optimizers, database engine, storage engine, transaction engine, DBMS management and operation component

There was a lot of this Wikipedia article that I didn't understand, but I tried to take notes on the parts that seemed important to me and made some sort of sense. Even if I don't understand the specific technical details, I did gain a better understanding of exactly how we define a database and the different things databases are used for. I also now understand the difference between a database and a DBMS.

Wikipedia article - Entity-relationship model

ER model = "abstract and conceptual representation of data"
Conceptual schema or semantic data model, top-down, creates ER diagrams
Model defines interaction between entities, relationships, and attributes

Relationships: expressed as a single verb implying direction or as a noun
Roles: define who does what in relationship
Cardinalities: ???

Semantic modeling of ER "adopts the more natural view that the real world consists of entities and relationships"

Diagramming conventions

Rectangles = entities
Diamonds = relationships
Line = connects entities to the relationships they participate in
Double line = participation constraint, totality, or surjectivity (all entities in at least one relationship in set)
Arrow = key constraint, injectivity (each entity in at most one relationship in set)
Thick line = bijectivity (each entity in exactly one relationship in set)
Underlined name of attribute = attribute is key (two different entities or relationships always have different values for attribute)
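
The line conventions above are really statements about how many times each entity shows up in a relationship set, so they can be checked by counting. This is only a sketch under my own assumptions: the entity sets (`students`, `advisors`), the relationship set `advised_by`, and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical entity sets and one relationship set stored as pairs.
students = {"ann", "bob", "cat"}
advisors = {"dr_kim", "dr_lee", "dr_roe"}
advised_by = [("ann", "dr_lee"), ("bob", "dr_lee"), ("cat", "dr_kim")]


def participation_counts(entities, relationship_set, position):
    """Count how many relationship instances each entity takes part in."""
    counts = Counter(instance[position] for instance in relationship_set)
    return {entity: counts[entity] for entity in entities}


def is_total(entities, relationship_set, position):
    """Double line: every entity is in at least one relationship."""
    counts = participation_counts(entities, relationship_set, position)
    return all(n >= 1 for n in counts.values())


def has_key_constraint(entities, relationship_set, position):
    """Arrow: every entity is in at most one relationship."""
    counts = participation_counts(entities, relationship_set, position)
    return all(n <= 1 for n in counts.values())


def is_exactly_one(entities, relationship_set, position):
    """Thick line: every entity is in exactly one relationship."""
    return is_total(entities, relationship_set, position) and has_key_constraint(
        entities, relationship_set, position
    )


# Position 0 of each pair is the student, position 1 is the advisor.
print(is_exactly_one(students, advised_by, 0))
print(is_total(advisors, advised_by, 1))
print(has_key_constraint(advisors, advised_by, 1))
```

In this made-up data every student has exactly one advisor, while the advisors fail both checks: one advisor has no students and another has two.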

Alternative = Crow's Foot notation

Limitations of ER model:

only a relational structure, assumes info. can be represented in relations
cannot handle changes to information easily
difficulty in "integrating pre-existing information sources that already define their own data representations in detail"

I'm confused about what cardinalities are, since the article didn't include a definition. I'm guessing it has something to do with cardinal directions because previously the article was talking about the direction of relationship between entities. I think I basically understand the ER model and would be able to point to the different components of a diagram. I don't know if I can connect this abstract representation with how the database functions, however.

3 Normal Forms Database Tutorial

Database normalization process = puts data in state that will make it usable to answer questions (can be used to keep track of a stack of invoices)
3 normal forms:
NF1 = No repeating elements or groups of elements.
NF2 = No partial dependencies on a concatenated key.
NF3 = No dependencies on non-key attributes.

NF1: Atomicity (a row cannot contain repeating groups of similar data); each row also needs a unique identifier (Primary Key)
Primary Key with two or more columns = concatenated primary key
NF2: "for a table that has a concatenated primary key, each column in the table that is not part of the primary key must depend upon the entire concatenated key for its existence"
If a column fails NF2, move it into its own table, keyed by the half of the concatenated primary key it actually depends on
If make more concatenated keys, test for NF2 again
NF3: If column relies on non-key attribute, create foreign key (column that points to the primary key in another table)
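
The three steps can be walked through on a tiny invoice example. This is a rough sketch using plain Python dictionaries as stand-ins for tables, not the tutorial's own tables: the column names, the sample values, and the helper `invoice_lines` are all hypothetical.

```python
# Flat invoice data, one row per item on an order (all values are made up).
# Columns: order_id, order_date, customer_id, customer_name,
#          item_id, item_desc, qty
flat_rows = [
    (1, "2012-01-10", "C1", "Example Library", "I1", "Bookend", 2),
    (1, "2012-01-10", "C1", "Example Library", "I2", "Label roll", 5),
    (2, "2012-01-12", "C1", "Example Library", "I1", "Bookend", 1),
]

# NF1: no repeating groups; each row is identified by the
# concatenated primary key (order_id, item_id).
order_items = {(o, i): {"qty": q} for o, _, _, _, i, _, q in flat_rows}

# NF2: item_desc depends only on item_id, and the order columns depend
# only on order_id, so each group moves to a table with that key.
items = {i: {"item_desc": d} for _, _, _, _, i, d, _ in flat_rows}
orders_nf2 = {
    o: {"order_date": dt, "customer_id": c, "customer_name": n}
    for o, dt, c, n, _, _, _ in flat_rows
}

# NF3: customer_name depends on customer_id, which is not the key of
# the orders table. It moves to a customers table, and orders keeps
# customer_id as a foreign key pointing at it.
customers = {
    row["customer_id"]: {"customer_name": row["customer_name"]}
    for row in orders_nf2.values()
}
orders = {
    o: {"order_date": row["order_date"], "customer_id": row["customer_id"]}
    for o, row in orders_nf2.items()
}


def invoice_lines(order_id):
    """Rebuild one invoice by following the keys between tables."""
    customer = customers[orders[order_id]["customer_id"]]
    return [
        (customer["customer_name"], items[item_id]["item_desc"], line["qty"])
        for (oid, item_id), line in sorted(order_items.items())
        if oid == order_id
    ]


print(invoice_lines(1))
```

After the split, the customer name and each item description are stored once instead of on every row, and the original invoice can still be rebuilt by following the keys.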

I have to say that I'm having trouble wrapping my head around this process of data normalization. I think that maybe going through it myself in a hands-on way would be a big help. Otherwise I only barely understand the basic steps of the process.

Friday, January 20, 2012

Week 2 Lab

Jing video: http://screencast.com/t/uVgRcQXxvWW

Jing screen capture: http://www.flickr.com/photos/74178035@N03/6731663829/

Thursday, January 19, 2012

Week 3 Reading Notes

Introduction to Metadata: Setting the Stage

Metadata = data about data, describes information object
Necessary for "identification, representation, interoperability, technical management, performance, and use of data contained in an information system"
Metadata should tell us: content (what it contains, intrinsic), context (who what where why how, extrinsic), and structure (formal set of associations with other objects, like MARC records)
In libraries: indexes, abstracts, bibliographic records
Automated: metadata mining, Web crawling
Need for standardization of metadata
Was emphasis on structure and context --> now more emphasis on content with new tech.
Metadata is more than description and resource discovery - also object behavior, function and use, relationships, and management over time
NEW: user-created metadata, tagging, folksonomies

Types of metadata:

Administrative
Descriptive
Preservation
Technical
Use

Attributes and characteristics:

Source of metadata - internal (intrinsic, original creator) vs. external (outside source)
Method of metadata creation - automatic vs. manual
Nature of metadata - nonexpert vs. expert
Status - static vs. dynamic, long-term vs. short-term
Structure - structured vs. unstructured
Semantics - controlled (standardized) vs. uncontrolled
Level - collection-level vs. item-level

Phases: Creation/reuse --> Organizing/describing --> Validation --> Searching/Retrieval --> Disposition --> [repeat]

Metadata: not necessarily digital, more than description, comes from variety of sources, continues to accrue, one object's metadata can be another's data
Important for: increased accessibility, retention of context, expanding use, learning from metadata, system development and enhancement, multiversioning, legal issues, preservation and persistence

The main question I was left with after reading this chapter was: How much metadata is too much? This chapter talked about all the advantages of having all kinds of different types of metadata for information objects, but how much is too much? We use metadata to cope with information overload of data, so what about information overload of metadata? Aren't we just repeating the same problem?

An Overview of the Dublin Core Data Model

Dublin Core Metadata Initiative (DCMI) = "international effort designed to foster consensus across disciplines for the discovery-oriented description of diverse resources in an electronic environment" or standardized metadata architecture

DCMI Requirements:

Internationalization
Modularization/Extensibility - namespaces used for independent definitions of same term
Element Identity - specific, standardized definitions of terms
Semantic Refinement - richer semantic definitions
Identification of encoding schemes - less ambiguity
Specification of controlled vocabularies - "allow for additional understanding of contextual information"
Identification of structured compound values

Need to describe properties of resources
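
To see what describing a resource's properties might look like in practice, here is a small sketch that writes a Dublin Core style record as XML with Python's standard library. The namespace URI is the Dublin Core element set namespace; the `record` wrapper element and all of the values are my own made-up examples, and this is only one of several ways such a record could be encoded.

```python
import xml.etree.ElementTree as ET

# Namespace for the Dublin Core element set; the "dc" prefix keeps these
# element names separate from any other vocabulary's "title" or "date".
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# "record" is a hypothetical wrapper element, not part of Dublin Core.
record = ET.Element("record")

# Property/value pairs describing a made-up resource. The date is written
# in ISO 8601 form, as an example of following an encoding scheme.
fields = [
    ("title", "Example Digitized Map"),
    ("creator", "Example Author"),
    ("date", "2012-01-19"),
    ("language", "en"),
]
for name, value in fields:
    element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
    element.text = value

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# Any program that knows the namespace can read the property back.
parsed = ET.fromstring(xml_text)
title = parsed.find(f"{{{DC_NS}}}title").text
print(title)
```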

This project, while some of it is beyond my comprehension, makes sense to me as an effort to standardize the way we use metadata. The benefits of standardization in many aspects of information management are pretty self-evident to me, so this project sounds like a great idea.

EndNote X5: Introduction

EndNote = bibliographic software program

Can be used to:

Import citations from saved literature searches - hyperlinks to location on Web
Develop a personal library of references - catalog and track information, save notes
Create and format citations for papers and publications - automatic formatting

Flexible, customizable

This program sounds extremely helpful for someone who is dealing with a lot of resources for a job or a particular project. Having a way to organize and catalog these different resources would aid in accessing them in the future.

Wednesday, January 18, 2012

Week 2 Reading Notes

Apologies! I confused Weeks 1 and 2, so I took notes on the Week 1 readings and labeled them as Week 2. I know this is now late, but Week 2 notes are coming!

Wikipedia article - "Computer Hardware"

Hardware = "collection of physical elements that comprise a computer system" (performs input and output, stores data, manages all of these tasks together)
History: separate manual actions --> punchcards --> stored-program computers
Tied to history of computer data storage
History: analog computers (mechanical circuit models for electrical circuits) --> digital (no models or analogs)
Mainframe computers, minicomputers, microcomputers (personal computers)
Von Neumann architecture = processing unit with arithmetic logic unit and processor registers, control unit with instruction register and program counter, memory to store data and instructions, external mass storage, and input and output mechanisms
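
The stored-program idea is easier to follow with a toy model. Below is a minimal sketch of a machine in the Von Neumann style, where instructions and data sit in the same memory. The instruction names (LOAD, ADD, STORE, PRINT, HALT) and the memory layout are invented for this example and do not correspond to any real processor.

```python
def run(memory):
    """Run a toy stored-program machine; returns whatever it 'printed'."""
    pc = 0        # program counter: address of the next instruction
    acc = 0       # a single processor register (accumulator)
    output = []   # stand-in for an output device
    while True:
        instruction = memory[pc]   # fetch into the "instruction register"
        pc += 1
        op, address = instruction  # decode
        if op == "LOAD":
            acc = memory[address]
        elif op == "ADD":
            acc = acc + memory[address]  # the arithmetic logic unit's job
        elif op == "STORE":
            memory[address] = acc
        elif op == "PRINT":
            output.append(memory[address])
        elif op == "HALT":
            return output
        else:
            raise ValueError(f"unknown instruction: {op}")


# Instructions (addresses 0-4) and data (addresses 6-8) share one memory.
program = [
    ("LOAD", 6),    # 0: copy the value at address 6 into the register
    ("ADD", 7),     # 1: add the value at address 7
    ("STORE", 8),   # 2: write the sum to address 8
    ("PRINT", 8),   # 3: send the value at address 8 to output
    ("HALT", 0),    # 4: stop
    None,           # 5: unused
    2,              # 6: data
    3,              # 7: data
    0,              # 8: result goes here
]

result = run(program)
print(result)
```

The control unit is the `while` loop that fetches and decodes, the ALU is the addition, and the list plays the role of memory holding both the program and its data.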

There is definitely a lot that I do not know about computer hardware, but I do understand the difference between it and computer software. It is also helpful to have learned that the specific tasks that hardware performs can be summarized as input/output, storage, and management.

Wikipedia article - "Computer Software"

Software = "a collection of computer programs and related data that provides the instructions for telling a computer what to do and how to do it"
Programs, procedures, algorithms, and its documentation
"Cannot be touched"
Software used to always be bundled with hardware, now software is its own business
Software licensing issues, patents
Types of software: system, programming, application, middleware, teachware, testware, firmware, shrinkware, device drivers
Types of architecture: platform, application, user-written
Free software license = recipients can modify and redistribute software

I had previously thought of software just in terms of "application" software, so now I understand that the term encompasses much more than that. Again, there is still a ton that I do not know about computer software, but I think that I do have a good basic understanding of what I need to know.

Digitization: Is It Worth It?

Digitization = "the conversion of analog media to digital form"
Downsides: long process, complex, many things can go wrong
Cost-benefit analysis? Pay for conversion itself but also assembling material, copyright licenses, machine upkeep, editing, cataloging, managing
Upsides: increasing access, preservation, increase visibility of institution
Digitizing vs. not digitizing: if item is in demand enough, digitizing becomes cost-effective
Digitizing vs. acquiring new materials: depends on many factors - digitizing is not always the right choice
CONCLUSION: Each case should be treated separately.

I think this article makes an important point that it's dangerous to jump into digitization projects without first considering whether it is the best decision for that particular circumstance. Digitization is not always the right decision and many factors need to be considered first.

European Libraries Face Problems in Digitalizing

European Digital Library as a competitor to Google Books project
Huge cost of digitizing materials
Originally had only limited public funds, now seeking private alliances
Business model?
Who controls the future of digital records and writings?
Alliance with Google itself?
Charging for access to copyrighted materials? Low-quality for free but higher quality for a fee?

I hadn't previously considered the idea that mass digitization of information/knowledge by one institution would create a centralization in the control of digital records in one institution and one country. Google is a very successful and well-liked company, but I can completely understand why its Google Books project made other countries nervous. Hopefully the solution is that many countries and institutions can work together to preserve and provide access to information.

Wikipedia article - "Data Compression"

Data compression (or source coding or bit-rate reduction) = "encoding information using fewer bits than the original representation would use," makes use of redundancy
Reduces consumption of hard disk space or transmission bandwidth
Must be decompressed to be used (may need certain hardware)
Lossless compression = no error, but slow compression/decompression (Lempel-Ziv or LZ)
Lossy compression = some error (depending on how much error is acceptable), faster compression/decompression
Data compression is closely connected to theories of machine learning and data differencing
Audio data compression - considers the design and function of the human ear
Video data compression - considers the design and function of the human eye
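
A quick way to see lossless compression exploit redundancy is Python's built-in zlib module, which implements DEFLATE (a combination of a Lempel-Ziv method and Huffman coding). This is just a sketch: the repeated phrase and the variable names are arbitrary choices of mine, and the exact compressed sizes will vary.

```python
import os
import zlib

# Highly redundant data: the same short phrase repeated many times.
redundant = b"the quick brown fox " * 200

# Random bytes of the same length have almost no redundancy to exploit.
random_bytes = os.urandom(len(redundant))

packed_redundant = zlib.compress(redundant)
packed_random = zlib.compress(random_bytes)

print(len(redundant), "->", len(packed_redundant), "bytes (redundant text)")
print(len(random_bytes), "->", len(packed_random), "bytes (random bytes)")

# Lossless means decompression gives back exactly the original bytes.
restored = zlib.decompress(packed_redundant)
print(restored == redundant)
```

The repeated text shrinks dramatically while the random bytes barely change in size, which matches the idea that compression depends on redundancy being there in the first place.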

I remember reading a little about this topic in The Information: A History, a Theory, a Flood that we read in LIS 2000 last semester. At that time I was amazed at the way that data compression makes use of redundancy, mostly because we hardly notice this redundancy in our daily lives. Once you start looking for redundancy, though, you see that it is everywhere. I am someone who is very interested in psychology, so the way that data compression algorithms have to consider the way the human brain and body work is fascinating to me.

Thursday, January 12, 2012

Week 1 Reading Notes

OCLC 2004 Information Format Trends: Content, not Containers

Major points:

"Unbundling" of content - no longer contained in books, journals, etc., not format-dependent
Desire for a variety of formats means "processes of acquisition, organization, and delivery of content" need to adapt
Transition from traditional print publishing --> e-books and self-publishing
Intellectual property issues becoming more of a concern
Text is still critical, but multimedia content is booming
Digital content explosion in email, SMS text messaging, and webpages
Mobile devices --> consumers aren't tied to a computer for access
Social communication - "many to many"
Increased expectation for "just-in-time" delivery of content
"Microcontent" such as ringtones developing
Rise of "social publishing" such as wikis and blogs
More consumers getting DVDs by mail or pay-per-view
Scholarly materials also experiencing massive transition to digital materials (e-books and also print-on-demand)
Academic and research libraries spend significant amount on e-journal collection subscriptions
Growth of the Open Access movement
Greater emphasis on placing information in context and building knowledge from information, rather than just finding information

This article really shows some of the major trends in the information field and how the changes we see today were already evident in 2004, which is now eight years in the past. There are a lot of questions raised here that we still haven't answered. In what formats should libraries make content available? How do we translate content into new formats as tastes and technology change? How will intellectual property law play out in the digital environment? Specifically, how can we balance the rights of consumers of intellectual property with the need to encourage and incentivize its creation?

Lied Library @ four years: technology never stands still

New systems implemented over four years:

"Off-site access to licensed electronic resources"
"Digital content management system" for development and access
Virtual reference software
A to Z Title List for "better access to the library's electronic serials"
Link resolver software "to help provide more seamless searching of library electronic resources"
Laptop checkout program
Electronic reserves system
Internet 2 access grid within the library

Existing systems that have grown/changed:

Media distribution system, print cost recovery system --> now encompass other users
Various vendor software updates
Printing system implemented override print queue
Migrated to one-card system for printing
Other adjustments to printing system
Integrated library system - "additional user licenses and new central site hardware"
Synchronization of patron records with one-card system and student records
"Digital Library Assistant devices for stacks management"
Computer replacement project

Continuing challenges:

Hardware and software maintenance costs
Computing resources management
Fighting spread of malicious software
Physical space limitations
Security
Hardware/software issues or glitches
Maintaining or growing sufficient staff for information technology services
Building an institutional repository
Continuing to replace computers
Expanding instruction sessions
Enhancing network infrastructure
Further RFID tagging
Changes in library leadership

I thought this was a great example of an academic library's changes over the past few years and how they have adapted to changing technology. Obviously there are new challenges ahead for academic libraries, but the sense I got from this article was that establishing a culture where technological change is met and embraced is a huge part of a library continuing to thrive today.

Information Literacy and Information Technology Literacy: New Components in the Curriculum for a Digital Culture

Information technology literacy is an understanding of...

"the technology infrastructure"
"the tools technology provides and their interaction with this infrastructure"
"the legal, social, economic, and public policy issues that shape the development" of infrastructure/applications/technologies

Information literacy is about...

"content and communication"
"authoring, information finding and organization, the research process, and information analysis, assessment, and evaluation"
in the form of "text, images, video, computer simulations, multi-media interactive works"
for the purpose of "news, art, entertainment, education, research and scholarship, advertising, politics, commerce" and business or personal documents and records

In information technology literacy, students need to know...

how to use word processing, spreadsheets, computers in general, web browsers, email systems, possibly a programming language (short-term skills, quickly become dated)
"how technologies, systems, and infrastructure work" at a "superficial descriptive level" and/or a "more detailed analytic or engineering level" (more crucial than skills with specific tools)
troubleshooting, problem-solving, and debugging of software
"graphic display of quantitative information"
"construction, analysis, and use of simulations"
basics of "computing, telecommunications, broadcasting, publishing, electrical power distribution, transportation, and financial infrastructure"
"history, economics, social and public policy issues"

In information literacy, students need to know...

"authoring and critical and analytic reading (including the assessment of purpose, bias, accuracy, and quality)" in text AND visual and multimedia communication
that digital forms are more fluid and that computers can manipulate what were once "factual records" such as images
searching systems as well as cataloguing, abstracting, indexing, rating
the importance of information accessibility, visibility, and impact
"the limitations of both digital information resources and...searching techniques"
how information resources and technological and economic structures interrelate
what information resources are appropriate for specific information needs
legal, social, economic, and ethical issues surrounding intellectual property
issues of privacy, "information authenticity, provenance and integrity, documentation and archiving" and records management and construction

In other classes I've learned about some of the skills and competencies associated with information literacy, but information technology literacy is something I hadn't heard of before. There are a lot of questions in how to best teach these new literacies to both young people and adults, as well as what the average person will or will not find useful. The article touches on some of these questions. Another question is whether these are skills/competencies or general comprehension/knowledge or specific attitudes that we want to engender through education. Regardless, as someone who wants to work with young people in particular, I agree that this topic is critical.