Saturday, April 14, 2012

Week 13 Lab

URL of wiki user page: http://liswiki.org/wiki/User:Tgs11

Week 14 Reading Notes

No Place to Hide website

"Where the data revolution meets the needs of national security, there is no place to hide."

Ways we can be tracked electronically:
  • Credit card records
  • Surveillance cameras
  • WiFi
  • Subway/MetroCards
  • Satellite navigation systems in cars
  • Card swipes on copiers/vending machines/ATMs
  • Clocking in at work
  • E-Z Passes at toll booths
  • Internet browsing/shopping/email
  • TiVo
  • ID/face/iris/fingerprint scan to access building
  • Phone calls
RFID = radio frequency identification
-getting cheaper and smaller, can hold more info

Monitoring by companies, law enforcement, or private investigators

Companies using RFID: car manufacturers, gas stations, Walmart, Defense Department, FDA, casinos, jails, schools

RFIDs can:
  • increase efficiency
  • fight credit card fraud, other security issues
  • improve customer relationship management, marketing
Controversy surrounding RFIDs and the info they gather

Tagging could eventually extend to...everything?
-No more anonymous transactions

"Why worry if you have nothing to hide?"
"We have nothing to worry about, until they make a mistake."

Trading privacy for security

Verint surveillance systems
-including government wiretapping

Goal of some companies to get people used to surveillance

This chapter was really eye-opening and kind of scary to think about. I can understand the desire to record information about people, especially for reasons of security, but I also think that people have a right to privacy. While to some degree it's true that if you have nothing to hide you have nothing to worry about, what happens when the surveillance recordings make a mistake? Or what if the government takes a turn for the Orwellian and cuts off any way for citizens to resist? That thought does scare me: we put so much power into the hands of people and organizations that might not use it wisely and responsibly. There should still be a way to opt out, and someone should be working on tech that will increase privacy, not decrease it.


Total "Terrorism" Information Awareness (TIA)

EPIC = Electronic Privacy Information Center

Data mining in federal agencies

Defense Advanced Research Projects Agency (DARPA) making tracking system TIA
-designed to give law enforcement private data without warrant
-captures "information signatures"

TIA = grand database that includes:
  • financial records
  • medical records
  • communication records
  • travel records
  • intelligence data
Identifies and tracks individuals across multiple info sources

TIA = no longer being funded, agency shut down
-could still be similar government projects in the future

This TIA project sounds pretty creepy, and while I'm glad it's no longer being funded, I do agree that the government won't necessarily abandon the idea of recording all information on people if they think it will improve security. Like I said before, though, I do believe that people have a right to privacy. Giving the government too much ability to track its citizens could lead to abuses of power and too much government control.


MyTurn: Protecting Privacy Rights in Libraries

Laws protecting privacy of library records (in 40 states)
-can only be shared with judicial order or warrant

VT law says parents get library records of children under 16

Children can have various needs to keep info from parents
-child abuse
-drug abuse
-health questions parents won't answer

Police officers in a particular case tried to take computers without a warrant
-Brooke Bennett investigation
-librarians want to help but won't break legally-binding policy

Library supports:
  • right to privacy
  • right to open inquiry
  • freedom of speech
  • freedom to receive information
This is one of those issues that gets me so upset, because so many people are ignorant about the values of libraries and the way they work. The woman who wrote the letter that this blog post responds to thought that the police should be able to seize library records for any reason. The fact that the library is standing its ground on issues of privacy and confidentiality gives me hope after reading the first two articles this week. The library is still one place where a person can trust that his or her actions are private, are not being monitored, and will not be used against him/her.

Saturday, April 7, 2012

Week 13 Reading Notes

Content Nation: Surviving and Thriving as Social Media Changes Our Work, Our Lives, and Our Future

Social media: highly scalable and accessible communications tech, helps individuals to easily publish and influence others
-Similar: Web 2.0, user-generated content, social networking
  • Scalable and accessible tech
  • Individual people communicate with other groups of individuals
  • Enables influence
Types of social media:
  1. Personal publishing - blogs - individuals tell stories to others
  2. Collaborative publishing - wikis - multiple people collaborate on a common document for themselves and/or others
  3. Social-network publishing - Facebook and LinkedIn - people find other people
  4. Feedback and discussions - Amazon - share info and opinions on a topic with others
  5. Aggregation and filtering - YouTube and Flickr - aggregate collections of content from various sources
  6. Widgets and mashups - add value to social media by creating complementary content
  7. Personal markets and marketing - Craigslist and eBay - find people with goods and services, create market
Social media does not eliminate human nature, just gives a new way to express itself

Goal of social media = influence over others

Conflict can arise, however:
*"Order can come from people who collaborate to enforce mutually accepted standards of behavior."
-Each site has own standards
-Need to follow standards in order to influence opinions of others

Content:
  • The "stuff"
  • Requires an audience
  • Value is contextual
  • = "info and experiences in contexts that provide value to audiences"
  • Comes from many different sources
  • Social media makes distribution easier
  • Exponentially more every day
Aggregation in social media: different from traditional model, distribution not a competitive barrier
-aggregation can now be highly focused (New Aggregation)
-content not indexed but can be reinvented

Brands, Affinity, Endorsements:
-value through marketable relationships as well as marketable content
-Ex. blogs that become popular, build reputation = influence over others
-affinity = more important because more options

Timing: part of context, different formats have different value
-long tail = consistent popularity in small groups
-long snout = popularity in some groups while still in development (social media)

Social Media Secrets
  1. Ability to scale efforts independently = important
  2. Understanding people > understanding technology
  3. Law of the campfire, not law of the jungle
  4. Valuable to create new contexts for content
  5. Not mass production, but mass contextualization
  6. Direct contact with others who value your insights
  7. Valuable to people who want to be ahead of other people
This article was a fascinating account of how to effectively use social media. I'd never thought of influencing others as the main goal of social media, but it does make sense. I think there are a lot of valuable insights here for someone who wants to publish through social media or for a corporation that wants to understand how to use it. Corporations, which are used to traditional models of publishing and advertising, need this information the most because social media has its own model that is in some ways radically different.


Using a Wiki to Manage a Library Instruction Program

Wiki can:
  • create better info sharing
  • facilitate collaboration in creation of resources
  • efficiently divide workload
Many websites where you can set up wikis, invite by email

Wikis have been used by librarians to: "manage public services information, collaborate on and keep track of reference questions, and assess databases"
-stores info in central location

Used in library instruction program at ETSU
-teaches how to use website, find journal articles, critical thinking about info sources, evaluating websites

Learned more about:
  • specifics of the classes' needs
  • unforeseen directions of assignment
  • preferences of professor
  • housekeeping issues
  • broken web links
Second use: centralized resource collaboration tool
-information handouts, subject resource guides, how-to instructions

This is good background on how wikis could be useful to librarians, but I think that their uses are probably endless and there are more than are covered in this article. Mostly I use wikis for working on group projects when it is difficult to keep getting together to work. Really any project where a group of people has to come up with one final deliverable is a good time to use a wiki. I think people will be using them more and more, in and out of libraries. The other benefit is that they require almost no training at all to use.


Creating the Academic Library Folksonomy

Social tagging enables: quickly find disparate info, store bookmarks and access them anywhere, see what others are reading, find unexpected resources

Social tagging = create bookmarks/tags for websites and save them online, including subject keywords (such as del.icio.us)
Folksonomy = taxonomy created by ordinary folks, users create own controlled vocab

Library: has catalog, but can't catalog the internet
  • use tagging to guide users to helpful resources online
  • can tag articles in licensed databases, too
  • tagging can bring to light resources that are harder to search for
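At bottom, a folksonomy is a many-to-many mapping between free-form keywords and resources. Here is a minimal sketch of that idea; the URLs and tags are invented, and real services like del.icio.us of course do far more:

```python
from collections import defaultdict

tags = defaultdict(set)   # tag -> set of bookmarked URLs

def add_bookmark(url, *keywords):
    for kw in keywords:
        tags[kw.lower()].add(url)   # free vocabulary: whatever the user types

def tagged_with(*keywords):
    """URLs carrying every one of the given tags (intersection of tag sets)."""
    sets = [tags[kw.lower()] for kw in keywords]
    return set.intersection(*sets) if sets else set()

add_bookmark("http://example.org/stats", "statistics", "data")
add_bookmark("http://example.org/census", "data", "government")
print(tagged_with("data", "statistics"))
```

The controlled-vocabulary problem shows up immediately in a sketch like this: "stats" and "statistics" would be two unrelated tags unless users converge on one.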
PennTags at UPenn = students bookmark quality websites and share
-Stanford uses content management software Drupal

Connotea and CiteULike intended for academics, pull bib info

Subject specialists can begin social tagging process
-can use Librarians' Internet Index or C&RL News Internet Resources columns

Risks of social tagging:
  • Spagging or spam tagging
  • Users with bad intentions tagging inappropriate content
  • No specific controlled vocabulary among users

This was a good, brief introduction to the ways that libraries can use social tagging at universities. I think that this could be very useful to academic libraries, and I'm curious if, since it was written, there are more universities that utilize something like this. Obviously social tagging has its downsides, but it works particularly well with internet sites, which librarians can't feasibly catalog anyway. Just as there are a multitude of ways that libraries can use wikis in their program, so there are also many ways that tagging could be useful, especially if you creatively set up the tagging system, such as giving subject specialists administrative control.


Jimmy Wales on the Birth of Wikipedia - TED

Radical encyclopedias: Britannica vs. Wikipedia

Goal = give everyone in the world access to the sum of human knowledge (free encyclopedia)
  • Staffed by volunteers
  • Wiki software
  • Freely licensed
  • Funded by public
  • Many languages, only 1/3 traffic to English
  • One employee
  • Cost: $5000/month
More accurate than more traditional encyclopedias

Neutral point-of-view
-vandalism is bigger problem than controversy

Policies and software maintain quality
-despite allowing edits by anonymous users (minority of edits)
-Request for Deletion page
-volunteer administrators

Administration: part consensus, democracy, aristocracy, monarchy (Jimmy Wales can change the rules), NOT anarchy

I already knew a bit about Wikipedia, but it was interesting to hear about it from the founder himself. Since this video is from 2005, I wonder how much of the information has changed, besides things like the number of pages or pageviews or things like that. I think that, in general, the world is better off with a source like Wikipedia than if it didn't exist, but I think that people need to understand the best way to use it, when to use it, and what parts are the most trustworthy. I think this is true of any information source, though.

Thursday, March 29, 2012

Week 12 Reading Notes

Web Search Engines: Part 1

Indexing Web done by Microsoft, Google, Yahoo
  • Reject low-value automated content
  • Ignore Web-accessible data
  • No access to restricted content
Large search engines have many data centers around the world (clusters of commodity PCs, servers)
  • crawling
  • indexing
  • query processing
  • snippet generation
  • link-graph computations
  • result caching
  • insertion of advertising content
400 terabytes of info to crawl

Crawling mechanism = queue of URLs, beginning with "seed"

Issues in crawling:
  1. Speed - use of internal parallelism
  2. Politeness - not bombarding any one server with requests
  3. Excluded content - communicate with robots.txt file
  4. Duplicate content - identify identical content with different URLs
  5. Continuous crawling - priority queue based on what is current and changing
  6. Spam rejection - manual AND automated analysis
Crawlers are highly complex and need to adapt
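The crawling mechanism above (a queue of URLs starting from a seed, with a seen-set to avoid duplicates) can be sketched in a few lines of Python. The link graph here is a made-up stand-in for real page fetches, so the sketch runs without a network:

```python
from collections import deque

# Toy link graph standing in for real fetches (hypothetical URLs).
PAGES = {
    "http://a.example/": ["http://a.example/1", "http://b.example/"],
    "http://a.example/1": ["http://a.example/"],
    "http://b.example/": ["http://a.example/1"],
}

def crawl(seeds):
    """Breadth-first crawl from the seed queue, skipping already-seen URLs."""
    queue = deque(seeds)
    seen = set(seeds)
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)                # a real crawler would fetch and index here
        for link in PAGES.get(url, []):
            if link not in seen:         # duplicate guard (by URL only)
                seen.add(link)
                queue.append(link)
    return order

print(crawl(["http://a.example/"]))
```

A real crawler layers the issues above on top of this loop: per-host delays for politeness, robots.txt checks before fetching, and a priority queue instead of a plain one for continuous crawling.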

Web Search Engines: Part 2

Indexing algorithms use inverted files (concatenation of posting lists for each distinct term)
-Scans text for indexable terms and gives number
-Inverts by sorting into term number order
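That inverted-file construction can be illustrated with a minimal sketch; the two documents are invented:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each distinct term to the sorted posting list of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = ["web search engines", "search engines index the web"]
index = build_inverted_index(docs)
print(index["search"])   # posting list for "search" -> [0, 1]
```

Answering a multi-word query is then just an intersection of posting lists, which is why the speed-up tricks (skipping, early termination) all operate on those lists.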

Issues with indexing:
  1. Scaling up - to be efficient
  2. Term lookup - all languages plus new terms
  3. Compression - takes less storage
  4. Phrases - precompute posting lists or create sublists
  5. Anchor text - link text provides info on destination
  6. Link popularity score - derived from frequency of incoming links
  7. Query-independent score - based on link popularity, URL brevity, spam score, frequency of clicks
Average query length = 2.3 words
-Need more than simple-query processor

Ways to speed things up:
  • Skipping - irrelevant postings
  • Early termination - once postings left are of little value
  • Clever assignment of document numbers - decreasing query-independent score
  • Caching - reduces cost of answering queries
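Result caching, the last item above, can be sketched with Python's built-in memoization; the query function is a stub, and the counter exists only to show that the repeated query never re-runs the evaluation:

```python
from functools import lru_cache

calls = {"n": 0}   # counts actual evaluations, to make the caching visible

@lru_cache(maxsize=1000)
def answer_query(query):
    """Stand-in for full query processing; identical queries hit the cache."""
    calls["n"] += 1
    return f"results for {query!r}"

answer_query("digital libraries")
answer_query("digital libraries")   # served from cache, no second evaluation
print(calls["n"])                   # -> 1
```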
These two articles provided an interesting view of how search engines work. I didn't realize that they actually index the pages that they crawl, but it makes sense. It's funny to think about the fact that computer scientists used to think this wouldn't be possible, back when there was only a fraction of the information on the Internet that we have today. I can't believe the amount of information that a search engine has to deal with, and while they may not be perfect yet, I can see how these techniques and algorithms make things fast and efficient and fairly accurate.


White Paper: The Deep Web: Surfacing Hidden Value

Deep Web = buried too far down (on dynamically generated sites) for standard search engines to find it
-need to be static and linked to other sites to be found

Deep Web content = in searchable databases, only produces results in response to a search
-7,500 terabytes of info

BrightPlanet = makes dozens of direct queries simultaneously with multi-thread technology
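The multi-threaded directed-query idea can be sketched with a thread pool; the source names and the query function are stubs I made up, not BrightPlanet's actual technology:

```python
from concurrent.futures import ThreadPoolExecutor

SOURCES = ["db-one", "db-two", "db-three"]   # hypothetical searchable databases

def query_source(source, term):
    """Stub for posting one search to one deep-Web database."""
    return (source, f"hits for {term!r} in {source}")

def directed_query(term):
    # Issue the same query to every source simultaneously and gather results.
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        return list(pool.map(lambda s: query_source(s, term), SOURCES))

for source, hits in directed_query("toxicology"):
    print(source, hits)
```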

  • Search engines: either author submits his/her site or engine "crawls" docs by moving between hyperlinks
  • Google: crawls and indexes based on popularity of sites
  • If search engines depend on linkages, they'll never get to the deep Web
Factors in deep Web development:
  1. Database technology
  2. Commercialization through directories and e-commerce
  3. Dynamic serving of web pages
BrightPlanet = directed query engine, gets at deep Web

Deep Web = 10x greater amount of content than rest of Web

Deep Web site qualifications: 43,348 URLs

Database types in deep Web:
  1. Topic databases
  2. Internal site
  3. Publications
  4. Shopping/auction
  5. Classifieds
  6. Portals
  7. Library
  8. Yellow and white pages
  9. Calculators
  10. Jobs
  11. Message or chat
  12. General search
Deep Web content:
  • Deep Web docs are 27% smaller than surface Web docs
  • Deep Web sites are much larger than surface Web sites
  • Deep Web sites have about 50% more traffic than surface Web sites
  • 97.4% of deep Web is publicly available
  • Deep Web may be higher quality than surface Web
  • Deep Web growing faster than surface Web
Needs to be a way to search info in deep Web = BrightPlanet?

This article was very interesting because I had previously heard a bit about the deep Web, but what I had read previously made the deep Web sound like a sinister place where most of the content was illegal and strange. Now I understand that the deep Web is mostly databases related to different organizations that are "buried" because there are no links that connect them to the rest of the Web, so conventional search engines cannot find them in the usual way. I hope that we keep developing our ability to access and search the deep Web because it sounds like there is a lot of information there that could be useful to people.


Current Developments and Future Trends for the OAI Protocol for Metadata Harvesting

Open Archives Protocol for Metadata Harvesting = OAI-PMH
  • Federates access to diverse e-print archives through metadata harvesting and aggregation
  • Released in 2001, used by content management systems
  • Mission: "develop and promote interoperability standards that aim to facilitate the efficient dissemination of content"
  • Uses XML, HTTP, and Dublin Core standards
  • Data providers or repositories provide metadata
  • Service providers or harvesters harvest the metadata
  • Can provide access to invisible/deep Web
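A ListRecords harvest, the protocol's core operation, can be sketched with the standard library; the verb and metadataPrefix parameters are real OAI-PMH, but the repository URL and the canned response below are made up:

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

def listrecords_url(base_url):
    """Build a ListRecords request URL (oai_dc is the required Dublin Core format)."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})

print(listrecords_url("http://repo.example/oai"))

# A trimmed, hand-made response in the shape real OAI-PMH responses take.
RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

ns = {"oai": "http://www.openarchives.org/OAI/2.0/"}
ids = [e.text for e in ET.fromstring(RESPONSE).findall(".//oai:identifier", ns)]
print(ids)   # -> ['oai:example:1', 'oai:example:2']
```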
Notable community- or domain-specific services:
Open Language Archive Community
Sheet Music Consortium
National Science Digital Library

Comprehensive, searchable registry of OAI repositories
-more informative, searchable, and complete than in the past
-machine processing option

Future work of OAI registry:
  • Enhance descriptions of repositories for search
  • Provide automated maintenance of registry
  • Delegate creation/maintenance of collections
  • Improve view of search results
Extensible Repository Resource Locators = ERRoLs, "cool URLs," lead to content and services relating to an OAI repository
-simple mechanism to access OAI data

Challenges for OAI community:
  1. Metadata variation
  2. Metadata formats
  3. OAI data provider implementation practices
  4. Communication issues
Future directions: best practices, static repository gateway, Mod_oai Project, OAI-rights, controlled vocabularies and OAI, SRW/U-to-OAI gateway to the ERRoL service

If I understood this article correctly, this seems like another project attempting to get to the useful data that is in the deep Web, just like the BrightPlanet mentioned in the last article. This sounds like an open collaboration, however, between the people who have the data and the people who want to access it or give access to others. This seems like a good strategy and will be useful to people who want to use the deep Web data that is currently unavailable. There's still a lot of this I don't understand, but I hope that I'm generally correct in the overall idea that this article is presenting.

Lab 11



Google Scholar query:
~virtual reference "digital libraries"
[also specified articles published between 2008 and 2012]

Google Scholar screenshot:



Web of Knowledge query:
Topic=(virtual reference) OR Topic=(digital libraries) AND Year Published=(2008-2012)

Web of Knowledge screenshot:

Wednesday, March 28, 2012

Week 11 Reading Notes

Digital Libraries: Challenges and Influential Work

Current info environment includes: "full-text repositories maintained by commercial and professional society publishers; preprint servers and Open Archive Initiative (OAI) provider sites; specialized Abstracting and Indexing (A & I) services; publisher and vendor vertical portals; local, regional, and national online catalogs; Web search and metasearch engines; local e-resource registries and digital content databases; campus institutional repository systems; and learning management systems"

Need more than access for digital library work - need federated search

History of digital libraries:
  • Digital Libraries Initiative (DLI-1), 1994
  • DLI-2, 1998
  • University-led projects
  • Development strongly influenced by evolution of Internet
  • Search interoperability and federated searching
Federation solutions: aggregated search or broadcast searching against remote resources
-Google, Google Scholar, OAI = aggregated/harvested
-Ex Libris Metalib, Endeavor Encompass, and WebFeat = broadcast search
-can be complementary

Metadata searching vs. full-text searching?

This article was a good, brief introduction to the issues surrounding federated search and why such a mechanism is necessary. I can't imagine how complicated it is to try to design a search that will encompass all of the different resources available online. It seems like it would be impossible to design something that would work with all the different systems that exist, but I also see the need for it in order to provide the best possible digital library services.


Dewey Meet Turing: Librarians, Computer Scientists, and the Digital Libraries Initiative

DLI led to development of Google, as well as CareMedia and many others

Computer scientists: expected their research to impact daily lives
Librarians: expected grant money and impact on scholarship

Expected to be collaboration between computer scientists and librarians, but World Wide Web got in the way
-variety of media, larger collection, different access methods
-blurred consumers/producers of info
-split up collections over the world and under different owners

Computer scientists embraced changes Web created
Librarians felt threat to their traditional practice

Problems for librarians:
  • Loss of cohesive "collections"
  • High prices of journal publishers
  • Copyright issues
  • Dead links
Librarians expected more collection development
Computer scientists feel librarians too nitpicky about metadata
-However, core function of librarianship remains
-Notion of collections is reemerging (hubs)
-Opportunities for direct connections between librarians and scholarly authors

This article provided an interesting account of the tensions between librarians and computer scientists involved in the DLI. I can understand how these two professions planned to work together to create digital libraries, but that the Internet changed everything, as it has in so many areas. I can see how computer scientists and librarians have different perspectives and goals, but I also am glad that the author sees hope for the future of these professions working together and also of the practice of collection development.


Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age

Libraries taking more active role in promoting scholarship and scholarly communication

Supporting this strategy:
  • Lower online storage costs
  • Open archives metadata harvesting
  • Free, publicly accessible journal articles
MIT and DSpace institutional repository system

Institutional repository = set of services university offers for management and dissemination of digital materials created by institution and community members
-Preservation
-Organization
-Access/distribution

Contains:
  • Intellectual works by faculty and students
  • Documentation of activities of institution
  • Experimental and observation data
Scholarly publishing = specific example of scholarly communication

Authorship in digital medium
-traditional journal articles or new forms

Institutional repositories can help scholars with system administration activities and content curation
-problem with preservation

Traditional publishing = new supplementary datasets and analysis tools

Institutional repositories can:
  • enhance access
  • encourage new forms of scholarly communication
  • maintain stewardship of data
  • preserve supplemental info
  • curate records of institutional activity
Potential dangers:
  • Institutions could take control instead of scholars
  • Weighed down with policy
  • Lack of institutional commitment
  • Technical problems
Need infrastructure standards in: preservable formats, identifiers, and rights documentation and management

Future developments:
Consortial or cluster institutional repositories
Curatorial and policy control
Federating institutional repositories
Community or public repositories

I like the way that this article outlines the opportunities and responsibilities of an institutional repository. It seems to me that every institution such as a university should have such a repository in order to organize and preserve digital information that could be important in the future. It would be against an institution's mission to lose some of its vital records and/or intellectual work and have to reinvent the wheel all the time or have a limited knowledge of past activities. It will be interesting to see what happens in the future of institutional repositories and if the author of this article is correct.


Sunday, March 18, 2012

Week 9 Lab

URL: http://www.pitt.edu/~tgs11/lab9.html

Week 10 Reading Notes

Introduction to XML (IBM)

XML = Extensible Markup Language, can create own tags, machine can read it

XML based on SGML
  1. Tags = text between brackets
  2. Elements = starting tag, ending tag, everything in between
  3. Attributes = name-value pair inside starting tag
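Those three pieces are easy to see by parsing a tiny made-up document with Python's standard library:

```python
import xml.etree.ElementTree as ET

# A two-element document invented to show tags, elements, and attributes.
doc = '<book lang="en"><title>No Place to Hide</title></book>'

root = ET.fromstring(doc)
print(root.tag)                 # element name taken from the tag -> book
print(root.attrib["lang"])      # attribute: a name-value pair in the start tag -> en
print(root.find("title").text)  # the text between the start and end tags
```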
XML can:
  • simplify data interchange
  • enable smart code
  • enable smart searches

3 kinds of XML documents:

  1. Invalid docs = don't follow XML syntax rules, or break the rules defined in their DTD or schema
  2. Valid docs = follow both XML syntax rules and DTD rules
  3. Well-formed docs = follow XML syntax rules but have no DTD
Need a single root element
Elements can't overlap
End tags required
Elements are case sensitive
Attributes must have quoted values
XML declarations
Also: comments, processing instructions, entities

Use namespaces to specify tags

DTD = document type definitions, specifies basic structure of XML doc
-some elements must appear, must appear in a certain order
-elements must contain text
-use of certain symbols

DTD can:
  • define which attributes are required
  • define default values for attributes
  • list all of valid values for given attribute
XML schemas:
use XML syntax
support datatypes
are extensible
have more expressive power

Programming interfaces:
  1. Document Object Model
  2. Simple API for XML (SAX)
  3. JDOM
  4. Java API for XML Parsing (JAXP)
XML Standards = determined by the W3C
-XML schema: primer, doc structures, data types
-XSL, XSLT, XPath = formatting standards
-XLink and XPointer = linking and referencing standards

Web services: SOAP, WSDL, UDDI

This was a good overview of XML. I have also learned about XML in other classes, and I can see how it would be very useful and could lead to the goal of the semantic web. I can also see how it could be complicated, though, and that standardization still needs to be clarified. There also seems to be a need to get all organizations from all over the world to agree on these standards, and that can be a difficult compromise to reach.


A survey of XML standards: Part 1

Core XML technologies that are standards

XML
XML 1.0 (2nd ed.) = builds on Unicode
XML 1.1 = first revision
-Recommended intros/tutorials
-References

Catalogs
XML Catalogs = governed by RFC 2396: Uniform Resource Identifiers, RFC 2141: Uniform Resource Names
-entity
-entity catalog
-system identifiers
-URIs
-URNs
-public identifiers
OASIS Open Catalog
-Recommended intros/tutorials

XML Namespaces
Namespaces in XML 1.0
-XHTML
Namespaces in XML 1.1
-Resource Directory Description Language (RDDL)
-RDF
-TAG
-XLink
-Recommended intros/tutorials
-References

XML Base
XML Base
-Recommended intros/tutorials

XInclude
XML Inclusions (XInclude) 1.0
-Recommended intros/tutorials

XML Infoset
XML Information Set
-information items
-Recommended intros/tutorials

Canonical XML (c14n)
Canonical XML Version 1.0
-Exclusive XML Canonicalization Version 1.0

XPath
XML Path Language (XPath) 1.0
-XSLT
-W3C XML schema
-Recommended intros/tutorials

XPointer
XPointer Framework
-xpointer() scheme
-element() scheme
-xmlns() scheme
-FIXptr
-Recommended intros/tutorials

XLink
XML Linking Language (XLink) 1.0
-HLink
-simple links
-extended links
-linkbases
-Recommended intros/tutorials
-References

RELAX NG
RELAX NG
-XML schema
RELAX NG Compact Syntax
-Document Schema Definition Languages (DSDL)
-Recommended intros/tutorials
-References

W3C XML schema
XML Schema Part 1: Structures
XML Schema Part 2: Datatypes
-Recommended intros/tutorials
-References

Schematron
Schematron Assertion Language 1.5
-Recommended intros/tutorials
-References

Standards made by:
  • W3C
  • International Organization for Standardization (ISO)
  • Organization for the Advancement of Structured Information Standards (OASIS)
  • Internet Engineering Taskforce (IETF)
  • XML community
This article has a lot of information on XML standards, and I think it would be very useful if I needed to get more in-depth with XML and/or its standards. The tutorials and references seem very useful, but I would need to go back to the page to get the names and links of each one. This would be a good reference for the future.


XML Schema Tutorial (w3schools.com)

XML schema = describes structure of XML document
= is XML-based alternative to DTD
= also XML Schema Definition (XSD)

Need to know:
  • HTML/XHTML
  • XML and XML Namespaces
  • Basic understanding of DTD

XML Schema defines:

  1. elements that can appear in a document
  2. attributes that can appear in a document
  3. which elements are child elements
  4. order of child elements
  5. number of child elements
  6. whether an element is empty or can include text
  7. data types for elements and attributes
  8. default and fixed values for elements and attributes
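Because a schema is itself an XML document, you can read it with an ordinary XML parser. This toy XSD is invented, but the xs namespace URI is the real W3C one:

```python
import xml.etree.ElementTree as ET

XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

ns = {"xs": "http://www.w3.org/2001/XMLSchema"}
elements = [e.get("name") for e in ET.fromstring(XSD).findall(".//xs:element", ns)]
print(elements)   # -> ['note', 'to', 'body']
```

Here note is the complex type (it contains child elements) while to and body are simple types, matching the distinction the tutorial draws later.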

XML Schema is W3C recommendation

If data types supported, it is easier to:

  • describe allowable document content
  • validate the correctness of data
  • work with data from a database
  • define data facets (restrictions on data)
  • define data patterns (data formats)
  • convert data between different data types

XML schemas use XML syntax because: don't need to learn new language, can use XML editor and parser, can manipulate with XML DOM, can transform with XSLT

XML schemas secure data communication
-are extensible
-well-formed is not enough

Well-formed:
  • it must begin with the XML declaration
  • it must have one unique root element
  • start-tags must have matching end-tags
  • elements are case sensitive
  • all elements must be closed
  • all elements must be properly nested
  • all attribute values must be quoted
  • entities must be used for special characters
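Well-formedness is exactly what a plain XML parser checks, so the rules above can be demonstrated with a short sketch (the sample documents are invented):

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if the parser accepts the syntax; no DTD or schema is involved."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<a><b>ok</b></a>"))    # properly nested -> True
print(is_well_formed("<a><b>bad</a></b>"))   # overlapping elements -> False
```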

XML documents can have reference to DTD or XML Schema

Note element is complex type, other elements are simple types

<schema> element is root of every XML schema
-may contain some attributes
-doc can reference XML schema
-can specify default namespace
-can use schemaLocation attribute
XSD simple elements, attributes, and restrictions/facets
-simple elements contain only text
XSD complex types = contains other elements/attributes
-empty elements have no content
-can contain only elements
-complex text only
-can be mixed text and other
-order, occurrence, and group indicators
-any or anyAttribute
-substitution

Data Types
  1. string
  2. date
  3. numeric
  4. misc.
  • Schema references
I really like these w3schools tutorials, and this one gave me a lot of good information about XML schemas. I would be able to go back and get more in-depth information about the topic, but these notes will help me remember what is covered in the tutorial and some basic definitions. I think XML schemas could be very useful; I still don't totally understand them, but they would be a good topic of investigation for the future.

Monday, March 5, 2012

Week 9 Reading Notes

HTML5 Tutorial - w3schools

HTML5 = new standard for HTML, cooperation between the World Wide Web Consortium (W3C) and the Web Hypertext Application Technology Working Group (WHATWG)

Rules for HTML5:
  • New features should be based on HTML, CSS, DOM, and JavaScript
  • Reduce the need for external plugins (like Flash)
  • Better error handling
  • More markup to replace scripting
  • HTML5 should be device independent
  • The development process should be visible to the public
Simplified doctype and basic structure: <!DOCTYPE html>, <html>, <head>, <body>

New features:

  • The canvas element for 2D drawing
  • The video and audio elements for media playback (also source, embed, track)
  • Support for local storage
  • New content-specific elements, like article, footer, header, nav, section (and more)
  • New form controls, like calendar, date, time, email, url, search (datalist, keygen, output)
  • Removed: acronym, applet, basefont, big, center, dir, font, frame, frameset, noframes, strike, tt, u
Defines a new element which specifies a standard way to embed a video/movie on a web page: the video element
-Certain browsers support
-Also has methods, properties, and events
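A minimal example of the video element in the w3schools style (the movie.mp4 and movie.ogg file names are placeholders):

```html
<video width="320" height="240" controls>
  <source src="movie.mp4" type="video/mp4">
  <source src="movie.ogg" type="video/ogg">
  Your browser does not support the video element.
</video>
```

Listing multiple source elements lets the browser pick the first format it supports; the fallback text shows only in browsers without video support.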

Defines a new element which specifies a standard way to embed an audio file on a web page: the audio element
-Certain browsers support
-The controls attribute adds audio controls, like play, pause, and volume

Drag and drop is part of the standard, and any element can be draggable

Canvas element used to draw graphics, on the fly, on web page (usually JavaScript)
-only a container for graphics
-several methods for drawing paths, boxes, circles, characters, and adding images
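A small sketch of the canvas-as-container idea: the element itself draws nothing, the script does (the id and dimensions here are arbitrary):

```html
<canvas id="myCanvas" width="200" height="100"></canvas>
<script>
  // the canvas is only a container; JavaScript does the drawing
  var canvas = document.getElementById("myCanvas");
  var ctx = canvas.getContext("2d");
  ctx.fillStyle = "#FF0000";
  ctx.fillRect(10, 10, 150, 75); // draws a red rectangle
</script>
```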

SVG=
  • Stands for Scalable Vector Graphics
  • Used to define vector-based graphics for the Web
  • Defines the graphics in XML format
  • Graphics do NOT lose any quality if they are zoomed or resized
  • Every element and every attribute in SVG files can be animated
  • W3C recommendation
SVG advantages:
  • Images can be created and edited with any text editor
  • Images can be searched, indexed, scripted, and compressed
  • Images are scalable
  • Images can be printed with high quality at any resolution
  • Images are zoomable (can be zoomed without degradation)
Geolocation API is used to get the geographical position of a user
-position not available unless user approves it
-getCurrentPosition

Web pages can store data locally within the user's browser
-stored in key/value pairs, and a web page can only access data stored by itself
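A minimal sketch of the key/value idea using the Web Storage API ("lastname" and "Smith" are made-up values):

```html
<script>
  // stored as a key/value pair, visible only to pages from this same site
  localStorage.setItem("lastname", "Smith");
  document.write("Stored name: " + localStorage.getItem("lastname"));
</script>
```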

Application cache = a web application is cached, and accessible without an internet connection
Advantages:
  1. Offline browsing - users can use the application when they're offline
  2. Speed - cached resources load faster
  3. Reduced server load - the browser will only download updated/changed resources from the server
Web Worker = a JavaScript that runs in the background, independently of other scripts (without one, a page stays unresponsive until a script finishes)
Server-sent Event = web page automatically gets updates from a server (onopen, onmessage, onerror)

HTML5 Forms, Reference, Tags

It's very interesting to see the updates to HTML in HTML5. It really seems like this new standard takes into account the way that the internet works today and will create more flexibility for HTML developers in the present and future. It does seem overwhelming since I don't feel like I have a grasp of the previous standard yet, but hopefully I can learn what I need to about HTML5.



HTML5 - Wikipedia page

HTML5 = language for structuring and presenting content, originally proposed by Opera Software
=Fifth revision, still in development in March 2012
=response to the observation that the HTML and XHTML in common use on the World Wide Web are a mixture of features introduced by various specifications

Many new features, like video, audio, canvas elements
Designed for multimedia graphical content

APIs and DOM are fundamental part
No longer based on SGML

New APIs:
  • canvas element for immediate mode 2D drawing. See Canvas 2D API Specification 1.0 specification
  • Timed media playback
  • Offline Web Applications
  • Document editing
  • Drag and drop
  • Cross document messaging
  • Browser history management
  • MIME type and protocol handler registration
  • Microdata
  • Web Storage, a key-value pair storage framework that provides behaviour similar to Cookies but with larger storage capacity and improved API
XHTML5 = XML serialization of HTML5

HTML5 = flexible in handling incorrect syntax, new error handling

Differences between old and new:
  1. New parsing rules
  2. Inline SVG and MathML
  3. Many new elements
  4. New types of form controls
  5. New attributes
  6. Global attributes
  7. Deprecated elements dropped
This Wikipedia article gave a good background on what HTML5 is, although it did repeat some of the previous article from w3schools. Some of the terms are more complicated than I am able to understand, but the differences that exist between the previous standard and this new one are clear and fairly easy to understand. It's interesting to learn about the future of HTML as it's being developed.



XHTML - w3schools

XHTML=
  • stands for EXtensible HyperText Markup Language
  • almost identical to HTML 4.01
  • stricter and cleaner version of HTML
  • HTML defined as an XML application
  • W3C Recommendation of January 2000
  • supported by all major browsers
XML = markup language where documents must be marked up correctly and "well-formed"

Important differences from HTML:
  1. XHTML elements must be properly nested
  2. XHTML elements must always be closed
  3. XHTML elements must be in lowercase
  4. XHTML documents must have one root element
More syntax rules:
  • Attribute names must be in lower case
  • Attribute values must be quoted
  • Attribute minimization is forbidden
  • The XHTML DTD defines mandatory elements
!DOCTYPE is mandatory, not an XHTML tag; instruction to the web browser about what version of the markup language the page is written in
-then head and body

This article gave me a good introduction to XHTML, although now I feel a bit overwhelmed with all the different markup languages that we've covered in our readings. I know that they all have differences that set them apart from the others, but I think that for someone who doesn't have experience with them, they all start to look the same. I think that maybe if I started using them I would have a better understanding of the differences between them. As I understand it, XHTML is like a combination of XML and HTML.

Thursday, February 23, 2012

Week 8 Reading Notes

CSS Tutorial - w3schools.com


CSS = Cascading Style Sheets
CSS: defines how to display HTML elements (HTML not meant for formatting)

Two parts of CSS "rule" - a selector and one or more declarations



  • h1 {color:blue; font-size:12px;}

  • h1 = selector


  • color:blue = declaration


  • color = property


  • blue = value


  • font-size:12px = declaration


  • font-size = property


  • 12px = value

ID selector: specifies style for single unique element, defined by #


Class selector: specifies style for group of elements, defined by .
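For example (the id and class names here are made up):

```css
/* id selector: styles the single element with id="intro" */
#intro { text-align: center; }

/* class selector: styles every element with class="highlight" */
.highlight { background-color: yellow; }
```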


Ways to insert CSS:
  1. External style sheet = ideal when style applied to many pages
  2. Internal style sheet = ideal when single document has unique style
  3. Inline style = least useful, mixes content with presentation
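The three insertion methods side by side (mystyle.css is a placeholder file name):

```html
<head>
  <!-- 1. external style sheet -->
  <link rel="stylesheet" type="text/css" href="mystyle.css">
  <!-- 2. internal style sheet -->
  <style>
    body { background-color: #b0c4de; }
  </style>
</head>
<body>
  <!-- 3. inline style: mixes content with presentation -->
  <p style="color: red;">An inline-styled paragraph</p>
</body>
```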

Multiple styles = cascade into one (ordered by inline style, internal style sheet, external style sheet, browser default)



  • Background color = body {background-color:#b0c4de;}
  • Background image = body {background-image:url('paper.gif');}

Color property specified by:
a HEX value - like "#ff0000"
an RGB value - like "rgb(255,0,0)"
a color name - like "red"


Font properties: serif vs. sans-serif, font families (generic family or font family), font style (normal, italic, oblique), font size (absolute or relative)


Four link states:
  1. a:link - a normal, unvisited link
  2. a:visited - a link the user has visited
  3. a:hover - a link when the user mouses over it
  4. a:active - a link the moment it is clicked
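All four states in CSS (the colors are arbitrary; note that a:hover must come after a:link and a:visited, and a:active after a:hover, for the rules to take effect):

```css
a:link    { color: blue; }    /* unvisited */
a:visited { color: purple; }  /* visited */
a:hover   { color: red; }     /* mouse over */
a:active  { color: orange; }  /* being clicked */
```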

Can also style links by text decoration and background color


List options: ordered vs. unordered, shapes or image as line item marker


Table options: table borders, collapse borders, width and height, text alignment, table padding, table color


CSS Box Model = box that wraps around HTML elements, and it consists of: margins, borders, padding, and the actual content
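A sketch of the box model in numbers (the values are arbitrary):

```css
div {
  width: 200px;   /* the actual content area */
  padding: 10px;  /* space between content and border */
  border: 5px solid gray;
  margin: 20px;   /* space outside the border */
}
/* total horizontal space = 200 + 2*10 + 2*5 + 2*20 = 270px */
```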


Advanced CSS options: grouping/nesting, dimension, display, positioning, floating, align, pseudo-class, pseudo-element, navigation bar, image gallery, image opacity, image sprites, media types, attribute selectors


Next: Learn JavaScript (dynamic instead of static)


This tutorial was very helpful and will be good to refer back to, the same way that the HTML tutorial from this site is a helpful introduction. A lot of the different options seem complex, but if I were to actually create a product with them, I would probably be less overwhelmed because I would be picking and choosing what elements I want to see in my final product.




CSS Tutorial: Starting with HTML + CSS (W3)


Step One: Write HTML
  • In Notepad (Windows) or TextEdit (Mac)
  • Paste or write HTML
  • Save
  • Open in browser

Step Two: Add some colors
  • Start with style sheet embedded in HTML (later create CSS and HTML in separate files)
  • Add colors by name or hexadecimal code

Step Three: Add fonts
  • Set font for body and heading
  • Try different font names if some don't work

Step Four: Add navigation bar
  • Menu is a ul list at top
  • Use "padding-left" to move body text
  • Adjust position of menu and body text



Step Five: Styling links
  • Add background to items
  • Add color to links
  • Specify colors for links visited and not visited

Step Six: Add horizontal line
  • Add horizontal rule to separate text from signature at bottom
  • Use "border-top" to add dotted line

Step Seven: Put style sheet in separate file
  • Separate files so several pages can all point to one style file
  • Create new empty plain text file
  • Replace the style element with something like <link rel="stylesheet" href="mystyle.css">



This is a great step-by-step to creating a site with HTML and CSS. I think I will be referring to this for future labs and assignments for this class, as well as any time I want to build a website in the future. I like that the page even includes instructions on how to save the files and what applications to use to write the code in.







Chapter 2: CSS

HTML: mark up document structure of elements
CSS: gives creator control over style of elements

Ways to create CSS: normal text editor, dedicated tool

rule = statement about one stylistic aspect of one or more elements (incl. selector and declaration)
declaration = part of the rule that sets forth what the effect will be (incl. property and value)

CSS formally described in CSS1 and CSS2 from W3C




Ways to "glue" style sheet to document:






  1. Apply the basic, document-wide style sheet for the document by using the style element


  2. Apply a style sheet to an individual element using the style attribute (inserted inside HTML)


  3. Link an external style sheet to the document using the link element


  4. Import a style sheet using the CSS @import notation



Must use CSS-enhanced browser - however each browser may display differently






  • Tree structures = elements have parents and children
  • Inheritance = property values are transferred to descendants (can override)
  • (Some elements don't inherit, like the background property)

Common tasks with CSS: fonts, margins, links

Cascading = style sheets come in a series; designer's style sheet has precedence, then user's, then browser's default




This article repeated some of what the previous articles/tutorials explained about CSS, but this resource did offer more detailed explanations. This will be good to refer back to if I have trouble understanding the general theories behind the different elements of CSS and also how CSS operates as a whole. This resource will also be helpful if while creating my website, the product does not look the way that I meant it to. I can reread some of these explanations about CSS and determine what I did wrong.

Friday, February 17, 2012

Week 7 Reading Notes

HTML Tutorial - w3schools.com

HTML = Hypertext Markup Language, uses markup tags, governed by World Wide Web Consortium (W3C)

Tags: surrounded by angle brackets, come in pairs (start/opening tag and end/closing tag), e.g. <p> and </p>

HTML documents = web pages
  • Headings: h1 to h6 tags
  • Paragraphs: p
  • Images: img
  • Line break: br [empty, not in a pair]
  • Links: a href = "" [attribute]
  • Bold: b
  • Italics: i
  • Table rows: tr
  • Table data: td
  • Unordered list (bullets): ul
  • Ordered list: ol
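The tags above combine into a page like this (the text and URL are placeholders):

```html
<!DOCTYPE html>
<html>
  <head><title>Example</title></head>
  <body>
    <h1>A heading</h1>
    <p>A paragraph with a <a href="http://example.com">link</a>,
       <b>bold</b> and <i>italic</i> text.<br>
       And a line break above.</p>
    <ul>
      <li>An unordered list item</li>
    </ul>
  </body>
</html>
```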
Tags can be nested, not case sensitive

Do not use deprecated tags (like <font>)

Style HTML with Cascading Style Sheets (CSS)

Colors defined with hexadecimal notation (HEX) or 147 color names

I'd used some html in the past to create simple webpages when I was younger, but it's been a while, so this was a good refresher. It's also interesting to hear about the evolution of HTML and how it is developed and governed.


Webmonkey HTML Cheatsheet Guide

Basic tags: html, head, body

Header tags: title

Text tags: pre, h1, h6, tt, cite, em, strong

Formatting: p, p align, br, blockquote, dl, dt, dd, ol, li, ul, div align

Forms: form, option, select multiple name, select name, textarea name, input type: "checkbox/radio/submit/image"

Graphical elements: img src, hr, hr size, hr width, hr noshade

Links: a href, a href: mailto, a name

This looks like a very helpful cheatsheet that boils down the basic tags you need to know to write HTML. I'm assuming there will be a future assignment that involves creating an HTML document, so I will be sure to remember this as a useful resource.


Beyond HTML: Developing and Re-imagining Library Web Guides in a Content Management System

Content management system designed to manage 30 web-based research guides - new system developed with MySQL and ASP

Georgia State University (GSU) Library

Before: FrontPage maintained by single librarian, with 15 liaison librarians developing web guides, later more liaisons added
Result: Inconsistency, some with web design experience and some not
  • Content management (CM) = process of collecting, managing, and publishing content
CMS: content is disconnected from design and layout, content can be resource links or text or images or files, content can be reused
  • Control
  • Customization and context
  • Complexity
Steps toward CMS environment: commercial option, open source option, in-house option

GSU Library CMS technology = MySQL database of resource tables, metadata tables, and personnel metadata tables, usable CMS research guide template for students and librarians
  • Move to CMS = success
  • Systems can be expanded to other departments (committee websites, intranet)
  • Some still need further training
  • Templates now, but in future users will use raw content
While I didn't understand some of the more technical aspects of this paper, I think I understand the basic concept that GSU libraries developed their own CMS for various reasons. Their new system made things more consistent and of more benefit to users. If I understand correctly, this article is meant to show how to standardize the use of HTML among large groups of people, with the template set up for use by liaisons.

Saturday, February 11, 2012

Week 6 Reading Notes

"How Internet Infrastructure Works"

Infrastructure = interconnected networks

The Internet Society oversees policies and protocols
  • Home computer connected to internet service provider (ISP)
  • Work computer connected to local area network (LAN)
Internet = network of networks
  • Point of Presence (POP) = a place a company has for local users to access the network (phone number, line)
  • Network Access Points (NAPs) = points where networks connect to each other
ISPs agree to interconnect so all users can communicate with all others
  • Routers determine where to send info, make sure it gets to destination
  • Routers ensure info doesn't go where it's unnecessary (more efficient)
NSFNET (National Science Foundation) = first high-speed backbone (fiber optic trunk line), 1987

Internet Protocol (IP) =
  • computer language (dotted decimal for humans, binary for computer)
  • IP Address = four numbers (octets)
  • 2^32 (about 4.3 billion) possible addresses
  • separated into classes
  • two sections (Net and Host/Node)
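The dotted-decimal/binary relationship can be sketched with Python's standard library (192.168.1.1 is just a common private-network example address):

```python
import socket
import struct

# An IPv4 address is 32 bits, written for humans as four dotted-decimal octets
addr = "192.168.1.1"
packed = socket.inet_aton(addr)          # the address as 4 raw bytes
as_int = struct.unpack("!I", packed)[0]  # the same address as one 32-bit number
print(as_int)       # prints: 3232235777
print(bin(as_int))  # the binary form the computer works with

octets = [b for b in packed]
print(octets)       # prints: [192, 168, 1, 1]
print(2 ** 32)      # prints: 4294967296 possible addresses
```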
IP Domain Name System = text names map to IP addresses
Uniform Resource Locator (URL) = contains domain name, used by humans and translated by computer, each must be unique
DNS Server = looks for IP address, caches to be more efficient

Server = provides services to other machines (Web, email, FTP, etc.), static IP address, unlike home computer
Client = used to connect to services

Ports = where services are available

Hypertext Transfer Protocol (HTTP) = protocol for Web

I liked that this article had very accessible language and was easy to understand for a non-techie person. I also liked that it included examples to illustrate the machines and systems it was explaining.


"Dismantling Integrated Library Systems"

Integrated Library System (ILS) = once a useful tool for everyday library management, now out-of-date and unable to be extended

"Interoperability": sought after, but may be mostly a myth
  • Need to be able to distinguish between different integrated system products
  • Need to appeal to Internet-savvy users
  • Need federated searching capabilities, portals, metasearch tools, reference linking software, RFIDs, and digital asset management systems
Ex. of integrated systems: Voyager system and Oracle relational database management system (RDBMS), Millennium system and INNOPAC, Taos, Unicorn
  • Vendors sell new products and new technologies, but libraries want a system that can adapt
  • Better systems = higher costs and libraries don't spend much on ILS updates (even open source takes money for development and training)
  • Libraries turn to web-based or home-grown solutions - but those are not integrated
Future = integration, either maintain large systems and trust vendors OR dismantle and reintegrate

I can definitely see the desire and need to integrate all aspects of library management into one system. I can also see, however, that vendors might not be willing to work towards this goal if they can make more money in other ways. It seems that libraries will just have to make integration a priority, embrace new technologies, and perhaps spend more money to make it happen.


Sergey Brin and Larry Page on Google - TED

There were a few major points that I took away from this video. First, I thought that the model of the earth that showed Google searches being performed at a certain time, as well as the visual of one second of those search queries, really gave me a better idea of the scope of the internet around the world. Google is obviously one of the highest-trafficked sites online, and seeing how it is used and how everything is connected through using Google was kind of mind-boggling. Second, I'd heard before about Google's policy that 20% of the time their employees work on their own projects, and it seems like it's been really successful for them. I really like that policy and think more places should implement it. Finally, I was intrigued by the idea that Google is attempting to create a "smarter search" run by artificial intelligence. I don't know a lot about AI, but I do know that's it's very complicated, so this project could take a long time. I can't help but wonder, though, what an AI search will eventually be like.


A Few Thoughts on the Google Books Library Project

"Google's initiative will not make books obsolete; it will make the information in them more widely available."
  • Internet has created new expectation - we want to click and find info
  • Search engines are not perfect but we used them constantly
  • Need to be a professional to research at a library
  • If not online, doesn't exist
  • If only physical, then obsolete
  • Ideas are essential, not paper books
If books not online = don't exist, we need old books online so human knowledge doesn't include only from after digital revolution

This article is a very positive look at the goal of the Google Books Project. I agree with the author that, eventually, information that is only available in hardcopy is basically nonexistent for the majority of information seekers. Although this article does briefly mention the challenges that face this project, I do think it comes across as somewhat idealistic. I do think that this project needs to be done, but I don't know if, practically speaking, it is feasible in the near future.

Saturday, February 4, 2012

Week 5 Reading Notes

Wikipedia article - Local Area Network

Local area network (LAN) = computer network that connects in a limited area (home, school, lab), with higher rates of data transfer, small geographic area, no need for leased telecommunication lines
  • Common current technology: ethernet over twisted pair cabling or wi-fi
  • Developed in 1970s
  • Cabling: originally coaxial; now mostly twisted pair (shielded and unshielded), structured cabling, or fiber-optic cabling
  • Switched ethernet is most common Data Link Layer (at least one switch connected to internet)
  • Internet Protocol (TCP/IP) is common
  • Larger LANs use spanning tree protocol
  • Alternatives are metropolitan area network (MAN) or wide area network (WAN)

I had heard of people having "LAN parties" and figured that was a configuration of devices connected to each other, but now I see the appeal of it. If the data transfer rate is high and the participants can keep their activities contained to their group, then that does sound like a good idea.


Wikipedia article - Computer Network

Computer network = collection of hardware/computers connected by communication channels that share resources and information, defined by ability to send/receive data

Public switched telephone network (PSTN) is computer-controlled

Computer network properties:
  • facilitate communications
  • permit sharing of files/data/info
  • share network and computing resources
  • may be insecure
  • may interfere with other tech
  • may be difficult to set up

Wired technologies: twisted pair, coaxial cables, ITU-T G.hn, optical fiber cable

Wireless technologies: terrestrial microwave, communications satellites, cellular and PCS systems, wireless LANs, infrared communication, global area network (GAN)

  • Ethernet: connectionless protocols used in LANs
  • Internet Protocol Suite: TCP/IP - defines the addressing, identification, and routing specification
  • SONET/SDH: "standardized multiplexing protocols that transfer multiple digital bit streams over optical fiber using lasers"
  • Asynchronous Transfer Mode: switching technique that encodes data into small, fixed-sized cells
  • Network programming: computer programs that communicate across a network
  • Personal area network (PAN): network close to one person
  • Local area network (LAN): network in a limited geographic area
  • Home network: residential LAN
  • Storage area network (SAN): provides access to consolidated, block level data storage
  • Campus network: made of interconnection of LANs in a limited geographic area
  • Backbone network: provides path for exchange of info between LANs or subnetworks
  • Metropolitan area network (MAN): spans a city or large campus
  • Wide area network (WAN): covers a large geographic area
  • Enterprise private network: interconnects various company sites
  • Virtual private network (VPN): "some of links between nodes are carried by open connections or virtual circuits in some larger network"
  • Internetwork: connection of multiple networks with routers

Common layouts: bus network, star network, ring network, mesh network, fully connected network

Basic hardware components:

  • Network interface cards
  • Repeaters and hubs
  • Bridges
  • Switches
  • Routers
  • Firewalls

Network performance = service quality of a telecommunications product from the customer's perspective

Network security = policies to prevent and monitor unauthorized behaviors relating to network

Network resilience = "ability to provide and maintain acceptable level of service in the face of faults and challenges to normal operation"

I think I understand the basic idea of computer networks, but there's still a lot I don't understand. I know there are different types of networks for different purposes, but I don't think I understand exactly how the networks work or how to set one up.


Management of RFID in Libraries

RFID = radio frequency identification; a computer chip and antenna printed on paper, read via an electromagnetic field (like a barcode, but without needing line of sight)

Many uses

Benefits:
  • amount of info carried
  • range in which it can be read
  • frequency of radio waves
  • small size
  • low cost
  • can read whole shelf of books without picking up individually
  • checkout can read stack of books at once
  • increase efficiency of circulation, inventory
  • facilitates security, can read if books checked out
  • performs many different functions
  • greater potential to gather statistics
  • possibility to sort items
  • more efficiency with self-checkout

Downsides:

  • privacy issues
  • not highly secure, can be blocked by certain materials, can be removed
  • cannot help with reshelving part of process
  • less human interaction with self-checkout
  • may not work with less sturdy or thinner or odd-shaped items
  • technology being developed not geared towards library use

Need to consider ROI, user satisfaction

At the library I volunteered at in Minnesota, RFID tags were implemented, and I saw the library grow accustomed to using them. They were good for sorting items as they were returned to the library and for checking in multiple books at a time. I think they might also have helped with security measures, although the alarms seemed to be set off frequently. I didn't notice any problem checking out thin children's books at all.

Thursday, February 2, 2012

Week 4 Lab

TASK 1

Journal impact factor for "ANNU REV INFORM SCI" for the year 2007:
1.963


Screenshot:


SQL command:
SELECT jcr_year, j_abbr, j_if FROM isi_jcr_report_isls
WHERE j_abbr = 'ANNU REV INFORM SCI' AND jcr_year = 2007;
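Since both filters are plain row conditions (no aggregates), they belong together in WHERE rather than HAVING. A sketch against a toy in-memory SQLite copy of the table, with column names taken from the lab (the 2008 and J DOC rows here are made up for illustration; only the 2007 value comes from the lab):

```python
import sqlite3

# Toy stand-in for the course's isi_jcr_report_isls table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE isi_jcr_report_isls (jcr_year INT, j_abbr TEXT, j_if REAL)")
con.executemany("INSERT INTO isi_jcr_report_isls VALUES (?, ?, ?)", [
    (2007, "ANNU REV INFORM SCI", 1.963),  # value from the lab
    (2008, "ANNU REV INFORM SCI", 2.5),    # made-up row
    (2007, "J DOC", 1.1),                  # made-up row
])

# Both filters are plain row conditions, so they live together in WHERE
row = con.execute(
    "SELECT jcr_year, j_abbr, j_if FROM isi_jcr_report_isls "
    "WHERE j_abbr = 'ANNU REV INFORM SCI' AND jcr_year = 2007"
).fetchone()
print(row)  # prints: (2007, 'ANNU REV INFORM SCI', 1.963)
```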



TASK 2

Journals with impact factor greater than 1 in year 2008:
  1. ANNU REV INFORM SCI
  2. GOV INFORM Q
  3. INFORM MANAGE-AMSTER
  4. INFORM PROCESS MANAG
  5. INFORM SOC
  6. INFORM SYST J
  7. INFORM SYST RES
  8. INT J GEOGR INF SCI
  9. INT J INFORM MANAGE
  10. J AM MED INFORM ASSN
  11. J AM SOC INF SCI TEC
  12. J ASSOC INF SYST
  13. J COMPUT-MEDIAT COMM
  14. J DOC
  15. J GLOB INF MANAG
  16. J HEALTH COMMUN
  17. J INF SCI
  18. J INF TECHNOL
  19. J INFORMETR
  20. J MANAGE INFORM SYST
  21. J MED LIBR ASSOC
  22. LIBR INFORM SCI RES
  23. MIS QUART
  24. ONLINE INFORM REV
  25. PORTAL-LIBR ACAD
  26. SCIENTOMETRICS
  27. TELECOMMUN POLICY

Screenshot:



SQL command:
SELECT jcr_year, j_abbr, j_if FROM isi_jcr_report_isls
WHERE jcr_year = 2008 AND j_if > 1;

Wednesday, January 25, 2012

Week 4 Reading Notes

Wikipedia article - Database
  • Database = "organized collection of data for one or more purposes," usually digital
  • = "organized to model relevant aspects of reality"
  • = data and data structures, NOT database management system (DBMS)
  • DBMS = complex software system, meets usage requirements
  • DBMSs: Oracle, IBM DB2, Microsoft SQL Server, PostgreSQL, MySQL, SQLite
  • DBMS standards: SQL, ODBC
  • Database contents can be: bibliographic, document-text, statistical, or multimedia objects
  • Database application areas include: accounting, music compositions, movies, banking, manufacturing, and insurance
  • History: 1st gen. = navigational (hierarchical and Codasyl models)
  • 2nd gen. = relational (in SQL language) and entity-relationship model
  • 3rd gen. = post-relational or NoSQL (Object database and XML database)
  • People involved: DBMS developers, application developers and database administrators, and application's end-users
Database types:
  • Active = "event-driven architecture which can respond to conditions both inside and outside the database"
  • Cloud = database and most DBMS are "in the cloud"
  • Data warehouse = archive data from operational databases and outside sources (retrieving/analyzing/mining data, transforming/loading/managing data)
  • Distributed = "allows distinct DBMS instances to cooperate"
  • Document-oriented = stores, manages, edits, and retrieves documents
  • Embedded = tightly integrated with application software
  • End-user = developed by end-users (documents, spreadsheets, presentations)
  • Federated (multi-database) = integrated database comprised of several distinct databases
  • Graph = NoSQL, uses graph structures to represent and store info.
  • Hypermedia = World Wide Web acts as a database
  • In-memory = resides primarily in main memory
  • Knowledge base = specifically for knowledge management
  • Operational = stores data about operations of organization
  • Parallel = improves performance through parallelization
  • (Also: Real-time, Spatial, Temporal, and Unstructured-data database)
  • Functional requirements: defining data structure, manipulating data, protecting data, describing processes
  • Operational requirements: availability, performance, isolation between users, recovery, backup, data independence
  • DBMS components: external interfaces, language engines, query optimizers, database engine, storage engine, transaction engine, DBMS management and operation component
There was a lot of this wikipedia article that I didn't understand, but I tried to take notes on the parts that seemed important to me and made some sort of sense. Even if I don't understand the specific technical details, I did gain a better understand of exactly how we define a database and the different things they are used for. I also now understand the difference between a database and a DBMS.


Wikipedia article - Entity-relationship model
  • ER model = "abstract and conceptual representation of data"
  • Conceptual schema or semantic data model, top-down, creates ER diagrams
  • Model defines interaction between entities, relationships, and attributes
  • Relationships: expressed as a single verb implying direction or as a noun
  • Roles: define who does what in relationship
  • Cardinalities: constraints on how many times an entity can participate in a relationship (one-to-one, one-to-many, many-to-many)
  • Semantic modeling of ER "adopts the more natural view that the real world consists of entities and relationships"
Diagramming conventions
  • Rectangles = entities
  • Diamonds = relationships
  • Line = connects entities to the relationships they participate in
  • Double line = participation constraint, totality, or surjectivity (all entities in at least one relationship in set)
  • Arrow = key constraint, injectivity (each entity in at most one relationship in set)
  • Thick line = bijectivity (each entity in exactly one relationship in set)
  • Underlined name of attribute = attribute is key (two different entities or relationships always have different values for attribute)
Alternative = Crow's Foot notation

Limitations of ER model:
  • only a relational structure, assumes info. can be represented in relations
  • cannot handle changes to information easily
  • difficulty in "integrating pre-existing information sources that already define their own data representations in detail"

I was confused about cardinalities at first, since the article didn't include a clear definition; as I understand it, they constrain how many entities can take part in a relationship (one-to-one, one-to-many, many-to-many). I think I basically understand the ER model and would be able to point to the different components of a diagram. I don't know if I can connect this abstract representation with how the database functions, however.


3 Normal Forms Database Tutorial
  • Database normalization process = puts data in state that will make it usable to answer questions (can be used to keep track of a stack of invoices)
  • 3 normal forms:
  • NF1 = No repeating elements or groups of elements.
  • NF2 = No partial dependencies on a concatenated key.
  • NF3 = No dependencies on non-key attributes.
  • NF1: Atomicity (a row cannot contain repeating groups of similar data); each row needs a unique identifier (Primary Key)
  • Primary Key with two or more columns = concatenated primary key
  • NF2: "for a table that has a concatenated primary key, each column in the table that is not part of the primary key must depend upon the entire concatenated key for its existence"
  • If fails NF2, take out half of concatenated primary key and make own table
  • If make more concatenated keys, test for NF2 again
  • NF3: If column relies on non-key attribute, create foreign key (column that points to the primary key in another table)
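A sketch of the end state in SQLite, using a made-up customers/orders split in the spirit of the invoice example: the repeated customer details live once in their own table, and each order points back through a foreign key:

```python
import sqlite3

# Hypothetical normalized split: customer details in one table,
# orders referencing them via a foreign key
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name TEXT,
    city TEXT
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),  -- foreign key
    total REAL
);
""")
con.execute("INSERT INTO customers VALUES (1, 'Ada', 'London')")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(10, 1, 19.99), (11, 1, 5.00)])

# The customer's city is stored in one place; orders just hold the key
rows = con.execute("""
    SELECT o.order_id, c.name, c.city
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows)  # prints: [(10, 'Ada', 'London'), (11, 'Ada', 'London')]
```

If the city ever changes, it is updated in one row of customers instead of in every order, which is exactly the dependency NF3 removes.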
I have to say that I'm having trouble wrapping my head around this process of data normalization. I think that maybe going through it myself in a hands-on way would be a big help. Otherwise I only barely understand the basic steps of the process.