LIS 2600 Course Blog
Saturday, April 14, 2012
Week 14 Reading Notes
No Place to Hide website
"Where the data revolution meets the needs of national security, there is no place to hide."
Ways we can be tracked electronically:
- Credit card records
- Surveillance cameras
- WiFi
- Subway/MetroCards
- Satellite navigation systems in cars
- Card swipes on copiers/vending machines/ATMs
- Clocking in at work
- E-Z Passes at toll booths
- Internet browsing/shopping/email
- TiVo
- ID/face/iris/fingerprint scan to access building
- Phone calls
RFID = radio frequency identification
-getting cheaper and smaller, can hold more info
Monitoring by companies, law enforcement, or private investigators
Companies using RFID: car manufacturers, gas stations, Walmart, Defense Department, FDA, casinos, jails, schools
RFIDs can:
- increase efficiency
- fight credit card fraud, other security issues
- improve customer relationship management, marketing
Controversy surrounding RFIDs and the info they gather
Tagging could eventually extend to...everything?
-No more anonymous transactions
"Why worry if you have nothing to hide?"
"We have nothing to worry about, until they make a mistake."
Trading privacy for security
Verint surveillance systems
-including government wiretapping
Goal of some companies to get people used to surveillance
This chapter was really eye-opening and kind of scary to think about. I can understand the desire to record information about people, especially for reasons of security, but I also think that people do have a right to privacy. While to some degree it's true that if you have nothing to hide you have nothing to worry about, what happens when the surveillance recordings make a mistake? Or what if the government takes a turn for the Orwellian and cuts off any way for citizens to resist? That thought does scare me, that we put so much power into the hands of people and organizations that might not use it wisely and responsibly. There should still be a way to opt out - someone should be working on tech that will increase privacy, not decrease it.
Total "Terrorism" Information Awareness (TIA)
EPIC = Electronic Privacy Information Center
Data mining in federal agencies
Defense Advanced Research Projects Agency (DARPA) making tracking system TIA
-designed to give law enforcement private data without warrant
-captures "information signatures"
TIA = grand database that includes:
- financial records
- medical records
- communication records
- travel records
- intelligence data
Identifies and tracks individuals across multiple info sources
TIA = no longer being funded, agency shut down
-could still be similar government projects in the future
This TIA project sounds pretty creepy, and while I'm glad it's no longer being funded, I do agree that the government won't necessarily abandon the idea of recording all information on people if they think it will improve security. Like I said before, though, I do believe that people have a right to privacy. Giving the government too much ability to track its citizens could just lead to an abuse of power where the government has too much control.
MyTurn: Protecting Privacy Rights in Libraries
Laws protecting privacy of library records (in 40 states)
-can only be shared with judicial order or warrant
VT law says parents get library records of children under 16
Children can have various needs to keep info from parents
-child abuse
-drug abuse
-health questions parents won't answer
Police officers in a particular case tried to take computers without a warrant
-Brooke Bennett investigation
-librarians want to help but won't break legally-binding policy
Library supports:
- right to privacy
- right to open inquiry
- freedom of speech
- freedom to receive information
This is one of those issues that gets me so upset because so many people are so ignorant about the values of the library and the way they work. The woman who wrote the letter that this blog post is responding to thought that library records should be able to be seized by the police for any reason. The fact that the library is standing its ground on issues of privacy and confidentiality gives me hope after reading the first two articles this week. The library is still one place where a person can trust that his or her actions are private, are not being monitored, and will not be used against him/her.
Saturday, April 7, 2012
Week 13 Reading Notes
Content Nation: Surviving and Thriving as Social Media Changes Our Work, Our Lives, and Our Future
Social media: highly scalable and accessible communications tech, helps individuals to easily publish and influence others
-Similar: Web 2.0, user-generated content, social networking
- Scalable and accessible tech
- Individual people communicate with other groups of individuals
- Enables influence
Types of social media:
- Personal publishing - blogs - individuals tell stories to others
- Collaborative publishing - wikis - multiple people collaborate on common document for themselves and/or others
- Social-network publishing - Facebook and LinkedIn - people find other people
- Feedback and discussions - Amazon - share info and opinions on a topic with others
- Aggregation and filtering - YouTube and Flickr - aggregate collections of content from various sources
- Widgets and mashups - add value to social media by creating complementary content
- Personal markets and marketing - Craigslist and eBay - find people with goods and services, create market
Social media does not eliminate human nature, just gives a new way to express itself
Goal of social media = influence over others
Conflict can arise, however:
*"Order can come from people who collaborate to enforce mutually accepted standards of behavior."
-Each site has own standards
-Need to follow standards in order to influence opinions of others
Content:
- The "stuff"
- Requires an audience
- Value is contextual
- = "info and experiences in contexts that provide value to audiences"
- Comes from many different sources
- Social media makes distribution easier
- Exponentially more every day
Aggregation in social media: different from traditional model, distribution not a competitive barrier
-aggregation can now be highly focused (New Aggregation)
-content not indexed but can be reinvented
Brands, Affinity, Endorsements:
-value through marketable relationships as well as marketable content
-Ex. blogs that become popular, build reputation = influence over others
-affinity = more important because more options
Timing: part of context, different formats have different value
-long tail = consistent popularity in small groups
-long snout = popularity in some groups while still in development (social media)
Social Media Secrets
- Ability to scale efforts independently = important
- Understanding people > understanding technology
- Law of the campfire, not law of the jungle
- Valuable to create new contexts for content
- Not mass production, but mass contextualization
- Direct contact with others who value your insights
- Valuable to people who want to be ahead of other people
This article was a fascinating account of how to effectively use social media. I'd never thought of influencing others as the main goal of social media, but it does make sense. I think there are a lot of valuable insights here for someone who wants to publish through social media or for a corporation that wants to understand how to use it. Corporations, which are used to traditional models of publishing and advertising, need this information the most because social media has its own model that is in some ways radically different.
Using a Wiki to Manage a Library Instruction Program
Wiki can:
- create better info sharing
- facilitate collaboration in creation of resources
- efficiently divide workload
Many websites where you can set up wikis, invite by email
Wikis have been used by librarians to: "manage public services information, collaborate on and keep track of reference questions, and assess databases"
-stores info in central location
Used in library instruction program at ETSU
-teaches how to use website, find journal articles, critical thinking about info sources, evaluating websites
Learned more about:
- specifics of the classes' needs
- unforeseen directions of assignment
- preferences of professor
- housekeeping issues
- broken web links
Second use: centralized resource collaboration tool
-information handouts, subject resource guides, how-to instructions
This is good background on how wikis could be useful to librarians, but I think that their uses are probably endless and there are more than are covered in this article. Mostly I use wikis for working on group projects when it is difficult to keep getting together to work. Really any project where a group of people has to come up with one final deliverable is a good time to use a wiki. I think people will be using them more and more, in and out of libraries. The other benefit is that they require almost no training at all to use.
Creating the Academic Library Folksonomy
Social tagging enables: quickly find disparate info, store bookmarks and access them anywhere, see what others are reading, find unexpected resources
Social tagging = create bookmarks/tags for websites and save them online, including subject keywords (such as del.icio.us)
Folksonomy = taxonomy created by ordinary folks, users create own controlled vocab
Library: has catalog, but can't catalog the internet
- use tagging to guide users to helpful resources online
- can tag articles in licensed databases, too
- tagging can bring to light resources that are harder to search for
-Stanford uses content management software Drupal
Connotea and CiteULike intended for academics, pull bib info
Subject specialists can begin social tagging process
-can use Librarians' Internet Index or C&RL News Internet Resources columns
Risks of social tagging:
- Spagging or spam tagging
- Users with bad intentions tagging inappropriate content
- No specific controlled vocabulary among users
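The tagging mechanics described above can be sketched as a tiny data structure. This is only a hedged illustration, not any particular service's implementation; the class name, users, and URLs are made up, and real systems like del.icio.us add much more (sharing, spam defenses, tag suggestions):

```python
from collections import defaultdict

class Folksonomy:
    """Toy model of social tagging: users attach free-form tags to URLs."""

    def __init__(self):
        # tag -> set of URLs carrying that tag
        self.by_tag = defaultdict(set)
        # url -> tag -> number of users who applied it (a popularity signal)
        self.tag_counts = defaultdict(lambda: defaultdict(int))

    def tag(self, user, url, *tags):
        for t in tags:
            t = t.strip().lower()  # light normalization; no controlled vocabulary
            self.by_tag[t].add(url)
            self.tag_counts[url][t] += 1

    def find(self, *tags):
        """URLs tagged with every given tag (simple set intersection)."""
        sets = [self.by_tag[t.lower()] for t in tags]
        return set.intersection(*sets) if sets else set()

f = Folksonomy()
f.tag("librarian1", "http://example.org/guide", "reference", "digital-libraries")
f.tag("student2", "http://example.org/guide", "Reference")
f.tag("student2", "http://example.org/wiki", "collaboration")
print(f.find("reference"))  # {'http://example.org/guide'}
```

Note how the lack of a controlled vocabulary is handled only by crude lowercasing here; in practice "e-books" vs. "ebooks" would still diverge, which is exactly the risk listed above.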
This was a good, brief introduction to the ways that libraries can use social tagging at universities. I think that this could be very useful to academic libraries, and I'm curious if, since it was written, there are more universities that utilize something like this. Obviously social tagging has its downsides, but it works particularly well with internet sites, which librarians can't feasibly catalog anyway. Just as there are a multitude of ways that libraries can use wikis in their program, so there are also many ways that tagging could be useful, especially if you creatively set up the tagging system, such as giving subject specialists administrative control.
Jimmy Wales on the Birth of Wikipedia - TED
Radical encyclopedias: Britannica vs. Wikipedia
Goal = give everyone in the world access to the sum of human knowledge (free encyclopedia)
- Staffed by volunteers
- Wiki software
- Freely licensed
- Funded by public
- Many languages, only 1/3 traffic to English
- One employee
- Cost: $5000/month
More accurate than more traditional encyclopedias
Neutral point-of-view
-vandalism is bigger problem than controversy
Policies and software maintain quality
-despite allowing edits by anonymous users (minority of edits)
-Request for Deletion page
-volunteer administrators
Administration: part consensus, democracy, aristocracy, monarchy (Jimmy Wales can change the rules), NOT anarchists
I already knew a bit about Wikipedia, but it was interesting to hear about it from the founder himself. Since this video is from 2005, I wonder how much of the information has changed, besides things like the number of pages or pageviews or things like that. I think that, in general, the world is better off with a source like Wikipedia than if it didn't exist, but I think that people need to understand the best way to use it, when to use it, and what parts are the most trustworthy. I think this is true of any information source, though.
Thursday, March 29, 2012
Week 12 Reading Notes
Web Search Engines: Part 1
Indexing Web done by Microsoft, Google, Yahoo
- Reject low-value automated content
- Ignore Web-accessible data
- No access to restricted content
Large search engines have many data centers around the world (clusters of commodity PCs, servers)
- crawling
- indexing
- query processing
- snippet generation
- link-graph computations
- result caching
- insertion of advertising content
400 terabytes of info to crawl
Crawling mechanism = queue of URLs, beginning with "seed"
Issues in crawling:
- Speed - use of internal parallelism
- Politeness - not bombing requests on one server
- Excluded content - communicate with robots.txt file
- Duplicate content - identify identical content with different URLs
- Continuous crawling - priority queue based on what is current and changing
- Spam rejection - manual AND automated analysis
Crawlers are highly complex and need to adapt
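The crawling mechanism in the notes above (a queue seeded with starting URLs, plus duplicate detection and excluded content) can be sketched in miniature. This is a toy over an in-memory link graph, not a real crawler; the URLs and the DISALLOWED set are invented for illustration, and real crawlers add parallelism, politeness delays, and continuous re-crawling:

```python
from collections import deque

# Hypothetical in-memory "web": URL -> list of outgoing links
WEB = {
    "http://a.example/": ["http://a.example/page1", "http://b.example/"],
    "http://a.example/page1": ["http://a.example/", "http://a.example/private"],
    "http://b.example/": ["http://a.example/page1"],
    "http://a.example/private": [],
}
# robots.txt-style exclusions (assumed, for illustration)
DISALLOWED = {"http://a.example/private"}

def crawl(seeds):
    frontier = deque(seeds)   # queue of URLs, beginning with the "seed" set
    seen = set(seeds)         # duplicate content: never enqueue a URL twice
    order = []
    while frontier:
        url = frontier.popleft()
        if url in DISALLOWED:  # excluded content per robots.txt
            continue
        order.append(url)      # "fetch" the page
        for link in WEB.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order

print(crawl(["http://a.example/"]))
```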
Web Search Engines: Part 2
Indexing algorithms use inverted files (concatenation of posting lists for each distinct term)
-Scans text for indexable terms and gives number
-Inverts by sorting into term number order
Issues with indexing:
- Scaling up - to be efficient
- Term lookup - all languages plus new terms
- Compression - takes less storage
- Phrases - precompute posting lists or create sublists
- Anchor text - link text provides info on destination
- Link popularity score - derived from frequency of incoming links
- Query-independent score - based on link popularity, URL brevity, spam score, frequency of clicks
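The scan-then-invert process described above can be illustrated with a minimal sketch. This is a simplification (real engines compress postings, store positions for phrase queries, and distribute the work across clusters), and the sample documents are made up:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs: doc_id -> text. Returns term -> sorted posting list of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Scan the text for indexable terms (naive whitespace tokenization)
        for term in text.lower().split():
            index[term].add(doc_id)
    # Invert by sorting each term's postings into document-number order
    return {term: sorted(ids) for term, ids in index.items()}

docs = {
    1: "web search engines crawl the web",
    2: "search engines build inverted files",
    3: "inverted files list postings per term",
}
index = build_inverted_index(docs)
print(index["inverted"])  # [2, 3]
```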
Average query length = 2.3 words
-Need more than simple-query processor
Ways to speed things up:
- Skipping - irrelevant postings
- Early termination - once postings left are of little value
- Clever assignment of document numbers - decreasing query-independent score
- Caching - reduces cost of answering queries
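Two of the speedups listed above, skipping irrelevant postings and early termination, show up naturally in conjunctive (AND) query processing over sorted posting lists. A minimal sketch, using an invented toy index:

```python
def intersect(p1, p2):
    """Merge two sorted posting lists, skipping postings that cannot match."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i]); i += 1; j += 1
        elif p1[i] < p2[j]:
            i += 1   # skip a posting that appears only in p1
        else:
            j += 1   # skip a posting that appears only in p2
    return out

def conjunctive_query(index, terms):
    # Process rarest terms first so intermediate results shrink quickly
    lists = sorted((index.get(t, []) for t in terms), key=len)
    if not lists or not lists[0]:
        return []  # early termination: an empty list means no answers
    result = lists[0]
    for plist in lists[1:]:
        result = intersect(result, plist)
        if not result:  # early termination once nothing can match
            break
    return result

index = {"web": [1, 4, 7], "search": [1, 2, 7, 9], "engines": [1, 7, 8]}
print(conjunctive_query(index, ["web", "search", "engines"]))  # [1, 7]
```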
These two articles provided an interesting view of how search engines work. I didn't realize that they actually index the pages that they crawl, but it makes sense. It's funny to think about the fact that computer scientists used to think this wouldn't be possible, back when there was only a fraction of the information on the Internet that we have today. I can't believe the amount of information that a search engine has to deal with, and while they may not be perfect yet, I can see how these techniques and algorithms make things fast and efficient and fairly accurate.
White Paper: The Deep Web: Surfacing Hidden Value
Deep Web = buried too far down (on dynamically generated sites) for standard search engines to find it
-need to be static and linked to other sites to be found
Deep Web content = in searchable databases, only produces results in response to a search
-7,500 terabytes of info
BrightPlanet = makes dozens of direct queries simultaneously with multi-thread technology
- Search engines: either author submits his/her site or engine "crawls" docs by moving between hyperlinks
- Google: crawls and indexes based on popularity of sites
-- If search engines depend on linkages, they'll never get to the deep Web
Factors in deep Web development:
- Database technology
- Commercialization through directories and e-commerce
- Dynamic serving of web pages
BrightPlanet = directed query engine, gets at deep Web
Deep Web = 10x greater amount of content than rest of Web
Deep Web site qualifications: 43,348 URLs
Database types in deep Web:
- Topic databases
- Internal site
- Publications
- Shopping/auction
- Classifieds
- Portals
- Library
- Yellow and white pages
- Calculators
- Jobs
- Message or chat
- General search
Deep Web content:
- Deep Web docs are 27% smaller than surface Web docs
- Deep Web sites are much larger than surface Web sites
- Deep Web sites have about 50% more traffic than surface Web sites
- 97.4% of deep Web is publicly available
- Deep Web may be higher quality than surface Web
- Deep Web growing faster than surface Web
Needs to be a way to search info in deep Web = BrightPlanet?
This article was very interesting because I had previously heard a bit about the deep Web, but what I had read previously made the deep Web sound like a sinister place where most of the content was illegal and strange. Now I understand that the deep Web is mostly databases related to different organizations that are "buried" because there are no links that connect them to the rest of the Web, so conventional search engines cannot find them in the usual way. I hope that we keep developing our ability to access and search the deep Web because it sounds like there is a lot of information there that could be useful to people.
Current Developments and Future Trends for the OAI Protocol for Metadata Harvesting
Open Archives Protocol for Metadata Harvesting = OAI-PMH
- Federates access to diverse e-print archives through metadata harvesting and aggregation
- Released in 2001, used by content management systems
- Mission: "develop and promote interoperability standards that aim to facilitate the efficient dissemination of content"
- Uses XML, HTTP, and Dublin Core standards
- Data providers or repositories provide metadata
- Service providers or harvesters harvest the metadata
- Can provide access to invisible/deep Web
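Since OAI-PMH is just XML over HTTP, the harvester's side of the exchange is easy to sketch. The ListRecords verb and oai_dc metadata prefix below come from the actual protocol, but the repository URL is hypothetical and the sample response is invented and heavily abbreviated:

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """An OAI-PMH ListRecords request is just an HTTP GET with query args."""
    return base_url + "?" + urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix})

# A minimal, abbreviated ListRecords response (illustrative only)
SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample Preprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

NS = {"oai": "http://www.openarchives.org/OAI/2.0/",
      "dc": "http://purl.org/dc/elements/1.1/"}

def harvest_titles(xml_text):
    """Service-provider side: pull Dublin Core titles out of the response."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall(".//dc:title", NS)]

print(list_records_url("http://repo.example.edu/oai"))
print(harvest_titles(SAMPLE))  # ['Sample Preprint']
```

This shows why the protocol can federate diverse archives: every data provider answers the same small set of verbs in the same XML shape, so one harvester works against all of them.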
Notable community- or domain-specific services:
Open Language Archive Community
Sheet Music Consortium
National Science Digital Library
Comprehensive, searchable registry of OAI repositories
-more informative, searchable, and complete than in the past
-machine processing option
Future work of OAI registry:
- Enhance descriptions of repositories for search
- Provide automated maintenance of registry
- Delegate creation/maintenance of collections
- Improve view of search results
Extensible Repository Resource Locators = ERRoLs, "cool URLs," lead to content and services relating to an OAI repository
-simple mechanism to access OAI data
Challenges for OAI community:
- Metadata variation
- Metadata formats
- OAI data provider implementation practices
- Communication issues
Future directions: best practices, static repository gateway, Mod_oai Project, OAI-rights, controlled vocabularies and OAI, SRW/U-to-OAI gateway to the ERRoL service
If I understood this article correctly, this seems like another project attempting to get at the useful data in the deep Web, just like the BrightPlanet system mentioned in the last article. This sounds like an open collaboration, however, between the people who have the data and the people who want to access it or give access to others. This seems like a good strategy and will be useful to people who want to use the deep Web data that is currently unavailable. There's still a lot of this I don't understand, but I hope that I'm generally correct about the overall idea that this article is presenting.
Lab 11
~virtual reference "digital libraries"
[also specified articles published between 2008 and 2012]
Google Scholar screenshot:

Web of Knowledge query:
Topic=(virtual reference) OR Topic=(digital libraries) AND Year Published=(2008-2012)
Web of Knowledge screenshot:

Wednesday, March 28, 2012
Week 11 Reading Notes
Digital Libraries: Challenges and Influential Work
Current info environment includes: "full-text repositories maintained by commercial and professional society publishers; preprint servers and Open Archive Initiative (OAI) provider sites; specialized Abstracting and Indexing (A & I) services; publisher and vendor vertical portals; local, regional, and national online catalogs; Web search and metasearch engines; local e-resource registries and digital content databases; campus institutional repository systems; and learning management systems"
Need more than access for digital library work - need federated search
History of digital libraries:
- Digital Libraries Initiative (DLI-1), 1994
- DLI-2, 1998
- University-led projects
- Development strongly influenced by evolution of Internet
- Search interoperability and federated searching
-Google, Google Scholar, OAI = aggregated/harvested
-Ex Libris Metalib, Endeavor Encompass, and WebFeat = broadcast search
-can be complementary
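The broadcast-search model noted above can be sketched as a fan-out-and-merge: the query goes to every source at once and the results are combined (in contrast to the harvested model, which indexes everything centrally in advance). The source names and result identifiers here are hypothetical stand-ins for real catalogs and A&I services:

```python
# Hypothetical sources: each maps a query to its own locally ranked results
SOURCES = {
    "catalog": lambda q: ["catalog:record-12"] if "digital" in q else [],
    "eprints": lambda q: ["eprints:paper-7", "eprints:paper-9"] if "digital" in q else [],
    "a_and_i": lambda q: [],
}

def broadcast_search(query):
    """Send the query to every source and merge the results, deduplicating."""
    merged, seen = [], set()
    for name, search in SOURCES.items():
        for hit in search(query):
            if hit not in seen:
                seen.add(hit)
                merged.append(hit)
    return merged

print(broadcast_search("digital libraries"))
```

Even this toy shows why the two models are complementary: broadcast search is always current but only as fast as its slowest source, while a harvested index answers instantly from possibly stale data.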
Metadata searching vs. full-text searching?
This article was a good, brief introduction to the issues surrounding federated search and why such a mechanism is necessary. I can't imagine how complicated it is to try to design a search that will encompass all of the different resources available online. It seems like it would be impossible to design something that would work with all the different systems that exist, but I also see the need for it in order to provide the best possible digital library services.
Dewey Meets Turing: Librarians, Computer Scientists, and the Digital Libraries Initiative
DLI led to development of Google, as well as CareMedia and many others
Computer scientists: expected their research to impact daily lives
Librarians: expected grant money and impact on scholarship
Expected to be collaboration between computer scientists and librarians, but World Wide Web got in the way
-variety of media, larger collection, different access methods
-blurred consumers/producers of info
-split up collections over the world and under different owners
Computer scientists embraced changes Web created
Librarians felt threat to their traditional practice
Problems for librarians:
- Loss of cohesive "collections"
- High prices of journal publishers
- Copyright issues
- Dead links
Computer scientists feel librarians too nitpicky about metadata
-However, core function of librarianship remains
-Notion of collections is reemerging (hubs)
-Opportunities for direct connections between librarians and scholarly authors
This article provided an interesting account of the tensions between librarians and computer scientists involved in the DLI. I can understand how these two professions planned to work together to create digital libraries, but that the Internet changed everything, as it has in so many areas. I can see how computer scientists and librarians have different perspectives and goals, but I also am glad that the author sees hope for the future of these professions working together and also of the practice of collection development.
Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age
Libraries taking more active role in promoting scholarship and scholarly communication
Supporting this strategy:
- Lower online storage costs
- Open archives metadata harvesting
- Free, publicly accessible journal articles
Institutional repository = set of services university offers for management and dissemination of digital materials created by institution and community members
-Preservation
-Organization
-Access/distribution
Contains:
- Intellectual works by faculty and students
- Documentation of activities of institution
- Experimental and observation data
Authorship in digital medium
-traditional journal articles or new forms
Institutional repositories can help scholars with system administration activities and content curation
-problem with preservation
Traditional publishing = new supplementary datasets and analysis tools
Institutional repositories can:
- enhance access
- encourage new forms of scholarly communication
- maintain stewardship of data
- preserve supplemental info
- curate records of institutional activity
Risks:
- Institutions could take control instead of scholars
- Weighed down with policy
- Lack of institutional commitment
- Technical problems
Future developments:
Consortial or cluster institutional repositories
Curatorial and policy control
Federating institutional repositories
Community or public repositories
I like the way that this article outlines the opportunities and responsibilities of an institutional repository. It seems to me that every institution such as a university should have such a repository in order to organize and preserve digital information that could be important in the future. It would be against an institution's mission to lose some of its vital records and/or intellectual work and have to reinvent the wheel all the time or have a limited knowledge of past activities. It will be interesting to see what happens in the future of institutional repositories and if the author of this article is correct.
Sunday, March 18, 2012